Start by using rr to record your application:
$ rr record /your/application --args
...
FAIL: oh no!
The entire execution, including the failure, was saved to disk. That recording can now be debugged.
$ rr replay
GNU gdb (GDB) ...
...
0x4cee2050 in _start () from /lib/ld-linux.so.2
(gdb)
Remember, you're debugging the recorded trace deterministically, not a live, nondeterministic execution. The replayed execution's address spaces, register contents, syscall data, etc. are exactly the same in every run.
Most of the common gdb commands can be used.
(gdb) break mozilla::dom::HTMLMediaElement::HTMLMediaElement
...
(gdb) continue
Continuing.
...
Breakpoint 1, mozilla::dom::HTMLMediaElement::HTMLMediaElement (this=0x61362f70, aNodeInfo=...)
...
If you need to restart the debugging session, for example because you missed breaking on some critical execution point, no problem. Just use gdb's run command to restart replay.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
...
Breakpoint 1, mozilla::dom::HTMLMediaElement::HTMLMediaElement (this=0x61362f70, aNodeInfo=...)
...
(gdb)
The run command restarted replay of your recording from the beginning. After the restart, the same execution was replayed again, and all your debugging state, such as the breakpoint set earlier, was preserved.
Note that the this pointer of the dynamically-allocated object was the same in both replay sessions. Memory allocations are exactly the same in each replay, meaning you can hard-code addresses you want to watch.
Even more powerful is reverse execution. Suppose we're debugging Firefox layout:
Breakpoint 1, nsCanvasFrame::BuildDisplayList (this=0x2aaadd7dbeb0, aBuilder=0x7fffffffaaa0, aDirtyRect=..., aLists=...) at /home/roc/mozilla-inbound/layout/generic/nsCanvasFrame.cpp:460
460 if (GetPrevInFlow()) {
(gdb) p mRect.width
12000
We happen to know that this value is wrong and want to find out where it was set. rr makes that quick and easy.
(gdb) watch -l mRect.width
(gdb) reverse-cont
Continuing.
Hardware watchpoint 2: -location mRect.width
Old value = 12000
New value = 11220
0x00002aaab100c0fd in nsIFrame::SetRect (this=0x2aaadd7dbeb0, aRect=...) at /home/roc/mozilla-inbound/layout/base/../generic/nsIFrame.h:718
718 mRect = aRect;
This combination of hardware data watchpoints with reverse execution is extremely powerful!
This video shows a quick demo of rr recording and replaying Firefox.
This video demonstrates rr's basic capabilities in a bit more detail.
This video is a high-level technical talk by Robert O'Callahan about rr.
cd /tmp
wget https://github.com/rr-debugger/rr/releases/download/5.8.0/rr-5.8.0-Linux-$(uname -m).rpm
sudo dnf install rr-5.8.0-Linux-$(uname -m).rpm
cd /tmp
wget https://github.com/rr-debugger/rr/releases/download/5.8.0/rr-5.8.0-Linux-$(uname -m).deb
sudo dpkg -i rr-5.8.0-Linux-$(uname -m).deb
Follow the usage instructions to learn how to use rr.
If you're using rr to debug Firefox, you may find these setup instructions helpful. They cover how to use rr to record Firefox test suites.
rr's original motivation was to make debugging of intermittent failures easier. These failures are hard to debug because any given program run may not show the failure. We wanted to create a tool that would record program executions with low overhead, so you can record test executions until you see a failure, and then replay the failing execution repeatedly under a debugger until it has been completely understood.
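The "record until you see a failure" workflow can be automated with a small driver script. The following is a minimal sketch, not part of rr: the function name record_until_failure and its parameters are hypothetical, and it assumes that the exit status of rr record reflects the exit status of the recorded program.

```python
import subprocess


def record_until_failure(command, max_runs=100, recorder=("rr", "record")):
    """Run `recorder + command` repeatedly until one run exits with a non-zero status.

    Returns (run_number, exit_status) for the first failing run, or None if
    all `max_runs` runs passed.

    Assumption: the recorder's exit status reflects the status of the program
    it recorded. `recorder` is a parameter so another wrapper can be swapped in.
    """
    for run_number in range(1, max_runs + 1):
        result = subprocess.run([*recorder, *command])
        if result.returncode != 0:
            # Stop immediately so the failing run is the most recent recording.
            return run_number, result.returncode
    return None
```

Calling record_until_failure(["/your/application", "--args"]) would stop at the first failing run, after which rr replay can be used on that recording as shown earlier.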
We also hoped that deterministic replay would make debugging of any kind of bug easier. With normal debuggers, information you learn during the debugging session (e.g. the addresses of objects of interest, and the ordering of important events) often becomes obsolete when you have to rerun the testcase. With deterministic replay, that never needs to happen: your knowledge of what happens during the failing run increases monotonically.
Furthermore, since debugging is the process of tracing effects to their causes, it's much easier if your debugger can execute backwards in time. It's well-known that given a record/replay system which provides restartable checkpoints during replay, you can simulate reverse execution to a particular point in time by restoring the previous checkpoint and executing forwards to the desired point. So we hoped that if we built a low-overhead record-and-replay system that works well on the applications we care about (Firefox), we could build a really usable backend for gdb's reverse execution commands.
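The checkpoint technique described above can be illustrated with a toy model. This is a sketch under stated assumptions, not rr's implementation: the class ToyReplay, its fields, and the checkpoint interval are hypothetical, and a simple deterministic integer update stands in for replaying one recorded event.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplay:
    """Toy deterministic replay: the state at step n depends only on the seed and n."""
    seed: int
    checkpoint_interval: int = 100  # hypothetical spacing between checkpoints
    step: int = 0
    state: int = 0
    checkpoints: dict = field(default_factory=dict)

    def __post_init__(self):
        self.state = self.seed
        self.checkpoints[0] = self.state  # a checkpoint at the very start

    def _advance(self):
        # Deterministic transition standing in for one replayed event.
        self.state = (self.state * 6364136223846793005 + 1442695040888963407) % 2**64
        self.step += 1
        if self.step % self.checkpoint_interval == 0:
            self.checkpoints[self.step] = self.state

    def run_to(self, target):
        """Execute forwards until step `target`."""
        if target < self.step:
            raise ValueError("run_to only moves forwards; use reverse_to")
        while self.step < target:
            self._advance()

    def reverse_to(self, target):
        """Simulate reverse execution to an earlier step.

        Restores the latest checkpoint at or before `target`, then executes
        forwards to `target`.
        """
        if not 0 <= target <= self.step:
            raise ValueError("target must lie between 0 and the current step")
        base = max(s for s in self.checkpoints if s <= target)
        self.step, self.state = base, self.checkpoints[base]
        self.run_to(target)
```

Because every forward step is deterministic, going back to step 250 from step 1000 costs only the 50 steps from the checkpoint at step 200, and it yields the same state that was seen the first time step 250 was reached.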
These goals have all been met. rr is not only a working tool, but it's being used regularly by developers on many large and small projects.
rr records a group of Linux user-space processes and captures all inputs to those processes from the kernel, plus any nondeterministic CPU effects performed by those processes (of which there are very few). rr replay guarantees that execution preserves instruction-level control flow and memory and register contents. The memory layout is always the same, the addresses of objects don't change, register values are identical, syscalls return the same data, etc.
Tools like fuzzers and randomized fault injectors become even more powerful when used with rr. Those tools are very good at triggering some intermittent failure, but it's often hard to reproduce that same failure again to debug it. With rr, the randomized execution can simply be recorded. If the execution failed, then the saved recording can be used to deterministically debug the problem.
rr lowers the cost of fixing bugs. rr helps produce higher-quality software for the same cost. rr also makes debugging more fun.
Record-and-replay debugging is an old idea; many systems preceded rr. What makes rr different are the design goals:
The overhead of rr depends on your application's workload. On Firefox test suites, rr's recording performance is quite usable: in the best cases we see slowdowns of 1.2x or less. A 1.2x slowdown means that if the suite takes 10 minutes to run by itself, it will take around 12 minutes to be recorded by rr. Overhead can vary dramatically between workloads, however. For mostly-single-threaded programs, rr has much lower overhead than any competing record-and-replay system we know of.
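The slowdown arithmetic can be written out as a short helper for estimating recording time. This is only an illustration: the function names are hypothetical, and the slowdown factor for a given workload has to be measured rather than assumed.

```python
def recorded_minutes(baseline_minutes, slowdown):
    """Estimated wall-clock time under recording for a given slowdown factor."""
    return baseline_minutes * slowdown


def overhead_minutes(baseline_minutes, slowdown):
    """Extra time added by recording, on top of the unrecorded run."""
    return baseline_minutes * (slowdown - 1)
```

With the figures from the text, a 10 minute suite at a 1.2x slowdown takes about 12 minutes to record, which is about 2 minutes of overhead.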
rr …
The Extended Technical Report is our best overview of how rr works and performs.
The rr wiki contains pages that cover technical topics related to rr.
Ask on the mailing list or on #rr on chat.mozilla.org if you have questions about rr.