rr: lightweight recording and deterministic debugging

Elv13 · on Nov 6, 2018

I can't thank Mozilla enough for funding the development of this. This tool works wonderfully and is a time saver when debugging hard-to-reproduce issues or issues that happen only on the Nth iteration of a method call.

In rr, you just have to reproduce the problem, then put a breakpoint back in time at a point where the state was still good and reverse-continue back to it. This creates a very narrow debugging window instead of hours of head scratching.

I wrote many GDB frontend extensions for my personal use (they may or may not work for others and may be broken in Python 3; I am not a Python dev). This one is very useful with rr. It logs "print" expressions on auto-generated breakpoints, then outputs them into a spreadsheet. https://gist.github.com/Elv13/92b98579e62f086cd9c12f44e510ca...

It's very useful with rr because you can modify the `print` columns as many times as you like and ask rr to regenerate the spreadsheet without executing the program again.

roca · on Nov 6, 2018

FWIW Mozilla funded it until 2016 but since then Kyle Huey and I have been maintaining it out of our own pockets while we work on our startup.

Elv13 · on Nov 6, 2018

And thanks for continuing to work on it ;)

After having this problem on the GNU "high priority" list for a decade, it's nice to see this thing exist. My comment was more about the initial "make it happen" part. To me, `rr` seems like a really nontrivial project to get going in the first place. Not many orgs would have taken a risk on this.

roca · on Nov 6, 2018

We deliberately chose a design that could be implemented with a very small team. In fact there has never been more than about one person working full time on rr, usually less. That's one reason we bet on not using code instrumentation, for example.

But yes, Mozilla deserves major credit for supporting us building this crazy thing --- and of course, releasing it.

dman · on Nov 6, 2018

Have you publicly announced what the startup works on?

roca · on Nov 6, 2018

We haven't said much. https://robert.ocallahan.org/2018/05/update-pernosco.html

dman · on Nov 6, 2018

How do I pay for this?

roca · on Nov 6, 2018

Email me: robert@ocallahan.org

mijoharas · on Nov 6, 2018

For anyone that isn't aware, this is a brilliant tool. I just wish it supported ARM hardware (as far as I remember there are some interrupts that clobber state, so this can't be used. Please fill in the details if anyone remembers).

pm215 · on Nov 6, 2018

The upstream issue with discussion about Arm support is https://github.com/mozilla/rr/issues/1373 -- the underlying problem is that rr's design assumes that if you execute N instructions you'll always deterministically end up in the same place. In architectures which implement atomics via a load-linked/store-conditional loop (including Alpha, Arm, MIPS, PPC,...) this isn't true, because differences at the OS level (eg timing of other interrupts or process scheduling) could cause an ll/sc loop to loop round more often. It's not clear how this could be addressed, because it's pretty deeply baked into rr's design.
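
A toy sketch of the point above, not a model of any real CPU or of rr's internals: the function name and the four-instructions-per-attempt figure are assumptions made up for illustration. It only shows that the same atomic increment can execute a different number of instructions depending on how often outside interference makes the store-conditional fail.

```python
def atomic_increment_instruction_count(failed_store_attempts: int) -> int:
    """Count instructions executed by a toy LL/SC increment loop.

    Each attempt is assumed to run four instructions: load-linked, add,
    store-conditional, and the branch that tests the store result.
    """
    instructions_per_attempt = 4
    executed = 0
    attempts_left_to_fail = failed_store_attempts
    while True:
        executed += instructions_per_attempt
        if attempts_left_to_fail == 0:
            break  # the store-conditional succeeded
        attempts_left_to_fail -= 1  # interference made the store fail; retry
    return executed


# Same program, same inputs, different amounts of outside interference.
quiet_run = atomic_increment_instruction_count(failed_store_attempts=0)
noisy_run = atomic_increment_instruction_count(failed_store_attempts=3)
print(quiet_run, noisy_run)  # 4 16
```

A replayer that stops after "N instructions" would land in different places in these two runs, which is the assumption that breaks.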

roca · on Nov 6, 2018

CPU support for trapping on a failed LL/SC would suffice.

pm215 · on Nov 6, 2018

Do you have a sketch of how that would work? It seems plausible but I haven't thought through the details. Issue 1373 suggests a perf counter of failed-SC events and looking at "branches taken - failed_SC", which I'm definitely sceptical would be reliable.

roca · on Nov 6, 2018

Sure, I just added it here: https://github.com/mozilla/rr/issues/1373#issuecomment-43627...

worldgeek · on Nov 6, 2018

TL;DR: If you want rr-style debugging on ARM, try UndoDB (which has a different instrumentation method). rr needs an accurate count of "retired instructions" to replay correctly. On ARM this count is non-deterministic, because an instruction may fail, for example due to a cache miss or a hardware interrupt. The instruction is retried, of course, but the retired instruction count for a code path run multiple times can vary even if identical branches are taken. A longer and better description is at https://github.com/mozilla/rr/issues/1373

dang · on Nov 6, 2018

Discussed in 2014: https://news.ycombinator.com/item?id=8817954

sddfd · on Nov 6, 2018

This is a great help; I use it most of the time. After I got used to it, plain gdb feels incomplete (mostly because rr allows you to reverse-step and reverse-continue, even with watchpoints and breakpoints).

zurn · on Nov 6, 2018

Is this language agnostic, supporting all of GDB's languages, or is there a specific set of languages that it supports?

(GDB supports e.g. Ada, Fortran and Rust)

edit: I had a look: the web page says "C/C++", but there is some evidence on the bug tracker of people using it with Rust. So my own quick peek was inconclusive.

sanxiyn · on Nov 6, 2018

It is a gdb protocol server. All language support is on the gdb client side, so yes, it's language agnostic.

kibwen · on Nov 6, 2018

Indeed, rr has worked with Rust since at least 2015 ( http://huonw.github.io/blog/2015/10/rreverse-debugging/ ), and I see the author of rr around the Rust community enough (I believe he works at Mozilla?) that I doubt this has regressed in the meantime.

rebelwebmaster · on Nov 6, 2018

roc left Mozilla back in 2016. https://robert.ocallahan.org/2016/03/leaving-mozilla.html

de_watcher · on Nov 6, 2018

I've got an option "--rr" on my tests that launches the program under rr. So I can debug a failed test in all directions.
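
One way such an option might be wired up, as a minimal sketch: the launcher and the test binary name `./my_test` are hypothetical, and the only rr behaviour assumed is that `rr record <command>` records a run of that command.

```python
def build_test_command(argv):
    """Return the argv to launch, wrapped in `rr record` when --rr is given."""
    use_rr = "--rr" in argv
    test_argv = [arg for arg in argv if arg != "--rr"]
    if use_rr:
        # Prefix the real command so the whole test run gets recorded.
        return ["rr", "record"] + test_argv
    return test_argv


plain = build_test_command(["./my_test", "--case", "42"])
recorded = build_test_command(["--rr", "./my_test", "--case", "42"])
print(plain)     # ['./my_test', '--case', '42']
print(recorded)  # ['rr', 'record', './my_test', '--case', '42']
```

The resulting list could be handed to `subprocess.run`; a failed test recorded this way can then be opened with `rr replay`.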

cntlzw · on Nov 6, 2018

Can anyone explain how these tools work? How are they recording program execution? Don't they need to keep track of every register, memory address and such? Seems rather complicated.

roca · on Nov 6, 2018

The basic idea is that since CPUs are deterministic, if you record and replay all inputs to a process you don't have to record what goes on inside the process such as registers and memory.
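
A minimal sketch of that idea in plain Python, under the assumption that the "process" is a deterministic function whose only nondeterminism arrives through one input callback. The names `run_program`, `recording_input` and `trace` are invented for this illustration and have nothing to do with rr's actual implementation.

```python
import random


def run_program(read_input):
    """A deterministic computation whose only nondeterminism is its inputs."""
    state = 0
    for _ in range(5):
        state = state * 31 + read_input()
    return state


# Recording: pass real (nondeterministic) inputs through and log them.
trace = []


def recording_input():
    value = random.randrange(1000)
    trace.append(value)
    return value


recorded_result = run_program(recording_input)

# Replay: feed the logged inputs back. No internal state was ever saved,
# yet the computation passes through exactly the same states again.
replay_source = iter(trace)
replayed_result = run_program(lambda: next(replay_source))

print(recorded_result == replayed_result)  # True
```

Only five integers are logged here, however much work happens between the inputs.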

pm215 · on Nov 6, 2018

rr has an "extended technical report" at https://arxiv.org/pdf/1705.05937.pdf which explains the principles it uses. In general they are inherently rather complicated -- in order to get to anything resembling useful speed of execution, you need to play clever tricks of one kind or another to avoid having to record absolutely everything about the process under debug.

MarkUndo · on Nov 7, 2018

I think GDB's built-in record/replay mode does roughly what you describe.

Tracking the effects of every instruction is likely to give you a simpler implementation (since you just repeatedly say "next instruction, now what did that do?") but it's quite slow and can be hungry on memory, since you have to log every change.

Record/replay/reversing tools like rr provide a more sophisticated backend that doesn't need to track state per instruction. They can be much faster, and consume less memory, but more coding is required to make sure you're tracking the right state efficiently.
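
To make the per-instruction approach concrete, here is a toy sketch (the `LoggingMachine` class is hypothetical and is not how GDB's record mode is implemented): every write saves the old value first, so reverse-stepping is trivial, but the log grows by one entry per executed instruction.

```python
class LoggingMachine:
    """Toy machine that logs the old value before every write."""

    def __init__(self):
        self.memory = {}
        self.undo_log = []  # one (address, old_value) entry per instruction

    def step(self, address, value):
        # Save what is about to be overwritten (None if the cell was unset).
        self.undo_log.append((address, self.memory.get(address)))
        self.memory[address] = value

    def reverse_step(self):
        # Undo the most recent write by restoring the logged old value.
        address, old_value = self.undo_log.pop()
        if old_value is None:
            del self.memory[address]
        else:
            self.memory[address] = old_value


machine = LoggingMachine()
for i in range(1000):
    machine.step("counter", i)
machine.reverse_step()
print(machine.memory["counter"], len(machine.undo_log))  # 998 999
```

A thousand instructions cost a thousand log entries here, which is the memory and speed trade-off described above.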

qalmakka · on Nov 6, 2018

Does anyone know if this also supports LLDB? Or is it strictly tied to gdb? I happen to slightly prefer LLDB these days (mainly because its `list` instruction is much much saner)

sanxiyn · on Nov 6, 2018

rr is a gdb protocol server. By default, rr also launches a gdb client and automatically connects it to the server, but you can use the -s PORT option to run rr in server-only mode.

As I understand it, once rr is running in server-only mode, you can connect to it from gdb with "target remote :PORT", or from LLDB with "gdb-remote :PORT". I haven't tested this, but it should work as long as LLDB implements the gdb protocol in a compatible manner.

roca · on Nov 6, 2018

LLDB could work with rr, but AFAIK LLDB doesn't support reverse-execution commands so rr's key feature would not be accessible.

lnyng · on Nov 6, 2018

It reminds me of the ReVirt [1] paper I read in an advanced OS class (it is actually mentioned in the slides). I didn't watch the complete talk, so I'm wondering how much rr differs from ReVirt and other record-and-replay debugging tools.

[1] https://www.usenix.org/legacy/events/osdi02/tech/full_papers...

roca · on Nov 6, 2018

The basic idea is the same as ReVirt, but there are a lot of different details because rr has to run as a pure user-space Linux application whereas ReVirt was baked into the hypervisor. For rr to run efficiently we have to use Linux APIs in creative ways. https://arxiv.org/abs/1705.05937 has more details.

MarkUndo · on Nov 7, 2018

VMware used to support record / replay of VMs in their commercial product, which was quite cool:

https://pubs.vmware.com/ws71_ace27/wwhelp/wwhimpl/js/html/ww...

Mic92 · on Nov 6, 2018

It is probably more mature than a research prototype as it has received more real-world testing.

DSingularity · on Nov 6, 2018

ReVirt logs non-deterministic inputs at the hypervisor level, while rr records these inputs at the kernel level. So the biggest difference is that rr cannot replay the execution of the OS itself.

MarkUndo · on Nov 7, 2018

Interestingly, hypervisor-level logging of this stuff is both harder (because it has the constraints of kernel-level code) and simpler (because the non-deterministic behaviours are fewer and better-documented at the hardware level than the Linux API level!)

I think it's very likely that, overall, recording a single process is substantially more complex to implement than recording a whole VM. (With significant caveats - recording a whole VM with good performance is going to be hard and making it really useful probably is a whole load of extra code)

aargh_aargh · on Nov 6, 2018

The site is almost unreadable on mobile. Font too thin, doesn't scale and most importantly, the contrast is way too low.

justinclift · on Nov 6, 2018

In theory (!), Goland recently added support for reverse debugging of Go code via rr:

https://youtrack.jetbrains.com/issue/GO-3831

Haven't tried it out yet myself, but I'd expect it to be at least functional. :)

entelechy · on Nov 6, 2018

How does that compare to undo.io ?

andrey_utkin · on Nov 6, 2018

Undo has some features rr doesn't have. It supports shared memory operations. It works in virtual machines and "in cloud". It imposes less strict requirements on kernel or CPU features; for example, it works on AMD. Fundamentally, what distinguishes UndoDB and Undo Live Recorder from rr is their architecture, which is based on machine code instrumentation.

MarkUndo · on Nov 6, 2018

Actually, I think rr supports shared memory operations under some circumstances... (I'd love to have my beliefs confirmed / corrected by somebody more knowledgeable)

My understanding is that rr has some handling for read-only shared memory and for arbitrary sharing within a tree of recorded processes.

Undo's shared memory is different because it doesn't need the other process to be recorded, so you can do read/write sharing with arbitrary processes or devices.

(disclaimer: current Undo engineer)

roca · on Nov 6, 2018

You're correct.

roca · on Nov 6, 2018

rr has a similar feature set to UndoDB. rr is probably a bit more efficient during recording. However, UndoDB doesn't depend on the performance counters the way rr does, so it works in situations where those counters are unavailable (e.g. some VM guests). Also UndoDB works on ARM, but due to the way ARM implements atomics (see above) rr's approach can't work on ARM. (Until/unless we get ARM to add support for trapping on failed LL/SC.)

xvilka · on Nov 6, 2018

There is also a low-level tool, disassembler and debugger, radare2 [1]. It can also record a session, in both debug and emulation modes, and replay it [2].

[1] https://github.com/radare/radare2

[2] https://radare.gitbooks.io/radare2book/content/debugger/revd...

roca · on Nov 6, 2018

radare2 lets you take snapshots of memory and registers and restore them, but it doesn't let you, say, record the entire execution of multi-process Firefox from startup to shutdown and later replay that perfectly. It's not capturing the effects of the environment that you need to make that work.

nialv7 · on Nov 6, 2018

A very useful tool that has saved me a lot of trouble. Sadly I can't use it anymore since I switched to a Ryzen CPU.

pmoriarty · on Nov 6, 2018

What makes it unusable on Ryzen?

roca · on Nov 6, 2018

Ryzen's performance counters aren't quite accurate enough for rr to use. https://github.com/mozilla/rr/issues/2034

I hope that AMD will fix this one day.

chappar · on Nov 6, 2018

Last time I checked, rr did not support multi-threaded applications well. Has that changed now?

roca · on Nov 6, 2018

rr supports multithreaded applications, but only uses a single core. So parallel applications slow down.

scott_s · on Nov 6, 2018

Being slower is not, I think, the major downside. It is that an entire class of errors - race conditions - are basically outside of the scope of the tool. Which is understandable! Race conditions are hard, and when I read about the tool, my first thought was "How are they handling race conditions?" and it turns out, essentially, they're not. But race conditions are also the hardest part about debugging multithreaded applications.

I'm not sure if the tool ensures deterministic scheduling of threads on the single core, but I doubt that it does. If it does not, then playback will not be deterministic, which means you could encounter different race condition outcomes on playback. If it does, then while you may have deterministic playback, the tool is unlikely to help with the class of race conditions that require simultaneous execution.

To be clear: I'm not criticizing the tool or the work of the people. If I were to design such a tool, I would probably start with a single core as well. It seems like a valuable tool and great progress for software debugging. But I do think race conditions in multithreaded programs are a current limitation.

edit: The technical report says that they deterministically schedule threads (https://arxiv.org/pdf/1705.05937.pdf):

"RR preemptively schedules these threads, so context switch timing is nondeterminism that must be recorded. Data race bugs can still be observed if a context switch occurs at the right point in the execution (though bugs due to weak memory models cannot be observed)."

The "weak memory model" part means it won't help with, say, debugging lock-free algorithms where you screw up the semantics.
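
The quoted point about context switch timing can be illustrated with a toy scheduler. This is only a sketch under heavy assumptions: "threads" are Python generators, one core runs one of them at a time, and the names `worker`, `run` and `schedule` are invented here. It is not rr's scheduler, just the record-the-choices idea.

```python
import random


def worker(name, shared, steps):
    """A toy thread: each resumption appends one item, then yields the core."""
    for i in range(steps):
        shared.append(f"{name}{i}")
        yield


def run(schedule_choice, steps=3):
    """Run two toy threads on one 'core'; schedule_choice picks who runs next."""
    shared = []
    threads = {"A": worker("A", shared, steps), "B": worker("B", shared, steps)}
    while threads:
        name = schedule_choice(sorted(threads))
        try:
            next(threads[name])
        except StopIteration:
            del threads[name]  # this thread has finished
    return shared


# Recording: scheduling decisions are nondeterministic, so log each one.
schedule = []


def recording_choice(runnable):
    name = random.choice(runnable)
    schedule.append(name)
    return name


recorded = run(recording_choice)

# Replay: reuse the logged decisions instead of choosing again.
replay = iter(schedule)
replayed = run(lambda runnable: next(replay))

print(recorded == replayed)  # True
```

Whatever interleaving happened during recording, including an unlucky one that exposes a data race, is reproduced exactly on replay.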

roca · on Nov 6, 2018

You should read https://arxiv.org/abs/1705.05937 so you don't need to speculate. rr absolutely does guarantee that threads are scheduled the same way during replay as during recording, otherwise it wouldn't work at all on applications like Firefox which use a lot of threads.

Also, rr definitely is very useful for debugging race conditions. For example Mozilla developers have debugged lots of race conditions using it. One thing that really helps is rr's "chaos mode", which randomizes thread scheduling in an intelligent way to discover possible races. See https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo... and https://robert.ocallahan.org/2016/02/deeper-into-chaos.html and https://robert.ocallahan.org/2018/05/rr-chaos-mode-improveme....
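
As a rough illustration of why randomized scheduling helps, here is a toy sketch. It is explicitly not rr's chaos mode algorithm (see the linked posts for that); it just picks the next thread uniformly at random from a seeded generator, and the names `incrementer` and `run` are made up for this example.

```python
import random


def incrementer(counter):
    """Toy thread: a non-atomic increment with a preemption point in the middle."""
    value = counter["n"]      # read
    yield                     # the scheduler may switch threads here
    counter["n"] = value + 1  # write


def run(pick):
    """Run two incrementers on one 'core'; pick chooses which thread runs next."""
    counter = {"n": 0}
    threads = [incrementer(counter), incrementer(counter)]
    while threads:
        thread = pick(threads)
        try:
            next(thread)
        except StopIteration:
            threads.remove(thread)  # this thread has finished
    return counter["n"]


# A schedule that never preempts mid-increment hides the bug.
in_order = run(lambda threads: threads[0])

# Randomized schedules expose the lost update; the seed makes each one repeatable.
failing_seeds = [seed for seed in range(20)
                 if run(random.Random(seed).choice) != 2]

print(in_order)                # 2
print(len(failing_seeds) > 0)  # True
```

Because every failing schedule is tied to a seed, re-running with that seed hits the same lost update again, loosely analogous to how a failure found under chaos mode is captured in a recording that can be replayed.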

scott_s · on Nov 6, 2018

Very cool stuff! And yes, I took a look at the paper, as I noted in my edit. But I think there are still two classes of race conditions outside of its scope: ones that require simultaneous execution (where you can get surprising interleavings) and lock-free algorithms where correct use of the memory model is paramount. In my personal experience, these are the hardest problems to debug.

codehog · on Nov 6, 2018

Even those are probably not 100% outside of its scope. I forget the details of chaos mode, but that kind of induced thread-switching can cause just the kind of interleaving you seem to be talking about.

What rr cannot capture is a very small subclass of race conditions involving things like cache line misses - I think that's what you're alluding to by "correct use of the memory model is paramount" but it's a subclass even of those. Yes, those are hugely difficult to diagnose and it would be fantastic if tools like rr or UndoDB could capture them. But there's a vast swathe of also very difficult race conditions that this recording tech can and does help with today.