rr: lightweight recording and deterministic debugging (rr-project.org)
274 points by pmoriarty on Nov 6, 2018 | 52 comments



I can't thank Mozilla enough for funding the development of this. This tool works wonderfully and is a time saver when debugging hard-to-reproduce issues, or issues that happen only in the Nth iteration of a method call.

With rr, you just have to reproduce the problem, set a breakpoint at a point back in time where the state was still good, then reverse-continue back there. That creates a very narrow window of debugging instead of hours of head scratching.

I wrote many GDB frontend extensions for my personal use (they may or may not work for others and may be broken in Python 3; I am not a Python dev). This one is very useful with rr. It logs "print" expressions at auto-generated breakpoints, then writes them into a spreadsheet. https://gist.github.com/Elv13/92b98579e62f086cd9c12f44e510ca...

It's very useful with rr because you can modify the `print` columns as many times as you like and ask rr to regenerate the output without executing the program again.
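To illustrate the idea (this is not the code from the linked gist), here is a minimal sketch of a non-stopping breakpoint that evaluates a list of expressions on every hit and turns the collected rows into CSV. It uses gdb's Python API (`gdb.Breakpoint`, `gdb.parse_and_eval`); the names `LoggingBreakpoint` and `rows_to_csv`, and the example location and expressions in the comments, are made up for this sketch.

```python
import csv
import io

try:
    import gdb  # only importable inside a gdb session (including "rr replay")
except ImportError:
    gdb = None  # lets the CSV helper be used and tested outside gdb


def rows_to_csv(columns, rows):
    """Render a header plus collected rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


if gdb is not None:

    class LoggingBreakpoint(gdb.Breakpoint):
        """Hypothetical helper: log expressions at a location without stopping."""

        def __init__(self, spec, columns):
            super().__init__(spec)
            self.columns = list(columns)
            self.rows = []

        def stop(self):
            # Evaluate each expression in the current frame and store it as text.
            self.rows.append([str(gdb.parse_and_eval(c)) for c in self.columns])
            return False  # returning False tells gdb to keep running


# Inside gdb, assuming a hypothetical location and variables:
#   (gdb) source this_file.py
#   (gdb) python bp = LoggingBreakpoint("foo.c:42", ["i", "total"])
#   (gdb) continue
#   (gdb) python print(rows_to_csv(bp.columns, bp.rows))
```

Because an rr replay re-runs the same recorded execution, changing the column list and replaying again should produce a new table for exactly the same run.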


FWIW Mozilla funded it until 2016 but since then Kyle Huey and I have been maintaining it out of our own pockets while we work on our startup.


And thanks for keeping working on it ;)

After this problem sat on the GNU "high priority" list for a decade, it's nice to see this thing exist. My comment was more about the initial "make it happen" part. To me, `rr` seems like a really nontrivial project to get going in the first place. Not a lot of orgs would have taken a risk with this.


We deliberately chose a design that could be implemented with a very small team. In fact there has never been more than about one person working full time on rr, usually less. That's one reason we bet on not using code instrumentation, for example.

But yes, Mozilla deserves major credit for supporting us building this crazy thing --- and of course, releasing it.


Have you publicly announced what the startup works on?



How do I pay for this?


Email me: robert@ocallahan.org


For anyone that isn't aware, this is a brilliant tool. I just wish it supported ARM hardware (as far as I remember there are some interrupts that clobber state, so this can't be used. Please fill in the details if anyone remembers).


The upstream issue with discussion about Arm support is https://github.com/mozilla/rr/issues/1373 -- the underlying problem is that rr's design assumes that if you execute N instructions you'll always deterministically end up in the same place. In architectures which implement atomics via a load-linked/store-conditional loop (including Alpha, Arm, MIPS, PPC,...) this isn't true, because differences at the OS level (eg timing of other interrupts or process scheduling) could cause an ll/sc loop to loop round more often. It's not clear how this could be addressed, because it's pretty deeply baked into rr's design.
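A toy simulation of the problem described above, purely as a sketch: the instruction accounting is invented (four "instructions" per loop iteration) and does not model any real CPU or the counter rr actually reads. It only shows that two runs can reach the same final state after executing a different number of instructions when a store-conditional fails and the loop retries.

```python
def atomic_increment(memory, sc_failures):
    """Increment memory[0] with a simulated LL/SC retry loop.

    sc_failures is how many times the store-conditional is made to fail
    before it succeeds (standing in for an interrupt or a reschedule).
    Returns the number of simulated instructions executed.
    """
    executed = 0
    while True:
        value = memory[0]      # load-linked
        value += 1             # add
        executed += 3          # ll + add + store-conditional attempt
        executed += 1          # branch that tests whether the store succeeded
        if sc_failures > 0:
            sc_failures -= 1   # reservation lost: nothing stored, loop again
            continue
        memory[0] = value      # store-conditional succeeded
        return executed


quiet = [0]
noisy = [0]
count_quiet = atomic_increment(quiet, sc_failures=0)
count_noisy = atomic_increment(noisy, sc_failures=2)

# Same result in memory, different instruction counts.
print(quiet[0], noisy[0], count_quiet, count_noisy)
```

A replay that relies on "stop after N instructions" would land in a different place in the second run, even though the program's visible behaviour is identical.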


CPU support for trapping on a failed LL/SC would suffice.


Do you have a sketch of how that would work? It seems plausible but I haven't thought through the details. Issue 1373 suggests a perf counter of failed-SC events and looking at "branches taken - failed_SC", which I'm definitely sceptical would be reliable.



TL;DR: If you want rr on ARM, try UndoDB (which uses a different method, based on instrumentation). rr needs an accurate count of "retired instructions" to replay correctly. On ARM this count is non-deterministic, because an instruction may fail, for example due to a cache miss or a hardware interrupt. The instruction is retried, of course, but the retired instruction count for a code path run multiple times can vary even if identical branches are taken. There is a longer and better description at https://github.com/mozilla/rr/issues/1373



This is a great help, I'm using it most of the time. After I got used to it, plain gdb feels incomplete (mostly because rr allows you to reverse-step and reverse-continue even with watch/breakpoints).


Is this language agnostic, supporting all GDB's languages, or is there a specific set of languages that it supports?

(GDB supports eg Ada, Fortran and Rust)

edit: I had a look: the web page says "C/C++", there is some evidence on the bug tracker of people using it with Rust. So my own quick peek was inconclusive.


It is a gdb protocol server. All of the language support lives on the gdb client side, so yes, it's language agnostic.


Indeed, rr has worked with Rust since at least 2015 ( http://huonw.github.io/blog/2015/10/rreverse-debugging/ ), and I see the author of rr around the Rust community enough (I believe he works at Mozilla?) that I doubt this has regressed in the meantime.



I've got an option "--rr" on my tests that launches the program under rr. So I can debug a failed test in all directions.
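A rough sketch of what such a flag could look like in a Python test runner. The runner, its argument layout, and the function names are hypothetical; the only rr-specific part is prefixing the command with `rr record`, after which `rr replay` opens the most recent recording in the debugger.

```python
import argparse


def build_command(program, program_args, use_rr):
    """Return the argv to launch, optionally wrapped in 'rr record'."""
    command = [program, *program_args]
    if use_rr:
        command = ["rr", "record", *command]
    return command


def parse_args(argv):
    """Parse the (hypothetical) test runner's command line."""
    parser = argparse.ArgumentParser(description="hypothetical test runner")
    parser.add_argument("--rr", action="store_true",
                        help="record the test under rr for later replay")
    parser.add_argument("program")
    parser.add_argument("program_args", nargs="*")
    return parser.parse_args(argv)


args = parse_args(["--rr", "./my_test", "case1"])
command = build_command(args.program, args.program_args, args.rr)
print(command)
# A real runner would now do something like subprocess.run(command).
```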


Can anyone explain how these tools work? How are they recording program execution? Don't they need to keep track of every register, memory address and such? Seems rather complicated.


The basic idea is that since CPUs are deterministic, if you record and replay all inputs to a process, you don't have to record what goes on inside the process, such as registers and memory.
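A toy version of that idea, as a sketch only: the "process" is a pure function of its inputs, so logging the inputs during recording is enough to reproduce every intermediate state during replay. The `Recorder`, `Replayer`, and `program` names are invented here, and a random number stands in for real inputs such as system call results.

```python
import random


class Recorder:
    """Wrap a nondeterministic input source and log every value it returns."""

    def __init__(self, source):
        self.source = source
        self.log = []

    def read(self):
        value = self.source()
        self.log.append(value)
        return value


class Replayer:
    """Feed back previously logged values instead of asking the real source."""

    def __init__(self, log):
        self._values = iter(log)

    def read(self):
        return next(self._values)


def program(inputs, steps=5):
    """Deterministic computation whose only nondeterminism is inputs.read()."""
    state = 0
    for _ in range(steps):
        state = (state * 31 + inputs.read()) % 1_000_003
    return state


recorder = Recorder(lambda: random.randrange(1000))
recorded_result = program(recorder)
replayed_result = program(Replayer(recorder.log))
print(recorded_result == replayed_result)
```

Nothing about `state` is ever saved; only the five input values are, which is why this kind of recording can be cheap.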


rr has an "extended technical report" at https://arxiv.org/pdf/1705.05937.pdf which explains the principles it uses. In general they are inherently rather complicated -- in order to get to anything resembling useful speed of execution, you need to play clever tricks of one kind or another to avoid having to record absolutely everything about the process under debug.


I think GDB's built-in record/replay mode does roughly what you describe.

Tracking the effects of every instruction is likely to give you a simpler implementation (since you just repeatedly say "next instruction, now what did that do?") but it's quite slow and can be hungry on memory, since you have to log every change.

More sophisticated record/replay/reversing tools like rr provide a more sophisticated backend that doesn't need to track state per instruction. They can be much faster - and consume less memory - but more coding is required to make sure you're tracking the right state, efficiently.


Does anyone know if this also supports LLDB? Or is it strictly tied to gdb? I happen to slightly prefer LLDB these days (mainly because its `list` instruction is much much saner)


rr is a gdb protocol server. By default, rr also launches a gdb client and automatically connects it to the server, but you can use the -s PORT option to run rr in server-only mode.

As I understand it, once rr is running in server-only mode, you can connect to it from gdb with "target remote :PORT", or from LLDB with "gdb-remote :PORT". I haven't tested this, but it should work as long as LLDB implements the gdb protocol in a compatible manner.


LLDB could work with rr, but AFAIK LLDB doesn't support reverse-execution commands so rr's key feature would not be accessible.
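For readers wondering what "gdb protocol" means on the wire, here is a small sketch of packet framing in the GDB Remote Serial Protocol: a payload is sent as `$payload#xx`, where `xx` is the sum of the payload bytes modulo 256 in hex. The protocol defines `bc` and `bs` packets for reverse continue and reverse step, which is the part a client has to know how to send. The function names are made up, and the sketch ignores escaping and run-length encoding.

```python
def frame_packet(payload: str) -> bytes:
    """Wrap a payload as $payload#checksum (no escaping handled)."""
    data = payload.encode("ascii")
    checksum = sum(data) % 256
    return b"$" + data + b"#" + f"{checksum:02x}".encode("ascii")


def parse_packet(frame: bytes) -> str:
    """Validate a framed packet and return its payload."""
    if not frame.startswith(b"$") or frame[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    data = frame[1:-3]
    expected = int(frame[-2:], 16)
    if sum(data) % 256 != expected:
        raise ValueError("checksum mismatch")
    return data.decode("ascii")


print(frame_packet("bc"))  # reverse-continue request
print(frame_packet("bs"))  # reverse-step request
```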


It reminds me of the ReVirt [1] paper I read in an advanced OS class (it's actually mentioned in the slides). I didn't watch the complete talk. I wonder how much it differs from ReVirt and other record-and-replay debugging tools.

[1] https://www.usenix.org/legacy/events/osdi02/tech/full_papers...


The basic idea is the same as ReVirt, but there are a lot of different details because rr has to run as a pure user-space Linux application whereas ReVirt was baked into the hypervisor. For rr to run efficiently we have to use Linux APIs in creative ways. https://arxiv.org/abs/1705.05937 has more details.


VMware used to support record / replay of VMs in their commercial product, which was quite cool:

https://pubs.vmware.com/ws71_ace27/wwhelp/wwhimpl/js/html/ww...


It is probably more mature than a research prototype as it has received more real-world testing.


ReVirt logs non-deterministic inputs at the hypervisor level, whereas rr records these inputs at the kernel level. So the biggest difference is that rr cannot replay the OS's own execution.


Interestingly, hypervisor-level logging of this stuff is both harder (because it has the constraints of kernel-level code) and simpler (because the non-deterministic behaviours are fewer and better-documented at the hardware level than the Linux API level!)

I think it's very likely that, overall, recording a single process is substantially more complex to implement than recording a whole VM. (With significant caveats - recording a whole VM with good performance is going to be hard and making it really useful probably is a whole load of extra code)


The site is almost unreadable on mobile. Font too thin, doesn't scale and most importantly, the contrast is way too low.


In theory (!), Goland recently added support for reverse debugging of Go code via rr:

https://youtrack.jetbrains.com/issue/GO-3831

Haven't tried it out yet myself, but I'd expect it to be at least functional. :)


How does that compare to undo.io ?


Undo has some features rr doesn't have. It supports shared memory operations. It works in virtual machines and "in the cloud". It imposes less strict requirements on kernel or CPU features; for example, it works on AMD. Fundamentally, what distinguishes UndoDB and Undo Live Recorder from rr is an architecture based on machine code instrumentation.


Actually, I think rr supports shared memory operations under some circumstances... (I'd love to have my beliefs confirmed / corrected by somebody more knowledgeable)

My understanding is that rr has some handling for read-only shared memory and for arbitrary sharing within a tree of recorded processes.

Undo's shared memory is different because it doesn't need the other process to be recorded, so you can do read/write sharing with arbitrary processes or devices.

(disclaimer: current Undo engineer)


You're correct.


rr has a similar feature set to UndoDB. rr is probably a bit more efficient during recording. However, UndoDB doesn't depend on the performance counters the way rr does, so it works in situations where those counters are unavailable (e.g. some VM guests). Also UndoDB works on ARM, but due to the way ARM implements atomics (see above) rr's approach can't work on ARM. (Until/unless we get ARM to add support for trapping on failed LL/SC.)


There is also a low-level tool, a disassembler and debugger, radare2 [1]. It can also record a session, in both debug and emulation modes, and replay it [2].

[1] https://github.com/radare/radare2

[2] https://radare.gitbooks.io/radare2book/content/debugger/revd...


radare2 lets you take snapshots of memory and registers and restore them, but it doesn't let you, say, record the entire execution of multi-process Firefox from startup to shutdown and later replay that perfectly. It's not capturing the effects of the environment that you need to make that work.


A very useful tool that has saved me a lot of trouble. Sadly I can't use it anymore since I switched to a Ryzen CPU.


What makes it unusable on Ryzen?


Ryzen's performance counters aren't quite accurate enough for rr to use. https://github.com/mozilla/rr/issues/2034

I hope that AMD will fix this one day.


Last time I checked, rr did not support multi-threaded applications well. Has that changed now?


rr supports multithreaded applications, but only uses a single core. So parallel applications slow down.


Being slower is not, I think, the major downside. It is that an entire class of errors - race conditions - are basically outside of the scope of the tool. Which is understandable! Race conditions are hard, and when I read about the tool, my first thought was "How are they handling race conditions?" and it turns out, essentially, they're not. But race conditions are also the hardest part about debugging multithreaded applications.

I'm not sure if the tool ensures deterministic scheduling of threads on the single core, but I doubt that it does. If it does not, then replay will not be deterministic, which means you could encounter different race condition outcomes on playback. If it does, then while you may have deterministic playback, the tool is unlikely to help with the class of race conditions that require simultaneous execution.

To be clear: I'm not criticizing the tool or the work of the people. If I were to design such a tool, I would probably start with a single core as well. It seems like a valuable tool and great progress for software debugging. But I do think race conditions in multithreaded programs are a current limitation.

edit: The technical report says that they deterministically schedule threads (https://arxiv.org/pdf/1705.05937.pdf):

"RR preemptively schedules these threads, so context switch timing is nondeterminism that must be recorded. Data race bugs can still be observed if a context switch occurs at the right point in the execution (though bugs due to weak memory models cannot be observed)."

The "weak memory model" part means it won't help with, say, debugging lock-free algorithms where you screw up the semantics.
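To make the quoted passage concrete, here is a toy sketch (not how rr is implemented): two "threads" are Python generators run one at a time, the scheduler's choices are logged, and feeding the same log back reproduces the same interleaving, including any lost update. All names are invented for this example.

```python
import random


def worker(shared, rounds):
    """Non-atomic increment: a context switch can land between read and write."""
    for _ in range(rounds):
        tmp = shared["counter"]       # read
        yield                         # possible context switch
        shared["counter"] = tmp + 1   # write, possibly clobbering another update
        yield


def run(pick_thread, rounds=3):
    """Run two workers on one 'core'; pick_thread(alive) chooses who runs next.

    Returns the final counter and the schedule that was actually followed.
    """
    shared = {"counter": 0}
    threads = [worker(shared, rounds), worker(shared, rounds)]
    alive = [0, 1]
    schedule = []
    while alive:
        tid = pick_thread(alive)
        schedule.append(tid)
        try:
            next(threads[tid])
        except StopIteration:
            alive.remove(tid)
    return shared["counter"], schedule


# Record: scheduling decisions are the nondeterminism, so log them.
rng = random.Random(0)
recorded_counter, recorded_schedule = run(lambda alive: rng.choice(alive))

# Replay: follow the logged decisions instead of choosing again.
replay_iter = iter(recorded_schedule)
replayed_counter, replayed_schedule = run(lambda alive: next(replay_iter))

print(recorded_counter == replayed_counter)
```

Since only one generator runs at a time, this kind of model can reproduce interleaving bugs but cannot show effects that depend on two cores truly running at once, which matches the limitation the report describes.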


You should read https://arxiv.org/abs/1705.05937 so you don't need to speculate. rr absolutely does guarantee that threads are scheduled the same way during replay as during recording, otherwise it wouldn't work at all on applications like Firefox which use a lot of threads.

Also, rr definitely is very useful for debugging race conditions. For example Mozilla developers have debugged lots of race conditions using it. One thing that really helps is rr's "chaos mode", which randomizes thread scheduling in an intelligent way to discover possible races. See https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo... and https://robert.ocallahan.org/2016/02/deeper-into-chaos.html and https://robert.ocallahan.org/2018/05/rr-chaos-mode-improveme....


Very cool stuff! And yes, I took a look at the paper, as I noted in my edit. But I think there are still two classes of race conditions outside of its scope: ones that require simultaneous execution (where you can get surprising interleavings) and lock-free algorithms where correct use of the memory model is paramount. In my personal experience, these are the hardest problems to debug.


Even those are probably not 100% outside of its scope. I forget the details of chaos mode, but that kind of induced thread-switching can cause just the kind of interleaving you seem to be talking about.

What rr cannot capture is a very small subclass of race conditions involving things like cache line misses - I think that's what you're alluding to by "correct use of the memory model is paramount" but it's a subclass even of those. Yes, those are hugely difficult to diagnose and it would be fantastic if tools like rr or UndoDB could capture them. But there's a vast swathe of also very difficult race conditions that this recording tech can and does help with today.



