Co-founder here. It feels incredible to be sharing Replay with all of you. It's been a labor of love the past five years!
Replay started off as a simple experiment in what would happen if we added a step back button and rewind button to the Debugger. We quickly realized two things. First, nobody uses breakpoints. Second, being able to share is so much more powerful than being able to rewind.
Here’s how Replay works today. Somebody on the team records a bug with the Replay Browser and shares the replay URL with the team. From there, developers jump in and add print statements. The logs appear in the Console immediately, so you don’t need to refresh and reproduce a thing.
Over the past year we’ve talked to hundreds of users, recorded 2.5 million replays, and worked incredibly hard to ensure Replay would be fast, secure, and robust from the get go.
Want to check it out? You can download Replay today. Can’t wait to hear what you think!
The problem with breakpoints is my loop runs 1000 times and I only care about the one time it errors. Writing watch logic for that is sometimes too complicated, or it changes the outcome (race conditions especially). This seems like a great solution. I'll be checking it out!
If you can trigger a debugger from a keyword it's usually pretty easy to trigger it conditionally or on an exception in the code rather than through a GUI.
That works well for ruby and javascript and probably lots of others.
e.g. something like the equivalent of
  def foo
    bar.map do |b|
      b.do_something!
      debugger if b.state == :something_youre_interested_in
    end
  rescue => e
    debugger
  end
I'm really excited by Replay. I think it will be invaluable.
I haven't personally used Replay, but from my experience using rr (a native debugger that also provides time-traveling features) being able to replay execution both backwards and forwards in time on a whim is amazing. These tools excel at diagnosing bugs that are hard to reproduce, because you only have to reproduce the bug once under the debugger and then you can endlessly replay that execution until you figure it out. As Jason said above, you can retroactively add print statements in places that would be useful, without having to waste time trying to reproduce the bug again!
roc (the original author of rr) founded a company to build an even more compelling product on top of rr called Pernosco. They have some mind-blowing demos I'd recommend you check out: https://pernos.co/ .
Being able to easily answer questions like "where did the value in this variable come from, and when did it get set?" makes debugging a wildly different experience.
How would you debug something inside of a loop (1000s of values) with a line breakpoint in your IDE? Your debugger will trigger the breakpoint every single time it walks through the loop, even when everything is going fine. Now if you conditionally trigger the breakpoint inside the loop using code, it will trigger once and only if it encounters whatever error you want to catch.
Obviously you remove that debug statement from your code after you're finished debugging.
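As a sketch of that pattern in JavaScript (the function names here are made up for illustration): instead of a line breakpoint that fires on every iteration, you guard the `debugger` statement, or a log, with the condition you actually care about:

```javascript
// Process a large batch; we only care about the iterations that go wrong.
function processAll(items) {
  const failures = [];
  for (const item of items) {
    const result = processItem(item);
    // Instead of a line breakpoint firing 1000 times, trigger only
    // on the interesting case:
    if (result.error) {
      // debugger;  // uncomment when running under DevTools
      failures.push({ item, error: result.error });
    }
  }
  return failures;
}

// Hypothetical worker: fails only for negative inputs.
function processItem(item) {
  return item < 0 ? { error: "negative input" } : { value: item * 2 };
}
```

Because the condition lives in your code rather than in the debugger UI, it survives rebuilds and can be as complicated as you need.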
Chrome does conditional breakpoints - just edit the breakpoint and type your condition in - but it massively slows down the page if that code gets hit a lot.
So much so that I prefer to put the condition in my local build as source code, then put a normal breakpoint on that after it has rebuilt (a few milliseconds with a good watcher).
Which debugger do you use? Because I live in JetBrains tools, and their debugger front-end support for conditional or on-exception breakpoints is phenomenal.
But typically it will throw an exception _after_ the interesting code happened. For instance it might throw a NullPointerException, but you want to know what set it to null in the first place. With breakpoints you have to suspend on exception, and then add additional breakpoints earlier in the code, and then rerun the code. With rewind you can just step backwards.
Can confirm. Use it often both on Chrome and Firefox.
I love the idea of Replay, but is it targeting the wrong people to get the most leverage out of it? Print statements generally don't need to be used for debugging if you know how to use a debugger, which really isn't difficult if you spend maybe half an hour learning its features.
For Python, but only if execution is deterministic, I made a small library that tries to address that issue: https://github.com/breuleux/breakword (it prints data alongside a deterministic list of words and you can set a breakpoint on a word of interest in a second run).
It's really a poor man's replay, though. That tool looks really slick, I'll definitely give the Python version a go if/when it comes!
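The core idea behind that kind of tool (sketched here in JavaScript rather than Python, with made-up names — this is not the library's actual API) is that a deterministic counter maps each log call to the same word on every run, so a word you saw in run one identifies the exact hit to break on in run two:

```javascript
const WORDS = ["apple", "baker", "cedar", "delta", "ember"];
let counter = 0;

// Each call logs the next word from a fixed list. Because execution is
// deterministic, the same call site and iteration get the same word on
// every run, so the word acts as a stable address into the execution.
function breakword(value) {
  const word = WORDS[counter++ % WORDS.length];
  console.log(word, value);
  return word; // in a second run, break when this equals the word you noted
}
```

On the second run you'd compare the returned word against the one you noted and trigger a `debugger` statement at the exact execution you care about.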
Perhaps the comment was a generalization and not an attack on your process. It's not a radical claim to suggest breakpoints are not as popular as console.logs.
It saddens me that a lot of people don't use debuggers and default to adding print statements. As far as I can tell, it's for several reasons:
1. The debugger is primitive (e.g. Godot GDScript - no conditional breakpoints or watches).
2. The debugger is unstable (e.g. Android Studio - frequently hangs, or needlessly takes a long time to populate data)
3. The debugger's UI is not friendly (e.g. Android Studio - hitting a breakpoint in multiple threads causes unexpected jumps or loss of current state; VSCode C++ debugger - doesn't display information properly or easily (arrays of objects) or displays too much information (CPU registers, flags, memory addresses); C++ debugger for D - doesn't display D data types).
4. The debugger is not properly integrated into the environment - can't find symbols, libraries or source files, or finds the wrong source files, etc. Need to jump through hoops to configure those.
5. Platforms don't support debuggers properly (e.g. again Android - ANRs when debugging the main thread, can't leave a debugging session overnight without some timer killing the process)
6. Developers got used to the workflow of "add a print statement, rerun and check the console" since high school and nobody taught them a more powerful tool
7. Developers code all day, so adding print statements by coding feels more natural than switching to the debugger's UI and way of doing things. (e.g. "if (i == 100) console.log(value)" allows you to stay in the same code, as opposed to setting a breakpoint, finding out how to add the 'i == 100' condition and pray that there's no issue with variables being optimized out at runtime).
I like Replay's features and that it's improving the state of the current tools. At the end of the day, adding print statements in Replay doesn't seem to affect the state of the application, so in that sense it's similar to gdb commands in that it's just a UI choice, but I wouldn't go as far as encouraging print-based debugging.
Outside of Replay, print-based debugging is still a primitive way of analyzing the state of the app and promoting this state of affairs reduces the pool of people who use and would hopefully improve the existing debuggers.
We all appreciated Firebug and the Chrome DevTools because of the powerful features they give us to inspect the state of the application. Imagine a person who adds print statements to their code every time they want to inspect the DOM or check the current CSS attributes. It works, but we have better tools, and we should make them even better.
I think print statements are actually useful in ways that typical debuggers are not meant to be; they make it easy to show changes over time, and they provide a tight feedback loop between observing the value of some data and performing interactions that update that data. For example, if you wanted to know how a coordinate calculation changed as you scrolled the page, print statements would be more useful than a debugger. I don't think this is exclusively why debuggers get less use, but I think that print statements aren't inherently a thing to optimize away from.
That and concurrent execution is where I've found print statements to be most useful, but nothing prevents a debugger from keeping track of some value over time and then displaying those values in the UI, just like one would with a print statement.
My view is that using print statements is absolutely a subpar method of debugging and that we should, in fact, optimize away from it by creating better debuggers.
Anything you can do with a print statement can also be done with a logpoint, if your debugger has that concept. Logpoints can also sometimes be simulated with conditional breakpoints (log something and then return false).
The debugger saves so much time wasted recompiling/reloading with new print statements, IMO it's strictly better on every aspect.
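The conditional-breakpoint trick mentioned above can be demonstrated in plain code (the function name here is illustrative): a condition expression that logs and then evaluates to false produces output without ever pausing execution, because the debugger only breaks when the condition is truthy.

```javascript
// In DevTools you'd type this expression into a conditional breakpoint:
//   (console.log("i =", i), false)
// The comma operator evaluates the log for its side effect, then makes
// the whole condition false, so the breakpoint logs but never pauses.
function logpointCondition(i) {
  return (console.log("i =", i), false);
}
```

Debuggers with first-class logpoints (Chrome DevTools, VS Code) do exactly this under the hood, minus the need for the comma-operator trick.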
I've been dreaming forever about writing a debugger that basically just produces a timeline/log (branching in the case of threads or processes) of program execution, you can drill down the stack or into a loop at any point, and surfaces the trace of your code as opposed to 17 layers of library indirection.
I mentioned this in a different thread, but I'd recommend you take a look at Pernosco, a debugging tool written by the original author of rr: https://pernos.co/about/callees/
Couldn't agree more. Debugger support in modern codebases has become a huge afterthought, which is such a shame.
It is an amazing way to discover how a codebase works. You pick a point of interest, and then you get the entire path from the beginning of the app's execution to that point as your stack trace, and every variable along the way too. Watches are great too for tracking a value changing over time.
Micro-services and Docker also took debugging many steps backwards - one advantage of a monolith is that you can easily step through the entire execution, whereas if you have to cross process boundaries it becomes a lot more complex to properly debug.
I'm working on a monorepo template at the moment where everything is debuggable with a single-click. This includes debugging native addons in C++ and Rust for Node.js projects. It's not easy - which is why people avoid debuggers so much.
I recently set up debugging in a Rust project in IntelliJ, where the alternative was adding `dbg!()` statements, which involved 10-second recompilations. The difficulty was implementing the right pretty-printers in lldb so you could see the values of various types, because support is quite immature at the moment.
Those top-to-bottom stack traces also become a lot less useful in today's highly-meta frameworks, where functions get passed around and eventually scheduled at a point totally divorced from where they live in the code. I'm not saying this is a bad thing, it just makes debuggers somewhat less useful.
It's certainly a combination of these things. I use breakpoints all the time when I'm working with C# because I'm inside Visual Studio. It's super easy to work with the debugger there. With Source Link I can even step into other libraries of ours. Debugging C++ under VS is also not bad, and Python in PyCharm is a good experience.
But if I don't have VS or PyCharm available, I'll switch to printf debugging.
Though there are some cases where even with a good debugger I'll end up debugging by modifying the code. Sometimes it's necessary for performance reasons. Conditional breakpoints when debugging C# are extremely expensive so tossing one on a line that's executed many times may make the process far too slow. In that case it's better to compile in an if statement and then drop the breakpoint inside there. Other times the debugger is just limited in what information it can provide. Pointers to arrays in C++ are a common annoyance since the debugger has no length information.
My theory is that breakpoints are not useful because they only let you go forward. But if a variable is not in the right state, it's because the issue happened somewhere in the past. And you can't go back with a normal debugger.
Replay allows you to go back in time which is to me the biggest breakthrough. This actually makes them useful!
Breakpoints are a tool to stop execution and land in the present. It's the debugger that decides where you can go from there. Typically they'll allow you to go into the past, but only to inspect the stack frames, because the values on the heap get overwritten. I vaguely remember that some debuggers are able to record heap writes and thus are able to show the entire state of the app at each frame, effectively "going back" and replaying stack frames. My guess is that Replay does something similar.
Maybe 2a. Executing code (like a print) in an auto-continuing breakpoint action makes the program itself pause; especially tiresome when you're looking at a timing or performance issue.
Just my anecdote: Personally I don't like using the one in Xcode (and maybe I'm missing something obvious) because I got so used to the debugger in JS land where I get access to a live REPL which functions just like the code I write. In Xcode, I'm stuck with some lldb prompt which I don't understand and definitely doesn't function like the one in JS tooling. I'm sure it could be more useful if I invested more time into learning it, but the barrier is there.
I’ve used good debuggers in the past, but the main downside to me is that the workflow improvement is relatively minimal compared to print debugging. The “live programming” aspect of Common Lisp and Clojure, as well as the way Cider implements tracing for Clojure _is_ a major improvement, but only because they let you be more precise in what needs to be re-run for print debugging.
I think it's often that the compiler/environment do not leave enough information for the debugger, typically by optimizing out local variable names and their values. By the time you figure out the obscure settings to be able to see the live values of variables and other state, you may have done a lot more surgery on your build system and slowed things down to a crawl compared to adding a few print statements.
Everyone in the comments is talking about using this for their own debugging, however I think the way to win with this as a business is in two places
QA and Automated QA.
If you have real human QA people in your org, they could run this while doing QA. If they hit a bug, they could then share a simple link with the dev team that captures their issue plus the stack.
Same goes with automated QA. Record the UI tests using this, and if one fails, store the state + stack.
There are a LOT of hard problems in that workflow... Good luck!
You're right on the mark here. I saw rr used at Mozilla to diagnose and fix flaky tests that failed just often enough in CI to make life miserable, but were nigh-impossible to reproduce in a local development environment. Being able to take that to the next level and collaboratively investigate a bug using a recording that captures it is game-changing technology from the future.
Imagine a world where instead of ignoring, skipping, or marking "known failure" on all those flaky tests your CI hits (we all have them) you could capture recordings of them, and then actually investigate and fix them! That world is possible!
Currently the only way to sign up for a personal account is through google. Is there another way that I am missing, or are there any plans to provide email based signups in the future?
(Replay engineer) Google is currently the only way unfortunately. We wanted to support secure logins at launch, with SSO and Multi-factor Authentication, and focus most of our efforts on the core product. As a result we just went with Google for launch.
We don't currently have any plans to add additional authentication mechanisms but we've heard this feedback from a couple folks and we'll sit down to prioritize it after the excitement of launch has died down. Sorry about that!
Even for Google logins, would you consider using a token-based authentication system (like Spotify, Postman, etc. do)... i.e., your default browser opens, Google logs you in there (or you already are), and that sends a backend auth token to your service to connect your Google and Replay accounts.
I would like to try your product, but am wary of typing my Google credentials into an unknown, black-box browser. It's too easy to MITM, especially if someone redistributes a copy with a keylogger shimmed in.
The token-based auth means you can still log in with Google but never have to share your Google password with the proprietary Replay browser. Probably many of the multi-login vendors support something similar already if you don't want to deal with it yourself.
Thanks for the update. I had a feeling that this might have been the case due to the launch crunch and it's perfectly understandable. Really excited for the future of this project!
(Replay engineer): we basically record all the inputs and outputs to a program. In the example of an HTTP request: when recording we'd record that a request was made, and the response. When replaying, rather than make the HTTP request, we return the response that was recorded.
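A toy illustration of that record/replay split (the names here are hypothetical — the real system works at the syscall level, not by wrapping individual calls): in record mode you perform the real call and remember its result; in replay mode you return the remembered result in the recorded order instead of touching the network.

```javascript
// Wrap any effectful call so its results can be recorded once and
// replayed deterministically later.
function makeRecorder(realCall) {
  const log = [];
  return {
    // Record mode: perform the real call and remember input and output.
    record(req) {
      const res = realCall(req);
      log.push({ req, res });
      return res;
    },
    // Replay mode: return the remembered result in recorded order,
    // never performing the real call again.
    replay() {
      return log.shift().res;
    },
  };
}
```

Replaying from the log is what makes the re-execution deterministic: the program sees exactly the responses it saw the first time, even if the outside world has since changed.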
> Likewise, the raw pointer values don’t have an effect on the behavior of the JavaScript
There are actually some cases where raw pointer values can affect user-visible browser behavior indirectly. Notably if the pointer values affect iteration order in an associative data structure which the browser uses in some way that tickles JS differently depending on the order.
There are bunches of other bizarre edge cases you'd never think of as well that never come up (until inevitably one day they do). Another example: In old versions of Blink I've seen ligatures fail to form depending on what was in the font cache when the page was loaded.
(Replay employee) Yeah, later in the post (https://medium.com/replay-io/effective-determinism-54cc91f56...) we mention needing mitigations to ensure that hashtable iteration is deterministic in cases where it affects how the browser interacts with the JS engine. There are others needed as well to ensure that the browser behaves the same across a large range of websites, my favorite is handling sites that sniff information about the system's math libraries by e.g. calling Math.sin() on specific values and testing the results --- the values when replaying need to be bitwise identical with what happened while recording.
Ah I shouldn't have stopped reading. Then as one of the rare browser replay experts (there's gotta be at least like... 7 of us right?) I can certify your project as the real deal.
Thanks for the pointer. It will take some time to digest the content. Have you considered making a library, so whoever wants a session replay can manually initiate a recording? Then you wouldn't have to maintain a browser fork. I am asking without fully understanding the technical detail. Please forgive my ignorance.
That's the point: looks like a tool made to be the best of both worlds:
You get access to the full debugger, at any point in time, instead of clicking "next" a hundred times and then not being able to go back if you missed something. Here you go forward or back in time, and you can visualize where you are.
This thread sums up much better than I can why people use print debugging (not just because they don't want to learn a tool, or are dumb), and why Replay is the best of both worlds:
https://twitter.com/geoffreylitt/status/1438152748449636360?...
I'm inclined to agree - I use breakpoints all the time when I'm debugging JS code and it is most definitely an invaluable tool.
It can be a bit cumbersome to set up and it can be a little buggy, especially when you're working with transpiled code, but to say nobody uses breakpoints is a bit disingenuous.
I don't like that your marketing story normalizes not learning breakpoints. It's like the popular guy at school being self-deprecating and charismatic about being bad at math. You shouldn't position yourself as an alternative to breakpoints because you're going to alienate a lot of more experienced devs who are comfortable using a debugger.
Replay does look interesting. I'm wondering about an itch I can't yet scratch. If someone on my team knows how to repro, we don't really need Replay: I have repro steps and control of the environment. But if a customer in the field hits a bug that we can't reproduce in house, that's my itch. What is the least-friction, least-invasive way to get a Replay recording in that scenario?
Congrats on the launch! This looks awesome. I would love to see this for Ruby and other backend languages; I will definitely check it out at that point. I really like the unique approach, and I frequently struggle with breakpoints in Rails because the call stack gets so deep. Being able to set a breakpoint where the error occurs and step back through time would be a huge win.
(Replay employee here): We have experimental support for some React-specific features using react-devtools since React is such a common framework, but Replay is a general debugging tool and will work well with other frameworks too.
I just made an account and did a recording, but I cannot seem to see the network request tab anywhere in the UI.
I can make a breakpoint, add console.log statements and evaluate stuff (pretty cool tech), but where do I find network requests?
Replay deterministically replays the recording, so if the state of the application when you recorded it caused a network call, then when replaying it we will also "make" a network call _but_, instead of actually going out to the network we will instead return the exact data that was returned when you recorded it.
Can you expand more on what you mean by a race condition in network calls? If it's a series of network calls that the browser could make in any order, then we will make them in the order that they occurred when you recorded it. If it's a race condition that occurs in your backend, then the Replay browser won't really help there (though it will show you the responses you got from your backend when you recorded it).
For help with that, you might want to use Replay on the backend. Right now we have Node support (https://github.com/RecordReplay/node), but other runtimes are on the roadmap.
(Replay engineer): In the long run we'd love for Replay to prove to the major runtimes that they should build support for this in to the runtime itself, rather than us maintaining many forks. The API that we designed for recording a runtime is open source and available here https://replay.io/driver and could serve as a good starting point.
That’s gonna be a pipe dream unless an open source client appears. Consider releasing your tech as a self-hosted, turnkey solution and slap on some dual enterprise licensing.
Self-hosted is on the roadmap, but getting this to be a universal technique is definitely going to be hard, no arguments there. Gotta start somewhere though!
Not exactly similar, but in IntelliJ/java one can drop the current frame in the debugger. This jumps one step up in the stack, and allows you to enter the function fresh. Can drop arbitrary many stacks to go backwards.
Of course, it doesn't rollback the heap or any shared state. And playing forward again it will redo things again. So beware of side effects. But in codebases with lots of immutable data structures (Kotlin ftw) it works great.
Combined with hot swapping, one can even drop the frame, change the implementation of the function and then reenter, making it possible to test code changes without spending long time getting back into the same state/context.
Wow! Thank you so much for this comment. I have been using IntelliJ for over 10 years and I never knew this feature existed, I just gave it a try and it's incredibly useful.
One thing I wish Java debuggers supported was the ability to move the instruction pointer to a different line, as has been possible in other debuggers for ages. Is it a JVM limitation maybe? I remember being able to drag the "current line" pointer forwards or backwards in languages like C, C++, and C# in maybe 2003. I wish I could do this with Java; dropping the whole frame is useful but this feature lets you do a lot more, like break out of a loop or skip a block of code you _just_ realized shouldn't execute.
(Replay engineer): We're big fans of RR and pernosco, I love to see those tools get their due. Replay is also designed to support backend programs. We [support Node](https://github.com/RecordReplay/node) today with more runtimes coming soon.
Do you support it only for dynamic languages, where it's possible to monkey-patch code?
All the debuggers mentioned above for the backend work only under Linux because, from what I understand, they use the `ptrace` syscall, and Mac has a completely different format and different capabilities.
Do you plan to support Golang, especially on Mac, maybe with a custom fork or similar?
The runtime infrastructure can support all of those. The current recorded browser runs on mac, and the mac image is replayed in a linux backend (with the system calls being handled by the replay engine).
Our initial launch is with a modified Firefox browser on Mac, but the infrastructure itself is generalizable to other runtimes and other operating systems.
However, we do need to "paravirtualize" the runtimes that are recorded on our system (modify the underlying runtime to make it aware that it's being recorded, and do some integration work for each runtime). The design of our system allows for new runtimes to plug in and use all the same infrastructure for replaying.
So the long answer is that we can support them, but support for each runtime will arrive as we prioritize and complete the implementation for them.
Currently we have the mac Firefox-forked browser. In the works we have a chrome browser, nodejs backend, and a firefox fork for windows. But realistically we should be able to support `(any runtime x any os)` within reasonable bounds. Record and replay all the things :)
I am a JavaScript framework author, and was one of those fortunate to get early access and honestly it is the most useful tool I've ever used in the debugging space.
Sometimes things are complicated. Often there is a need to do digging to uncover the issue. Being able to move forward and backwards and even jumping between seemingly disjoint parts of the timeline are all at your disposal with Replay.
Replay has saved me hours of time. And that isn't hyperbole. On a couple occasions, due to laziness and familiarity, I'd do stuff the traditional way and still be stuck after hours (sometimes days) on the same bug. With Replay I was able to shorten that time to about an hour on even the trickiest of bugs.
So stoked to now have Replay available to others to help record reproductions of their bugs.
I've also been so lucky as to have been able to play with it for a while now, and can corroborate that it's super useful. It's not just a better alternative to something else; it's a whole new category of debugging tools, in the sense that there were problems that were pretty hard to debug and are much easier now that Replay is available.
Thanks! Bret Victor has been my inspiration. Ever since I watched Inventing on Principle ten years ago, I knew this was what I wanted to build. In fact many folks on the team have a similar story. That's the crazy part about IoP for me: it's created a movement!
Engineer @ Replay here - If you're a console.log kind of developer (isn't everyone at least some of the time?), you should check this out. Imagine being able to add console.log() statements to code that already ran and seeing the results immediately! It's a bit mind blowing the first time you use it.
The video next to "Print statements from the future" on the webpage shows a bit of it.
You can click on the line number next to a line of code to add a print at that location. If you hover over the line number, it'll show you a count (exact and correct) of how many times that line of code was hit during execution. Clicking on it adds a print log at that line with a default string.
At that point, the console log on the left should change immediately to include a bunch of new entries with "Loading.." text that resolves to the text of each print statement.
Clicking on the string allows you to replace it with an expression to be evaluated. The expression can include references to local scope variables, etc.
If you edit the expression, the console entries for the prints go back to "Loading.." and the new values are resolved.
The prints in the console are ordered in time sequence along with all the other events that can be shown there (regular console logs, mouse and keyboard and other events, etc.)
If you're interested in non-JavaScript time travel debugging: pernos.co (which is layered over Mozilla's rr time-travel debugger) is an absolutely amazing tool which will save you days and days of wasted development time.
Pernosco's tool is described pretty well on their website, but basically it allows you to view a program inside and out, forwards /and/ backwards, with zero replay lag. Everything from stack traces to variable displays (at any point in time in your code execution) is extremely easy to view and understand. The best part is the lightning fast search functionality (again: zero lag).
On top of this: extraordinary customer service if anything breaks (in my experience, they fix bugs within 24 hours and are highly communicative).
Through $DAYJOB, I happened to have a meeting with one of the founders of this company some time ago -- not about Replay, but about a "How do you do this" sort of question.
After prying for technical details, Replay came up, and I asked to see it out of curiosity.
Really blew my mind. Every once in a while a piece of technology comes around that doesn't quite have an equivalent.
I could immediately see where being able to have your users or coworkers record bug reproductions or features and submit them in issues or PR's would save monumental amounts of time.
Wishing Replay team best of luck, I was thoroughly impressed.
I asked about this (whether it could be done as a browser extension) and the answer I was given was that the browser was required because they do things like hook syscalls and other close-to-the-metal stuff
So if you can’t capture API calls directly, what do you do? You drop down one level and record the browser's system calls. This probably sounds like a terrible idea. We started off with a simple 3-line JS program with one API call and one clock, and instead of just recording these two calls, we’re now recording a program with millions of lines of C++ and the complexity of an operating system. Yep! It’s crazy, but it works, and it’s pretty awesome.
So the nice thing about system calls is there are not too many of them and they don’t change that often. This means that instead of recording an API call directly, we can record the network engine’s system calls to open socket and process packets from the server. And by recording at this level, it’s possible to replay the website exactly as it ran before, with the same performance characteristics and everything else. It’s a little bit like putting your browser into “the Matrix” and tricking it into believing everything is normal, when in fact it is just running in a simulation.
Session Replay records the DOM so it can be replayed like a video. Replay.io records the browser input so the browser session can be replayed again.
The biggest difference is that when you're viewing a replay, we're re-running the identical browser on our backend. This way you can debug the real thing.
I agree there are a lot of session-replay-style equivalents (and rrweb is great!), though what Replay is doing unfortunately cannot be done in production in the same way. In an ideal world this level of data could be extracted from production incidents without an individual running specialized software, but with the state of technology I'd say they're both aiming to solve different problems.
The good:
I checked out the tool and it seems to work as advertised. Also nice to see the Replay browser is based on a Firefox fork.
The bad (and this is more your marketing/PR/branding, not product):
- You require an account signup, OK. It's a Google-only signup, OK, step over that. But it did not clearly mention that this would put me on a mailing list, and sure enough, 5 minutes after I signed up I got a random email asking me to support a launch on Product Hunt.
- With the amount of engineering that went into it, I would expect you to be proud of the craftsmanship and your team. Instead the top of your website states you are proud of getting money from investors. This is more a vote against this trend, than your particular behavior.
- I was able to find the post "How Replay works" [1] which is the actual content addressing your target market. The post conveys 2000 characters of information and uses 4.3MB of data to do that for a signal/noise ratio of 0.04%. It is the type of web obesity [2] that we are used to nowadays, so nothing new. Mentioning this only because you are a web engineering-centric company. Promoting the right values of web performance and engineering attention to detail is IMO important for a product talking to web engineers.
I realize this may come across as unpopular or beyond conventional wisdom, but getting a different perspective is what HN is good for. Use the feedback at your discretion.
Props for making an innovative product and good luck!
Your comment isn’t off-putting for the reasons you gave.
Your post feels smugly opinionated. If that wasn’t your intention, I don’t know what to say. Just look at your first two bullets. Passive aggressive and more.
It seems a lot better now. I just upvoted, whereas before I abstained (while you appeared to be downvoted). I am in a bit of a rush right now so I only read the first two bullets. They both seem good to me now.
Thanks for not taking my comment badly or ignoring it!
I also found out about Orion by looking through your comments!
1) How does the step backward functionality work? Do you take snapshots every so often of the Javascript environment? How do you handle destructive assignments?
2) Does Replay record actual syscalls made by the browser, or is it recording calls to the browser APIs by the javascript code (which I guess are effectively syscalls from the javascript code's perspective)?
3) The ordered lock technique described in https://medium.com/replay-io/recording-and-replaying-d6102af... makes sure that threads access a given resource in the same order, but what about threads accessing different resources in the same order? e.g. when recording, thread 1 accesses resource A before thread 2 accesses resource B. It seems like the ordered lock technique doesn't help you maintain that ordering in the replay. Is maintaining that kind of ordering across resources not actually necessary most of the time?
1. Rather than having to restore state to the point at the previous step, we can step backwards by replaying a separate process to the point before the step, and looking at the state there (this post talks about how that works: https://medium.com/replay-io/inspecting-runtimes-caeca007a4b...). Because everything is deterministic it doesn't matter if we step around 10 times and use 10 different processes to look at the state at those points.
2. We record the calls made by the browser, though it is the calls into the system libraries rather than the syscalls themselves (the syscall interfaces aren't stable/documented on mac or windows).
3. Maintaining ordering like this isn't normally necessary for ensuring that behavior is the same when replaying. In the case of memory locations, the access made by thread 2 to location B will behave the same regardless of accesses made by thread 1 to location A, because the values stored in locations A and B are independent from one another.
Thanks for the explanation! Do you ever run into performance issues with replaying from the start on each backward step, or is this not really an issue in practice? I imagine for most websites and short replays it's probably fine, but for something like a game with a physics engine it sounds like it would be too expensive and you'd need snapshots or something. I guess that's a super small percentage of the market though.
For question 3 on the ordering, I was imagining the following kind of scenario: one thread maybe calls a system library function to read a cursor position and another calls a system library function to write a cursor position. So even though they're separate functions, they interact with the same state. Do you require users to manually call into the recorder library to give the recorder runtime extra info in this kind of scenario? Sorry if this is a dumb question, I haven't really done any programming at this level.
We definitely need to avoid replaying from the start every time we want to inspect the state at some point. This is kind of an internal detail, but we can avoid having to replay parts of the recording over and over again by using fork() to create new processes at points within the recording.
Ordering constraints between different library functions do crop up from time to time. In cases like this the recorder library uses ordered locks internally (basically emulating the synchronization which the system library has to do) to ensure that the calls execute in the expected order when replaying.
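That record-then-enforce idea can be sketched in JavaScript, using async tasks as stand-ins for threads (a toy model, not Replay's actual recorder): during recording the lock just logs the order of acquisitions, and during replay it forces tasks to acquire in that logged order.

```javascript
class OrderedLock {
  constructor(recordedOrder = null) {
    this.recording = recordedOrder === null;
    this.order = recordedOrder ? [...recordedOrder] : [];
    this.waiters = new Map(); // taskId -> resolve()
  }
  async acquire(taskId) {
    if (this.recording) {
      this.order.push(taskId); // just log who got here, in arrival order
      return;
    }
    // Replaying: if it's not this task's turn yet, park it until release().
    if (this.order[0] !== taskId) {
      await new Promise(resolve => this.waiters.set(taskId, resolve));
    }
  }
  release() {
    if (this.recording) return;
    this.order.shift();
    const wake = this.waiters.get(this.order[0]);
    if (wake) {
      this.waiters.delete(this.order[0]);
      wake(); // hand the lock to whoever is next in the recorded order
    }
  }
}

// Demo: B reaches the lock first, but the recorded order says A goes first,
// so the replay reproduces the original ordering anyway.
async function main() {
  const lock = new OrderedLock(['A', 'B']);
  const events = [];
  const run = (id, delayTicks) => (async () => {
    for (let i = 0; i < delayTicks; i++) await Promise.resolve();
    await lock.acquire(id);
    events.push(id);
    lock.release();
  })();
  await Promise.all([run('B', 0), run('A', 1)]);
  return events;
}
main().then(events => console.log(events)); // prints: [ 'A', 'B' ]
```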
Thanks for the links to the blogs. I was wondering how it worked and the "How it works" bit on that page said nothing. Nice that they've explained it. It looks like the blog does answer your questions though:
> The interface which Replay uses for the recording boundary is the API between an executable and the system libraries it is dynamically linked to.
As bhackett confirmed, you're right about recording at the system library call level. I wasn't sure if it was more of an analogy or only referred to a version of Replay targeting backend servers written in other languages like Go, especially since the author mentioned hooking into the JS runtime in https://medium.com/replay-io/effective-determinism-54cc91f56.... But it looks like I misunderstood, and their browser product is their generic record/replay library integrated into Firefox, rather than a reimplementation of the same concepts.
Replay employee here. Personally, I'm excited to see maintainers slowly phase out "Please include steps to reproduce" in issues and replace it with "Can you make a replay?"
Partly because I want to see the product do well, but also selfishly as an engineer, because we've been dogfooding Replay and it's made squashing bugs 10x easier. Having somebody attach a replay to an issue makes it immediately more actionable than even an expertly-written one, because I can start debugging in seconds with minimal back-and-forth.
I just learned about it here and tried it out on a small to medium-sized TypeScript React app. It worked right out of the gate! The original TypeScript files appeared in the Source browser. Time travel works. And the video replay too of course.
Really impressive. Will keep it around for sure to try debugging a real issue next time. Congrats on the launch and the great app!
Looks very cool. When I've been working on native code on Windows, the WinDbg time-travel feature has been a magical experience that's saved countless hours (https://docs.microsoft.com/en-us/windows-hardware/drivers/de...). I expect and hope this will do the same for web development.
p.s. On some pages (e.g. https://www.replay.io/pricing), I only see the Mac download button, even though I'm running Edge on Windows.
I use WinDbg for analyzing crash dumps and there's so much I don't understand about the process (I basically follow a step-by-step sequence I made a few years ago).
Any advice for leveling up WinDbg skills, especially as they relate to post mortem analysis? I suspect I also need to develop better assembly (or is it machine?) language skills. I'd like to learn a lot more about this stuff but resources (free or paid) are hard to find.
We've been using Replay heavily here for the past 6-7 months and it's rapidly become the preferred way to do any form of debugging. Training our support staff to capture replays and submit them in tickets has made turnaround time on bugs significantly faster.
I'm an engineer at replit and I've been using this to find and fix a bunch of nasty bugs. I love the sense of confidence this gives me. When I have the recording captured — I know for sure I can get to the root cause of the problem.
Replay also makes it easier to jump into a new codebase, I can see how things work.
Replay has been amazing for me to debug hard-to-reproduce issues like live collaboration bugs. You just record the bug, share it with others, place some console.logs (after the fact!) and scrub through the recording to see where things went wrong.
A nice side effect is that it's amazing for exploring other codebases; just being able to put a console.log somewhere to see how often it runs when using an application is a lot of fun.
I want this to succeed, so I want the company to succeed. On that note I think you guys should change up your pricing.
Seems like there's too big a gap between the free forever (individual) plan and $20/mo/user for teams.
I'd love to pay for this as an individual at a smaller amount - like $10/mo - for a few extra features. Or maybe reduce the functionality of free forever.
(Replay employee): Thanks for the feedback and so glad to hear you're excited about Replay! Our focus right now is getting Replay into more people's workflows so we can learn and improve the product.
The great thing about the individual plan is that you have access to the complete feature set of recording and replaying and can invite collaborators to work with others.
I'd encourage you to jump in, start using the product and sharing feedback with us and that'll help Replay immensely.
In my line of work I've found the most difficult issues to debug are asynchronous dependencies on API actions. How nicely does replay work with POST data, if I scrub back and forth will I retrigger POSTs or is the request/response captured and emulated in the replays? If replay handles this nicely then I'm very very interested in adding it to our workflows.
Logan already answered this somewhat but I thought I'd elaborate. During replay we sandbox the entire replaying browser and trap all external IO requests using custom handlers which feed in the original recorded response for that IO request.
Replay does this at the system API level, catching network IO, disk IO, IPC, and any other system interaction done by the recorded browser.
Async dependencies are a tough nut to crack. The main query of importance there seems to be "when was this currently executing code first added to the scheduler". Time travel debugging gives us the infrastructure to answer that question in a single click (and transitively the entire chain back to the initial program execution).
However we haven't implemented specific support for these use cases yet. Our initial public beta feature set is "reversible-execution in debugging", "print statements that work in the past", and "cloud collaboration on debugging through shared discussions on individual replays". I'm oversimplifying but that's the gist.
We'll be prioritizing features to implement, and user feedback is of course critical in directing that effort. Candidates include async debugging, more framework support, network monitoring, more runtimes and execution environments, time-travel watchpoints on variables and objects, and new domains such as CI integration (having replays automatically made for CI test runs) and server side (easily recording and replaying backend code, starting with Node.js).
The current MVP is pretty breathtaking, and there's a ton that can be done on top of it going forward, and we're excited to deliver more :)
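To make the async-dependency question above concrete: "when was this code added to the scheduler" can be answered in plain JS only with instrumentation like the toy sketch below, which is exactly what time-travel debugging answers from the recording without any instrumentation (the wrapper name here is made up for illustration):

```javascript
// Wrap setTimeout so each scheduled callback remembers where it was scheduled.
const scheduledFrom = [];
function tracedSetTimeout(cb, ms) {
  const schedulingStack = new Error('scheduled here').stack; // captured now
  return setTimeout(() => {
    scheduledFrom.push(schedulingStack); // available when the callback runs
    cb();
  }, ms);
}

function startWork() {
  tracedSetTimeout(() => {
    // When this callback runs, the normal stack no longer mentions startWork,
    // but schedulingStack still records that startWork scheduled it.
  }, 0);
}
startWork();
```

Chaining these captured stacks back through each scheduling point gives the "entire chain back to the initial program execution" described above.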
It's definitely a well-considered approach. I don't have to care about the framework that's been used or set up a custom proxy to capture and replay / simulate API actions; you're literally recording system-level calls, and that would be extremely valuable for replaying those annoying edge-case race conditions that seem nigh-impossible to track down without significant effort at replicating exact steps and timings.
I mentioned in my other comment about Windows support, but even better if you could do something like browserstack, where I could just direct users to a URL where in the backend you guys are running the replay browser, but from their perspective they're just "using the website", that would be a killer feature. "Here, go to this URL and make the bug happen again, as soon as it happens click the little bug icon" - wouldn't have to convince an IT department to allow custom software on their COE, I could foot the bill and pass it on in my invoices so don't need to convince their accounting to approve licensing, etc, and you wouldn't need to compile OS-specific clients...
Anyway I digress, really cool stuff and thanks for expanding a bit on how it works, taking something so low-level as syscalls and wrapping it up in a user-friendly interface is no mean feat - good luck!
It's definitely possible to abstract away HTTP requests for debug purposes; Cypress is one example, which makes a snapshot of the DOM and a log of what happened for every event, and allows the user to stub out HTTP requests. Redux's time traveling debugger omitted HTTP entirely, instead only logging changes in the state.
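A minimal sketch of that stubbing idea (the function names here are made up for illustration, not Cypress's actual API): route each request to a canned response so the replayed session never touches the network.

```javascript
// Map URLs to canned payloads; anything unstubbed fails loudly so a missing
// fixture shows up immediately instead of silently hitting the real network.
function stubbedFetch(routes) {
  return async (url) => {
    if (!(url in routes)) throw new Error(`unstubbed request: ${url}`);
    // Return a minimal Response-like object wrapping the canned payload.
    return { ok: true, json: async () => routes[url] };
  };
}

async function demo() {
  const fetch = stubbedFetch({ '/api/user': { name: 'Ada' } });
  const res = await fetch('/api/user');
  return (await res.json()).name;
}
demo().then(name => console.log(name)); // prints: Ada
```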
Cheers for the heads up on cypress, not sure how I've missed it but I've added to my reading list and I'll be doing a deep dive. I'm using redux on my current side project and loving being able to scrub through state changes.
My angle on this thread is that I don't have much control over the frontend frameworks and usually when I land in an enterprise integration job a lot of the broader architecture is already in place - longer term I can have some influence but generally debugging involves numerous stakeholders. A tool like replay seems useful because most testers / users are reporting issues from their direct experience using the frontend, and being able to record and scrub through their interactions would be a massive timesaver, in lieu of setting up custom testing frameworks etc.
I get what you're saying though, and I'll definitely check out cypress, just wanted to add some context.
Once a recording is made, you can jump around in it as much as you like. The network requests will only be made during the initial recording process as they would normally, and when debugging it will all be pulling the request/response data from the recording itself rather than querying the network.
That's a big selling point for me, props to the engineering team for accommodating this, I work mainly on integrations and the trickier bug reports are weird edge cases that involve a lot of getting specific descriptions of what happened and replicating it manually on a sandbox so as not to impact API calls, getting on the phone to talk through exactly what they're doing, asking for screenshots via email, clarifying conflicting understanding of in-house terminology etc. You know the story.
Being able to get them to record the exact process and scrub through it at my leisure without having to worry about hammering APIs would be a massive timesaver, replication is about 90% of the time taken to fix a bug, while the fixes themselves are usually trivial. Not having to worry about accidentally replaying a bugged-out API call is a huge plus.
In my case the major hurdle I can foresee is the majority of my enterprise-level clients use Windows and a brief look at the website says currently there's only Mac / Linux support for the replay browser. What's the timeline on Windows support?
The only other thing I can think may be an issue is that the recordings are cloud-based, and a lot of the clients I deal with are finicky about exfiltration and governance, and would be a lot more comfortable self-hosting where possible. Is that possible or on the roadmap?
This looks like a super useful tool to add to the toolbelt for sure, I would love to have all my clients using something like this to report issues. Nice work all!
(Replay engineer): re: Windows, we have an alpha build out now, with a wider beta release coming later this year.
re: on-prem, it's definitely something we plan to support, and something we have engineered in to the infrastructure. If you're interested email hi@replay.io, we'd love to talk more.
Work on Windows support is ongoing now and we're hoping to have at least an initial beta Windows release some time in the next month or two, though it certainly depends on whether anything unexpected comes up.
Man, I wish I could use something like this for my Swift macOS app [1]
I’ve got tons of crash reports in Sentry that I have no idea what to do about.
On things like:
CFRelease() called with NULL
Replay could help me find where the heck that NULL came from, as I don’t have any CFRelease call in my code.
Or the more annoying
BUG IN CLIENT OF LIBDISPATCH: Assertion failed: Block was expected to execute on queue [com.apple.main-thread]
Like, how does that specific line run perfectly fine thousands of times and then every once in a while it decides it needs the main thread. I wish I could replay that traceback to see why the main thread is suddenly needed.
Having played with Replay already, I have to say it feels nothing short of magic. I can't tell you how much something like it would have made debugging weird issues in the depths of something like Webpack a whole lot easier.
This looks exciting and I’m going to give it a try.
For what it’s worth, if it helps with marketing: I am immediately skeptical that it’s going to be some mix of tedious to get working with my specific webpack dev server setup, be slow or unreliable, or never actually keep the state I need.
I’m hoping to be wrong on all of that. But when I scrolled the (beautiful) website, I would have loved to see some really nasty example in addition to the very elegant introductory “show don’t tell” example.
I want to see the first and think “okay I get it” and then be shown something really complicated to illustrate just how durable the tool is.
Can the creators describe this product's advantage over FullStory and LogRocket? Also, it seems challenging to get this to work in production, since users wouldn't consent to this level of monitoring.
In its current state, Replay's not meant to replace either of those products as it's not meant for 24/7 user session monitoring. Instead, it's something that we see teams reaching for in their debugging workflow.
The usual bug workflow is for one person to file an issue with steps to reproduce, which the engineer will use to reproduce the bug and try to debug the problem. Replay replaces that workflow by allowing a person to record a replay, and send that link (which is immediately debuggable) to the engineer.
One thing I've thought would be useful to have is a combined debugger/logging system. I work on SQL Server and we rely heavily on dumps to figure out what happened when something goes wrong. However, that only tells you what happened at a given point of time. Obviously recording all the system state over time would be too expensive and logs can be difficult to decipher due to their volume and lack of (explicit) connection with source code.
What I'd love to see is a logging framework which records the values of program specified variables while running as well as the current stack trace plus a monotonic time so you can piece together what happens through a thread of execution over time. However, unlike traditional logging, it would be connected to the source code like how a debugger works so you could mouse over a variable to see its state over time.
Honestly, it'd be really cool for a tracing system like this to be configurable without modifying source as well. Maybe trace specific functions by noting the stack trace and arguments when invoked, as well as the return value.
I love this, and apart from its intended purpose it seems like a potentially valuable learning tool for devs new to a language.
Years ago I used Adobe LiveCycle, a horrible "low code" enterprise framework foisted on us from far above.
One thing I always did like about it though was its replay tool which this reminds me of. I was always surprised it wasn't more of a thing in the dev space, it seems very useful.
At my job, we do Java, and it would be wonderful to have something like this. I litter System.err.println around the code then run it so I can better understand the behavior, so the idea of doing the equivalent in a recording would be awesome! I don't know how my colleagues manage with a regular debugger, it's like looking through a pinhole, you only see a tiny part of reality.
Try using the debugger and learn breakpoints and watches. Unsure what IDE you use, but if you use IntelliJ, this will get you started very quickly.
https://www.youtube.com/watch?v=lAWnIP1S6UA
Very interesting. I can change a breakpoint to not break, but log, instead. This should be quite interesting. Also the option to "go back in time" by dropping a frame from the callstack is very useful. This way, it's possible to go over some part of the execution again without having to restart the whole application.
While I both love it and need exactly what it provides, I have to say I felt very disappointed when I clicked the CTA with a windows icon only to be greeted by a notice that it just isn't there yet.
Would it be possible to communicate this by disabling the button and adding some kind of "coming soon" messaging straight to its right?
VERY impressive. I downloaded it this morning and took it for a spin on my current project. I love how much you pack into this tool and make everything work seamlessly together. If those source files in the DevTools view are editable, I won't need my IDE anymore! Thank you for bringing us this tool.
Happy Replay user here. One of the most time consuming parts of bug triage and resolution is often just reproducing the bug in the first place. Replay helps us reproduce customer bugs and find their root cause much faster. I wish I had it in past roles.
But there was one earlier presentation I cannot find, where a guy was showing live debugging of a video game. Not sure if it was TED or one of the conferences...
There used to be a product called "Chronon" back 10-12 years ago... The blog spammed Dzone and a lot of other websites and refused to pay for advertising. Their CEO encouraged people to stop writing log statements out and just run their debugger all the time. Looks like it's defunct now: http://www.chrononsystems.com
We've come a long way from the web replay days. We re-wrote the recorder to support additional runtimes like Node.js and Chrome. Replay is also cloud-based, so recordings are shareable and super fast.
We're really grateful for the support we had within Mozilla in the early days, but what we've come to learn is that projects like Replay really benefit from being able to be nimble and solely focused on a great experience which is difficult when you're a small feature in a larger product.
If Firefox does anything to make itself more attractive to developers, then Mitchell's puppet masters will stop giving her the vast sums of money needed to enrich herself and her social activist friends.
Better to spin out desirable features (like this) and buy in undesirable features (e.g. Pocket).
Replay has native support for React DevTools so you can easily inspect the component tree, props, state, etc. At a high level, because Replay replays the browser session, everything just works. Even I can understand Angular apps, and I've never built one ;)
Thanks for reporting! I'll take a look at what went wrong there.
I just visited the recording and the upload seems to have completed - I can view the recording from your link and debug it. I assume you shared it publicly?
> What are the limits for the recordings in terms of time and input data size?
The best answer right now is that it varies a lot depending on overall CPU usage, the amount of memory used, and the length, so it's somewhat difficult to nail down.
Right now we recommend less than 2 minutes long as an attempt to keep things reasonable, but if it's 2 minutes at full CPU usage it may still not load.
> on Ubuntu 18.04 I get the following error:
For the Glibc error, I'll ask around but not sure off the top of my head.
Today I spent maybe more than an hour tracing a bug where a screen I created became inaccessible after a merge and tests suddenly died.
It turned out to have multiple causes. One was that a colleague changed a default export to a named one, which excluded my screen. Another was in an unrelated test that was missing a React context wrapper, so I needed to refactor her tests.
I don't know what kind of magic debugging tool would help with these kinds of things.
(Replay engineer): We support all the debugger features (breakpoints, step into, step over) and more (step back). We emphasize console.log because we found that time-traveling console.logs make for a good demo, and engineers immediately grok what it is capable of, rather than thinking they need to adopt a whole new debugging workflow to use Replay.
(Replay engineer): Kind of! In the code viewer's left hand side gutter, the one that shows line numbers, if a line of code was never executed it is greyed out. Otherwise if you hover over it we show you how many times that code was executed, and from there you can set a logpoint/breakpoint.
Hey! Sorry to hear you're having an issue. That message occurs when we try to asynchronously load a JavaScript bundle for the app and it fails. Typically, that's because we've updated the app and that bundle has been replaced with another file.
That shouldn't be the case for you though so perhaps there's a network issue? Clearing your cache and trying again might resolve it. Feel free to jump on our discord (https://replay.io/discord) and we can help troubleshoot more together.
I'm using just a slightly outdated browser, which is probably the cause. I'm on Ungoogled Chromium 81 (by choice) which doesn't automatically update, and the ||= (logical OR assignment) operator wasn't added until 85.
Your payload appears to be compiled down for older browsers (as is evident from the use of `var`, unless... you actually write code using `var`), but it misses the ||= operator transformation, it seems.
Thus, the browser throws a syntax error, which your app interprets as "new version" for some reason.
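For reference, logical OR assignment (ES2021) is shorthand that a transpiler targeting older browsers should rewrite into its long form:

```javascript
let a = 0;
a ||= 42;          // assigns, because 0 is falsy: equivalent to a || (a = 42)
let b = 'kept';
b ||= 'ignored';   // no-op: 'kept' is truthy, so the right side never runs
console.log(a, b); // prints: 42 kept
```

A browser that predates the operator fails at parse time, before any code runs, which is why this surfaces as a syntax error rather than a runtime bug.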
By the way, please don't disable right clicking. It doesn't actually solve any problems and only annoys users.
You guys have a download button for Mac OS X (Apple logo), and you have a button with the Windows logo.
But there isn't a Windows version yet, as the pop-up tells me.
And why isn't there a Linux button, if you actually have a Linux version?
The button is platform specific, so if you're on Linux, it should show the Linux icon. Windows support is on the way; it just entered alpha two weeks ago and we're targeting a beta in early October. We record low-level Windows OS calls and replay them on Linux in a virtualized environment. Frankly, thrilled it works at all :)
Hi Jason, this looks great. I'd recommend adding an email waitlist for the Windows version. My team only uses Windows, so we can't try it out until that's released.
Replay started off as a simple experiment in what would happen if we added a step back button and rewind button to the Debugger. We quickly realized two things. First, nobody uses breakpoints. Second, being able to share is so much more powerful than being able to rewind.
Here’s how Replay works today. Somebody on the team records a bug with the Replay Browser and shares the replay url with the team. From there, developers jump in and add print statements. The logs appear in the Console immediately so you don’t need to refresh and reproduce a thing.
Over the past year we’ve talked to hundreds of users, recorded 2.5 million replays, and worked incredibly hard to ensure Replay would be fast, secure, and robust from the get go.
Want to check it out? You can download Replay today. Can’t wait to hear what you think!
If you're interested in learning more, here is our announcement blog post: https://medium.com/replay-io/launching-replay-the-time-trave...