Polly.js – Record, replay, and stub HTTP interactions (netflix.github.io)
471 points by zwentz on June 14, 2018 | 86 comments



Looks well done (it uses unicode art, so it must be amazing), but I have a fundamental distrust/dislike of record/replay frameworks... it just seems like you're papering over an inherently bad testing approach.

E.g. sure, when replays work, they're great, but:

a) you have to do a manual recording to create them the first time (which means manually setting up test data in production that is just-right for your test case)

b) you have to manually re-record when they fail (which again means you have to manually go back and restore the test data in production that is just-right for your test case... and odds are you weren't the one who originally recorded this test, so good luck guessing exactly what that was).

In both cases, you've accepted that you can't easily set up specific, isolated test data in your upstream system, and are really just doing "slightly faster manual testing".

So, IMO, you should focus on solving the core issue: the uncontrollable upstream system.

Or, if you can't, decouple all of your automated tests from it fully, and just accept that cross-system tests against a datasource you can't control are not a fun/good way to write more than a handful of tests (e.g. ~5 smoke tests are fine, but ~100s of record/replay tests for minute boundary cases sounds terrible).


> you have to do a manual recording to create them the first time (which means manually setting up test data in production that is just-right for your test case)

Your test invokes the recorder. There isn't anything manual outside of writing & running your test.

> you have to manually re-record when they fail

Again, nothing manual. It would require running your test again with Polly in record mode if you want to "refresh" the recording with a newer set of responses.
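
Roughly, something like this (a simplified sketch; the exact config keys and adapter/persister names are in the README and may differ between versions):

    import { Polly } from '@pollyjs/core';

    // First run: requests pass through to the real server and the responses
    // are recorded. Later runs: responses are replayed from the persisted
    // recording, so no network is involved.
    const polly = new Polly('list todos', {
      adapters: ['fetch', 'xhr'],
      persister: 'local-storage',
      recordIfMissing: true
    });

    const res = await fetch('/api/todos');
    // ...assert on res...

    await polly.stop(); // flush the recording via the persister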

> In both cases, you've accepted that you can't easily setup specific, isolated test data in your upstream system, and are really just doing "slightly faster manual testing".

This is by no means a replacement for E2E testing. It is a form of acceptance/integration testing where you're testing your application against a point in time at which you verified all systems were talking correctly with your application. E2E tests are much slower, difficult to debug, and intended to capture those breakages in contracts.

It's a tool for your toolbox, reach for it when needed. We plan to release a tutorial/talk that should clear up any misconceptions. There are also other applications for Polly, such as building features offline or giving a demo using faker to easily hide any confidential data.


> It's a tool for your toolbox, reach for it when needed

Sure, apologies for being negative about a tool you've worked on and are rightly proud of. I'm sure you already have more users than any open source project I've ever written. :-)

I struggle a bit at this point in my career, as I've made enough mistakes and seen enough mistakes, that I generally have strong gut opinions on "yeah, that's probably not going to work/scale/etc."

So, when observing new developers/teams starting to "make a mistake" that I've seen before, my gut says "no! bad idea!"...but I know I could be wrong, so it's tempting to say "well, sure, that didn't work for us, but go ahead and try again".

Because, who knows, maybe eventually someone will figure out an innovation that makes a previously-bad approach now tenable, and even best-practice.

But, realistically, that rarely happens, and so teams, orgs, and the industry as a whole stumble around re-making the same mistakes, and codebases/teams/etc. pay the cost.

I've thought a lot about micro-service testing at scale:

http://www.draconianoverlord.com/2018/01/21/microserving-tes...

Basically there are no easy answers, short of some sort of huge, magical, up-front investment in testing infra that only someone like a top-5/top-10 tech company has the eng resources to do.

So, definitely appreciate needing to do "something else" in the meantime. ...record/replay is just not a "something else" I would go with. :-)


> Again, nothing manual

Yes, sorry for being inexact/overusing the term--I understand the tests drive the recording.

What I meant by manual is getting the e2e system into your test's initial state.

E.g. tests are invariably "world looks like X", "system under test does Y", "world looks like Z".

In record/replay, "world looks like X" is not coded, isolated, or documented in your test, and is instead implicit in "whatever the upstream system looked like when I hit record".

Which is almost always "the developer manually clicked around a test account to make it look like X".

This is basically a giant global variable that will change, and come back to haunt you when recordings fail, b/c you have to a) re-divine what "world looks like X" was for this test, and then b) manually restore the upstream system to that state.

If no one has touched the upstream test data for this specific test case, you're good, but when you get into ~10s/100s of tests, it's tempting to share test accounts, someone accidentally changes it, or else you're testing mutations and your test explicitly changes it (so you need to undo the mutation to re-record), or you wrote the test 2 years ago and the upstream system aged off your data.

All of these lead to manually clicking around to re-setup "world looks like X", so yes, that is what I should have limited the "manual" term to.


But in the case we're talking about, where you're reliant on an external service that can change underneath you, "world looks like X" is genuinely not under your control. It feels like pretending that it is will lead to just as many failures as acknowledging its inherent volatility.


Agreed! And, to me, record/replay is still pretending like it's controllable, b/c even if you decouple for replays, records will always be a PITA.

My depressing solution is to just not even try to automate tests against the upstream system and instead invest in test builders/DSLs that make mocks/stubs on both sides as pleasant as possible.

And when bugs slip through, make sure to update your stubs/mocks on both sides to prevent the regression.

To me this gets the most agility and reliability, and will be a test suite that developers don't hate 1-2-5 years down the road.


Can you post the tutorial here? Thanks @jasonmit


I also don't like recording frameworks for TDD (and similarly dislike using fixtures). However, the place where a recording framework really pays for itself is in isolating changes in protocols. I've often had to interface with, as you put it, uncontrollable upstream systems. These are systems that are not mine -- they are upstream services from other companies that I have to interact with and I have no control over. Often these systems are badly built and they play fast and loose with the "protocols".

In these cases I like to have an adaptor layer and use a recording framework to "test" the adaptor. That way I can occasionally rerecord my scenarios and be notified if something important has changed. Normally what happens is that my service stops working for some unknown reason. I rerecord the adaptor scenarios and usually the reason pops out very quickly. All the rest of my code is coded against the adaptor and I stub it out in those tests (which I can do reasonably well because I control it).
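
The shape of it, roughly (hypothetical names, no particular framework):

    // adapter.js - the only module that knows the third party's wire format.
    // The recorded scenarios exercise this layer against real responses.
    export async function fetchAvailability(tripId) {
      const res = await fetch('https://supplier.example.com/trips/' + tripId);
      const body = await res.json();
      // Normalize their loose "protocol" into a shape we control.
      return { tripId, seats: body.seats_available || 0 };
    }

    // Everything else codes against fetchAvailability() and stubs it in
    // its own tests (e.g. a hand-written stub or jest.mock('./adapter')).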


I've worked for a while with similar scenarios of needing to integrate with systems beyond my control (e.g. Stripe, Slack, Google), and though I still don't have a good setup for it, I've come to the conclusion that a two-pronged approach would be ideal: record and/or stub the calls and responses from the external services so your normal tests run entirely without external networking (like what Polly allows you to do). But also set up a server to periodically validate the responses from the external services against those that have been recorded, and alert you to any changes in their protocol/behavior. I've yet to see any middleware to tackle the latter (though granted, I haven't been looking too hard yet either).


> isolating changes in protocols

That makes sense.

Ideally protocols are declarative/documented/typed, e.g. Swagger/GRPC, so you can be more trusting and not need these, but often in REST+JSON that does not happen.

> All the rest of my code is coded against the adaptor

Nice, I like (the majority?) of tests being isolated via that abstraction.

Although, if "the protocol" is basically "all of the JSON API calls my webapp makes to the backend REST services", at that point do you end up with adapter scenarios (record/replay recordings) for basically every use case anyway? Or do you limit it to only a few endpoints or only a few primary operations?


Yes, I like to have scenarios for all of the end points. It definitely doesn't reduce the number of tests :-) The advantage is in isolating the protocol from the operation of the application.

The main complaint I've seen for this approach is, "We shouldn't be testing the other person's system". There is wisdom in that advice, but it really depends on how much you depend on the 3rd party service and how much downtime you can tolerate. For example, I'm working in the travel industry right now and we often rely on small services that nobody has ever heard of. If we can't use the service then we can't sell anything and our site is essentially down. If it happens frequently (and with a lot of these travel services, they often break things weekly if not daily), then your site is not viable. In that case I'll exercise the protocol as much as I can. However, we also talk to marketing services, etc. If that breaks, and it takes a day or two to get it back up, then it's not a major problem -- our marketing effort might be a day late, which is unfortunate, but not game breaking. In that case I'll usually have a smoke test or two.


Also, recording raw HTTP requests makes tests difficult to organize, especially if what you want to mock are requests to a REST server. In that case, all the recorded HTTP headers aren't significant, editing the recorded resources in the responses when the API changes is a pain, and testing scenarios where several REST requests are related (e.g. fetching posts then comments) is also a pain.

A better alternative IMO is to craft a list of resources in JSON, then use this data in a fake REST server that takes over fetch and XHR in the browser.

Something like:

    { posts: [{ id: 1, title: "foo" }, { id: 2, title: "bar"}], comments: [{ id: 1, post_id: 1, body : "lorem ipsum" }] }
Incidentally, that's the way [FakeRest](https://github.com/marmelab/FakeRest) has been working for years (disclaimer: I'm the author of this OSS package).
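
A hand-rolled sketch of the idea (not FakeRest's actual API, just the concept of taking over fetch with that data):

    const db = {
      posts: [{ id: 1, title: 'foo' }, { id: 2, title: 'bar' }],
      comments: [{ id: 1, post_id: 1, body: 'lorem ipsum' }]
    };

    // Answer /api/<resource>(/<id>) requests from the data above.
    window.fetch = async (url) => {
      const [, resource, id] = String(url).match(/\/api\/(\w+)(?:\/(\d+))?/) || [];
      const rows = db[resource] || [];
      const body = id ? rows.find(r => r.id === Number(id)) : rows;
      return new Response(JSON.stringify(body), {
        status: body ? 200 : 404,
        headers: { 'Content-Type': 'application/json' }
      });
    };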


Insignificant things like HTTP headers and extra fields are insignificant until they're not. In my experience, manually assembling what you expect an HTTP response to look like often leads to bugs when an "irrelevant" detail suddenly becomes relevant, like when status codes change, or fields are added in a way that breaks a client, etc.

I think recording tools can be sharp tools that require care, but (as a starting point) if you have a library that can generate recorded fixtures in a repeatable, automated fashion, you can eliminate a lot of the pain points while still reaping all of the benefits. That's how we set it up where I am - responses and fixtures are generated as part of a full suite execution, but persist with individual test runs.


Not sure I understand what you are proposing as an alternative. It seems that you can either test against a 'real' system on the other end, which probably means one project pulling, building, and standing up possibly dozens of other services just to run its test suite. Or, you mock it in some way. I prefer the recording approach as it mocks at the lowest level possible, giving you the most test coverage possible.


I worked on a Ruby project with thousands of VCR recordings. Never again.


How is this distinct from other http stubbing libraries?


Polly records as well as exposes a stubbing API. So it's quite different from what I've seen of the others.


Ahh. Well, there’s quite a history of recording as well as stubbing: see ruby’s VCR.


Came here to say the exact same thing: "Hey, look! It's VCR for JS. Yay!"


I would love to hear from people involved in projects like this: what kind of work, and how much of it, was done to get it ready for and approved to be open sourced by the company.

Especially at large corps like Netflix I'm sure there's a lot of hoops to jump through.


In my company (uber), it's actually not a whole lot of hoops. Basically a light legal review that checks the license, a code review to ensure there aren't references to closed source software and infrastructure, and approval from the team manager, who is usually already on board with the desire to open source.


I'm sure the first project took some time to set up, but Netflix has released dozens of projects since. So, very few hoops.


Netflix seems to also have a really strong culture around this though, so I wouldn't be surprised if it's a lot less hoopy than you'd imagine.


Not to shamelessly plug but if you're in the Bay Area on June 28, we're giving a talk that's a bit about performance, a bit about Netflix engineering culture: https://jstalks2018.splashthat.com/.


If I’m a student can I sign up to attend?


Sure! (at least I don't see why you wouldn't be able to)


Exactly. They had engineers dedicated to creating an open source latency/fault tolerance library called Hystrix. Not too surprising they’re dedicating resources to other projects, too.


This is very cool, solving some issues that I, and no doubt many people, have when writing tests against a (fast-)moving target. I'll definitely give it a try in my next project.

I looked through the codebase, and noticed that this uses a custom data format to persist HTTP requests and responses in local storage. I'm not sure if it's technically possible in all circumstances, but I think it might be valuable to have requests and responses be stored as HAR 1.2 [1] when possible, so that the trace can be used by other tools [2] to aid in debugging, verifying and analyzing behaviour as well as perhaps automated creation of load/performance tests.

[1] - http://www.softwareishard.com/blog/har-12-spec/

[2] - e.g. https://toolbox.googleapps.com/apps/har_analyzer/
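
For reference, a single recorded exchange in HAR 1.2 looks roughly like this (trimmed to the required fields; names like "my-recorder" are just illustrative):

    {
      "log": {
        "version": "1.2",
        "creator": { "name": "my-recorder", "version": "1.0" },
        "entries": [{
          "startedDateTime": "2018-06-14T12:00:00.000Z",
          "time": 42,
          "request": {
            "method": "GET", "url": "https://api.example.com/posts/1",
            "httpVersion": "HTTP/1.1", "headers": [], "queryString": [],
            "cookies": [], "headersSize": -1, "bodySize": 0
          },
          "response": {
            "status": 200, "statusText": "OK", "httpVersion": "HTTP/1.1",
            "headers": [], "cookies": [], "redirectURL": "",
            "content": {
              "size": 22, "mimeType": "application/json",
              "text": "{\"id\":1,\"title\":\"foo\"}"
            },
            "headersSize": -1, "bodySize": 22
          },
          "cache": {},
          "timings": { "send": 1, "wait": 40, "receive": 1 }
        }]
      }
    }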


There is already a famous policy/resilience package for .NET with the same name:

https://github.com/App-vNext/Polly


I used to use nock, which works very well in node environments. But this works in the browser as well, so I guess this can be fairly helpful while writing tests post-development. If you are doing TDD, then recording/replaying doesn't fit anywhere in the development cycle.

I like the API of this library and the browser support that was missing in nock. So thanks Netflix! Although it would have been nice to see nock add this support, which is what I wonder about - why not just contribute to existing libraries?


If you're looking for Nock but not just node, try Mockttp: https://github.com/pimterry/mockttp.

It lets you create & configure mock HTTP servers for JS testing, but with one API that works out of the box in Node and in browsers. This avoids the record/replay model too, so you can still take the TDD approach and define your API behaviour in the test itself.
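
Very roughly, the style looks like this (a simplified sketch; myClient is a stand-in for your own code, and the exact method names are in the docs):

    const mockttp = require('mockttp');
    const server = mockttp.getLocal();

    describe('my api client', () => {
      beforeEach(() => server.start(8080));
      afterEach(() => server.stop());

      it('handles a 500 from the backend', async () => {
        // The API behaviour is defined right here in the test, TDD-style,
        // rather than captured from a live recording.
        await server.get('/api/todos').thenReply(500, 'server exploded');

        const result = await myClient.getTodos('http://localhost:8080');
        expect(result.error).toBeDefined();
      });
    });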

(Disclaimer: I'm the author of Mockttp)


I love the name! "Polly" repeats everything... and wants a biscuit every now and then :).


And this polly would love some cookies too.


"cookie" sounds even better. I wasn't sure if it was "cracker" or "biscuit" :).


Related to that, is there anything that allows you to completely save the state of a modern website, with all of the fetch requests and websocket-related stuff it fired off?

I just want an ability to save and reopen exactly what I'm looking at. There are some cool websites which will eventually go down and I want to preserve an interactive snapshot of them.


This won't work for WebSockets; websites that use WebSockets generally require some interaction to generate the transmitted messages, which are often dependent on the server's responses. Private websites, or websites that require a login, are hard - but it can be done. Would suggest HTTrack.


But it's not impossible to have some tool that records all of those interactions to reproduce later. A smart enough tool could record everything from when you open the site until you click save. It would not reproduce the functionality that is backend-dependent, but it sure can replicate the DOM, etc. Am I missing something?


Yeah - but there are a LOT of variables that come into play for something like that. It'd likely be easier to either record it with something like BugReplay.com or record a video.


This is part of a webtop I built called qKast (https://qkast.com). In fact, the chrome extension https://chrome.google.com/webstore/detail/qkast/eliofljjghgd... lets you mix and match live components of webpages and make "living" snapshots. Further than that, though - they're not iframed, so you can use an assortment of widgets to modify the contents and look of the components, as well as broadcast the whole webtop live.


I haven't used it so I'm not sure that it does everything you want, but take a look at https://webrecorder.io/


Unrelated, but there are so many things called Polly that it gets confusing.


Yeah. I was going to bring up the library for .NET that provides policy-based retries.


I went to the slackbot that has a cute parrot logo. We at CodingBlocks love our Polly.


I thought of Amazon Polly... converts text to lifelike speech.


So, the VCR gem for javascript. Great! Personally I stopped using the VCR gem a while back as it blocks edge cases. However, for larger projects where things can get unwieldy, this makes a lot of sense. Local test suites should never hit external APIs, so it's much better to have mocks/stubs than to have no tests at all.

However, on smaller projects I've found just clicking through to make sure things work, and then letting my error reporting system catch bugs, to be much more effective :)

It's a hard line to walk and I surely haven't perfected it. I'll give it a shot on a future project!


I personally replace all stubbed HTTP interactions between my services with contracts with good success. https://docs.pact.io/getting-started


Alternate URL (no Javascript required):

https://github.com/Netflix/pollyjs


a little ironic, no?


I know this is slightly different, but I wish more people knew about Chrome / Safari / Firefox’s “network” console tab. Great for debugging. Can look at all requests, headers, responses, timespans, etc. Some will even let you copy a given network request as a cURL command, capturing all headers, body, query strings, etc.


And out of curiosity, what makes you think that people don't know about it? I've never met a web dev who didn't know about it in the past few years.


New people are introduced to web development every day. Assuming some things are just common knowledge is not very beginner friendly. https://xkcd.com/1053/


I'm curious what the application could be for load testing? Tools like locust and gatling are nice but are still synthetic. I'd love to capture X minutes of traffic, then dupe it Y times and replay it as a more accurate representation of traffic patterns for load testing. Is that a thing?


I've not had a chance to properly try it yet, but https://goreplay.org/ does exactly what you are asking. Alternatively, in the container world, tools such as Istio (https://istio.io/) allow traffic shadowing - you can duplicate traffic and route it somewhere else.


I did something similar, but as an intermediary proxy (it can record, replay, has modifier hooks, and can slow down requests). You have to point it towards a backend API, and on the frontend you use the proxy URL instead of the original. But it's mostly for debugging, so the scope is much more limited.


Also check out https://github.com/code-mancers/interceptor, which also uses browser APIs to enable users to mock HTTP responses via a chrome extension.


How does it hook into the browser APIs? I can't seem to find it. By what black magic would it know how to hook into my puppeteer instance? Or am I not understanding this?


Why another tshark/tcpdump? All of this can be done with a simple script of a few lines. Apparently today we need JavaScript recorders; traffic recorders are a kid's game, and using a certificate to intercept HTTPS is a dangerous approach (but every project out there does the same). tshark and SSLKEYLOGFILE is the only safe way... but I like this project, I don't know why! I feel something.


This seems like it will be especially useful with apollo-client for graphQL requests.


I've used mitmproxy + proxychains to do this. How is Polly different?


https://netflix.github.io/pollyjs/#/README

Why Polly?

Keeping fixtures and factories in parity with your APIs can be a time consuming process. Polly alleviates this by recording and maintaining actual server responses without foregoing flexibility.

* Record your test suite's HTTP interactions and replay them during future test runs for fast, deterministic, accurate tests.

* Use Polly's client-side server to modify or intercept requests and responses to simulate different application states (e.g. loading, error, etc.).
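
The client-side server piece looks roughly like this (a sketch; the endpoint is made up, see the docs for the full API):

    polly.server
      .get('/api/session')
      .intercept((req, res) => {
        // Force an error state without re-recording anything.
        res.status(500).json({ error: 'boom' });
      });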


Not sure that tells me how it's different from replay or the other half dozen npm modules that do the same thing. It'd be nice for them to contrast their tool with existing ecosystem options considering some of them are pretty well established.


Can you share which libraries you know that achieve the same thing? I'm happy to go through and respond to the differences.


I only have experience with replay, but an npm search turns up:

replay, replayer, http-record, talkback, sepia, mitm-record, fetch-vcr, tape-nock, jest-playback, eight-track, axios-vcr, replayer, node-vcr, mocha-vcr, mockyeah, yakbak, nine-track, dkastner-replay, node-nock

At which point I stopped looking...recording http requests isn't exactly new territory.


I've personally used node-replay with great success. It has minimal configuration: https://github.com/assaf/node-replay


Interesting, but it would be more useful with support for streaming.


On the roadmap, depending on your definition of "streaming" (e.g. buffer streams, websockets).



So this will make the actual HTTP request the first time, then keep a recording? I'm not entirely clear from the docs how this works.


What would a use case of this library look like?


So is this basically selenium in javascript with some neat features?


I think this is more of a complement to Selenium, where you can use Selenium to drive the browser to test the UI, with Polly providing recorded back-end responses. I need to look into it more, but this might address a need we have to make it easy (and quick) for our front-end developers to run our test suite locally during development, without having to spin up anything in the backend or rely on flaky non-Production environments.

EDIT: I am aware there are many other tools that can address this, we just haven't had the time yet to implement them. :)


Sounds more like the VCR gem from Ruby land.


Yes, exactly - that's the idea anyway. It has a few nice features on top, such as controlling the network latency and expiring recordings (useful when working on a project supported by a big team).
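
Something like this, if I remember the option names right (double-check against the docs):

    const { Polly, Timing } = require('@pollyjs/core'); // Timing may live in '@pollyjs/utils' depending on version

    const polly = new Polly('search results', {
      expiresIn: '14d',              // stale recordings force a re-record
      timing: Timing.relative(2.0)   // replay at 2x the recorded latency
    });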


This isn't selenium. More like wiremock.


What's the core distinction between this/wiremock vs selenium?


Selenium is for behavior testing - simulating clicks and form filling. This is for mocking HTTP endpoints.


I still don't get why we do this [mocking http endpoints].

Sure this makes the problem of mocking the server less painful. Well done. But I'd take completely integrated tests over these any day. Sure they're slower but that's more or less irrelevant with feature toggling, staged roll-out and continuous production monitoring.

It's totally possible to completely avoid mocking http endpoints thus making these tools completely obsolete.


See my comment above. This is not a replacement for E2E testing.


It's not always easy. Especially not if your API is stateful.


Or if it's not your API. I've been looking for something like this to make mocking OAuth flow a lot easier.


The example is form filling though:

    await fillIn('email', 'polly@netflix.com');
    await fillIn('password', '@pollyjs');
This is exactly like selenium code I've written to log in. I struggle to see the difference in purpose.


In your selenium code, the browser was talking to a database.

But sometimes that database is down, or really slow.

Polly says "browser, don't talk to the database anymore, instead here's what the database said last time".

So, yes, both Selenium and Polly poke DOM elements, but Selenium stops there, whereas Polly does that plus tricks the browser into going through the whole test without making a real call to the database (assuming it has a previous recording of "what the database said" for that test).
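
In pseudo-test terms (reusing fillIn from the docs example above; the click helper is made up):

    const polly = new Polly('login flow'); // has a recording from a prior run

    await fillIn('email', 'polly@netflix.com');
    await fillIn('password', '@pollyjs');
    await click('button[type=submit]'); // the POST /login this triggers is
                                        // answered from the recording, not
                                        // the real backend

    await polly.stop();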


That's part of mocha or whatever. Polly is the server part.


> /* start: pseudo test code */


Also, what's up with the in-your-face hiring pitch right in the documentation?

https://netflix.github.io/pollyjs/#/README?id=we39re-hiring


I assume that devs read the documentation, and they want to hire devs, and it's their tool, so they put their hiring pitch in their documentation for their tool to try and hire devs



