
Hey, this is super cool and interesting! Are you using puppeteer to do this? Might be cool to partner on some of it if you’re looking into that (I run browserless.io).

Best!




Hey!

Thanks for the message. I like browserless!

I don't use puppeteer. I use Chrome DevTools Protocol heavily tho. I started using chrome-remote-interface but hit limits in what it can do with Targets (specifically, flat session mode) and the latest versions of the API. Now I just use the WebSocket directly.

I'd like to partner. Email me cris@dosycorp.com


Hi, is there any reason to avoid using puppeteer? Does it lack something you need when using the devtools protocol? Is it buggy?


Thank you for the great question.

Thinking back to when I started this, I just wanted to keep everything simple, so I avoided pulling in a large, high-level lib like pptr and went with chrome-remote-interface.

I looked at pptr and IIRC at that time (~12 months ago) there was no clear way for me to handle multiple tabs (a key "real UI" use case). The same went for Cyrus' lib (chrome-remote-interface).

With Cyrus' lower-level lib I could hack around that by doing my own target and session management, but at some point in the last couple of months I hit a wall with chrome-remote-interface. Cyrus' lib was not up to date with the latest ToT API (specifically flat session mode). I worked out I could replace the entirety of chrome-remote-interface with some simple code that sent messages down a WebSocket, saved a Promise (keyed by message id) and returned it, and resolved that Promise when a message tagged with the corresponding id came back. It was also simple to write an 'on' function to add listeners for various events. So that was that.
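A minimal sketch of the id-correlation idea described above, written in Python for illustration (the comment does not say which language the real code uses, and this is not that code). The names `ProtocolClient`, `send_text` and `handle_message` are hypothetical. The transport is passed in as a plain function so the sketch runs without a browser; in practice it would be a WebSocket's send method, with incoming frames fed to `handle_message`. The only protocol facts assumed are the wire shapes: commands carry `id`, `method` and `params`, replies echo the `id` with `result` or `error`, events have a `method` but no `id`, and flat session mode adds a top-level `sessionId`.

```python
import asyncio
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional


class ProtocolClient:
    """Hypothetical minimal client: matches replies to commands by message id."""

    def __init__(self, send_text: Callable[[str], Any]) -> None:
        self._send_text = send_text  # e.g. a WebSocket's send function
        self._next_id = 0
        self._pending: Dict[int, asyncio.Future] = {}
        self._listeners: Dict[str, List[Callable]] = defaultdict(list)

    def send(self, method: str, params: Optional[dict] = None,
             session_id: Optional[str] = None) -> asyncio.Future:
        """Send a command and return a future that resolves with its result."""
        self._next_id += 1
        message: Dict[str, Any] = {
            "id": self._next_id, "method": method, "params": params or {}}
        if session_id is not None:
            message["sessionId"] = session_id  # flat session mode routing
        future = asyncio.get_running_loop().create_future()
        self._pending[self._next_id] = future  # saved by message id
        self._send_text(json.dumps(message))
        return future

    def on(self, method: str, listener: Callable) -> None:
        """Register a listener called as listener(params, session_id)."""
        self._listeners[method].append(listener)

    def handle_message(self, raw: str) -> None:
        """Feed every incoming frame here."""
        message = json.loads(raw)
        if "id" in message:  # a reply to one of our commands
            future = self._pending.pop(message["id"], None)
            if future is None or future.done():
                return
            if "error" in message:
                detail = message["error"].get("message", "protocol error")
                future.set_exception(RuntimeError(detail))
            else:
                future.set_result(message.get("result", {}))
        else:  # an event: no id, just a method name and params
            for listener in self._listeners.get(message.get("method", ""), []):
                listener(message.get("params", {}), message.get("sessionId"))


async def demo() -> None:
    sent: List[str] = []  # stands in for the WebSocket
    client = ProtocolClient(sent.append)
    reply = client.send("Target.getTargets")
    # Pretend the browser answered the command we just sent.
    client.handle_message(json.dumps({"id": 1, "result": {"targetInfos": []}}))
    print(await reply)


if __name__ == "__main__":
    asyncio.run(demo())
```

Keeping the transport out of the class means the same bookkeeping works whichever WebSocket library sits underneath.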

Basically, the DevTools protocol is a well-specced, well-tested, simple protocol, and all these libs (like pptr and chrome-remote-interface) began simply as wrappers around the WebSocket, with an API to map function calls to protocol messages and add listeners for events. PPTR has evolved into much more than that now, and during the same period I evolved my own "BG protocol" atop the CDTP (Chrome DevTools Protocol). It became easier to deal with the single source of truth that CDTP is, and get the full expressiveness of the latest ToT protocol, than to deal with the limitations and abstractions of other things built atop it.

Specifically, PPTR did not have (and I believe probably still does not have, tho I have not deeply checked) an easy way to control and manage multiple tabs. And even if it does, I'd have no use for it, because I already have the code that does all that anyway. Scanning the PPTR docs now, I see that I prefer the abstractions, naming, etc. of the CDTP itself over the ones PPTR provides. Like I said, the CDTP is very comprehensive, consistent and makes a lot of sense, and I know it very well. For me and my use case, it's just a better fit.

The way I think about this is not that PPTR has some problem; it's that the "BG protocol" and PPTR (et al) are trying to solve fundamentally different problems. PPTR (et al) try to provide a clean developer experience for common tasks related to browser use cases (such as automation, getting screenshots, PDFs, testing, etc). That's a particular domain, and not exactly the same as what BG protocol does. BG protocol attempts to provide an experience of using a browser that is as realistic and familiar as possible (when you're actually controlling a remote browser through the CDTP). That's not entirely the same domain, because some things that users want are not required in automation, and some things that automation does are not required or done by users.

One of the ways I code is by picking the right tool for the job, and if that tool doesn't exist, or no longer works, I build the tool. I want to work with tools that fit right. So for this domain and use case BG protocol is a better fit than PPTR.


Will there be any architectural hurdle in integrating puppeteer with BrowserGap to provide a remote automated workflow?


There's no reason this couldn't happen.

pptr and BG both use the Chrome DevTools Protocol to communicate with the browser, so there's that commonality.

I haven't looked extensively at the pptr source but I imagine both pptr and BG do some bookkeeping of state related to the sequence of commands (BG certainly does), rather than a purely "stateless" command queue.

For instance, for some things you need to keep track of which session is associated with which target. For other command sequences (such as Runtime.evaluate) you need to know which execution context to evaluate in. And you can keep track of open execution contexts by tracking various Runtime domain events (such as executionContextCreated, executionContextDestroyed, etc).
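A rough sketch of that kind of bookkeeping, in Python for illustration. `StateTracker` and `handle_event` are hypothetical names, and this is not how BG or pptr actually do it. It assumes flat session mode, where each event carries the `sessionId` it belongs to, and it relies only on the documented event payloads: `Target.attachedToTarget` (with `sessionId` and `targetInfo.targetId`), `Target.detachedFromTarget`, `Runtime.executionContextCreated` (with `context.id`), `Runtime.executionContextDestroyed` (with `executionContextId`) and `Runtime.executionContextsCleared`.

```python
from typing import Dict, Optional, Set


class StateTracker:
    """Hypothetical bookkeeping rebuilt purely from DevTools events."""

    def __init__(self) -> None:
        self.target_by_session: Dict[str, str] = {}       # sessionId -> targetId
        self.contexts_by_session: Dict[str, Set[int]] = {}  # sessionId -> context ids

    def handle_event(self, method: str, params: dict,
                     session_id: Optional[str] = None) -> None:
        """session_id is the top-level sessionId the event arrived with, if any."""
        if method == "Target.attachedToTarget":
            new_session = params["sessionId"]
            self.target_by_session[new_session] = params["targetInfo"]["targetId"]
            self.contexts_by_session.setdefault(new_session, set())
        elif method == "Target.detachedFromTarget":
            gone = params["sessionId"]
            self.target_by_session.pop(gone, None)
            self.contexts_by_session.pop(gone, None)
        elif session_id is None:
            return  # Runtime events below are only tracked per session
        elif method == "Runtime.executionContextCreated":
            contexts = self.contexts_by_session.setdefault(session_id, set())
            contexts.add(params["context"]["id"])
        elif method == "Runtime.executionContextDestroyed":
            contexts = self.contexts_by_session.get(session_id, set())
            contexts.discard(params["executionContextId"])
        elif method == "Runtime.executionContextsCleared":
            self.contexts_by_session.get(session_id, set()).clear()


if __name__ == "__main__":
    tracker = StateTracker()
    tracker.handle_event(
        "Target.attachedToTarget",
        {"sessionId": "S1", "targetInfo": {"targetId": "T1"}})
    tracker.handle_event(
        "Runtime.executionContextCreated", {"context": {"id": 3}}, "S1")
    print(tracker.target_by_session, tracker.contexts_by_session)
```

With state like this, a `Runtime.evaluate` call for a given tab can look up the session for its target and pass one of that session's live context ids as `contextId`.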

So to provide a sophisticated level of interaction with the page, some amount of chattiness and state is required for the protocol on top of the DevTools wire protocol.

What I'm getting at is, if there are hurdles, they will likely emerge from the different ways in which pptr and BG handle state and that chattiness to achieve particular user intents.

Also, BG does not require pptr to do automation. It's possible to simply record and replay the BG command sequence (which itself is a superset of the DevTools protocol), again with a further layer of bookkeeping that's too involved to get into here.

All the same, I've often thought about providing an "export to puppeteer" feature (or to nightmare, or to phantomjs, etc), so people can get a transcript in a widely used format that they can take and run anywhere. That's one thing that excites me about pptr X BG.

In any case, automation is not something that is currently provided in the CE, but it's in the paid version at https://browsergap.xyz


Thank you for the detailed comment. I'll explore BG further.



