Forking Chrome to turn HTML into SVG

lifthrasiir · on Nov 13, 2022

> What if we could also vectorize 2D <canvas> elements controlled by JavaScript? Turns out, Chromium has this capability built-in for printing:

I'm very surprised to hear this. So printing, either to PDF or to actual printers, may reveal more information about what was drawn to the canvas than normal display, especially if no effort has been made to remove overdrawn paint records. That can have an interesting, if only hypothetical, consequence...

mk_stjames · on Nov 13, 2022

This is exactly the case. I've done conversions before where it was possible to see and extract underlying, hidden elements, that were not visible or even detectable in the rendered webpage in a browser.

This is actually a somewhat common method when it comes to a bit of corporate sleuthing.. anytime you see a pretty website with vector-y graphics, maybe engineering-drawing representations.. if the data hasn't been stripped completely or redrawn you can extract information that otherwise people would assume unknowable.

In a recent example... I did this on a startup company's page involving a product where they had a CAD-like side view drawing of one of their products... but the base file (in this case it was an SVG) driving the page actually contained multiple hidden views of the same product and other products and at the 'real' precision of what likely was a DXF export from a CAD program, given to the web team. This allowed a critical dimension of an unannounced product to be precisely determined (to three significant figures) which was a spec that had not been publicly released...

LoganDark · on Nov 15, 2022

> In a recent example... I did this on a startup company's page involving a product where they had a CAD-like side view drawing of one of their products... but the base file (in this case it was an SVG) driving the page actually contained multiple hidden views of the same product and other products and at the 'real' precision of what likely was a DXF export from a CAD program, given to the web team. This allowed a critical dimension of an unannounced product to be precisely determined (to three significant figures) which was a spec that had not been publicly released...

This is actually really interesting. Do you get to do this often?

eyelidlessness · on Nov 13, 2022

> > What if we could also vectorize 2D <canvas> elements controlled by JavaScript? […]

> I'm very surprised to hear this. So printing, either to PDF or to actual printers, may reveal more information about what was drawn to the canvas than normal display, especially if no effort has been made to remove overdrawn paint records. That can have an interesting, if only hypothetical, consequence...

The canvas API is all imperative code, so you might think it’s fairly opaque. That’s what I thought anyway, until recently I hacked on a someone’s generated art demo, mostly to glean insights into the algorithms used. As I looked through the actual canvas-specific code, it struck me that 1) it’s exceedingly statically analyzable and 2) that the imperative APIs could trivially translate to an incremental SVG rendering, because their primitives are nearly identical apart from the imperative/declarative distinction.

Mentioning this mainly because if there’s anything interesting to learn about a particular usage of canvas, it would probably not be a huge investment to learn it. Either by static analysis or by rendering canvas calls incrementally to SVG, anything overdrawn or obscured is sitting right there to inspect without any special browser-internal faculties.

andybak · on Nov 13, 2022

Being fair, if anyone is sending anything to the client and assumes it's not visible then they are fair game.

I just hope the code around password entry fields is carefully audited. That's all on the client.

version_five · on Nov 13, 2022

There's a story from years ago (I couldn't find it) about some government or legal documents having info redacted, but whoever did it just used some pdf editing tool to draw black boxes over the redacted parts, so all the info was still in the pdfs.

Edit: I found this but I'm not sure it's the one I'm remembering: https://www.techdirt.com/2014/01/28/new-york-times-suffers-r...

karlshea · on Nov 14, 2022

One of the Apple v Samsung (?) cases had court docs released with some App Store numbers redacted but it was just black squares over the actual numbers, just selecting the sentence and copy/pasting it gave it to you.

perth · on Nov 13, 2022

This happened with the Maxwell redacted court docs

lliamander · on Nov 14, 2022

Do you have a link to the unrelated docs?

lliamander · on Nov 14, 2022

*unredacted

Stupid autocorrect

kevincox · on Nov 13, 2022

Yes, but there is more than that there. What if I as the client try to print a page or export to PDF. I think that there is nothing sensitive visible on the page so I share the result. It turns out that there was actually sensitive info in the canvas that was not visible due to something like overdraw.

As a simple example imagine that an image is drawn to the canvas and then blacked out. You wouldn't expect that the saved PDF may contain those as separate layers.

Of course this highlights an existing issue with complex formats. You need to be very careful before sharing complex documents.

rkagerer · on Nov 14, 2022

I've seen big-firm lawyers make the same mistake more often than you'd think - a pdf "redaction" that's just a black box under which you can simply highlight and copy the obscured text. Oops.

aidos · on Nov 13, 2022

Totally. We see architectural drawings that go through a number of revisions and it’s not uncommon for designers to simply cover a whole section with a white box and then draw on top of it.

Also within PDFs (and svgs) you normally clip the area you’re going to draw into to bound it (sort of like overflow:hidden) and anything outside of that doesn’t display, but it’s still there and accessible.

I marvel more at the fact that software is capable of figuring out all the occlusions so you can print the stuff on a plotter. Cad drawings have up to 2M individual vectors in them. Its impressive that it works at all to be honest.

vbezhenar · on Nov 14, 2022

I don't really understand this paragraph. I just created test.html:

    <!DOCTYPE html>

    <canvas id="tutorial" width="150" height="150"></canvas>

    <script>
      var c = document.getElementById("tutorial");
      var ctx = c.getContext("2d");
      ctx.beginPath();
      ctx.moveTo(75, 50);
      ctx.lineTo(100, 75);
      ctx.lineTo(100, 25);
      ctx.fill();
    </script>

and used Chrome to pdf print it. Then I opened PDF in a Sumatra PDF Reader, zoomed in and it's obviously not vectorized.

lifthrasiir · on Nov 14, 2022

That's a good point, and it took me a bit of trials to force Chrome to print canvas in vector. The article links to the MDN <canvas> demo [1], which doesn't print in vector if you try it. But you can run the following in the console and then try to print the same page, which will be vectorized:

    window.onbeforeprint = () => {
        window.canvas.width |= 0; // clear the canvas
        draw();
    };

So I guess, for now, this behavior only manifests when you use the beforeprint event.

[1] https://yari-demos.prod.mdn.mozit.cloud/en-US/docs/Web/API/C...

vbezhenar · on Nov 14, 2022

Thanks a lot! Just a little typo: it should be `onbeforeprint` I guess. I just checked and it works. That approach has a potential to solve important problem for me.

lifthrasiir · on Nov 14, 2022

Oh, I had a trouble in spelling "print" vs. "paint" ;-) (Fixed.) I now wonder why Blink specifically supports this, presumably to support some weird websites?

afiori · on Nov 14, 2022

I read it as "convert to SVG" rather than "reconstruct vector data"

tyingq · on Nov 13, 2022

Pretty sure canvas.toDataURL() is or was a fingerprinting method.

piskerpan · on Nov 14, 2022

Kinda unrelated. That API exports a flattened PNG

steren · on Nov 13, 2022

Much cleaner than my hack of Chrome -> PDF -> Inkscape -> SVG

https://labs.steren.fr/2020/05/08/screenshot-as-svg/

cjr · on Nov 13, 2022

ha, I'm also guilty of using this method on https://urlbox.io to power our SVG screenshots.

To be honest, it works quite well, but there are quite a few bugs in chromium's pdf rendering, especially when it comes to determining the correct page width to apply media queries to, which sometimes affects the accuracy of these SVG's.

sings · on Nov 13, 2022

Me too.

I have found printing to PDF from Safari instead of Chrome yields better results when I have gone through the same process. It probably depends on the source material though.

If I remember correctly, the text was split into separate objects by Chrome to reproduce kerning offsets.

aidos · on Nov 13, 2022

I’ve been down a bit of this rabbit hole before. We work with PDFs, svgs, fonts and chromium too. While I don’t have any need for this tool itself, I’d highly recommend flicking through this article as a nice overview of the graphics / font pipeline.

mk_stjames · on Nov 13, 2022

I've done this for a project long ago, incredibly lazily, by using chrome/chromium to PDF and piping to a PDF to SVG tool. There are a few PDF to SVG pathways, I remember it using Cairo and the whole thing was quick and consistent.

crazygringo · on Nov 13, 2022

That was my first thought as well.

I'm genuinely curious if there are any advantages in Chrome->SVG as opposed to Chrome->PDF->SVG.

Are there any graphical effects (e.g. produced by CSS, like blurry text shadows or something) that PDF can't render without falling back to bitmap but SVG can?

Or is there other data that SVG usefully preserves that PDF discards, such as actual source text strings used for text? (As opposed to PDF where getting text out, e.g. when copying to clipboard, usually involves a lot of ugly "reverse engineering".)

aidos · on Nov 13, 2022

Neither pdf or svg do text layout. They’re both pretty similar really, though the pdf spec is really deep and broad to cater for a million things.

My advice to everyone re pdfs is to crack them open by running `mutool clean -d file.pdf` And opening in a text editor. They’re just a tree (well, graph, I guess) of obvious objects.

Ps: mutool convert does a good job of converting from pdf to svg in a fairly faithful way.

DrewADesign · on Nov 13, 2022

Do you mean there's no dynamic text layout? Svg and pdf have perfect text placement capability, but I've never even looked to see if it supports defining broadly applicable rules for text presentation.

aidos · on Nov 13, 2022

I mean there’s no layout engine to do things like wrapping and line height etc. Everything is explicitly positioned.

PDF seems a bit more bonkers because you render text as strings of glyphs and the conversion back to text is an afterthought. There’s a ToUnicode map that says “glyph 8 in the embedded font is an ‘X’” but that’s there for copy pasting / searching - not for rendering. PDFs are built to render glyphs at positions.

Edit: to go full meta, there are Type3 fonts where each glyph itself is defined as a PDF graphics stream. Which actually leads you in to what’s inside a font. Guess what? lots of them look just like PDFs inside, because the glyphs are defined in postscript. Fonts are PDFs kinda grew up together, and once you start digging into them the similarities are striking.

DrewADesign · on Nov 16, 2022

Yeah it's definitely a presentation format, but probably the most useful one there is for many. As someone who does print design, screen design, and development work, I'm glad PDF is what it is. Absolute subpixel-level perfection, support for raster and vector visuals, embedded fonts, multiple industrial color encoding formats, and other professional design features are essential for designers. Adding generalized formatting rules for dynamic layouts and content would needlessly complicate an already useful tool to adress use cases satisfied in other formats. That people, Adobe included, use it as a general document format is a problem with usage and product management, not the format. It pisses me off when it's used as the default export format unless it's sole purpose is for printing. I've had to write web apps that dynamically generated pdfs: it was a pretty miserable pierces that yielded mostly lackluster results. Glad to see Adobe has moved away from their cockamamie pdf interactive features that contributed little more to users than anxiety about security holes.

Epub should be the defacto standard when the included data remains important.

ccouzens · on Nov 13, 2022

If you print to PDF, you'll have the page's print css applied. And it is probably paginated.

If you go direct to SVG the capture will use the screen css and not be paginated.

mk_stjames · on Nov 13, 2022

Yes, this. In the project I was doing, using chromium as a command-line interface I remember having options to do the pagination to a custom resolution, which I used to define a render 'window' as if the browser screen was on something like a 1600x18000 monitor. so I had the entire webpage displayed like a full scroll without page breaking like it would have if you just printed a PDF from Chrome. And this allowed me to then extract this giant full length vector graphics result of diagrams and text into a single SVG that was perfectly spaced and rendered in the aspect ratio I wanted.

cjr · on Nov 13, 2022

It's also possible to emulate screen media queries[0] so that the pdf output uses the regular screen css.

[0] https://chromedevtools.github.io/devtools-protocol/tot/Emula...

femto113 · on Nov 13, 2022

I think the path is more clearly thought of as HTML+CSS -> display list -> *. The display list is some abstract definition of what needs to be drawn by a renderer. In theory anything that fully describes all possible operations works, including bespoke things like SkPicture or general purpose graphical languages like SVG or PostScript. In practice there's never a single language that can describe everything, because display capabilities evolve and new operations are added all the time (e.g. advanced typography features for fonts). PDF can cover a really broad set of use cases, but it also wasn't designed as an intermediate format (it was closely tied to the PDF reader) so it's easier to get into than out of. SVG is possibly a better candidate, as it is already used effectively as an intermediate representation (e.g. D3.js "renders" to SVG).

hedora · on Nov 13, 2022

I'd love to see some sort of caching proxy that did this for news stories, etc.

Basically, convert everything to an archival format, then I'll browse the archive instead of whatever adversarial server side / javascript junk the site is serving.

ccouzens · on Nov 13, 2022

If you could proxy the page to SVG without Javascript, couldn't you also proxy the page to HTML without Javascript?

Either way, you'd probably want your proxy to wait to for any onload Javascript to run before snapshotting the page.

bawolff · on Nov 13, 2022

Well both pdf and svg support javascript (albeit pdf is extremely limited)

pornel · on Nov 13, 2022

It'd be wonderful if this could be integrated into the browser and the OS to provide SVG screenshots.

ajkjk · on Nov 14, 2022

My thought as well. Would be awesome -- especially cause it'd be easy to edit them afterwards to, say, show small changes.

hutzlibu · on Nov 14, 2022

Yeah, I suppose the consequence will be, less trust in screenshots generally as it is easier to edit them.

(even though they were never secure, they are used for proof of various things)

pornel · on Nov 14, 2022

DevTools makes it even easier to edit pages convincingly, because edited text will properly wrap and affect layout. SVG freezes layout so you can't make big changes.

vbezhenar · on Nov 14, 2022

Does anyone have any good suggestions as to which approach to take if I want to produce PDF with vector graphics inside?

Right now our pipeline looks like this:

1. Code generates SVG. It contains quite a lot of elements and takes something like 250 KB unzipped or 20 KB zipped.

2. SVG is converted to big PNG using resvg. Now this SVG is something like 1.5MB unoptimized. We further use pngquant to shrink it to something like 500KB.

3. We use HTML templates to produce a document with embedded PNG image.

4. Now we use HTML to PDF (could be Chromium, but we use some Java library) to produce PDF document and this is the end result.

Good thing about this pipeline is that SVG and HTML are somewhat easy to understand and modify.

This PDF document obviously contains raster and not vector image, so it does not look good when zoomed and it takes more space that I'd like.

What I want to implement: keep HTML part (because that's kind of report and it should be changed if necessary) but embed image with some kind of vector approach, so PDF would contain vectorized image.

I tried to just embed SVG as it is. Well, it kind of worked. But Chrome printed out enormous pdf. Something like 100 MB I think. My computer almost choked trying to render it. But it was vector, yes.

I've given up on this task as I didn't find any easy approach and generating PostScript for the whole document seems not appropriate. I think I could generate PostScript file for the image, but how do I embed PostScript file into HTML so PDF print would use it as it is?

jahewson · on Nov 14, 2022

That’s pretty crazy. PDF is a vector format so rendering SVG to a bitmap to insert into it is like using email to send a photo of a fax. Seems like you’re already aware of that though!

There’s many libraries out there that can convert vector SVG directly to vector PDF. Sometimes it’s worth just paying the license for, say pdftron. Alternatively if you are generating the would-be SVG content in JS you could directly generate PDF instead of SVG with http://pdfkit.org/

Chrome usually does a good job rendering SVG directly to PDF. It’s overkill of course, but if you’re already used to using it then it’s the path of least resistance. If you’re getting giant PDFs output then I suspect there’s something weird with your input SVG. It might contain some pathological content (huge embedded images? paths with large amounts of almost invisible details? fonts that aren’t embedded sensibly? e.g. vectors of every glyph)

vbezhenar · on Nov 14, 2022

No, that image is pretty primitive. Lots of lines, lots of dots (tiny circles), some polylines, few text labels. Nothing fancy at all.

I dont really want to turn entire document into svg. There’s some text, some tables, things that fit perfectly for HTML and not so much for SVG.

Right now I’m thinking about using GNU groff. It would be a very different approach.

jahewson · on Nov 14, 2022

That’s understandable. Maybe try commenting out all the generating code and adding it back piece by piece until the document size balloons, then you’ve found your culprit.

The MuPDF ‘mutool’ is pretty handy for dumping internal PDF content to see which parts are large.

jefftk · on Nov 14, 2022

I've never used it, but maybe https://github.com/yWorks/svg2pdf.js does what you want?

amichal · on Nov 14, 2022

Commercial solution but princexml is really good at this. Its really the kind of tool you want if your "generate pdf" functionality needs any advanced pdf functionalty

ilaksh · on Nov 14, 2022

https://github.com/alafr/SVG-to-PDFKit (svg-to-pdfkit)

MayeulC · on Nov 14, 2022

Why not use inkscape? You can also use something like `scour` on your svg beforehand.

bscphil · on Nov 13, 2022

Semi-related, there's this browser extension that somehow manages to mangle HTML into SVG with pretty good accuracy. https://addons.mozilla.org/en-US/firefox/addon/svg-screensho...

I do stuff like this (vector representations of the DOM) for taking screenshots. Why?

1. High resolution screenshots are great when you're sharing from a low resolution device, or when you need to scale them up. I've seen enough crappy screenshots of Twitter in YouTube videos to last me the rest of my life.

2. If your device does sub-pixel anti-aliasing, then your screenshots all have noticeable color fringing around their text. The text rendering is done well before the data hits the buffer that the screenshot is capturing. A fun party trick is to identify someone's OS based purely on a screenshot of some text on a webpage.

3. On Linux (and maybe elsewhere, IDK), color correction (e.g. gamut mapping) is done (in X11) before the pixels get to the buffer that you capture. So with most screenshot tools, you end up capturing a bunch of distorted colors which you then have to map back to sRGB if you want them to look right in color calibrated software.

You can frequently get away with printing a PDF and then rendering that out to a large PNG. In some cases, though, figuring out how to set the page size to match what you seen on the screen can be near-impossible, and more importantly in Firefox there's no way to disable print media CSS when printing a PDF. (You can do this in Chromium.) If you need to edit the image afterwards or want to put it on a website or something, this is far easier to do with the SVG format than with PDF.

cjr · on Nov 13, 2022

> figuring out how to set the page size to match what you see on the screen can be near-impossible

I run a little API that converts URLs and HTML into PNG/PDF/SVG.. MP4 too, and the quoted part resonates :)

I recently started delving into the chromium src code in order to try and figure out the reason why max-width media queries don't seem to trigger at the expected viewport/page width, when printing to PDF, but it is quite the rabbit hole.

When I saw html2svg the first thing I wondered was whether it would have the same issue as printing to PDF.

graderjs · on Nov 14, 2022

Very cool. What are the size of these screenshots and how long do they take to produce?

I’m wondering about them as alternatives to frame capture for remote browsers isolation to save bandwidth.[0]

Also related, the Chrome Debugging Protocol exposes a similar bit of info in the LayerTree domain: you can actually get the canvas draw instructions to render a webpage on a canvas.[1]

[1] https://chromedevtools.github.io/devtools-protocol/tot/Layer...

[0] https://chromedevtools.github.io/devtools-protocol/tot/Layer...

fathyb · on Nov 16, 2022

I didn't know about the devtools protocol snapshots, pretty cool! I takes ~5 ms to generate an SVG for google.com (not including loading the page).

For your use-case I'd recommend doing something similar but using the SkPicture structure instead. It would cut the conversion overhead, should support every Skia features, and with some tuning it would allow you to efficiently split bitmaps from vectors (allowing you to send the bitmaps once, and only vector changes after that).

Something I like about using Skia for this use-case is that it allows for zero-latency scrolling.

graderjs · on Nov 14, 2022

Oops 0 was supposed to be

https://github.com/crisdosyago/BrowserBox

SigmundA · on Nov 13, 2022

Reminds me of https://github.com/gliffy/canvas2svg at a different level of abstraction.

I believe PDF.js incorporated some form canvas2svg to try and get a SVG backend working which would allow high resolution printing to PDF but not sure where that's at. I believe printing through PDF.js is blurry due to memory constraints since with normal canvas pdf pages just end up as bitmaps sent to the printer.

SVG ends up staying vector through Chromiums print pipeline resulting in much less memory usage while having much higher dpi final output. I would imagine this is due to SVG being turned into Skia drawing commands that end up as PDF that then gets printed through PDFium?

jancsika · on Nov 13, 2022

> Recently, an experimental SVG back-end has been added to Skia.

That's curious.

Anyone know why?

return_to_monke · on Nov 13, 2022

While I am not a skia person, an use case I could imagine is (flutter) web apps.

Flutter currently has 2 ways to run something on the web: 1. CanvasKit. Primarily, this uses webgl. Though, the app has to download a kind of webGl runtime on the first launch, iirc. If the browser does not support openGl, it will use Skia with a Canvas frontend, leading to blurry and poor performance results 2. webRender. This is flutter's way of trying to make a HTML DOM, but its not that great either. It's inconsistent with the rest of the flutter implementations, and has performance issues because it's not really mature/optimized and has a virtual Dom.

I think an exciting use case would be something like 1. Instead of the blurry image and bad performance of canvas redrawing, it might try to manipulate an SVG in the browser. This is pure speculation tho, correct me if I'm wrong.

TheRealPomax · on Nov 13, 2022

Calling it "recently" is a bit of a misnomer. The "experimental/svg/model/..." content was added almost five years ago.

fathyb · on Nov 14, 2022

Ah I assumed it was recent, but didn't check. It's fixed now, thanks!

montag · on Nov 14, 2022

Brilliant explanation and walkthrough of some problems encountered along the way. Thank you for demystifying Blink/Skia a little bit.

simpleintheory · on Nov 13, 2022

Interesting. Wonder how easily it would be to generalise this—turn into an API that gives out some image data that could be in turn converted to PDF, SVG, PNG, you name it… though not sure how the data would be structured though

fathyb · on Nov 13, 2022

I had a lot of people reach out this week-end for PDF support, so I'm planning on implementing it with PNG support this week. Thanks to Skia, it should just require a few lines of code.

justinclift · on Nov 13, 2022

PDF or PNG? It's not clear from your comment. :)

fathyb · on Nov 13, 2022

Hopefully both! More precisely PDF, and bitmap with an option to encode as JPEG, PNG or WebP. I'll have to find a new name..

codethief · on Nov 14, 2022

PDF support would be fantastic! I've tried every FOSS html2pdf converter under the sun and they're all either buggy or outdated (based on very old browser engines) or both.

How do I best follow your project and any upcoming announcements?

fathyb · on Nov 14, 2022

I've opened an issue to track PDF support [0], you can subscribe to get updates for this specific feature. For general project updates you can use the "Watch" button on the repository to get commit updates on your feed.

[0] https://github.com/fathyb/html2svg/issues/3

imhoguy · on Nov 13, 2022

You can make PDF or PNG from SVG.

metayrnc · on Nov 13, 2022

Can someone give some example usecases? I am curious as to how this is used. Thank you.

yvoschaap · on Nov 13, 2022

I tried something similar like this to render thumbnail of websites (at a very small file-size). E.g. https://twitter.com/yvoschaap/status/1446397003316047872

marginalia_nu · on Nov 13, 2022

How small are you getting them? I'm straight up screenshotting websites (e.g. https://search.marginalia.nu/screenshot/245804). Seem to come in at on average 17 Kb, based on a sample size of 550K screenshots.

fy20 · on Nov 14, 2022

That's pretty cool! How do you go about removing the details? Is it done in JavaScript before the screenshot is taken, or by post-processing the SVG?

mrkramer · on Nov 13, 2022

Isn't this something like archive.ph is doing? Snapshotting and screenshotting websites. I'm referring both to you and the op.

dj_gitmo · on Nov 13, 2022

https://archive.ph/1NNZr

That looks like a Web Archive (WARC) and a PNG screen shot. I think you can make a screenshot with CasperJS. The WARC can be created by wget.

mrkramer · on Nov 13, 2022

Yea you are right about PNG but wrong about WARC. Archive.ph doesn't use WARC.

dredmorbius · on Nov 13, 2022

What does it use, if you know?

Source?

mrkramer · on Nov 13, 2022

Their FAQ says: https://archive.ph/faq#:~:text=Which%20parts%20of,of%201024x....

Wikipedia says: https://en.wikipedia.org/wiki/Archive.today#:~:text=Web%20pa....

So I assumed they don't use it but idk for sure.

dredmorbius · on Nov 15, 2022

Thanks.

codetrotter · on Nov 13, 2022

That is super neat! Did you end up having any users/customers?

danielvaughn · on Nov 13, 2022

IMO the use case is limited but interesting. The most obvious would be product screenshots for landing pages, although typically design tools handle that well enough.

I'm currently building a web-app for building web pages, and I'd love for the user to be able to view a thumbnail gallery of all the pages they've built. This tool would allow me to build a zooming feature pretty easily.

Outside of those two, I'd imagine the use cases are fairly limited.

btown · on Nov 13, 2022

The thing about having access to the Skia render graph is that all of a sudden you're no longer limited to product screenshots and screen recordings. Imagine a pipeline where you can export someone's interaction session with a site, pixel-perfect, into DaVinci Resolve or Blender or Unity as a fully annotated DOM-advised render node hierarchy, with consistent node identities over time, of every rendered element on the page as it changes across frames. That's way more powerful than just pixels.

Imagine flying through your site in 3D (or even VR) with full control over timing, being able to explode and un-explode your DOM elements as they transition into being - the type of thing that only Apple would do for their WWDC demos with dedicated visualization teams.

The start is to be able to see the rendering engine as a generator for not just raster data over time, but vector data over time. Of course, there's a lot of work to do from there, but this is the core leap.

TeMPOraL · on Nov 13, 2022

Cynical take: it won't hold. It'll get neutered by vendors and eventually purposefully removed as a possibility by Google.

Here's the thing: this "core leap" you mention isn't new. It's been made long before, on input side: that's what HTML is. All those use cases you mention should be possible, but aren't. Why? Because the for-profit web doesn't want that.

Most websites on the Internet today exist not to be useful, but to use you. For that, it's most important that the website author has maximum control over what the users see. This allows them to effectively place ads, do A/B tests for maximum manipulation (er, "engagement"), force a specific experience on you, expose you to right upsells in the right places. If they could get away with serving you clickable JPEGs, they would absolutely do that. Alas, HTML + JS + CSS is the industry standard, the all things considered cheapest option - so most vendors instead just resort to going out of their way[0] to force their sites to render in specific ways, and supplement it with anti-adblock scripts/nags, randomizing DOM properties, pushing mobile apps, etc.

To be fair, they do have some point. Look at this very thread: currently, the top comments talk about using Chrome -> SVG pipe to access accidentally leaked commercial and government data, such as hidden layers in product CAD drawings, or improperly censored text[1]. Your own example, "export someone's interaction session with a site, pixel-perfect, into DaVinci Resolve or Blender or Unity" is going to be mostly used adversarially (e.g. by competitors). My own immediate application would be removal of ads, which presumably have distinct pattern in such rendering, as they get inserted at a different stage of the render pipeline than the content itself.

This is just the usual battle over control of the UX of a website. The vendor wants to wear me down with their obnoxious, bullshit UX[2], serve me ads, and use DRM to force me to play by their rules. I want my browser to be my user agent. We can't have it both ways[3], and unfortunately, Google is on the side of money.

--

[0] - Well, to be fair, a lot of this is done by default by frameworks, or encoded in webdev "best practices".

[1] - Obligatory reminder: the only fool-proof way of publishing partially censored documents or images is to censor them digitally, then print out, scan back, and distribute the scan. If you don't go through analog, you risk accidentally leaking censored information or relevant metadata.

[2] - Like e.g. every single e-commerce platform. The vendor hopes I'll get tired and make a suboptimal choice. I want to pull vendor's data into a database and run SQL queries on it, so I can make near-optimal purchase decisions in fraction of the time. This "core leap" you mention would be a big win for me, which is why it won't last.

[3] - At this point, accessibility is the only thing that's keeping websites somewhat sane. There's plenty of apologists for all the underhanded and malicious techniques that are core to webdev these days - but they can't usually dismiss the complaint that the website is not usable on a screen reader. For some sites, it would be illegal to do so.

btown · on Nov 14, 2022

I feel like we're talking at cross purposes. My point was primarily that OP's Chromium patch could be the start of an excellent tool for website creators to unlock their own sites' and web applications' rendering potential, and level the playing field between smaller startups and much larger technology companies.

I'm quite familiar with anti-scraping and anti-ad-blocking countermeasures, and the first thing any such tool would block is a non-standard rendering engine like this - so unless the website creator consents, this really doesn't hurt or help consumer-friendliness (which, I agree, is in a sorry state these days) in any meaningful way.

Humphrey · on Nov 14, 2022

I could imagine this been used, once it's more mature, as a Weasy Print (html to PDF) replacement. (assuming it ends up supporting PDF's or adding a different svg2pdf convertor)

There are countless use cases where a system needs to generate something printable, such as a PDF report, ticket, gift certificate, etc. And generating something that looks great from a system that is good at generating html can be a challenge. Being able to convert a rendered html page to a vector graphic opens up a lot of options.

I already have an AWS Lambda function running Weasy Print for when I need simple PDF generated from a webapp. It'd be great to be able to switch to something like this that does a better job at rendering HTML, and therefore having the option to make more beautiful PDF's.

somishere · on Nov 14, 2022

I have a "generator" site[1] where people are able create/share a "fairly" unique image (billions of potential combos) rendered in the dom. There's no point pre-rendering every possible version of a all-but unique social image, so instead I generate them on the fly as they are requested (i.e someone shares one).

In my case I actually render them on the client using dom-to-image[2] and then store/cache them using R2. This is basically to save on server costs (it's hosted on free CF pages with a worker). A more secure implementation (with a compute budget) might use a headless browser like this server side to render the image.

[1] perfectpollie.au

[2] npmjs.com/dom-to-image

GranPC · on Nov 13, 2022

I do something similar - but using the Print command and converting the PDF to SVG - to import websites into Blender for flashy animations. This allows me to neatly animate things in/out, and zoom into details without pixelation.

commotionfever · on Nov 13, 2022

if it works how i think it does, this could be really nice to cook up some infographics in a css framework like tailwind. then make some svgs for a github readme

for example i made this one[1] with tailwind but i just ended up taking a png screenshot

[1] https://github.com/sentriz/socr/blob/master/.github/socr.png

Scalene2 · on Nov 13, 2022

Great for screenshots to render in a video.

convolvatron · on Nov 13, 2022

if we can reduce the size of the basis footprint for a browser implementation, we can more easily produce new browsers (i.e by implementing a fully general Path, and font rendering)

mrkramer · on Nov 13, 2022

That would be cool if actually converting HTML to SVG would save you bandwidth and all the rest that goes with web requests. Imagine a web browser that only supports SVG and converts all HTML to SVG then when browsing the web you would only look at screenshots of websites and webpages. This would be something like read-only browser. It is already possible[0] tho but it is not enabled by default on Chrome nor it is exclusive feature.

[0] https://frankgroeneveld.nl/2021/08/24/most-underused-browser...

transitivebs · on Nov 14, 2022

This is a really interesting top down approach, starting with Chromium and pairing a minimal bit out of it.

https://github.com/vercel/satori is another interesting recent project that goes bottoms up to achieve something similar. Def a lot smaller aimed at edge rendering, but at the tradeoff of significantly more random compatibility / rendering issues.

Both approaches have their tradeoffs.

Rauchg · on Nov 14, 2022

Broadly in agreement, except support is not “random”. It’s deterministic, easily versioned and well tested.

We are improving the DX further so that the feedback loop of what features are supported is nicer, with advice on what to do when some CSS property is unimplemented.

Fnoord · on Nov 14, 2022

The page does not explain the why (unless I missed it). What is the benefit?

alexchantastic · on Nov 14, 2022

Vercel put out https://github.com/vercel/satori which does similar things (albeit with JSX and a slightly different use case).

Rauchg · on Nov 14, 2022

JSX is optional. It’s just a way of representing the nested object hierarchy that happens to be convenient and widely supported (TypeScript has built-in support)

password4321 · on Nov 14, 2022

I was hoping someone made it possible to pre-load one instance of Chrome, then fork at the OS level before each render.

I guess doing too much in a shared context is a security issue.

slim · on Nov 14, 2022

Hey Fathy, amazing job. This is off topic, but your spelling of your name in arabic is weired :) Maybe you are not a native arabic speaker, so here it is with the most common spelling : فتحي

But I suspect that spelling is on purpose, because it uses the root of the word. If that's the case and there is a story you want to tell, I want to hear it :)

fathyb · on Nov 14, 2022

Thanks Slim! I'm indeed using the root of the word on purpose to ensure it splits into 5 characters, mapping 1:1 with the latin version while still showing the glyph substitution. I used the Janna LT font because it renders root words really well. I figured that most readers don't read arabic so it should get the point across haha