Why are 2D vector graphics so much harder than 3D? (2019) (mecheye.net)
135 points by thesephist on April 3, 2022 | 71 comments



Actually, we have the same problems and challenges in 3D as well. In the end, 3D is just a 2D image produced by rendering with a projection matrix and other transforms. Stroking is difficult there too, for example: you need to combine little triangles to construct a line with miters in 3D, etc.

In my opinion, 3D graphics are definitely harder than 2D. The WebGL API is messy: there's WebGL, WebGL2, and now we're getting another one, WebGPU, which will hopefully make 3D easier. In 2D we just have the 2D drawing context.


Line miters have no valid interpretation other than a view-dependent one -- the PostScript spec says that miter limit is based on curve angle, something which only makes sense in a flat, 2D perspective. Freya Holmér came up with a set of heuristics that work well enough for Shapes (I helped debug the math a bit there), but you first have to figure out what miter means in a 3 dimensional world, and it's not easy.

But yes, you are generally correct. Maybe if I was writing this post today, I'd give it the title "why shapes are so much harder than textures".

I also disagree that WebGPU is a worse spec than WebGL 1 and 2. I basically implement a WebGPU-like wrapper for WebGL today in my own WebGL projects, and it's made my life much easier, not harder. Though, full disclaimer, I am a contributor to the WebGPU spec.


I wondered what you meant by Shapes, so I googled "Freya Holmér Shapes" and found this, which is awesome -- thanks for the tip!

Shapes: A real-time vector graphics library for Unity by Freya Holmér:

https://acegikmo.com/shapes

>It has bothered me for a long time that for some reason, html5 canvas is better at drawing primitives than Unity.

I know, right?!?

I was so frustrated by the fact that it was much easier to draw 2D stuff with HTML5 canvas than with Unity that I integrated a web browser with Unity and used it to do 2D data-driven drawing (graphing, data visualization, text, and UI widgets) with JavaScript / canvas / d3 / etc., and then blitted the textures into Unity to display in UI overlays and 3D textures.

One advantage to that approach is that you can use any of the zillions of excellent well-supported JavaScript libraries like d3 to do the 2d drawing.

And I wanted to write as much of my Unity application in JavaScript as possible anyway, so it was easier to debug and (orders of magnitude) quicker to iterate changes (and possible to live-code).

UnityJS description and discussion:

https://news.ycombinator.com/item?id=22689040

Shapes looks really wonderful, and would have been perfect for some of my Unity projects, so I'll check it out and consider using it in the future!


>I also disagree that WebGPU is a worse spec than WebGL 1 and 2. I basically implement a WebGPU-like wrapper for WebGL today in my own WebGL projects, and it's made my life much easier, not harder. Though, full disclaimer, I am a contributor to the WebGPU spec.

He hopes WebGPU will make things easier. He didn't say WebGPU is worse than WebGL.


>> WebGPU... hopefully... will make 3d easier.

That is explicitly not a goal of the WebGPU working group. They have acknowledged that WebGPU will make 3D graphics programming in the browser significantly harder. But it will make supporting the 3D graphics API in the browser easier for the browser developers.

So basically, because Apple took over a decade to finally unsqueeze their tight fingers enough to dedicate any developers to WebGL2 in Safari, instead of improving the site developer experience, the W3C has opted to take huge swaths of responsibility away from browser developers and foist it on site devs.

The "hope" is that middle-tier libraries like Three.js and Babylon.js will hide the complexity away. I'm sure they will, and by all my experiences with them, will do a fine job of it. But it's certainly not the direction I hoped things would go.


WebGPU is more explicit, but I wouldn't say it's significantly harder. Sure, if you just want to get a triangle on the screen, it's going to take more lines of code. But everyone builds their own abstractions over the base API, and once you get to that point it's about the same ease-of-use.

And that knowledge from the WebGPU API can extend to the desktop with something like Rust's WGPU, where you can author a 3D app once and get a Vulkan, DirectX, Metal, _or_ OpenGL targeted version.


In my opinion, if we're going to get a stable and consistent API, I'm totally OK with it being significantly harder. I don't think we should need a middle-tier library for creating 3D graphics. Most of the time we needed three.js or twgl just for mapping the shader inputs (uniforms, etc.) from JavaScript. The rest of it is just pure matrix math, which we do in 2D graphics as well.

It's worth mentioning that they're also creating another shading language: https://www.w3.org/TR/WGSL/


The API may get harder, but that's not a huge deal. Not saying that's a good thing, but APIs are far from the hardest thing in 3D graphics.

In my experience using WebGL, the hardest thing is all the missing parts compared to modern APIs and the fact that, whenever you want to implement something, you can't use the newest techniques and have to go hunting around to find out how the game dev community used to do things 20 years ago. I would take a jump in API complexity in exchange for access to a modern graphics pipeline and I'm excited by the new opportunities WebGPU will afford my work.


I’ve never understood why the GPU isn’t leveraged for 2D/UI stuff. Isn’t it all fundamentally the same? Vectors and pixels?


It is. There's a terminology problem in play here. Throughout this article, 2D does not mean 2D; it means "arbitrary-complexity paths, e.g. Bezier curves". This is a small subset of the 2D that makes up a UI. It'd be like saying 3D exclusively means infinitely detailed tessellated shapes with path-traced rendering. That's definitely an area of 3D that exists, but it's of course not at all the entirety of 3D in practice in, e.g., games or movies. Rather, it's more like the holy grail.

Same thing here with eg. SVGs. GPU accelerating SVGs is stupidly complicated because it's an inherently serial algorithm, and GPUs are poop at that. But how much of your 2D UI is made up of that? Text is in that same category but how much else? Typically very little. Maybe a few icons, but that tends to be about it. Instead you have higher level shapes, like round rectangles. And those you can do with a GPU quite easily. Similarly images are usually just a textured quad. Again trivial for a GPU. You could describe them as paths if you had a fully generic, fully accelerated path rendering system. But nobody has that, so nobody actually describes them like that.

So very nearly all 2D/UI systems are GPU accelerated. It'd be perhaps more accurate to call them hybrid renderers. Things like fonts are just CPU rendered because CPUs are better at it, but the GPU is doing all the fills, gradients, "simple" shapes, texturing, etc...


I respectfully disagree that CPUs are better than GPUs at font rendering :)

There are a few related things that can be said. Doing fast, high quality font rendering on a GPU is hard; it's much easier on a CPU. Further, the traditional rasterization pipeline of a GPU is not good at rendering fonts. Fortunately, modern GPUs also have compute shaders, which are programmed somewhat like regular computers but just with an astonishingly high number of threads.

This is the topic of my research, and I intend to publish quantitative measurements backing up these assertions before long. Early results look promising.


> which are programmed somewhat like regular computers but just with an astonishingly high number of threads.

But they aren't that; they are actually wide vector processors, which means groups of threads need to be doing the same thing for it to perform properly! Branches and divergent control flow kill GPU performance.

I'm sure you already know this, but I'm just pointing out for other folks reading. If GPUs were just CPUs with stupidly high core counts then things would be way easier, but it's more complicated than that.


But any Turing-complete operation can be mapped mechanistically into a branchless ISA, can’t it? One of those “one-instruction” ISAs, for example, where every instruction is also a jump. Vector processors would compute on those just fine, just like they compute matrix-multiplication problem isomorphisms just fine.

Or, for a more obvious/less arcane restatement: can't the shader cores just be given a shader that's an interpreter, and a texture that's a spritesheet of bytecode programs?


Yes, we can make GPU programs that render vector images this way, but they tend to be slower than an equivalent CPU program. Branches are not the problem, GPUs handle those just fine now actually. The problem is duplicated work. GPUs have cores that are individually much, much slower than a CPU, but make up for this by having lots and lots of them running in parallel. Having those cores all run the same serial interpreter does not give you increased parallelism, so the result is slower.

Designing algorithms for the GPU requires rethinking your dataflow and structure to exploit the parallel nature of the GPU. GPUs are not just a "go fast" button.


> Branches are not the problem, GPUs handle those just fine now actually

Worth noting that's only kinda true. If all threads take the same branch in a thread group, then it's mostly fine. But divergent branches are basically equivalent to all cores taking both branches and just masking off all the writes with whether or not the conditional was true. This can be incredibly slow depending on the complexity of the code being branched.

Also not all GPUs can even optimize branches effectively, some of them just always take both branches & mask off the results.
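
To make the masking behavior concrete, here is a rough C sketch of what a SIMD group effectively does for a divergent branch (illustrative only; lane counts and predication details vary by GPU, and the names are mine):

  #include <stdbool.h>
  #include <stdio.h>

  #define LANES 8  /* one SIMD group; real GPUs use e.g. 32 or 64 lanes */

  /* What "y = (x > 0) ? a(x) : b(x);" effectively costs when lanes disagree:
     both sides execute on every lane, and the writes are masked. */
  void divergent_branch(const float x[LANES], float y[LANES]) {
      bool mask[LANES];
      for (int i = 0; i < LANES; i++) mask[i] = x[i] > 0.0f;

      /* "then" side: runs for all lanes, written only where the mask is set */
      for (int i = 0; i < LANES; i++) {
          float then_result = x[i] * 2.0f;   /* stand-in for a(x) */
          if (mask[i]) y[i] = then_result;
      }
      /* "else" side: runs again for all lanes, written where the mask is clear */
      for (int i = 0; i < LANES; i++) {
          float else_result = -x[i];         /* stand-in for b(x) */
          if (!mask[i]) y[i] = else_result;
      }
      /* Total cost is roughly cost(a) + cost(b) whenever the lanes diverge. */
  }

  int main(void) {
      float x[LANES] = {1, -2, 3, -4, 5, -6, 7, -8}, y[LANES];
      divergent_branch(x, y);
      for (int i = 0; i < LANES; i++) printf("%g ", y[i]);
      printf("\n");
      return 0;
  }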


Well, sure; but the problem of font rendering specifically is an "embarrassingly parallel" one, isn't it? If you've got 1000 glyphs at a specific visual size to pre-cache into alpha-mask textures; and you've got 1000 GPU shader cores to compute those glyphs on; then each shader core only needs to compute one glyph once.

Can a CPU really be so much faster than these cores that it can run this Turing-complete font rendering program (which, to be clear, is already an abstract machine run through an interpreter either way, whether implemented on the CPU or the GPU) consisting of O(N) interpreted instructions, O(N) times, for a total of O(N^2) serial CPU computation steps; in less than the time it takes the O(N) GPU cores to run only O(N) serial computation steps each? Especially on a modern low-power system (e.g. a cheap phone), where you might only have 2-4 slow CPU cores, but still have a bounty of (equally slow) GPU cores sitting there doing mostly nothing? If so, CPUs are pretty amazing.

But even if it were true that it'd be faster in some sense (time to first pixel, where the first rendered glyph becomes available?) to render on the CPU — accelerators don't just exist to make things faster, they also exist to offload problems so the CPU can focus on things that are its comparative advantage.

Analogies:

- An apprentice tradesperson doesn't have to be better at a delegated task than their mentor is; they only need to be good enough at the task to free up some time for the mentor to focus on getting something higher-priority done, that the mentor can do and the apprentice (currently) cannot. For example, the apprentices working for master oil painters did the backgrounds, so the master could focus on portrait details + anatomy. The master could have done the backgrounds faster! But then that time would be time not spent working on the foreground.

- Ethernet cards. CPUs are fast enough to "bit bang" even 10GbE down a wire just fine; but except under very specific situations (i.e. dedicated network switches where the CPU wants to process every packet synchronously as it comes in), it's better that they don't, leaving the (slower!) Ethernet MCU to parse Ethernet frames, discard L2-misdirected ones, and DMA the rest into kernel ring-buffer memory.

- Audio processors in old game consoles like the SNES's S-SMP and the C64's SID — yes, the CPU could do everything these could do, and faster; but if the CPU had to keep music samples playing in realtime, it wouldn't have much time to do things like gameplay (which usually goes together with playing music samples!)

Offloading font (or generalized implicit-shape) rendering to the GPU might not make sense if you're just computing letterforms for billboard textures in a static 3D scene (rather the opposite!) but in a game that wants to do things like physics and AI on the CPU, load times can likely be shorter with the GPU tasked with the font rendering, no? Especially since the rendered glyph-textures then don't have to be loaded into VRAM, because they're already there.


Having a queue of 1,000 independent work items to do doesn't mean something is "embarrassingly parallel". Operating systems are a classic example of something that's hard to parallelize, and they have 1,000 independent processes they need to schedule and manage. Heterogeneous tasks make parallelism hard!

Cores in GPUs do not operate independently, they have hierarchies of memory and command structure. They are good at sharing some parts and terrible at sharing other parts.

Exploiting the parallelism of a GPU in the context of curve rasterization is still an active research problem (Raph Levien, who has posted elsewhere in this thread, is one of the people doing the research), and it's not easy.

I refrained from commenting on the specifics of how curves are rasterized, but if you want to imagine it, think about a letter, maybe a large "g", think about the points that make up its outline, and then come up with an algorithm to find out whether a specific point is inside or outside that outline. What you'll quickly realize is that there's no local solution, only global ones. You have to test for intersection against all of the curves to know whether a given pixel is inside or outside the outline, and that sort of problem is serial.
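
To make that concrete, here is a minimal C sketch of the global test for one sample point, with the curves already flattened to line segments (the names and the segment representation are just for illustration):

  #include <stdbool.h>

  typedef struct { float x0, y0, x1, y1; } Segment;  /* outline pre-flattened to lines */

  /* Non-zero winding test for one pixel: every segment of the outline has to be
     consulted -- there is no purely local answer. */
  bool point_inside(float px, float py, const Segment *segs, int n) {
      int winding = 0;
      for (int i = 0; i < n; i++) {
          const Segment *s = &segs[i];
          bool cross_a = s->y0 <= py && s->y1 > py;   /* crosses the scanline one way */
          bool cross_b = s->y1 <= py && s->y0 > py;   /* ...or the other */
          if (cross_a || cross_b) {
              float t = (py - s->y0) / (s->y1 - s->y0);
              float ix = s->x0 + t * (s->x1 - s->x0); /* intersection with the scanline */
              if (ix > px) winding += cross_a ? 1 : -1;
          }
      }
      return winding != 0;
  }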

The work division you want (do a bit of work for each curve), is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel), pushing you towards things like compute shaders.

I could go on, but this comment thread is already too deep.


That's super interesting, actually!

> The work division you want (do a bit of work for each curve), is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel)

Doesn't this mean that you could:

1. entirely "offline", at typeface creation time:

1a. break glyphs into their component "convex curved region tiles" (where each region is either full, empty, or defined by a curve with zero inflection points)

1b. deduplicate those tiles (anneal glyph boundaries to minimize distinct tiles; take advantage of symmetries), to form a minimal set of such curve-tiles, and assign those sequence numbers, forming a "distinct curves table" for the typeface;

1c. restate each glyph as a grid of paint-by-numbers references (a "name table", to borrow the term from tile-based consoles) where each grid position references its tile + any applied rotation+reflection+inversion

2. Then, at scene-load time,

2a. take each distinct curve from the typeface's distinct-curves table, at the chosen size;

2b. generate a (rather large, but helpfully at most 8bpp) texture as so: for all distinct-curve tiles (U pos), for all potential angled-vector-line intersections (V pos), copy the distinct-curve tile, and serialize the intersection data into pixels beside it

2c. run a compute shader to operate concurrently over the workload tiles in this texture to generate an output texture of the same dimensions, that encodes, for each workload, the alpha-mask for the painted curve for the specified angle, iff the intersection test was good (otherwise generating a blank alpha-mask output);

2d. (this is the part I don't know whether GPUs can do) parallel-reduce the UxV tilemap into a Ux1 tilemap, by taking each horizontal strip, and running a pixel-shader that ORs the tiles together (where, if step 2c is done correctly, at most one tile should be non-zero per strip!)

2e. treat this Ux1 output texture as a texture atlas, and each typeface nametable as a UV map for said texture atlas, and render the glyphs.

To be clear, I'm not expecting that I came up with an off-the-cuff solution to an active "independent research problem" here; I'm just curious why it doesn't work :)


If you allow yourself to do this work offline, that's one thing, but keep in mind that 2D realtime graphics are a requirement. People still need to render SVGs, HTML5 canvas, the CSS drawing model, etc. Grid fitting might eventually go out of favor for fonts, but that's something that means you need different outlines for different sizes of fonts. See Behdad's excellent document on the difficulties of text subpixel rendering and layout [0]. Also, there's things like variable fonts which we might want to support.

The work to break a shape into region tiles such that each tile has at most one region might be too fine-grained (think about tiger.svg), and is probably equivalent in work to rasterizing on the CPU, so not much of a gain there. That said, tiled approaches are very popular, so you're definitely on to something, though tiles often contain multiple elements.

Going down this way lies ideas like Pathfinder 3, Massively Parallel Vector Graphics (Ganacim et al.), and my personal favorite, the work of adamjsimmons. I have to read this comment [1] a bit between the lines, but I think it's basically that a quadtree or other form of BVH is computed on the CPU containing which curves are in which parts of the glyph, and then the pixel shader only evaluates the curves it knows are necessary for that pixel. Similar in a lot of ways to Behdad's GLyphy.

I have my own ideas I eventually want to try on top of this as well, but I think using a BVH is my preferred way to solve this problem.

[0] https://docs.google.com/document/d/1wpzgGMqXgit6FBVaO76epnnF... [1] https://news.ycombinator.com/item?id=18260138

EDIT: You changed this comment between when I was writing and when I posted it, so it's not a reply to the new scheme. The new scheme doesn't seem particularly helpful for me. If you want to talk about this further to learn why, contact information is in my HN profile.


> If you've got 1000 glyphs at a specific visual size to pre-cache into alpha-mask textures;

How often does that happen? There are definitely languages where that is a plausible scenario (eg, Chinese), but for the majority of written languages you have well under 100 glyphs of commonality for any given font style.

And then, as you noted, you cache these to an alpha texture. So you'd need all of those 1000 glyphs to show up in the same frame for this to even come up.

> Especially on a modern low-power system (e.g. a cheap phone), where you might only have 2-4 slow CPU cores, but still have a bounty of (equally slow) GPU cores sitting there doing mostly nothing?

But the GPU isn't doing nothing. It's already doing all the things it's actually good at like texturing from that alpha texture glyph cache to the hundreds of quads across the screen, filling solid colors, and blitting images.

Rather, typically it's the CPU that is consistently under-utilized. Low end phones still tend to have 6 cores (even up to 10 cores), and apps are still generally bad at utilizing them. You could throw an entire CPU core at doing nothing but font rendering and you probably wouldn't even miss it.

The places where GPU rendering of fonts becomes interesting is when glyphs get huge, or for things like smoothly animating across font sizes (especially with things like variable width fonts). High end hero features, basically. For the simple task of text as used on eg. this site? Simple CPU rendered glyphs to an alpha texture is easily implemented and plenty fast.


You probably know about Slug, but just in case: https://sluglibrary.com

I don't know much about the font space, but enough to know it's a really hard problem, and the slug team seems to do a really good job.


Absolutely, and I don't want to claim I'm the first or only one doing font rendering on GPU. There's Slug as you pointed out, Pathfinder and Spinel as Jasper cited, and also interesting experimental work including GLyphy by Behdad and algorithms by Evan Wallace and Will Dobbie, plus a whole series of academic papers including "Massively Parallel Vector Graphics," "Random Access Vector Graphics," and others.

However, I would say that a common thread is that doing this well is hard. There's no straightforward cookbook scheme that people can just implement, and there are always tradeoffs. Slug is used in a number of games (and congrats to Eric for winning those licenses), but not as far as I know in any UI toolkits, and there are reasons for that.


> Slug is used in a number of games (and congrats to Eric for winning those licenses), but not as far as I know in any UI toolkits, and there are reasons for that

Presumably because its antialiasing is crap? But there's nothing inherent to fragment-oriented approaches that prevents you from doing good aa, and they slot nicely into the existing rasterization pipeline (which is why slug has fewer feature level requirements than pathfinder). They also permit arbitrary domain transformations (some caveats here as you have to calculate a bounding box still), and given appropriate space partitioning should not be significantly slower than scanline algorithms.

Also: UI toolkits are not known for being on the leading edge of graphics research. I think fastuidraw demonstrates this rather well. Insofar as there is exciting work happening in industry, it is mainly happening in web browsers; and I would expect Mozilla and Google to devote their efforts to Pathfinder and Skia, respectively.


> Presumably because its antialiasing is crap?

No, Slug's technique can handle AA and do it well. The problem with Slug for general-purpose UI frameworks is that it needs to do a lot of pre-processing on its data to do the good job it does.


Slug aa is only 1-dimensional. From the paper (end of section 2):

> Adding and subtracting these fractions from the winding number has the effect of antialiasing in the direction of the rays. Averaging the final coverages calculated for multiple ray directions antialiases with greater isotropy, but at a performance cost. Considering only rays parallel to the coordinates axes is a good compromise, especially when combined with supersampling, as discussed later.

I.E. you don't get a real 2-d coverage result, only an amalgamation of a number of 1-d coverage results; and you must trade off performance and quality. Other approaches do not require such a tradeoff.

Analytic 2-d coverage can be done more cheaply than n 1-d samples (n is probably in the neighborhood of 4-6), and produces better (mathematically ideal, albeit with uncomfortable caveats) results. (Note 4-6 samples don't mean 4-6x slower, due to space partitioning, buffers, and other fixed costs, as well as locality. And I think slug takes 2 samples by default as is.)


Oh I wasn’t pointing it out as a critical response to it being your thesis. I’m actually very interested to see how it turns out, because I’m digging into this space at the moment.

I’m trying to build a platform-agnostic styling language specifically for UI/UX designers, and it’s leading me down the path of “render everything via WebGPU”.

Is there a way I can follow your progress? Very keen on hearing more about your research if/when it’s ready.


I don't know if I'd go that far -- icons and text are vector paths. Strokes and drop shadows (aka blurs) are all things that GPUs aren't particularly great at. Simple shapes like rounded rectangles, GPUs can be OK at, but you'd have overdraw problems if done naively.

I've worked on 2D rendering engines, so I've seen the content thrown at it in the wild. Very rarely do you have a simple case. GitHub's buttons are maybe the simplest example I can think of, and they have strokes (GPUs: ugh) on a filled rounded border (GPUs: ugh), with text inside (GPUs: ugh), sometimes with a text shadow (GPUs: ugh).

It can be done, but you basically have to get away from triangles and move into research methods which are exceptionally more tricky, aka the stuff in Pathfinder and piet-gpu.


> Strokes and drop shadows (aka blurs) are all things that GPUs aren't particularly great at.

They can handle those just fine. Blurs are just inherently very expensive, but GPUs are no worse at them than CPUs. In fact GPUs are way faster at blurs than CPUs.

Same with filled shapes. It's not really a challenge. You have a fragment shader that knows how to essentially 'clip' to a round rect, which isn't hard, and then filling it any which way with anything is trivial.
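
For example, the round-rect "clip" is just a signed-distance test evaluated per pixel. Here's a minimal C sketch of what the fragment shader computes (function names are mine, not from any particular library; the distance can also feed a ~1px ramp for antialiased edges):

  #include <math.h>

  /* Signed distance from (px,py) to a rounded rectangle centered at (cx,cy)
     with half-extents (hw,hh) and corner radius r. Negative means inside. */
  float rounded_rect_sdf(float px, float py, float cx, float cy,
                         float hw, float hh, float r) {
      float qx = fabsf(px - cx) - (hw - r);
      float qy = fabsf(py - cy) - (hh - r);
      float ox = qx > 0.0f ? qx : 0.0f;
      float oy = qy > 0.0f ? qy : 0.0f;
      float outside = sqrtf(ox * ox + oy * oy);
      float inside  = fminf(fmaxf(qx, qy), 0.0f);
      return outside + inside - r;
  }

  /* Approximate pixel coverage: 1 well inside, 0 well outside, ~1px ramp between. */
  float coverage(float d) {
      float a = 0.5f - d;
      return a < 0.0f ? 0.0f : (a > 1.0f ? 1.0f : a);
  }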


Showing my ignorance here: why don't we have dedicated graphics hardware for 2D?

It feels like moving 2D jobs onto a fast 3D engine is the wrong approach at the top level.


"Dedicated graphics hardware for 2D" is just a blitter and sprite engine, and modern GPU's do that just fine. The mouse pointer in many recent systems is a 2D hardware sprite.


Programming graphics, especially 2D stuff, is far more ergonomic and convenient in your native language already executing on the CPU. If you can get away with it performance wise, there's really no incentive to incur the myriad obnoxious bullshit inherent in GPU programming.

But with the advent of high dpi displays it's become problematic to do even simple 2D/UI rendering on the CPU just because of the enormous quantity of pixels.

When you pull in GPU support, now you're stuck having to pick a backend (gl/vulkan/d3d/metal) or some compatibility layer to make some/all of them work. You have to write shaders, you have to constantly move state in/out of the GPU across this GPU:CPU API boundary. It's just a total clusterfuck best avoided if possible.


Naive question: Can't you leverage a game engine for this? Why do you have to do all the low level work from scratch?


I'm not familiar with modern game engines, but I'd be very surprised if any of them managed to eliminate the utterly unnatural reality of writing shaders vs. writing classical 2D rendering algorithms operating on a linear buffer of pixels in memory.

For concurrency reasons, shaders logically run on a single pixel. Gone are your longstanding algorithms for doing simple things like Bresenham line drawing, or something as simple as drawing a filled box like this:

  for (int y = box.y; y < box.y + box.h; y++)
    for (int x = box.x; x < box.x + box.w; x++)
      FB[y * FB_STRIDE + x] = box.color;
Nope, not happening in a shader. Every shader basically executes in isolation on a pixel and you have to operate from a sort of dead-reckoning perspective. No more sequential loops iterating over rows and columns, a style for which we have literally decades of graphics programming publications explaining how to do things. Not to mention how natural it is to think about things that way, since it closely resembles drawing on paper.

In shaders you often end up doing things that feel utterly absurd in terms of overhead because of this "you run on an arbitrary pixel" perspective. Oftentimes you're writing some kind of distance function, where previously you would have written a loop iterating across lines and rows advancing some state as you step through the pixels. In a shader it's like the paper is covered with thousands of pencils that don't move, and the shader program just determines what color the pencil should be based on its location.
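
To make the contrast concrete, here is the earlier box fill turned inside out, written as plain C rather than real shader code (the type and names are made up for illustration):

  #include <stdint.h>

  struct box { int x, y, w, h; uint32_t color; };

  /* Instead of walking the box and writing pixels, every pixel independently
     asks "am I inside the box?" -- the shape a fragment shader forces on you. */
  uint32_t shade_pixel(int x, int y, struct box b, uint32_t background) {
      int inside = x >= b.x && x < b.x + b.w &&
                   y >= b.y && y < b.y + b.h;
      return inside ? b.color : background;
  }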

GPU programming is plain annoying, even without the GPU API fragmentation clusterfuck. Especially if you've been writing 2D stuff on the CPU for decades.


What you've described is a blit operation (copying a block of pixels from a source, such as a texture, in your case a solid color). Probably you wouldn't write this out but would write:

     blit(box.x,box.y,box.w,box.h,RED)
In shader-land, this is equivalent to rendering a rectangle, with a texture or solid color as source. Sure it's more involved to implement this abstraction, since you need a mesh, need to write a small shader, be familiar with the render pipeline state etc, but it also gives you some stuff trivially, like anti-aliasing, and scaling support for the blit operation.

Many libraries already implement this stuff on top of WebGL, like pixi.js

While I don't doubt the creative possibilities of working with pixels directly, once you figure out how GPUs work, a lot of 2D stuff is actually pretty easy.


Is there an expression for determining whether a pixel is on a bezier curve?


Both of the sibling comments describe quadratic Bezier curves (used often in font rendering because TrueType only supports quadratics), while graphics APIs and CFF font outlines often mandate support for cubic Beziers. Cubics are a lot harder to build a closed-form solution for, and they also have things like self-intersection, which complicates matters further.

Most production renderers, sometimes even ones on the CPU, approximate cubic Bezier curves with a number of quadratic Bezier curves. This is a preprocessing step which needs to be done on the CPU. While it could be done on the GPU, doing it in the pixel shader would be really wasteful.
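
As a rough sketch of that preprocessing step (real converters split the cubic into several pieces and check an error tolerance; this just shows the standard single-quadratic "midpoint" degree reduction, with hypothetical type names):

  typedef struct { float x, y; } Pt;
  typedef struct { Pt p0, p1, p2, p3; } Cubic;   /* cubic Bezier control points */
  typedef struct { Pt p0, ctrl, p2; } Quad;      /* quadratic Bezier control points */

  /* Keep the endpoints and place the quadratic's control point at
     (3*p1 + 3*p2 - p0 - p3) / 4. Accurate only over a short cubic piece. */
  Quad cubic_to_quad(Cubic c) {
      Quad q;
      q.p0 = c.p0;
      q.p2 = c.p3;
      q.ctrl.x = (3.0f * (c.p1.x + c.p2.x) - c.p0.x - c.p3.x) * 0.25f;
      q.ctrl.y = (3.0f * (c.p1.y + c.p2.y) - c.p0.y - c.p3.y) * 0.25f;
      return q;
  }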


Pretty good overview of what you asked about - https://medium.com/@evanwallace/easy-scalable-text-rendering...

Other links: https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-... and https://jcgt.org/published/0006/02/02/ Also https://terathon.com/i3d2018_lengyel.pdf, which was developed into a product, IIRC.

Another GPU text renderer https://github.com/hypernewbie/VEFontCache

There's been other development and some posts on Hacker News about rendering fonts on the GPU


Search in [0] for "bezier", iquilez is a magician in this department.

[0] https://iquilezles.org/www/articles/distfunctions2d/distfunc...


You can, when using modern desktop OSes; that is why Direct2D/DirectWrite/CoreGraphics/... exist.


Browsers largely make use of the GPU for UI rendering. Direct2D, Cocoa, Qt and GTK(4) are all hardware accelerated as well. So I'm not really sure what you mean?


The linked article covers the major challenges that make it difficult to adapt the GPU to 2D vector graphics rendering.

tl;dr: 2D cares about shapes, curves, and properly antialiased coverage of super small shapes; the traditional GPU rasterization pipeline is very good at triangles and textures and has limited coverage options.

Direct2D, Qt and GTK+ still do a good portion of the graphics work on the CPU, and only use the GPU for composition. Some limited graphics can be done on the GPU, usually with quality tradeoffs. Font rasterization is still done on the CPU, and uploaded to the GPU as a texture.

Newer libraries like Pathfinder, Spinel, piet-gpu all work by not using the triangle rasterization parts but instead treating the GPU as a general-purpose parallel processor with compute shaders.


I think it is. I see all kinds of 2D projects claiming to be "GPU accelerated" - for example GTK, KDE, web browsers. I'm not sure how much of the actual processing is done on the GPU, but it's enough to call it "accelerated"!


The GPU is used to render UI stuff in any modern system.


What I'd really love to see is a UI system that gives developers access to the GPU somehow, but I'm not sure what that would look like.



Like Direct2D, QPainter (Qt) and cairo (GTK)?


That's exactly what eg the game Shovel Knight is doing.

See eg https://www.youtube.com/watch?v=vjENktnbCaE


GPUs want to draw triangles, and in fact only know how to draw triangles[0]. Pretty much all graphics API innovation has been around either feeding more triangles to the GPU faster, letting the GPU create more triangles after they've been sent, or finding cool new ways to draw things on the surface of those triangles.

2D/UI breaks down into drawing curves, either as filled shapes or strokes. The preferred representation of such is a Bezier spline, which is a series of degree-three[1] polynomials that GPUs have zero support for rasterizing. Furthermore, the stroke outlines of those splines are not polynomial curves, but an even more bizarre curve type called an algebraic curve. You cannot just offset the control points to derive a stroke curve; you either have to approximate the stroke itself with Beziers, or actually draw a line sequentially in a way that GPUs are really not capable of doing.

The four things you can do to render 2D/UI on a GPU is:

- Tessellate the Bezier spline with a series of triangles. Lyon does this. Bezier curves make this rather cheap to do, but this requires foreknowledge of what scale the Bezier will be rendered at, and you cannot adjust stroke sizes at all without retessellating. (A minimal flattening sketch follows this list.)

- Send the control points to the GPU and use hardware tessellation to do the above per-frame. No clue if anyone does this.

- Don't tessellate at all, but send the control points to the GPU as a polygonal mesh, and draw the actual Beziers in the fragment shaders for each polygon. For degree-two/quadratics there are a series of coordinate transforms that you can do which conveniently map all curves to one UV coordinate space; degree-three/cubics require a lot more attention in order to render correctly. If I remember correctly Mozilla Pathfinder does this[2].

- Send a signed distance field and have the GPU march it to render curves. I don't know much about this but I remember hearing about this a while back.
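
As promised above, a minimal sketch of the flattening step behind the tessellation option: evaluate the curve at a fixed number of steps and emit line segments, which is exactly why the result is baked to one scale (illustrative C; real tessellators like lyon subdivide adaptively and then triangulate the resulting polygon):

  typedef struct { float x, y; } Vec2;

  /* Evaluate a quadratic Bezier at parameter t in [0, 1]. */
  static Vec2 quad_bezier(Vec2 p0, Vec2 p1, Vec2 p2, float t) {
      float u = 1.0f - t;
      Vec2 r = {
          u * u * p0.x + 2.0f * u * t * p1.x + t * t * p2.x,
          u * u * p0.y + 2.0f * u * t * p1.y + t * t * p2.y,
      };
      return r;
  }

  /* Flatten the curve into `steps` line segments (steps + 1 points). The choice
     of `steps` bakes in a target scale: zoom in later and the flat edges show,
     which is the re-tessellation problem described below. */
  void flatten_quad(Vec2 p0, Vec2 p1, Vec2 p2, int steps, Vec2 *out /* steps+1 */) {
      for (int i = 0; i <= steps; i++)
          out[i] = quad_bezier(p0, p1, p2, (float)i / (float)steps);
  }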

All of these approaches have downsides. Tessellation is the approach I'm most familiar with because it's used heavily in Ruffle, so I'll just explain its downsides to give you a good idea of why this is a huge problem:

- We can't support some of Flash's weirder rendering hacks, like hairline strokes. Once we have a tessellated stroke, it will always be that width regardless of how we scale the shape. But hairlines require that the stroke get proportionally bigger as the shape gets smaller. In Flash, they were rendering on CPU, so it was just a matter of saying "strokes are always at least 1px".

- We have to sort of guess what scale we want to render at and hope we have enough detail that the curves look like curves. There's one particular Flash optimization trick that consistently breaks our detail estimation and causes us to generate really lo-fi polygons.

- Tessellation requires the curve shape to actually make sense as a sealed hull. We've exposed numerous underlying bugs in lyon purely by throwing really complicated or badly-specified Flash art at it.

- All of this is expensive, especially for complicated shapes. For example, pretty much any Homestuck SWF will lock up your browser for multiple minutes as lyon tries to make sense of all of Hussie's art. This also precludes varying strokes by retessellating per-frame, which would otherwise fix the hairline stroke problem I mentioned above.

[0] AFAIK quads are emulated on most modern GPUs, but they are just as useless for 2D/UI as triangles are.

[1] Some earlier 2D systems used degree-two Bezier splines, including most of Adobe Flash.

[2] We have half a PR to use this in Ruffle, but it was abandoned a while back.


> For degree-two/quadratics there are a series of coordinate transforms that you can do which conveniently map all curves to one UV coordinate space; degree-three/cubics require a lot more attention in order to render correctly. If I remember correctly Mozilla Pathfinder does this[2].

That's interesting! Do you have any references (or links to sample fragment shader code) for that quadratic case coordinate transformation?


NVIDIA put out a book with Bezier rendering algorithms in it: https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-... which is itself derived from a paper whose name I don't remember.

Basically, all quadratic Bezier curves are just linear transformations[0] of the curve u^2 - v. The fragment shader just evaluates that one equation to draw the curve, and texture mapping does all the rest. As long as you're careful to ensure that your fill surface polygon actually makes sense, you get back out perfectly-rendered Beziers at any zoom factor or angle.

[0] Scale/shear/rotate - all the things you can do by matrix multiplication against a vector. Notably, not including translations; though GPUs just so happen to use a coordinate system that allows linear translations if you follow some conventions.
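
Concretely, the per-pixel test is tiny. Written as a plain C function rather than a real fragment shader (the vertex stage assigns the curve's three control points the canonical coordinates (0,0), (1/2,0), (1,1), and the hardware interpolates u and v across the triangle):

  #include <stdbool.h>

  /* Sign test for the canonical quadratic u^2 - v = 0. Which sign counts as
     "inside" depends on the curve's orientation; real shaders also smooth the
     edge over about a pixel using the screen-space derivatives of u and v. */
  bool inside_quadratic(float u, float v) {
      return u * u - v < 0.0f;
  }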


Thank you - I appreciate the link.

> from a paper whose name I don't remember.

I guess it's the Loop-Blinn paper from my reply to the other reply.

Resolution Independent Curve Rendering using Programmable Graphics Hardware https://www.microsoft.com/en-us/research/wp-content/uploads/...


This is basically "all parabolas are linear transformations of one parabola", and the details can be found in Loop-Blinn.


Thank you - sounds pretty straightforward.

I found a reference on HN for anyone else following along:

> In 2005 Loop & Blinn [0] found a method to decide if a sample / pixel is inside or outside a bezier curve (independently of other samples, thus possible in a fragment shader) using only a few multiplications and one subtraction per sample.

    - Integral quadratic curve: One multiplication
    - Rational quadratic curve: Two multiplications
    - Integral cubic curve: Three multiplications
    - Rational cubic curve: Four multiplications
https://www.microsoft.com/en-us/research/wp-content/uploads/...

https://news.ycombinator.com/item?id=26463464


I disagree with the fundamental assertion that 2d is harder than 3d. I think a more accurate title would be "Why are 2D vector graphics so much harder than 3D when using a 3D-oriented raster graphics pipeline?"

If we remove the existing constraints and say you have to build these things in pure software, I think the equation would look a little different. I don't know of many developers who can accurately describe what the GPU does these days. Triangle rasterization is not an easy problem if you have to solve it yourself.


It's that you're usually rasterizing much more complicated shapes than triangles in 2D, like polygons and curves and fonts with hinting (which is often actually implemented as Turing-complete byte code, not stuff that's easy to run in parallel entirely in the GPU).

https://en.wikipedia.org/wiki/PostScript_fonts#Type_2

https://en.wikipedia.org/wiki/Font_hinting

https://www.typotheque.com/articles/hinting


Writing a triangle rasterizer is not that hard. What APIs like OpenGL give you for free (other than a performance boost) is walking all the pixels that are covered by each triangle, and computing the barycentric coordinates for each pixel (and then using these to lerp the vertex data). So that's what you have to replace with a CPU program.
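
For reference, a minimal bounding-box version of exactly that in C (no clipping, no fill-rule tie-breaking, no perspective; purely illustrative):

  #include <math.h>

  typedef struct { float x, y; } Vec2;

  /* Twice the signed area of triangle (a, b, p); the sign says which side of
     edge a->b the point p is on. */
  static float edge(Vec2 a, Vec2 b, Vec2 p) {
      return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
  }

  /* Walk the bounding box of triangle (a, b, c); for each covered pixel compute
     the barycentric weights (w0, w1, w2) for vertices a, b, c -- the values you
     would use to lerp vertex attributes before calling a "pixel shader".
     Assumes consistent (here: positive-area) winding. */
  void raster_triangle(Vec2 a, Vec2 b, Vec2 c, int width, int height,
                       void (*shade)(int x, int y, float w0, float w1, float w2)) {
      float area = edge(a, b, c);
      if (area <= 0.0f) return;                 /* degenerate or back-facing */

      int x0 = (int)fminf(fminf(a.x, b.x), c.x), x1 = (int)fmaxf(fmaxf(a.x, b.x), c.x);
      int y0 = (int)fminf(fminf(a.y, b.y), c.y), y1 = (int)fmaxf(fmaxf(a.y, b.y), c.y);
      if (x0 < 0) x0 = 0;
      if (y0 < 0) y0 = 0;
      if (x1 >= width)  x1 = width - 1;
      if (y1 >= height) y1 = height - 1;

      for (int y = y0; y <= y1; y++) {
          for (int x = x0; x <= x1; x++) {
              Vec2 p = { x + 0.5f, y + 0.5f };  /* sample at the pixel center */
              float w0 = edge(b, c, p), w1 = edge(c, a, p), w2 = edge(a, b, p);
              if (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                  shade(x, y, w0 / area, w1 / area, w2 / area);
          }
      }
  }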

I find the much harder part is how to set up the architecture in such a way that the data flows through your shader pipelines without an unbearable amount of boilerplate. 3D APIs don't help with that - if anything they make it harder.


> Writing a triangle rasterizer is not that hard

Certainly. One can write a trivial version in maybe 30 lines of code. Writing a triangle rasterizer that you would want to use in a product that is consumed by another human is hard.

Also, it is my experience that none of these things can truly be built in isolation. Depth buffers and acceleration structures crosscut all aspects of a rendering engine.

I do agree regarding the 3d APIs though. Writing it yourself in software mode can be easier than learning someone else's mousetrap. This is the path I prefer, even if it is slower at first.


Is it even that? People tolerate a lot more artifacts in 3D than 2D. If you wrote a 2D graphics engine that used triangles as primitives people probably wouldn't like it (and it would probably render text very slowly.)


Easy, 2D is harder because it can have hard edges. Look at any 3D game, the edges of the polygons are obscured by antialiasing methods and texture wrapping. In 2D, you need to draw a line at some arbitrary angle (or worse, a spline!) where on one side there's black and on the other side white. You will have to fake this with subpixel rendering. You also need to detect when the subpixels would interfere with the actual line width. You need to be able to snap two objects together that fit together seamlessly in the mathematical sense, without a sliver in-between popping in and out of existence.

Anyone who has tried to fake 2D using 3D rendering (such as inside a game engine) has likely run into the above issues.



I would highly recommend learning more about WebGL before diving into WebGPU. https://webglfundamentals.org/ is a good place to start.


From what I've read, they're pretty different abstractions, so I'm concerned that learning WebGL2 will give me a mental model that will be largely useless and just add to the confusion when switching to WebGPU. Even the shader language is different (WGSL vs GLSL) [1]. How much knowledge will be transferable?

[1] https://dmnsgn.me/blog/from-glsl-to-wgsl-the-future-of-shade...


Think of it this way: algorithms and data structures are, at their basis, language-agnostic.

The same applies to 3D programming: while the way to implement something like deferred rendering (as an example) differs across APIs, it is still deferred rendering.



> bsenftner on May 10, 2019:

> Revision of history, wrong, and insultingly so. This post is a rewrite of serious graphics history. Read Foley and van Dam, forget this tripe.


It's a strange take to equate 2D with vector graphics and 3D with rasterization, and then write a blog post that's really about vector vs. raster under the auspices of 2D vs. 3D.


Both the 2D and 3D here are vectors -- both the 2D lines and the 3D triangles are represented as a sequence of mathematical points which have to be "rasterized". But, put simply, triangle rasterization is just a much easier problem than curve rasterization.


Paul Haeberli, when he was a computer graphics researcher at SGI, wrote a paper in 1993 called "Texture Mapping as a Fundamental Drawing Primitive", which was about how to use texture mapping for drawing anti-aliased lines, air-brushes, anti-aliased text, volume rendering, environment mapping, color interpolation, contouring, and many other applications.

http://en.wikipedia.org/wiki/Paul_Haeberli

http://www.graficaobscura.com/texmap/index.html


The content of the article contradicts its title.

The tl;dr is that 2D vector graphics uses implicit geometry, while 3D is explicitly defined using vertices and triangles; implicit means "more maths".

And as the article says, the reason we don't use implicit geometry in 3D is that it is simply too hard except for a few specific cases, at least for now.

The answer to "why is 2D harder?" is "because we are not talking about the same thing". From easiest to hardest we have: 2D triangle-based, 3D triangle-based, 2D implicit, 3D implicit.
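
To make "implicit vs explicit" concrete, here is a circle both ways (illustrative C):

  #include <math.h>
  #include <stdbool.h>

  /* Implicit: the shape is an equation. Exact at any resolution, but rendering
     it means evaluating (or cleverly bounding) the equation per pixel. */
  bool inside_circle(float px, float py, float cx, float cy, float r) {
      float dx = px - cx, dy = py - cy;
      return dx * dx + dy * dy <= r * r;
  }

  /* Explicit: the shape is a vertex list forming triangles (a fan here). The GPU
     rasterizes these directly, but the circle is now only an approximation. */
  int circle_fan(float cx, float cy, float r, int segments,
                 float *out_xy /* room for 2 * (segments + 2) floats */) {
      const float tau = 6.28318530718f;
      out_xy[0] = cx; out_xy[1] = cy;            /* fan center */
      for (int i = 0; i <= segments; i++) {
          float a = tau * (float)i / (float)segments;
          out_xy[2 + 2 * i] = cx + r * cosf(a);
          out_xy[3 + 2 * i] = cy + r * sinf(a);
      }
      return segments + 2;                        /* vertex count written */
  }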


> "GPUs, through a consequence of history, chose not to focus on real-time implicit geometry like curves, but instead on everything that goes inside them"

Just for information: 10 years ago, NVIDIA released this paper and extension, "GPU Accelerated Path Rendering": https://developer.nvidia.com/gpu-accelerated-path-rendering , so it seems the GPU can still help.


I've written about the history of PostScript, Interpress and NeWS before:

https://news.ycombinator.com/item?id=21968175

>Kragen is right that PostScript is a lot more like Lisp or Smalltalk than Forth, especially when you use Owen Densmore's object oriented PostScript programming system (which NeWS was based on). PostScript is semantically very different and much higher level that Forth, and syntactically similar to Forth but uses totally different names (exch instead of swap, pop instead of drop, etc). [...]

https://news.ycombinator.com/item?id=22456710

>Owen Densmore recounted John Warnock's idea that PostScript was actually a "linguistic motherboard". [...]

>Brian Reid (whose brother is Glenn Reid, author of several books on PostScript from Adobe) wrote up an excellent historical summary in 1985 on the laser-lovers mailing list of the influences and evolution of PostScript.

https://en.wikipedia.org/wiki/Brian_Reid_(computer_scientist...

http://glennreid.blogspot.com/

Here's a post I wrote earlier:

https://news.ycombinator.com/item?id=19874245

>DonHopkins 8 months ago:

>Brian Reid wrote about page independence, comparing Interpress' and PostScript's different approaches. Adobe's later voluntary Document Structuring Conventions actually used PostScript comments to make declarations and delimit different parts of the file -- it wasn't actually a part of the PostScript language, while Interpress defined pages as independent so they couldn't possibly affect each other:

https://groups.google.com/forum/#!topic/fa.laser-lovers/H3us...

>By now you can probably see the fundamental philosophical difference between PostScript and Interpress. Interpress takes the stance that the language system must guarantee certain useful properties, while PostScript takes the stance that the language system must provide the user with the means to achieve those properties if he wants them. With very few exceptions, both languages provide the same facilities, but in Interpress the protection mechanisms are mandatory and in PostScript they are optional. Debates over the relative merits of mandatory and optional protection systems have raged for years not only in the programming language community but also among owners of motorcycle helmets. While the Interpress language mandates a particular organization, the PostScript language provides the tools (structuring conventions and SAVE/RESTORE) to duplicate that organization exactly, with all of the attendant benefits. However, the PostScript user need not employ those tools.



