Hey, computer, make me a font (serce.me)
331 points by pavanyara on Oct 3, 2023 | 148 comments



I found a few months ago that the GPT-4 code interpreter is capable of converting a black-and-white PNG of a glyph to an SVG

https://twitter.com/lfegray/status/1678787763905126400

It would be cool to combine a script like the one gpt-4 gave me with an image generation model to generate fonts. The approach from this blog post is way more interesting though.

On a separate note it reminds me of this suckerpinch video :) maybe we can finally get uppestcase and lowestcase fonts

https://www.youtube.com/watch?v=HLRdruqQfRk


That's amazing. One of my favorite things to do with Copilot is to comment something like "//white arrow pointing right", then start "<svg" and have it complete it. If it doesn't get it right the first time, I update my comment. Saves me time searching for the right SVG and digging through "free" but really paid image sites.


> Saves me time searching for the right SVG and digging through "free" but really paid image sites.

FWIW, Google's Material Design Icons and The Noun Project are decent sources of high quality, actually-free SVG icons:

* https://fonts.google.com/icons (Apache license)

* https://thenounproject.com/ (CC-BY)


And it saves you having to credit anyone, win-win!


Awful lot of sites have icons on them. I can't recall ever seeing icon credit. Copilot is like a year old.


That's not a good argument, I'm sorry. A lot of people spent time and effort designing and creating something; credit, or at least a reference, is the least one could do.

If it's not done, even in a comment inside the HTML... well, it would be nice if it were :)


Not a good argument for what?


For dismissing crediting someone for their work based on an appeal to the majority


I never said we should not credit people. In fact nowhere in the thread have I expressed a single should or ought style opinion on anything.

Rather I'm pointing out that this is some weird "back in my day things used to be X" where we're still in that day and I can clearly see with my own two eyes that things aren't X. Whatever other ills can be laid at LLM code generation's feet, icons not getting credited isn't one of them.


Fair enough, I may have seen implications that weren't there. So, I'm sorry.


All good.


Why would you have to credit anyone? Do artists credit the brushes that they use in photoshop?


This is such a good idea. Not sure why svg code escaped my mind as something copilot would be good at.


In general, copilots are a massive boon to "boilerplatey", simple syntax languages from XML/HTML to Go.


The author says he achieved text-to-SVG generation but doesn't point to a code repository for it... It would be super interesting (or does gpt-4 do it natively?)

That said, I'm not sure that you need GPT-4 for outlining a BW image and making a path out of it; Corel Draw did that well over 25 years ago?

So yes, another approach to what the author is doing would be to generate font bitmaps using any of the leading image generators, and then vectorize the bitmaps. Less straightforward and precise, but probably simpler.
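For the vectorisation step specifically, a rough sketch of that pipeline (assuming the potrace CLI and Pillow are installed; file names are illustrative):

    import subprocess
    from PIL import Image

    def glyph_png_to_svg(png_path: str, svg_path: str, threshold: int = 128) -> None:
        """Threshold a rendered glyph to 1-bit and trace it into an SVG outline."""
        img = Image.open(png_path).convert("L")                        # greyscale
        bw = img.point(lambda p: 255 if p > threshold else 0).convert("1")
        pbm_path = png_path.rsplit(".", 1)[0] + ".pbm"
        bw.save(pbm_path)                                              # potrace wants a bitmap format
        subprocess.run(["potrace", pbm_path, "--svg", "-o", svg_path], check=True)

    glyph_png_to_svg("glyph_A.png", "glyph_A.svg")

The image model then only has to produce clean black-and-white glyph bitmaps; the tracer handles the curves.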


Hi, I am the author. For text-to-SVG, check out IconShop [1]. It was the paper that I tried to reproduce results from initially. In the paper, there is a comparison of their approach against using GPT-4 [2].

Using vectorisation tools like potrace is indeed a much more popular approach, and there are quite a few papers generating fonts this way. The most recent I believe is DualVector [3]. But I tried to approach the problem from another angle.

[1]: https://icon-shop.github.io

[2]: https://arxiv.org/pdf/2304.14400.pdf

[3]: https://openaccess.thecvf.com/content/CVPR2023/html/Liu_Dual...


Have the code and models been released for this? Looks like amazing work!


Thank you for the follow-up! Much appreciated!


ChatGPT/GPT-4 does it natively. You can say "Please generate me an SVG image of a unicorn" and it will spit out the SVG code.


Here’s my stupid question of the day:

Would you mind explaining what you mean by “native” in this context?


Not using a -plugin- probably


>I found a few months ago that the gpt-4 code interpreter is capable of converting a black and white png of a glyph to an svg

:) Easy there, let's not make all the naysayers who say it only just predicts plausible words sweat.

Your phrasing almost makes it sound like you're sharing a clear example of it analyzing and completing a complex task correctly, while perfectly understanding what it's doing.

Perhaps we should say it only just predicted words that are plausible responses to someone asking to do that, while also predicting plausible words someone might say in response to an error message along the way. It might not actually be doing any converting, just predicting words and tokens without really doing anything.

My favorite part of its predictive capabilities is how it is able to predict the other half of a conversation that literally goes "didn't work, try again", "didn't work, try again", "still didn't work, try again", "all right you finally fixed it good job" - without even telling it why it didn't work or quoting the error message. Somehow it is still able to predict the other half of the conversation so that it ends up with "finally, good job!"

Who knew that to get results that look like it knows what it's doing, it's enough to predict what could make someone say that!

We are truly living in the golden age of statistical prediction that does not involve any degree of thinking, analysis, or understanding.

Truly our age of applied statistics is going better than anyone could have, er, "predicted". :)


>:) Easy there, let's not make all the naysayers who say it only just predicts plausible words sweat.

I am a huge proponent of machine learning but these transformer architectures really _are_ just predicting the next token (word) that fits in.

Yes, it can perform basic (and even seemingly complex) logic, but this is purely because for the string "What is one plus one?" the next token with the highest score would be "Two."

It's not analysing a task in that it's "ideating", it's simply generating the next best token. That's literally how the transformer architecture works.

Of course it can convert something; it's going from PNG image data -> arbitrary tokens -> SVG tokens. It's still a relatively linear process. I bet if you dig into that project it'll still be doing it token by token, chunk by chunk.
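To make the "token by token" point concrete, this is roughly what greedy next-token decoding looks like (a minimal sketch assuming the Hugging Face transformers library and the small GPT-2 checkpoint; the production models are vastly bigger, but the loop is the same shape):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("What is one plus one?", return_tensors="pt").input_ids
    for _ in range(5):                                   # one token per step, greedily
        logits = model(ids).logits[:, -1, :]             # scores for the next token only
        next_id = logits.argmax(dim=-1, keepdim=True)    # pick the highest-scoring token
        ids = torch.cat([ids, next_id], dim=-1)

    print(tok.decode(ids[0]))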

I can't wait until we do see a genuinely nonlinear model though, where it can ideate using a cloud of higher dimensional non-linguistic tokens to represent a thought process or idea.

Granted, I do think that these models are in many ways still doing what parts of our brains do; I think people's resistance to some of these models is in large part an unconscious reaction to thinking that our meaty brains are "special" and that we'll never achieve consciousness in a machine.


> Your phrasing almost makes it sound like you're sharing a clear example of it analyzing and completing a complex task correctly, while perfectly understanding what it's doing.

OpenAI has hardcoded (or heavily overfit) several special-purpose functions into their ChatGPT systems. In the past few months, they've integrated other special-purpose models, so their tools can do more than just predictive text (e.g. image recognition).

GPT can do limited verbal reasoning, whatever else can do image recognition, but that does not mean the combined system can do visual reasoning. There's no mechanism by which it would (unless you specifically create one, but that's not trivial and doesn't generalise).

> Who knew that to get results that look like it knows what it's doing, it's enough to predict what could make someone say that!

Everyone. Some call it “specification gaming” or “reward hacking”, and we've known about it for a long time. It's a really obvious concept if you have a good mental model of reinforcement learning. https://doi.org/10.1162%2Fartl_a_00319 is a fun example.

> We are truly living in the golden age of statistical prediction that does not involve any degree of thinking, analysis, or understanding.

This is a straw argument. I can't speak for anyone else, but my criticisms are mainly of people seeing some thinking-like, analysis-like or understanding-like behaviour, and assuming that it is human-like thinking, analysis or understanding, while ignoring other hypotheses (some of which make successful advance predictions in a way the “it's doing what humans do!” models don't).

I will note: the people being the most loudly exuberant about ChatGPT's vast intelligence seem to view it as a tool. If I were faced with an opaque box, inside which was a being capable of general-purpose problem solving, conversation, and original thought, my first reaction would not be “I can use this for my own ends”. I am glad that I have seen nothing to convince me that ChatGPT is such a being, and I have theoretical arguments that ChatGPT probably won't ever be such a being, but if you genuinely think this technology has the potential to produce such a being, you have an ethical responsibility.


>>statistical prediction that does not involve any degree of thinking, analysis, or understanding.

>This is a straw argument.

People say that it does not understand anything, that it just predicts text as though it does.

However, I believe they're mistaken. I find that it clearly understands things.

What do you think? Do you think it understands you when you speak to it? Can it do problem solving or original thought in your opinion? My own answer is: "100% it understands me, and 100% yes it can do problem solving and original thought - maybe not world-class scientist level, but to an impressive extent."

Clearly it just has a few thinking "moments": it can't spend hours extensively tackling a problem the way a human can, and it doesn't have a memory, nor can it plan or execute large projects by itself or anything like that.

But it can, as you say, do "limited verbal reasoning", and that is incredibly impressive.


That mirrors my experiences. One thing in particular that I find notable is to ask it a question after it has made an effort.

"Does this really answer the question"

Or, describe how the information you provided will reach the goal in a possible, practical way.

It does understand basics, like me, you, others. And it can handle logic expressions to a degree I find notable.


Thank you for sharing that suckerpinch video, I enjoyed watching it immensely


Douglas Hofstadter, the author of Gödel, Escher, Bach, thought the task of creating fonts could only be solved with general AI.

https://www.m-u-l-t-i-p-l-i-c-i-t-y.org/media/pdf/Metafont-M...

The Letter Spirit project aims to model artistic creativity by designing stylistically uniform "gridfonts" (typefaces limited to a grid).


I read that a while ago and thought that it was interesting: Hofstadter was right that it would require much more general approaches than Knuth's approach of 'think very hard and tweak a hand-engineered knob', because that's how all the past VAE/GAN/RNN work on typography-related stuff has worked.

As for the broader question of whether such approaches are general AI, well, that's a bullet Hofstadter is increasingly willing to bite, as upset as it makes him: https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-ho...


Hofstadter's article is very interesting and delightful (as is typical of him). But as a response to Knuth's article it's basically reacting to a straw-man or misunderstanding: by "a metafont" in "The Concept of a Meta-Font"[1] Knuth simply meant a common description of many related fonts in a family (like the Computer Modern family where different font sizes, bold, italics, sans-serif, typewriter style etc are all generated from common code and tweakable knobs) — this is a consciously chosen and designed family. But when he joked about

> The idea of a meta-font should now be clear. But what good is it? The ability to manipulate lots of parameters may be interesting and fun, but does anybody really need a 6⅐-point font that is one fourth of the way between Baskerville and Helvetica?

Hofstadter ran with it, imagining Knuth to mean a single universal "metafont" from which every single font can be achieved by suitable tweaking of knobs. This is of course nonsense.

Knuth wrote a (little-known and rarely referenced) short response in the same journal, Vol. 17 No. 4 (1983), p. 412 (page 89 of 96 in the PDF at https://journals.uc.edu/index.php/vl/issue/view/364/183) [from the tone I imagine him very annoyed :-)]:

> I never meant to imply that all typefaces could usefully be combined into one single meta-font, not even if consideration is restricted to book faces. For example, […] Meanwhile, I'm pleased to see that my article has stimulated people to have other ideas, even if those ideas have little or no connection with the main point I was trying to make. Misunderstandings of meta-fonts may well prove to be more important than my own simple observations in the long run.

Returning to the thread a bit, all these “write code to draw an image” systems—like Metafont/MetaPost, Asymptote, TikZ (and also I guess DOT/Graphviz, Mermaid, nomnoml, …)—are IMO interesting as a way for those who think in language / symbols / concepts to do visual stuff (and vice-versa to some extent), and also (along Knuth's lines) “truly understand” shapes by translating them into precise descriptions. Metafont was never going to become popular expecting font designers to write code (and the fact that hand-writing SVG is a negligible fraction of usage makes sense), but now that LLMs can help translate back-and-forth, it's going to be interesting to see if we ever get to “understanding” shapes.

[1]: https://web.archive.org/web/20220629082019/https://s3-us-wes... / https://journals.uc.edu/index.php/vl/article/view/5329/4193


Well, GPT is a general AI.


AI is almost AI, time to move the goalposts!


Yeah it’s remarkable how downvoted my comment is. People keep talking about “when will we get AGI?” But the transformer LLM architecture is:

Artificial: man-made

General: able to solve arbitrary problems from any problem domain it was not specifically trained on

Intelligent: capable of producing efficient, sometimes best of class solutions by applying prior or transferred knowledge

A.G.I.

If this doesn’t meet your definition of general AI, then I ask you: what does? I’d like to know where we’ve moved the goalposts to, just so I can keep up please.


> To train the model, I assembled a dataset of 71k distinct fonts.

I give it a week before Monotype sues your face off.


Copyright around fonts may not support such a suit in the same way as works of art.

Wikipedia says: "In the United States, the shapes of typefaces are not eligible for copyright but may be protected by design patent (although it is rarely applied for, the first US design patent that was ever awarded was for a typeface).[1]"

So just scanning the rendered font (as opposed to the code that generates it) may be harder to stop than scanning of artwork.

https://en.wikipedia.org/wiki/Intellectual_property_protecti...


The interesting thing about law is that even if the law doesn't absolutely protect you, the person you believe is infringing on your work for free had better be prepared to pony up lawyer fees to defend their work.

I think this would be one of the few times where I think that's useful. Typefaces take a lot of time and consideration and work to create, so just blanket ripping off that work because we all take them for granted is kind of bullshit.

I have conflicting thoughts about this.

But at the end of the day, if you only trained on open fonts, they just aren't generally as good, and the output generally won't be as good either, as opposed to training on nicer fonts that you technically don't have the rights to (though no one thought of this being an issue at the time of design patents, etc.).

But we're now in the world where we will pay money to compute an AI model to design fonts instead of just paying designers to design fonts. The race to the bottom is accelerating at an alarming rate.


> may be harder to stop than scanning of artwork.

Which has not been particularly easy to stop either


General question. What is the end game of all this? Corporations that have the money and other means are just wildly burning electricity while sucking up the world's art and knowledge work.

How can anyone possibly protect their work in this no holds barred environment?


Font law is almost as complex and fascinating as Tree law. Given how complex font licensing can be, a generative use case that produces usable fonts would be a huge threat to the foundries, and I expect they will be very litigious, just as Getty and others are in the image space.


Tree law? Please say more, sounds interesting


possibly this? https://www.atlasobscura.com/articles/tree-law-is-a-gnarly-t...

....“It’s never about the trees,” Bonapart says. “The trees often serve as lightning rods for other issues that are the psychological underpinning of a dispute that people might have with each other.”


Funny - literally dealing with this right now - a tree that crosses my boundary with a neighbor dropped a branch on another neighbor's house.


That's why it's so important for the weights to be released to the public ASAP. Even when the original is sued, they can still be passed around in torrents for hobbyists and third-world businessmen to enjoy.


Not this again /eyeroll

It’s not illegal for a human to look through 71,000 fonts and then create their own. It can’t be illegal for a human to use a robot to look through the fonts for them.


> it can't be...

Oh boy, it totally can. I'm not saying it is, but it totally can.

I feel many people (at least on HN) treat laws like programming. There are some similarities, but programmers like to eliminate arbitrary special cases, while lawmakers love to create arbitrary special cases. It's like their only job, really.


Programming and law can go together tho https://github.com/CatalaLang/catala


In fact, that failure to grok lawmakers lies at the heart of many poor decisions made.


It depends on exactly what is learned from looking through them. If you end up copying shapes and segments then there are possible grounds for a lawsuit. If you’re able to determine the rules to make a good font from your analysis, however, then nothing is stopping you from applying them.


Yeah but honestly how much diversity does a font have? You say copying shapes and segments etc, but if you look at a genre of fonts, say scifi ones, they all have very similar features, usually with one big "character" difference (sometimes barely even that).

How is it possible to create something 100% unique when it's meant to fill a particular genre? The genre exists because it's a group of similar things.


I’ve tried out some work on generating vector fonts too, in the format of Bezier curves and a seq2seq model. The problem was that fonts outputted by ML models were imprecise. Lines were not perfectly parallel, corners were at 89°, and curves were kinked. It’s not too difficult to get fonts that look good enough, but the imperfections are glaring as fonts are normally perfectly precise. These imperfections are evident in OP’s output too, and in my opinion make these types of models unusable for actual typesetting.

A 1% error in a raster output would be pixel colors being slightly off, but an 89° corner in a vector image is immediately noticeable, making this a hard problem to solve. I haven’t looked into this problem too much since, but I’m interested to hear about possible solutions and reading material.


Without changing the fundamental learning process, one could conceivably introduce a "post-production" step, where you tighten up the output according to a set of pre-defined rules (e.g., if an angle is 89 degrees, adjust the angle to 90).

Of course, changing the learning process would be best. One idea which comes to mind is finding a way to embed relationships into the ML training system itself (e.g., output no angles other than 90 degrees or some predefined set). Such an approach is a type of constraint-based ML, where the ML agent identifies a solution given certain constraints on the output. In my experience, the right approach to accomplish this goal is using factor graphs.
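As a toy illustration of the rule-based clean-up idea, one could snap segment directions to the nearest multiple of 90° when they're already within a small tolerance (illustrative only; a real pass would also have to handle Bezier control points and intentional curves):

    import math

    def snap_directions(points, tol_deg=2.0):
        """points: list of (x, y) vertices of an open contour."""
        snapped = [points[0]]
        for x1, y1 in points[1:]:
            x0, y0 = snapped[-1]
            ang = math.degrees(math.atan2(y1 - y0, x1 - x0))
            target = round(ang / 90.0) * 90.0
            if abs(ang - target) <= tol_deg:             # e.g. an 89° stroke becomes 90°
                length = math.hypot(x1 - x0, y1 - y0)
                rad = math.radians(target)
                x1, y1 = x0 + length * math.cos(rad), y0 + length * math.sin(rad)
            snapped.append((x1, y1))
        return snapped

    print(snap_directions([(0, 0), (100, 1), (101, 80)]))  # near-horizontal/vertical strokes get squared up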


I think this approach isn't ideal because you're representing pixels as 150x150 unique bins. With only 71k fonts it's likely a lot of these bins are never used, especially at the corners. Since you're quantizing anyways, you might as well use a convnet then trace the output, which would better take advantage of the 2d nature of the pixel data.

This kind of reminds me of DALL-E 1, where the image is represented as 256 image tokens then generated one token at a time. That approach is the most direct way to adapt a causal-LM architecture, but it clearly didn't make a lot of sense because images don't have a natural top-down-left-right order.

For vector graphics, the closest analogous concept to pixel-wise convolution would be the Minkowski sum. I wonder if a Minkowski sum-based diffusion model would work for svg images.


Thank you for the suggestion. A couple of ML engineers with whom I've spoken after publishing the blog also suggested that I should try representing x and y coordinates as separate tokens.


How would the Minkowski sum be used in the diffusion model? Is the idea to look at the Minkowski sum of the prediction and label?


In pixel space a convnet uses pixel-wise convolutions and a pixel-kernel. If you represent a vector image as a polygon, the direct equivalent to a convolution would be the Minkowski sum of the vector image and a polygon-kernel.

You could start off with a random polygon and the reverse diffusion process would slowly turn it into a text glyph.
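For convex polygons the Minkowski sum is just the convex hull of all pairwise vertex sums, so a toy version is short (assuming numpy and scipy are available; the shapes here are illustrative):

    import numpy as np
    from scipy.spatial import ConvexHull

    def minkowski_sum(poly_a, poly_b):
        """Minkowski sum of two convex polygons given as (n, 2) vertex arrays."""
        sums = np.array([a + b for a in poly_a for b in poly_b])
        hull = ConvexHull(sums)
        return sums[hull.vertices]                       # hull vertices in counter-clockwise order

    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
    kernel = np.array([[0.0, 0.0], [0.2, 0.0], [0.1, 0.2]])   # a small triangular "kernel"
    print(minkowski_sum(square, kernel))                 # the square, grown by the kernel's shape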


He he, the machine learning naysayers are gonna jump on this one for sure.

Consider a human being designing a scifi styled font; how do they get started? By opening references, of course! To examples of other scifi styled fonts that they do not have the rights to, nor will they credit.

Also consider another human being designing a scifi styled font; but instead one that is not allowed to reference the work of anybody else, as some argue machine learning models ought to do. This human being has no references to open, they have not seen any scifi media, be it movies or posters or fonts or anything else. How can they create something like this without any reference at all to it?

If a human being creates a scifi font, and their inspiration is not references to other scifi fonts but instead, I don't know, a general concept of the "vibe" they got from watching Blade Runner, must they credit Blade Runner for the inspiration? Must they pay the owner of the Blade Runner rights for their use of ideas from Blade Runner?


I've long had a project in mind involving the various typefaces of the signage around the city of Vienna, which I find very inspiring in many cases.

The idea is to just take a picture of every different typeface I can find, attached to the local buildings at street level.

There are some truly wonderful typefaces out there, on signage dating back to last century, and I find the aesthetics often quite appealing.

With this tool, could I take a collection of the various typefaces I've captured, and get it to complete the font, such that a sign that only has a few of the required characters could be 'completed' in the same style?

Because if so, I'm going to start taking more pictures of Vienna's wonderful types ..


Even if you never get around to using the photos, I think it would be a wonderful service to take the photos and put them up somewhere for non-Vienna residents to enjoy.


Oh, definitely .. but first I must amass an archive worthy of it ..


With this tool: no.

With a next-gen tool: if you do some pre-processing on the images, quite possibly.


Hmmm. The model is a ckpt instead of a safetensor.

Pondering whether to proceed with trying this out or not...

EDIT: a scan with picklescan[0] found nothing.. exciting.

[0] https://github.com/mmaitre314/picklescan


Haven't seen a single malicious ckpt file so far. Sure, there is a possibility, but Hugging Face scans pickled weights automatically, so the likelihood of someone using that site to spread malware in this form is super low


“pickled weights”?

serious question, how on Earth should someone like me, who has completely missed the last 12 months of AI development, catch up with the state of the art?


Two separate terms here: pickling is a serialization method for Python objects (unrelated to AI per se).

Read more here: https://docs.python.org/3/library/pickle.html

Then "weights" is just referring to a model's weights, a specific instance of a python object that can be pickled.


I suppose you being here means that you are already fluent in some programming languages. If so, I would start here:

Conway & Miles - Machine Learning for Hackers: Case Studies and Algorithms to Get You Started

Once you've read and understood this, I'd do an online course...


thank you


Just know that the .ckpt format has more or less been replaced by .safetensors these days.

tl;dr .ckpt files can contain Python pickles containing runnable Python code, which means a Bad Guy could create a .ckpt model containing malicious python code. Basically.
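In code, the difference between the two loading paths looks roughly like this (assuming torch and the safetensors package are installed; file names are illustrative):

    import torch
    from safetensors.torch import load_file

    # .ckpt: torch.load unpickles the file, so it can execute code embedded in it.
    ckpt = torch.load("model.ckpt", map_location="cpu")

    # .safetensors: a plain tensor container, parsed without unpickling anything.
    weights = load_file("model.safetensors", device="cpu")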


I've never spotted one in the wild either, but, y'know, I like to not be the one who first finds one out... the bad way. ;)


proxmox/virtualbox/qemu + throwaway vm


Quite, I was thinking about doing so.

I just scanned it with picklescan, which found nothing malicious. I just updated my original reply.


OK, that's cool, but those fonts are all terrible. The serifs are all different sizes and shapes, sometimes on the same letter. The kerning looks like a random walk. The stroke widths are all over the place, and/or the hinting is busted.

Now, that said, it's pretty amazing that this works at all, but it'll take some pretty specific training on a model to get something that can compete with a human made font that's curated for good usability _and_ aesthetics.

Sadly, we'll also probably see adoption of these kinds of fonts (along with graphic design, illustration, songwriting, screenwriting, etc)... because "meh, good enough" combined with some Dunning-Kruger.

TL;DR: Thanks, I hate it.


I don't think any self-respecting graphic designer would use these fonts in their current state, but it's a cool proof of concept and could be improved upon to a more usable state


> Sadly, we'll also probably see adoption of these kinds of fonts (along with graphic design, illustration, songwriting, screenwriting, etc)... because "meh, good enough" combined with some Dunning-Kruger.

Ironic bringing up Dunning-Kruger as you treat generic RLHF as a "pretty specific training" and make sweeping declarations about how people will use AI as if the current SOTA of several of the tasks you just mentioned didn't come from not settling for "meh, good enough" and instead applying the "pretty specific training" you alluded to (see Midjourney)


I never mentioned reinforcement learning, and my DK statement was completely around using flawed fonts for graphic design, etc.

My partner _is_ a professional graphic designer, and we _have_ seen some pretty terrible client graphics that came out of Midjourney. They're amazing for what they are, but it's very difficult to get something out of it that competes with a professional illustrator, even ignoring the whole copyrighted content in the model issue.


Reinforcement learning from human feedback is the training you're referring to, you just don't realize it.

RLHF is why 2 years ago "They're amazing for what they are" would have been "They're so hideous no one in their right mind would use them", and why in 2 years that too will be some weaker form of argument.

There's no special knowledge needed to know "I like X over Y": RLHF allows a model to turn that into guidance at a scale that's never been possible before.


I mean the Ford Model T is pretty fucking terrible by today's standards, just saying.


Kinda funny how it works well at this, whereas diffusion models go to die when it comes to drawing text - but of course it works in a completely different manner.


There's a huge difference between "pictures of letters" and "writing text" though. Ask stable diffusion to write text and it'll generate hilarious weird-looking results. But, ask it to generate individual letters (e.g. "Show me an ornate uppercase letter b") and it'll do that for you with (mostly) no problems.


SDXL can do text, kind of. Also, isn't DALL-E 3 a diffusion model?

But yeah overall diffusion has not generally been able to do it at all before.


> But yeah overall diffusion has not generally been able to do it at all before.

Imagen/Parti were doing text just fine long before DALL-E 3 was announced. GANs were also learning some text in the earlier runup (even ProGAN was doing striking 'moon runes' - amusingly, they were complete gibberish because it did mirroring data augmentation).


Okay I can't try it out anyway. "Blocksparse is not available: the current GPU does not expose Tensor cores"

My "best" GPU is an RTX 2070 Super, Turing architecture.

I've seen similar messages when using stable-diffusion... either with -webui or with automatic, can't exactly remember, but they both run fine on that RTX 2070 Super, so I can only guess that they revert to some other method than Blocksparse on seeing that it doesn't support Turing. Or something. I haven't looked into how they deal with it.

I've submitted an Issue [0] for it. I don't have enough knowledge to know if there's some way of saying "don't use Blocksparse" for fontogen.

[0] https://github.com/SerCeMan/fontogen/issues/2
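For anyone else hitting this, a quick way to see what your card reports, which is usually what these libraries gate such features on (assumes PyTorch; the Turing-era RTX 2070 Super reports compute capability 7.5):

    import torch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        print(f"{name}: compute capability {major}.{minor}")
    else:
        print("No CUDA device visible")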


Although I would be sad to see the handcrafting that goes into designing custom fonts go, some iterations down the line a model like this would greatly aid tedious glyph alignment and consistency tasks when designing CJK, hiragana, katakana and kanji fonts. Inspiring stuff.


I think that would be ideal. The 'killer' feature would be: Handcraft a set of control characters, like the letters in "handglove" and then let AI generate the rest. Designing a typeface is fun, until you need to add support for multiple languages and need to make 800+ characters. Or, maybe there is a nice (open source) font, that is unfortunately missing some characters you really need: let AI generate them.


Writing on computers is already quite US-centric (or at least English-centric). While this might help with some of the shortcomings, I'm also a bit afraid it will mean that even more focus is put only on the US part, and the rest of the world gets a "good enough" implementation made by AI that kinda erases some heritage.


Maybe, but the Latin font market is quite saturated, whereas the CJK space has ample opportunity for innovating and is likely even in need of it, cf. [0][1][2]

[0]: https://qz.com/522079/the-long-incredibly-tortuous-and-fasci...

[1]: https://fonts.google.com/knowledge/type_in_china_japan_and_k...

[2]: https://stackoverflow.com/a/14573813


Cool! Now generate 'upper-uppercase' and see what happens :^)


I think this is a reference to "Uppestcase and Lowestcase Letters", a submission a while back about someone training a ML model to generate lowercase/uppercase letters, and used it to uppercase letters already in uppercase. Quite fun https://news.ycombinator.com/item?id=26667852


Neat! Does it have prompt capabilities for things like FVAR, GSUB, and GPOS? E.g. "okay now include a many-to-one ligature that turns the word 'chicken' into an emoji of a chicken in the same style" or "now make a second, sans-serif, robotic style and add an axis called interpol that varies the font from the style we just made to this new style"?
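(For reference, the GSUB part of that is expressible today in plain OpenType feature syntax, e.g. via fontTools — a sketch, with hypothetical glyph names that would already have to exist in the font:)

    from fontTools.ttLib import TTFont
    from fontTools.feaLib.builder import addOpenTypeFeaturesFromString

    font = TTFont("MyFont.ttf")
    features = """
    feature liga {
        sub c h i c k e n by chicken.emoji;   # many-to-one ligature substitution
    } liga;
    """
    addOpenTypeFeaturesFromString(font, features)
    font.save("MyFont-liga.ttf")

Getting a generative model to emit (and style-match) rules like this alongside the glyphs would be the interesting part.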


Not OP, but the answer is "no".

What exactly made you suspect such abilities?


Odd phrasing, but: the part where I've worked on OpenType parsing for decades and love seeing people with a passion for digital typefaces make new and creative tools in that space. Typically folks don't stop working on a cool tool after they write a blog post, they're still refining and extending, so you never know how far someone is trying to take a tool without asking them.


So, if I understand you correctly, it was less a question "does it do x?" and more an indirect form of "hey OP, would be cool if it did X" ?


This is not the place for starting a discussion about whether context and subtext should be implicit or explicit in written English. That's what https://philosophy.stackexchange.com is for.


I don't want to start a discussion, I wanted to know if I misread your original comment, and whether you meant something different from what I thought you meant.

From your answer, tho very indirect, I now suspect that I did misunderstand your initial comment, and answering your question was missing the point.

That is all I wanted (after at first wanting to be helpful).


Fair enough, I thought you were trolling, but you've made it clear you weren't. I wrote my comment as a question that would hopefully engage the author on the capabilities (both concrete, as well as hypothetical) of this approach to font generation.


This is interesting, but I think generating the next letter from the letters before may not be the best way to do it. As you mentioned, they degrade with each letter.

Maybe creating one long image of a whole font would work better.

edit: in the above I am misunderstanding what is happening here.

But I still think there must be another way to structure this so the attention mechanism doesn't have to work so hard.


Since the first three letters are good, and generated only with the context of the preceding letters, shouldn’t just using the first three (instead of the preceding three) as context for every other one be good enough?


Poof! You're a font.


Designing fonts for languages that use Chinese characters is often challenging due to the sheer number of glyphs.

This approach to generating fonts is very interesting… feels like it could unlock the creation of heavily stylized fonts that just wouldn’t be feasible otherwise.


"Computer, computer, make me a font | find me a glyph | catch me an X-height"

(I'll show myself out.)


Inevitable in a good way. Keep going! There's gold here.



Has anyone tried using an LLM to make a font based on their handwriting?

EDIT: There's a couple (IIRC) of online services that offer this.


If it was my handwriting, it wouldn’t be popular.

Perhaps a cursive font might be good though I’m pretty sure one exists.

An expert system might be able to join up the letters in cursive and make intentional mistakes to give it the character of natural handwriting?


I used to make some fonts for rare, non-Latin alphabets like the Orkhon script by hand, using Paint-like freeware; it was fun


> THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG

It's "a" quick brown fox, otherwise the sentence has no "a".


There is an "a" in "lazy"?


Well, I'm stupid.


The usual mistake people make in reciting this is to say the fox jumped over the lazy dog, causing them to omit an ‘s’ from the sentence.

Making sure it’s ‘the’ lazy dog rather than ‘a’ lazy dog is actually important if you care about completing the lowercase alphabet, as without it there’s only an uppercase ‘T’.


Indeed.

But it's worth having "a lazy" anyway to avoid the repetitive "the."


Huh, I never gave that sentence much thought, and I guess I never realized it conveniently covered the whole alphabet. It makes so much more sense now!


I liked the one that Apple used to use in the MacOS 7/8 era: “How razorback-jumping frogs level six piqued gymnasts!” (Might be slightly off, but it’s unique enough that I remembered it at least 20 years after it disappeared from MacOS).


Those gymnasts remind me of some of my favourite pangrams…

Bouncy squad eking prize with pelvic flexion jump.

My Fox News TV quip? Big jerks add zilch!

View half a dozen squid & peg just six inky crumbs.


Lazy?


In honor of all the times he pressed his hands into his eyes (and myself doing the same thing):

I present: “Perplexed” by Nilsa. [0]

I have a print in my office, in lieu of a mirror.

[0] https://www.sargentsfineart.com/img/nisla/all/nisla-perplexe...


Ooh I have to try this out when I get home, looks like the weights are under 1GB too


lots of kernings misfits ftw


Stop your bigotry of Kernians.


"Fucking Hell" - first thing I yelled to myself when I saw that headline

Kudos for the project, of course, but it just saddens me a bit more. Nothing is sacred anymore.


I mean, looking at the kerning of the second example in particular, there’s still a lot to be done. And something like “extend this latin-1 font to all scripts of the BMP so that it looks stylistically consistent and, within that constraint, the glyphs and their combinations look natural and readable for native readers of each script, assuming Japanese readers for the Han characters” is probably still way off.


Just like all visual generative AI, it gets the first 95% but doesn't get the last 5% that takes 95% of the time. Kerning pairs on typefaces take an incredible amount of human time. Years of full-time work for a large type family. After all these years, even Adobe can't perfectly automate kerning because making letters look right next to each other isn't (obviously) formulaic. Maybe generative AI will nip it in the bud? Certainly hasn't so far, but maybe it will. Obviously in monospaced fonts, like that last one, kerning isn't an issue.

More correctable in these models would be the balance between the letterforms. Surely if there was some kind of prompt you could tell it to not make those Ms in that bold serif font to be obnoxiously wide?

Either way, as of now, what this gets us is exactly 90% less useful and probably of lower quality than the stuff you can get for free on dafont.com. I know it will progress, but I imagine the best use case for generative AI and font creation for commercially viable fonts would be to give rough glyphs to fill out a large character set as an aid for a professional type designer.

And surely there will be a chorus of people insisting that it doesn't matter. Well, you're wrong. If you blindly showed people a headline, book, poster or whatever with properly kerned type and then one without, they will see how much more polished the properly kerned page is, even if they couldn't tell you specifically why. In a lot of situations, that really, really matters, even to people who haven't developed the ability to point out the differences.


People are always resistant to stuff like this; it's people's jobs and livelihoods, after all.

However, none of the people in the creative space I see complaining about these models ever cried for: all of the jobs surrounding the horse industry (stables, horseback couriers, etc.); thatched-roof weavers; textile weavers; knocker-uppers; the people who manually lit street lamps; etc.

Of course many argue that industrialisation also created many more jobs, but I certainly suspect there were fewer than it consumed.

However, the end result is that we are all generally better off for it. I think the reason the argument against machine learning models is flawed is that it is just neo-Luddism; it's hypocritical to complain about the loss of an industry, a specific job, or a specific task of a role to technology whilst reaping the benefits of previously lost jobs - artists wear machine-woven fabrics, after all; they use technology assembled by pick-and-place machines - we all do.

The big _however_, however, is that we can do better as a society to support transitions like this. We shouldn't stop technology if it has clear benefits for humanity, but unlike previous eras where those who were "replaced by machines" were forgotten, we need to assist and help anybody who is affected by this transition. This is how we should be doing things these days.

I find the same argument for train drivers, with the fear of being (rightfully) replaced by self driving trains, a technology that has been available for a long time now. Yes, of course they'd prefer to continue driving trains. But the ability to dramatically lower costs to riders of public transport outweighs this, and companies involved in such transitions (and wider society) need to take care of people involved in such transitions; free cross-training within the same industry, or free training within whatever industry they choose. Financial support through the transition period that their studies take. We can have and eat our silicon cake, we just have to be kind about it.


were fonts ever sacred? monotype has made a whole business off of making helvetica and times new roman alternatives that are basically indistinguishable from the originals but don't require licensing fees.

that seems like exactly the sort of business that deserves to be taken over by AI.


> Kudos for the project, of course, but it just saddens me a bit more. Nothing is sacred anymore.

Why does this sadden you?

I'm quite happy everything is being done by AI; time will be freed for other things that are more important.

Manual font making will not go away though and now anyone can make their own fonts for free.


You don't know that the time will be freed for other things that are more important. We don't know for sure how this is all going to work out at all.

And people who make fonts, create art, and write prose generally do these things because they like doing them, not because they're forced to. These technologies aren't automating drudgery, they're automating things that give people's lives meaning. What's the endgame here exactly?


Time will be freed; ChatGPT, DALL-E, Midjourney and Stable Diffusion have collectively saved countless people billions of hours of time, and this will do the same.

The big font makers no longer have a hold on extremely pricey fonts that are inaccessible; the general endgame is that most software is going free and open source thanks to AI, and that is a good thing.


Creatives will have their hopes and dreams stripped away so that artless and tasteless software engineers can type words into a box and instantly get exactly what they want, with no surprises, no feelings, and no economic upsides for anyone else. A beautiful future indeed.


Won't the creatives be able to type the software specification into a box, and add functionality to their endeavor without needing programmers? I'm not sure that the process and the paycheck are more important than the final artifact.


Why would we need people to have endeavors? That sounds like automatable drudgery.

Won't the AIs be able to infer what would best maximize engagement for their owners and type the specifications necessary to create whatever the entity running them would want their users to consume?


Why would we need people? Seems like a pointless bottleneck in the pursuit of efficiency.


Designing cereal boxes is not authentic human expression.


You must not know any designers. Pretty much everyone I know would consider that to be pretty fun – this is exactly the kind of thing artistic kids say they want to do when they grow up. And most of us would rather get paid to design cereal boxes than to do many other things, and almost everyone would rather do it than to not get paid at all.


Time for what?


>What's the endgame here exactly?

All of us paying a set of subscriptions to the FAANGs, for literally every aspect of our lives.


With what money once everyone is out of a job?


Genuine question: What do you think is more important that won't eventually be done with AI?


Let's assume the technology will eventually work.

What if you had a "personal font"? Sure, you have a user name, but what if you had a custom-generated font which communicates your personality to other people on the Internet? The font could be on a spectrum between static (generated once and reused indefinitely) and dynamic (continuous online learning of personal information causes an adjustment of the font).

I'm just making up an example here, but say you're feeling sad, and your smart technology figures out you're feeling sad. When you send a text message to family, then your personal font takes on "sad" characteristics.


That would certainly be cool. I also hope these models, once they're more advanced, help play a role in improving the UX of our software.

As long as a human being decides that on the YouTube app you can add a video to "watch later" but, when it pops up in the feed, there's no option to remove it from "watch later" (except by going to that specific section), I'll be unhappy with human UX. We can only fit so many scenarios in our heads, I guess.

Same thing as a software developer; I do my best, but I'm sure there are bugs that I write in, certain states that I don't end up writing test cases for because I never quite thought of them. If it takes an AI/ML tool to help me be better at this, or to do it for me then I am so keen.


>Nothing is sacred anymore.

If it can be specified, it can be automated.


I know, I know. I am not disputing the technicalities.


‘Anymore’ ha

You don’t actually believe anything was ever sacred to begin with, do you?


It's so depressing to think that this is what people want.


What is that? The ability to quickly and easily generate creative or expressive pieces of computer wizardry without first having to delve into the depths of esoteric knowledge? Of course people want that. It turns out you can’t specialize in everything, but sometimes you just want to be able to make something good enough without having to engage the services of an expert in the field.

No this might not be the most beautiful font with the most perfect kerning or optimized code. But if it’s functional enough for the person who requests it, that should be good enough shouldn’t it? Most things people are printing on their 3d printers aren’t high quality designed parts either. Plenty of scientists and accountants have scripts and code that would make most developers cringe, but if it’s good enough then why be bothered?

The ability of people to make things with tools that they otherwise never would have been able to make before without dedicating months or years of time they may not have is awesome and we should be excited for it. I’ve watched 70 year old grandmothers learn to make little home movies of their grandkids in iMovie. No they weren’t doing “real” film editing and certainly weren’t learning any skills that would transfer to avid or Final Cut. And so what? That home movie cut together with a minimum of skill and a whole lot of technology hiding the esoterica was probably more meaningful and joy inducing for that woman than most blockbuster cinematics produced by the best minds.


Do you wholeheartedly believe this and can you truly not understand why others might be uncomfortable with the direction of generative AI? Defending your viewpoint makes sense, but I don't buy the idea that the other stance is completely incomprehensible.

The grandma doesn't need generative fonts. She already has fonts that ship with her phone and computer. She already has iMovie for her grandkid videos. Does she need the ability to generate videos of political figures saying things they didn't say?

Obviously, what concerns us is a race to the bottom for the value of all human output.


Of course I believe this. I understand why people might be "uncomfortable" with this. I also understand why people might be "uncomfortable" with all sorts of cases where their hard earned experience and knowledge becomes a commodity for the masses. And yet, it happens all the time, and as a whole we're better for it. Modern internet made early internet explorers "uncomfortable" with how many people could make a web page or publish information without needing to know how to code and how to do it properly. And modern computer languages made early assembly and C developers "uncomfortable" with how it allowed people without the hard earned knowledge and skills to write "programs". And calculators made mathematicians and teachers "uncomfortable" with how it allowed anyone to get the right answer to all sorts of math without knowing how or why it was the right answer. And on and on we can go. The clergy are always uncomfortable when the masses are given the holy books in their language. That might be a bit uncharitable but at the end that's what we're talking about. The uncomfortableness that comes from allowing free common people access to skills, material and capabilities that were previously the domain of the small, elite, learned class, and the realization that it means those same learned class can no longer control the output of those skills and capabilities.

>The grandma doesn't need generative fonts. She already has fonts that ship with her phone and computer. She already has iMovie for her grandkid videos. Does she need the ability to generate videos of political figures saying things they didn't say?

Would you say the same about people, their computers and programming languages? You already have the programs that ship with your computer. You already have Internet Explorer and Solitaire. Do you really need the ability to make viruses and worms? Do you really need the power to create ransomware or crash the network?

>Obviously, what concerns us is a race to the bottom for the value of all human output.

The value of human output will always be there. Go get a quote for a hand carved and built table. Go get a quote for a bespoke set of clothes. See how much a custom song will cost you. Commission a book and get back to me on the price. Heck just go get a quote for some contractor work. Human output is in no danger of losing value.

What people are really concerned about is the reduction in the number of people that can capitalize on that output at any given time. When only 100 people can craft a custom font, 100 people have lots of work for lots of money as long as there is demand for custom fonts of any type. When a machine can generate a custom font for anyone, the only people that can make money from crafting a font are the number that can fulfill the demand for HUMAN created custom fonts. That's what people are worried about. Automation comes for us all, and we'd be better off spending time figuring out how to ensure people can still live a healthy, productive and free life in that new world than wringing our hands about how "depressing" it all is and fighting against the inevitable march of technology.


Great reply. Definitely a less depressing perspective.


Obligatory xkcd reference: https://xkcd.com/1015/


Granted... You are now a font!


Everyone knows that AIs can't draw sans...



