Hacker News | simonw's comments

Yes! Absolutely this.

I have a personal rule that the cost of doing a side project is I have to blog about it. No regrets on that at all, it's a small thing that can greatly increase the value you derive from the project.


Writing on a blog is a very inexpensive way to establish your credibility about different subjects. This pays off later down the line when you can link people to things you've written in the past.

Credibility is a very valuable commodity. It's worth investing in ways to build more of it.

Don't assume people will stumble across your content (though they will eventually via Google). Actively send links to people who you are already engaged in conversation with.

It's not the number of readers you have that matters: it's their quality. I'll take a dozen people reading my stuff who might engage with me usefully or lead to future opportunities over a thousand readers who don't match those criteria.


I regularly have conversations with people that end up with some form of "Let's not belabor this over beers but I'll send you a link to something I wrote. It may be a little bit stale but let me know what you think and we can follow up."

Blogging has internal benefits of organizing thoughts or even just being a fun hobby. But it's external validation as well. Sure, writing a book is even more but that's probably 100s of times more work.


I have two books and tons of blog posts only a few people have read. Still feels good. I even wrote a blog post on why.

Link? There’s nothing in your bio


Writing to fix all the slightly imperfect books that intrigued but disappointed you: you're onto something there.

This is exactly why I started my blog about golf.

I was starting https://golfcourse.wiki and I read a bunch by Jimmy Wales, and he focused on his own credibility and openness being a linchpin to wikipedia.

I thought I might as well start a blog then, because golf media, with few exceptions, is mostly corporate-funded advertisement-as-entertainment slop. I figured I could at least stand out by openly talking about golf in a very different way, by just writing for me, with my target demo being my best golf friend, who sadly passed about the same time I started the blog.

I've got myself a small audience (a few hundred subscribers, and maybe a fourth of that read regularly). I'm more than happy with that. My ideas are almost diametrically opposed to much of the golf world, but I spend a ton of time on almost all of the articles, and I'm very proud of them. I think it lets people who might use the wiki know how serious I am about it as something good for golf, not as some way to get rich. I think it does a good job, and it's a good way to waste a few hours/days/weeks.


I ran a personal blog ~2007-2013, retiring it after one too many THC-infused evenings of personal expression.

Several of my posts received 100k+ views, one with 1M+, typically exploring minor technical hacks. A couple of posts resulted in minor sales of bespoke hardware adapters, which was a nice "side hustle" for a few years. None of this would ever have made me rich, but it was a neat introduction to information sharing.

My resumé still lists several articles written about my blogposts, in publications including Wired, Hack-A-Day, &c... although the links obviously haven't worked for over a decade.

I've recently registered a new domain for my next blog attempt, which will mostly just be a record of things I find interesting on a particular day. If you ever read the Whole Earth Catalogue, my hope is to make my own modern-day version of the excellent WEC-inspired https://kk.org/cooltools/ [not me/mine].


Whether or not someone has written up or is willing to write up their opinion is a good way to determine how seriously to take that opinion.

Using that as a first pass has led to more time engaging with thoughtful people about well considered ideas, and less time listening to the noise that shows up when you solicit opinions.


Agreed: after a year or two, blogs become your experience logs, proving experience and credibility once the landscape is killed by GenAI slop and SEO scams.

Anyone can generate a big portfolio of projects these days (be it graphics, video, software, writing, etc.), but blog posts from 2023 and before are undeniable proof.


> Anyone can generate a big portfolio of projects these days (be it graphics, video, software, writing, etc.), but blog posts from 2023 and before are undeniable proof.

I always read blogs if people include them in resumes.

It’s really cool when an applicant has a blog with unique and interesting content, but I can’t remember this happening without us already having been very impressed by the candidate’s resume.

More commonly, blog content was ambiguous about the applicant’s skills. For example, when someone applies to an embedded job but has a blog of beginner level Arduino projects, is that because they’re an expert creating tutorials for beginners, or because they are a beginner and these entry-level projects represent their skill level?

I also think people greatly overestimate the likelihood that someone will LLM their way into a great blog, and they greatly underestimate how easy it is to forge timestamps. Even git timestamps are easy to fake. Your interviewers aren’t going to scrutinize the Wayback Machine for evidence, but not being indexed isn’t proof that it wasn’t there anyway.
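To be concrete about that last point: git trusts whatever dates the environment hands it. A throwaway-repo demonstration (the file name, message, and date are all made up):

```shell
# Create a throwaway repo and "backdate" a commit to 2019.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q .
echo "I predicted this trend in 2019" > post.md
git add post.md
# Git records whatever author/committer dates these variables supply.
GIT_AUTHOR_DATE="2019-03-01T12:00:00" \
GIT_COMMITTER_DATE="2019-03-01T12:00:00" \
git -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "Add post"
git log -1 --format=%ad --date=format:%Y   # prints 2019
```

The commit looks six years old to any casual `git log` inspection, which is why commit dates alone prove very little.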


> is that because they’re an expert creating tutorials for beginners, or because they are a beginner and these entry-level projects represent their skill level?

You can tell by reading one of them though, right? For a subject I'm an expert at, I can tell the difference between an expert talking about the basics, and a beginner doing the same.


If they write a lot, then you can.

But in my experience reading a lot of applicants’ blogs, it’s rare to even find a recent post. The most common scenario I see is that the most recent posts are 3-10 years old. Even if you can get enough information out of their blog, you’re getting at best a snapshot of where they were a long time ago. The truth is that often people blog the most when they are beginners in a subject.

I guess what I’m trying to say is that the ideal optimal blog that people imagine writing is an extreme rarity. I’ve seen so many people start blogs with high ambitions but then the farthest they get is a couple posts that are now so old that it barely corresponds to their current resume level of skills.

So you’re left doing a lot of guessing and extrapolating.


It's probably not the only input you want but, generally, I agree with your comment especially across a number of posts.

Maybe it’s different in embedded, but as a mech e: any moderately complex hardware project will likely cost orders of magnitude more than a software project to prototype and manufacture. Off-the-shelf electronic parts have become much cheaper, but if you need more than some plastic 3D-printed parts, it’s still expensive.

This isn’t true at all. Even custom PCBs are so trivially cheap that you can get 4-layer boards from China shipped for under $10. An ESP32 module is a couple bucks.

Electronic projects are extraordinarily cheap right now. I’ve built moderately complex PCBs for less than the price of a nice dinner.


A four layer board is not “moderately complex”, it’s child’s play. IMO moderately complex is in the 10-20 layer range with controlled impedance, buried vias, etc. or a mechanical assembly with hundreds of parts.

Everything has definitely gotten significantly cheaper than when I started working in mechanical/electronic engineering 15 years ago, but a moderately complex board with assembly is still hundreds or thousands per board on short (1-2 week) notice. That’s what the GP means when they say that hardware prototypes are orders of magnitude more expensive (sans NRE), and I’m pretty sure they’re talking about the much more expensive mechanical side too, which has also gotten cheaper but not overwhelmingly so.


I've been treating public Git repository commits in the same vein: a receipt of incremental changes that shows an individual can do some programming. Granted, this is not fool-proof; like all complex things, it needs to be evaluated alongside a suite of other factors and conditions to be considered valid. A website written in a personal voice is one of these factors.

A dark pattern on your self hosted blogging website is to backdate blog posts and make yourself seem very good at predicting future trends.

There is no reason why you have to write a blog over a long period of time, you could quickly pump out several blog posts and link to them to establish credibility quickly.


We need better timestamping services for this.

I tried to write out an initial spec here, but I haven’t been able to write any implementations yet: https://github.com/sebmellen/proof-of-origination.


I guess you can use a third party like archive.org, provided the blog is crawled by it, the owner doesn't get them to delete it, and there is no politically motivated revisionism going on by the archive.org maintainers.

I would be suspicious of a reasonably popular blog claiming to have predicted stuff and not being able to back that up with an archive.org capture (or raw scraper data), though I guess it is somewhere where storing a hash on a blockchain may offer some benefits for edge cases.


There's this on the Bitcoin blockchain: https://opentimestamps.org

Links submitted to Reddit or HN also work as proofs of the publication date, since one cannot backdate timestamps on Reddit and HN. Hmm, but you can still rewrite the content a bit.


Yes this only proves you had something on a certain date, but not that the content has been unchanged.
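Hash-based services close that gap: an OpenTimestamps-style proof commits to a digest of the exact bytes, so any later edit is detectable. A minimal sketch of the idea (the `post_digest` helper is illustrative, not any real client's API):

```python
import hashlib

def post_digest(markdown_text: str) -> str:
    # A timestamp proof commits to these exact bytes, nothing else.
    return hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()

original = "Gouda accounts for most of the cheese consumed worldwide.\n"
edited = "Gouda accounts for most of the cheese consumed worldwide!\n"

# Any edit, however small, changes the digest, so a timestamp on the
# original post's digest cannot be reused for the edited version.
print(post_digest(original) == post_digest(edited))  # False
```

So a timestamped hash proves both the date and the content, but only if you kept the exact bytes you hashed.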

If you're that popular, there should be some history in archive.org or archive.is.

>"Anyone can generate a big portfolio of projects these days"

LOL. I could easily do it "back in those days" too. The difference is that in my particular case (I specialize in developing new software products for clients) I also have a long list of names with actual phone numbers, emails, addresses, etc. So anyone can call and verify.

Never blogged. Have no time / desire


Many people have written about why it is important to blog, but I had never heard what you just stated about credibility. That's the best and most convincing argument I've ever heard. Thank you very much for sharing it here.

It's just self-marketing. Nothing new.

That trivializes it, I think. A ton of what people do related to their jobs is marketing at some level. A lot of people here probably resent marketing and self-promotion (to a certain level) but that's the way the world works. If no one has any idea what you do, either directly or through your manager, guess who is getting the chop if the company cuts back even a little.

That's fine.

I would also argue self-marketing is important when you have lazy or bad managers.


Good managers help but you’re ultimately responsible for your own career/reputation.

If I smell self-marketing, it reduces your credibility to me.

So okay for companies to market their products or services, but not okay for a person to market themselves?

If it’s good, there’s nothing to smell.

I think the key is that it's high-effort self-marketing. It's proving you are willing to put in work, which is hard to fake.

There is nothing "just" about marketing. Whether for you or a company.

>"...way to establish your credibility about different subjects..."

Or the other way around


This is a better link - it makes it clear that the full text shown in the ad came from existing web page, not just that incorrect statistic: https://www.theverge.com/news/608188/google-fake-gemini-ai-o...

Top relevant search result is a Reddit thread from 11 years ago quoting a cheese.com article: https://www.reddit.com/r/todayilearned/comments/2h2euc/til_g...

Internet Archive confirms that page has had the blatantly wrong stat up since at least April 2013: https://web.archive.org/web/20130423054113/https://www.chees...

> If truth be told, it is one of the most popular cheeses in the world, accounting for 50 to 60 percent of the world's cheese consumption.

Clearly predates generative AI, so I think this is junk human-written SEO misinformation instead.

Here's that page today: https://www.cheese.com/smoked-gouda/ - still has that junk number but is now a whole lot longer and smells a bit generative-AI to me in the rest of the content.


That the greatest invention since controlled fire (if we are to believe the hype) was unable to discern this as SEO misinformation.

Right, especially bad since this is Google. Their brand should mean more than this.

I've been writing for ages about how the greatest weakness of LLMs is their gullibility. This right here is a great example - see also the Encanto 2 thing from a few weeks ago: https://simonwillison.net/2024/Dec/29/


AI results should be better than this, but there will be hallucinations from time to time. That is entirely foreseeable.

The true failure here is by the humans who couldn't be bothered to do the bare minimum to protect the brand. This would be a major failure if it were any sort of ad, but it's utterly unfathomable for a highly expensive and absurdly visible Super Bowl ad. Why does anyone involved, from intern to CMO, at either the agency or Alphabet, still have a job when they seem ruthlessly indifferent to performing it with any sort of attention or care?


I had the same thing happen to me, except for Cars 4! Google AI picked up a fan fiction trailer summary for Cars 4 from somewhere akin to Fandom and reported it as true.

Ka-chow, Google!


What does it tell you?

Oh yeah, it's complete rubbish - the challenge now is to figure out if there's anything useful a <100MB LLM can do.

Someone suggested using it to build the world's worst calculator.


Maybe I can do one better...

For the 'create_chat_completion' on this line: https://github.com/simonw/llm-smollm2/blob/afed0963b3a8bcca1...

I changed it to:

    completion = model.create_chat_completion(
      messages=messages,
      temperature=0.2,
      top_p=0.9,
      presence_penalty=0.06,
      frequency_penalty=0.5,
      repeat_penalty=1.0,
    )
I took as many parameters as I could from their documentation: https://github.com/huggingface/smollm/tree/main/text#transfo...

And then added some to reduce repetition. I then used the prompt "Write a 10 line poem on computers." (note the full stop) to get:

    In the realm of computation, where magic and science intertwine,
    The art of programming and the art of computing.
    Computers are machines that can think, they are computers.
    They can do what humans cannot – to create.
    Computers are machines that can think, they are computers.
    Computers are machines that think, they are computers.
    Computers are machines that think, they are computers.
    Computers are machines that think, they are computers.
    Computers are machines that think, they are computers.
    Computers are machines that think, they are computers.
    Computers are machines that think, they are computers.
    Computers can think, they can think.
    Computers can be thought of as a machine that thinks.
    Computers can be thought of as a machine that thinks.
Still not perfect, but the parameters seem to make a massive difference. I think it's entirely possible to get something reasonable out of this model.

If it can call tools, then it could be used to download a larger, more useful model. Or send API requests to another model.

Wow, that thing is really clever.

Suggestion: since having multiple writers will corrupt the database, it's worth investigating if the recent (November 2024) S3 conditional writes features might allow you to prevent that from accidentally happening: https://simonwillison.net/2024/Nov/26/s3-conditional-writes/


I think that works, but I think they'll need to use it to implement an external lock rather than coordinating writes to a single object as those comments suggest. They use an object per page, not an object per database, so they have to synchronize updates to multiple objects instead of just one. But a global writer lock is what SQLite does locally, anyway. It's a good idea.

I see some challenges, though. Can we implement a reader-writer lock using conditional writes? I think we need one so that readers don't have to take the writer lock to ensure they don't read some pages from before an update happening concurrently and some pages from after. If it's just a regular lock, oops--now we only allow a single reader at a time. I wonder if SQLite's file format makes it safe to swap the pages out like this and let the readers see a skewed list of pages and I'm just worrying too much.

A different superpower we could exploit is S3 object versions. If you require a versioned bucket, then readers can keep accessing the old version even in the face of a concurrent update. You just need a mechanism for readers to know what version to use.
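To make the conditional-write lock idea concrete, here's a toy in-memory model of the protocol. `FakeS3` is a stand-in, not a real client; its `put_if_absent` models S3 `PutObject` with an `If-None-Match: "*"` header, which fails with 412 if the key already exists:

```python
class FakeS3:
    """Toy stand-in for an object store that supports conditional puts."""
    def __init__(self):
        self.objects = {}

    def put_if_absent(self, key, body):
        # Models PutObject with If-None-Match: "*":
        # succeeds only when no object exists at this key.
        if key in self.objects:
            return False  # 412 Precondition Failed
        self.objects[key] = body
        return True

    def delete(self, key):
        self.objects.pop(key, None)

def try_acquire_writer_lock(store, db_name, owner):
    # The lock object's mere existence is the lock; its body names
    # the holder, which helps with debugging stale locks.
    return store.put_if_absent(f"{db_name}/.writer-lock", owner)

s3 = FakeS3()
assert try_acquire_writer_lock(s3, "mydb", "writer-a")       # acquired
assert not try_acquire_writer_lock(s3, "mydb", "writer-b")   # contended
s3.delete("mydb/.writer-lock")                               # released
assert try_acquire_writer_lock(s3, "mydb", "writer-b")       # acquired
```

The hard part this sketch ignores is liveness: a crashed writer leaves the lock object behind, so a real implementation needs lease expiry or fencing on top.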


The problem with a block/page based solution (block per page is a bad idea, IMO, because S3/GCS are cost/performance optimized for around megabyte objects) is consistency: ensuring a reader sees a consistent set of blocks, i.e. all of the same database version.

Under concurrency, but also under a crash.

Similarly, the big issue with locks (exclusive or reader/writer) is not so much safety, but liveness.

It seems a bit silly to need to have (e.g.) rollback journals on top of object storage like S3.

Similarly, the problem with versioning is that for both S3/GCS you don't get to ask for the same version of different objects: there's no such thing.

So you'd need some kind of manifest, but then you don't need versioning, you can use object per version.

Ideally, I'd rather use WAL mode with some kind of (1) optimized append for the log, and (2) check pointing that doesn't need to read back the entire database. I can do (1) easily on GCS, but not (2); (2) is probably easier in S3, not sure about (1).
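The manifest-per-version idea above can be modeled in a few lines. Plain dicts stand in for the object store and the key names are made up; the point is that readers pinned to an old manifest stay consistent through a concurrent commit:

```python
import copy

# Each database version is an immutable manifest mapping page numbers
# to immutable page-object keys. Readers resolve the "current" pointer
# once and then see a frozen snapshot.
store = {
    "pages/aaa": b"page0-v1",
    "pages/bbb": b"page1-v1",
    "manifest/1": {0: "pages/aaa", 1: "pages/bbb"},
    "current": "manifest/1",
}

def open_snapshot():
    # One read of the pointer pins every page key for this reader.
    return store[store["current"]]

def commit_new_version(changed_pages):
    # Copy-on-write: only changed pages get new objects; the old
    # manifest and pages are never mutated, so concurrent readers
    # keep a consistent view. On S3 the pointer flip would be a
    # conditional put to detect competing writers.
    manifest = copy.deepcopy(store[store["current"]])
    for page_no, (key, body) in changed_pages.items():
        store[key] = body
        manifest[page_no] = key
    store["manifest/2"] = manifest
    store["current"] = "manifest/2"

snap = open_snapshot()                      # reader pins version 1
commit_new_version({1: ("pages/ccc", b"page1-v2")})
assert store[snap[1]] == b"page1-v1"        # old snapshot still intact
assert store[open_snapshot()[1]] == b"page1-v2"  # new readers see v2
```

Garbage-collecting unreferenced old pages and manifests is left out, which is exactly where the checkpointing cost the comment above worries about comes back in.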


This is really neat. I posted an animated GIF screenshot here https://simonwillison.net/2025/Feb/6/sqlite-page-explorer/

Ha, honored :) When I was hacking this together the thought did occur, "I bet this is up SimonW's alley." Thanks for all the great LLM experiments & writeups.

Made sense to me.

The LLM engineering productivity boost is real. Companies need to make room for their engineering staff to figure out how to benefit from it. Engineers need to understand that claiming it's useless / assuming it's going to go away isn't a good strategy.

People who are "high agency" - aka curious, ambitious and self-directed - are very well positioned to take advantage of what these new tools can do.


But not everyone is "agentic". I am maybe ambitious and self-directed, but not really curious. I can work on others' ideas, perhaps be curious within the bounds of that space, but I struggle to come up with meaningful, interesting, practical ideas myself. Honestly, a lot of the time I think I'm just going to be run over, since I don't know how, and frankly I doubt curiosity can be grown in any capacity. I want to be wrong; I'd welcome your feedback.

Honestly, the fact that you're commenting on Hacker News already marks you as somebody who is curious and engaged with your field.

Personally I don't think having unique ideas is as important as being able to execute on projects in a self-directed way.


I guess... I maintain a big open-source project by myself, I assume that kind of points to self-directedness. I guess on a career level it's discouraging since a student isn't seen as self-directed, and to go the startup route requires those unique ideas.

I answer those kinds of questions by piping my entire codebase into a large context model (like Claude or o3-mini or Gemini) and prompting it directly.

Here's a recent example:

    files-to-prompt datasette tests -e py -c | \
      llm -m gemini-2.0-flash-exp -u \
      'which of these files contain tests that indirectly exercise the label_column_for_table() function'
https://gist.github.com/simonw/bee455c41d463abc6282a5c9c132c...

Your code is pretty small if it fits within the context of any major LLM. But very nice if it does!

https://github.com/bodo-run/yek

This is more sophisticated for serializing your repo. Please check it out and let me know what you think!


Yek is fantastic -- I've converted my whole team to using it for prompting. As input context windows keep increasing, I think it'll just keep becoming more and more valuable -- I can put most of my team's code in Gemini 2.0 now.

Doesn't it get very expensive quickly if you don't use prompt caching?

I've had the occasional large prompt that cost ~30c - I often use GPT-4o mini if I'm going to ask follow-up questions because then I get prompt caching without having to configure it.
