mcqueenjordan's comments

I think it's a mix of:

1. Queues are actually used a lot, esp. at high scale, and you just don't hear about it.

2. Hardware/compute advances are outpacing user growth (e.g. 1 billion users 10 years ago was a unicorn; 1 billion users today is still a unicorn), but serving (for the sake of argument) 100 million users on a single large box is much more plausible today than 10 years ago. (These numbers are made up; keep the proportions and adjust as you see fit.)

3. Given (2), if you can get away with stuffing your queue into e.g. Redis or an RDBMS, you probably should. It simplifies deployment and architecture, centralizes queries across systems, etc. However, depending on your requirements for scale, reliability, and failure (in)dependence, it may not be advisable. I think this is also correlated with a broader understanding that (1) if you can get away with out-of-order task processing, you should, (2) architectural simplicity was underrated in the 2010s industry-wide, and (3) YAGNI.


S3 is not optimized to serve websites directly, but to durably store and retrieve ~unlimited data.


I prefer AuthN and AuthZ.

I don't think sharing a prefix/root implies that they're the same thing.

Also, I don't think the suggested "permissions" and "login" terminology would work for all AuthN/Z schemes. For example, when exactly do you "login" when calling an API with a bearer token? Doesn't work for me.


>> I don't think sharing a prefix/root implies that they're the same thing.

I think the complaint is that the shared prefix/root makes the two words less distinct from each other.

>> For example, when exactly do you "login" when calling an API with a bearer token? Doesn't work for me.

In my mental model, you "login" to the API when you provide the bearer token.

While I would agree that this is "stretching" the meaning of the word login quite a bit, passing the bearer token serves the same functional purpose as a human keying a UID / PW combo.


In an activity where words have specific meanings and should be used in their correct place in order to prevent miscommunication of intent or purpose, "stretching the meaning" of a particular technical term can only bring confusion (and bugs).

Authentication and Authorization are correct and complete terms that have separate but related meanings; personally, I don't find them confusing at all.

The entire article feels like whining because the author stubbed his toe against a corner.

Do lay people need these concepts explained using non-technical words? Of course; that's what documentation and manuals are for. "WE" are not lay people, and we should understand what these terms mean.


Other than the space for past notifications and/or having to issue a DELETE, are there significant reasons to prefer this over the typical table-based approach with SKIP LOCKED queries to poll the queue?

It seems to me that if the listener dies, notifications sent in the meantime will be dropped until a listener resubscribes, right? That seems prone to data loss.

In the SKIP LOCKED topic-poller style pattern (for example, query a table for rows with state = 'ready' on some interval and use SKIP LOCKED), you can have arbitrary readers and if they all die, inserts into the table still go through and the backlog can later be processed.
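Roughly the shape I have in mind, as a sketch (table and column names are made up):

    -- a minimal tasks table, for illustration only
    CREATE TABLE tasks (
        id      bigserial PRIMARY KEY,
        state   text  NOT NULL DEFAULT 'ready',
        payload jsonb NOT NULL
    );

    -- each poller, on some interval; SKIP LOCKED lets concurrent
    -- pollers claim disjoint rows without blocking on each other
    BEGIN;
    SELECT id, payload
      FROM tasks
     WHERE state = 'ready'
     ORDER BY id
     LIMIT 10
       FOR UPDATE SKIP LOCKED;
    -- process the claimed rows, then, still inside the transaction:
    -- UPDATE tasks SET state = 'done' WHERE id = <claimed ids>;
    COMMIT;

If every poller dies, inserts into tasks still succeed and the backlog just waits for the next poller to come back.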


In Postgres listen/notify are inherently lossy channels — if a notification goes out while a listener wasn't around to receive it, it's gone, so they should never be relied upon in cases where data consistency is at stake.

I find that the main thing they're useful for is notifying on particular changes so that components that care about them can decrease the time until they process those changes, and without sitting in a hot loop constantly polling tables.

For example, I wrote a piece here [1] describing how we use the notifier to listen for feature flag changes so that each running program can update its flag cache. Those programs could be sitting in loops reloading flags once a second looking for changes, but that's wasteful and puts unnecessary load on the database. Instead, each listens for notifications indicating that some flag state changed, then reloads its flag cache. They also reload every X seconds so that some periodic synchronization happens in case an update notification was missed (e.g. a notifier temporarily dropped offline).

Job queues are another example. You'll still be using `SKIP LOCKED` to select jobs to work, but listen/notify makes it faster to find out that a new job became available.
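As a rough sketch of that wake-up shape (channel name is made up; assume a tasks table like the one sketched upthread):

    -- producer: enqueue and nudge any listeners in one transaction;
    -- the notification is only delivered when the transaction commits
    BEGIN;
    INSERT INTO tasks (payload) VALUES ('{"kind": "welcome_email"}');
    NOTIFY task_available;
    COMMIT;

    -- worker: subscribe once per connection and wait for
    -- notifications instead of polling in a hot loop
    LISTEN task_available;
    -- on wake-up (or on a periodic fallback timer, since individual
    -- notifications can be missed), run the usual SKIP LOCKED query
    -- to claim work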

[1] https://brandur.org/fragments/instant-feature-flags


Got it, thanks for the reply. The feature flag cache reload use case seems like a reasonable one to me.


I implemented a queue using both LISTEN/NOTIFY for notifications to the task processor and SKIP LOCKED to sift through the pending tasks in the tasks table.

I think you can eliminate polling if you don't need to retry tasks, by simply processing pending tasks at startup and then just responding to LISTEN events. However, I'm curious if there are any alternatives to polling the queue in cases where you need to support retrying tasks at a given timestamp.
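For context, the retry case I mean looks roughly like this (building on the kind of tasks table sketched upthread; names are illustrative), which is why I still end up polling:

    -- a "not before" timestamp plus an attempt counter
    ALTER TABLE tasks
        ADD COLUMN run_at   timestamptz NOT NULL DEFAULT now(),
        ADD COLUMN attempts int         NOT NULL DEFAULT 0;

    -- the poll only sees tasks whose time has come
    SELECT id, payload
      FROM tasks
     WHERE state = 'ready'
       AND run_at <= now()
     ORDER BY run_at
     LIMIT 10
       FOR UPDATE SKIP LOCKED;

    -- on failure, push the task back into the future; the next poll
    -- naturally picks it up again
    UPDATE tasks
       SET run_at   = now() + interval '1 minute',
           attempts = attempts + 1
     WHERE id = 42;  -- id of the failed task (illustrative)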


I personally think polling the queue/table via queries is a very sensible pattern and not something I have a desire to remove. In theory, you could go at it via a push approach by wiring into the WAL or something, but that comes with its own rat's nest of issues.


One nice thing about `NOTIFY` is that the system is very fast and scales to many `LISTEN`ers that can all get notifications with very little latency. I.e., it's a C10K system.

Because there are no access controls on who can NOTIFY to what channel, you can't rely on the payload, so you really do have to look at a work queue. But if it's just one user, and all you're trying to do is broadcast data fast, then NOTIFY works great.


The only tradeoff here is that the pure NOTIFY approach (if you don't care about losing notifications) can sit there on a single connection, and probably performs a bit better than having a bunch of workers in contention for that connection (at which point you don't really need SKIP LOCKED anyway). But ultimately, tuning the level of parallelism of your worker pool and how many connections to dedicate to it doesn't seem a huge hardship.


Tbh I didn't know about SKIP LOCKED until now, but it looks like you have to hold a transaction open the entire time the worker runs, which can be a problem. What I've done before is timestamp columns for start/end. A worker takes any job whose end time is null and whose start time is not too recent, which makes retries natural and flexible.

A pubsub pattern like pg_notify can definitely make sense depending on the requirements, but I wouldn't jump to it first. The few times I've used pubsub elsewhere, it was when subscribing to some other team's service, not via a shared DB.


Yeah, you can avoid holding the transaction open with the approaches you mentioned, e.g. claim with SKIP LOCKED and set some status value to PROCESSING, then do your processing, then update to DONE at the end. Or, as you mentioned, timestamps.

I think the SKIP LOCKED part is really only useful to avoid contention between two workers querying for new work simultaneously.
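Concretely, something like this as a sketch (the states and names are just for illustration); the transaction only covers the claim itself, so nothing stays open while the work runs:

    -- claim: one short statement that flips the state; SKIP LOCKED in
    -- the subquery keeps two workers from grabbing the same row
    UPDATE tasks
       SET state = 'processing'
     WHERE id = (
            SELECT id
              FROM tasks
             WHERE state = 'ready'
             ORDER BY id
             LIMIT 1
               FOR UPDATE SKIP LOCKED
           )
    RETURNING id, payload;

    -- ... do the actual work with no transaction held open ...

    -- record the outcome separately when the work finishes
    UPDATE tasks SET state = 'done' WHERE id = 42;  -- the claimed id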


Yeah, I can imagine SKIP LOCKED being faster if used that way. Just hasn't been an issue for me yet, so I haven't tested.


They can be used in conjunction.


And just to add a small clarification, since I had to do a double take: this isn't exactly-once delivery (which isn't possible), this is exactly-once processing. But even exactly-once processing generally has issues, so it's better to design for at-least-once processing and try to make everything within your processing ~idempotent.
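One common way to get that ~idempotence is to record each unit of work under a unique key and let duplicates no-op; a sketch, with made-up names:

    -- one row per logical task, keyed by an idempotency key
    CREATE TABLE processed (
        idempotency_key text PRIMARY KEY,
        processed_at    timestamptz NOT NULL DEFAULT now()
    );

    -- a redelivered task inserts nothing the second time around; if
    -- zero rows were inserted, skip the side effects
    INSERT INTO processed (idempotency_key)
    VALUES ('task-42')
    ON CONFLICT (idempotency_key) DO NOTHING;

If the side effects live in the same database, doing this insert in the same transaction as those effects means they commit (or roll back) together.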


I find the posturing as a thought leader and industry leader (on this topic especially) a bit ironic. A cloud provider licensing ARM Neoverse and throwing an ARM chip into their cloud compute boxes is not exactly a novel business practice.

I'm happy to see this, and it should be all goodness, but... the posturing... I don't want to be negative for the sake of being negative, but I don't understand how anyone can write that first paragraph with a straight face and publish it when you're announcing ARM chips for cloud in 2024 (or maybe 2025?).


Exploring the concept of files > app more deeply: it would be interesting if we could foster a culture of web apps writing to local storage as files (in a similar manner to Obsidian), and if we had a common format for doing so, with an open-source daemon that synced writes to and from that directory to e.g. some other folder. That would unlock ownership of data even in web apps. The daemon could be app-agnostic and just dutifully sync all the things.


We can’t have LLMs giving footguns to our children. ;)


KYC before you’re allowed to use c++. This will facilitate making a list of all c++ users >:)


Another thing worth mentioning here is `strip`, which IIRC `cargo build --release` doesn't do by default. I think stripping binaries can reduce binary size by up to 80-85% in some cases (but certainly not all; I just tried it locally on a 1 MB Rust binary and got a 40% reduction).

FWIW, you can configure this in Cargo.toml:

    [profile.release]
    strip = true


Check out Making Rust binaries smaller by default (https://kobzol.github.io/rust/cargo/2024/01/23/making-rust-b...). Previously discussed a few weeks ago at https://news.ycombinator.com/item?id=39112486.

That change will be live on 21st March, so manual strips won't be required after that.


You can strip C-compiled binaries too, and that halves the binary size. The point is that, for example, a hello-world Rust binary is 300 KB after stripping while the C one is 15 KB: a difference of 20 times.


Such a comparison exaggerates the difference, because it's a one-time constant overhead, not a multiplicative one.

i.e. all programs are larger by 275KB, not larger by 20x.

Rust doesn’t have the privilege of having a system-wide shared stdlib to make hello world executables equally small.

The overhead comes from Rust having a more complex, type-safe printf, plus error-handling code for when the print fails. C doesn't handle the print error, and C doesn't print stack traces on error. Most of that 200KB Rust overhead is a parser for DWARF debug info to print the stack trace.



But C doesn't statically link the standard library by default like Rust does.


Hello world deps for C:

    linux-vdso.so.1 (0x00007fff25cb8000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fe5f08d9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fe5f0ae2000)
And for Rust:

    linux-vdso.so.1 (0x00007ffc109f9000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8eda404000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f8eda222000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8eda4a8000)
Rather, Rust has one more dynamically linked library than C.


That might be true for Hello World, but libgcc_s is where a lot of builtins for C itself go, so you'll find it ends up linked into a lot of non-trivial C programs as well. See https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html


You missed the word "statically" in the post you commented on.

Dynamically linked libs rarely contribute heavily to binary bloat


The benefit of statically linking becomes moot when it doesn't reduce the number of dynamically linked libraries. That's the point.


That's not why Rust statically links the runtime. The main benefit is that they don't have to try to design and maintain a stable ABI for it. Which is not moot.

More generally, you statically link something to avoid distribution hassles of various kinds, not because you care about the specific number.


Another point of clarification that is of great importance to the results, and is a common Rust newcomer error: the benchmarks for the Rust implementation (in the original post that got all the traction) were run with a /debug/ build of Rust, i.e. not an optimized binary compiled with --release.

So it was comparing something that a) didn't do meaningful parsing against b) the full-parsing Rust implementation in a non-optimized debug build.


Am I missing something? In the git repository [0] it says:

> needletail_benchmark folder was compiled using the command cargo build --release and ran using the following command ./target/release/<binary> <path/to/file.fq>.

Or are you talking about something else here?

[0] https://github.com/MoSafi2/MojoFastTrim


It was later edited, after it had basically made the rounds.


Ah okay, found the commit that changed the benchmark numbers

https://github.com/MoSafi2/MojoFastTrim/commit/530bffaf21663...


How much does this particular result change when running in release mode?


Depending on the code, I've seen performance increases above 100x in some cases. While that's not exactly the norm, benchmarking Rust in debug mode is absolutely pointless, even as a rough estimate.


Is there any compiled language that doesn't benefit heavily from release builds? That would be interesting if true.


This can happen in languages that use dynamic constructs that can't be optimized out. For example, there was a PHP-to-native compiler (HipHop/HPHPc) that lost to faster interpreters and JIT.

Apple's Rosetta 2 translates x86-64 to aarch64 code that runs surprisingly fast, despite being mostly a straightforward translation of instructions rather than something clever like a recompiling, optimizing JIT.

And plain old C is relatively fast without optimizations, because it doesn't rely on abstraction layers being optimized out.


Julia, for example, runs by default with -O2 and debug info turned on. It's a good combination of debuggability and performance.


On my machine, running the debug executable on the medium-size dataset takes ~14.5 seconds, and release mode takes ~0.8 seconds.


Do you know why debug mode for Rust is so slow? Is it also compiling without any optimization by default? Is it the overflow checks?


The optimisation passes are expensive (not the largest source of compile time duration though).

Debug mode is designed to build as-fast-as-possible while still being correct, so that you can run your binary (with debug symbols) ASAP.

Overflow checks are present even in release mode, and some write-ups seem to indicate they have less overhead than you’d think.

Rust lets you configure your Cargo profiles to apply some optimisation passes even in debug, if you wish. There's also a setting to have your dependencies optimised (even in debug) if you want. The Bevy tutorial walks through doing this, as a concrete example.


That's not right; in release mode, Rust only checks for overflow where the values are known at compile time. In debug mode, all operations are checked for overflow.


Integer overflow checks can be enabled in release mode by modifying your Cargo.toml with

    [profile.release]
    overflow-checks = true
IMO it should have been the default.


Aahh, my bad. TIL.


Yes, optimization is disabled by default in debug mode, which makes your code more debuggable. Overflow checks are also present in debug mode, but removed in release mode. Bounds checking is present in release mode as well as debug mode, but can sometimes be optimized away.

There's also some debug information that is present in the file in debug mode, which leads to a larger binary size, but shouldn't meaningfully affect performance except in very simple/short programs.


This is not accurate. The blog post used `--release` for its Rust numbers. The confusion comes from the 50% performance win being specific to running on an M2 Mac. On an x86_64 Linux machine, the results are more or less equivalent.

