The computers are fast, but you don't know it

skohan · on June 17, 2022

I remember the moment I realized how fast computers are at uni. I was in an algorithms course, and one of our projects was to make a program which would read in the entire dataset from IMDB of films and actors, and calculate the shortest path between any actor and Kevin Bacon using actors and movies as nodes and roles as edges.

I was working in C, and looking back I came up with a quite performant solution mostly by accident: all the memory allocated up front in a very cache-friendly way.

The first time I ran the program, it finished in a couple seconds. I was sure something must have failed, so I looked at the output to try to find the error, but to my surprise it was totally correct. I added some debug statements to check that all the data was indeed being read, and it was working totally as expected.

I think before then I had a mental model of a little person inside the CPU looking over each line of code and dutifully executing it, and that was a real eye-opener about how computers actually work.

rufus_foreman · on June 18, 2022

>> The first time I ran the program, it finished in a couple seconds. I was sure something must have failed

At one of my first jobs I was a DBA supporting a CRUD app in the finance industry. The app had one report that took forever and usually timed out, I was told to take a look at it. The DB query was just missing a couple indexes so I added those.

After I added them, my boss told one of the users of the app to try out the report and she said it was still broken. He asked what she meant and she said she clicked the button and the page with the results came up right away. She thought it was broken because it didn't take forever.

Balgair · on June 18, 2022

If I recall correctly, the first few ATMs near Wall Street had the same issue. They were too fast and people were suspicious. They had to add in a delay so folks would feel alright using them.

stuffbyspencer · on June 18, 2022

That's pretty funny, and is literally what I did with a CLI tool I made once. It was supposed to loop through something that was over 10,000 entries long. It finished in under a second.

I decided to add a small fraction of a second every X iterations and output some garbled data to the terminal. I got paid a nice little sum because of that. Sometimes, knowing how to make something look complicated is as important as doing something complicated.

winternett · on June 17, 2022

I was working at my UNI library around 97 when a fresh t3 (~45MBPS) line was just installed... We also got brand new top of the line Micron computers as well there. I was the first person to test the connection and after years of working on 56k modems I couldn't believe how everything I clicked suddenly worked at the speed of light. Videos I clicked on (on MTVs web site back then) loaded instantly, almost felt as if they loaded before I clicked on links... I have never had anything load as quickly since, even on my home Internet, which is directly connected to my router & 250MBPS plan.

I blame all the ads, tracking, and bloatware that is prevalent now most of all.

sparker72678 · on June 17, 2022

Gosh, yes, this. There was a brief era where websites loaded instantly on those fast network connections, and it was glorious.

It's bananas how slow the web is today on average when you're on a symmetric gigabit connection.

meshaneian · on June 17, 2022

Just like freeways - add more traffic lanes, get more traffic.

colejohnson66 · on June 18, 2022

Theres actually a name for that phenomenon: Braess’ Paradox https://en.wikipedia.org/wiki/Braess%27s_paradox

MichaelZuo · on June 19, 2022

Though it's only a paradox for a small number of additional lanes. If 100 lanes are added then traffic will speed up.

Schiendelman · on June 23, 2022

It’s unclear that’s the case, because there are other limitations at work! Merging is a huge time cost and slows down two lanes; the more lanes, the more merges are necessary to use the new lanes. There’s some work suggesting it makes more than three lanes in one direction essentially useless in urban areas - and that’s where the traffic is…

MichaelZuo · on June 23, 2022

I mean it in the literal sense, past a certain point the city no longer exists as the lanes replace all other land uses and by definition without any source of traffic, the traffic will cease to exist as well.

Schiendelman · on June 24, 2022

Ha! That’s true too.

vishnugupta · on June 17, 2022

For most of my work CPUs form the last decade will work just fine. It’s the memory and, especially, disk IO that kills the performance. SSDs have helped big time.

von_lohengramm · on June 17, 2022

I'd argue that SSDs have done more harm than good. Since the worst-case is now far superior that it used to be (HDDs), most developers see no need to optimize any further. For example, plenty of video game engines will stream copious amounts of data from disk instead of optimizing memory usage, asset size, and in general more creative solutions (i.e. shader effects instead of GBs of redundant assets). If hitting the disk slowed everything to a crawl, then maybe software would've been designed in much more efficient ways. Good (enough) is the enemy of great.

khalladay · on June 17, 2022

Ah yes game engine development, where we herd all the lazy folks who hate efficient solutions to problems.

morelisp · on June 17, 2022

It's not about (in)efficiency but creativity and development budgets.

If you have "unlimited" fast storage, the most technically efficient way to render highly-detailed realistic assets is to underpay a bunch of artists to make a metric ton of highly-detailed realistic assets, then stream them in off disk.

If you don't have that storage, the most efficient way might be to make a smaller number of assets modulated by some technical work, which is more accessible to smaller teams who have one top-shelf programmer but no army of contract artists. Or, the team is forced into a non-realistic art style which gives artists and the industry as a whole more space to design in.

It also means that when you do blow your technical budgets for whatever reason (e.g. nobody upgrades their SSDs for 2 years due to a chip shortage so your median performance projections for release were way off), it starts getting much worse very fast.

xenadu02 · on June 18, 2022

> I'd argue that SSDs have done more harm than good

I can't imagine the thought process that would lead to a statement like that. SSDs are the single best thing to happen to personal computing in the past 15 years. Absolutely no question and not even arguable.

LtdJorge · on June 17, 2022

Games are a bad choice as an example. (Some) Games are always trying to squeeze the most out the latest hardware. You can't have a massive world with 4K textures and no loading screens using an HDD and 8GB of RAM without performance degradation.

account42 · on June 20, 2022

Games are a good choice as an example. *Some* games are trying to squeeze ot the most of the latest hardware but most just target a minimum acceptable framerate for common hardware and then move on. Good enough to ship is pretty much the game industry's entire mode of operation.

Also, games generally don't make good use of extra resources you have. Have 128 GiB ram and plenty of VRAM? In almost all games you're not going to see any less loading screens in most games than someone with 8 GiB ram even in really simple scenarious like going back to the area you just came from.

goosedragons · on June 18, 2022

Has that even happened yet? PS5 and Xbox Series are the first consoles to use SSDs instead of HDDs and they're only beginning to gain serious steam. PS4 games are still being released and they have to cater to that stock 5400 RPM HDD.

ano88888 · on June 17, 2022

Why can't we have a language easy to read and maintain but also have the speed of C?

flohofwoe · on June 17, 2022

It's not "the C part" that makes code run fast, but memory access patterns. C just happens to not get in the way between the coder and the machine when it comes to explicit control over memory layout. In the late 60's and early 70's this was probably an "accidential feature", but with the widening CPU/memory performance gap it turned out that later languages (from the late 90's and early 00's) had bet on the wrong horse by trying to abstract memory away. More recently this trend is reversing again and plenty of C alternatives are starting to appear with explicit control over memory layout (Zig, Nim, Odin, Rust, ...).

qikInNdOutReply · on June 17, 2022

So a good language, should not abstract that memory pyramid away, but instead make you painfully aware of it, while developing. Rewarding DOD, punishing OO, but that results in more education time for programers, which no company is willing to pay for.

What instead is needed is a intermediate language, that takes the constructs of object orientation and the instruction flow and allows to rearrange them for maximum memory efficieny. Like, strip those OO-bjects into arrays, or directly pack them into hot-loop structs that have little regard for the Objects they started out with.

Can someone from the old guard tell me, how often we have been here in the language design cycles through the desert?

kaba0 · on June 17, 2022

I believe that the direction of a managed language with value types is a good goal.

I especially like Java’s plan of introducing them: according to the latest design iteration, there will be 3 buckets of objects: current identity-having ones, value classes and primitive classes.

The second category would drop identity, but will keep nullability (for example two DateTime instance of the same value will be considered equal on a VM-level, allowing for optimizations like allocating DateTime’s inside an array serially, or stack-allocation. Nullability is important because there is no sane default value choice for for example a DateTime. But this value can be encoded cleverly similarly to what Rust does with optional)

The third category will loose nullability as well and current primitives will be migrated to it. So a ComplexInt “class” will be possible to implement with “zero” overhead.

My point being that there are two ways of improving performance, either showing more knobs to manipulate, or to raise the abstraction level, allowing cleverer optimizations. C# does the former, while Java never went that route, and I think that the latter approach better fits a managed language and can easily push it for like 90% better performance at a fraction of developer complexity.

moffkalast · on June 17, 2022

> that results in more education time for programers, which no company is willing to pay for

That's why you finally stop teaching Java in high schools.

carlmr · on June 17, 2022

I was going to say OOP aside from the basic animals and calls example, takes years of indoctrination for people to find it the simple default way to do things.

Functional programming is much simpler, but we don't spend years hammering the concept into people's brains.

Akronymus · on June 20, 2022

With helping acquaintances learning programming from 0, I have a sample size of n = 2 that starting with FP got them to grok basic programming within a very short amount of time.

And IME moving from FP to OOP is MUCH easier than the other way around.

Don't really have the knowledge to compare iterative vs functional for a complete beginner, though I suspect that even there moving from FP to iterative is easier.

soco · on June 17, 2022

I program Java to earn my living and any hierarchy deeper than two is a bad smell for me. Not that it cannot have its place, but most of the time you're right, some find so fascinating to inherit everything from everything...

carlmr · on June 17, 2022

I remember sitting in these lectures thinking, why not just use a function without all the boilerplate? And a decade later the programming world finally came back to some semblance of sanity.

crabmusket · on June 17, 2022

Because for some reason most OO learning material seems to be focused on syntax and language features, not design. It's a real shame.

OOPMan · on June 17, 2022

Ah yes, functional programming is indeed so much simpler.

I'll just use a Kleisli

XD

bernds74 · on June 17, 2022

> Like, strip those OO-bjects into arrays, or directly pack them into hot-loop structs that have little regard for the Objects they started out with.

I think Jonathan Blow's JAI is trying to do something like this. Unfortunately it isn't exactly available yet, or it wasn't last time I looked.

Schroedingersat · on June 17, 2022

Jai has this as an explicit goal.

Zig has closely aligned goals.

Shorel · on June 17, 2022

Just allocate the objects in the stack, RAII style.

skohan · on June 17, 2022

Doesn't Rust mostly abstract the memory management away as well? It tends to be low overhead, and has sensible defaults with respect to memory management, but it's built around RAII and for instance if you use a lot of reference types as far as I know there's nothing keeping you from having fragmented memory the same way you would with another high level language.

I know Rust also offers arenas and other purpose built tools for more optimized allocation strategies, but Rust doesn't seem like the language you would reach for if your number one priority is memory performance.

It seems like there is a necessary trade-off between truly top-end memory performance and memory safety.

swiftcoder · on June 17, 2022

One of the big differences memory-wise between the C/C++/Rust family and the Javas of the world is having first class support for by-value object types (and collections thereof).

Yes, you can trash your cache in all these languages if you choose to do everything with references to a multitude of individual heap allocations... but in Java-likes you don't have the choice not to do that

flohofwoe · on June 17, 2022

Well yeah, you can write slow code in any language if you don't think about memory layout ;) I see Rust roughly in the same bucket as C++. You can abstract the details of memory management away - which mostly also means giving up control over how things are arranged in memory - but if needed the low level explicit memory management features are there.

skohan · on June 17, 2022

I guess the difference is that C++ still gives you pretty much direct access to memory (i.e. pointer arithmetic). Rust tries very hard to keep you at arms length from the actual memory as a rule, and forces you to work through a safe abstraction unless you use an escape hatch.

pkolaczk · on June 17, 2022

If by "escape hatch" you mean unsafe blocks, then whole C++ is an escape hatch.

dzaima · on June 17, 2022

It's not even just memory access patterns. It's any and all abstractions - C doesn't provide any, so you write things manually, and thus won't do things that are not required for your use-case (whether it be separated loops for actions, pre-initialization/zeroing, multiple allocations where one or none could do, and higher-level stuff like no need for iterator stability, a vector push that assumes reserved space, etc).

kaba0 · on June 17, 2022

Well, lack of abstraction can easily hinder performance as well. Just compare C’s string management story with that of C++. (Also, for very performance-oriented workloads, C++ will be preferred). In case of C you only have dumb c strings, on which you will iterate many many times completely needlessly. It is both error prone, and less performant than C++’s strings, which can do small string optimizations (storing the string inside the structure if it fits, and being a pointer to it otherwise).

dzaima · on June 17, 2022

I'd say the case of C strings is more a case of a bad abstraction C has (at least historically), rather than lack of one. But the things that expect them are just the standard library (which doesn't have many useful general-purpose things anyway), so you can write your strings as a pair/struct of char*+length or whatever else you may want just fine.

Small string optimizations, while nice (and probably do average out to being beneficial), aren't always needed, and the extra generated code for handling both cases could make it not worth it if you've got a fast allocator, and can even make some operations just outright slower. (and if your code doesn't actually have strings anywhere near hot loops, all you get is a larger binary). File paths, for example, are often large enough to not fit the small case, but still small enough where even the check for whether it is small can be a couple percent of allocation/freeing.

Being error-prone, though, is something that I can agree with. That's the cost of doing things manually.

(I'd also like to note that malloc/free are a much more important case of a bad abstraction - they have quite a bit of overhead for being able to handle various lengths & multithreading, while a big portion of allocations is on the same (often, only) thread with a constant size, and said size being known at free-time, which is a lot more trivial to handle; not even talking about the cost of calling a non-inlined function spilling things to the stack)

kaba0 · on June 17, 2022

Well the reason this one may be a good example is that you couldn’t do this optimization in C even if you wanted to. You would have to call a function at every use-site to handle your “abstraction”. And the same thing applies in other cases.

Also, I’m not sure the added conditional branch will increase the binary too much, and the reason it is inside the c++ stdlib is that it was likely measured and proved beneficial.

But I do agree that maybe your allocation example is a better one, though the solution to that is perhaps a full GC, which does have a few tradeoffs (which are worthy to take more often than not).

dzaima · on June 17, 2022

Right; making abstractions in C when you really know you should can be messy, but still possible (not with the same syntax, but I personally like the fact that a[b] is 100% guaranteed to be, at most, a single load; similarly for all operations other than function calls; makes it easier to reason about performance at a glance).

Recommending a GC is hard for me though; over your system malloc/free, maybe, but alternative allocators can be very fast, without the drawbacks of pauses (or slower execution as a result from a non-pausing GC).

bheadmaster · on June 17, 2022

Go fills that spot for me.

I've done video4linux stuff in Go, and passing an unsafe.Pointer to a Go struct in an ioctl() worked fine, which tells me that Go structs are isomorphic to C structs. Even though Go has garbage collection, it allocates everything it can on the stack, so only long-lived shared-between-goroutines objects are subject to garbage collection.

Go abstracts concurrency, completely removing all concurrent features from a language except for the "go" keyword (that launches a goroutine - which is basically a tiny virtual thread), channels (which are selectable queues) and "select" keyword that waits for the first "input" from a static set of channels.

skohan · on June 17, 2022

Go strikes me as one of the best "good enough" languages we have right now. You're not going to do HPC in Go, but it's performant enough to run circles around a lot of high level languages and dynamic languages. It abstracts away the stuff that's super error prone about manual memory management, and it's so brutally simple that it's hard for one of your colleagues to write code you're not going to be able to understand.

There's a few features that could improve it, like proper ADT's, and it's a bit lacking in expressiveness for me to choose it for personal hobby projects, but I would recommend it any time for general-case professional software development.

kaba0 · on June 17, 2022

> but it's performant enough to run circles around a lot of high level languages and dynamic languages

It ain’t running circles around any managed language, with perhaps the exception of Python.

C#, Java, JS all have comparably good performances, sometimes much better - e.g. when GC can’t be avoided.

bheadmaster · on June 17, 2022

Go is in my experience significantly faster than Java, and uses an order of magnitude less memory for equivalent functionality.

If you have any data (or better yet, benchmarks I can run on my own machine), I'd very much like to see some hard numbers.

kaba0 · on June 17, 2022

Go can be faster at small, specific programs where memory lifetimes are deterministic and you can use value types. Otherwise, Java will beat every other managed language by a huge margin when it comes to GC-related workflows. Sure, it does so at higher memory usage, but that is a good tradeoff for many use-cases (especially server).

Benchmarks are hard to do right but this one does actually measure GC quite well: https://benchmarksgame-team.pages.debian.net/benchmarksgame/... (For managed languages. It is not really a fair comparison between non-GC and GC-languages)

So all in all, for bigger programs it is hard to do a good comparison, but there is exactly where JIT compilers shine and the memory tradeoff and the like brings their return.

skohan · on June 17, 2022

Isn't a lot of server code these days small programs that just pull data out of a database and feed it to an HHTPS response?

It seems like it's a pretty good option to have your infrastructure code implemented in a systems programming language like C or Rust (probably AWS or GCP is doing this for you), and just implement your business logic in Go as the type of small, well-defined programs you're talking about.

kaba0 · on June 17, 2022

But then why go? You can also implement that logic in a likely much more readable way in Python (as at that point performance doesn’t matter), or just write the whole thing in Java/C#/Scala/Kotlin whatever, which in my opinion are more expressive for business logic.

bheadmaster · on June 17, 2022

Go is much faster than Python, uses less memory, and compiles down to a statically-linked native binary, making containerization trivial. And (IMHO) it's even more readable than Python - nowdays Python code is as easily turned into an unreadable mess as Java or C# code. Just try reading Python standard library and Go standard library - the difference is monumental.

kaba0 · on June 17, 2022

We are talking about business logic. The infrastructure is already in a lower level language, so the performance is not a concern.

And we will have to disagree on C#/Java/Python being unreadable mess. In my experience all 3 can be written in a really well maintainable way. I don’t have much experience with Go, but out of these, I would vote for it as the least maintainable (as just because each line is trivial to understand, doesn’t make the whole program flow easy to read. Otherwise why not just write assembly, every line is even more trivial there)

bheadmaster · on June 17, 2022

> In my experience all 3 can be written in a really well maintainable way.

That's true, but that is generally true of any (non-toy) language. But in the modern world of rapid development, it matters how hard is it to write code in a non-maintainable way - i.e. how well it tolerates modifications by different people. And to me, it seems easier to write readable code in Go than it is to write unreadable code.

It seems to come from the lack of features - Java, Python and C# have too many features, and any problem can be solved in N different ways, each one with its own warts. If you want to work on a wide range of codebases, you have to know each one of the approaches and their warts and footguns.

Meanwhile, Go feels like it really reached the "there should be one obvious way to do it" ideal of Python, while Python has over the years evolved into something more Perl-like. Want to build a concurrent application? Chose your tradeoff - either you get CPU scalability (multiprocessing) but lose memory sharing, or you get a simple concurrent model (threading) that isn't scalable, or you get I/O scalability (asyncio) at the cost of function coloring, error-proneness and a single-threadedness. Go solved the whole thing with the goroutine model - internally it multiplexes coroutines onto a set of OS threads, but all blocking calls are wrapped by Go runtime which makes every coroutine behave and feel like an ordinary thread, without the massive memory use of OS threads.

pkolaczk · on June 17, 2022

The number of ways to write something is only very loosely correlated with maintainability. The ease of maintenance is IMHO more a function of how much information about the properties of the code you can easily read from the text, and how well the abstractions in the code map to the abstractions you'd use when describing the solution to a friend. Lack of features doesn't help in that regard. That's why probably Go has just added generics, despite the long tradition of Go promoters claiming "lack of generics is a good thing" ;)

Languages with very little type information, e.g. dynamic ones, tend to be quite hard to maintain, unless the original developers kept the discipline of good naming and verbose commenting. Go and Java with their somewhat static, but limited typing and elements of dynamism (interface{}, Object, reflection), sit somewhere in the middle between PHP/JS and Rust/Scala/Haskell.

Languages with little expressive / abstraction power, so the ones limited in features or low-level are also often hard to maintain, because you have to reverse-engineer the high-level stuff from all the details you see. Take assembly as an example - while it may be quite obvious what the program is doing at the bits and bytes level, understanding the sense of that bit-level manipulation may be a much harder task. The assembly language might actually be very simple, but that does not help. I remember when we had a MIPS class, the whole specs was just a few pages, could be learned in an hour.

kaba0 · on June 17, 2022

Could you expand a bit on why do you think Java has too many features? It is a very small language, that is often berated because it picks up features way too slowly if anything.

I would go as far to claim that Java is an easier language than Go, or at least in the same ballpark.

bheadmaster · on June 21, 2022

> Could you expand a bit on why do you think Java has too many features?

I think you're asking for technical details, but I'm afraid I don't know Java well enough to do an objective comparison. I'll try with a subjective explanation or why I think so.

I've learned Go in 20 minutes following the Go Tour. Few months later, I feel like there isn't a single thing I don't know about Go. It's dead simple. When I open a Go repository, it's easy for me to get into the codebase, as all code is more-or-less the same.

I've learned Java back in high school, and to this day I don't feel like I "know" the language. I've tried reading some Java repositories, and every time I feel like there's some kind of friction - some implicit knowledge about it that I just don't understand.

Maybe it's just me, and I haven't spent enough time learning Java. But then again, I've spent even less time learning Go, and yet I have a much easier time using it. That's what I mean by "a very small language".

skohan · on June 17, 2022

Performance is always a concern. For instance, if running a python interpreter introduces latency for each request, that can add up to perceptively worse performance when applied throughout a product.

pclmulqdq · on June 17, 2022

In my experience, Python is far less readable than languages like go. The information density and semantic whitespace of Python really hurts readability.

Sinidir · on June 17, 2022

How does the whitespace hurt readability when basically every styleguide and linter in other languages indents code blocks?

gilbetron · on June 17, 2022

You picked binary trees, which has java better than Go, I'm guessing something about the implementation.

If you look at other examples on that site, Go and Java are roughly the same, but with some variance:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

igouy · on June 17, 2022

> … guessing something about the implementation.

The source code is shown —

    binary-trees Java #7 program

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

    binary-trees Go #2 program

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

morelisp · on June 17, 2022

Alright, I guess I'll admit this just sniped me.

Having read the rules it's difficult to know what's considered "fair" for this test - all GC tuning is off the table, sure. But what's bugging me is "Leaf nodes must be the same as interior nodes - the same memory allocation." So what constitutes "the same memory allocation" - literally the exact same call to some opaque internal allocator? If so, shouldn't Java also have to disable JIT to be fair?

Let me offer an alternate interpretation: I will do the same memory allocation if I need to allocate a node, but if my language lets me not allocate a node yet still use that node why should I? Or an alternate argument if you don't like that one: Why must my "node" be `Tree`, rather than `*Tree`?

A central idiom of Go is that zero-values of a type can be useful; a two-line change, no new special-cases, no pooling or such gauche hacks:

    // Count the nodes in the given complete binary tree.
    func (t *Tree) Count() int {
     if t == nil {
      return 1
     }
     return 1 + t.Right.Count() + t.Left.Count()
    }

    // Create a complete binary tree of `depth` and return it as a pointer.
    func NewTree(depth int) *Tree {
     if depth > 0 {
      return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
     } else {
      return nil
     }
    }

I'm sure someone will tell me I "optimized away the work" - but in the end I believe I'm making exactly the same number of method calls on the same type of receiver. If that's not the work, what is?

igouy · on June 17, 2022

And then the Java etc programs are re-written to do the same and the test value is increased (to compensate for the reduced memory allocation) and we're back where we started?

Seems like moving the furniture around.

No doubt I've misunderstood.

morelisp · on June 18, 2022

A null value in Java has no methods / is not an object. I didn’t precompute anything.

igouy · on June 18, 2022

afaict you're making a distinction without a difference.

afaict the Java etc programs could be re-written using null where you used nil — please explain why you think that isn't correct.

morelisp · on June 18, 2022

You cannot call a method on null. Meaning:

- Practically, Java would rather have to `return 3` when it detects a null child, effectively precomputing the penultimate level.

- Semantically, Java could no longer distinguish between an absent child and a child with no children.

Honestly, I have other variants that still don't use pooling but are less idiomatic; I find this exercise is begging the question hard. Any tools, however idiomatic, the language is giving you to reduce the effects of allocation seem to be off-limits for GCd languages. Whereas then e.g. C can just throw them all in a third-party pool library. And JIT languages are presumably allowed to fuse anything they want.

igouy · on June 18, 2022

I think you just said the Java etc programs could be re-written in a similar way?

And we could add some more rules ("distinguish between an absent child and a child with no children") to reject those Java etc programs.

morelisp · on June 18, 2022

> I think you just said the Java etc programs could be re-written in a similar way?

I could rewrite the Go program to return 3 too, but I doubt you’d accept that.

I can only conclude the benchmarks game is bad-faith bullshit at this point.

igouy · on June 19, 2022

> … bad-faith bullshit…

Kudos for actually bothering to "read the rules".

None for ignoring what you read.

The answer to "… but if my language lets me not allocate a node yet still use that node why should I?" is — Because allocate a node is the basis of comparison with the other programs!

Change that for the Go programs and you change that for all the other programs; otherwise just special pleading for Go lang.

zinodaur · on June 17, 2022

Wow, I had no idea that java was so fast. One thing I like about go is that you can cram many thousands of concurrent requests into the same process. But it looks like java has some pretty robust async tools... So I bet you could do something similar

kaba0 · on June 17, 2022

The JVM is a real beast, which makes sense as a good chunk of the whole internet runs on top of it (almost every big corp has plenty of infrastructure running Java), so it had plenty of engineering time poured into it.

Regarding concurrency, I wouldn’t choose existing reactive frameworks and what not for a new system. Java will soon get Project Loom, which will introduce Go-like virtual threads - so that one can write a web server that spawns a new thread for each request as well. Since the Java ecosystem is very purely written almost exclusively in Java itself (no FFI), basically everything will turn automagically non-blocking.

moffkalast · on June 17, 2022

He said "easy to read" not "filled with weird syntax choices that make anyone from a C background barf". What the hell are those channel arrows and why do they point the wrong way.

bheadmaster · on June 17, 2022

Channel arrows are basically just read/write operations.

If `ch` is a channel, then this expression means "value obtained from reading from the channel":

    <-ch

And this expression means "write value x into the channel":

    ch <- x

Both expressions can be used as a case inside select statement:

    select {
    case val := <-ch:
        // ...
    case ch <- val:
        // ...
    }

Which will execute exactly one case, depending on which channel becomes "ready" first - channel is ready for reading if there is another goroutine blocked on a write operation, and ready for writing if there is another goroutine blocked on a read operation.

You say you're from C background - if you've ever worked will file descriptors you will notice that channels are basically userspace file descriptors. Channel reading and writing is isomorphic to read() and write() syscalls, and select keyword is isomorphic to select() syscall.

Hopefully this clears up the whole channel syntax thing. I just hope you weren't trolling.

morelisp · on June 17, 2022

> Channel reading and writing is isomorphic to read() and write() syscalls

This is a very bad mental model because channels operations cannot be canceled (without using select on two channels) or return any error status (at all).

bheadmaster · on June 17, 2022

While technically true, I don't really see the impact of the difference. It is idiomatic to use contexts for any kind of cancellation during channel operations, and it works well.

The syscall comparison was made to give intuition about general behavior of channels to someone with a background in C. Of course it's not completely identical.

morelisp · on June 17, 2022

The ability of read/write to communicate via errors as well as the actual data transferred is significant - there's a reason Go's i/o model is io.Reader/io.Writer and not chan []byte.

You might as well explain channels in terms of any blocking operation if the bar for "isomorphic" (now backtracked to "intuitively" I guess) is that low.

bheadmaster · on June 17, 2022

I don't know what's the bar for "isomorphism", but I know that the word literally means "same shape", so just because some nerd that got killed in a duel over a girl used the same word in a mathematical context doesn't give him dibs over its general use.

Unless, of course, system calls can be modeled as operators over sets. In which case, please tell me how.

morelisp · on June 18, 2022

In normal use of the term, a function's shape is its type signature; at the very least its in and out arity, and usually even more specific.

bheadmaster · on June 18, 2022

Well, in my case I used it to signify same behavior in terms of process/goroutine communication. I think it's "specific enough" to warrant the use of word "isomorphic".

moffkalast · on June 17, 2022

It was a rhetorical question.

bheadmaster · on June 19, 2022

So you were trolling. Bummer.

jakobnissen · on June 17, 2022

Rust is that if you keep the code simple without too many trait objects and macros and type magic. But then it becomes even more constrained than it actually is. It'll also be boilerplate heavy and hard to write.

Julia is fast, easy to read, and easy to write. But it's not easy to maintain. There is a direct tradeoff between dynamism on one hand making things easier to read/write and static enforceability on the other making it easier to maintain.

mcbuilder · on June 17, 2022

Haskell has an elegance, and can be written as simply as C. Usually fast enough, and can be optimized as well. The downside is that you will be sucked into a rabbit hole of academic type theory and wonder how best to express your system as a Free Monad instead of bashing it out like any sane C programmer. Just kidding, someone already figured out those hard parts for you, you just forgot to browse for it on Hackage.

bigDinosaur · on June 17, 2022

It's a fair criticism of Haskell that you can fairly easily blow up your time/space complexity without realising it though. I think in many ways it's better but Haskell specifically demands a lot even for a functional language.

pjmlp · on June 17, 2022

Pascal dialects, Modula-2 are of a similar age, while other like JOVIAL are a decade older but they did not come with a very big killer feature, an OS like UNIX.

Then there were BLISS, Mesa and PL/I, but the OSes that made use of them lost to UNIX, so.

With exception of Mac OS, written in Object Pascal and later ported to a mix of Object Pascal and C++.

Having said this, plenty of alternatives with AOT compilers exist nowadays.

The only thing C has going for it, is historical weight, UNIX/POSIX ecosystem, and some domains that are closed to any alternative suggestions, due to tooling or cargo cult against alternatives.

Banana699 · on June 17, 2022

Because "easy to read and maintain" is about humans, and "speed of C" is about machines, and there is a vast gulf between the 2 that always force you to compromise in one or the other to get the 2 together, and usually both.

Code being easier to read and maintain is a function of how close it is to human semantics. The more the algorithm is presented in terms and notations humans like and find familiar, the easier. Code being performant is a function of how close it is to machine semantics, the more the algorithm is presented as steps that the machine likes and finds familiar, the faster it will run, as the machine is doing less to execute each step.

There is a fundamental tension between the 2, even if compilation from high level languages might, at first glance, give us the illusion that we can have both. We can't, not in general. We can only do it for a class of human semantics that C++ folks call "Zero-Cost Abstractions", the set of abstractions that can be completely erased without a trace by the time you get to the executable.

But otherwise, there is a fundamental cost to making code more readable by humans: making it less readable by the machines that will execute it. This is a reflection of the fundamental alienness of computers, what they find quite easy you find quite hard and vice verca. Optimizing for huamans means generality and ruthless hiding of details, optimizing for machines is all about special cases and ruthless exploitation of assumptions.

(Incidentally, C is not all what it's cracked up to be. Generic containers, off the top of my head, resort to using void* pointers for data and function pointers for operation, which has a runtime cost besides being unsafe and error-prone. Templates in C++ can aggressively inline types and operations for you, on the other hand as if you haven't written generic code at all, no wonder templates is the poster boy for C++'s 0-Cost abstractions. Another example I hear often is how pointer semantics in C and C++ makes it extraordinarily difficult for the compiler to optimize array and memory operations, whereas a language like Fortran make it easier by not having pointers.)

Shorel · on June 17, 2022

For many things, I have found this easy language is C++.

I use JavaScript and C++ for different things, sometimes in the same day. (And python and PHP and others, but this is not relevant.)

Believe me, JavaScript can be a real head scratcher compared to C++.

And now for the purists: No, I don't use all features of C++, only the minimal necessary ones for the problem I have to solve. This ridiculous idea that you are not using C++ if you are not using every single language feature is what makes programs difficult to write and maintain.

foerbert · on June 17, 2022

Who says we don't? I think I'm past-due for my regular Ada plug, so uh, here it is.

xigoi · on June 17, 2022

We can.

https://nim-lang.org/

FeepingCreature · on June 17, 2022

D as well, though only (like Rust) if you keep the traits and template stuff low.

The nice thing about D for me is that you can generally banish the unreadable metaprogramming code to a library.

anfelor · on June 17, 2022

Ocaml 5 and Koka (https://koka-lang.github.io/koka/doc/index.html) can get quite close to the speed of C...

xigoi · on June 17, 2022

Unfortunately, Koka doesn't seem to be ready for general-purpose use…

mikewarot · on June 20, 2022

>Why can't we have a language easy to read and maintain but also have the speed of C?

Swap out { } for Begin End, and make a few other changes, and you've got Pascal. Single pass pascal compilers have been faster (at compiling) that almost anything out there since Turbo Pascal 3.0 for MS-DOS.

Modern versions, such as Free Pascal, Delphi and Lazarus also deal with strings in a manner that totally avoids needing to manually manage memory. The GUI builders are awesome as well.

killingtime74 · on June 17, 2022

Transpile to C? Zig or Rust?

bruce343434 · on June 17, 2022

You can't just "transpile to C" to get "C speed". C speed comes from low overhead. Naive transpilations will just include that overhead, like garbage collection or many layers of pointer indirection, but written in C. The unsavory answer is that you must use less abstractions if you want fast code. Compilers just aren't good enough to compile all the abstractions away.

pjmlp · on June 17, 2022

That includes C versus modern CPUs.

dahfizz · on June 17, 2022

The CPU is the CPU regardless of what language you are running on it. C will still give you the best performance on a modern CPU.

If anything, C has an even bigger advantage on modern CPUs because it has easier access to things like vectorize/SIMD intrinsics. It is also easier to tweak your data dependencies to help the branch predictor.

kaba0 · on June 17, 2022

> If anything, C has an even bigger advantage on modern CPUs because it has easier access to things like vectorize/SIMD intrinsics. It is also easier to tweak your data dependencies to help the branch predictor.

Have you seen the ridiculously complex optimizations that C compilers do to maybe turn some shitty for loop into vector instructions? C is terribly bad fit for this use case and hardware tries to get closer to C than the reverse.

C++, Rust but even C# and Java has much better SIMD support than C has.

EDIT: It really doesn’t help C (and some of the listed languages as well) that they are very imperative. SIMD is exactly the place where pureness and some form of FP is much better at allowing these kind of optimizations (a map of a pure function can be “trivially” optimized into vector instructions, while it is really hard to decide whether this for loop is safe to convert, and it is really up to the heuristics of the compiler. A bit of rearrangement can cause a failure to optimize, resulting in a huge drop in performance)

pjmlp · on June 17, 2022

Nope, those are language extensions not defined by ISO C.

Any language can have such intrinsics as extensions, for example D, Rust, C++, Swift, .NET, Java (as preview).

skohan · on June 17, 2022

Yeah I think Zig in particular is trying to be just exactly a "better C".

I don't know if transpiling will get you there, because for instance if you're transpiling a dynamic language, you're going to have to output C that is essentially emulating all those dynamic language features, so it might be faster than say, the original Python, but it's not going to be as fast as a pure C implementation.

mhaberl · on June 17, 2022

> Transpile to C?

If you transpile something to C it does't mean it will be fast. You can write slow C code (or transpile something to C that will be slow). The compilers are not the issue here.

gpderetta · on June 17, 2022

In fact it might be slower than actually targeting a proper compiler intermediate language.

thorncorona · on June 17, 2022

Julia is that for a lot of academia + mathematics based code.

meheleventyone · on June 17, 2022

corford · on June 18, 2022

We do. It's called D

corford · on July 1, 2022

Or VLang: https://news.ycombinator.com/item?id=31944777

jayp1418 · on June 17, 2022

Ada Programming Language

enriquto · on June 17, 2022

> Why can't we have a language easy to read and maintain but also have the speed of C?

Fortran is fine. Also lua (using the luajit interpreter you get really close to C speed) and julia (except for the atrocious startup time).

rboes · on June 18, 2022

I did that assignment. Did you go to uw?

forinti · on June 16, 2022

On a 3GHz CPU, one clock cycle is enough time for light to travel only 10cm.

If you hold up a sign with, say, a multiplication, a CPU will produce the result before light reaches a person a few metres away.

dragontamer · on June 16, 2022

> If you hold up a sign with, say, a multiplication, a CPU will produce the result before light reaches a person a few metres away.

The latency on multiplication (register input to register output) is 5-clock ticks, and many computers are 4GHz or 5GHz these days.

5-clock cycles at 5GHz is 1ns, which is 30-centimeters of light travel.

If we include L1 cache read and L1 cache write, IIRC its 4 clock cycles for read + 4 more for the write. So 13 clock ticks, which is almost 70 centimeters.

------------

DDR4 read and L1 cache write will add 50 nanoseconds (~250 cycles) of delay, and we're up to 13 meters.

And now you know why cache exists, otherwise computers will be waiting on DDR4 RAM all day, rather than doing work.

moonchild · on June 16, 2022

> The latency on multiplication (register input to register output) is 5-clock ticks

3

https://www.agner.org/optimize/instruction_tables.pdf

jiggawatts · on June 16, 2022

Back in the days, an integer division took something like 46 clocks (original Pentium), and now on Ice Lake it's just 12 with a reciprocal throughput of 6. Multiply that by the clock speed increase and a modern CPU can "do division" about 300-400 times faster than a Pentium could. Then multiply that by the number of cores available now versus just one core, and that increases to about 2000 times faster!

I used to play 3D games on Pentium-based machines and I thought of them as a "huge upgrade" from 486, which in turn were a huge upgrade from 286, etc...

Now, people with Ice Lake CPUs in their laptops and servers complain that things are slow.

bouke · on June 17, 2022

And things are slow as we waste all that processing power on running javascript one way or another. And everything requires a slow blocking connection to the mainframe. Nowadays the “always connected” mindset is really slowing us down.

onion2k · on June 17, 2022

That explains why Electron apps and Web pages are slow but the post you're replying to is about games...

bouke · on June 17, 2022

Ok fair enough, but the mindset is spreading: json (javascript) parsing is what caused GTA Online loading times to balloon and I dread playing Call of Duty online as it wants to download and install dozens of GBs every time I launch it.

wruza · on June 17, 2022

It wasn’t json parsing per se, but a buggy roll-your-own implementation that used a function (sscanf iirc) with a surprising nontrivial complexity on a long string. Fun part is, if they just outsourced that load to javascript and its JSON.parse, they’d never encounter that exponential slowdown. Javascript is a nice target to blame, but it isn’t the problem. CPUs got hundreds of times faster, javascript only divides it by N, which stays low and constant (at least) through decades. Do you really believe that if browsers only supported MSVC++-based DLLs without scripting, sites would run faster? That would be naive.

jiggawatts · on June 17, 2022

You may be thinking of GTA Online, but the point is still valid.

fennecfoxy · on June 17, 2022

Games do use JS, some UIs actually render using a webview (Unity etc is capable of this)

jcranberry · on June 17, 2022

I didn't see anyone complaining about games in the parent comments.

fennecfoxy · on June 17, 2022

Afaik much JS these days is optimised, see things like V8's turbofan https://v8.dev/blog/turbofan-jit

However there is definitely still less intrinsic optimisation from a dev perspective I think - people will iterate over the same array multiple times in different places rather than do it once.

I guess our industry has decided moving faster is better than running faster for a lot of stuff.

taneq · on June 17, 2022

Part of the reason computers seem slower than they did is that most programs (and most programmers) only use one of those cores. Most of the reason, though, is that programmers buy new computers, and also that programmers only optimise code that’s slow on their computer.

jnordwick · on June 16, 2022

Those insrtruction latencies are in addition to the pipeline created latency. (They are actually the number of cycles added to the dependency chain specifically). The mult port has a small pipeline itself of 3 stages (that why 3 cycles latency). Intel has a 5 stage pipeline so the minimum latency is going to be 8 for just those two things.

gpderetta · on June 17, 2022

I don't understand what you are trying to say. The dependency chain length is what is normally intended as instruction latency.

Also the pipeline length is certainly not 5 stages but more like 20-30.

jnordwick · on June 17, 2022

Sorry, I dropped all the 1s in that message when i typed it (laptop keyboard is a little sketchy right now). That should have been 15 and 18. I think the recent Intel microarchs take 14 plus howewever long at uopd decoding that minimum 1, so 15-20 or close to that.

> The dependency chain length is what is normally intended as instruction latency.

Yes, the way I read the original post and others was that you actually your response back in 3 cycles, which isn't correct. It doesn't get comitted for a while (but following instructions can use the result even if it hasn't been committed yet). You're not getting a result in less than 20 cycles basically.

gpderetta · on June 17, 2022

> Sorry, I dropped all the 1s in that message when i typed it

It makes sense now! :D

> You're not getting a result in less than 20 cycles basically

But the end of the pipeline is an arbitrary point. It will take a few more cycles to get to L1 (when it makes it out of the write buffer), a few tens more to traverse the L2 and L3 and hundreds to get to RAM (if it gets there at all). If it has to get to an human it will take thousands of cycles to get through the various busses to a screen or similar.

The only reasonably useful latency measure is what it takes for the value to be ready to be consumed by the next instruction, which is indeed 3-5 cycles depending to the specific microarchitecture.

jnordwick · on June 19, 2022

> which is indeed 3-5 cycles depending to the specific microarchitecture

I assume you are talking about from fetch to hitting the store buffer? That would be the aabsolute min time before the data could be seen elsehwere I would think. It can still potentially be rolled back, and that would be higher than reciprocal be way too fast to sustain, but for a single instr burst, I'm not sure. So much happens at the same time, An L1 read hit will cost you 4 minimum, hut all but 1 of that is hidden. can't avoid the multi cost of 3 or add 1. the decoding and uop cache hit, reservation, etc will cost a few. I have no idea.

If you know of anything describing it in such detail, I would be comopletely curiouis.

reacharavindh · on June 17, 2022

This reminds me of that “todo” I wrote for myself a long time ago. These days processors come with bigger L1,L2, and L3 caches. Would it be possible for a program that works on a tiny bit of data(few KB) to load it all up in the cache and provide ultimate response times?!

Are there any directives to the Operating System to say - “here keep this data in the fastest accessible L[1,2,3] please”?

2143 · on June 17, 2022

> Are there any directives to the Operating System to say - “here keep this data in the fastest accessible L[1,2,3] please”?

I'm probably the worst person to explain this.

Long long ago, I took a parallel programming class in grad school.

It turns out the conventional way to do matrix multiplication results in plenty of cache misses.

However, if you carefully tweak the order of the loops and do certain minor modifications — I forget the details — you could substantially increase the cache hits and make matrix multiplication go noticeably faster on benchmarks.

Some random details that may be relevant:

* When the processor loads a single number M[x][y], it sort of loads in the adjacent numbers as well. You need to take advantage of this.

* Something about row-major/column-major array is an important detail.

What I'm trying to say is, it is possible to indirectly optimize cache hits by careful manual hand tweaking. I don't know if there's a general automagic way to do this though.

This probably wasn't very useful, but I'm just putting it out there. Maybe more knowledgeable folks can explain this better.

matthias509 · on June 17, 2022

I’m guessing you are thinking of cache lines where CPUs will read/write data in 64 byte chunks from memory to/from cache.

As you said, being aware of this lets you optimize away cache misses by controlling the memory access pattern.

diroussel · on June 17, 2022

There is a data access strategy called cache oblivious algorithms which aim to make it more likely to utilise this property without knowing the actual cache size.

I used that approach once on a batch job that read two multi megabytes files to produce a multigigabyte output file. It gave a massive speed up on at 32-bit intel machine.

https://en.wikipedia.org/wiki/Cache-oblivious_algorithm

dragontamer · on June 17, 2022

> Are there any directives to the Operating System to say - “here keep this data in the fastest accessible L[1,2,3] please”?

Not for general purpose programs, because L1 caches change so quickly each year there is no point.

For embedded real-time processors, yes. For GPUs, yes. (OpenCL __local, CUDA __shared__).

This is because Microsoft's DirectX platform guarantees 32kB or something of __shared__ / tiled memory, so all GPU providers who want a DirectX11 certification are guaranteed to have that cache-like memory that programmers can rely upon. When DirectX12 or DirectX13 comes about, the new minimum specifications are published and all graphics programmers can then take advantage of it.

-------

No sane Linux/Windows programmer however would want these kinds of guarantees for normal CPU programs, outside of very strict realtime settings (at which point, you can rely upon the hardware being constant). Linux/Windows are designed as general purpose OSes.

DirectX 9 / 10 / 11 / 12 however, is willing to tie itself to the "GPUs of the time", and includes such specifications.

l33t2328 · on June 17, 2022

I don’t think you can generally control the cache with such granularity since modern processors do all sorts of instruction level parallelism and cache coherency voodoo

deredede · on June 17, 2022

On CPUs you can't really force data to stay on the cache, but if you access it frequently and there is not too much load, it will stay there anyways.

Some architectures (e.g. GPUs) provide local "scratchpad" memories instead of (or in addition to) caches. These are separate uninitialized adressable memory region with similar access times to a L2/L1 cache.

erwincoumans · on June 17, 2022

Explicit control over the fastest memory is what GPU local storage or the PlayStation 3 Cell SPUs allow/require.

For x86_64 there are cache hints, no pinning/reserving parts of the caches (as far ad I know).

I wonder if Apple M1 or M2 cpu with unified CPU/GPU memory has anything like pinning or explicit cache control?

jcranberry · on June 17, 2022

If the data is contiguous in memory and frequently accessed it will almost certainly make its way into L1 cache and be there for the life of the program.

If the data is not contiguous it could make the CPU's life much harder.

There's also the matter of program size (the amount of instructions in the actual program) and whether the program does anything which forces it to go lower cache levels or RAM.

There are intrinsics for software prefetching such as __mm_prefetch, but those are difficult to use such that they actually increase you're performance.

jodrellblank · on June 17, 2022

One of the claimed advantages of the array programming language "K" is that the interpreter is small enough for the hot paths to stay in the CPU cache. It's hard to Google but claims come from people/places like this thread: https://news.ycombinator.com/item?id=15908394

generalizations · on June 16, 2022

I ran across an animation once that showed graphically the time it takes light to travel between the planets and the sun. It's weird, but light doesn't seem that fast anymore.

ineedasername · on June 16, 2022

The speed of light has really not kept pace with Moore's Law. Engineers have focused overly much on clock speed and transistor density and completely ignored C, and it's really beginning to show.

bombcar · on June 16, 2022

People still using C as if C++ isn’t available for faster-than-light code.

aneeshnl · on June 17, 2022

Its still just 1 m/s improvement. Not worth it.

ben_w · on June 17, 2022

Only in metric units. If you’re using natural units, it’s twice as fast as c.

gilbetron · on June 17, 2022

dammit that took me a second to get

orzig · on June 16, 2022

Another consequence of our society’s reduced investment in fundamental physics - instead we go ether.

AnimalMuppet · on June 16, 2022

I recently read a science fiction short story on reddit, where humans had developed faster-than-light communication because they needed to reduce lag in networked games.

scarmig · on June 16, 2022

Perhaps the solution is more on the bioengineering side of things: make smaller people so they can fit in smaller rooms.

Scalene2 · on June 17, 2022

So, there's actually a proposal for "real-time" communication between galaxies. Just upload your consciousness to a computer and run it slow enough that a few hundred million years feels like a couple seconds.

gpderetta · on June 17, 2022

Slow time is a thing on a few Egan books were the population (of effectively immortal post humans) of some planets collectively decide to slow their internal clocks to allow some of their members to take interstellar trips without missing out too much of their original life.

The trips themselves of course consist on transmitting the mind to be downloaded on a new body on the other side.

RaoulP · on June 17, 2022

I don’t really see the benefit. But running that slow, you might experience the heat death of the universe in your lifetime!

inkblotuniverse · on June 17, 2022

The idea is that you've already dismantled the stars and stockpiled all the energy in the universe, to use at your leisure. (i.e. everyone lives around black holes they can throw mass into to reap the bawking radiation). Since you have control over the last non-entropic systems, you can tune how fast the candle burns.

And you're a mind running on a computer! You were already going to experience the heat death! This is just a scheme to get the most subjective time out of it as possible. (Running slower is more efficient.)

_dain_ · on June 17, 2022

Bay Area, 1960s: "Soon we will have computers the size of a single room"

Bay Area, 2020s: "Soon we will have rooms the size of a single computer"

ben_w · on June 17, 2022

"Vertical sleep pod, extra large (room for iPad on ceiling!), $3500/month"

ruined · on June 17, 2022

this has been tried before. the vast majority of the smaller people that get made simply keep getting bigger until they become normal size.

necovek · on June 17, 2022

We need to make them make even smaller people before they grow (up? :)!

ineedasername · on June 16, 2022

That would have a nice side benefit or making space exploration much more economical. Faster too if you can bioengineer higher G resistance at the same time. Maybe someday the outer planets will be colonized by tiny humans measured in millimeters, with 125mm humans darting around the various moons shot out of repurposed tank canons, all laughing at the slow giants stuck down the gravity well on Earth.

hermitcrab · on June 17, 2022

Their tiny brains might be an issue for space exploration. But ideal for setting up a government when they arrive.

taneq · on June 17, 2022

Or just normal sized humans simulated on a supercomputer the size of a matchbox…

fendy3002 · on June 17, 2022

Then how will tinder be? "If you're less than 10 cm don't talk to me midget".

OTOH can the heart perform equally good if human get miniaturized?

Scalene2 · on June 17, 2022

This is one reason an all female astronaut team has been considered.

somenameforme · on June 17, 2022

The much more remarkable thing is to consider that that speed of light is also the speed of causality itself. It takes light from the sun about 8 minutes to reach Earth. If the sun suddenly disappeared, we'd still see it shining brightly in the sky, and the Earth would continue revolving around it - all for another 8 minutes until reality finally caught up to us. So we're already computing at a rate on the verge of the speed of reality itself.

It's interesting to consider this paired against how technologically primitive we ostensibly must be, given that digital computers didn't even exist 90 years ago.

CMCDragonkai · on June 17, 2022

Is the speed of light really the speed of causality? Would the effect of gravity (the lack of) affect Earth earlier than us perceiving the lack of light.

d110af5ccf · on June 17, 2022

Nope. All physical effects are bounded by the speed of light. (As far as anyone knows, anyway.)

The only weird one is quantum entanglement, but even then information transfer doesn't travel faster than light and that's about all I know on that subject.

rl3 · on June 17, 2022

>The only weird one is quantum entanglement, but even then information transfer doesn't travel faster than light ...

For anyone wondering why that is, I found this explanation super interesting:

https://www.forbes.com/sites/startswithabang/2020/01/02/no-w...

drewtato · on June 17, 2022

I also liked this explanation: https://www.youtube.com/watch?v=v7jctqKsUMA

In short, the only information you gain is about the outcome of the other side's measurement. They cannot introduce information into the particle, and thus can't transmit information. The only thing you learn is what you already knew: the other particle had a 50% chance of being in one state or another, with the added fact that it's now correlated to your particle with a 75% (depending on experiment) chance. This is information that didn't exist until that moment, so it couldn't have been sent out and reach you before you measure your particle, which would inform you with 75% certainty how your particle would act and break causality.

dorgo · on June 17, 2022

> All physical effects are bounded by the speed of light.

The universe expands faster than light. So this expansion is not a physical effect?

orestarod · on June 20, 2022

The universe expanding is a property of space itself. And "faster than light" is only because miniscule space expansion in any quantum of space adds up with distance, e.g. 1 picometer per kilometer per second, add enough trillions of kilometers and you get that faster than light expansion. There is no mass, particle or information moving THROUGH space faster than the speed of light, which is what the said limit concerns.

saberience · on June 17, 2022

Why does gravity travel at the speed of light?

rieTohgh6 · on June 17, 2022

Well the other way to look at it is: gravity travels at maximum possible speed in this universe. Light in vacuum can also reach same maximum speed. I guess we would say the light travels with speed of gravity if we measured them in other order.

e63f67dd-065b · on June 17, 2022

The speed of light is the speed of information propagation. Information about the sun disappearing would only reach earth at after c/1AU.

HWR_14 · on June 17, 2022

The change in gravitational force also propagates at the speed of light. It's weird.

edbaskerville · on June 16, 2022

The thing that did for me is realizing that people on opposite sides of the United States can't play music together if it requires any rhythmic coordination, even with a true speed-of-light signal with no other sources of latency.

aeonik · on June 16, 2022

jamtaba and ninjam are an open source solutions to this problem.

They allow you to buffer everyone's playing, at a user specified interval, then replays the last measure of music to everyone.

It's definitely not the same as live playing, but it's still pretty fun, and actually forces you to get creative on different ways.

https://jamtaba-music-web-site.appspot.com/

edbaskerville · on June 17, 2022

Interesting! Presumably last 12 measures if you're playing a 12-bar blues, etc.

Big downside is you're stuck playing to a metronome, which would be enough for me to skip it, but it depends on the kind of music you're playing.

I could imagine that if the music is rhythmically slow and vague and improvised, big latencies are OK, and actually might yield some pretty interesting creative results.

Another model I've thought about is to structure players in a rooted DAG, and players can hear only people upstream of them.

E.g., you could build an orchestra by having a conductor and section leaders in a room together (or at within very low latency of each other). Other players could hear the leaders and play along, and then an audience could hear everyone. You could also do something more complicated like build things out in linear or power-of-2 layers, where each layer can hear everything upstream of it, and therefore many players would get a partial sense of the orchestral effect.

This could work nicely for improvised music, too, with causality preserved.

Aeolun · on June 17, 2022

How does that work for the one playing ahead of everyone else? He just doesn’t hear anything? Or he hears his own music from 1 second ago? Or worse, other people’s music from 1 second ago.

rocqua · on June 17, 2022

I think everyone hears everyone else playing x milliseconds ago? Where x is probably decided based on the beat.

The weird side-effects are why I guess the parent said that "it's still pretty fun, and actually forces you to get creative on different ways."

dizhn · on June 17, 2022

You have to think in bars not seconds.I believe you hear your own playing in near real time and is overlayed on the bars coming from others.

It works pretty well with well structured music like the blues. You probably couldn't play well if the piece was changing tempo or key all the time.

sklopi · on June 17, 2022

You could have a metronome in the middle

ZoomZoomZoom · on June 17, 2022

It is actually the way to go with client-server programs, such as Jamulus. People from distant locations try to chose/run a server closer to their geographical (or, more properly, with a correction on how the fiber runs) middlepoint.

shultays · on June 16, 2022

I feel like it is more like we cant comprehend how big and empty space is

booleandilemma · on June 18, 2022

I saw that exact animation you're talking about and my first thought was "there's no way that's the fastest thing in the universe."

generalizations · on June 17, 2022

Edit: found it: https://www.youtube.com/watch?v=nQUwHdSAhmw

capableweb · on June 17, 2022

If you're using a normal monitor, the bottleneck would be transferring the results of the calculation to the monitor, which commonly have a latency of 3ms or more. So when the monitor displays the calculations, the CPU has already moved on to other things :)

drewtato · on June 17, 2022

It takes 10-20ms for the pixels to transition on an LCD display. And on 60Hz, it's 8+/-8ms for the monitor to actually address the row with your information. Luckily, the CPU doesn't need to wait for the monitor. And the slowest part of the chain will almost always be getting it from your eyes to your hands (250ms+).

diroussel · on June 16, 2022

That is quite an amazing way to put it.

So the processor in my hand can compute a multiplication fast than light can cross the room?

jiggawatts · on June 16, 2022

It can complete many multiplications in that time, especially if you factor in parallelism. An 8-core machine using AVX-512 could do a few thousand 32-bit multiplications in that time. Your GPU can do tens of thousands, maybe hundreds of thousands depending on the model.

bombcar · on June 16, 2022

Or another way to look at it - the computer can do an absolute insane amount of math in the time it takes to roundtrip a single byte to the datacenter in US-West.

jiggawatts · on June 16, 2022

The more generic way I like to put it is that throughput has been improving exponentially for decades thanks to Moore's law, but latency hasn't changed much at all and has a hard limit due to the speed of light.

Hence the ratio between latency and compute has been changing exponentially. Even a linear or quadratic change would be dramatic, but exponential is something people just can't wrap their heads around. They're unable to really internalise it, in much the same way that in the early days of COVID people couldn't quite fathom how it is possible to go from 3-5 cases per day to tens of thousands.

HDD random I/O latencies are about 10 to 100x slower than a network hope. These days local SSD latencies are about 100x better than a typical network hop and this is just going to keep going. It'll soon be 1,000x better, then 10,000x, etc...

Any architecture using "remote storage" or "remote database calls" will be absolutely hamstrung by this. It'll be the equivalent of throwing away 99.99% or even 99.999% of the available performance.

People will eventually wise up to this and start switching over to distributed databases that run in the same VM/container as the application tier. So instead of "N" web servers talking to "M" database servers, it'll be N+M nodes with both components deployed into them.

Whatever argument can be made against this new architecture will become exponentially invalidated over time. Putting everything together is "too many GB of software to deploy"? Bzzt... we'll have 1 TB ram soon in typical servers. The CPU load of both together is too high? Bzzt.. the next EPYC CPUs will likely have 128 cores! Cache thrashing a problem? Bzzt... 1 GB and larger L3/L4 CPU caches are just around the corner.

mellavora · on June 17, 2022

> slower than a network hope.

Sometimes typos contain wisdom

Aeolun · on June 17, 2022

Millions of multiplications in the time it takes me to open the ‘calc’ app.

jiggawatts · on June 17, 2022

Billions.

Your estimate is off by 3 orders of magnitude.

Aeolun · on June 17, 2022

You underestimate the speed at which I open the app ;)

wruza · on June 17, 2022

Not so extraordinary if you take into account that CPU is essentially a thumb-sized labyrinth for light.

orestarod · on June 20, 2022

Not light. It's electric potential that moves around. No visible photons involved.

dekhn · on June 16, 2022

Eventually you can't stuff enough computing in a small area (power density). Therefore you have to connect multiple CPUs spread out in space. The limit for many supercomputers is about how long it takes light or electrical signals to travel about 20 meters. Latency to first result is only part of the measurement that matters.

denzil · on June 17, 2022

This comparison reminds me of Grace Hopper explaining how long nanosecond is: https://www.youtube.com/watch?v=9eyFDBPk4Yw

grishka · on June 16, 2022

Huh, but then I'm pretty sure that there are some paths inside the CPU die that are long enough that speed of light is a consideration at these frequencies. Must require a lot of smart people to design these things, yet it only takes a bunch of junior developers to bog them down.

orestarod · on June 20, 2022

There is indeed the speed of propagation of electric potential taken into account, that is, how long it takes for the input of a logical gate or a logical subsystem to produce the output (that involves the propagation of electric potential through the chip's conductors). If your clock is too fast for the size of your subsystem, the result will not be correct at the output before the next cycle begins, so your system will just be bogus.

google234123 · on June 16, 2022

I say we ban all junior devs.

morelisp · on June 16, 2022

Easier to raise the speed of light.