What scientists must know about hardware to write fast code (2020) (viralinstruction.com)
226 points by goerz on Oct 3, 2023 | 74 comments



In college (for a time) I was a double major in CS and Physics. I found a job as a programmer at a Physics lab, which fit my interests very well. The previous person roughly showed me the ropes for just a few days before she left to go to grad school.

The PI started asking me to run some analyses on a raw dataset. Since I was so new at it, I often messed up and had to rerun the whole thing after looking at the output; this was painful because the entire script took a few hours to run.

I started poking around to see whether it could be optimized at all. The raw data was divided up into hundreds of files from different runs, sensors, etc., that were each processed independently in sequence, and the results were all combined together into a big array for the final result. Seems reasonable enough.

Except this code was all written by scientists, and the combination was done in the "naive" way - after each of the data files was processed, a new array was created, the previous results were copied into the new array, and so were the results from the current data file. This meant that for the iterations at the end, we roughly needed Memory = 2 * Size of final data, which eventually exceeded the amount of physical memory on the machine (and because there were so many data files, it was doing this allocation and copying dozens of times after it had used all the RAM).

I updated this to pre-allocate the required size at the beginning for a very very easy 3-4 fold improvement in the overall runtime and felt rather proud of myself.
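
Roughly, the two patterns looked like the following. This is just a minimal sketch in Julia (the article's language; the actual code was in a different scientific language, and `process` is a made-up stand-in for the per-file analysis):

    # Naive combination: each vcat allocates a fresh array and copies
    # everything accumulated so far, so peak memory approaches twice the
    # size of the final result near the end of the loop.
    function combine_naive(files)
        results = Float64[]
        for f in files
            results = vcat(results, process(f))
        end
        return results
    end

    # Pre-allocated: one allocation up front (assuming equal-sized chunks),
    # then copy each file's results into its slot.
    function combine_preallocated(files, chunk_len)
        results = Vector{Float64}(undef, chunk_len * length(files))
        for (i, f) in enumerate(files)
            results[(i-1)*chunk_len+1 : i*chunk_len] .= process(f)
        end
        return results
    end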


Yeah, back in college I worked with a Biochemistry grad student on a group project that involved some coding (I was Computer Engineering). To iterate over a matrix, he used three nested loops with an if-statement to switch between rows and columns. Technically it worked, but it was wildly inefficient, and he was proud of it...

To his credit once I (as nicely as possible) showed him how to do it with two nested for-loops he clearly felt stupid and conceded the point. He was otherwise a very smart guy and good to work with, but goes to show how we can take our training for granted. Even freshman-level stuff goes over the heads of PhDs, and I'm sure the same would be true if I were to drop into a biochem lab.


Similar story - a PI had written some code to map from (row, column) indices of the upper triangle of a matrix (made somewhat tricky by excluding the main diagonal) to a linear index. He used a for loop to start from the beginning and count up, for an O(n^2) algorithm - I was able to give him an O(1) formula that does the same thing, for a rather dramatic speedup.
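
For the curious, one standard closed-form mapping looks like this (a sketch in Julia, not necessarily the exact formula we used; it assumes 1-based (i, j) with i < j):

    # Map the strict upper-triangle entry (i, j), i < j, of an n×n matrix to
    # a linear index 1..n(n-1)/2, counting row by row and skipping the diagonal.
    upper_tri_index(n, i, j) = (i - 1) * n - (i * (i - 1)) ÷ 2 + (j - i)

    # For n = 4: (1,2)→1, (1,3)→2, (1,4)→3, (2,3)→4, (2,4)→5, (3,4)→6,
    # replacing the loop that counted up from the first pair on every call.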


I ended up needing this so often for graph processing, and for values which might be inexact if using floating point, that I saved the formula in a blog post. https://vladfeinberg.com/2020/03/07/subset-isomorphism.html

The formula can be "oblivious" to the final size of the matrix too, which is helpful if you're doing some sparse ML training on edges (e.g., GNNs).


During my masters thesis in a chemistry lab, I got a side task to look at a data analysis script and make it run faster. It was a "C/C++" code (i.e. procedural C-style code using C++ stdlib for convenience) that read a file line by line and then fed it to a slow processing function, then aggregated the results. It took over a day to run.

Without even looking at the processing function, which I considered some sciency science, I set up pthreads and mutexes on the result array and such to reap almost perfectly linear scaling. So far, so good.

Then I ran a profiler to see what was actually taking so long.

... Uh, why are you spending all this time copying strings back and forth?

Turns out they passed all strings by value. Sprinkling in a few const & here and there got a 1000-fold speedup or such. I felt pretty stupid for my multithreading antics after that.


Having continuity from your previous analysis is a feature though. You can load up one of your objects, and a good library should have the exact parameters used to generate it stored right there.

Also, the HDF5 ("H5") data format[0] has been a godsend for scientific computing, due to its ability to inherently make sense of how to store your data. You can have your previous results carried over into your new analysis without doubling your data.

0: https://en.wikipedia.org/wiki/Hierarchical_Data_Format
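
For example, with Julia's HDF5.jl something like this works (a minimal sketch; `h5write`/`h5read` are the convenience functions, but the file layout and parameter names here are made up):

    using HDF5

    results = rand(1000)                              # stand-in analysis output
    h5write("analysis.h5", "run42/results", results)
    h5write("analysis.h5", "run42/threshold", 0.05)   # keep the parameters with the data

    # Later: reload exactly what was computed, and with which parameters,
    # without re-running anything or duplicating the raw data.
    prev = h5read("analysis.h5", "run42/results")
    thr  = h5read("analysis.h5", "run42/threshold")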


Could you have achieved the same thing by just calling delete/free() on the old arrays after copying them? I suck at manually allocating memory; I've always worked in garbage-collected languages where this wouldn't really be an issue.


No, it was actually in an (obscure, scientific) garbage-collected language. The syntax was roughly: `allOrbitFiles = allOrbitFiles + currentOrbitFiles`.

I believe what was roughly happening under the hood was: 1. Allocate an array `tmp` of size `length of allOrbitFiles` + `length of currentOrbitFiles`. 2. Copy data from `allOrbitFiles` over to `tmp`. 3. Copy data from `currentOrbitFiles` to `tmp`. 4. Reassign `allOrbitFiles` to the new array `tmp`. 5. Garbage collect the old `allOrbitFiles`.

So the doubling of memory usage comes after Step 1. I would imagine (but don't know for sure) that this would actually occur in any garbage collected language I'm familiar with as well (Java, Python, Javascript).


Yes, it would happen in JS/Python/etc. as well once (2 * allOrbitFiles) > Memory. Cool fix! I never think about my arrays being larger than memory, haha.


Solid post. It also shows how powerful Julia is: allowing you to operate at different levels of abstraction (down to seeing the assembly) using the same set of tools.


I feel the most misleading statements around Julia are that for true speed you can somehow ignore the lower abstractions, or that there is some kind of free lunch, when in practice what you gain in performance you'll spend in development time. Julia has some neat tricks, but they are not generally and universally applicable, at least not like other languages. I dunno. These arguments against Julia are many, but I'm still appalled they have so many hand-wavy misleading statements in just their introductory text; I don't think it will be less than a decade before they recover.


There are different levels of performance to target, though - a _basic_ (no SIMD, parallelization, etc.) `for` loop can easily be as fast as a C++ version. More performance can be had from both languages, of course. In my experience, the Julia versions offer easier mechanisms to take the code from _basic_ fast to _advanced_ fast. For many, _basic_ fast is fast enough. And when it matters, you can go a bit deeper.
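
To make that concrete, here is a minimal sketch (function names are mine) of _basic_ vs slightly more _advanced_ fast: the plain loop already compiles to a tight native loop, and opting in to `@inbounds`/`@simd` is a one-line change rather than a rewrite in another language.

    # "Basic fast": a plain loop over the array.
    function sumsq_basic(x)
        s = zero(eltype(x))
        for v in x
            s += v * v
        end
        return s
    end

    # "Advanced fast": skip bounds checks and allow the reduction to be
    # reassociated, so LLVM can vectorize it with SIMD instructions.
    function sumsq_simd(x)
        s = zero(eltype(x))
        @inbounds @simd for i in eachindex(x)
            s += x[i] * x[i]
        end
        return s
    end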

A good example: there was recently a thread on the Julia discourse comparing Julia and Mojo. Julia, using no external libraries (compared to 7 with Mojo), implemented a simpler, faster, and cleaner version of the Mojo code that was used to showcase how fast Mojo was: https://discourse.julialang.org/t/julia-mojo-mandelbrot-benc.... Then, further still, folks were able to optimize for even more speed with various abstractions that let Julia take more advantage of the hardware.

That's the promise I think Julia makes and delivers on - you can write incredibly "fast" code simply and cleanly. Yes, you can have a higher standard of "fast" which requires a bit more advanced knowledge, but I'd argue that Julia still offers the cleanest/simplest way to take advantage of those micro-optimizations.


The speedups exist in other languages too, like data.table in R, or Polars / JAX / NumPy in Python, for example.


And exactly that is one of the things Julia was designed for: you don't need two languages to get the performance. This allows you to have even more flexibility - you can write allocation-free, copy-free, SIMD code from top to bottom, because it is a single language. That is not possible in other languages, because you have the high-level/low-level language or library distinction. And if you do manage it (JAX?), then you have made what Julia already is, a runtime-compiled language, but with its own set of problems.


How do you define "abstraction" then? I don't see how it is beneficial to obfuscate at which level things are happening.


For abstractions think: automated memory management, rich class/object libraries, idiomatic libraries, compilers that recognize and vectorize common usage patterns, runtime error handling, dynamic typing, etc.

That's not obfuscation, it's faster code development, easier-to-read code, and simpler maintainability.

Everything at a higher level than the stack, the manual heap, processor instructions, registers, explicit addressing modes, direct I/O, and networking primitives.

To have all that help, but still be able to drop down to the lowest level, in one consistent toolset is really nice for development and reliable sharing.

(Only a Julia fan at a distance! Not had the pleasure.)


I guess this sounds nice, but I wouldn't want to be told that I should just learn assembler, or that if I know assembler I'd have to put up with whatever high-level memory management scheme I'd have to work around, etc...


I don't think that's right. Well, you're right that some enthusiasts of Julia are too quick to say "fast as C, easy as Python" without appending an asterisk to that statement. You can't get really fast performance without paying any attention to the hardware.

Performance is on a spectrum, and usually a tradeoff against readability and conciseness. I think it IS true that Julia excels in that it gives, by far, the best expressibility/performance tradeoff.

Also, often, you really can get a "free lunch" - there are many times where, if you just do the obvious thing, Julia and its LLVM backend can optimise it to extremely efficient code. Just summing an array with a for loop, for example.


To me, that's the big hand-wavy thing: if you just do the straightforward, obvious thing it's fast, but most real-world problems aren't straightforward and obvious and therefore turn into something more complex. And now you've got a big can of worms because it was only quick in the unspecial case. But now you have no real easy way to get to your special case working correctly without learning more and spending more time just like any other language, or switching to a lower level.


> But now you have no real easy way to get to your special case working correctly without learning more and spending more time just like any other language

That's a big hand-wavy thing right there: "spending more time" is not a binary, and "just like any other language" ignores the massive differences in how much time you have to spend, what resources are available to you from the language, and how easy the ecosystem makes it.


At every level of abstraction there is a lot of learning in that context, but now you demand it across all levels simultaneously.


My experience is a first implementation / novice programmer will write Julia code of a similar speed to python. But then an intermediate Julia programmer can adapt that same code to be of order C / Fortran performance, without stopping the novice programmer from being able to work on the code base. So it's this ability to iteratively improve and collaborate across very different skill sets that's really important.

This is quite a different situation to traditional scientific computing.


> My experience is a first implementation / novice programmer will write Julia code of a similar speed to python.

No: because of JIT compilation they would write code faster than Python by default. Now, to truly rival optimized C++ code, one has to do the tricks mentioned in this post, like optimizing memory access, SIMD, and maximizing instruction parallelism.

The key point is you are better off by default and can do some ugly stuff in the critical parts of the code while still using the same language.


It really depends on what you end up doing. A lot of "python" code is really just a thin wrapper around fairly optimized C routines (although there are inefficiencies from the wrapper and from the optimization barriers), so if you're doing something where the bulk of the work happens inside those routines, beginner-written julia code will end up being roughly comparable to expert-written numpy code or whatever.

But yeah if you're writing a loop or something else where the majority of work is actually being done by python itself, then it's going to typically be much much slower than the equivalent julia code.


> beginner-written julia code will end up being roughly comparable to expert-written numpy code or whatever

This is not possible by definition, or it is a misunderstanding of where and how performance occurs. If it is possible, then it is just as easy to perform worse if the beginner steps to either side of the happy path, or if their problem doesn't fit the preconceived optimizations; at that point it is no longer a "language" but some kind of "library". I think Julia should be seen as a library and not a language, because a language is not comparable in this way that Julia likes to handwave away as magic.


This was the premise of Lisp Machines; unfortunately the industry took another path.


Was it? I thought even in the lisp machine days, lisp was kind of garbage collected, so that was always the bottleneck.

I remember a quote that was like “Lisp programmers know the value of everything and the cost of nothing” in reference to that.


> “Lisp programmers know the value of everything and the cost of nothing”

Obviously the developers of Lisp Machine operating systems could not ignore the cost of the operations. Especially since they developed ambitious software (an operating system and its application) on relatively slow machines (a Symbolics 3600 was as fast as a 1 MIPS DEC VAX 11/780).


Yes, and?

GC was a kernel service, and there were low level primitives, including Assembly level Lisp forms.

Parentheses all the way down to microcode.


Ditto for Forth.


I don't see affordances for operating at multiple levels of abstraction. The single example of another level is a ccall to an LLVM intrinsic - that's not any different from inline assembly in basically any other compiled language. Supporting multiple levels would mean you can do all (or most of) the same things with LLVM IR that you can do with Julia itself.


The site simply told me I was using the wrong browser. How great of an environment can it be if it can't render webpages for everyone?


Your comment must be more about the environment of the web browser you are using?

The site is a statically published version of a Pluto notebook, which uses modern web features to enable interactivity, reactivity, code syntax highlighting, etc. There are tradeoffs to enabling those features, and it requires certain browser features to be turned on. The underlying file that the notebook is based on is just a basic `.jl` file, so you could happily run the notebook from a Julia instance instead of the browser-based notebook environment.

Julia itself will be happy to run however you'd like it to of course.


I'm sure that is true but I'm just expressing my experience knowing nothing about Julia or Pluto.

I thought I was visiting a website.


Do you mind mentioning what browser it is? It works fine on Firefox, and I presume Chromium-based browsers too (since this article has gone through a few rounds and I haven't heard anyone else complain, though some of that was before it became this "notebook" version and was a static page instead). Is it one of those instances where you have Javascript disabled and the page assumes your browser is just not capable of JS instead of telling you to turn it on?


I was on an ancient version of Safari that I use for browsing. I completely understand why it wouldn't work. I just feel that information published on the web should degrade to a useful state for consumption.

The HN description actually referred to nicely layered technical abstractions, which is why I had clicked the link, and to Julia being Lisp-like. Thank you for taking an interest. I will go see if a newer Safari works.


I don't know what is used to render this post, but the table of contents as a floating icon would work better closer to the bottom left on mobile, because there is a scrollbar floating on top of it at the right that makes it hard to tap, and also because the eyes mostly look at the top of the screen.


It is quite refreshing to see software optimization be explained so simply and elegantly.


Related:

What scientists must know about hardware to write fast code (2020) - https://news.ycombinator.com/item?id=29601342 - Dec 2021 (29 comments)


FYI the underlying link in that previous discussion post seems to be defunct and kind of suspicious.


Ok, I've disabled the link at the top and posted https://news.ycombinator.com/item?id=37759249. Thanks!


Looks like biojulia.net got taken over by spammers :(


Yeah. The single person who ran the BioJulia website disappeared (as people sometimes do in open-source volunteer work), and no-one else had permissions to the website or the domain. Oops.

We moved the website to https://biojulia.dev/, with permissions given to more people, including a core dev of Julia. That should reduce the risk of this happening again.


Having been asked to port CERN C++ code to the Mac, I can tell you that some scientists don’t know or even care about performance.

For those folks, getting the output they need is much more important than the CPU cycles - as it should be.

As a C++ programmer, I posed the question as to why they don’t hire coders to do this for them. The answer was cost, which rather surprised me given the cost of the LHC.


This is not true. Maybe a PhD student doesn't care much (or doesn't know), but we care deeply about software performance at CERN. I've worked myself on optimizations in detector simulation and data analysis software (Geant4 and ROOT) for a few years. Later in this decade, when HL-LHC comes online, the only way to be able to cope with the 10x increase in data rate from experiments and a matching increase in simulation requirements will be to optimize as much as we can the software we have, because we will not have the money to just buy 10x the hardware we have now.


It came via Bristol University. Make of that what you will.


Just to complement what I said above, here's a link to a presentation I made a couple of years ago about performance optimizations in simulation: https://indico.cern.ch/event/1052654/contributions/4521602/

We also have meetings dedicated to performance, some of which are not public, but this series from ROOT is: https://indico.cern.ch/category/14122/ If you search above, you will see many discussions about performance. The CI for ROOT also has a set of benchmarks to catch regressions, and Geant4 has two systems to track performance, a CI job checking every merge request, which I've set up myself (not publicly accessible), and a more complex system to track performance run by FNAL: https://g4cpt.fnal.gov/

These are just some examples from the projects I've worked on. There are also efforts to port stuff to GPUs and HPCs, and many other projects like event generators that are also undergoing performance work for HL-LHC. If you Google you can probably find a lot more stuff than what I already mentioned. Cheers,


Adding onto multithreading: other parallelization models such as OpenMP or OmpSs take sequential code and parallelize it. They delegate the efficient execution of the code to a runtime system, to achieve a balance between programmer productivity and code performance.

But for large problems the article falls short. Scientific applications may need to use several computers at a time. COMP Superscalar (COMPSs) is a task-based programming model which aims to ease the development of applications for distributed infrastructures. COMPSs programmers do not need to deal with the typical duties of parallelization and distribution, such as thread creation and synchronization, data distribution, messaging, or fault tolerance. Instead, the model is based on sequential programming, which makes it appealing to users who either lack parallel programming expertise or are looking for better programmability. Other popular frameworks such as Legion offer a lower-level interface.


That's a good writeup with a lot of general knowledge on program optimization. It might get a bit dense at times with the details of x86 assembly, but I suppose it might be worth it if performance is important enough that understanding e.g. data dependencies between subsequent instructions pays off.

A minor detail I find a bit confusing, though, is explaining the potential benefits of SMT/hyperthreading with an example where threads are spending some of their time idle (or sleeping).

I don't know Julia so I don't know if sleep is implemented with busy-waiting or something there, but generally if a thread is put to sleep, the thread gets blocked from being run until the timer expires or the sleep is interrupted. The operating system doesn't schedule the blocked thread for running on the CPU in the first place, so a thread that's sleeping is not sharing a CPU core with another thread that's being executed.

So the example does not finish 8 jobs almost as fast as 4 or 1 jobs using 4 cores due to SMT; it's rather that half of the time each of the threads is not even being scheduled for running. A total of eight concurrent jobs/threads works out to approximately four of them being eligible to run at a time, matching the four physical cores available.

If there are only four concurrent jobs/threads, each sleeping half of the time, you end up not utilizing the four cores fully because on average two of the cores will be idle with no thread scheduled.

AFAIK SMT should only really be beneficial in cases of stalls due to CPU internal reasons such as cache misses or branch mispredictions, not in cases of threads being blocked for I/O (or sleeping).

The post is of course correct in that the example computation benefits from a higher number of concurrent jobs because of each thread being blocked half of the time. However, that's unrelated to SMT.

Considering how meticulous and detailed the post generally is, I think it would make sense to more clearly separate SMT from the benefits of multithreading in case of partially I/O-bound work.


Author here. You are right - in this case Julia's scheduler would only run half the threads. The example is poor and I will find another.

Thanks for the heads up!


Is there a similar aggregation website example for C++?


Most of this applies to C++, or any other compiled language.


This subject is taught in undergrad computer architecture courses along with machine coding. As an EE, I learned it in grad school.


Congratulations. Studying one field means you know that field.

This link is not meant for you. It is meant for a scientist, and most scientists do not also have an EE degree or CS degree.

How much graduate level biology, oceanography, physics, geology, chemistry, meteorology, or other scientific field do you know?

All of those have subfields where computational performance is important. My experience is scientists are more likely to pick up the software skills than EEs are willing to pick up the science background. (In part because scientific software development generally pays less well than commercial software development.)


Math was always a must for a scientist; today, computer science is also a must. The study programmes should reflect that.


"Math" is such a wide topic that you certainly must qualify your statement.

The standard entomologist curriculum does not require calculus, while a physics curriculum does. Both produce scientists. (For example, https://cals.cornell.edu/education/degrees-programs/entomolo... under "Major Requirements" says "One semester of college statistics or biometry", and the listed physics requirement doesn't require calculus.)

On the other hand, an entomologist interested in population ecology may need to know differential equations.

Your use of "study program" suggests your experience is at the undergrad level, and not at the grad school level, which is how most scientists I know got their training.

At the undergrad level the study programs do reflect what's needed for a solid education. If a student is interested in computational biology, that program will emphasize taking more CS courses than the program for a student interested in marine biology.

But at the grad level, the "study program" is much less formalized. You might take graduate level classes the first couple of years, but then you are expected to pick up the missing bits on your own.

Once you have your PhD and are a working scientist, you rarely have the luxury of following any study program.

And if you've been a scientist for 20 years, any CS training you had likely did not cover SIMD, and emphasized practices which are no longer relevant. (For example, the link points out "That advice [about HDDs] is mostly outdated today [with SSDs]".)

Those latter categories are who the linked-to piece is for, not undergrads in a well-defined study program.


The article basically implies that some non-professional coder will be doing assembly, essentially doing the work of an optimizing compiler. I think the point of the parent is that if you are at this point already, and you're in an academic setting, you might as well read a full computer architecture textbook front to back.

I would be curious to know, of all the "scientific coders", what percentage of them understood the entire article. I'd be similarly curious how much of it your typical "bootcamp" developers would understand. I know everything presented, so it basically comes off as lecture notes for someone that already knows it. For someone that doesn't understand SIMD, CPU fundamentals, assembly, and compilers, I'd imagine their eyes would glaze over right when the assembly code appeared.

And while SSDs are MUCH FASTER than HDDs, the basics of interacting with storage are the same; it's just that rather than waiting a million years for data to arrive, from the CPU's perspective, it arrives in tens of thousands of years.

Latency numbers all programmers should be aware of:

https://gist.github.com/jboner/2841832


I am a professional coder and my eyes glaze over with assembly. Still, I don't think there's that much assembly. What I saw was to show how the code is implemented, with very little about "doing assembly" outside of a simple example.

I can't judge background - I don't have a sense of who uses Julia, and I've been programming for too long, without exposure to the target audience.

Since you mentioned "academic setting", I'll point out there are also scientists-who-program in industrial settings. However, none of the ones I know about use Julia.

My belief is that most scientists-who-program aren't going to read text books from other fields. They are under pressure to produce NOW, and don't think it's worth the time to acquire an entirely new mindset. Instead, I think this sort of knowledge transfer is by jerks and fits, as someone figures out an optimization, and passes it along, with domain-specific context that makes it easier for others in the field to understand.

Which means, like you, I don't think this notebook will be all that useful, though in my case that's because I think it's too generic.

> what percentage of them understood the entire article

I don't think that's a telling metric. Only some scientific coders are interested in writing fast code (vs. fast-enough code), and only some of those use Julia.


I agree with this sentiment; the majority of CS people are telling statisticians that a lot of Julia remains a kind of snake oil or otherwise mystical thinking, and it is very unfortunate. Even in the first page of the documentation "No need to vectorize code for performance; devectorized code is fast" is some kind of category error redefinition of how programming languages work in my opinion.


> Even in the first page of the documentation "No need to vectorize code for performance; devectorized code is fast" is some kind of category error redefinition of how programming languages work in my opinion.

Can you elaborate a bit? I don't really get what you are trying to say.


If the code can easily be vectorized, then there is potential for it to be vectorized incorrectly, or there is some hidden automation happening. If they're just saying their non-vectorized operations are just as quick, then how quick could true vectorization be? Also, this is how Octave, NumPy, Matlab, R, etc. work: they make vectorized math operations happen on whole matrices using statements that look like simple non-vector operations. Further, usually when people are having these kinds of issues it's because they started with a non-parallelizable concept of their problem and are now trying to redo it... and no amount of magic is going to fix a bad concept of the problem space.


I think the problem here is that there are two very different meanings of "vectorized" at play. The first (and what the Julia docs are talking about here) is the pattern of writing "vector operations" (i.e. rather than writing a loop, writing an expression that works across an entire array). The second meaning is using SIMD instructions (e.g. AVX2). What the Julia docs are trying to say is that unlike in languages like Python/R/Matlab, where loops have a high overhead due to an interpreter, in Julia loops are fast because the language is compiled. There are lots of algorithms that are easy to express in an iterative fashion but are pretty much impossible to vectorize efficiently (dataflow analysis, differential equations, etc).

The docs aren't trying to talk about SIMD instructions at all here (although Julia/LLVM is pretty good at producing SIMD instructions from loops where possible).
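
A concrete sketch of the first meaning, loop style vs whole-array style (names here are mine): an exponential moving average has a loop-carried dependency, so it is awkward to express as whole-array "vectorized" operations, but as a plain loop in Julia it compiles to tight native code with no per-iteration interpreter overhead.

    function ema(x::Vector{Float64}, a::Float64)
        y = similar(x)
        y[1] = x[1]
        for i in 2:length(x)
            y[i] = a * x[i] + (1 - a) * y[i - 1]  # depends on the previous output
        end
        return y
    end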


But no one in science uses Python loops, for example; they use NumPy / JAX / Polars etc., so that is an unfair and disingenuous comparison.


That's exactly the point. People are changing their coding style to work around the fact that loops in the language are slow.


This existed quite a while before Julia, so how is it that Julia claims this as a differentiator? "We pride ourselves on not using the slow thing you're probably not actually using."


Because Julia claims to get usability and performance in one language, rather than the two Python needs (Python for the user-facing API, C/C++/Fortran/etc. for performance).

This should also help with optimization, and a larger amount of code can be optimized together, while (C)Python can only optimize up to the Python/C boundary.


I can't imagine anything worse than something that is not appropriate for the abstraction... eg https://en.m.wikipedia.org/wiki/Leaky_abstraction

Again, these things are being redefined here... If the language has tools for all of: inline raw chip-specific instructions, compiler-optimized versions of those instructions, virtualized and then optimized instructions, a JIT compiler, and also high-level interpreter operations, then how can it be in any way an encompassing system, and how can that be coherent among all levels without requiring someone to know all levels? At which point, I'm going to say no thanks and stick to tools at their respective levels of abstraction, where I can reason about them being coherent, rather than a dynamically allocated string which is sometimes tied to only working on amd64 because someone wanted an assembler way of pattern matching for some reason.

Maybe within the scope of scientific computing this seems logical, if you just restrict yourself to matrix multiplication or whatever, but I don't see how that makes a "language" that can be coherent, composable, etc...

I've seen a ton of examples over the years of "stunt-driven" technological advances that people thought would be interesting but turned out to be the wrong solution for the wrong problem, but none that claim to break the laws of physics and reason with their evangelism more than Julia. Even this article starts with "you must understand the quirks", and I hope that isn't a goal for their design, because they are also claiming that I shouldn't have to know the quirks, so which is it?


> Leaky_abstraction

Ummm, CPython is also a leaky abstraction. Parts of the C implementation leak through: garbage collection, id(), 'a is b' checks, the ast and dis modules, and more.

It even has the beginnings of JIT support.

The leaky abstraction thesis is that all layers leak.

Julia's argument is that if you have all of these levels anyway, do it in one language instead of two. If you don't like leaky abstractions, you should prefer a system with one less layer of abstraction.

You also reject Rust, yes? It has many of the same abilities.

And JITed Lisp implementations with user access to the JITted code?

> then how can it be in any way an encompassing system and how can that be coherent among all levels without requiring someone to know all levels

That sounds like an argument from incredulity.

Just because you don't see how something can be true, that doesn't mean it isn't true.

> which is sometimes tied to only working on amd64 because someone wanted an assembler way of pattern matching for some reason

I believe all of the big C compiler vendors support ways to embed assembly. I use it in my code, for better support for x86-64, and a fallback for other platforms.

I also used Turbo Pascal's inline assembly in the early 1990s.

> but none that claim to break the laws of physics and reason with their evangelism more than Julia

I guess you're too young to remember Lisp evangelists.

You seem to be reacting to something beyond what is in the linked-to essay. What breaks the laws of physics? Again, appealing to gut instinct isn't that good of an argument.


> Just because you don't see how something can be true, that doesn't mean it isn't true.

This is also a non-answer, and I don't mean to be flippant, but if you have any further justification I'd love to read it. It is the core of the argument you're handwaving away.


I know little about Julia. I do know that advocates for Lisp also make similar claims, so I don't see it as that exceptional.

Not surprisingly, Julia draws on Lisp's macro abilities to achieve similar goals. Julia is also influenced by Dylan, another ALGOL-like Lisp variant.

If Julia does do what you say you don't believe it can, how would you learn that you were wrong?


While I could certainly put most of this together from my undergrad CS education, I would not say this "subject [was] taught" to me during undergrad. Instead, as with much of undergrad, you get pieces of it along the way - but collecting it together and writing for a somewhat-lay audience has a lot of value. This is also more up to date than my undergrad education from ~14 years ago! It has clear explanations for things like hyperthreading, which existed at the time I was in undergrad but hadn't really made its way into the curriculum yet.


Well, apparently I offended a lot of people by merely providing a piece of information about computer architecture curriculum. I apologize for commenting.


With no context about your emotional state, it came off as a bit brusque, and I think most people read comments like that in a snarky, condescending tone.

If you had prepended the comment with something like "I love this topic!" to show enthusiasm or approval, you probably would have gotten a much different response.


Not everyone does.



