
> most languages don't even have a specification.



Yes! This is much underappreciated.

Usually, it’s just some core calculus of the language that is rigorously specified and the rest is hand waving.

There are some exceptions like JS.

But even Java has the problem that if you just implement what’s in the spec, you won’t be able to run anything meaningful unless you also do things exactly how the JDK does. You can find out what the JDK does by reading its code and writing test cases, and I think that’s what folks do if they want to be compatible.

UB and memory safety are orthogonal. If we specified formally and super rigorously that a pointer is an integer and that memory is an array of bytes, we could have a UB-free language but memory safety would still be on fire.
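To make that concrete, here is a minimal sketch of such a language's memory model, assuming a hypothetical toy machine (the names `store`, `load`, `MEM_SIZE` and the buffer/flag layout are invented for illustration). Every operation has exactly one defined result, so there is no UB, yet an off-by-one write still silently corrupts the neighbouring variable.

```python
# Hypothetical toy machine: memory is one flat array of bytes and a
# pointer is just an integer index into it. Nothing here is undefined.
MEM_SIZE = 16


def store(mem: bytearray, ptr: int, value: int) -> None:
    """Write one byte. Fully defined: the address wraps modulo the memory size."""
    mem[ptr % len(mem)] = value & 0xFF


def load(mem: bytearray, ptr: int) -> int:
    """Read one byte. Fully defined for every integer address."""
    return mem[ptr % len(mem)]


def demo() -> int:
    mem = bytearray(MEM_SIZE)
    buf = 0       # assumed layout: a 4-byte buffer at addresses 0..3
    is_admin = 4  # assumed layout: a 1-byte flag placed right after the buffer
    store(mem, is_admin, 0)

    # Off-by-one bug: writes 5 bytes into the 4-byte buffer.
    for i in range(5):
        store(mem, buf + i, 0x41)

    # The semantics say precisely what happened: the flag was overwritten.
    return load(mem, is_admin)


if __name__ == "__main__":
    print(demo())
```

The overflow is perfectly well specified, and it is still exactly the kind of bug that memory safety is supposed to rule out.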


> If we specified formally and super rigorously that a pointer is an integer and that memory is an array of bytes, we could have a UB-free language

That's PVI (Provenance Via Integers) and it's a performance disaster. If anything in memory might be pointed to, almost all the nice modern optimisations aren't correct. It is really popular with a certain kind of "Portable assembler" programmer, who typically has no idea how the machine actually works, nor how their language is defined but is very confident the nonsense they're writing ought to do what they wanted it to do.

So, the "bad" news is that you can't have this, your compiler vendor won't make it, and the "good" news is that you'd have hated it anyway which is why they won't make it.
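A rough sketch of the optimisation problem, again on a hypothetical flat-memory machine (all names here are made up for illustration): if any integer can be used as a pointer, a compiler cannot fold the final load to a constant, because the intervening store might alias it.

```python
# Hypothetical flat-memory machine: pointers are plain integers.
def store(mem: bytearray, ptr: int, value: int) -> None:
    mem[ptr % len(mem)] = value & 0xFF


def load(mem: bytearray, ptr: int) -> int:
    return mem[ptr % len(mem)]


def original(mem: bytearray, x_addr: int, p: int) -> int:
    store(mem, x_addr, 1)   # x = 1
    store(mem, p, 2)        # *p = 2, where p is an arbitrary integer
    return load(mem, x_addr)


def folded(mem: bytearray, x_addr: int, p: int) -> int:
    # What an optimiser would like to emit: keep x in a register and
    # return the constant it just stored.
    store(mem, x_addr, 1)
    store(mem, p, 2)
    return 1


if __name__ == "__main__":
    # Same result when the addresses differ...
    print(original(bytearray(8), 3, 5), folded(bytearray(8), 3, 5))
    # ...but different results when p happens to equal x's address.
    print(original(bytearray(8), 3, 3), folded(bytearray(8), 3, 3))
```

Under these semantics the folded version is only valid if the compiler can prove `p` never equals `x_addr`, which is hard when `p` can be any integer.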


I don't think it is the same as PVI, because i) there is still a lot of non-determinism allowed, and thus available to exploit, and ii) the specification would have to require only observational equivalence anyway, because otherwise every optimization would be invalid. It should definitely be possible to define a very precise machine without actually mandating PVI.


Everything is memory safe if your only datatype is uint8_t /s


Yup :-)


The issue isn't that C++'s specification has an issue. The issue is that the same issue resulted in no CVE for any C++ vendor. And most languages tend to have one de facto reference implementation, whereas C and C++ are quite unique in having 2-4 mainstream ones in regular use (& the C++ frontend is ridiculously complex). Java, Python and C# are the only other mainstream languages with a formal spec, and only Python, that I know of, maybe has alternate frontends (there are multiple runtime implementations for C# and Java, but I don't believe the language -> bytecode part is different).

JS is maybe closer on this front, but it's also quite old & also a mess. A lot of development has shifted to TS as the language for those reasons, & TS only has 1 frontend & no formal spec.


There are at least two Java bytecode compilers. Though javac is obviously the "reference", there is also ecj. It's used primarily by IDEs and editor plugins (like Eclipse, from whence it came, and the Red Hat Java plugin for VS Code).

Still, if memory serves, there have been a handful of cases where ecj's implementation of the spec differed from javac's, with resulting fixes in javac itself (though I don't have sources at hand, so perhaps I misremember).


That is an even more grievous deficiency than an inadequate spec, but it hardly means that a bad spec is good.



