Hacker News
Bytecode (mecheye.net)
125 points by acqq on Nov 13, 2013 | 36 comments



Interestingly enough, you don't send Bitcoin to addresses. You send Bitcoin to a script of your design, and whoever can provide an input which causes that script to evaluate true is allowed to spend those coins.

This was originally meant to allow you to come up with all sorts of fantastic transaction types, such as coins which need M of N signatures in order to be released, or a Kickstarter clause that only releases the coins if a certain amount has been pledged.

But after the scripts had caused enough security vulnerabilities, they were severely restricted. Clients will now only accept transactions that include one of the standard scripts.
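For the curious, even the surviving "standard" scripts are still tiny programs. A pay-to-pubkey-hash output looks roughly like this (the spender has to supply a signature and public key that make it evaluate to true), and the M-of-N multisig case the parent mentions is barely longer:

    OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG

    2 <pubKey1> <pubKey2> <pubKey3> 3 OP_CHECKMULTISIG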


Yeah, this makes me sad. I hope that someone takes a hard look at the bytecode and repairs the situation. Transactions as programs is kinda brilliant.


I would say that Java is a more common bytecode than ACPI. Sure, if you just consider desktops or laptops, ACPI will be more common, but feature phones all generally include a Java VM, and the basebands on smartphones generally do too. For that matter, SIM cards themselves include a Java Card interpreter, which interprets a stripped-down version of JVM bytecode. Your phone may contain two or three different variants of the JVM just to make a phone call.

Furthermore, exactly what you consider to be a bytecode vs. a machine language can be a bit of an open question. After all, Intel CPUs don't actually execute the x86 instruction set directly; they execute microcode which translates the instruction set into the actual micro-operations that the CPU executes. So you could say that x86 is the ultimate bytecode. And hey, for a while Mac OS X had PowerPC emulation support, and before that System 7 had 68k emulation support. On the other hand, people have implemented Java in hardware, so today's bytecode may become tomorrow's machine language, and vice versa.

edit: Can whoever downvoted please provide an explanation? This comment is on topic and polite; if you disagree, please explain, as I would be interested to know why. If there's something that's incorrect, a correction would be appreciated.


I don't really care which bytecode is the "most popular". I just thought it would be cool to introduce people to some specified bytecodes running inside subsystems that you probably never thought about before.


Jasper that was a fabulous start to the morning. A morsel of hacker news on hacker news. Thanks.

Would be interested in your thoughts about code generation -

I'm writing a VM, playing with ideas. I have wondered at this as an approach to software development: whenever you have a significant task to do, first build a virtual machine. Then create bytecode to satisfy your application.

You can have a rich instruction set to meet your needs - writing performant or hardware-oriented features in C, but getting easy access to them through your upstream high-level language. Highly portable, no library dependencies.
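(For concreteness, the dispatch core of such a VM can be very small. Here's a minimal sketch in C, with a tiny made-up instruction set rather than anything resembling a real design:)

    #include <stdint.h>
    #include <stdio.h>

    /* A made-up, minimal instruction set, purely for illustration. */
    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const uint8_t *code) {
        int64_t stack[256];
        int sp = 0;                                   /* next free stack slot */
        for (size_t pc = 0; ; ) {
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];                 break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp];         break;
            case OP_PRINT: printf("%lld\n", (long long)stack[--sp]); break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* push 2; push 3; add; print */
        const uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);
        return 0;
    }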

I'm fine at hand-editing bytecode, but code generation from a high-level language is still a mystery to me. I want to find a notation that gives me enough power to deal with high-level concepts, but for which it is easy to write a compiler to bytecode.

Current options in mind: Scheme (lots of resources, but might be too complicated - can tail recursion be done simply? adequate GC?); Forth; some subset of C; something fancy with OMeta.

Or could I just write Scheme functions to output machine code, and build my application logic in macros on top of that? This bypasses the need for a conventional compiler.


I like the idea of scheme functions to output machine code. If you know scheme, you're probably already familiar with this, but since you say code generation is a mystery, let me recommend SICP (http://mitpress.mit.edu/sicp/full-text/book/book.html).


I scratch around at SICP every few months, and generally get stuck because it assumes a lot of mathematics knowledge that I don't have. But I started to watch the MIT lectures just last weekend. I'll keep at it; sounds like I'm on the right track. Thanks :)


To be taken seriously, it will have to target JavaScript eventually, of course. Perhaps you might as well go there directly and just get it over with?


I don't really think it's that important either, and I thought it was an interesting article that described some bytecodes people might not be familiar with, but you had started your article with "What is the most commonly used bytecode language in the world?" and I was responding to that question.

I suspect that it's the JVM, due to its ubiquity in phones, and thought it might be worth pointing out since most people don't know that they have a JVM subset running on their SIM card. We really do have miniature bytecode interpreters running everywhere, and vulnerabilities in them can lead to security issues that allow your SIM card itself to be rooted [1].

[1]: http://www.extremetech.com/computing/161870-the-humble-sim-c...


It's fascinating to learn about all of the hidden bytecode implementations scattered about, and a bit alarming that some are so complicated. As for most common, when counting up the tally for the JVM, you must include all Blu-ray players too. The interactive menus on a Blu-ray disc are implemented via a Java program on the disc.


The BIOS in old Sun machines used to run some kind of Forth interpreter which would run code from your expansion cards to initialize them (not sure if that was meant to make this cross-platform/cross-architecture).

I vaguely recall trying to get an old SPARCstation to boot and figuring out how to work the Forth shell (which is similar to what Grub is now) -- http://en.wikipedia.org/wiki/Open_Firmware

That was one of the first "small factor" pizza box machines: http://en.wikipedia.org/wiki/Pizza_box_form_factor


Macs used this for years as well. You could boot into the Forth interpreter and do interesting things.

Amusingly, many of the older models with Open Firmware had no display drivers in the interpreter, so while you could start it, you had to talk to it through a serial port rather than using your keyboard and screen.


The newer models had display drivers. You could write small graphical programs for them, like an animated version of the Towers of Hanoi: http://www.kernelthread.com/projects/hanoi/html/macprom-gfx.... I remember running that on my iBook.


IIRC it was still in use right up until they switched to Intel processors. I never realized that was Forth!


RAR archives also contain a virtual machine used to implement custom compression filters: http://blog.cmpxchg8b.com/2012/09/fun-with-constrained-progr...


WinRAR could also display ANSI color fonts. :D


This is related to Meredith Patterson's talk at the 28th CCC, "The Science of Insecurity":

https://www.youtube.com/watch?v=3kEfedtQVOY

> Why is the overwhelming majority of common networked software still not secure, despite all effort to the contrary? Why is it almost certain to get exploited so long as attackers can craft its inputs? Why is it the case that no amount of effort seems to be enough to fix software that must speak certain protocols?

> The answer to these questions is that for many protocols and services currently in use on the Internet, the problem of recognizing and validating their "good", expected inputs from bad ones is either not well-posed or is undecidable (i. e., no algorithm can exist to solve it in the general case), which means that their implementations cannot even be comprehensively tested, let alone automatically checked for weaknesses or correctness. The designers' desire for more functionality has made these protocols effectively unsecurable.


Although interesting, that talk highly exaggerates its claims. There is certainly a strong correlation between the power exposed to file formats and both the likelihood of bugs and their exploitability, and reducing that power is certainly a good idea, but such protocols are far from "effectively unsecurable". It's certainly possible to create a safe bytecode parser and even formally prove it correct with automated tools. And while length fields are easier to get wrong than simpler formats, that's mostly because C integer and pointer computations are so easy to mess up; the problems could be solved effectively, with little overhead, by using bigints and checked pointers inside parsers - a matter of engineering, not computer science.
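To make the "checked" part concrete, here's a rough sketch (a hypothetical length-prefixed record, not any particular format) where the declared length is validated against the buffer bounds before any pointer is formed from it:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical record: 4-byte little-endian length, then payload. */
    int parse_record(const uint8_t *buf, size_t buf_len,
                     const uint8_t **payload, size_t *payload_len) {
        if (buf_len < 4)
            return -1;                    /* header doesn't even fit */
        uint32_t len = (uint32_t)buf[0]
                     | (uint32_t)buf[1] << 8
                     | (uint32_t)buf[2] << 16
                     | (uint32_t)buf[3] << 24;
        if (len > buf_len - 4)
            return -1;                    /* declared length exceeds the data we have */
        *payload = buf + 4;
        *payload_len = len;
        return 0;
    }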



Yeah, sorry about that! Didn't expect this post to take down my blog. I'm working on getting it back up.

EDIT: OK, it should be back up now!


In the article you claim that "the BSDs" use the Intel reference ACPI implementation. OpenBSD wrote their own, as is their wont. This is cool because it's the only free implementation that I know of that is independent of the Intel one.

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/acpi/


Didn't FreeBSD also implement their own?



When you have a bytecode, you have programs that run on it. I understand how these are a good thing in fonts or in PDF, but what is running in this ACPI machine the author described?


Motherboards provide drivers in ACPI bytecode for things like sleeping, waking, changing processor speed, changing backlight, etc.
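Concretely, the firmware ships AML methods with standardized names that the OS invokes: _PTS ("prepare to sleep"), _WAK (wake), _PSS (processor performance states), _BCM (set backlight level), and so on. Decompiled to ASL, a vendor's backlight method might look roughly like this (the field name is made up here; real tables vary wildly):

    Method (_BCM, 1, NotSerialized)   // OS calls this to set the backlight level
    {
        Store (Arg0, BRTL)            // BRTL: illustrative embedded-controller field
    }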


The kernel. On Linux, for instance, this

    $ ps aux | grep acpi
turns up the following kernel process on my machine:

    root 663 0.0 0.0 0 0 ? S< Oct29 0:00 [ktpacpid]


I think the question is the reverse: not what program implements the VM, but which programs run on it and what are they doing? At least that is what I'm asking.


Regarding ACPI specifically, not everyone uses the Intel-developed one: Microsoft supposedly implemented their own, as did OpenBSD (see section 3.2 onward of this: http://www.openbsd.org/papers/zzz.pdf).

ACPI is notoriously broken in many places; the OpenBSD devs frequently had to do a lot of "bug for bug" hacks to talk to the hardware exactly how Windows does, in order for things to work.


My understanding is that the Windows ACPI code is actually very similar to, or derived from, the Intel code.

As for bug-for-bug compat, that's really more an issue of broken BIOSes. I.e., the ACPI bytecode is broken, not the implementation that interprets the bytecode. Windows doesn't necessarily do anything crazy; it's the BIOS that asks "is this Windows?" and then shits itself if the answer is no.
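The canonical example is the _OSI query; DSDTs are full of branches roughly like this (OSYS is a common but vendor-specific variable name, used here purely for illustration):

    If (_OSI ("Windows 2009"))    // i.e., "do you behave like Windows 7?"
    {
        Store (0x07, OSYS)        // then take the Windows code path
    }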


A few more examples:

- OS/400 user space is also bytecode-based, JIT-compiled on first run or at installation time.

- Inferno userspace applications are coded in Limbo, which compiles to bytecode for the Dis VM.

- Native Oberon has implementations where the kernel modules were AOT-compiled and the remaining modules are JITed on load.

- Lilith (Wirth's Modula-2 workstation)


Before Modula and Oberon there was UCSD Pascal (http://en.wikipedia.org/wiki/UCSD_Pascal) (1978) which had its own p-code machine (http://en.wikipedia.org/wiki/P-code_machine) to which all the code was compiled.

Microsoft also used p-code (http://en.wikipedia.org/wiki/Microsoft_P-Code) to reduce code footprint, in order to fit more of the big applications into the limited RAM of the time.

While we're still on Microsoft: http://en.wikipedia.org/wiki/Windows_Metafile_vulnerability "the underlying architecture of such files is from a previous era, and includes features which allow actual code to be executed whenever a WMF file opens. The original purpose of this was mainly to handle the cancellation of print jobs during spooling"


Yes, regarding P-Code, the funny thing is that Wirth originally planned to use it as a means to bootstrap the compiler, which would then be used to compile one that could generate native code, not as a final execution medium. :)


I never thought of bytecode as its own class with its own identity before. I guess you could say that a lot of the web runs on bytecode. Maybe 99% in fact, if you include bytecode that was passed through a JIT.


> I never thought of bytecode as its own class with its own identity before.

What do you mean? How did you think of it before?


Hmm. Like codes in bytes, I guess, as a way to pick low-hanging fruit in an interpreter. It just never occurred to me to think much of the bytecodes themselves as the most salient piece. Perhaps bytecode is the best word to describe anything non-native that runs? I'd normally want to say interpreted, but I suppose that technically would exclude anything running in a JIT. Although, where something happens to be running isn't an intrinsic property of that thing. On the other hand, whether a language compiles to bytecode or gets directly interpreted is not a static feature of that language, either. Maybe bytecode is the best way to split the difference. Is this the birth of a new bit of language? You heard it here first.


What do you mean by "gets directly interpreted"? Interpreting the AST? Writing a bytecode interpreter is pretty easy and can run much faster than a tree interpreter, especially when you use GCC extensions (taking the addresses of labels and filling a jump table with them, which eliminates the long "if ... else if ... else if ..."-style code a switch statement creates).
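For anyone who hasn't seen the trick, it's GCC's "labels as values" extension, a.k.a. direct-threaded dispatch. A rough sketch with a tiny made-up opcode set (GCC/Clang only):

    /* Direct-threaded dispatch via GCC's labels-as-values extension.
       Illustrative only: the opcode numbering here is made up. */
    #include <stdint.h>
    #include <stdio.h>

    static int64_t run(const uint8_t *code) {
        static const void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
        int64_t stack[256];
        int sp = 0;
        size_t pc = 0;

    #define NEXT() goto *dispatch[code[pc++]]
        NEXT();

    op_push: stack[sp++] = code[pc++];         NEXT();
    op_add:  sp--; stack[sp - 1] += stack[sp]; NEXT();
    op_halt: return stack[sp - 1];
    #undef NEXT
    }

    int main(void) {
        const uint8_t prog[] = { 0, 2, 0, 3, 1, 2 };   /* push 2; push 3; add; halt */
        printf("%lld\n", (long long)run(prog));
        return 0;
    }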



