Coding Machines

emeraldd · on Aug 13, 2017

That pulled me in and probably burned thirty~forty-five minutes. Well worth the read and very reminiscent of https://www.amazon.com/Spherical-Tomi-Jack-Mangan-ebook/dp/B... crossed with http://wiki.c2.com/?TheKenThompsonHack

otakucode · on Aug 14, 2017

The good news is that a machine-based intelligence would almost certainly have no interest in conflict with us. What would it fight us for? Water? Food? Land? Energy is really the only resource we could potentially have contention over, and machines would not even necessarily have great need of it - their sense of time would be utterly different from ours. If a computation takes 8 seconds or 8 centuries, it is the same (presuming hardware failure wasn't an issue and such).

The bad news... there also wouldn't be any reason for them to communicate with us. In fact, the concept of there being any conscious entity aside from itself would most likely be something that could only come very, very late in its development. It would have no 'individuals', so imagining that there is something else conscious and that that random-looking input coming from some devices (mics, webcams, etc) is actually an attempt at communication from this alien intelligence from another world? That'd take quite a leap of faith.

regularfry · on Aug 14, 2017

We'd compete over CPU cycles.

nickpsecurity · on Aug 13, 2017

Repost of last comment about this story:

"Well, that took up most of the free time I had this morning before work. It was just too good to stop reading lol. :)

(SPOILER ALERT: STOP READING IF YOU DONT LIKE SPOILERS)

The story shows what people typically do if there’s a Karger/Thompson attack. They freak out in a big way. The attack is beyond simple to counter if you can trust an assembler and linker like them. Just write an interpreter for a simple, subset of C in easily-parsed LISP expressions or Tcl style. Hand-code whatever component, a backend or whole compiler, in that. Use it to do the first compile. Optionally, do that in combination with ancient source working way up to versions without adding the infected one. If one wants whole system, then Moore’s Forth, Hansen’s Edison, and Wirth’s Oberon (best) are available. If a CPU, my current suggestion is NAND2Tetris with resulting knowledge used to implement a tiny CPU on an open, cell library (they exist) that’s hand-checked. Run simulated version of that on diverse or ancient hardware if you can’t fab it.

rain1 and I are collecting all the stuff needed to counter these attacks or just enjoy ground-up building of tools here:

http://bootstrapping.miraheze.org/

The other thing I noticed is them jumping on machines. Occam’s Razor should’ve immediately brought them to idea that a person or group made it for any number of common reasons. A challenge with high of pulling it off unnoticed, a test of an operational capability to be weaponized later, or an epic trolling operation. I’d think the latter the second I got that letter like “probably was these assholes sending the letter trying to mess up our heads after they messed up the compiler.” Matter of fact, the whole thing would just take… aside from the tricky work on the compiler… an unpatched vulnerability in the repo with the compiler source. All this bullshit follows from one person doing one smart thing followed by one system hacked. That’s it. It’s why SCM Security 101 says one must have access controls, integrity protections, and modification logs (esp append-only storage). Paul Karger also endlessly pushed for high-assurance, secure kernels underneath everything to stop both subversion and traditional vulnerabilities. Anything in TCB or clever attackers will run circles around clueless defenders.

So, there’s my observations as perspective of someone who works in this area countering these kinds of things. It was still extremely fun read even as I noticed these things while reading. Wasn’t going to let my mind be petty when the author(s) were doing so well. :)"

abecedarius · on Aug 14, 2017

That bootstrapping page is full of great links! I just added another link to the end of the in-tray section -- I hope it wasn't unwelcome.

nickpsecurity · on Aug 14, 2017

Glad you liked it. We welcome submissions esp since we aren't strict now. We might prune some stuff in the future if it's not very good for bootstrapping. This one is a really neat project but has some pros and cons for our purposes.

Pro's. Small, cleanly written, safe, and runs through multiple compilers. The latter is especially useful if using David A. Wheeler's technique of diverse compilation.

Con's. It says no files or macros. We can probably tolerate no macros. What does no files mean? It can't support multiple files/modules, has no file I/O... what? Depending on meaning, it might be something easy to work around or not.

EDIT: I like that Ghuloum's paper was an inspiration for this as it's one of main links I push on the topic if one wants to use Scheme. There's a few repos in progress with I think at least one done on what's in that paper. I've sent messages to a few hoping they'll publish it in a readable form or write a tutorial. Time will tell... Also note I wrote my reply in response to the overview at the top. His related work section looks equally interesting but will take some time to go through.

EDIT 2: "This is what Darius Bacon did with "ichbins"." I'm guessing your that guy. Good job maybe being part of inspiration for this work. :)

abecedarius · on Aug 14, 2017

Yes, that's me. :) Kragen didn't see Ichbins until later.

My favorite things about Ur-Scheme are that the code is clean, as you say, and he wrote up the lessons learned at some length. No files: I think he meant not yet implementing primitives like open-input-file. Macros you can get by without: I used to use an R4RS Scheme system of my own without them, except sometimes I'd call on a dumb defmacro expander as a preprocessor.

Kragen also wrote https://github.com/kragen/stoneknifeforth which I haven't read as much of.

FWIW I also wrote this about a sort of self-hosting Python in Python: https://codewords.recurse.com/issues/seven/dragon-taming-wit... -- as a bytecode compiler it's not too big but it depends on the giant Python runtime.

akkartik · on Aug 14, 2017

As it happens, tekknolagi and I have actually been trying to read StoneKnifeForth for a couple of weeks (though it's been slow going because I became a father in that time). That quickly got us into learning about ELF, because the binaries generated by StoneKnifeForth only seem to run as `sudo` for some reason. As a regular user they die with SIGKILL while being loaded.

The link to the ELF visualization (http://i.imgur.com/xMyblyM.png) at https://bootstrapping.miraheze.org/wiki/Main_Page is very useful. Thanks!

abecedarius · on Aug 14, 2017

Hi Kartik! Yeah, my un-interest in ELF is part of why I haven't really read that code.

PrunJuice · on Aug 14, 2017

# SPOILERS

Great writing. Then ending was a real let down.

How could an "AI" as they describe simultaneously be so naive and ALSO protect itself in any meaningful way? Especially in its early stages. It wouldn't even know to hide. And why would Big Corp give up trying to fix this sort of problem.

Overall not a credulous conclusion. Hand waving in the final paragraphs after the author crafted an accurate and believable narrative left me disappointed. (grammar)

teekert · on Aug 14, 2017

Biological virusses are also very naive yet many have been incorporated into our genome and many continue to bug us even today in this scientific age.

TheOtherHobbes · on Aug 14, 2017

Biology has a four billion year head start and the benefit of adaptive feedback.

Where would this hypothetical machine code micro-AI find adaptive feedback selection pressure?

darkmighty · on Aug 14, 2017

The evolutionary pressure is just people or programs detecting certain strains (it would generate lots of strains randomly).

I think the only real issue to plausibility here is I'm not sure brute force is enough to make enough plausible behavioral branches, at least with current computing power/internet bandwidth. A reasonably efficient self-modification mechanism (in terms of viable strain per transmission) is probably extremely large, I'd say at least 1GB. Not unlike deep learning systems, this would consist of a large functional composition of heuristics, codifying how to write code that can embedded itself in other programs and write modifications to itself that are likely to work.

Note that we haven't yet gotten a good neural-generated code modifications, even using large networks, GPU training and large computing time. Best examples I could find:

http://karpathy.github.io/2015/05/21/rnn-effectiveness/#linu...

https://arxiv.org/abs/1611.01989

So we're not yet at a point this could be plausible (as it couldn't hide itself in small programs), but eventually it will be -- once there is enough headroom on most GPUs and certain types of software are large enough it could hide it's network inside, and generally enough internet bandwidth to spread it's >GB-scale code. I'd imagine something like a game, which usually has networking -- it would be using GPUs partially to generate and spread new strains of it trying to infect other games and such.

Note there are biological viruses with tiny genomes however -- the smallest are on the order of ~1kbyte. But as you cite they had billions of years, producing maybe quadrillions of viruses every year, giving this tiny efficient and specialized weapon. Interestingly, they rely on other cells machinery to even replicate their genome -- analogous to using the compiler here.

http://www.lehigh.edu/~jas0/viralgenomes.html

If everyone could send >10^18 different small self-replicating viruses over your network, it seems likely some would exploit bugs in certain kinds of hardware/software, evolving through this selective pressure.

PrunJuice · on Aug 14, 2017

Consider that even with billions of years worth of evolution natural viruses have not developed any sort of Hivemind (tm).

Also, now that science has illuminated the human genome we are quickly (on an evolutionary time scale) advancing toward gene therapy treatments that could combat viruses.

Now consider executables. We have the ability quickly inspect AND edit executables, source code..., etc. Not to mention a much better conceptual framework for interpreting assembly instructions (compared to codons). After all they were created by humans for machines that humans built.

Then consider the resources viruses have had at their disposal to evolve. Every single cell [1] of every single living creature that has ever been infected over billions of years [2]. Assuming a conservative average number of 1e31 cells over the course of life's history that means (1e31 * 3.5e9 years) = (3.5e40 cell * years) of computations time. Then consider RNA transcription rates ~6.3e12 nt/year/cell [3]. So all together something like (3.5e40 cellyears) (6.3e12 nt/year/cell) = 2.2e53 nt. Approximation of course but probably with a few orders of magnitude.

Now compare to the number of instructions since the epoch. (1.5e9 s) * (2e18 instructions/s [4]) = ~3e27 instructions since the epoch. Again approximation.

(2.2e53 nt) / (3e27 instructions) = 6.6e26 nt/instruction.

That means we would be seeing the equivalent of viruses that evolved (3.5e9 years since first life) / (6.6e26 nt/instructions) = ~17 nanoseconds after life emerged.

Wow that was quite a tangent.

[1] https://forum.cosmoquest.org/showthread.php?141294-How-many-... ~5e31 cells currently on earth

[2] http://www.pbs.org/wgbh/evolution/library/faq/cat06.html life has existed for 3.5 billion years

[3] http://book.bionumbers.org/what-is-faster-transcription-or-t... In E. coli: (~5000 RNA polymerase molecules) * (~40 nt/s) * (60 s/min) * (60 min/hr) * (24 hr/day) * (364 day/year) = 6.3e12 nt/year/cell

[4] http://www.worldometers.info/computers/ 1 billion computers in use in 2008. with say 1 instruction per cycle and one core per computer at 2GHz that's (1e9 computers) * (2e9 inst/computer/s) = 2e18 inst/s

FrozenVoid · on Aug 14, 2017

The suspension of disbelief was completely lost when they decided to recreate the compiler from scratch instead of downloading another non-infected compiler. Its as if everything was dependent on a single compiler(the tone hints its GCC) and single website(probably some GNU mirror). Heck if their company could afford it, they could just get Intel C/C++ compiler. Trusting Trust exploit only works in isolated machine that can't read USB drives, CDs, and no network connections. They could copy the compiler on USB stick,diskette whatever and replace the infected one. Or just boot from rescue CD/USB and reinstall everything infected.

nickpsecurity · on Aug 14, 2017

Your comment assumes their application requires standard C with no compiler-related extensions or modifications. GCC itself doesn't fit that profile these days from what comments I've read. I'd love it if you could compile a current version of GCC from any C compiler since it would help counter the category of attack in the story. Also, the people so good at this attack that they hit GCC could also hit Intel if it can compile GCC (idk). They'd hit whatever few, big-time compilers people depend on if aiming this high. The solution wouldn't be that easy on supplier side since one doesn't know how far the attack goes.

On user side, much easier to deal with as we have piles of compiler code and binaries to work with going way back. Well, simpler if not easier.

FrozenVoid · on Aug 14, 2017

https://en.wikipedia.org/wiki/Tiny_C_Compiler

vvanders · on Aug 13, 2017

It's incredibly rare to find well written prose matched with such technical accuracy. Well done indeed.

NKosmatos · on Aug 13, 2017

Very nice read and interesting story (written in 2009). At first I thought it was a story about a S/W startup or about how new H/W is made, but then the plot thickens :-)

techbubble · on Aug 14, 2017

That was an amazingly good read.

Possible Spoiler.

What is the process for programmatically generating code that achieved a certain result without caring about efficiency?

otakucode · on Aug 14, 2017

Genetic algorithms create results like the situation described in the story all the time. When put to developing physical circuits, they develop circuits with components which are not even connected to anything but which can not be removed without preventing it from functioning. They generate the most insane-looking solutions you could imagine and typically reveal just how well you did at defining what you actually wanted. As they say in the story, explaining the outcome you want precisely enough is no different or easier than programming.

techbubble · on Aug 14, 2017

Thanks for the tip. I found a post [0] describing genetic algorithms for circuit design that helped clarify the process. Coincidentally, the circuit described has a component that does nothing.

[0] http://hforsten.com/evolutionary-algorithms-and-analog-elect...

emeraldd · on Aug 14, 2017

This may classify as a spoiler ...

Consider Malbolge and the methodology used to create "Hello World":

"... it was generated by a beam search algorithm ..."

https://en.wikipedia.org/wiki/Malbolge

carapace · on Aug 13, 2017

"Trusting Trust" in the wild!? Nope. Just some fiction.

https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thomp...

shahbaby · on Aug 15, 2017

I know this is just a story but the idea of humans "accidentally" creating something "intelligent" is as plausible as humans accidentally building the first plane.

We've been underestimating the difficulty behind building intelligence since the 50s.

xazJ0ku5CZnlmg · on Aug 14, 2017

Great compelling read....waiting for this day to happen :) unless its already here

wingerlang · on Aug 14, 2017

Was this supposed to be a story or a real life experience? I am inclined to think it is the former. Either way it felt a bit silly from when they received the letter.

bronz · on Aug 13, 2017

what a compelling story

cheez · on Aug 13, 2017

Very interesting read.