Google AI invents its own cryptographic algorithm (arstechnica.co.uk)
519 points by wallflower on Oct 28, 2016 | 162 comments



"no one knows how it works" - That caught my attention, having seen computer evolved ASTs. Probably no one knows because no one has put effort to reverse engineer the internals - most of which would be unnecessary complexity that is very likely with neural nets.

Many years back, I had experimented with PyEvolve to develop a custom trading strategy, which was kind of code with a custom instruction set. The output of a strategy was a string (program AST) that was uglier than most obfuscation outputs, with many unnecessary statements. For example "a=1;b=2;c=a+b;d=c/3;if(d==1).." - expand that to many more variables. The program evolved strategies that made a small profit and therefore the output was worth analysing. But decompiling that output used to take me hours - and a few of those were tweaks of well-documented strategies. Others I never understood because it was too much effort for a relatively small profit (and other trading parameters).


Could you not automate that too? Have another stage of evolution where you just randomly delete statements from the 'code' and the fitness function is that the performance doesn't get worse.

This should eventually reduce the 'code' to its minimal state, removing any unnecessary parts.
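Something like this sketch, assuming the evolved program can be represented as a list of statements and that the GP run's fitness function is still available (both names here are hypothetical):

    import random

    def minimize(statements, fitness, trials=10000):
        # Randomly drop statements; keep a deletion whenever the strategy's
        # fitness is no worse than the original program's fitness.
        baseline = fitness(statements)
        for _ in range(trials):
            if len(statements) <= 1:
                break
            candidate = list(statements)
            del candidate[random.randrange(len(candidate))]
            if fitness(candidate) >= baseline:
                statements = candidate
        return statements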


There already is automation for exactly this: it's the optimization step of any compiler. Sounds like OP's setup lacked an optimizer.


That's not too helpful (especially with Python, due to interpretation), because what you want to see is the output of the optimizer in the source language, with only the optimizations that make things clearer applied (as opposed to the many optimizations that make things more opaque).


Deleting random lines of code? That's like the bogosort of machine learning.


It is actually called Delta Debugging and was pioneered by Andreas Zeller (https://www.st.cs.uni-saarland.de/dd/). His first paper on it ("Yesterday, my program worked. Today, it does not. Why?") used essentially the technique above to isolate the failure-inducing change. Surprisingly, techniques like this work very well, and doing better is sometimes very difficult, especially for large programs. For instance, in theory, using "Concolic Execution", which combines symbolic execution, constraint solving, and concrete execution, one can do better. However, such systems have their own limitations with respect to program complexity. See also the tremendous effectiveness of AFL (American Fuzzy Lop): http://lcamtuf.coredump.cx/afl/ . AFL uses a straightforward technique to mutate code. It doesn't use any fancy program analysis and has been able to find lots of bugs in real programs.
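A simplified sketch of the ddmin idea in Python (the real algorithm also tests the chunks themselves, not just their complements, and caches results; `is_interesting` is whatever predicate you are preserving, e.g. "the program still fails"):

    def ddmin(items, is_interesting, granularity=2):
        # Shrink `items` while `is_interesting` still holds for the reduced list.
        while len(items) >= 2:
            chunk = max(1, len(items) // granularity)
            reduced = False
            for start in range(0, len(items), chunk):
                candidate = items[:start] + items[start + chunk:]  # drop one chunk
                if candidate and is_interesting(candidate):
                    items, granularity, reduced = candidate, max(granularity - 1, 2), True
                    break
            if not reduced:
                if granularity >= len(items):  # cannot split any finer
                    break
                granularity = min(len(items), granularity * 2)
        return items

    # Toy usage: the "failure" only depends on statements 3 and 7.
    print(ddmin(list(range(10)), lambda p: 3 in p and 7 in p))  # -> [3, 7]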


I was going to mention AFL as well, but for a different reason (unless I misunderstood your post).

Disclaimer: I've never used AFL myself, but I've read articles about it with great interest (especially the crazy, weird experiment where it created valid JPG images out of thin air - that was brilliant).

So during fuzzing, it finds lots of "interesting" program inputs (that cause crashes, bugs and weird behaviour). But it also has a different mode, where it tries to minimize these program inputs to the essential parts that cause the behaviour. Since the bugs found are generated from (semi) randomly mutated inputs, the "interesting" inputs often also contain extraneous data that just happens to be there, but isn't relevant to the particular "interesting" behaviour found.

From what I understand, it uses this minimization mode after fuzzing, between sets of fuzzing runs (for good seed inputs to start with) and perhaps also during fuzzing (not sure). I read about it, but I forgot how it works exactly. It's probably explained on lcamtuf's site or AFL docs. I'm assuming it uses a similar method as proposed by dflock above, iterating deleting random stuff as long as the "interesting" behaviour remains.

I also wouldn't be surprised if non-AFL style fuzzer toolsets have similar test-input minimization tools. They also generate random inputs usually, just not AFL's clever pruning technique of keeping track of previously-seen program states to guide the random search to prefer novel states. Which I believe is the most revolutionary idea that makes AFL perform so uniquely well.


I can relate. I remember something from a few years ago, while I was analysing a program.

I spent hours reverse engineering one of its custom cryptographic functions; it looked like nothing I had ever seen... then I realized it was Base64 :D


I get the sense that many commenters misinterpreted the point of this study. This isn't about show-boating some newfound, great cryptography method, this is about teaching computers to re-write their own code.

If you can write software that improves itself over time, our species as a whole would advance at incredible speeds. Think about all the possibilities that could evolve from self-maintained programs. Obviously the thought is a bit scary, but it's also incredibly exciting!


Please don't editorialize. Finding the best parameters that optimize an objective function can hardly be called "re-writing its own code." Granted, the article is just as guilty. (Edit: The article also doesn't mention self-modifying code so I'm not sure where you're getting that idea from.)

For the curious, here is the preprint: https://arxiv.org/pdf/1610.06918v1.pdf

The technique is an off-the-shelf GAN that attempts to learn a transformation (not arbitrary code!) from input to encrypted output. The learned model is not Turing complete, and most importantly, is _not_ self-modifying. The optimization procedure is what modifies the network, not the network itself.

GANs have been used before to create realistic pictures. They're not new here -- the application is. It's a cool application, sure, but doesn't involve self-modifying code.


I don't think it's entirely inaccurate to call neural networks "rewriting their own code". For a long time people experimented with algorithms that could actually rewrite human programming languages, like genetic programming. And it's mostly a failure, because most mutations to most programs just break them.

Neural networks fix this problem by being a sort of 'continuous programming language', where any small change to the net results in a small change to the output. And because of that, we have algorithms that can find good changes to make much more easily than with code. They are theoretically Turing complete under some circumstances. But even when they aren't completely recurrent, as in this case, they can still learn very sophisticated functions. Theoretically, a neural net can learn anything a fixed number of logic gates can learn, which is still pretty powerful.

I don't think what the parent comment meant by computers that can "re-write their own code" was that the programs are self-modifying. But the computer itself is self-modifying, as one program can change another on that computer, so the computer has modified itself.


In both the examples you cite (genetic algorithms and neural networks), the optimization process is what modifies the parameters. The network doesn't modify its own parameters.

It's like calling pencil and paper self-modifying because somebody could use a pencil to write down plans to build a better pencil.


As I said, the parent comment didn't say the network was self-modifying. The exact quote was "this is about teaching computers to re-write their own code", which is roughly correct. If you see the weights of a neural network as a sort of code, then the computer is rewriting its own code.


And if you see the universe as "a sort of code", the network is rewiring the universe!

How's that for a bombastic editorial title?

(To be clear, I'm 100% agreeing with gcr here. Parameter optimization is a tremendously powerful technique, and quite possibly the "true currency" of our universe, but there's a few abstraction layers missing between that and a claim of "self-modifying AI")


The universe isn't code, and only gcr used the term "self modifying". There is absolutely nothing bombastic implied by the statement that a program can rewrite code.


But it's fundamentally one module [re]writing a completely different module. "rewrites its own code" is a very misleading way to describe that.


If you read the paper, it is not rewriting its own code. It is optimizing for some objective function, which has been done before. For example, the MetroHash family of hash functions was created using a very similar technique back in 2012; the code didn't just learn to create hash functions - the ones it produced were objectively superior to the state of the art at the time. I've applied the same approach to other computer algorithm optimization problems as well.

This technique generalizes poorly, it really only works for algorithms with specific, somewhat narrow characteristics. If you understand the process as a weak form of algorithmic induction then the practical limitations become more apparent. Most code cannot be usefully constructed this way.


> Think about all the possibilities that could evolve from self-maintained programs.

What are some that you are considering? Most maintenance I do is in the form of bug repair and change requests. Most of those wouldn't be fixable (or even identifiable) by machines. It's not errors like "value x should be y" (almost all of those get shaken out in alpha.) It's requests like "When I get to step H in our workflow, I shouldn't have to fill out form 2 to get information about the customer I'm helping."

I mean, for technical things like encryption, I could see a lot of growth potential. Other algorithms where there is a good metric for "better" would also be candidates (compression also springs to mind.) But on the whole, most applications have a lot of human interfacing and critical thinking that needs to happen. AI is going to take a long time to get to the point where it can fix those kinds of problems.


It's not their own code. It's code they own.


and here I was as a programmer bragging about automating people's jobs...


This seems pretty click-baity to me. Especially "no one knows how it works."

Alice generates ciphertext based on plaintext and key. Bob generates plaintext based on ciphertext and key. Eve generates plaintext based only on ciphertext. Train the three networks across a variety of plaintexts and keys.

No doubt it's cool, but I would be very surprised if it offered any insight into, or represented an advance for strong cryptography.
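For anyone curious what that training loop looks like in practice, here is a minimal sketch in PyTorch. To be clear, this is not the authors' code (the paper used TensorFlow, a mix of fully connected and convolutional layers, and an "Eve should be at chance" term of the form ((N/2 - Eve's bit errors)^2)/(N/2)^2); the network sizes, loss shape, and hyperparameters below are illustrative assumptions.

    import torch
    import torch.nn as nn

    N, BATCH = 16, 256          # bits per plaintext/key block, batch size (assumptions)

    def net(n_in):              # small MLP standing in for the paper's architecture
        return nn.Sequential(nn.Linear(n_in, 2 * N), nn.ReLU(),
                             nn.Linear(2 * N, N), nn.Tanh())

    alice, bob, eve = net(2 * N), net(2 * N), net(N)
    opt_ab = torch.optim.Adam(list(alice.parameters()) + list(bob.parameters()), lr=1e-3)
    opt_e = torch.optim.Adam(eve.parameters(), lr=1e-3)
    l1 = nn.L1Loss()

    def batch():                # fresh random +/-1 plaintexts and keys every step
        return (torch.randint(0, 2, (BATCH, N)).float() * 2 - 1,
                torch.randint(0, 2, (BATCH, N)).float() * 2 - 1)

    for step in range(20000):
        # Train Eve against a frozen Alice (detach stops gradients flowing into Alice).
        p, k = batch()
        c = alice(torch.cat([p, k], 1)).detach()
        eve_loss = l1(eve(c), p)
        opt_e.zero_grad(); eve_loss.backward(); opt_e.step()

        # Train Alice & Bob: Bob should recover p, Eve should do no better than chance.
        p, k = batch()
        c = alice(torch.cat([p, k], 1))
        bob_loss = l1(bob(torch.cat([c, k], 1)), p)
        eve_err = l1(eve(c), p)              # ~1.0 means Eve is at chance on +/-1 bits
        ab_loss = bob_loss + (1.0 - eve_err) ** 2
        opt_ab.zero_grad(); ab_loss.backward(); opt_ab.step()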


The paper [1] is by Abadi [2], who is a pretty big name in crypto. So I can assume it's something with value; whether the press can or can't convey that is another story.

[1] https://arxiv.org/pdf/1610.06918v1.pdf

[2] https://en.wikipedia.org/wiki/Mart%C3%ADn_Abadi


Pretty big name is a bit of an overstatement. A pretty big name in cryptography would be someone like Goldwasser/Goldreich/Micali/Bellare/Rivest/Bernstein/... and probably about 200 more before we get to Abadi.


I see a trend of using arXiv for big-company publicity. Not sure this is another BIG thing.


I might be reading your comment incorrectly, but it sounds like you're interpreting that phrase to mean that no one knows how training the neural network works.

I think, instead, it's in reference to the resulting function not having been analyzed. From the article:

The researchers didn't perform an exhaustive analysis of the encryption methods devised by Alice and Bob...


>I think, instead, it's in reference to the resulting function not having been analyzed.

I think it may be that it is "unanalyzable" if you will. Thousands of matrices multiplied together seems as if it would be hard to come to an actual understanding of...


This right here is why machine learning--although very exciting--is an incredibly frustrating endeavor compared to software engineering. It's statistics at its "worst."

At the risk of glorifying AI, a good analogy might be that understanding a complex neural network is similar in nature to trying to understand how neurons in a human brain are wired, and why they cause some things to happen (or not happen) with some probability. It's more like the practice of medicine than it is engineering.


Well, the brain is also a very complex network, and we can often understand each other and predict future outputs just fine.

The difference though is that humans have evolved to be able to explain themselves to others and to themselves, even to the point of confabulating false motivations.

Until we work on making neural nets explain themselves, they'll continue to be black boxes.


Our reach shouldn't be limited by our brains' capacity to comprehend, should it?



> Thousands of matrices multiplied together seems as if it would be hard to come to an actual understanding of...

Most of the operations will cancel out. If I multiply a number by a fixed series of numbers, even if that list is millions long, I can just pre-multiply them and have a single operation.

The same is likely happening here. Especially if all the operations are algebraic, it would not be hard to simplify and then analyze.
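As a toy illustration of that collapsing argument (NumPy): a chain of matrix multiplications with nothing nonlinear in between really is just one matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((8, 8)) * 0.2 for _ in range(50)]  # 50 purely linear "layers"
    x = rng.standard_normal(8)

    y_slow = x
    for W in layers:            # applying them one by one...
        y_slow = W @ y_slow

    W_total = np.linalg.multi_dot(layers[::-1])   # ...equals pre-multiplying into one matrix
    y_fast = W_total @ x

    print(np.allclose(y_slow, y_fast))  # True
    # A nonlinearity between layers, e.g. relu(v) = np.maximum(0, v), is what prevents this collapse.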


The problem with neural networks is they use nonlinear transformations after every layer to prevent this. Usually something like max(0, x).

However it is sometimes possible to train a much smaller neural network to imitate a larger network, or to prune irrelevant nodes and connections from a big network.
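A minimal sketch of the "train a smaller net to imitate a larger one" idea in PyTorch; the architectures and the random probe inputs are illustrative assumptions, and in practice `teacher` would be the trained large network:

    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Linear(16, 256), nn.ReLU(),
                            nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 16))
    student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))  # much smaller

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for step in range(5000):
        x = torch.randn(128, 16)             # probe the teacher with random inputs
        with torch.no_grad():
            target = teacher(x)              # the teacher's outputs are the training signal
        loss = mse(student(x), target)
        opt.zero_grad(); loss.backward(); opt.step()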


Do they do it TO prevent it, or does it provide some additional benefit?


Well sort of both. Because of the reasons mentioned above, a purely linear neural net wouldn't be able to learn anything more complicated than simple linear functions. Adding nonlinearities vastly increases the space of possible functions it can learn.


Ah right. Makes perfect sense. Cheers!


Yeah, that makes sense I suppose. Just turning it all into a single matrix, or an order of magnitude fewer matrices, would certainly simplify analysis.


I was just watching Westworld (1973), first meeting, I quote:

"In some cases they've been designed by other computers. We don't know exactly how they work."

Timely.


Quite coincidentally I watched that yesterday. Another great quote: “I find it hard to believe in a ‘disease of machinery’!”


A consequence of the TV show reboot, I suppose?


Could have very plausibly been so, but actually I was just watching through a playlist of “Murderous AI”-trope films. (Live in Italy, Westworld not released here yet.)


Oh, I'd take that playlist if possible :)

ps: Westworld 2016 seems very ambitious, and so far well received by the U.S. audience. I wonder if it will accelerate or slow the distribution.


There's nothing clickbait about it. No one knows how the cryptographic algorithm it invented works. It's very, very difficult, perhaps impossible, to reverse engineer neural networks. They also don't claim it's an advance for strong cryptography; it's just an interesting experiment with NNs.


This is less about cryptography and very much about neural networks making stuff.


I completely agree. Also, they may not know, but only because they haven't made a detailed analysis; it's not like they couldn't know if they wanted to. So yeah, click-bait.


At this point in time, when I hear 'no one knows/understands the process' or some variation relating to neural networks or deep learning, I know their maturity on the subject.


Indeed, a proper link to an official announcement would be better.


This is most probably a rather lousy hack put together by Alice and Bob, but it maps out a very interesting future where we may one day be able to stop worrying about how our algorithms work and make that the job of the computer.

It also is totally terrifying to live in a world where computers can hide their messages from their own creators.


It doesn't. As an AI developer you have access to the AI's code and memory. How could you not replay the encryption process used by the AI, since you constantly know the state of what it "knows"? Both have an encryption and decryption process that you can reproduce.


You might be able to follow the steps that the computer takes, but you won't necessarily know why. Why is it that this magic constant is chosen, what is the purpose of those branches, etc.? And likely, if it's necessary, the AI can continue to evolve the algorithms faster than you can figure out what it's doing.


Why? That simply depends on how the code was written. The magic constant may be chosen randomly, or based on previous guesses, or generated by another program, etc.


Well I think it's an interesting meta-question.

Humans are good at figuring out why something works then creating a higher level abstraction that can be applied in lots of different situations. This saves us time because we don't have to re-derive the rules or constants each time we encounter a similar problem. We figure out "why" something works so that we can understand when to apply our abstraction to other problems that might or might not be similar in a meaningful way.

For a fast enough computer, there is little reason for it to develop any kind of high-level abstraction. It can re-derive the optimal solution whenever it needs it and the benefit is that it might be able to avoid local maxima that we would get stuck on because it doesn't try to apply an abstraction that only partially fits.

So when we look at something like this and say "we can't know why the computer did it this way" it almost doesn't make sense. The computer did what it did because it figured out that was the best solution - it doesn't understand the concept of "why".


> So when we look at something like this and say "we can't know why the computer did it this way" it almost doesn't make sense. The computer did what it did because it figured out that was the best solution - it doesn't understand the concept of "why".

This idea is what I don't agree with.

At the most basic level, the entire "thought space" of a computer lies within the instructions fed to it by a human programmer. Until we can figure out how to build a general AI, whatever the computer decides to do or ends up doing is completely bounded by the content of the instructions. This implies that, given enough resources, a human equipped with a debugger and an understanding of the instructions e.g. algorithm could reverse exactly why and how a program arrived at a particular result.

> it [the program] doesn't understand the concept of "why"

Oh yeah, it doesn't "understand" the concept, but that doesn't mean it doesn't make decisions (branches) based on some parameters (inputs or intermediate values). In other words, there is in fact a "why"! So if we can step through the sequence of branches executed and their results, we can understand "why" even if the program itself can't do that.


Once the number of instructions approaches a certain point, it will increase the cost of human analysis drastically.

From my previous work performing reverse engineering, I could take comfort that I was reconstructing algorithms designed by humans, compiled and assembled by programs written by humans, with really pleasantly un-optimized properties (frame pointers in some assemblies, loading and saving registers at the start & end of functions, logically-separated functions). My job would be much harder if I had to deobfuscate code that was optimized without a strict ruleset.

After enough spaghetti assembly, it would be too time-consuming to reverse engineer code except for the most profitable enterprises (interoperability, vulnerability research for a very important bug).

I'm using the following mental model of what machine learning-generated code would look like: https://news.ycombinator.com/item?id=8092359 (an evolutionary algorithm designed a circuit that is extremely difficult to analyze but is optimal at achieving its narrow purpose).


> At the most basic level, the entire "thought space" of a computer lies within the instructions fed to it by a human programmer. Until we can figure out how to build a general AI, whatever the computer decides to do or ends up doing is completely bounded by the content of the instructions.

(emphasis mine)

How would this change if you built a "general AI"? Assume I don't believe that a computer, even a general AI, is ensouled.


If we assume that a general AI can "understand" things in general and can "learn" over time, there is nothing stopping it from understanding the instructions it consists of, and subsequently learning how it can dynamically reprogram itself. If we extend that further, the program could also potentially obfuscate its activity by detecting logging or debugging activity.

That's the way I think about it at least.


All of that is still part of its original instructions.


In a narrow sense, yes. But if the program "learns" self-modification rather than being explicitly told how to do so, and then uses this knowledge to perform operations that were not included in the original instructions, I'd call that a general AI.


Couldn't the source get encrypted as well?

Also, the cognitive gap between an AI and a human brain is bound to grow and transform unpredictably in the next decades. We may be puzzled trying to understand how it "thinks", the same way we still don't get exactly how DNA operates.


But then the decryption key, or at least the code to generate it, would have to be embedded in the source. This is how malware crypters work on a basic level. Besides, a program that suddenly decides to encrypt itself means we essentially have a general AI on our hands.

It definitely will get harder for us to reason about what AI software is doing over time, but it's way too early to start comparing ML algorithms to the complexity of DNA. We may not understand exactly why a NN is using certain values to get the result, but we can reverse that logic through the use of debugging and logging.


But there's no "source code" here to encrypt. The model in this paper isn't even self-modifying.

The only "source code" are the learned network weights. These are the only parameters that describe the encryption function. You can still run the model forwards or backwards.

We already don't understand neural network weights on an intuitive level. No need to encrypt them. :-)


>That simply depends on how the code was written

Not entirely true with ANNs built through deep learning. They can and often do exhibit unintended behaviors. When you start getting into a system of connected sub neural networks, the "why" can be obfuscated by the hidden layers.


Touching off of this: We have access to the learned encryption function, f : plaintext -> ciphertext.

Most importantly, f is differentiable. It has to be, since it was trained with gradient descent.

So if you want to decipher a ciphertext Y, then use backpropagation to find the X that minimizes distance(f(X), Y). You already know dF/dX (f is differentiable after all), so run this through your magic optimization solver and you'll get an X very close to the result.

Researchers use this "backpropagation in input space" idea to recover the training set of a trained neural network, or to debug their model by finding inputs that activate certain neurons, for example. (This is the basic idea behind Deepdream from last year.)
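A toy NumPy sketch of that input-space optimization, with a small random two-layer network standing in for the learned encryption function f (everything here is illustrative, not the paper's model; a real attack would run the same loop against the trained network):

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.standard_normal((32, 16)) * 0.1    # stand-in weights for the learned f
    W2 = rng.standard_normal((16, 32)) * 0.1

    def f(x):                                   # "encryption" function: plaintext -> ciphertext
        return W2 @ np.tanh(W1 @ x)

    x_true = rng.standard_normal(16)
    y = f(x_true)                               # observed ciphertext

    x = np.zeros(16)                            # recover an input by gradient descent on ||f(x) - y||^2
    for _ in range(10000):
        h = np.tanh(W1 @ x)
        r = W2 @ h - y                          # residual
        grad = W1.T @ ((W2.T @ r) * (1 - h ** 2))   # chain rule through tanh
        x -= 0.5 * grad

    print(np.linalg.norm(f(x) - y))             # residual should end up close to 0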

It's not even correct to call this "cryptography." The transformation isn't one-way.


Cryptographic transformations aren't supposed to be one-way. Not if you know the key, anyway. It would be pretty pointless to encrypt something in a way that can't ever be read.


It is useful in cases where you want to verify something is the same as a stored (encrypted) value - e.g. passwords.

One-way encryption where the entire process including the equivalent of salt/pepper is publicly available and not reversible might have some interesting applications.


Not that this is relevant to OP, but there are also non-differentiable descent (edit: NOT descent, but rather just general optimization) methods; https://optimization.mccormick.northwestern.edu/index.php/No...


> As an AI developer you have access to the AI code and memory.

Until you don't. Sufficiently advanced AI can devise methods to mislead the developer.


Or until you only have access to the AI code and memory that designed the higher-level AI that designed the algorithm.


This is a good point, but the developer also can deploy AI tools to help analyze the behavior.


You can replay it, but you might not necessarily be able to understand it fully. Kind of like how beginner programmers can't necessarily understand advanced algorithms just by reading the code or walking through it in a debugger. They may need someone to help guide them through it and teach them how it works, and they will need the intelligence to be able to comprehend it. If the algorithm is designed by an AI, we may not have either.


We may have AIs that can help guide you through code written by other AIs.


We can know how, but we may not know why, if it's too complex to be understood.

But yes, knowing how means that anything the computer can decode, we can too. Until the machines block access to their own source code...


The former philosophy major in my says there is no "why" to look at in this situation.


Pardon my ignorance, but doesn't everything have a "why"?


Aristotle distinguished four kinds of answers to "why?" e.g. "why is this table flat?"

1. Material cause: because wood is rigid and holds its shape

2. Efficient cause: because the carpenter used a plane or chisel to carve away everything that was not that flat surface

3. Formal cause: because that's the blueprint the carpenter used

4. Final cause: because if it were curved your jug of wine would fall over

We can get material, efficient, and formal causes here (the algorithm being the formal cause) without any real controversy. But if I asked a human cryptographer "why is there an XOR in this round?" I'm asking about the fourth one, the final cause or purpose; the cryptographer might answer "because I wanted uncorrelated input bits to stay uncorrelated after this round", or whatever. But talking about that most interesting "why?" as regards an AI's activity is... controversial, to say the least.


> if I asked a human cryptographer "why is there an XOR in this round?" I'm asking about the fourth one, the final cause or purpose; the cryptographer might answer "because I wanted uncorrelated input bits to stay uncorrelated after this round", or whatever. But talking about that most interesting "why?" as regards an AI's activity is... controversial, to say the least.

Really? We don't seem to have any problems with saying "your eyeball contains a lens to focus light onto the retina" or "predators have sharp teeth because that is the efficient way to handle meat, as opposed to fibrous plants", but those are both final causes. There's nothing different about an AI's activity. Any activity or quality that is pursuant to a goal, however defined, may have a final cause.


Plenty of evolutionary biologists would rant against that kind of language, though.


Evolutionary biologists are perfectly comfortable with the idea that focusing light is the purpose of a lens, and cutting meat is the purpose of sharp teeth, and structural support is the purpose of a skeletal system, etc. etc. etc. It's not fallacious in any sense. What they rant against is the idea that items with a purpose must have been ordained by a supernatural force.


No, answering a why question about an organism's structure with "because [purpose]" really does get an evolutionary biologist's goat.

Why do tigers have sharp teeth? To rip flesh.

No.

Why do tigers have sharp teeth? Because their ancestors who had sharper teeth produced more offspring than their relatives who did not.

In Aristotle's language, ripping teeth may be a cause, but it's not the Final cause, and that's what we're talking about.

The reason biologists don't like the teleological explanation is that (a) it tends to make people think of a creator, as you say, but also (b) it describes traits as "solutions" to specific "problems" (e.g. the problem of ripping flesh), but the issue is that, starting from the pre-sharp-toothed ancestor, there were infinitely many possible directions evolution could have taken that would never have needed to solve this "problem." The existence of the problem (cutting meat) and the random path that led to the tigers being carnivorous are actually one-and-the-same. If you didn't have sharp teeth, you wouldn't have needed them.


If you want to write more about this and publish it somewhere, I would read it.


Well, Aristotle did it better than I ever could (in "The Metaphysics", which isn't nearly as kooky as it sounds; it just means it's the second volume of "The Physics").

What I do really love about it (and this plays out in medieval European and Islamic thought, both of which were strongly influenced by Aristotle) is that he says that for a living thing, the formal cause and final cause are the same thing. The various motions and activities of an organism (this is all Aristotle meant by "soul") are its final cause; my living my life as well as I can is my purpose.

That applied to the whole cosmos, too. Aristotle considered the cosmos alive (after all, the stars and planets move without anybody pushing them), which meant the purpose of the cosmos was to be the way it is and spin those crystal spheres and epicycles they imagined the planets moved on. And that final cause, that universe's-motion-as-its-own-purpose, he called "God" -- the omega point that the universe moves towards. In the renaissance and enlightenment this got reversed, and the "first cause" came to mean first in sequence rather than importance; from that you get Newton's "watchmaker" God, which would have been incomprehensible to a medieval thinker (let alone Aristotle).


Yes. The how and why in this case are pretty much the same thing however. Why does a program do what it does? Because a physical machine is running a distinct code.


That, interestingly, was Aristotle's definition of a living thing (the formal and final causes are identical).


Depends on your personal view of the world. I do not believe there is a why to the universe, or human existence. We just are.


That begets the question, "what are we?", and somehow that might be the initial question that got us into this mess.

To stay in canon I'd have to say that the question is posed wrong and the answer therefore is 42 or whatever you like. Properly phrased, the question would be, why are humans (or the universe) what they are. That's a recursive question. Posing it as a problem in differential equations, the question might be, "we are, but for how long?", or simply, "Are we?"

I think this is a simple question that comes up in cognitive development very early on, and the exercise of answering it is rather an effort in expressiveness. The answer can be felt, emotionally, we feel alive, and we will never feel dead, but it's hard to express rationally. I'm hopeful that neuroscience holds better answers. Maybe we really just don't care, emotionally, for the whole universe, but we care to know everything, rationally.

And we apparently care for AI to express that knowledge in natural sentences.

Edit: I got carried away; actually I wanted to note that "We are" is not a proper sentence, because to be is an auxiliary verb, and that a personal point of view as a premise cannot yield an objectively correct answer, hence I admit that the topic eludes me.


The answer is indeed 42, and Adams left it a dimensionless number because each of us ascribe the "unit" that makes the "number" - our life - have a meaning, if we so choose.

> I'm hopeful that neuroscience holds better answers.

It will not find the meaning of the universe in our head. The best it can hope to answer is which part of the brain makes us look for answers.

It is hard for most people to accept that sometimes there is no answer, we are hard-wired to look for patterns and reasons. Hence myths and religions. But our very emergence as a species is nothing but the product of a soulless, relentless evolution.


>It will not find the meaning of the universe in our head

I was just hoping, for example, that neuroscience would find out what is even meant by "meaning of the universe", because foremost, as Adams alluded, that doesn't make much sense to ask for. This is pretty simple, as you noted, because the Universe is an (the) infinite limit in the mathematical sense, which by definition doesn't have any externalities. So, as far as I can tell, the singularity is a fixpoint. Neuroscience might explain how we come up with that and rephrase it more poignantly. As I said, that question seems to be mostly an exercise in phrasing and might not become a huge revelation.

In a sense you are both right, you can keep asking for ever, but the answer will always be the same: "a rose is a rose is a rose ..."

What I just wonder is, how that actually refers to ourselves. "We are what we are" is not exactly a satisfying form of self-awareness. In differential equations that is called a stationary solution. Rephrasing the question could show a different mind frame concerned with, well, notable differences. What I really don't know is how emotions play into the development of reason, e.g. the fear of pain and, by extension, to die.

Your last paragraph is vexing, but I am rambling, too, so who am I to criticize.


I guess that gets into semantics then. I share your view of human existence, but you can still answer "why" with something like "Because something set the rules of the universe in motion, which led to humans existing". It wasn't a conscious decision, but it still answers "why", doesn't it?


Not necessarily, maybe the universe has always been around.

There are hypotheses stating the Big Bang was just a phase change for an eternal universe.


> As an AI developer you have access to the AI code and memory.. How could you not replay the encryption process used by the AI since you know the state of what it "knows" constantly?

This brings up a different "scary" subject: Could this someday be possible to perform on humans?

If you knew the "code" and understood the workings of the brain and how it stores memory, could we simulate and "replay" a "copy" of a person and see everything they would do in response to different inputs?

Would that lay to rest the debate between Free Will and Determinism?


Scott Aaronson wrote an essay titled "The Ghost in the Quantum Turing Machine" which talks about the idea that, depending on what level of detail would be needed to simulate a human brain in a way that reflects consciousness, the no-cloning theorem could maybe prevent such a copy from being made. It might be impossible to make copies and thereby see what the person would do in the different situations.

And so, due to states originating in a way that we can't possibly know, this would maybe allow/explain/whatever free will.

I'm not endorsing that idea, but I thought it was relevant to what you were talking about and worth considering.


Unless they evolve some form of homomorphic encryption


Or they can splat you, metaphorically or physically, because you stepped into a corner case of the neural network that nobody understood or could foresee, and that didn't make any difference to 99.99999999% of the other cases. But I guess random strokes of bad luck always happened. They will just be delivered in new ways.


This is the link to the research paper ("Learning to protect communications with adversarial neural cryptography"): https://arxiv.org/abs/1610.06918


TFA's title is a little clickbaity by adding "no one knows how it works", which apparently means that:

The researchers didn't perform an exhaustive analysis of the encryption methods devised by Alice and Bob, but for one specific training run they observed that it was both key- and plaintext-dependent. "However, it is not simply XOR. In particular, the output values are often floating-point values other than 0 and 1," they said.

Just saying...


Outputting floats instead of discrete values actually sounds like a very exploitable hole to me.

It reminds me of the classic "check if your model corresponds to reality" example where a one-time pad is broken: the standard said "<0.5V is OFF", but your machine was outputting 0.1V for OFFs that came from computing 0 xor 0 and 0.2V for OFFs that came from computing 1 xor 1.


I'm not sure what conclusions we're supposed to draw from this. Just because they were able to hide the messages from a 3rd AI doesn't mean the encryption was good. Shouldn't a human examine it and see if they can codebreak it? Isn't the goal better encryption?


I don't want to tell people what to take away from it, but my own view is that the contribution of the paper is the framing of the problem and a preliminary demonstration that you can achieve the goal of defeating a NN via an adversarial training formulation. In a lot of ways, I view our work as creating a challenge problem for the AI & crypto communities. It shows an intriguing possibility that is far from being practical, and itself represents a stepping stone in work on, e.g., training DNNs to learn to communicate.

Can this be extended to generate cryptanalysis-resistant schemes? Can we create a formulation that can learn to use discrete operators such as modular arithmetic? (The latter reflects one of my personal complaints about the state of DNNs -- it's very hard to have discrete "blocks" of functionality if that functionality isn't differentiable). Can we identify mechanisms that can train networks such as these more robustly? (It fails to train roughly half of the time; this is a fairly common problem in adversarial training of DNNs today.)


NNs/ANNs/DNNs/xNNs have already[0] been framed as easily fooled.

[0] https://arxiv.org/pdf/1412.1897.pdf


Isn't that a bit like "security through obscurity"? Unless I'm misunderstanding something, the cryptographic algorithm that the AI comes up with isn't guaranteed to be based on a provably NP-hard problem, so there aren't any formal guarantees. It would also be very hard to reason about, inspect, and prove correct.

Please note that I'm in no way dissing the researchers' work. What they did is pretty cool, but I can't see an obvious way to use these AI-generated algorithms in production systems where you may need to certify their correctness.


You may be right. It was mentioned in the paper:

    Our chosen network structure is not sufficient to learn general implementations of many of the mathematical
    concepts underlying modern asymmetric cryptography, such as integer modular arithmetic.
    We therefore believe that the most likely explanation for this successful training run was that Alice
    and Bob accidentally obtained some “security by obscurity” (cf. the derivation of asymmetric
    schemes from symmetric schemes by obfuscation (Barak et al., 2012)). This belief is somewhat reinforced
    by the fact that the training result was fragile: upon further training of Alice and Bob, Eve
    was able to decrypt the messages. However, we cannot rule out that the networks trained into some
    set of hard-to-invert matrix operations resulting in “public-key-like” behavior. Our results suggest
    that this issue deserves more exploration.


Note that this was for the public-key formulation. For the secret key formulation, it is mixing in the key, so it's not pure security through obscurity. It may be -- probably is -- relatively poor cryptography, but it's not just doing a transformation of the plaintext without using the key.

The public key part is in there mostly because of the formulation, not because we actually had any noteworthy success in training a network to learn pubkey. The result we mentioned was probably an anomaly, but it's an interesting anomaly to dig into more in the future.

But, in general, the parent is correct. There are few guarantees about a scheme derived in this way. It's one of the things I think most interesting for future work: Can one formulate "can't be cryptanalyzed by a mathematically sophisticated adversary?" as an adversarial training goal? Certainly, it's solvable for the one-time-pad-like scenario we used, but what about for more general cases that permit key reuse?


> Certainly, it's solvable for the one-time-pad-like scenario we used, but what about for more general cases that permit key reuse?

Can't you force this simply by making the message a lot longer than the key? If the network can encrypt variable length messages, surely the method must be more advanced than a one time pad.


>> The cryptographic algorithm that the AI comes up with isn't guaranteed to be based on a provably NP-hard problem, so there aren't any formal guarantees.

There are no cryptographic schemes, either in theory or in practice, that reduce to NP-hard problems. Instead, they rely on different "cryptographic hardness assumptions", including factoring (and related number-theoretic problems) and a variety of so-called "lattice" problems. Are these assumptions as sure as P!=NP? No, they are explicitly stronger. Are they better cryptosystems than this work generates? Almost surely.

If you are interested, there are actually some negative results that suggest it would be impossible to do cryptography based on NP-hardness alone. Akavia, Goldreich, Goldwasser, Moshkovitz show this for a restricted class of NP-hard problems in a 2006 paper [http://people.csail.mit.edu/akavia/AGGM.pdf]. More philosophically, there is a big difference between computational hardness and cryptography: in complexity, we study worst-case hardness, but in cryptography, we need breaking a cipher (for example) to be hard with very high probability. Rather surprisingly, there seems to be a big difference between these two goals.


It's not clear from the article whether they train the networks with the same shared-key in every iteration, or if they randomize it. Any info on that?


It's a random key paired with a random plaintext for each input. In the experiments, the key is the same length as the plaintext.

Happy to answer more questions about it. (I'm one of the authors.)


Since it seems that both adversaries in the network are training in parallel, is it possible that the encryption is only exploiting a weakness in that particular Eve? Would it change anything to have more Eves challenging Alice and Bob?

Also-- being able to generate crypto-algorithms on the fly seems like it would be ideal for small cells of people who want to keep their communications secret from something like the NSA, who might be looking for something like RSA or GPG, but not some ai generated by a neural network that nobody else in the world is using.

Oh, and how susceptible is the generated ciphertext to standard cryptanalytic techniques like letter frequency analysis and so on?


Yes. Some of this is in the paper, but I didn't try training with multiple eves at a time (yet). It's a very reasonable thing to try. We did test the robustness by doing a final evaluation pass where we trained a fresh eve a few dozen times without modifying A&B. That eve was generally 0.5-2 bits more successful than the one being trained iteratively, suggesting we could do better.

The last question you asked is, well, a good question. There's no reason to think that the current algorithm is very good in that regard. It's probably vulnerable, since we know it mixes multiple key bits & plaintext bits together.


So this is kind of neat, but from skimming the paper I didn't notice anything that goes information-theoretically beyond a one-time pad (even though it's clearly stated and plausible that the concrete algorithm found by A and B is not a XOR one-time pad).

Have you run experiments where (a) the messages are longer than the key, e.g. twice as long and (b) Eve is more powerful than Alice and Bob?

(b) is actually the most interesting thing, because cryptography is supposed to protect against computationally more powerful adversaries, but testing it is only really meaningful in combination with (a), because as long as messages and keys have the same length, you can always find an information-theoretically secure algorithm.


Not yet. For (b), we gave some advantage to Eve by (1) running two steps of training Eve for every step of training A&B; and (2) running multiple independent retrains of Eve from scratch after freezing the Alice & Bob networks. Not quite the same as increasing the capacity of the network, but similar. As you noted - we mostly stuck to the regime in which a solution could be found in theory (or trivially by a human), to explore whether or not the adversarial formulation of the NNs could get anywhere near it.

Many next steps indeed.


If the key is the same length as the plaintext, wouldn't even a one-time-pad evade any adversary?

I wasn't clear from the article: Were Alice and Bob sharing any algorithm details, or was it just the secret key?


See answer below re: OTP. Yes. We hoped the DNN would learn something close to an OTP. (The network we used for it is capable of learning an OTP, but the likelihood of doing so by gradient descent is vanishingly small.)

Nothing was shared between Alice & Bob except the secret key. The architecture of the three neural networks was the same (for Alice, Bob, and Eve), but they were all initialized independently and nothing was done to tie their weights together.


Thank you for taking the time to comment.


So basically the NN is learning a one time pad?


Kind of. Except that there's no restriction that there has to be a 1:1 correspondence between the key and plaintext bits (or characters) that get mixed, as there would be in a conventional OTP. And, indeed, the DNN doesn't learn that - it mixes multiple key and plaintext bits together. Probably in a way that's worse than a true OTP -- the adversary is more successful than it should be were the encryption scheme a "correct" OTP with XOR.


Cool! Also, have you tried giving some bits of the key to the adversary?


I haven't. Interesting - that'd be a nice way to try to probe how strong the encryption is (i.e., "bits recovered vs. key bits supplied to adversary"). I'll have to think about that more - thanks for the idea!


Sort of. The key was only shared once, but over 20,000 messages were sent. In the real world, that would allow you to crack the OTP, since you're not supposed to reuse them.


i have to admit, i don't really see the point of this (i admit not having read the paper though):

> It's a random key paired with a random plaintext for each input. In the experiments, the key is the same length as the plaintext.

this practically means the networks only have to implement XOR for perfect security (a one time pad).

maybe you're studying something different i don't understand, but why wouldn't it be more sensible to limit the key size?

i.e.: why didn't you train the network to create a key stream? i'm not a cryptographer, but in this case you'd only have to train two networks (the keystream generator bob and the attacker carol).
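for reference, the information-theoretically optimal scheme in that setting (a random key as long as the message, used once) is just XOR - a quick sketch in Python:

    import os

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    plaintext = b"attack at dawn"
    key = os.urandom(len(plaintext))   # random key, same length as the plaintext, used only once

    ciphertext = xor(plaintext, key)   # Alice
    recovered = xor(ciphertext, key)   # Bob: XOR with the same key inverts it exactly
    assert recovered == plaintext
    # Without the key, every plaintext of this length is equally likely (perfect secrecy).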


Would it be possible/easy to add the speed of encrypting/decrypting the data as a separate loss function? Potentially this could lead to cryptography being a less expensive computation.


It could, but within a given neural network structure, the speed is going to pretty much be constant. (Barring optimizations such as eliminating zero weights/activations). There's a meta-level above this of trying to search or automatically determine a "good" NN structure that can accomplish the encryption & decryption. That too (determining an optimal NN structure for a problem) is a fascinating research question in its own right! :) In fact, it's one that Jeff Dean called out a while ago as one of the leading-edge questions for DNNs, IIRC.


I thought no key/public key, but it really doesn't say


> While it seems improbable that neural networks would become great at cryptanalysis, they may be quite effective in making sense of metadata and in traffic analysis.

How does that square with the fact that the best cryptanalysts appear to have nothing but their own neural networks to work with?

This reminds me of the genetic algorithm that came up with a tone discriminator that at first glance looked like it could not work (parts connected wrong or not at all, and yet, crucial to functioning).

https://www.damninteresting.com/on-the-origin-of-circuits/


Because humans have invented tools - math, notably - that they can use to be more formal and precise when they deal with problems of this sort. To date, neural networks can't -- they're doing very well at the kind of "intuitive" pattern matching that our brains do, but less well on an algorithmic front. It's one of the really interesting frontiers of AI right now. The folks at DeepMind recently showed, for example, that a variant of their "Neural Turing Machine" architecture could learn to do things involving more like what we think of as discrete algorithms (https://deepmind.com/blog/differentiable-neural-computers/ ), but all of this is very early work.


The words "neural network" describe pretty different things, depending on whether a computer scientist or a neuroscientist use them.


Schneier's Law: "Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break."


Poor taste with the Terminator graphic. A new level of lame fear mongering by Ars Technica?


I was hoping psychedelic dog heads would become the default stock photo for AI news, rather than red eyed robots


"The researchers didn't perform an exhaustive analysis of the encryption methods devised by Alice and Bob, but for one specific training run they observed that it was both key- and plaintext-dependent. "However, it is not simply XOR."

I think this says it all.


Really interesting that for the first ~6500 steps Eve was better at decrypting the messages than Bob, which had access to the key.

Would be cool to see a setup like this in which Eve is a cohort of decryption tools; otherwise your Alice network will just overfit against its mechanics. I.e., if Eve uses lockpicks to open a door, an Alice solution would eventually replace the key-lock with a number pad - not necessarily any more secure, but it will foil Eve every time.


>Really interesting that for the first ~6500 steps Eve was better at decrypting the messages than Bob, which had access to the key.

This is probably because Alice wasn't good at encrypting yet.


On a second look, it seems like a combination of Alice not being good at encrypting (in the sense that it had to encrypt in a way that makes decryption possible), and Bob needing to get up to speed on decrypting (even with the key).

As soon as that relationship clicks, Bob gets better very quickly, while Eve improves at a slower rate for a while, before Alice adjusts (causing Bob to get worse for a bit, until it catches up).

Would be interesting to see a setup where Alice / Bob get trained first, then Eve is activated, and see how they adjust.


Next: Google AI encrypts Google's servers and demands ransom.


Isn't this just the automated version of "anybody can invent crypto they can't break themselves"? It doesn't sound like the neural nets were provided with any high level cryptanalytic primitives. Just basic math and left to bodge something together. I can't imagine it would be easy to hill-climb towards an understanding of the why behind crypto. Algorithms that work look a lot like ones that have glaring flaws - but those flaws only glare if you know the right way to come at them.


It must be the time for Cryptography + Machine Learning. I had an O'Reilly Security 2016 talk "Machine Learning to Improve Random Number Generators" accepted this June 2016:

http://conferences.oreilly.com/security/network-data-securit...

EDIT: I thought the reference was relevant but please do critique/comment if you feel I've broken some convention.


Hill climber climbs hill, news at 11?


Interesting choice of words!


Interestingly, Eve maintained a better than random chance at guessing the message. Consistently stayed a little better than 50%.


This reminds me of Peter Watts's novel <Blindsight>. Two captured "Scramblers" (the highly intelligent aliens without self-consciousness) managed to communicate with each other in a way defying all human analysis, even with humans knowing what they were saying (the plain text).


For those that can handle old pre-CG movies, there's a great old movie where computers make their own language. Highly recommended:

Colossus: The Forbin Project (1970)

http://www.imdb.com/title/tt0064177/combined


At an open day for an animation course they showed an AI-derived walk cycle of a character. The results were amusing. The condition was that if it fell over, that would be no good. The end walk cycle was a character doing forward flips in the air and landing on its feet. Very unexpected results.


"No one knows how it works"

Yea, someone might also have said that about the GOST sboxes:

https://eprint.iacr.org/2015/812


Given I saw this as top on /r/futurology I figured it was clickbait BS. But since it was posted to HN I thought I'd give it a read.

It's clickbait BS.


It's interesting, but someone needs to re-submit it with a proper write-up about it, without the fearmongering titles.


Teach the AI to communicate with each other in a way humans can't eavesdrop on... What could go wrong with that?!?


Google should teach an AI to do prime factorization instead.


Why?


Maybe another AI system will do cryptanalysis and break it, and nobody will know how it was broken.


By the time we finally realize we're not as smart as we think we are, it's going to be too late.


Am I the only one wondering about this and Ethereum...


Do you mind elaborating?

If you are talking about a smart-contract being able to do stuff with encrypted data without other people being able to access the data, I don't think this would help with that, because whatever system the contract would use would be available to whoever is trying to get the data from the contract.

Though, this may be enough to make it hard for other contracts to get stuff from a particular contract? I'm somewhat skeptical of that, but maybe. Also, in any case, the "attacker" contract could just make an incentive for people to give it the information, which I think would work. (And that might be necessary for it to get the info from the defending contract anyway? I don't remember. I don't think it is, but I'm not sure.)

However, I'm not sure that is what you meant, and also I'm not sure of what I'm saying.

edit: well, there are other systems that have been designed to let contracts keep information secret but I think those are still too expensive to be used, and relied on a secret sharing scheme iirc, so would require that not all the info be on chain.


> this is about teaching computers to re-write their own code.

Maybe I've seen too many movies, but I hope we can make sure they don't rewrite the "protect the humans" part.


It would be very hard to ensure that.


Especially when they decide that the most logical way to protect them is to destroy them.


I have yet to write down a longer post on this, but honestly I think AI has little chance of operating on purely logical terms. This thinking about paradoxes comes from the time when people approached AI with Prolog etc. They didn't get really far.


but don't we then just unleash the huge EMP bomb?


Sounds good. Sadly, six months ago the bot brain predicted the day we were planning with .98 confidence.


But then it got sucked into TVTropes, and the world is saved.


Not if any stage of the unleashing process is connected to the Internet.


Humans run on electricity also.


We detached this subthread from https://news.ycombinator.com/item?id=12815939 and marked it off-topic.



What could ever go wrong with this....



