I don't understand, what is the input and output of this at a conceptual level? ...

lyle_nel · on Feb 28, 2016

There is total separation between the myspace list and the rest of the program. The only thing the simulation can do is query the list and get back a hit or a miss. Through these hits and misses it figures out which words are in the myspace list. The situation is exactly the same when the list is full of md5 hashes, except that we now hash the candidate password before we check for a hit or miss.
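
To make the hit-or-miss idea concrete, here is a minimal sketch in Python. It is not the program's actual code: `make_oracle`, `is_hit` and the `md5_mode` flag are names made up for illustration, and the target list is assumed to be either plaintext words or lowercase hex md5 digests.

```python
import hashlib


def make_oracle(target_list, md5_mode=False):
    """Build a hit-or-miss query over the target list (hypothetical helper).

    The caller only ever gets True or False back; it never sees the list.
    """
    targets = set(target_list)

    def is_hit(candidate):
        if md5_mode:
            # In md5 mode the candidate is hashed before the lookup.
            probe = hashlib.md5(candidate.encode("utf-8")).hexdigest()
        else:
            probe = candidate
        return probe in targets

    return is_hit


# Plaintext mode: the list holds the passwords themselves.
plain = make_oracle(["password1", "iloveyou"])
print(plain("iloveyou"), plain("letmein"))

# md5 mode: the list holds hashes, the query is otherwise identical.
hashed = make_oracle([hashlib.md5(b"iloveyou").hexdigest()], md5_mode=True)
print(hashed("iloveyou"), hashed("letmein"))
```

Either way the rest of the simulation only ever calls `is_hit`, which is why switching to hashes changes nothing else.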

"what is the input and output of this at a conceptual level?" Our input is the dataset we want to crack, and our output is the passwords that were successfully cracked. In a more conventional scenario, we would have a list of hashes that need to be cracked. So the steps are:

1. pick parents at random

2. crossover and maybe mutate

3. hash the child

4. see if the hashed child exists in our list of hashes we want to crack

5. if it does, add the child to the end of the container and pop the oldest organism from the front.

6. goto start
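
The six steps above can be sketched roughly like this in Python. This is an illustration under assumptions, not the program's real implementation: the single-point `crossover`, the one-character `mutate`, the 10% mutation rate, the alphabet, and removing a hash from the target set once it is cracked are all my own choices for the sketch.

```python
import hashlib
import random
from collections import deque

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed character set


def crossover(a, b, rng):
    # Assumed operator: a prefix of one parent joined to a suffix of the other.
    return a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def mutate(s, rng, rate=0.1):
    # "Maybe mutate": with probability `rate`, replace one random character.
    if s and rng.random() < rate:
        i = rng.randrange(len(s))
        s = s[:i] + rng.choice(ALPHABET) + s[i + 1:]
    return s


def evolve(seeds, target_hashes, iterations, seed=0):
    rng = random.Random(seed)
    population = deque(seeds)        # oldest organism sits at the front
    remaining = set(target_hashes)   # md5 hex digests still uncracked
    cracked = []
    for _ in range(iterations):                            # 6. goto start
        mother = rng.choice(population)                    # 1. pick parents at random
        father = rng.choice(population)
        child = mutate(crossover(mother, father, rng), rng)  # 2. crossover, maybe mutate
        digest = hashlib.md5(child.encode("utf-8")).hexdigest()  # 3. hash the child
        if digest in remaining:                            # 4. hit or miss
            remaining.discard(digest)   # assumption: count each hash only once
            cracked.append(child)
            population.append(child)                       # 5. child joins at the end...
            population.popleft()                           #    ...oldest is popped off the front
    return cracked


targets = {hashlib.md5(b"pass123").hexdigest()}
print(evolve(["password", "love123"], targets, iterations=20000))
```

Note that nothing in the loop scores an organism. A child either matches a hash and enters the container, or it is thrown away.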

lyle_nel · on Feb 28, 2016

I changed the example since I realised that it is a bit hard to understand. The example now shows how to crack a list of md5 hashes. All I did was convert the myspace list to md5 hashes. You can now try and crack the myspace list in the program's md5 mode. Sorry for the confusion.

justifier · on Feb 28, 2016

it adds possible combinations of characters (ngrams) from the 'organisms' file and checks them against a leaked password list. apparently it was tested on the 'rock_you' list of leaked myspace passwords, which is omitted from the repo, but the repo has an empty stand-in file where you put your own list of leaked passwords

it will genetically run through the ngrams and check whether each candidate is in the pass list to determine how to advance the evolution of the algorithm

does this crack passwords genetically? well, yes and sorta

it's a proof of concept against an existing list of real leaked passwords, proving that it could efficiently crack a number of these real passwords real people were using to protect real personal data

but from there you have to extrapolate the effectiveness on all possible passwords..

if you train against the myspace list then the passwords would have to resemble myspace passwords

can you train the algo on the myspace list then try to crack nuclear codes? very unlikely, unless government officials are protecting their access with passes like 'WARMACHINEROX'

krick · on Feb 28, 2016

> unless government officials are protecting their access with passes like 'WARMACHINEROX'

I chuckled, imagining some government official changing his password right now.

OJFord · on Feb 28, 2016

I chuckled, after misreading that as "warm ache in eRox", and wondering what 'eRox' were...

ivoras · on Feb 28, 2016

I think it's about finding substrings which are (very) likely to appear in passwords.

Like substrings "pass" and "word" in "password".

stavros · on Feb 28, 2016

The fitness function is "how many passwords did this individual match", basically.

lyle_nel · on Feb 28, 2016

Not really, there is no fitness function, there is only some implicit selection pressure. Remember, a single individual is a single password (a string). The program does not keep track of how many viable offspring a parent produces. All an organism does is have sex with other strings in the hope of producing viable offspring (another cracked password). This carries the genetic information on to the next generation, while the older generation keeps dying off indiscriminately, whether its members were fit or not, since there is no fitness function.