
I don't understand, what is the input and output of this at a conceptual level?

One of the examples apparently takes a list of leaked passwords from MySpace - and so what does it "crack" then? I think the phrase "high impact substrings" down in the explanation is the key, but it's not wholly clear to me what the ultimate purpose of this is.

The 'GA without a fitness score' idea seems interesting, but it would help to know what exactly the algorithm is trying to do.




There is total separation between the MySpace list and the rest of the program. The only thing the simulation can do is query hit or miss. Through these hits and misses it figures out which words are in the MySpace list. The situation is exactly the same when the list is full of md5 hashes, except that we now hash the candidate password before we check for a hit or miss.
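To make the "hit or miss" idea concrete, here is a rough sketch of such a query in Python. The names `make_oracle` and `is_hit` and the sample passwords are made up for illustration and are not taken from the repo; the only point is that plaintext mode and md5 mode differ by a single hashing step.

```python
import hashlib


def make_oracle(targets, use_md5=False):
    """Return a function that only ever answers hit (True) or miss (False).

    Hypothetical helper: `targets` is the list to crack, given either as
    plaintext passwords or as md5 hex digests.
    """
    target_set = set(targets)

    def is_hit(candidate):
        if use_md5:
            # md5 mode: hash the candidate first, then do the same lookup.
            candidate = hashlib.md5(candidate.encode("utf-8")).hexdigest()
        return candidate in target_set

    return is_hit


# Plaintext mode: the list holds the passwords themselves.
plain_oracle = make_oracle(["password1", "iloveyou"])

# md5 mode: the list holds hashes, so candidates are hashed before lookup.
md5_oracle = make_oracle([hashlib.md5(b"password1").hexdigest()], use_md5=True)
```

Everything else in the simulation would only see the return value of `is_hit`, never the list itself.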

"what is the input and output of this at a conceptual level?" Our input is the dataset we want to crack, and our output is the passwords that were successfully cracked. In a more conventional scenario, we would have a list of hashes that need to be cracked. So the steps are:

1. pick parents at random

2. crossover and maybe mutate

3. hash the child

4. see if the hashed child exists as a password in our list of hashes we want to crack

5. if it does, add the child to the end of the container and pop the oldest organism from the front.

6. goto start
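The six steps above could be sketched roughly like this in Python. This is not the program's actual code: the crossover and mutation operators, the mutation rate, the alphabet, the iteration cap, and the choice to count each hash only once are all assumptions made to keep the example small.

```python
import hashlib
import random
from collections import deque

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed mutation alphabet


def crossover(a, b, rng):
    """Assumed operator: random prefix of one parent + random suffix of the other."""
    return a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def mutate(s, rng, rate=0.1):
    """Assumed operator: sometimes overwrite one character with a random one."""
    if s and rng.random() < rate:
        i = rng.randrange(len(s))
        s = s[:i] + rng.choice(ALPHABET) + s[i + 1:]
    return s


def crack(target_hashes, seeds, iterations=100_000, seed=0):
    rng = random.Random(seed)
    population = deque(seeds)        # oldest organism sits at the front
    remaining = set(target_hashes)   # md5 hex digests not yet cracked
    cracked = []
    for _ in range(iterations):      # step 6: goto start
        if not remaining:
            break
        mum, dad = rng.choice(population), rng.choice(population)  # step 1
        child = mutate(crossover(mum, dad, rng), rng)               # step 2
        digest = hashlib.md5(child.encode("utf-8")).hexdigest()     # step 3
        if digest in remaining:                                     # step 4
            remaining.discard(digest)  # assumption: count each hash once
            cracked.append(child)
            population.append(child)   # step 5: child joins at the end...
            population.popleft()       # ...and the oldest organism dies
    return cracked


targets = {hashlib.md5(b"password").hexdigest()}
print(crack(targets, ["pass", "word"]))
```

Note that nothing here scores an organism: a child either hits and enters the population, or it is thrown away, and the oldest organism is removed regardless of how useful it was.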


I changed the example since I realised that it is a bit hard to understand. The example now shows how to crack a list of md5 hashes. All I did was convert the myspace list to md5 hashes. You can now try and crack the myspace list in the program's md5 mode. Sorry for the confusion.


adding possible combinations of characters, ngrams, from the 'organisms' file and checking them against a leaked password list, apparently tested on the 'rock_you' list of leaked myspace passwords, which is omitted from the repo, but the repo has a stand-in empty file where you put your own list of leaked passwords

it will genetically run through all ngrams and check if it is in the pass list to determine how to advance the evolution of the algorithm

does this crack passwords genetically? well, yes and sorta

it's a proof of concept against an existing list of real leaked passwords, proving that it could efficiently crack a number of these real passwords real people were using to protect real personal data

but from there you have to extrapolate the effectiveness on all possible passwords..

if you train against the myspace list then the passwords would have to resemble myspace passwords

can you train the algo on the myspace list then try to crack nuclear codes? very unlikely, unless government officials are protecting their access with passes like 'WARMACHINEROX'


> unless government officials are protecting their access with passes like 'WARMACHINEROX'

I chuckled, imagining how some government official changes his password right now.


I chuckled, after misreading that as "warm ache in eRox", and wondering what 'eRox' were...


I think it's about finding substrings which are (very) likely to appear in passwords.

Like substrings "pass" and "word" in "password".


The fitness function is "how many passwords did this individual match", basically.


Not really, there is no fitness function, there is only some implicit selection pressure. Remember, a single individual is a single password (a string). The program does not keep track of how many viable offspring a parent produces. All an organism does is have sex with other strings in the hope of producing viable offspring (another cracked password). This carries the genetic information on to the next generation while the older generation keeps dying indiscriminately, whether they were fit or not, since there is no fitness function.



