Show HN: Cracking passwords with a simple genetic algorithm (github.com/lyle-nel)
149 points by lyle_nel on Feb 28, 2016 | 58 comments



I don't understand, what is the input and output of this at a conceptual level?

One of the examples apparently takes a list of leaked passwords from MySpace - and so what does it "crack" then? I think the phrase "high impact substrings" down in the explanation is the key, but it's not wholly clear to me what the ultimate purpose of this is.

The 'GA without a fitness score' idea seems interesting, but it would help to know what exactly the algorithm is trying to do.


There is total separation between the myspace list and the rest of the program. The only thing the simulation can do is query hit or miss. Through these hits and misses it figures out what words are inside the myspace list. The situation is exactly the same when the list is full of md5 hashes, except that we now hash the candidate password before we check for a hit or miss.

"what is the input and output of this at a conceptual level?" Our input is the dataset we want to crack, and our output is the passwords that were successfully cracked. In a more conventional scenario, we would have a list of hashses that need to be cracked. So the steps are

1. pick parents at random

2. crossover and maybe mutate

3. hash the child

4. see if the hashed child exists as a password in our list of hashes we want to crack

5. if it does, add the child to the end of the container and pop the oldest organism from the front.

6. goto start
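
A minimal sketch of that loop in C++; the function name, parameters, and the crossover/mutation details here are made up for illustration and will differ from what vivarium.cpp actually does:

    #include <deque>
    #include <functional>
    #include <random>
    #include <string>
    #include <unordered_set>

    // Sketch only, not the author's code. 'hash_fn' is the identity for a
    // plaintext list, or an MD5 helper (not provided here) for md5 mode.
    void evolve(std::unordered_set<std::string>& targets,   // passwords/hashes to crack
                std::deque<std::string>& pool,               // gene pool (FIFO)
                std::size_t max_pool, std::mt19937& rng,
                const std::function<std::string(const std::string&)>& hash_fn)
    {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        while (!targets.empty()) {
            // 1. pick two parents at random
            std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);
            std::string a = pool[pick(rng)], b = pool[pick(rng)];

            // 2. crossover (prefix of one parent + suffix of the other), maybe mutate
            std::string child =
                a.substr(0, std::uniform_int_distribution<std::size_t>(0, a.size())(rng)) +
                b.substr(std::uniform_int_distribution<std::size_t>(0, b.size())(rng));
            if (!child.empty() && coin(rng) < 0.05)
                child[std::uniform_int_distribution<std::size_t>(0, child.size() - 1)(rng)] =
                    static_cast<char>('a' + std::uniform_int_distribution<int>(0, 25)(rng));

            // 3. + 4. hash the child and look it up in the target list
            if (targets.erase(hash_fn(child))) {
                // 5. viable offspring: push to the back, retire the oldest from the front
                pool.push_back(child);
                if (pool.size() > max_pool) pool.pop_front();
            }
            // 6. goto start
        }
    }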


I changed the example since I realised that it is a bit hard to understand. The example now shows how to crack a list of md5 hashes. All I did was convert the myspace list to md5 hashes. You can now try and crack the myspace list in the program's md5 mode. Sorry for the confusion.


It adds possible combinations of characters (ngrams) from the 'organisms' file and checks them against a leaked password list, apparently tested on the 'rock_you' list of leaked passwords, which is omitted from the repo; the repo has a stand-in empty file where you put your own list of leaked passwords.

it genetically runs through ngrams and checks whether each one is in the password list, using that to determine how to advance the evolution of the algorithm

does this crack passwords genetically? well, yes and sorta

it's a proof of concept against an existing list of real leaked passwords, proving that it could efficiently crack a number of these real passwords real people were using to protect real personal data

but from there you have to extrapolate the effectiveness on all possible passwords..

if you train against the myspace list then the passwords would have to resemble myspace passwords

can you train the algo on the myspace list then try to crack nuclear codes? very unlikely, unless government officials are protecting their access with passes like 'WARMACHINEROX'


> unless government officials are protecting their access with passes like 'WARMACHINEROX'

I chuckled, imagining how some government official changes his password right now.


I chuckled, after misreading that as "warm ache in eRox", and wondering what 'eRox' were...


I think it's about finding substrings which are (very) likely to appear in passwords.

Like substrings "pass" and "word" in "password".


The fitness function is "how many passwords did this individual match", basically.


Not really, there is no fitness function, there is only some implicit selection pressure. Remember, a single individual is a single password (a string). The program does not keep track of how many viable offspring a parent produces. All an organism does is have sex with other strings in the hope of producing viable offspring (another cracked password). This carries the genetic information on to the next generation while the older generation keeps dying indiscriminately, whether they were fit or not, since there is no fitness function.


It's not accurate to say there's no fitness function. There is a fitness function, but it is not a continuous function. (There's no way to have a GA without some notion of fitness.)


There is no explicit function that decides which organisms are more fit than others. But you are right that there is selection pressure, since all organisms have the same lifetime (they are eventually popped from the front as newer ones are pushed to the back). This means that if they did not produce viable offspring, the genetic information does not stay in the simulation.


Like I say above (my comment was tongue-in-cheek but) I really like the idea of a FIFO queue rather than an explicit fitness function. I'm thinking of writing a little a-life/learner thingy just for fun at some point and I'll definitely nick your idea. Is it a common approach in genetic algorithms? Is there some research on it?

Also, for some reason I thought you actually used a LIFO stack. So, what if you did? My guess is organisms that put more offspring on the top of the queue would take longer to be popped from it, so that they'd have a better chance to produce more offspring. They would also benefit from organisms almost as successful as themselves (but not more).

Is this also a known approach, and if so, how well does it work?


The queue structure is one of my own ideas. It is possible that it has been done before, but I did not check. There does not seem to be a lot of research on this topic. That being said, it is a nice toy problem to explore genetic algorithms and diversity.

Regarding the stack approach, I would suspect that new genetic information would immediately be lost since a new organism dies before having much chance to produce offspring. In the off chance it manages to produce offspring, those offspring would also be lost almost immediately since they get popped first.

If you are really curious, you can change line 163 in vivarium.cpp from a pop_front() to a pop_back() to see what happens.


Thanks - my assumption was that the speedup in dying would increase the reproductive pressure and make the whole process a lot more competitive. But GAs are not my specialty (far from).

I am indeed very curious, but also very, very short on time! Before I can justify picking up someone else's code I have a couple of pieces of coursework and a dissertation to work on - but I might get some time round abouts next weekend, I'll let you know how I get on :)


I understand, I tested it for you and my suspicions were right. The simulation simply stalls early on. Nothing interesting happens after that.


Oh well - so much for that then!

I'll still try to give your code a whirl next weekend. I know how useful it is to get some feedback for your projects.


I appreciate it.


Interesting approach.

Especially if you were to feed the training set with most-used password lists from leaked databases or similar.

Do you have any performance metrics against bruteforce for example?


Yes, the training set should be representative of the candidates you want to crack; also, the larger the training set is, the better it seems to perform. It is also important to note that, as passwords are cracked, they become part of the training set. So the program can start off completely blind and then figure out common structures as the simulation progresses.

I have not put them head to head, but the program seems to find passwords that are far outside the tractable search-space for a bruteforce attack. Passwords like the following are found within a few hours:

    34 a111111111111111111111111111111111
    20 12345678900987654321
    19 9876543211234567089
    19 password12345678910
    18 sexytinkerbell2013
    18 prettyprincess4812
    18 mickeymouseforever
    18 littlepinkprincess
    18 iloveyoumomforever


So XKCD style correcthorsebatterystaple-esque passwords would fall in no time?


No, the point of that style of password is that you choose the words randomly. Random words are just easier to remember than random characters.


Almost, yeah. In a few hours it manages to find these horse-related passwords:

    15 strawberryhorse
    15 ilovehorses1995
    15 ilovehorses1994
    15 ilovehorses1234
    15 horselover4ever


Cool system. For clarity though, those aren't xkcd style passwords. They're words that form a normal pattern. Xkcd requires that the words are randomly generated.


Sorry for going slightly OT but I think more people should know this:

xkcd style passwords are actually called diceware passwords [1]. The wikipedia article [2] links to an arstechnica report from 2014 [3] in which the original author upped his recommended diceware password length to at least 6 (random) words.

You can obviously always use a bigger dictionary too (as long as you choose truly randomly).

[1] http://world.std.com/~reinhold/diceware.html funnily enough, chromium doesn't like the ssl version the site uses: "ERR_SSL_FALLBACK_BEYOND_MINIMUM_VERSION"

[2] https://en.wikipedia.org/wiki/Diceware

[3] http://arstechnica.com/information-technology/2014/03/dicewa...


this is easier to understand with a bit of math..

let's say you have a padlock with a 4-number combination, each of those numbers being set on a ring of 6 possible numbers (o)

we have four 'slots' with six possible numbers for each slot

that means there are a total of 1296 combinations, (possible values)^(slots), 6^4=1296

a contemporary computer can run through 1300 combinations in a perceptually instantaneous amount of time

now we look at randall's and diceware's suggestion of using a series of words to bulk out your password. since a word in english is built from a 26-character alphabet, the number of possible values per slot becomes 26

for a four letter word, the number of possible combinations of those 26 characters becomes 26^4=456,976, and again, all values can be computed in nanoseconds

what i think randall and diceware are hoping to achieve is a higher permutation value with little tax on the user's memory

conceptually [c,o,r,r,e,c,t,h,o,r,s,e,b,a,t,t,e,r,y,s,t,a,p,l,e] is 25 characters long with each 'slot' having 26 possible values.. 26^25 ~ 2.3x10^35, that is a huge number of possible permutations which finally makes brute forcing impractical

but here is where the whole thing starts to break down a bit..

because the suggestion is to use real words to aid in remembering, which trades raw character length for a number of words

that also means the figure above is significantly lower, because many of those 456,976 permutations, starting with ['a','a','a','a'], are not valid english words, so in reality that ~450,000 figure shrinks considerably

'correcthorsebatterystaple' can be broken into 25 characters 'slots' or 4 word 'slots': [correct,horse,battery,staple]

what was 26^25, is now (possible words)^4

we can keep going, look at randall's most recent book(i), using the 1000 most used words

with 1000 possible words and 4 slots, we are looking at 1000^4=1,000,000,000,000 possible permutations

which will take a contemporary laptop a few hours to churn through

all this broken into a chart:

     81 = 3**4 : [(3),(3),(3),(3)]
     243 = 3**5 : [(3),(3),(3),(3),(3)]
     256 = 4**4 : [(4),(4),(4),(4)] 
     456976 = 26**4 : [(26),(26),(26),(26)]
     1000000000000 = 1000**4 : [(1000),(1000),(1000),(1000)]
    ~230000000000000000000000000000000000+ = 26**25 : [(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26),(26)]
    ~640000000000000000000000000000000000000000000+ = (26*2+10)**25 : [(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62),(62)]
what this is showing you is that anything that makes things easier for you will also make things easier for a cracker

for best security, increase the number of characters where possible while avoiding words.. use a combination of lowercase (26 characters), uppercase (26 characters) and numbers (0-9, 10 characters) for 26+26+10=62 possible values.. and put those characters in as many slots as you can muster

(o) http://www.padlocks4less.com/media/catalog/product/cache/1/i...

(i) https://xkcd.com/thing-explainer/


diceware is a list of 7776 words. You pick 6 words. (That's 7776^6.) If the attacker knows you're using Diceware, knows what word list you're using, and knows how many words you're using (i.e. the worst case), you get 12.9 bits of entropy per word. If the attacker doesn't know how many words you're using, or doesn't know what wordlist you're using, you get more than 12.9 bits of entropy per word.

You've made a mistake here:

> 'correcthorsebatterystaple' can be broken into 25 characters 'slots' or 4 word 'slots': [correct,horse,battery,staple]

> what was 26^25, is now (possible words)^4

1) No-one suggests using 4 words. Randall uses 4 words because it fit in the cartoon. Diceware suggests using 6 words as a minimum.

2) No-one at all suggests using just the 1,000 "most common words". Diceware is a list of 7776 words. Except there's more than one diceware list.


i should think you could grok from my comment that i'm on board for your two points

why explain away the comic as just being an example but then confront the comment as being unaware?

why did i use 4 word 'slots' instead of diceware's suggested 6 minimum? because the comic did and that was referenced in the gp

why did i use 1000 words instead of diceware's 7776? because the number of words is arbitrary for an explanation, and, as i stated, i was playfully referencing, and intentionally promoting, more of randall's work

your comment fails to negate anything in my explanation, which was meant as a means of explaining how to think about these things

what the effectiveness of the ga cracking repo shows is that people are using even fewer possible words than diceware's 7776 or randall's 1000

and the concluding statement that more variety with fewer patterns equates to better security stands

> The complete list contains 7776 short words, abbreviations and easy-to-remember character strings. The average length of each word is about 4.2 characters(o)

i'll be kind and round that 4.2 up to 5

> Diceware suggests using 6 words for minimum

so we are looking at 6 words of 5 characters.. 30 total characters

let's compare diceware to individual characters:

    ~220000000000000000000000+ = 7776**6 : diceware with 6 word minimum
    ~800000000000000000000000000000000000000000+ = 7776**10 : diceware with 10 words
    ~2800000000000000000000000000000000000000000000+ = 26**30 : english alphabet, 30 characters
>what this is showing you, is that anything that makes things easier for you also will make things easier for a cracker

(o) http://world.std.com/~reinhold/diceware.html


Why are you comparing a 30 character word with a 6 word passphrase?

You need to look at bits of entropy.

A 6 word passphrase is about the same as a 12 character password if you use any printable ASCII character.
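
Roughly, with the 7776-word diceware list and the 95 printable ASCII characters:

    log2(7776^6)  =  6 * log2(7776) ~  6 * 12.9 ~ 77.5 bits
    log2(95^12)   = 12 * log2(95)   ~ 12 * 6.6  ~ 78.8 bits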


because entropy is directly related to the number of possible values and number of values in the password, which is what i am comparing

i then went on to establish those values as being only those in the english alphabet, 26 characters, instead of your referenced ascii set or even the diceware character set which allows some special characters

then i took the minimum suggested diceware length of 6 words, rounded the average word length of 4.2 up to 5, which gives us 6*5=30 characters

so one diceware password example:

    affixafireafootagainagateagave
    
    can be seen as diceware would have it, 6 words of length 5:
      [affix, afire, afoot, again, agate, agave]
    or 30 individual characters:
      [a,f,f,i,x,a,f,i,r,e,a,f,o,o,t,a,g,a,i,n,a,g,a,t,e,a,g,a,v,e]
so utilising any underlying pattern, here only using the diceware wordset, weakens a passphrase of equal length

or if you want to abstract away from the actual values used to determine a passphrase's strength, you could say: diceware lowers the entropy relative to an individually random passphrase of equal length

but that phrase unnecessarily contains verbiage that can confuse, whereas my previous comment showed all of the possible permutations as numbers, in such a way that it is easy to see that one is greater than the other, making the guess space larger


Just to add to that. The program as it currently is rarely finds passwords that consist of more than 3 long words, which can still amount to decently long passwords. I have not tried to give it an english dictionary as a seed population. Maybe it would perform better if I do that.


I'd be interested to see how it copes with Diceware. Maybe generate a list of 1000 Diceware passphrases (or several lists of 4, 5, and 6 word phrases) and use the diceware wordlist as a seed population.


Sure I'll run some tests for you.

I ran 2 experiments each lasting 120 seconds.

For the diceware list as a seed population it finds 4570 passwords. The 6 longest ones are:

    11 michelle420
    11 karen123456
    11 iloveyouu16
    11 iloveyou789
    11 iloveyou234
    11 iloveyou191

For the top 2000 ngrams from the rock_you list(which is the current list used) it finds 5317 passwords. The top 6 longest passwords are:

    14 fuckyouhackers
    13 fuckyouhahaha
    11 sexylove889
    11 scooter1234
    11 princess123
    11 iloveyouu16


It's not clear to me what this does, or if it's useful. You have a fixed-length vector of strings and you make cross-overs between elements in the vector based on non-uniform random distributions. That is to give more chances to younger offspring to procreate. Then you apparently remove the oldest (i.e. first) strings in the vector, and replace them by appending offspring. But then you also mention you are not deleting older organisms. I'm a bit confused.

I think it's wrong to say you don't have a fitness function. Your fitness function is "Is this string a password?" and the score is binary. Why do you say there is no fitness function?

Also, in your examples, is "aaaaaaaaaaaaaaaa" really a password?


The part about not deleting organisms is a small caveat that I omitted in most of the discussion since it makes the algorithm just a bit harder to understand. To clarify that point, since you asked: organisms are popped from the front of the container only if we provide a maximum population size. If we do not provide a maximum population size, we do not delete any organisms. Older organisms do, however, lie dormant due to the non-uniform distribution, thereby providing the same advantage as selection while preserving a greater degree of genetic diversity. If there is anything else that is not clear to you, I am open to any questions you might have. I will help where I can to clarify things.
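
As an illustration only (the actual distribution in vivarium.cpp is likely different), a selection bias toward the newer organisms at the back of the pool could look something like this:

    // Illustration only, not the author's code: bias parent selection toward
    // the back (newer organisms) so older ones lie dormant but are never deleted.
    #include <algorithm>
    #include <deque>
    #include <random>
    #include <string>

    const std::string& pick_parent(const std::deque<std::string>& pool,
                                   std::mt19937& rng)
    {
        // geometric offset from the back: small offsets (young organisms) are
        // common, large offsets (old organisms) are rare but still possible
        std::geometric_distribution<long> offset(0.01);
        std::size_t back_offset =
            std::min<std::size_t>(offset(rng), pool.size() - 1);
        return pool[pool.size() - 1 - back_offset];
    }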

With regard to the fitness function, I think I agree with you that checking if the offspring matches a password will fall in the category of a fitness function albeit a binary one. I will update my description to include that, thanks.

With regard to whether it is useful, I will let it stand or fall on its own merits. If people are going to use it to find bad passwords, then I would say yes, it is. It might be pertinent to mention that I am interested in genetic algorithms in general and this is a good practical way of exploring my own theoretical ideas.

Yes, "aaaaaaaaaaaaaaaa" really is a password. Have a look at the rock_you list of passwords and sort them by length. There are some extremely long but silly passwords in there.


Wow, I am surprised about the "aaa..." password. Thanks for clearing that up.

When I asked how useful this was, I assumed the organisms were removed from the front of the list. If you keep everything, then yes I can see how this can be used to crack passwords. It's not clear to me how it would be useful otherwise, as which organisms stay in the list is not homogeneously random, and so one organism might be quite unlucky (even though its offspring could have been very successful).

So basically this can be viewed as an accelerated brute-force of long/complex passwords?


I would not put it in the category of brute-force since it really does have all the properties of a genetic algorithm. But yes it does succeed to a greater extent at cracking passwords that are intractable(long/complex) with a brute-force approach.


How fast can it go? And alternatively, how fast can it go with hashcat?

This looks awesome, but without the GPU speed advantage I'm not sure how well it'll fare against the almost 80GH/s I can do on my desktop with cudahashcat.


When using hashcat, it will run as fast as hashcat's dictionary mode. Since we are being cute with named pipes, hashcat thinks it's reading a dictionary. There is a caching problem with hashcat though; that is why I set the segment size to 1, so that it can update siga's gene pool as often as possible. I am optimistic that there might be a solution to that so that we can leverage hashcat to its full potential.

It might be interesting to note that older versions of hashcat seem to have supported reading candidate passwords straight from stdin. I have not played with that yet, but I suspect it might work a bit better.


one thing to note is that the gpu hashcat uses a very complex caching scheme, which basically boils down to a buffer per password length, i.e. a buffer for 12 character passwords, a buffer for 13 character passwords, and it shoves passwords in accordingly. Once a password buffer hits a certain threshold, they all get sent to be cracked. This means you may not get feedback for a certain length for a while, if it's not a common length. One thing to consider would be to figure out that threshold and keep things internally until you hit it, so you can fill a buffer in one shot. Realistically, you aren't going to get around this very easily, as this buffering is done for performance reasons, in that some of the kernels used rely on same-length passwords for performance increases.


Thanks, I did not know that. It is something to think about.


Another thing to keep in mind is that hashcat can generally test guesses faster than you can feed them over the various connections, i.e. memory -> memory copy for the pipe, then memory -> pcie -> card for loading onto the GPU. Also, in cracking rigs, you often have multiple cards on pcie x1 connections, further lowering bandwidth for those cards. One thing I'd highly recommend is trying it with rules. This allows the GPU to expand your word guesses out on the GPU itself, turning a single guess into thousands of guesses or more. This bypasses memory bandwidth issues, as it's done on the GPU itself, and also means that a lot of rather trivial checks can happen automatically (try it with a 1 at the end, try it with a 2 at the end, and so forth).


It uses a stack [1]. Therefore, I like it <3

[1] The stack is used in place of an explicit fitness function. I have nothing against fitness functions. But the bit about using a short stack instead, is really cool.

Edit: Woa. Hang on. New organisms are pushed to the back of the structure while dying ones are popped off the front- so that's a queue. My bad. But I still like it.


That's correct. It is a queue.


I loved this, since it's a different approach to genetic programming (with the FIFO list) that could work on different kinds of problems. Also, the way it is applied to password cracking itself is an extremely interesting way to exploit the non-random properties of real world passwords.


I really appreciate your enthusiasm. It is one of the core concepts and passwords seemed to be a good way to explore my FIFO idea in a practical manner. It might well have been done before, but from preliminary searches I could not find anything on it(not that I searched really hard). Due to its simplicity, I suspect it will lend itself to some pretty efficient implementations.

I would be interested to hear what other kinds of problems you see it being used for.


Not sure exactly what another application would be, but it's a mental model that looks "general" when we have a few prerequisites: not just a single solution but multiple solutions to a given problem (finding passwords among many hashes in this case), and solutions that are related in some way, so that past solutions can restrict the search space.


This is really cool. I'm surprised just finding common substrings through evolution works so well. Wouldn't just using the existing substrings in a dictionary work better than mutating new and random ones?

And is it really moral to do this? The only applications of this are pretty unethical.


I have found that the top, say, 2000 ngrams from a large password list are a good starting population. That being said, it learns the structure of passwords surprisingly fast even if you start off with purely random organisms.

As for the moral argument, even though this has probably been discussed at many different venues on many different occasions, I lean toward the sentiment that having good tools to gauge how good your password policies are is essential. Not to say that this is a good tool, but it is _a_ tool.


Would you care to explain like I am 5? Layman's terms... What are you doing and how are you doing it?


The topic assumes some knowledge of genetic algorithms as well as password hashing.

You can look at the section "How it work" on my github page for some diagrams and a decently lengthy description.

In simple words: Each organism is a password (a string). Passwords can mix (have sex) with each other in novel ways to produce offspring. If the offspring (a mix of the parents) also matches a password, it is added to the list of organisms. Older organisms eventually die, so it is up to the offspring to keep producing offspring in order to preserve the genetic line. The result of this simulation is that high impact substrings like "love", "luv", "mother" and "fuck" stay in the genepool for a very long time, since any organism that contains at least one of them has a better chance of producing a child that is a password (viable offspring).

If you still don't understand, feel free to ask more questions.


Even simpler, it:

1. Finds the substrings that are popular in passwords.

2. Combines them to make likely passwords that haven't been tried yet.

3. Rinse and repeat.

Useful against noob passwords - of which there are many in the wild - and more effective for some applications than the usual dictionary search.

But ineffective against randomised strings like KPF27k5ANv791P2Yi88xd88D7iALX3kH, or against XKCD random word salad passwords like QuantumGerundApoptosis as long as they include at least one unusual word.


Yes, it is completely useless against random strings, as one would expect, but with word salad it is not completely useless. For example, it can find words like "tequieromuchogerald" and "paulaalejandrailove" easily enough. I haven't fully explored its ability to do this, but I suspect starting with a dictionary of words instead of a random population might aid it in finding XKCD-like random passwords. In general I don't disagree though: passphrases, if chosen correctly, can be completely intractable to a password cracker.


It might be interesting to use the word frequency of the target language as some sort of weight for each word, or maybe as a multiplier for the number of times that word gets pushed to the back of the queue, i.e. the words with the highest frequency get repeated more than once when pushed to the back, or something like that.


I see what you are saying. To some extent this is already true, since high frequency ngrams (substrings) already dominate the gene-pool if they happen to facilitate the production of a large amount of viable offspring. That is why you would see swearwords with high frequency, since that is what many people use for passwords (in the case of the myspace list).
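
A rough sketch of the weighting idea proposed above, with made-up names (word_freq, scale) and no claim about how siga actually seeds its pool: each word is pushed into the seed pool a number of times proportional to its frequency.

    // Sketch only: frequency-weighted seeding of the gene pool.
    #include <algorithm>
    #include <cmath>
    #include <deque>
    #include <map>
    #include <string>

    void seed_weighted(const std::map<std::string, double>& word_freq,
                       std::deque<std::string>& pool, double scale)
    {
        for (const auto& [word, freq] : word_freq) {
            // at least one copy; more copies for more frequent words
            std::size_t copies =
                std::max<std::size_t>(1, std::lround(freq * scale));
            for (std::size_t i = 0; i < copies; ++i)
                pool.push_back(word);
        }
    }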


~OT: I predict John Koza's Genetic Programming [1] approach will enjoy a revival in the near future. We've discussed the impact of the recent crop of "AI" systems on various work fields, but my bet is on IT programming being one of the earlier successes of having machines doing grunt work.

[1]: http://www.genetic-programming.com/#_John_Koza%E2%80%99s_Pub...


Sounds interesting. Can you expand on why?


There are interesting developments in formally defined and verified programs (including implementations). For certain domains that are non-intuitive, exploring the solution-space is now much more within reach.

For example (as I noted in private communication to John Koza) exploring the protocol-space of distributed consensus algorithms on top of something like Microsoft Research's IronFleet (with the GP generating Dafny code) appears to be both possible and imo a fruitful avenue for researchers in the field.

[p.s. Note that Genetic Programming is distinct from Genetic Algorithms.]


because these genetic algorithms are best suited for consistent behaviour

i noted in another comment that this ga cracker repo will only work on passes that resemble the ones it is trained on, but like all genetic algos is aggressively ineffective on inputs resembling anything else

as for programming as a practice, it is embarrassingly redundant and consistent, due to the structure of programming being based on grammars and logic

so imagine a program that has a data set

you need ten bits of information about that data set, like max, min, mean, etc..

so you write 10 individual functions to loop over the data and interpret it

that is all very consistent behavior and code

now imagine you write a simple genetic algorithm that can interpret your interests and write those simple loops for you on the fly

ten functions of 10 lines each is 100 lines of code

one function of 30 lines that can generate and then discard those ten previous functions gives your program a 70-line advantage and an immeasurable man-hour advantage

think if a search engine needed an engineer to explicitly write the discovery function of every possible query, it would be impractical, so instead you imagine generalisations that permit simple functions to take many varying inputs and produce many varying outputs

what i think the parent comment is suggesting is that these generalisation functions themselves could be built by computers, with genetic algorithms being one possible method

furthering the abstraction

all puns intended



