It's the work of Michal Zalewski, known to his friends (and to those of us who fear him) as "lcamtuf".
It's a command-line Unix tool. It builds with almost no exotic deps other than GNU IDN.
It's a pure-async wordlist-driven crawler/fuzzer. It is screaming fast on the network I'm testing it on. Because it's async, it isn't bottlenecked on spawning a thread for every request.
It generates pretty HTML reports. Well, pretty for a C program.
It's Apache licensed. There are things in here I'd probably steal, like the URL parser; it's much tighter than mine.
How useful is this going to be for YC-style apps? Meh? You should definitely run it on your QA instance. Make sure you give it a login cookie to run with. It will find things. But where it looks like Skipfish is really going to do some damage is on the enterprisey J2EE- and .NET-stack apps.
My (uneducated-about-security) guess is that most bugs live in code a programmer wrote in a language, not in the runtime or in the language itself. And when it comes to languages, how easy it is to shoot yourself in the proverbial foot could (should?) be treated as just as important as (or, depending on your POV, more important than) how secure the runtime is.
Wow, this actually found a moderate security vulnerability in my website!
Fetching http://mysite.com/static/ returned the plaintext template of my index.html file, which I must have accidentally copied in some manual hackery during a broken push (none of my scripts copy it normally).
However, I almost missed the warning: Skipfish complained because the page lacked a content type, and it was buried in several similar warnings. I'd like it to recognize potentially templated files, which is a much more serious vulnerability than missing a 'text/plain' content type. Years of staring at unimportant compiler warnings might cause people to miss gems like this.
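If you wanted Skipfish to flag that class of thing, the check could be as simple as scanning response bodies for raw template delimiters. A sketch of the idea (the function name and delimiter list here are mine, not Skipfish's):

    /* Hypothetical check: a page served with raw template
       delimiters in it usually means server-side source leaked. */
    #include <string.h>

    static int looks_templated(const char *body) {
      static const char *markers[] = { "{{", "{%", "<%", "${", 0 };
      int i;
      for (i = 0; markers[i]; i++)
        if (strstr(body, markers[i])) return 1;
      return 0;
    }

It would false-positive on pages that legitimately discuss template syntax, but as a warning heuristic that's probably acceptable.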
Yes. Start with "analysis.c" --- the code in there works with completed request/response pairs. Look for "scrape_response()". Note how the strcspn's and pointer math approximate simple, unrolled regular expressions.
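To give a feel for the idiom (this is my sketch, not a paste from analysis.c), here's how you'd pull every href value out of a buffer that way, with no regex library involved:

    /* strcspn-as-unrolled-regex: find each href="...", then run
       the pointer forward until the closing quote. */
    #include <stdio.h>
    #include <string.h>

    static void scrape_hrefs(const char *p) {
      while ((p = strstr(p, "href=\""))) {
        p += 6;                        /* skip past href=" */
        size_t len = strcspn(p, "\""); /* span until closing quote */
        printf("%.*s\n", (int)len, p);
        p += len;
      }
    }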
The database code (the "pivot tree") is a bit tangly as a data structure.
http_client.c has really tight URL parsing code.
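The shape of that kind of parser, heavily simplified (this sketch is mine; the real code also handles ports, credentials, fragments, and relative URLs):

    /* Pointer-math URL splitting into scheme/host/path views
       over the input buffer; nothing is copied. */
    #include <string.h>

    struct url { const char *scheme, *host, *path; size_t slen, hlen; };

    static int split_url(const char *in, struct url *u) {
      const char *p = strstr(in, "://");
      if (!p) return -1;
      u->scheme = in;  u->slen = p - in;
      u->host   = p + 3;
      u->hlen   = strcspn(u->host, "/?#");  /* stop at path/query/frag */
      u->path   = u->host + u->hlen;        /* "" if URL ends at host */
      return 0;
    }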
This isn't written in a very modern style (I like that about it, though). For instance, look at "check_for_stuff()", which implements basic content sniffing. In a modern C program, you'd probably see an array of structs containing function pointers and names, each pointing to a tiny function looking for a different bit of content. Here it's one big unrolled function. Likewise, modern C code would probably just regex the HTML responses instead of hand-coding HTML parsing. But on the other hand this actually exists and works, and that's a good goal to have too.
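For contrast, the table-driven style I mean would look roughly like this (hypothetical; these detectors are mine, not Skipfish's):

    /* One tiny detector per content type, dispatched from a table;
       adding a sniffer is a one-line edit to the array. */
    #include <string.h>
    #include <strings.h>

    typedef int (*sniffer_fn)(const char *body, size_t len);

    static int is_html(const char *b, size_t n) {
      return n >= 5 && !strncasecmp(b, "<html", 5);
    }
    static int is_pdf(const char *b, size_t n) {
      return n >= 5 && !memcmp(b, "%PDF-", 5);
    }

    static const struct { const char *name; sniffer_fn fn; } sniffers[] = {
      { "html", is_html },
      { "pdf",  is_pdf  },
    };

    static const char *sniff(const char *body, size_t len) {
      size_t i;
      for (i = 0; i < sizeof(sniffers) / sizeof(sniffers[0]); i++)
        if (sniffers[i].fn(body, len)) return sniffers[i].name;
      return "unknown";
    }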
The I/O loop in Skipfish is definitely the right way to do network programming in C. This program is simpler and faster because it doesn't waste time with threads. But if you do something similar, use libevent.
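For reference, a minimal single-request skeleton on libevent 2's evhttp (error handling elided; a scanner would keep hundreds of these in flight on one event_base):

    #include <stdio.h>
    #include <event2/event.h>
    #include <event2/http.h>
    #include <event2/buffer.h>

    static void on_response(struct evhttp_request *req, void *arg) {
      if (req)  /* NULL on connection failure */
        printf("HTTP %d, %zu body bytes\n",
               evhttp_request_get_response_code(req),
               evbuffer_get_length(evhttp_request_get_input_buffer(req)));
      event_base_loopbreak((struct event_base *)arg);
    }

    int main(void) {
      struct event_base *base = event_base_new();
      struct evhttp_connection *conn =
          evhttp_connection_base_new(base, NULL, "example.com", 80);
      struct evhttp_request *req = evhttp_request_new(on_response, base);
      evhttp_add_header(evhttp_request_get_output_headers(req),
                        "Host", "example.com");
      evhttp_make_request(conn, req, EVHTTP_REQ_GET, "/");
      event_base_dispatch(base);  /* everything happens here */
      evhttp_connection_free(conn);
      event_base_free(base);
      return 0;
    }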
It asynchronously launches thousands of requests based on a very large wordlist.
It scrapes the responses and spiders them.
As it identifies actual pages, it fuzzes them with strings that tickle web app flaws. It analyzes the responses. For instance, it tries to inject "skipfish://whatever" URLs into fields, parameters, and links; then it looks to see whether those URLs appear in "hot" places in the response, like headers or link tags.
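The detection side of that amounts to cheap string scans. My illustration (not the actual analysis.c logic):

    /* Did our injected marker land somewhere "hot", i.e. inside
       an attribute a browser would actually follow? */
    #include <string.h>

    static int marker_is_live(const char *body) {
      static const char *hot[] = { "href=\"skipfish://",
                                   "src=\"skipfish://", 0 };
      int i;
      for (i = 0; hot[i]; i++)
        if (strstr(body, hot[i])) return 1;
      return 0;
    }

If the marker survives into one of those positions unescaped, you've almost certainly got content injection.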
It's looking, primarily, for:
* Cross-site scripting and content injection
* Best practices problems (like failing to declare charsets properly)
* Forms without XSRF tokens
* SQL injection
It's better than anything I've written, but I'll hazard a guess that it finds those things well in roughly that order. It's bound to get better over time.