I worked at TechCrunch for a year focusing mainly on CrunchBase data analysis.
There are a number of problems with your analysis:
First, I suspect you are using founding year. Unfortunately, there is a tremendous lag between when a company is founded and when it is entered into CrunchBase. The same is true of funding data in particular, where we saw only around 20% of fundings entered within a quarter of happening and only about 70% a year out. That is due to CrunchBase's continued growth (it's much better known now) as well as a natural reporting lag.
Second, CrunchBase is a very new product and, as it turns out, the data is only reliable as far back as 2007, and even that took a lot of work. Some time has been spent pushing to get more accurate data further back, but it is scattershot at best.
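To put rough numbers on that lag, here is a minimal sketch that scales an observed count up by the share assumed to be recorded so far. The 20% and 70% figures are the ones quoted above; the observed counts and the `estimate_actual` helper are hypothetical, and treating completeness as a single uniform factor is a simplifying assumption.

```python
def estimate_actual(observed: int, completeness: float) -> float:
    """Scale an observed count up by the share assumed to be recorded so far."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return observed / completeness


# Hypothetical observed counts, NOT real CrunchBase numbers.
latest_quarter = estimate_actual(50, 0.20)     # ~20% recorded within a quarter
year_ago_quarter = estimate_actual(175, 0.70)  # ~70% recorded a year out

# Both estimates come out at roughly the same underlying level.
print(round(latest_quarter), round(year_ago_quarter))
```

Read naively, 175 recorded fundings followed by 50 looks like a collapse, yet under these assumptions both quarters are consistent with about the same real activity.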
(Possibly, can't tell) CrunchBase investments are stored in a number of currencies. Did you make sure to convert them to a single currency? Yen can really cause problems :)
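A minimal sketch of why this matters, assuming each round comes as an (amount, currency code) pair. The exchange rates, the sample rounds, and the `to_usd` helper are all hypothetical and for illustration only; a real analysis would want the rate on each round's date.

```python
# Hypothetical, illustrative rates in USD per unit of currency (NOT real quotes).
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.3, "JPY": 0.012}


def to_usd(amount: float, currency: str) -> float:
    """Convert an amount to USD, failing loudly on an unknown currency."""
    if currency not in USD_PER_UNIT:
        raise ValueError(f"no exchange rate for currency {currency!r}")
    return amount * USD_PER_UNIT[currency]


# Hypothetical rounds as (amount, currency code) pairs.
rounds = [(5_000_000, "USD"), (2_000_000, "EUR"), (500_000_000, "JPY")]

# Summing raw amounts treats 500M yen as if it were 500M dollars.
naive_total = sum(amount for amount, _ in rounds)
usd_total = sum(to_usd(amount, currency) for amount, currency in rounds)

print(f"naive: {naive_total:,.0f}  converted: {usd_total:,.0f}")
```

With these made-up numbers the unconverted total is dominated almost entirely by the single yen round, which is exactly the kind of distortion to check for.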
Lastly, your NASDAQ chart covers 1994 to 2005, which never overlaps with reliable CrunchBase data, even by your own admission. I suspect this graph will be a bit more telling, and potentially worrisome: http://www.google.com//finance?chdnp=1&chdd=1&chds=1....
I do not necessarily think we are in a bubble, and I am happy to see people diving into the data. I just wanted to point these things out, as it would be irresponsible not to.
Wouldn't it also be the case that well-funded startups and super angels are much more likely to disclose sources and levels of funding to TechCrunch at an early stage, for the PR, than they were back when TC was a popular blog rather than a big-name publisher?
Presumably there is also some sort of editorial policy over what sort of startups merit inclusion in CrunchBase?
The 2010 drop in startups covered could easily be a reflection of reduced interest in covering smaller startups that don't effectively court TC and don't disclose relevant funding data.
I'm not sure the number or willingness of sources has changed that much since the AOL acquisition, though it is possible.
But there is definitely a huge selection bias of contributors. People love to disclose when they invested in the hot startup and neglect to mention their big mistakes retroactively.
I suspect the biggest reason for the drop is just a smaller team and less commitment, like the OP said. There has been a lot of headcount flux at TC for a while, and even more so since the AOL deal.
CrunchBase does not by any means offer a stable and comprehensive picture over the years. It has been maintained with different levels of commitment, especially for entering old data (since CrunchBase did not exist in 2000).
CrunchBase is a great resource, but doing those kinds of statistics without properly researching how consistent the coverage is amounts to sloppiness at best, if not willful negligence.
I don't think you did a good job of disclosing the problems that might lie in the data.
Since I've been following CrunchBase both from the data perspective and from the perspective of how many resources TechCrunch is devoting to it, I can assure you it varies wildly.
Also, the editorial policy has changed a lot in that period. In about 2007/2008 they started putting much more emphasis on international start-ups. So there are specific skews that you should be aware of and disclose in the blog post.
So I think you did a nice job, but the conclusions are not to be trusted at all. Yes, it might be the most well-maintained open data out there, but that does not make it any more useful for this kind of analysis.
This was what struck me: it seems to be a fairly clear pointer to a lowering of standards (unless someone can come up with a convincing reason why start ups have suddenly got significantly more viable in the last couple of years).
"This is actually a great time to be a startup founder" - yep, so was 1999: no business plan needed, just a half-baked idea and people will start throwing money at you. This is not necessarily a good thing.
You could make the same point about profit, but most metrics are difficult for start ups, as they may obviously spend a significant time with low revenue or profits intentionally.
I wouldn't trust CrunchBase for reliable trend data because its coverage over time is unlikely to be consistent. But, a similar analysis based on legally-required (and thus comprehensive) SEC Form Ds might give stronger insight.
At least two Form-D watching services have been mentioned previously on HN: