
An API for the raw data that underlies this system would be extremely useful to academics. Databases like Compustat and ExecuComp are expensive and lack some of the most interesting details in SEC documents. I have worked on extracting deep information from the footnotes of financial statements (e.g. foreign cash holdings and option exercise tax shields) and found that even Mechanical Turk/CrowdFlower couldn't handle the complexity. If they can figure out an algorithm to pull out such data, they will have a great product for both academics and the private sector.



If by the raw data you mean the actual filings with the SEC, then the raw data is available via FTP from the SEC itself.

http://sec.gov/edgar/searchedgar/ftpusers.htm

Parsing the EDGAR documents is a mixed bag. Many of the older filings and some of the more recent ones are in plain text rather than HTML. Finding footnotes in the HTML is probably not that bad, but the issue is incomplete coverage: either you miss a footnote in the HTML, or the document is in text instead.
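
As a rough illustration of the text-vs-HTML split, here is a minimal sketch of a first-pass check. The function name, the tag list, and the threshold are all my own assumptions, not anything EDGAR specifies. It deliberately ignores `<TABLE>`, since older plain-text filings also use SGML-style tags like that.

```python
import re

# Hypothetical heuristic: tags that suggest real HTML markup.
# <table> is left out on purpose because older text filings
# wrap their ASCII tables in SGML-style <TABLE> tags too.
_HTML_HINT = re.compile(r"<\s*(html|tr|td|div|font)\b", re.IGNORECASE)


def looks_like_html(document, min_tags=3):
    """Guess whether a filing body is HTML rather than plain text.

    min_tags is an assumed threshold; tune it against real filings.
    """
    return len(_HTML_HINT.findall(document)) >= min_tags
```

A check like this only routes a document to one parser or the other. It does nothing about the coverage problem itself.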

I've worked on parsing the HTML tables for statements like the Balance Sheet, Cash Flow, etc. It was problematic and I only got about 70% of the way there, but I think a more complex rule base could get to 90%. The issue is that 90% isn't really good enough for many users.
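
To give a flavor of what the simplest version of such a rule base looks like, here is a minimal sketch using only the Python standard library. This is not the code I actually used. The names `TableExtractor`, `parse_amount`, and `extract_line_items` are made up for illustration, and the rules (first cell is the label, parentheses mean negative) are assumptions that real filings break all the time.

```python
import re
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows of cell text.

    Nested tables are not handled; their rows end up in the inner table.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):  # skip rows with no text at all
                self.tables[-1].append(self._row)
            self._row = None


def parse_amount(text):
    """Turn '1,234' or '(56)' into a float; return None if not a number."""
    s = text.replace("$", "").replace(",", "").strip()
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1].strip()
    if not re.fullmatch(r"\d+(\.\d+)?", s):
        return None
    return -float(s) if negative else float(s)


def extract_line_items(html):
    """Map each row label to the numeric cells that follow it."""
    parser = TableExtractor()
    parser.feed(html)
    items = {}
    for table in parser.tables:
        for row in table:
            label = row[0].rstrip(":").strip()
            values = [v for v in map(parse_amount, row[1:]) if v is not None]
            if label and values:
                items[label] = values
    return items
```

Even this toy version shows where the missing percentage points go: a header cell containing a bare year is read as a number, a negative amount whose closing parenthesis sits in its own cell is dropped, and labels are never normalized across filers.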

I've heard that Capital IQ/Thomson Reuters actually use Indian financial professionals to manually extract the info. This could be a good way to backfill or double-check missing or bad values, but I chose not to go down that path. In the end, many of the potential customers will opt to pay a much higher price for a better brand and/or higher-level processing like normalizing accounting standards.


I was really excited that they'd figured out a way to cull balance sheet data out of the filings. Doesn't appear so, though: "XYZ Corp filed their annual statement today, you can view it here".


For historical data, this will be an issue.

Going forward, the SEC is requiring filers to make this process much easier:

http://xbrl.sec.gov/ http://www.sec.gov/rules/final/2009/33-9002.pdf



