Did a quick check on the Internet Archive. According to this April 23, 2018 snapshot, the most recent file is said to cover July 1, 2017 through April 10, 2018, so roughly a 1 to 2 week delay?
http://web.archive.org/web/20180423074904/https://www.edd.ca...
HazyResearch has a pdftotree package, which I haven't had good results from, but perhaps I need to finally learn ML and try training models for this a bit: https://github.com/HazyResearch/pdftotree
Over the years I've had some reason to analyze them, and I've done a half-assed job of collecting and parsing them into usable data. This repo from 2 years ago contains the PDFs as translated by ABBYY FineReader (in my experience, the best converter on the market, at least sub $100):
https://github.com/datahoarder/ca-warn
Today I started a new repo (forgetting about my previous one). I've been wanting to create a series of repos showing how I "casually" practice programming and data analysis. That is, satisfy and iterate upon a curiosity without going all-in on software engineering best practices. It's aimed at people who've tried to teach themselves coding and don't have a job in it, but don't know how to practice it in the wild, just for "fun":
https://github.com/hackbashscoop/california-warn
Not much there except a simple wget invocation to pull the latest files, and the use of Poppler's pdftotext to convert them into plaintext files. Even though it's unstructured text, I think it's regular enough to be parseable with some regular expressions. For reference's sake, I've done an ABBYY PDF-to-Excel conversion (and will write a Python script to do the remaining data wrangling), but you can do what you want with the spreadsheets as they currently are:
https://github.com/hackbashscoop/california-warn/tree/master...
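For anyone curious what the "regular expressions" step might look like, here's a minimal Python sketch, not the code in either repo. It assumes the text was produced with `pdftotext -layout warn.pdf warn.txt` (the `-layout` flag keeps table columns separated by runs of two or more spaces) and that each notice sits on one line as notice date, effective date, company, city, employee count. That column layout, the file names, the `WarnNotice` class and `parse_warn_text` are all hypothetical; check the real EDD files before trusting any of it.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

DATE = r"\d{2}/\d{2}/\d{4}"

# HYPOTHETICAL row layout: one notice per line, columns separated by 2+ spaces,
# which is roughly what `pdftotext -layout` produces for tabular PDFs.
# Single spaces are allowed *inside* the company and city names.
ROW_RE = re.compile(
    rf"^\s*(?P<notice>{DATE})\s+(?P<effective>{DATE})\s+"
    r"(?P<company>\S.*?)\s{2,}(?P<city>\S.*?)\s{2,}(?P<employees>\d+)\s*$"
)


@dataclass
class WarnNotice:
    notice_date: date
    effective_date: date
    company: str
    city: str
    employees: int


def parse_date(s: str) -> date:
    """Convert an MM/DD/YYYY string into a datetime.date."""
    return datetime.strptime(s, "%m/%d/%Y").date()


def parse_warn_text(text: str) -> list[WarnNotice]:
    """Return one WarnNotice per matching line; headers, page footers, etc. are skipped."""
    notices = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # not a data row
        notices.append(
            WarnNotice(
                notice_date=parse_date(m["notice"]),
                effective_date=parse_date(m["effective"]),
                company=m["company"],
                city=m["city"],
                employees=int(m["employees"]),
            )
        )
    return notices


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not real WARN data.
    sample = (
        "Notice Date  Effective  Company  City  Employees\n"
        "04/10/2018  06/08/2018  Acme Widgets Inc.  San Jose  120\n"
        "Page 1 of 3\n"
    )
    for notice in parse_warn_text(sample):
        print(notice)
```

Lines that don't match are silently dropped, which is the "casual" trade-off: fine for poking at the data, but you'd want to log the skipped lines before trusting the totals, since wrapped company names or multi-line rows would fall through.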