Did a quick check on Internet Archive. According to this April 23, 2018 snapshot, the most recent file is said to cover July 1, 2017 through April 10, 2018, so 1 to 2 week delay?
Over the years I've had some reason to analyze them, and I've done a half-assed job of collecting and parsing them into usable data. This repo from 2 years ago contains the PDFs as translated by ABBYY FineReader (in my experience, the best converter on the market, at least under $100):
Today I started a new repo (forgetting about my previous one). I've been wanting to create a series of repos showing how I "casually" practice programming and data analysis. That is, satisfying and iterating on a curiosity without going all-in on best software engineering practices. It's aimed at people who've tried to teach themselves coding but don't have a job in it and don't know how to practice it in the wild, just for "fun":
Not much there except a simple wget invocation to pull the latest files, and the use of Poppler's pdftotext to convert them into plaintext files. Even though it's unstructured text, I think it's regular enough to be parseable with some regular expressions. For reference's sake, I've done an ABBYY PDF-to-Excel conversion (and will write a Python script to do the remaining data wrangling), but you can do what you want with the spreadsheets as they currently are:
They have a pdf-to-tree package, which I haven't had good results from, but perhaps I need to finally learn ML and try training models for this a bit: https://github.com/HazyResearch/pdftotree
Any idea how frequently this is published?
Looks like Al Jazeera is shutting down its office in San Francisco. 68 people getting the axe on Aug 5th.
> 05/07/2018 08/05/2018 05/11/2018 Al Jazeera International (USA), LLC San Francisco San Francisco Closure Permanent
Source: PDF link in parent comment.
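For what it's worth, a row like the one quoted above does look regular enough for the regex approach suggested upthread. Here's a rough Python sketch, not anyone's actual script. It assumes each record sits on one line of the text output, starts with three MM/DD/YYYY dates, and ends with an action word plus a status word. The field names (`notice_date`, `effective_date`, `received_date`, etc.) are my guesses based on the "Aug 5th" reading of the second date, and `Layoff` as an alternative to `Closure` is also an assumption. It makes no attempt to split company from city and county, since that can't be done reliably without a list of place names.

```python
import re
from datetime import datetime

# Assumed layout: three dates, free text (company + city + county),
# then an action word and a status word at the end of the line.
ROW_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+"       # first date (assumed: notice date)
    r"(\d{2}/\d{2}/\d{4})\s+"        # second date (assumed: effective date)
    r"(\d{2}/\d{2}/\d{4})\s+"        # third date (assumed: received date)
    r"(.+?)\s+"                      # company, city, county (left unsplit)
    r"(Closure|Layoff)\s+(\S+)\s*$"  # "Layoff" is an assumed alternative
)


def to_date(text):
    """Convert an MM/DD/YYYY string into a date object."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def parse_row(line):
    """Return a dict for a line that looks like a record, else None."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    notice, effective, received, rest, action, status = match.groups()
    return {
        "notice_date": to_date(notice),
        "effective_date": to_date(effective),
        "received_date": to_date(received),
        "company_and_location": rest,
        "action": action,
        "status": status,
    }


row = parse_row(
    "05/07/2018 08/05/2018 05/11/2018 Al Jazeera International (USA), LLC "
    "San Francisco San Francisco Closure Permanent"
)
print(row["effective_date"], row["action"], row["status"])
```

Lines that don't match (page headers, wrapped company names) come back as `None`, so you can count them to see how much the pattern misses before trusting the output.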