Author of the article here, and I wish for the same.
This country has laws at the national level, at the "state" (called entities) / district level, one of the "states" also has canton-level laws, and then there's a local level. However, if you start moving between the two entities and the district (a relatively minor part of the population moves from one entity to the other), the line gets really blurry.
On top of that, the laws themselves are really not available in one place. They're scattered across different websites (you'll notice that the city's statute and the law I've mentioned aren't linking to the same domain), in different formats, in different local "languages" (it's the same language, we just call it differently for political reasons), and sometimes they are scanned PDFs.
It's a country with fewer than 4 million citizens, separated into two entities + one district, with three official "languages", three presidents (no, I'm not joking, Bosnia & Herzegovina has three presidents), and some international entities that guarantee the peace in the country...
And while that utopian wish of Git-driven law-making seems unreachable here, the best middle-ground solution I have is to scrape a third-party resource containing most (but not all) of the laws as PDFs, and then search through those 200ish scraped PDF documents (totaling 233 MB) locally, multiple times, with different variations of the word I'm looking for.
As an example, the term "article 1" could be found as "član 1", "članak 1", "члан 1", "clan 1" and "clanak 1" in different PDFs I've collected, and even if I search through all of the variations, there's still a chance that the text I'm looking for is in a scanned page, and therefore unsearchable. And good luck to anyone who tries to find an OCR tool that understands Bosnian/Serbian/Croatian and works well enough to process scans of legal documents in shitty quality.
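For the searchable PDFs, the repeated searching could in principle be collapsed into one pass by folding every spelling into a single ASCII form first. Here's a rough sketch of that idea, not what I actually run: `fold` and `has_article` are made-up names, the Cyrillic table only covers Serbian lowercase letters, mapping "đ"/"ђ" to "dj" is just one common convention, and it assumes the text has already been extracted from the PDF by some other tool. It does nothing for scanned pages.

```python
import re
import unicodedata

# Assumed transliteration table: Serbian Cyrillic -> diacritic-free Latin.
# Only lowercase is listed because fold() lowercases its input first.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "dj", "е": "e",
    "ж": "z", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "c", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "c",
    "џ": "dz", "ш": "s",
}


def fold(text):
    """Lowercase, transliterate Cyrillic, and strip Latin diacritics."""
    text = text.lower()
    text = "".join(CYR_TO_LAT.get(ch, ch) for ch in text)
    # "đ" has no Unicode decomposition, so it is replaced by hand.
    text = text.replace("đ", "dj")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_article(text, number):
    """True if the text mentions 'član N' / 'članak N' in any spelling."""
    pattern = re.compile(rf"\bclan(?:ak)?\s+{number}\b")
    return pattern.search(fold(text)) is not None


variants = ["član 1", "članak 1", "члан 1", "clan 1", "clanak 1"]
print(all(has_article(v, 1) for v in variants))
```

Folding can change the length of the text ("đ" becomes two letters), so match positions in the folded text don't map back to the original one-to-one; the sketch only answers yes/no per document.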