> *This is striking. If true, why not try to ignore whitespace and punctuation?* ...

bambax · on July 25, 2023

> the few readers who lived back then expected it to encode some form of legal or religious prose

Latin literature was extremely rich, from Cicero to Tacitus, and was certainly not limited to legal or religious prose.

Here's part of your comment with white space and punctuation stripped:

scriptocontinuamayhaveworkedbecausethefewreaderswholivedbackthenexpectedittoencodesomeformoflegalorreligiousprosebuteventhentheycouldlearnthingsfromtheoverallshapeofthedocumentllmsareworkinginamuchricherdomainofdocumenttypesbuttheonlythingtheycanseeisastreamoftokenstheresnospatialorgeometricdataattachedtheresowhitespaceandpunctuationaretheonlythinganllmhastomakeinferencesaboutotherwisetextuallyidenticalinputs

It's a little hard to read, but not that hard. I think one would get used to it.

Also, for creative uses of LLMs, it may be a feature, as trying to find the words could be inspiring.

I think it would be worth a try.

hgsgm · on July 25, 2023

Now do a modern structured document with sections and bullet points and logical connectives.

bambax · on July 25, 2023

    string.replace(/[\s\.\*<>\!\?,;:\-–\|"'\[\]\(\)]/g, '')
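
For anyone who wants to try it, here is a slightly fuller sketch of the same idea. It assumes a runtime with Unicode property escapes (Node 10+ or any modern browser) and swaps the hand-written character class for `\p{P}` and `\p{S}`, so it catches more punctuation and symbols than the one-liner above. The name `toScriptioContinua` is made up for illustration, and the lowercasing step is my own addition to match the stripped sample earlier in the thread.

```javascript
// Hypothetical helper (name invented for this sketch): remove all whitespace,
// punctuation, and symbol characters, then lowercase what is left.
function toScriptioContinua(text) {
  // \s     = whitespace (spaces, tabs, newlines)
  // \p{P}  = Unicode punctuation (periods, commas, quotes, brackets, dashes)
  // \p{S}  = Unicode symbols (<, >, |, +, currency signs, etc.)
  // The "u" flag is required for \p{...}; "g" replaces every match.
  return text.replace(/[\s\p{P}\p{S}]/gu, '').toLowerCase();
}

const sample = "LLMs can't see layout; there's only a stream of tokens.";
console.log(toScriptioContinua(sample));
// prints "llmscantseelayouttheresonlyastreamoftokens"
```

Headings and bullet markers go through the same filter, so a structured document collapses to one unbroken run, which is probably the point of the question above.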

razodactyl · on July 31, 2023

...there's only 1 space there though