> ...each Standard Ebook is lightly modernized to feature consistent and modern spelling and hyphenation, so old-fashioned ephemera doesn’t distract you from timeless content.
Yeah this is something I find a bit awkward. Good typography is definitely a boon, and fixing obvious errors (often mistakes that may not even have been present in each edition) like 'tne → the' is fine, but this modernisation step oversteps the boundary of keeping the original intent and content as the author envisioned it.
If the text of an older work presents problems for a lot of modern readers, I would prefer an annotated version with a glossary or even full-blown explanations of antiquated customs, concepts, and words.
And I agree, it also takes away some of the flavour of an older book.
Not to sound too negative — the project itself looks interesting and of lasting value.
I know it's not what you meant, but one thing I'd genuinely love from a project of re-typesetting old public-domain works is to ensure that any "typewriter symbols" like "->" or "* * *" are turned into their proper dingbats ("→" and "⁂", respectively.) Also, ensuring the (double and single) quotation-mark characters are used, but that scare-quotes and actual measurements (6'1", etc.) don't use them. Just, general heavy-handed replacement of ASCIIisms with proper Unicode equivalents.
Though, once you start down that road, it brings you to interesting places: if a book says "ye olde shoppe", do you want to keep the "ye", or do you want to consider it a typographical limitation for what should have been properly typeset as "þe"?
> Also, ensuring the (double and single) quotation-mark characters are used, but that scare-quotes and actual measurements (6'1", etc.) don't use them
Scare quotes are an adaptation of quotation marks for use/mention distinctions, and should use proper quotation marks the same as anything else; measurements should use the prime and double prime characters.
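For what it's worth, here is a rough sketch of the kind of replacement pass being discussed, assuming a measurement always looks like digits, an apostrophe, digits, and a double quote. The function name and the tiny replacement table are made up for illustration; a real pass would need many more cases and a proper quote-curling step afterwards.

```python
import re

# Assumption: a feet-and-inches measurement is digits + ' + digits + ".
# \x22 is the ASCII double quote, written as an escape to keep the pattern readable.
_MEASUREMENT = re.compile(r"(\d+)'(\d+)\x22")

# Hypothetical table of "typewriter symbols" and their Unicode equivalents,
# limited to the two examples mentioned above.
_ASCIIISMS = {"->": "→", "* * *": "⁂"}

def fix_asciiisms(text: str) -> str:
    # Handle measurements first, so a later quote-curling pass never sees them.
    text = _MEASUREMENT.sub(lambda m: f"{m.group(1)}′{m.group(2)}″", text)
    for ascii_form, symbol in _ASCIIISMS.items():
        text = text.replace(ascii_form, symbol)
    return text
```

Running the measurement rule before anything touches quotation marks is the design point here: once the primes are in place, whatever curls the remaining quotes can't mangle them.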
Agreed, maybe they could include some way of annotating the original content and display a modern version, to preserve the original work but also make it more accessible.
The last section of the script gets more interesting
# manœuvre -> maneuver (for the Americans)
# manœuvre -> manoeuvre (for the British)
# cosey -> cozy (for the Americans)
# cosey -> cosy (for the British)
I've never seen the word manœuvre and thought that I wanted it spelt any other way. Is it really just going to stop at cosy? What's so special about cosy that it needs converting but not other words?
Further, they have a script to [1]:
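To make the shape of those rules concrete, a minimal sketch of a per-locale replacement table might look like the following. This is not the project's actual script; the table holds only the four mappings quoted above, and the names `SPELLINGS` and `modernize` are invented for the example.

```python
import re

# Assumed mapping, built only from the four script comments quoted above.
SPELLINGS = {
    "en-US": {"manœuvre": "maneuver", "cosey": "cozy"},
    "en-GB": {"manœuvre": "manoeuvre", "cosey": "cosy"},
}

def modernize(text: str, locale: str) -> str:
    table = SPELLINGS[locale]
    # Whole-word, case-sensitive matches only; capitalised forms are left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

Even a toy like this shows the problem: the output depends entirely on which words somebody happened to put in the table.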
> Try to convert British quote style to American quote style
So this is not 'standardized', this is 'Americanized'.
I was going to make a joke about how they'd probably try and get rid of the pesky 'e' at the end of Shakespeare, however it turns out, instead they actually have the opposite rule:
One can way overdo the modernization. From "The Message", a modernized Christian Bible:
"God, my shepherd! I don’t need a thing. You have bedded me down in lush meadows, you find me quiet pools to drink from. True to your word, you let me catch my breath and send me in the right direction. Even when the way goes through Death Valley, I’m not afraid when you walk at my side. Your trusty shepherd’s crook makes me feel secure."
I thought I was reading an overtuned ELI5 machine translation for a minute. This is awful and significantly more difficult to read than even the KJV, which isn't exactly friendly for modern readers.
It's a tough translation. That section starts "The lord is my shepherd", and continues with a sheep metaphor, with "green pastures" and water, the things sheep need. Sheep draw confidence from the shepherd; they're herd animals and the shepherd is their alpha. Sheep are very herd-bound, far more than cattle or horses; a little bit of guidance and the whole flock follows. They've been bred for docility for millennia; the ones that were easy to herd were kept and bred. The people who wrote that were writing for an audience which knew that; today's audience has probably never seen sheep being herded.
Modern versions face the question of how much explanatory material to attach, or whether to try to express that in the main text. Scholars prefer translator footnotes; preachers don't.
Still, if you're not translating between languages, it's probably best to stay with the original. Compare, say, Kipling's "007".[1] This assumes some knowledge of railroading in the steam era. "Modernizing" Kipling would be a terrible mistake. It's probably best to leave anything post-1800 entirely as original. And don't even try to "clean up" Shakespeare. It's been done.[2]
I've often thought it would be interesting to see an ebook format that would allow one to toggle between versions of the text or display text inline as some kind of diff.
I say this because there are some authors that I read who have revised their works many times, and sometimes I prefer an earlier version over a later revision.
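As a rough idea of what the inline diff could look like, here is a small sketch using Python's standard `difflib`. The bracket markers and the function name are arbitrary choices for illustration, not part of any ebook format.

```python
import difflib

def inline_diff(original: str, revised: str) -> str:
    """Word-level diff: removed words as [-...-], added words as {+...+}."""
    a, b = original.split(), revised.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if i1 < i2:  # words present only in the original
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j1 < j2:  # words present only in the revision
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

A reader could render the two marker types as strikethrough and highlight, or hide one of them entirely to toggle between versions.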
I've been using this approach and it works amazingly well (for text-based formats like .tex and .md). I switched from mercurial to git because rebasing feels much more natural. I'm currently actively maintaining five different versions of a book in parallel as separate branches. The `master` branch sees the most fixes then I rebase the five branches.
Anyway, it's great to see books stored in git. Mark my words, this will be a big thing for education!
Most people find it hard to read books with long s, for example, or even know what long s is or where it occurs. "Why are all the s's replaced with f's?" people will ask. It's fine if it's "ash-tray" instead of "ashtray", but the further you go back and the more of these archaisms you encounter, the more of a barrier they become to ordinary readers (instead of English majors). I don't necessarily like this kind of textual modernization, but in general it's necessary to keep the text accessible to moderns.
I'm not sure I can agree–seeing 'ſ', where I'd ordinarily expect to see 's', was briefly confusing at first, but I quickly derived from context what it must mean, and found my surmise confirmed in further perusal of older texts.
On the other hand–I'm not sure that someone, who has written an Emacs minor mode to perform automatic ſ-insertion where it is grammatical to do so, is necessarily best placed to speak to the general case here.
I don't think the parent poster was talking about being confused as to the meaning; rather, they're talking about the presence of these characters impeding fluent reading speed even after one knows what they are. Enough of these changes and you feel like you have dyslexia, a bit: you have to consciously analyze each word on the page to extract its "reading", rather than just letting your visual cortex pass the words straight into your brain's audio loop.
Seems like it'd be just a matter of developing fluency, the same way as with any other orthography. Adding one new glyph isn't that big a problem for someone who's already likely to be reading texts where long-s is found - so, at least, has been my experience, and while I always hesitate to generalize therefrom, I don't know that it's so unreasonable to consider doing so here.
> That's not old-fashioned spelling or hyphenation: that's old-fashioned typography.
A critical difference, because once you've corrected OCR errors so that you correctly have the spelling-as-in-original, but not done any “modernization”, there is no issue with long-s to address.
I want the old language warts and all. I want to learn how words were expressed in a historical context. That teaches me more than just the words themselves, that transports you to their time.
I did a search in the Google Groups about 'modernization' and you get quotes like this coming up [0] (about William Wollaston's "The Religion of Nature Delineated"):
> I've got a first draft ready, after about a month's steady work. I could use some proofreading help, in three areas in particular:
> 1. general typos (the use of the long-s in particular I'm sure has led to several)
> 2. suggestions for improved use of commas. They did this weirdly in the 18th century, and I think a new edition could do well by bringing the practice up to date, but it isn't my strong suit
... and in reply ...
> then it sounds like we'll have to do some significant spelling modernization
So they've moved from just 'tasteful' to 'significant'.
I don't see how you can draw a line for this kind of thing. But I guess I don't know very much about the painful process of digitising old books.
I liked this note on Hodgson's The House on the Borderland:
The original print edition of this novel contains a pathological number of commas—so much so that a modern reader would find them distracting at best and plainly ungrammatical at worst. The editors of this Standard Ebooks edition have made an effort to remove the most egregious cases of these ungrammatical commas, so that modern readers can better enjoy this unique tale.
You can also review the changes there to see if you agree with our judgment.
This book was a unique case. I think the only other time we've done a big editorial change like this was for Pride and Prejudice, also to remove some crazy commas, but for P&P there is a lot of precedent--other editions of Austen very often make those same kinds of changes.
Exactly. And today's "consistent and modern" form is tomorrow's old-fashioned ephemera. The form used at the time of publishing is relevant, like the content from that time.
Today's typography is not going to change in the manner it did in the past, for the same reason why dialects are less pronounced: the prevalence of mass media and the Internet makes things converge rather than diverge.
Glad to hear you're liking things! I started the project and am the managing editor of sorts. We're just a small group of volunteers. Our process is very painstaking and specialized so we get far fewer contributors than larger, more easy-going projects like Gutenberg; but I'm just glad that we can contribute a little to free culture everywhere :)
Hmm. I've just downloaded Algis Budrys' "Short Fiction" and Lewis Carroll's "Alice", and I have to say I have mixed feelings about the files. Both render very badly in Calibre (on Debian Linux, latest stable), very uncomfortable to read with lots of unexpected spaces breaking up the words, yet they render perfectly in Ionic on my Nokia N9 and in FBReader on my Lenovo Android tablet. I've never seen this happen before so I'm not sure if this is a bug in Calibre or in the epub file.
A couple of questions:
- How come there is no search function?
- Why are authors sorted by first name?
- Do the results of the proof reading get fed back to Project Gutenberg, et al.?
- Will readers like FBReader be able to add this catalogue?
So, sounds like a good idea and I hope it succeeds but it's not quite there yet.
Have you tried downloading the compatible epub2 file, instead of the epub3 file? I don't think I've used Calibre to test our ebooks on, generally we test on eink devices and tablet reading software, all of which works more or less fine. Unfortunately the state of ereading software today looks a lot like the terrible IE6/Netscape days of yore, where there are no standards and everyone does their own thing differently. It's difficult/impossible to make one file that will render well across all devices.
There is no search function because our catalog is so small. It's growing though, so maybe it's about time to add a search function too. Sort by first name is an oversight that I hope to be able to fix later this week.
Our edits don't go back to Project Gutenberg, because our final files are so different from what PG produces merging would be impossible. We also introduce typographical and spelling changes that they might not want to accept.
Edit: Just tried reading the Alice compatible epub file using Calibre 2.55 on Ubuntu 16.04 and it seems to render fine. Maybe you can send a note to our mailing list and we can discuss this in more detail so I can get things fixed?
It seems that it is specifically Calibre 2.5 that renders incorrectly. Just in case this bug bites anyone else here is what I posted in the Standard Ebooks mailing list:
The problem that I mentioned was that the two epub2 files that I downloaded didn't render properly in Calibre on Debian even though they rendered properly in Ionic on Nokia N9 and FBReader on Android. It seems that the file has soft hyphens (thanks gwillen) that Calibre was rendering as tabs (or something similar).
In my comment I said that I wasn't sure whether the problem was the files or Calibre.
I have now tried another Calibre installation. This time on Linux Mint 17.3 Rosa and it works perfectly.
It seems that the version delivered by Debian doesn't render correctly. Unfortunately the version that renders wrongly is Calibre 2.5.0 on Debian whereas the one that works is Calibre 1.25 on Linux Mint. So I am still confused.
Time to download Calibre 3.0.0.
Works perfectly.
So, many thanks to everyone and good luck with the project.
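In case it helps anyone else chasing the same thing, here is a small sketch for finding out which invisible characters a piece of copied text actually contains, and for stripping soft hyphens as a workaround. It only uses the standard `unicodedata` module; the function names are my own.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"

def describe_invisibles(text: str):
    """Return (index, code point, name) for every format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

def strip_soft_hyphens(text: str) -> str:
    # Workaround only: this throws away the hyphenation hints the producers added.
    return text.replace(SOFT_HYPHEN, "")
```

Pasting a suspicious word into `describe_invisibles` should settle whether the thing that looks like a hyphen is a real hyphen or U+00AD.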
Perhaps I have a different Calibre version, I'll check again. I opened the Budrys file in Calibre's editor and what I saw was that a lot of perfectly correctly spelt words were highlighted as being misspelt. I copied some text from there to Emacs and then saw that there was a hyphen (actually I'm not sure exactly what character it is but it looks like a hyphen) at the point where the reader renders a space.
Your site looks very pretty but I feel that it is hard to discover things. Sometimes fewer or smaller graphics can make it easier to find one's way around.
Anyway, I do understand that it is tough to find time to make everything perfect (I'm a software developer and my to do list never gets shorter).
Hi folks, I'm the guy who started and runs Standard Ebooks. It looks like it's been given the hug of death at the moment, sorry about that! While I try to beef up the server you can see all of our productions on Github: https://github.com/standardebooks/
You may also be interested in our toolset (GNU-compatible only at the moment, we're working on converting everything to Python but we're not there yet): https://github.com/standardebooks/tools
I'm happy to answer any questions anyone has. We're also more than happy to have new contributors, if you're interested in working on and proofreading a public domain ebook that you've been meaning to get to.
Some of you have mentioned concerns about the modernizations we do. The key word I think is "light modernization". Mostly that just means bringing spelling up to modern standards, and removing a lot of hyphens in words that are no longer hyphenated. A common one, for example, is to-morrow -> tomorrow. Another one we recently added was lacquey -> lackey. Generally we leave punctuation and grammar alone. I liken this to modern books replacing the "long s" character--it's just presentation that doesn't affect the meaning. Modern readers would rather see "successful" instead of "ſucceſsful" even though the latter is what was originally printed.
I struggled for a long time with my desire to see older books with modern spelling and typography, versus preserving the intent of the author and original publishers. Over time I've come to realize two things:
1. Many books back in the day were heavily edited by the printer and publisher without the author's input anyway, so you'll get various editions over time that look totally different. Jane Austen books are a good example of this--early editions often have a pathological overuse of commas, while later editions published after her death just remove a lot of them without comment. So when we're producing our own ebooks, we accept that there's a level of editorial discretion involved, and that "the author's intent" was a very fuzzy and often totally ignored topic hundreds of years ago anyway. How can we tell what the author's intent was in the first place, if various printers and publishers have meddled with the editions for hundreds of years already?
2. For those of you who want to read the originals in their totally unedited form, other projects like Project Gutenberg or Wikisource already have those faithful transcriptions for you, and places like Internet Archive, Hathi Trust, and Google Books have the page scans for you. By lightly modernizing our own productions, we in no way diminish your access to the painstakingly-preserved digital editions; we're just adding another option for you to read.
Is there a way to contribute to the index pages? I found a typo (sales as salees) on one of them[0] and I don't see a repository for the main site itself.
Also, I think it might be interesting to have multiple editions in the future. It would be nice to support editions in reduced reading levels for works which are not artistic prose (such as translated prose, or philosophy). For example, in the first paragraph of On Liberty, Mill writes:
> A question seldom stated, and hardly ever discussed, in general terms, but which profoundly influences the practical controversies of the age by its latent presence, and is likely soon to make itself recognised as the vital question of the future.
This sentence has five clauses, and one parenthetical; difficult prose. The first chapter of On Liberty contains a total of 80 adverbs, ~69 uses of passive voice. It also contains a bizarre convention of referencing previous phrases with ordinals.
Would multiple editions be worth the effort? I can see that having them might be useful to some few people but creating a new version of On Liberty would be a substantial amount of work even if it could be done algorithmically because of the proof-reading required to make sure that it conveyed the same meaning as the original.
Are there many people who would struggle to read the original who are nonetheless sufficiently interested that they would read a 'simplified' version?
The project is just a few years old, and it takes a very long time to produce one of our ebooks so the rate is rather slow. But we're getting more and more contributors :)
I'm glad you picked Don Quixote, I personally took the time to transcribe the over 900 endnotes in our production from the Ormsby edition. AFAIK our edition is the only digital transcription of those endnotes available online--not even Project Gutenberg has them yet. Enjoy!
I've been meaning to read Don Quixote in Spanish some day.
Slightly off topic: Would there be scope for a bilingual epub format for wide-screen devices displaying native text and translation side by side? This could be useful for Ancient Greek and Latin verse, Norse sagas, devotional readings of sacred texts, Hamlet in the original Klingon and even automagically machine-translated documents.
Dead-tree parallel texts have something of a market at the foreign language section of one's local book-store.
We track them internally. If you want to work on one, send a note to the mailing list and I'll let you know if you can start or not. Also see our list of wanted ebooks: https://standardebooks.org/contribute/wanted-ebooks
There's no subreddit but we do release announcements on the mailing list, and we have an OPDS feed you can plug in to your reading software of choice: https://standardebooks.org/opds/
People are concerned about how and why you modernise spellings and are wondering whether that could be optional - see comment by throwanem above. Can you comment on that?
"Who told you you might meddle with such hifalut'n foolishness, hey?" [1]
Liberated? That ephemera might actually be integral to the story and you are NOT the arbiters of intent. Please keep your modernizing out of my lit'ratur.
Taken to an extreme, would you propose to altogether disallow translation from one language to another? To instead force readers to learn new languages in order to strictly preserve the original authors' intents?
Personally, I would like to see our storage formats move towards more dynamic documents, so that the reader can literally flip between the original content and the "modernized" variation. Ebook readers already have integrated dictionaries, why not the discussions and interpretations integrated too? (I ask in a partially rhetorical sense; is fighting the inevitable changing of language really gaining anybody anything? Is it not possible that that "problem" is a red herring?)
I don't want discussions and interpretations integrated! At least, if they're there, they need to be trivially disposable, such that only those who actually want to see them do so.
I say this because I have never in my life been rendered so furious by the action of a literary editor as to have Shirley Jackson's We Have Always Lived in the Castle completely ruined for me by some importunate jackass who insisted upon putting a long, discursive deconstruction of the entire novel into a preface of an edition I incautiously happened to choose. By the time I realized what I was reading, it was too late, and I found myself unable to appreciate the actual story at all, having had it helpfully predigested for me by this ham-handed oaf who so highly valued his unique and precious insight into what Jackson was really trying to say that he put it first in the book, where it lay in the path of the not perfectly cautious reader as a beartrap in that of a cheerful weekend rambler.
I should like any such initiative as that you describe to take as axiomatic it's worth not doing that kind of thing, is what I'm trying to get across here.
IMHO, modernization tip toes into the same category as translation. From my limited literature experience, the translator is always credited somewhere near the author as it is considered the translator's version.
I can imagine a "Do-Gooder" helpfully going through Huck Finn and modernizing all that messy vernacular but that book ain't Mark Twain's and it ain't something I'd want to read. Some folks may have trouble understanding what is being said but modernizing the text would suck the life right out of the story.
To your point, I think the ultimate prize is both the original and a modern/translated digital version. Anecdotally, I might have developed an appreciation of Shakespeare MUCH earlier in life had I known not just what the characters were saying but combined with how they were saying it.
We actually have Huck Finn in our catalog, and of course we've preserved all of the vernacular. I invite you to check it out for yourself to see what you think. "Light modernization" in our sense doesn't mean "ham-fistedly change old-timey vernacular into new-fangled internet slang", it simply means some basic, mostly-automated one-to-one spelling and hyphenation updates. Things like "develope -> develop" and "to-night -> tonight", that would not change the meaning of the text. Vernacular is not affected by these changes, nor would we want it to be. :)
Think of it more like modernizing spelling of Shakespeare, so that we can enjoy the text and not spend time parsing spelling like:
Had, having, and in quest, to have extreame,
A blisse in proofe and provd and very wo,
Before a joy proposd behind a dreame...
Some people might prefer that old-fashioned spelling, and it might be of some use to academics and historians, but I think the majority of casual modern readers would have an easier and more enjoyable time with light modernization.
Thank you for the reply. I can only imagine that the application of "Light Modernization" to this masterpiece would not only change the pronunciation of the words, it would add the implied missing words thereby disrupting the meter and rhyme and thus subsequently ruin the sonnet. I understand the point of making things readable but when you start changing the spelling, you actually ruin the art form. Sonnet > Light Modernized Poem
Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,
Or bends with the remover to remove.
O no, it is an ever-fixed mark
That looks on tempests and is never shaken;
It is the star to every wand'ring barque,
Whose worth's unknown, although his height be taken.
Love's not Time's fool, though rosy lips and cheeks
Ironically, you've posted the lightly modernized version of 116. :) In the 1609 printing it looks like:
O no, it is an euer fixed marke
That lookes on tempeſts and is neuer ſhaken;
It is the ſtar to euery wandring barke
Whoſe worths vnknowne, although his higth be taken...
Touché! :D
The point still stands. If "mostly-automated one-to-one spelling and hyphenation updates" changes 'wand'ring' to 'wandering', 'worth's' to 'worth is' or 'Love's' to 'Love is', you break the sonnet.
I think this is a good time to maybe do some research into whether your complaints are valid or not before making them... You clearly aren't seeing that the point, in fact, does not stand...
Iambic pentameter depends on pronunciation, not spelling, and the changes/reforms/standardization generally affect the spelling only. If you take an archaic representation of some sounds and replace it with the modern representation of those sounds, the verse isn't changed.
In that example, "vnknowne" is pronounced the same as "unknown", despite having an extra "vowel" in the typography.
Spelling (and related things) in poetry (especially contractions) often signals intended pronunciation variations from the standard pronunciation of a word.
A mechanical modernization and standardization would seem to run a significant risk of damaging some of these, though proper manual final review would hopefully catch and revert the problematic cases.
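One way to reduce that risk, sketched here purely as an assumption about how such a tool could work, is to have the mechanical pass refuse to touch any word containing an apostrophe and hand those words to the human reviewer instead. The table entries are examples taken from this thread; `modernize_line` is a hypothetical name.

```python
import re

# Hypothetical one-to-one table, using examples mentioned in this thread.
TABLE = {
    "to-morrow": "tomorrow",
    "to-night": "tonight",
    "develope": "develop",
    "lacquey": "lackey",
}

# A word is letters, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")

def modernize_line(line: str):
    """Return (new_line, flagged), where flagged lists words left for manual review."""
    flagged = []

    def swap(match):
        word = match.group(0)
        if "'" in word or "’" in word:
            flagged.append(word)  # possible elision (wand'ring): never auto-edit
            return word
        new = TABLE.get(word.lower())
        if new is None:
            return word
        return new.capitalize() if word[0].isupper() else new

    return WORD.sub(swap, line), flagged
```

This is deliberately conservative: possessives like "Love's" get flagged too, which costs the reviewer a little time but never alters the meter.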
Changing the dialect is not the same as a foreign language translation. It can easily be misunderstood as the author's intent since it's the same language overall. It's copyediting something already published.
As someone who grew up in Hannibal, Missouri where Sam Clemens did, let me tell ya it's still a tad diff'rent from N'York.
If someone intends to "modernize" a book, particularly the dialog, they need to make it clear that it is a translation from one dialect of the language to another. They need to not credit the author with those new words and say they just fixed typography. There needs to be an "as interpreted by", and that person needs the blame by name.
I don't understand this complaint. It's not your literature; it's public domain. This only gives readers more options for reading, it doesn't force anyone to choose one.
It's also not the original work of art. It's a derivative.
Try thinking about this visually. If I modernize ANYTHING on the Mona Lisa, whose work is it? Are my brushstrokes over his really not changing anything fundamental about the work or is it just another option for viewing Leo's most famous portrait?
That's a fantastic initiative to offer world literature. FYI: A while ago I've started the World Classics Bookshelf - http://worldclassics.github.io The idea is to use plain text with markdown formatting conventions (for richer typography e.g. beautiful quotes, em-dashes, etc.). See The Trial by Franz Kafka as an example -> https://github.com/worldclassics/the-trial (in the Manuscripts plain text source format). The second idea is to use a (standard) static website builder (e.g. Jekyll) for building the online books (from markdown) and to offer different book / page designs (kind of like the Zen of CSS Garden e.g. the Zen of Book Designs). See https://github.com/bookdesigns for some examples incl. the "classic" GitBook style -> http://bookdesigns.github.io/book-git Anyways, keep up the great work and publishing public domain world classics. Cheers.
rather than a flock of templates with different fonts,
i'd suggest some simple javascript that lets the users
choose their preferred font from a generous assortment.
I think it is a great project. And I really like the modernization. As a person not native to English, reading long prose is already energy consuming. For me, I like this kind of supply :-)
I see that the page is a bit slow. If you need any help to port it to a static format (for performance), please let me know.
Amazing project! Any plans to port this to other languages? My main language is French, and I'm looking for good old books for my daughter. She read through the Comtesse de Ségur's "Les malheurs de Sophie", and she loved it.
If she's a bit technically inclined, Jules Verne is an obvious choice - though I didn't read them in French, his books absolutely fascinated me as a child.
Based on your previous choice, I suspect she's much too young for Flaubert? His writing is marvellous.
On the other hand, avoid Malot and his ilk - it's just depressing stuff.
Specifically, see our typography manual, semantics manual, and the step-by-step guide to producing an ebook (all linked from the contributors page) for details.
I took a look at Gitenberg and just don't get it. They appear to have downloaded a shit-ton of books from Gutenberg into Github -- go to https://github.com/GITenberg and note that there appear to be 1659 pages(!) in the list of repos -- and then basically sat on them.
Possibly some number of the books have been vastly improved or cleaned up, but there is no way to tell them apart from the ones that are simply dups of the PG files.
The website is perfect and hosts plenty of good-quality ebooks in different domains, but these are not categorized. They should be categorized by domain, such as Technology, Architecture, Literature, Management, etc. If it had subcategories as well, it would be perfect.
I like old-fashioned ephemera...