Shell scripts to improve your writing

erlehmann_ · on Jan 2, 2017

There is also GNU style and GNU diction: https://www.gnu.org/software/diction/

; cat <<EOF >/tmp/testfile

Diction and style are two old standard Unix commands. Diction identifies wordy and commonly misused phrases. Style analyses surface characteristics of a document, including sentence length and other readability measures.

These programs cannot help you structure a document well, but they can help to avoid poor wording and compare the readability (not the understandability!) of your documents with others. Both commands support English and German documents.

EOF

; LANG=C diction --lang en --suggest /tmp/testfile

/tmp/testfile:2: These programs cannot help you structure a document well, but [they -> (do not use as substitute for "each, each one, everybody, every one, anybody, any one, somebody, some one")] [can -> (do not confuse with "may")] help to avoid poor wording and [compare -> "Compare" to points out resemblances, "compare with" points out differences.] the readability (not the understandability!) of your documents with others.

3 phrases in 5 sentences found.

eriknstr · on Jan 2, 2017

I like the way you format commands using semicolons at the beginning. I might start doing this too when writing online.

empath75 · on Jan 1, 2017

Nobody should be using Strunk and White as a style guide after primary school. There's nothing fundamentally wrong with passive voice, or adverbs, or any of the other things that he mentions here.

http://www.chronicle.com/article/50-Years-of-Stupid-Grammar/...

He doesn't even correctly identify what weasel words are. ("Some people say", "It is believed", etc). I'm not sure why 'very close match' is any more opinionated than 'close match' is. It's not as if the latter is precisely defined.

rflrob · on Jan 1, 2017

I would agree that "close match" isn't a lot better, but the style in science papers is to say something like "A is a close match for B (r=.85)", where of course the precise value of the metric, and whether that is a close match or a terrible fit, is specific to the field it's being published in. But the "very" really does add nothing.

The other reason is probably that "very" is very easy to detect in a bash script, whereas looking for other, more subtle weasels is a lot harder. Given the generally poor level of scientific writing (I have certainly given my PhD advisor some turds), huge improvements can be had by just going after the easiest cases.
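To illustrate how little shell this takes, here is a minimal sketch. The function name `count_very` is made up for this example and is not from the article, and it assumes a grep that supports `-o`, `-w` and `-h` (GNU and BSD grep both do).

```shell
# count_very: hypothetical helper, not taken from the article.
# Prints how many times "very" occurs as a whole word (any case) in the given files.
count_very() {
    # -o: print each match on its own line   -i: ignore case
    # -w: match whole words only             -h: no file name prefixes
    grep -o -i -w -h 'very' "$@" | wc -l | tr -d ' '
}
```

Usage would be something like `count_very draft.tex`. Because of `-w`, words such as "every" are not counted.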

memco · on Jan 2, 2017

To be fair, the author did say:

> There are times when the passive voice is acceptable in technical writing.

> I also believe, as with adverbs, that removal of the passive voice would have been a net improvement for over half the technical writing I've edited. (That is, students abuse the passive voice more often than they use it well.)

The author himself is not against passive voice. He simply has not found that students have used it appropriately.

I was a bit amused that this article, while probably well founded, violates some of its own advice: it backs up its claims not with data and facts but with generalizations drawn from personal experience.

Veen · on Jan 1, 2017

You're right, but he's talking about a particular style of writing for a specific audience. His "rules" seem reasonable enough within that context.

The passive voice prohibition is more of a lazy pedagogical tool than sound style advice. Nothing wrong with using it, but it's often used poorly, and it's easier to say don't use it at all than to explain how to use it well.

My favourite "style guide" is Clear And Simple As The Truth: Writing Classic Prose by Francis-Noël Thomas & Mark Turner, which takes the time to explain what it means by "style" and has almost nothing to say about passive voice and adverbs.

_pfxa · on Jan 1, 2017

I really don't get what's wrong with passive voice at all. In my mother tongue (Turkish) and my L3 (Italian), passive voice is a part of educated speech. Certainly it is harder to use than direct speech (in most cases), but why neglect it?

schoen · on Jan 2, 2017

I agree with the criticism of the criticism of the passive voice (that is, I think it's often appropriate to use it, and a blanket prohibition is wrong).

A couple of ideas:

* There's an idea that passive voice is inappropriate because it "avoids responsibility" (for example, because it does not say who made a decision or performed an action, where that information might be important). An example could be when an organization says "your application was denied" (where it would somehow seem more honest or more relevant to say "we denied your application" or "the vice president denied your application"), or in a political context referring to violence without referring to the perpetrator of that violence -- "thirty people were killed" (by whom?).

However, critics have pointed out that these concerns don't correspond perfectly to the active/passive voice distinction, among other things because we can still state who was responsible when using the passive voice and because we can still avoid stating who was responsible when using the active voice. Also, sometimes clear or honest writing wouldn't need to assign responsibility at every moment, in every context, or in every sentence.

* There's an idea that the passive voice is inappropriate because it sounds too formal and hence makes writing less accessible, less enjoyable to read, or lends the writing an unwarranted air of authority. The active voice may sound more direct or straightforward in many contexts, while the passive voice may sound unduly formal, abstract, or academic.

This is probably also true, but doesn't appear to justify a blanket prohibition either.

* Edit: also compare https://en.wikipedia.org/wiki/E-Prime, a writing style that tries to avoid using the copula (certain uses of the verb "to be"), based on the view that it makes philosophically unjustified observer-independent claims that could be made more precise by showing whose perception or belief is being described (like "spinach is yucky" vs. "George H. W. Bush dislikes spinach" or "George H. W. Bush finds spinach yucky"). The copula isn't the same as the passive voice, but this is a (controversial) example of another way in which people have suggested constraining their writing in support of "taking responsibility" for certain propositions or observations.

_pfxa · on Jan 2, 2017

A response for each of your points, not really for starting an argument, but for stating my view of the matter.

1) The language allows you to emphasise either the object or the subject. Either way, though, both a passive and an active version of a given sentence can hide or tell some information equally. Compare:

The pizza was eaten.

Somebody ate the pizza. (The amount of info these sentences give is practically equivalent.)

It's more about the author, whether or not he wants to tell something to the reader.

2) Passive need not be formal in all its uses, nor does every text need to be formal and accessible, and a formal text is not necessarily inaccessible.

3) That's crazy nitpicking and a silly exaggeration. Copula-heavy text is boring to read, but while the copula verb and the auxiliary verb for passive forms are the same (to be), in its second role, it is not also acting as a copula. The verb 'to be' is not the copula, but one of its uses is as the copula. So if one wants to give up on the copula, however mad that may be, he need not give up on all the uses of the verb to be. That said, if it is an artistic choice to avoid the copula to the extent possible, I can't really criticise that. It can't be affirmed as a general rule though.

Edit: Oh, also, in a sentence like "Bob seems terrible.", 'to seem' is basically a copula. Copula is basically any verb that links the subject to a predicative.

schoen · on Jan 2, 2017

> A response for each of your points, not really for starting an argument, but for stating my view of the matter.

Thanks for responding.

> [...] while the copula verb and the auxiliary verb for passive forms are the same (to be), in its second role, it is not also acting as a copula. The verb 'to be' is not the copula, but one of its uses is as the copula. So if one wants to give up on copula, however mad that may be, he need not give up on all the uses of the verb to be.

I agree that E-Prime users should try to distinguish between copulative and non-copulative uses of "to be" and that passives are non-copulative. By mentioning E-Prime, I was just trying to draw an analogy with another way of restricting language in the name of "taking responsibility".

> Edit: Oh, also, in a sentence like "Bob seems terrible.", 'to seem' is basically a copula. Copula is basically any verb that links the subject to a predicative.

According to E-Prime users, using "seems" instead of "is" could typically make the scope and basis for disagreements clearer (because then you can talk more readily about to whom something seems a certain way?).

actuallyalys · on Jan 2, 2017

There's nothing intrinsically wrong with the passive voice, but people overuse it. For example, I've edited technical documents that used the passive voice so frequently that I lost track of what was being done by the user and what was being done automatically.

_pfxa · on Jan 2, 2017

Okay, but why throw the baby out with the bathwater?

coldtea · on Jan 2, 2017

Because it's not a case of baby and bathwater, where the thing you don't want to throw is millions of times more valuable than the one you want to throw away.

Here, the two things that might be thrown away are of the same value (justifiable passive voice vs unjustifiable passive voice), and it's not that big a value to begin with.

Plus, you have a perfectly good replacement (active voice).

Lastly, it's just general advice for people who overuse unjustified passive voice. It's not supposed to be subtle. If those people could understand subtlety, they'd have kept the justified passive voice themselves when it's appropriate.

actuallyalys · on Jan 2, 2017

I don't think anyone seriously advocates avoiding it entirely. For example, even Strunk and White say: "This rule does not, of course, mean that the writer should entirely discard the passive voice, which is frequently convenient and sometimes necessary." (http://www.bartleby.com/141/strunk5.html#11)

Fnoord · on Jan 2, 2017

> He doesn't even correctly identify what weasel words are. ("Some people say", "It is believed", etc).

Agreed, the scripts can use improvements like these. It's a start.

I'd also add 'but' if it's not there yet. Its function is to negate everything written before it, which easily leads to fallacies.

> I'm not sure why 'very close match' is any more opinionated than 'close match' is.

Because 'close' in 'close match' is informative. Without 'close' you get 'match'. So close tells us about the state of the match.

Very is redundant at best. Very is also emotional; its function is to (consciously or not) attempt to induce an emotional response in the reader, i.e. to manipulate the reader. You want to avoid all of the above in a scientific paper.

It's important to note that we can reach for perfection whilst writing. Because such a goal is far-fetched, it makes more sense to improve ourselves via iterations. (Same with these scripts, or software development in general.)

coldtea · on Jan 2, 2017

> Nobody should be using Strunk and White as a style guide after primary school. There's nothing fundamentally wrong with passive voice, or adverbs, or any of the other things that he mentions here.

That's just ONE counter source. Hardly enough to make the case.

While there might be nothing "fundamentally wrong with passive voice, or adverbs, or any of the other things that he mentions", there is a lot that's fundamentally wrong with how those things are abused by the majority of people.

Chris2048 · on Jan 5, 2017

Surely the burden is on the proof, not the counter?

emmelaich · on Jan 2, 2017

`very` should almost never be used. It simply doesn't add anything but leaves the reader guessing.

randomstring · on Jan 1, 2017

I read the whole article thinking "this needs to be an emacs mode!" only to get to the punchline at the bottom.

>> Benjamin Beckwith has contributed a "writegood" mode for emacs inspired by these scripts.

This is going into my .emacs right now.

mortenlarsen · on Jan 2, 2017

Funny coincidence that it was named "writegood" by a guy named Benjamin and "Silence Dogood" was a pen name of Benjamin Franklin.

wyclif · on Jan 2, 2017

Is there anything like that for vim?

seanwilson · on Jan 1, 2017

Is there anything like this for Google Docs, Gmail or Atom?

We've had spellcheckers for decades but I find it really surprising that automated grammar and proofreading checkers aren't in common use yet. For example, having my email client highlight overly long sentences, duplicate words, ambiguous references (e.g. what noun does "it" refer to) and more would undoubtedly save proofreading time and doesn't sound that difficult to implement. I see online comments every few days of someone pointing out that the word loose/lose is used incorrectly, for instance.

I recall that many grammar checkers suffer from false positives, but has the technology not advanced?
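For the "overly long sentences" part, a crude version does fit in a few lines of shell. This is only a sketch under loud assumptions: the name `long_sentences` and the default limit of 30 words are invented for illustration, and every `.`, `!` or `?` is treated as the end of a sentence, so abbreviations like "e.g." and decimal numbers will split sentences early.

```shell
# long_sentences: hypothetical helper, not from the article.
# Reads text on stdin and prints "<word count>: <sentence>" for every
# sentence with more than MAX words (first argument, default 30).
long_sentences() {
    max="${1:-30}"
    # Join all lines into one, then break after every . ! or ?
    # (a naive sentence splitter; the punctuation itself is dropped).
    tr '\n' ' ' | tr '.!?' '\n\n\n' |
        awk -v max="$max" 'NF > max { print NF ": " $0 }'
}
```

Usage would be something like `long_sentences 25 < email.txt`. Resolving ambiguous references such as "it" is a much harder problem and is not attempted here.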

froindt · on Jan 3, 2017

> having my email client highlight overly long sentences, duplicate words, ambiguous references (e.g. what noun does "it" refer to) and more would undoubtedly save proofreading time and doesn't sound that difficult to implement.

While it's geared a bit more for the legal field, WordRake does a good job of finding many of these mistakes. It works as an Add-in for Word or Outlook. It's good for finding instances where you use 5 words when 2 would have been better. It will analyze a chunk of text, find clunky chunks, and give a suggestion which you can accept or reject. I find even if the suggestion isn't a good one, it's a bad sentence which needs to be reworked.

Because the intent of a sentence is not always known, there is no Accept All button.

http://www.wordrake.com/

Disclaimer: I have received a free year license, but not in exchange for writing this.

ehudla · on Jan 1, 2017

Not very long ago I posted an Ask HN about this:

https://news.ycombinator.com/item?id=12366364

walterbell · on Jan 1, 2017

online grammar checkers: http://nybookeditors.com/2016/02/instantly-improve-your-writ...

offline grammar checkers: https://www.serenity-software.com & http://www.editorsoftware.com/StyleWriter.html

huac · on Jan 1, 2017

See Draft for markdown based editing w/the 'writing improver' - www.draftin.com

confounded · on Jan 2, 2017

VC funded browser extension: https://www.grammarly.com/

jgalt212 · on Jan 2, 2017

These guys are huge youtube advertisers. Or at least youtube thinks me writings skills could use some improvement.

carlosbarreto · on Jan 1, 2017

Hello, I wrote a similar script to do the Belcher diagnostic test in LaTeX documents. This test consists of highlighting parts of the text with potential problems (e.g., vague pronouns, weak verbs, and passive voice, among others).

You can find the instructions to do the test (and some good examples) in the book Writing your journal article in twelve weeks: A guide to academic publishing success.

The script and its documentation are here:

https://github.com/carlobar/BDT_latex

https://github.com/carlobar/BDT_latex/blob/master/docs/docum...

mrob · on Jan 1, 2017

You may also be interested in LanguageTool, which catches many potential problems in several languages. It's the best Free Software style and grammar checker I've seen:

https://languagetool.org/

killercup · on Jan 1, 2017

Oh, that reminds me: I rewrote that bash script (and a bit more) as a Rust lib (+ CLI) for fun last year. If anyone wants to build on it: https://github.com/killercup/english-lint

raverbashing · on Jan 1, 2017

Looks like good guidelines overall; I just have one complaint.

> Bad: We used various methods to isolate four samples.

> Better: We isolated four samples.

The first sentence is right if a different method was used to obtain different samples. This is relevant information.

faitswulff · on Jan 1, 2017

I would say it is relevant if you used different methods for each sample, but I understand why the author would omit "various" - it's too ambiguous.

bryanrasmussen · on Jan 1, 2017

I thought replacing 'quite difficult' with 'difficult' was unfair because I have always supposed quite to mean very or extremely when used in this manner.

hyperpape · on Jan 1, 2017

Later he cautions against adding 'very', so he's consistent at least.

bryanrasmussen · on Jan 1, 2017

What about "slightly difficult"? It's basically a language denuded of gradation.

hyperpape · on Jan 1, 2017

I think absolutism on this point is a bad idea. But I do think that many people, including myself, overuse qualifications in contexts where they don't add anything. In the example of a close match, what distinguishes a "very close match" from a "close match"? Better to omit the "very" unless you can quantify it, or otherwise make it clear what it adds.

_vya7 · on Jan 1, 2017

So, on a related note, someone wrote a blog post a few years ago that just perfectly epitomized the annoyingly pretentious writing style everyone in the tech world seems to have, and I recommended to him that he use simpler phrases and words and sentences. Everyone in that chatroom criticized me as both an idiot and an asshole. Skip ahead a year or two, and PG writes the same fucking thing in a blog post and posts it here, and everyone praises him. What I took from this is that I'm not actually an idiot after all, and I should probably stop listening to people who say that I am.

teach · on Jan 1, 2017

First of all, this isn't written by Paul Graham; it was written by Matt Might and posted by someone else entirely. Just because it got upvotes on Hacker News doesn't mean that PG was involved.

Secondly, there is a difference in context, timing and tone in an article like this. Here, a PhD supervisor is speaking generally about the sorts of errors his students tend to make.

In your situation, you were "attacking" a single person publicly. If you had had the social grace to make your comments privately and in person (rather than in a chatroom) they probably would have been better received.

If your takeaway from this post is just to "ignore the haters" then I'm afraid you're missing out on some real opportunity for self-improvement.

bdowling · on Jan 2, 2017

He was probably referring to this article.

http://paulgraham.com/talk.html

_vya7 · on Jan 2, 2017

Yep that's exactly it. And I had the same thought of "this is the last straw" too that Paul's talking about there. I just got fed up with seeing everyone write like that, everywhere. Thanks for the link.

lj3 · on Jan 1, 2017

Social proof is a real cognitive bias and it applies to programmers too.

the_d00d · on Jan 1, 2017

A chatroom mob got you that upset? You were just offering some advice. It is their problem if they choose to ignore it. Seriously dude, you should relax.

Solinoid · on Jan 1, 2017

Can you provide an example of this style? thanks

Solinoid · on Jan 1, 2017

I think the Soylent CEO's blog is an example of this, but I haven't really noticed it as a trend in the tech world.

ScottBurson · on Jan 1, 2017

In the footnote, what is that comma after "Regehr" doing there? Delete it!

But yes, "note that" is a bugaboo I have to battle in my own writing. With rare exceptions it can just be deleted; occasionally it indicates that the following point deserves more emphasis than I have given it.

feld · on Jan 2, 2017

I've kept these in my GitHub for a while now

https://github.com/feld/technical-writing

emmelaich · on Jan 2, 2017

[edit: I see @erlehmann_ beat me to it but am leaving it here anyway]

It would be great if someone could enhance the programs `style` and `diction` [1] to incorporate these hints. And make a browser add-in for good measure!

I used `diction` religiously back when I used a UNIX System V system -- it helped me a lot.

1. https://www.gnu.org/software/diction/

5706906c06c · on Jan 1, 2017

Love this! English is my third language; I often fall for using passive voice, adverbs or fillers. Much like this script, I found Grammarly to be extremely helpful in forcing me to rethink the above when composing.

danso · on Jan 1, 2017

I do something similar to the OP. When I'm working on a long project with multiple pieces that I might have left incomplete, I'll use a grep-like tool (such as ack, which supports PCRE) to quickly look for placeholders or cusswords:

       ack -C -i 'tk|to ?do|lorem|fu.k|shit|wt[fh]|[!?.]{3,}'

Maybe the OP's goal is to have his PhD students practice more shell scripting and syntax. But it seems the same effect could be achieved with grep and the flag to filter from a file of patterns, rather than creating an unwieldy single string to enumerate all the possible words. Instead of having to write if/else logic to provide a lackluster CLI, have students create a repo of weasel words and use git clone/curl with grep.
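A minimal sketch of the pattern-file approach, assuming a plain text file with one word or phrase per line. The function name `weasels` and the file names in the usage line are hypothetical, and `-w` (whole-word matching) is a GNU/BSD grep extension rather than strict POSIX.

```shell
# weasels: hypothetical helper, not the article's script.
# Usage: weasels PATTERN_FILE FILE...
# Prints every line (with its line number) that contains a phrase from PATTERN_FILE.
weasels() {
    patterns="$1"
    shift
    # -f: read patterns from a file   -F: treat them as fixed strings
    # -i: ignore case   -w: whole words only   -n: show line numbers
    grep -n -i -w -F -f "$patterns" "$@"
}
```

Usage would be something like `weasels weasels.txt chapter1.tex`. The word list then lives in its own file that can be shared and versioned; note that an empty line in the pattern file would make grep match every line.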

I didn't read through his third script for detecting duplicate words, but couldn't it be achieved by using PCRE regex and backreferences?

http://stackoverflow.com/questions/2823016/regular-expressio...
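As a rough sketch of the backreference idea (not the article's third script, which may work differently), the following assumes GNU grep built with PCRE support (`-P`). The function name `find_dupes` is made up for illustration.

```shell
# find_dupes: hypothetical helper using a PCRE backreference.
# Prints each line (with its line number) where a word is immediately repeated.
# Assumes GNU grep with -P support.
find_dupes() {
    # \b(\w+)  capture one whole word
    # \s+\1\b  followed by whitespace and the same word again
    grep -n -i -P '\b(\w+)\s+\1\b' "$@"
}
```

Usage would be something like `find_dupes paper.tex`. It only catches repeats on the same line; a repeat split across a line break would need extra handling.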

Off-topic, but I've been meaning to write a post on how learning the command-line made me a significantly more productive writer. I do most of my writing on sites built from static site generators, such as Jekyll and Middleman and Sphinx. For many of my tutorials, I have to describe graphical elements which require taking screenshots.

I of course know the OSX keyboard shortcut to turn on the screen grab utility and interactively make a selection. But this saves the screenshot to a default location with a generic file name. To include that image in my blog, I have to move it over to my working directory, rename it, and then write the img code and src attribute to my blog post. It's enough annoying small steps that including images in my posts was a huge chore.

Some time ago, this blog post on OS X Terminal Utilities [0] made it to HN's front page and I learned that screencapture could be invoked from the Terminal. So I wrote a little Ruby wrapper that, when invoked from the command-line with an argument for output path, would call screencapture after a 2-second delay -- enough time for me to Cmd-Tab from a Terminal to the application I want to screensnap -- and then save the snap to the specified destination and output HTML/Markdown that I could paste into my blogpost.

Sample usage:

      $ screenpy images/path/to/screenshot.jpg

stderr:

      Writing to: images/path/to/screenshot.jpg
        Format: jpeg
        quality: 75
        optimize: True
      ![image screenshot.jpg](images/path/to/screenshot.jpg)

stdout:

       <img src="images/path/to/screenshot.jpg" alt="screenshot.jpg">

I've iterated the tool, converting it to Python and including the aws-sdk so I could upload to S3 if I need an absolute URL. And I've written plenty of other utilities since...but it's hard to overstate how much being able to operate via CLI has smoothed my writing experience. It's not just that it saves me time, but I'll write visual-heavy posts that I would have never even tried, especially back in my Wordpress days.

[0] http://www.mitchchn.me/2014/os-x-terminal/

[1] https://gist.github.com/dannguyen/bfb45408d43986eefdf83b59bc...
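The wrapper described above could look roughly like this in plain shell. This is a sketch, not the code from the gist: the names `img_snippet` and `snap` are invented, `snap` only works on macOS where `screencapture` exists, and image format and quality options are left out.

```shell
# img_snippet: hypothetical helper.
# Prints Markdown for the image on stderr and an HTML <img> tag on stdout,
# mirroring the sample output shown above.
img_snippet() {
    name=$(basename "$1")
    printf '![image %s](%s)\n' "$name" "$1" >&2
    printf '<img src="%s" alt="%s">\n' "$1" "$name"
}

# snap: hypothetical wrapper (macOS only).
# Waits two seconds, captures the screen to the given path, then prints the snippet.
snap() {
    sleep 2                      # time to switch to the window being captured
    screencapture "$1" || return # macOS screenshot utility named in the post
    img_snippet "$1"
}
```

Usage would be something like `snap images/path/to/screenshot.png | pbcopy`, which puts the HTML on the clipboard while the Markdown still shows in the terminal.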

ams6110 · on Jan 1, 2017

For Linux, the scrot utility will do that.

https://en.wikipedia.org/wiki/Scrot

icebraining · on Jan 1, 2017

They say it follows the UNIX philosophy, but that's clearly not right; delaying is a job for sleep(1). xwd is the true UNIX philosophy abiding screenshot tool :)

https://en.wikipedia.org/wiki/Xwd

kazinator · on Jan 2, 2017

> My Ph.D. advisor, Olin Shivers, ...

And you're proudly shell scripting. :)

plg · on Jan 2, 2017

A fantastic book for improving your writing:

On Writing Well by William Zinsser

BuuQu9hu · on Jan 1, 2017

Another similar tool: http://proselint.com/