Show HN: Send Secret Messages over Twitter as Public Tweets

brey · on Sept 22, 2013

Steganography conceals the existence of the message, not just the contents.

if David Miranda gets stopped at Heathrow and:

    On the following morning, the gener·al requested permission to return the emperor's visit, by waiting on him in his palace.
    A pitched battle follow·ed.
    But the pride of Iztapalapan, on which its lord had freely l·avished his care and his revenues, was its celebrated gardens.
    ...

is in his twitter account, it's in no way plausible that they're just innocuous tweets, and he can be compelled to reveal the secret.

A true steganographic message would have looked indistinguishable from any other tweet that he would have made normally. this is a cute system, but it's not steganography.

dpapathanasiou · on Sept 22, 2013

The output is a function of the corpus.

So if Miranda doesn't usually tweet about the history of Mexico, he can pick other texts (written by him or others) which would sound more plausible as something he might normally tweet.

Having said that, the middle dot is not as unobtrusive as I would like, so perhaps it's better to rethink that part of the system, using some of the other suggestions in this thread.

brey · on Sept 22, 2013

even ignoring the middle dot, just picking a more suitable corpus for that person isn't necessarily going to make this technique look innocuous - it's still a collection of excerpts taken from what looks like random positions within a document. it looks suspicious.

you could imagine a system which uses entirely normal and habitual tweets, but encodes information in choices of synonyms used in the text, or whether or not punctuation was used in certain places, or the timing of the tweet's publication. lower bitrates, but plausibly deniable as to the message's existence.

dpapathanasiou · on Sept 22, 2013

I would posit that a stream of non sequitur tweets is not necessarily suspicious (depending on the person/account in question, of course).

The classic steganography methods you're describing may appear less obvious (for lack of a better term), but they're also easier to break once the pattern is discovered.

brey · on Sept 22, 2013

just as hard to break if you're doing it properly: one bit per tweet, encoded as 'message ends with period = 1, no period = 0', and that's your ciphertext stream. from then, AES or RSA or a OTP or whatever you want.

Then what you're writing will be functionally indistinguishable from random.

hmm ... but maybe too random, humans are bad at being truly random - the entropy in your period usage will be too high ... perhaps xor the RSA output with a OTP of something 'random' you scribbled yourself on a page ;-)
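
A minimal sketch of the one-bit-per-tweet channel described above, assuming the bits have already been encrypted elsewhere and that the cover tweets are ordinary ones you would have posted anyway. The function names and the sample tweets are hypothetical; only the "ends with period = 1, no period = 0" rule comes from the comment.

```python
def embed_bit(tweet: str, bit: int) -> str:
    """Return the tweet ending with a period for 1, without one for 0."""
    # Strip any existing trailing periods so the final character is under our control.
    base = tweet.rstrip().rstrip(".")
    return base + "." if bit else base


def extract_bit(tweet: str) -> int:
    """Read the hidden bit back from the final character."""
    return 1 if tweet.rstrip().endswith(".") else 0


def embed_bits(tweets, bits):
    """Pair each cover tweet with one ciphertext bit."""
    return [embed_bit(t, b) for t, b in zip(tweets, bits)]


# Hypothetical cover tweets; the bits stand in for an already-encrypted stream.
cover = ["Coffee first", "Stuck in traffic again.", "Great talk today", "Lunch was fine."]
bits = [1, 0, 0, 1]
stego = embed_bits(cover, bits)
recovered = [extract_bit(t) for t in stego]
```

This sketch ignores tweets that end in "?" or "!" and collapses an ellipsis, so a real cover stream would need to skip or handle those cases.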

dllthomas · on Sept 23, 2013

"perhaps xor the RSA output with a OTP of something 'random' you scribbled yourself on a page ;-)"

... seemingly random bits, xored with anything not directly related to the bits in question, produce seemingly random bits... There are other ways of transforming a sequence of bits to look less uniform, though.
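
The point can be checked with a little exact arithmetic. This is only a sketch, assuming the two bit sources are independent; the 7/10 bias for the hand-scribbled pad is a made-up number for illustration.

```python
from fractions import Fraction


def xor_bias(p_a: Fraction, p_b: Fraction) -> Fraction:
    """Probability that a XOR b equals 1 for independent bits a and b."""
    # The XOR is 1 exactly when one input is 1 and the other is 0.
    return p_a * (1 - p_b) + (1 - p_a) * p_b


uniform = Fraction(1, 2)    # a well-encrypted bit is 1 half the time
scribble = Fraction(7, 10)  # hypothetical human-made pad, biased toward 1
result = xor_bias(uniform, scribble)
```

With a uniform input the result stays at exactly 1/2 whatever the bias of the other input, so the scribbled pad would not make the period usage look any more human.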

akkartik · on Sept 23, 2013

Unless he's constantly making tweets like this.

dpapathanasiou · on Sept 22, 2013

This is a side project I've been working on for the Lisp in Summer Projects[1] contest.

It's a text steganography app using a simple book cipher, written in Clojure.

I welcome any feedback from HN so let me know what you think!

[1] http://lispinsummerprojects.org/

andrewcooke · on Sept 22, 2013

it's not steganography. the messages have little dots in them - they are very distinctive. it's trivial to find anyone using this.

why not use a typo instead? switch the letter at that point to another letter (could be a different one for each sentence). use fuzzy matching (eg locality sensitive hash) to find the correct sentence.

and construct the dictionary from previous tweets rather than books. then tweets look like tweets.
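
A rough sketch of the typo idea, assuming a single substituted letter marks the position and that sender and receiver share the corpus. It swaps in Python's `difflib.get_close_matches` for the locality-sensitive hash mentioned above, purely to keep the example short; the function names and the three-sentence corpus are hypothetical.

```python
import difflib


def add_typo(sentence, index, replacement):
    """Replace the character at index with a different single character."""
    if sentence[index] == replacement:
        raise ValueError("replacement must differ from the original character")
    return sentence[:index] + replacement + sentence[index + 1:]


def locate_typo(tweet, corpus):
    """Find the closest corpus sentence and the position where the tweet differs."""
    matches = difflib.get_close_matches(tweet, corpus, n=1, cutoff=0.8)
    if not matches:
        return None
    original = matches[0]
    if len(original) != len(tweet):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(original, tweet)) if a != b]
    # Exactly one differing position is treated as the marker.
    return (original, diffs[0]) if len(diffs) == 1 else None


# Hypothetical shared corpus.
corpus = [
    "A pitched battle followed.",
    "The general requested permission to return the visit.",
    "The gardens were celebrated.",
]
tweet = add_typo(corpus[0], 10, "d")
found = locate_typo(tweet, corpus)
```

A linear scan with `difflib` would be slow on a large corpus, which is presumably why a hash-based lookup was suggested.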

dpapathanasiou · on Sept 22, 2013

Good idea about using typos instead of a middle dot marker (TBH, it's not as unobtrusive as I would like, so perhaps it's better to rethink that part of the system).

zeckalpha · on Sept 22, 2013

I could see an advertiser making interesting use of this. "Drink more ovaltine!"

Have you thought about trying to do a public-key based version? People could list their key in their profile.

philwebster · on Sept 22, 2013

For those who aren't aware, "Drink more Ovaltine" is a reference to the movie A Christmas Story [1]. The boy in the story is excited to decode a message from the Ovaltine fan club using his decoder ring and is let down when the message is just "Be sure to drink your Ovaltine." [2]

[1] http://www.imdb.com/title/tt0085334/

[2] http://en.wikipedia.org/wiki/Secret_decoder_ring#Messages

zeckalpha · on Sept 22, 2013

Actually, it predates the movie. https://en.wikipedia.org/wiki/Secret_decoder_ring

dpapathanasiou · on Sept 22, 2013

The ad application is interesting; I hadn't been thinking along those lines at all...

I considered using public keys, but I didn't want the resulting encoded tweet to be illegible or look like gibberish.

The conceit here is that even a secret message looks completely innocent, something that no eavesdropper would notice as being out of the ordinary (though the middle dot marker does give it away, to people paying attention).

akkartik · on Sept 22, 2013

Another idea: use a corpus of tweets to retweet.

dpapathanasiou · on Sept 22, 2013

Yes, that's the simplest way to make the encoded tweets look natural, i.e., truly steganographic, but it's a fine balancing act between that and not making the corpus obvious.

akkartik · on Sept 22, 2013

Is it a big deal if it's obvious that the corpus is of tweets? There's many many ways to build one and order the tweets, right?

dpapathanasiou · on Sept 22, 2013

The content of the corpus doesn't matter, so the answer to your first question is no, and yes, if you're clever in terms of how you've composed the corpus so that no one could re-create it easily, then it's ok (relatively speaking, of course).

coherentpony · on Sept 22, 2013

The problem there is the 140 char limit.

zeckalpha · on Sept 22, 2013

The keys would not be transmitted via tweets, just messages.

a3n · on Sept 22, 2013

You can split a sentence across multiple tweets. V1.1?

rhizome · on Sept 22, 2013

Twitlonger isn't annoying enough?

coherentpony · on Sept 22, 2013

Oh, cool!

_sh · on Sept 23, 2013

Instead of using a corpus, why not select tweets from a list of followers and re-tweet them? This would make your encoded messages more like a natural twitter stream. You could curate followers lists to be themed around a subject (tech, gardening) to make them more naturally align with the rest of your twitter stream.

Edit: oops, this idea was already mentioned by akkartik elsewhere in this thread.

orenmazor · on Sept 22, 2013

This is so cool. Well Done!

ctz · on Sept 22, 2013

The amusing thing about sending ciphertexts over twitter compared to english text is that you can actually fit in more information, assuming you do the encryption and ciphertext encoding right. That's because twitter transports 140 unicode code points.

(This has nothing to do with steganography, but seems relevant nonetheless. )
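
A back-of-the-envelope sketch of that capacity claim, assuming every Unicode scalar value could be used as a symbol. That is an upper bound and an assumption: nothing in this thread says which code points Twitter accepts or leaves unnormalized, so the practical figure would be lower. The function name is hypothetical.

```python
import math

TWEET_LENGTH = 140  # code points per tweet, per the comment above

# Unicode scalar values: U+0000..U+10FFFF minus the surrogate range U+D800..U+DFFF.
UNICODE_SCALARS = 0x110000 - 0x800


def capacity_bytes(alphabet_size: int, length: int = TWEET_LENGTH) -> int:
    """Whole bytes that fit in `length` symbols drawn from an alphabet of this size."""
    bits = math.floor(length * math.log2(alphabet_size))
    return bits // 8


base64_bytes = capacity_bytes(64)                # ciphertext as base64 text
unicode_bytes = capacity_bytes(UNICODE_SCALARS)  # ciphertext spread over all scalars
```

Under these assumptions a base64-encoded ciphertext carries 105 bytes per tweet, while the full code point range allows up to 351.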

danieldk · on Sept 22, 2013

Cool! One other fun approach may be to use syntactic transformation (topicalization, middle field ordering, etc.) or lexical variation (e.g. through synonyms):

https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/PSI0...

The advantage of such an approach is that you can use coherent text/messages.

dpapathanasiou · on Sept 22, 2013

Thanks, I'll look into those suggestions, especially since I'm not really happy about using the middle dot (it's really not unobtrusive, and the entire system would work better if I didn't use it).

rw · on Sept 22, 2013

I wrote a textual steganography library and CLI in 2011, called Plainsight: https://github.com/rw/plainsight

Additionally, @workmajj and I wrote TweetFS using Plainsight. It lets you recursively pack up directories and post them as an encoded linked list of Tweets to Twitter: https://github.com/rw/tweetfs

I presented Plainsight at Hack'n'Tell NYC in 2011 and a video was recorded: http://bit.ly/pecGgW

Plainsight uses each byte of the input message to generate tokens. Bits are used to decide how to traverse the token tree, weighted by frequency. The drawbacks are 1) verbosity and 2) incorrect grammar.

One of the lessons of writing Plainsight is that spam can be used to contain secret messages. Send enough gibberish to enough people, with your intended recipient included, and you'll look like a spammer--not a spy.

I also wrote a fuzzing tool, called Shag, to find edge cases, e.g. for single-byte inputs: https://github.com/rw/shag/blob/master/shag.rb

-- Example 1 (regular text)

Type your message to encode:

   echo 'Meet at Union Square at noon. The password is FuriousGreen.' > cleartext

Then, pipe it through Plainsight:

   cat cleartext | plainsight -m encipher -f sherlock.txt > ciphertext

The output will be Doyle-esque gibberish:

   cat ciphertext | fold -s

   which was the case, of a light. And, his hand. "BALLARAT." only applicant?" 
   decline be walking we do, the point of the little man in a strange, her 
   husband's hand, going said road, path but you do know what I have heard of you, 
   I found myself to get away from home and for the ventilator little cold night, 
   and I he had left my friend Sherlock of our visitor and he had an idea was not 
   to abuse step I of you, I knew what I was then the first signs it is the 
   daughter, at least a fellow-countryman. had come. as I have already explained, 
   the garden. what you can see a of importance. your hair. a picture upon of the 
   money which had brought a you have a little good deal in way: out to my wife 
   and hurry." made your hair. a charge me a series events, and excuse no sign his 
   note-book has come away and in my old Sherlock was already down to do with the 
   twisted

Now, decipher that ciphertext:

   cat ciphertext | plainsight -m decipher -f sherlock.txt > deciphered
   cat deciphered
   Meet at Union Square at noon. The password is FuriousGreen.

-- Example 2 (binary data)

   $ dd if=/dev/urandom of=/dev/stdout bs=1 count=10 | plainsight -m encipher -f 1984.txt
   10+0 records in
   10+0 records out
   10 bytes (10 B) copied, 9e-05 s, 111 kB/s
   Adding models:
   Model: 1984.txt added in 0.89s (context == 2)
   input is "<stdin>", output is "<stdout>"
   
   enciphering: 100%|#####################################################################################################################################################################|474.67  B/s | Time: 0:00:00
   
   which is a war is real, the proles used mind on the telescreen. He could see through all right to. You have read what said. 'Yes,' only in the Ministry

rw · on Sept 22, 2013

One serious use case is to seed the generator with a spam email corpus. This lets you generate messages that look like spam. Example:

   wget https://spamassassin.apache.org/publiccorpus/20030228_spam.tar.bz2
   tar -jxvf 20030228_spam.tar.bz2
   cat spam/0* > spam-corpus.txt
   
   echo "The Magic Words are Squeamish Ossifrage" | plainsight -m encipher -f spam-corpus.txt > spam_ciphertext
   
   $ cat spam_ciphertext
   (8.11.6/8.11.6) 3 (Normal) Internet can send e-mails until to transfer 26 10 [127.0.0.1] also include address from the most logical, mail business for your Car have a many our portals ESMTP Thu, 29 1.0 this letter on internet, <a style=3D"color: 0px; text/plain; cellspacing=3D"0" how quoted-printable about receiving you would like width=3D"15%" width=3D"15%" border="0" width="511" Date: Tue, 27 Thu, 19 26 because zzzz@localhost.spamassassin.taint.org for
   
   $ cat spam_ciphertext | plainsight -m decipher -f spam-corpus.txt
   Adding models:
   Model: spam-corpus.txt added in 2.57s (context == 2)
   input is "<stdin>", output is "<stdout>"
   
   deciphering: 100%|#####################################################################################################################################################################|543.84  B/s | Time: 0:00:00
   
   The Magic Words are Squeamish Ossifrage

dpapathanasiou · on Sept 22, 2013

Thanks for hijacking my thread! ;)

Seriously, though, how would your library work for twitter?

It seems that the encoding process creates texts much, much larger than the original message.

rw · on Sept 22, 2013

TweetFS uses the SeqTweet library, which takes care of sequencing the tweets for you. Specifically, see the _list_to_twitter method: https://github.com/workmajj/seqtweet/blob/master/seqtweet/se...

One of the use cases of TweetFS is to use Twitter as a 'dead drop'. You'll generate a lot of tweets by doing that, but there's no harm done.

I don't think I hijacked your thread. The other comments also discuss textual steganography.

dpapathanasiou · on Sept 22, 2013

Thanks, I'll look more into how TweetFS works.

I was just kidding about the hijacking comment; don't people know how to interpret emoticons anymore? :D

epaga · on Sept 23, 2013

I think the problem is that unlike ":)", too often people use passive-aggressive ";)"s, not meaning them in the friendly way they were originally intended but with a slightly bitter aftertaste.

But this is getting a bit off-topic I suppose... :)

616c · on Sept 22, 2013

Very impressive stuff, sir. Will check out your work in depth this weekend!

martinwnet · on Sept 23, 2013

This is a great idea, well executed.

drakaal · on Sept 22, 2013

The big issue I see is that Twitter detects and deletes gibberish as spam. So at best case your posts randomly get filtered when you use this.

At worst case after posting a bunch of gibberish Twitter bans your account.

nrivadeneira · on Sept 22, 2013

Let's be real - if Twitter automatically deleted gibberish as spam, the majority of Tweets from the average Twitter user would be removed.

geocar · on Sept 22, 2013

Encode your secret message as a bunch of numbers, xor them with your OTP, and then look up users with those numbers and simply re-tweet their most recent message.

Now your message is encoded in the userids of who you retweet.
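
A minimal sketch of that scheme, assuming a pre-shared one-time pad and treating each 4-byte chunk of the XORed message as a numeric user ID. The chunk width and function names are assumptions for illustration, and the sketch makes no Twitter API calls; nothing guarantees that an account exists for an arbitrary number.

```python
import secrets


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR data with the start of the pad; the pad must cover the whole input."""
    if len(pad) < len(data):
        raise ValueError("one-time pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


def to_user_ids(message: bytes, pad: bytes, width: int = 4) -> list:
    """Zero-pad to a multiple of width, XOR with the pad, read chunks as integers."""
    padded = message + bytes(-len(message) % width)
    cipher = xor_bytes(padded, pad)
    return [int.from_bytes(cipher[i:i + width], "big")
            for i in range(0, len(cipher), width)]


def from_user_ids(user_ids: list, pad: bytes, width: int = 4) -> bytes:
    """Rebuild the ciphertext from the IDs and XOR the pad back out."""
    cipher = b"".join(uid.to_bytes(width, "big") for uid in user_ids)
    # Trailing zero bytes are assumed to be padding, which suits text messages.
    return xor_bytes(cipher, pad).rstrip(b"\x00")


pad = secrets.token_bytes(16)  # shared with the recipient in advance
ids = to_user_ids(b"meet at noon", pad)
plain = from_user_ids(ids, pad)
```

The recipient needs the retweets in order and the same pad; as with any one-time pad, the pad bytes must never be reused for a second message.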

graue · on Sept 23, 2013

Might be less obtrusive to favorite instead of retweeting.

(Edit: This assumes you can get a list of favorited tweets ordered by when they were favorited. I just noticed the web interface orders them by time posted, not time favorited.)

wintersFright · on Sept 22, 2013

Aren't you just publishing the OTP and not the message? Or did I miss something?

ctb_mg · on Sept 23, 2013

They're publishing the encrypted ciphertext, which is the plaintext message xor OTP (aka the key). The encrypted ciphertext would be the user IDs of all recent retweets.

The receiving party would have to know the OTP to decrypt the message.

brey · on Sept 22, 2013

good suggestion. you probably want to encode some redundancy so you can pick plausibly interesting people to retweet, rather than your xor'd number being an obvious bot, or it's still suspicious.

aroman · on Sept 22, 2013

I don't really see how Twitter could identify these as "Gibberish". They're normal English sentences, albeit meaningless because they lack any real context.

How could or why would Twitter be able to identify these seemingly undetectable tweets as spam? They don't look like gibberish.

andrewcooke · on Sept 22, 2013

this doesn't seem to be true. https://twitter.com/ColorlessG

context: http://colorlessgreen.net/about

cmsmith · on Sept 22, 2013

Check the bottom of the post. The tweets themselves are actual sentences from a book, not gibberish.

drakaal · on Sept 22, 2013

How do you think Twitter tells the difference between a Bot and a human? I get that they are sentences, but they aren't contextually relevant, they don't look like tweets, they look like spam.

If you randomly start posting "Four score and seven years ago" and the rest of the speech out of order you are going to get flagged.

I used to build tweet spam bots, I am very familiar with how they get detected.

omni · on Sept 22, 2013

Put quotes around the whole thing and Twitter will think you're just another 13 year old girl who thinks she's clever by incessantly quoting literature.

dpapathanasiou · on Sept 22, 2013

I wouldn't use this with a bot account, since, as you say, it would become easy to detect those as anomalies.

Instead, I was thinking that you might intersperse these in between your normal tweets, in coordination with the people you're communicating with.

bolder88 · on Sept 22, 2013

> I used to build tweet spam bots, I am very familiar with how they get detected.

I would have thought that the content of a tweet is quite low down on their detection list. IP address, user account, frequency of tweets etc would be much higher.

hhm · on Sept 22, 2013

Very nice! I worked on a similar steganographic system (not for tweets though) that you can find here: https://github.com/hmoraldo/markovTextStego There you'll find both the source code and a link to a paper explaining how it works and how it differs from other approaches.

timr · on Sept 22, 2013

At Twitter Peak Hype, when journalists were writing silly things like: "Twitter is nothing less than a new internet protocol!", I had a perverse fantasy of implementing TCP/T(weet).

If someone were to do this, it would effectively subsume all further "I implemented $X on Twitter" posts. Sort of like showing that a language is Turing complete.

dpapathanasiou · on Sept 22, 2013

Isn't that what app.net is supposed to be?

gpsarakis · on Sept 22, 2013

Nice project. Considering a stream of tweets how can you find the beginning and the end of a sentence/message?

dpapathanasiou · on Sept 22, 2013

Ευχαριστώ!

There's no obvious way of telling where a "secret" sequence would begin and end.

For now, it might be best left as a coordination issue, similar to the choice of corpus: e.g., you know that you'll be tweeting secretly n times a day, at these specific times only, etc.

netman21 · on Sept 22, 2013

Or you could use https://scrambls.com/ which uses strong crypto and works for facebook or any site. Keeping in mind that any short message protocol is vulnerable to cryptanalysis.

jefftk · on Sept 22, 2013

For this to be secure (one-time-pad) you can't reuse the corpus. That's a big enough pain that I doubt people will actually do it. Which means you can start decoding their tweets once you collect enough.

dpapathanasiou · on Sept 22, 2013

True, you shouldn't use the same corpus more than once, so there's always going to be a coordination issue between the sender and recipients.

alexharris66 · on Sept 22, 2013

Cool. Much better than my secret twitter message project: http://www.twhatever.com/tweets :)

bzalasky · on Sept 23, 2013

So, horse_ebooks could have an ulterior motive?