Steganography conceals the existence of the message, not just the contents.
if David Miranda gets stopped at Heathrow and:
On the following morning, the gener·al requested permission to return the emperor's visit, by waiting on him in his palace.
A pitched battle follow·ed.
But the pride of Iztapalapan, on which its lord had freely l·avished his care and his revenues, was its celebrated gardens.
...
is in his twitter account, it's in no way plausible that they're just innocuous tweets, and he can be compelled to reveal the secret.
A true steganographic message would have looked indistinguishable from any other tweet that he would have made normally. this is a cute system, but it's not steganography.
So if Miranda doesn't usually tweet about the history of Mexico, he can pick other texts (written by him or others) which would sound more plausible as something he might normally tweet.
Having said that, the middle dot is not as unobtrusive as I would like, so perhaps it's better to rethink that part of the system, using some of the other suggestions in this thread.
even ignoring the middle dot, just picking a more suitable corpus for that person isn't necessarily going to make this technique look innocuous - it's still a collection of excerpts taken from what looks like random positions within a document. it looks suspicious.
you could imagine a system which uses entirely normal and habitual tweets, but encodes information in choices of synonyms used in the text, or whether or not punctuation was used in certain places, or the timing of the tweet's publication. lower bitrates, but plausibly deniable as to the message's existence.
I would posit that a stream of non sequitur tweets is not necessarily suspicious (depending on the person/account in question, of course).
The classic steganography methods you're describing may appear less obvious (for lack of a better term), but they're also easier to break once the pattern is discovered.
just as hard to break if you're doing it properly: one bit per tweet, encoded as 'message ends with period = 1, no period = 0', and that's your ciphertext stream. from there, AES or RSA or an OTP or whatever you want.
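To make the one-bit-per-tweet idea concrete, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes the cover tweets are ordinary sentences (nothing ending in "?" or "!") and that the bits passed in are already ciphertext produced by whatever cipher you picked.

```python
def embed_bit(tweet: str, bit: int) -> str:
    """Return the tweet ending with a period for bit 1, without one for bit 0."""
    base = tweet.rstrip().rstrip(".")
    return base + "." if bit else base


def extract_bit(tweet: str) -> int:
    """Read the bit back: trailing period means 1, anything else means 0."""
    return 1 if tweet.rstrip().endswith(".") else 0


def bytes_to_bits(data: bytes) -> list[int]:
    """Split bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Reassemble whole bytes from bits; leftover bits are dropped."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_bits(tweets: list[str], bits: list[int]) -> list[str]:
    """Pair each cover tweet with one ciphertext bit."""
    return [embed_bit(tweet, bit) for tweet, bit in zip(tweets, bits)]


def extract_bits(tweets: list[str]) -> list[int]:
    return [extract_bit(tweet) for tweet in tweets]
```

At one bit per tweet, a single ciphertext byte costs eight tweets, which is the "lower bitrate" trade-off in practice.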
Then what you're writing will be functionally indistinguishable from random.
hmm ... but maybe too random, humans are bad at being truly random - the entropy in your period usage will be too high ... perhaps xor the RSA output with a OTP of something 'random' you scribbled yourself on a page ;-)
> perhaps xor the RSA output with a OTP of something 'random' you scribbled yourself on a page ;-)
... seemingly random bits, xored with anything not directly related to the bits in question, still produce seemingly random bits... There are other ways of transforming a sequence of bits to look less uniform, though.
it's not steganography. the messages have little dots in them - they are very distinctive. it's trivial to find anyone using this.
why not use a typo instead? switch the letter at that point to another letter (could be a different one for each sentence). use fuzzy matching (eg locality sensitive hash) to find the correct sentence.
and construct the dictionary from previous tweets rather than books. then tweets look like tweets.
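A rough sketch of the typo idea in Python, for anyone who wants to play with it. It uses difflib from the standard library as a stand-in for a locality-sensitive hash, the function names are hypothetical, and it assumes the typo replaces exactly one character so the tweet keeps the same length as the corpus sentence.

```python
import difflib


def add_typo(sentence: str, index: int, replacement: str) -> str:
    """Mark a position by swapping the character there for another one."""
    if sentence[index] == replacement:
        raise ValueError("replacement must differ from the original character")
    return sentence[:index] + replacement + sentence[index + 1:]


def find_marker(tweet: str, corpus: list[str]):
    """Fuzzy-match the tweet to a corpus sentence and locate the typo.

    Returns (original_sentence, marked_index), or None if no sentence is
    close enough or the difference is not a single substituted character.
    """
    matches = difflib.get_close_matches(tweet, corpus, n=1, cutoff=0.8)
    if not matches:
        return None
    original = matches[0]
    if len(original) != len(tweet):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(original, tweet)) if a != b]
    if len(diffs) != 1:
        return None
    return original, diffs[0]
```

For a corpus of any real size, the linear scan inside get_close_matches would be slow, which is where something like an LSH index would earn its keep.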
Good idea about using typos instead of a middle dot marker (TBH, it's not as unobtrusive as I would like, so perhaps it's better to rethink that part of the system).
For those who aren't aware, "Drink more Ovaltine" is a reference to the movie A Christmas Story [1]. The boy in the story is excited to decode a message from the Ovaltine fan club using his decoder ring and is let down when the message is just "Be sure to drink your Ovaltine." [2]
The ad application is interesting; I hadn't been thinking along those lines at all...
I considered using public keys, but I didn't want the resulting encoded tweet to be illegible or look like gibberish.
The conceit here is that even a secret message looks completely innocent, something that no eavesdropper would notice as being out of the ordinary (though the middle dot marker does give it away, to people paying attention).
Yes, that's the simplest way to make the encoded tweets look natural, i.e., truly steganographic, but it's a fine balancing act between that and not making the corpus obvious.
The content of the corpus doesn't matter, so the answer to your first question is no, and yes, if you're clever in terms of how you've composed the corpus so that no one could re-create it easily, then it's ok (relatively speaking, of course).
Instead of using a corpus, why not select tweets from a list of followers and re-tweet them? This would make your encoded messages more like a natural Twitter stream. You could curate follower lists to be themed around a subject (tech, gardening) to make them more naturally align with the rest of your Twitter stream.
Edit: oops, this idea was already mentioned by akkatirk elsewhere in this thread.
The amusing thing about sending ciphertexts over Twitter compared to English text is that you can actually fit in more information, assuming you do the encryption and ciphertext encoding right. That's because Twitter transports 140 Unicode code points, not 140 bytes.
(This has nothing to do with steganography, but seems relevant nonetheless.)
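Back-of-the-envelope arithmetic for that capacity claim, as a small Python sketch. It assumes, optimistically, that every Unicode scalar value (all code points except surrogates) can be tweeted and survives unchanged; in practice unassigned code points, control characters and normalization would shrink the usable alphabet, so treat the Unicode figure as an upper bound.

```python
import math

TWEET_LENGTH = 140  # code points per tweet, as stated above

# Assumption: every scalar value is usable (0x110000 code points minus
# the 0x800 surrogates). This is an upper bound, not a measured figure.
UNICODE_SCALAR_VALUES = 0x110000 - 0x800
PRINTABLE_ASCII = 95  # what plain English text typically draws from


def capacity_bits(alphabet_size: int, length: int = TWEET_LENGTH) -> float:
    """Maximum information, in bits, of `length` symbols from the alphabet."""
    return length * math.log2(alphabet_size)


unicode_bits = capacity_bits(UNICODE_SCALAR_VALUES)  # about 20 bits per symbol
ascii_bits = capacity_bits(PRINTABLE_ASCII)          # under 7 bits per symbol

print(f"Unicode: {unicode_bits:.0f} bits (~{unicode_bits / 8:.0f} bytes) per tweet")
print(f"ASCII:   {ascii_bits:.0f} bits (~{ascii_bits / 8:.0f} bytes) per tweet")
```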
Cool! One other fun approach may be to use syntactic transformation (topicalization, middle field ordering, etc.) or lexical variation (e.g. through synonyms):
Thanks, I'll look into those suggestions, especially since I'm not really happy about using the middle dot (it's really not unobtrusive, and the entire system would work better if I didn't use it).
Additionally, @workmajj and I wrote TweetFS using Plainsight. It lets you recursively pack up directories and post them as an encoded linked list of Tweets to Twitter: https://github.com/rw/tweetfs
I presented Plainsight at Hack'n'Tell NYC in 2011 and a video was recorded: http://bit.ly/pecGgW
Plainsight uses each byte of the input message to generate tokens. Bits are used to decide how to traverse the token tree, weighted by frequency. The drawbacks are 1) verbosity and 2) incorrect grammar.
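To illustrate the general idea of letting input bits steer a walk through a text model, here is a much-simplified Python sketch. It is not Plainsight's actual code: it uses a one-word context, only ranks followers by frequency instead of weighting the tree by it, and assumes the bit length of the message is agreed out of band. All names are made up for the example.

```python
from collections import Counter, defaultdict


def build_model(corpus: str):
    """Map each word to its most frequent followers (a power-of-two many)."""
    words = corpus.split()
    followers = defaultdict(Counter)
    # Wrap around so the last word also has a follower.
    for prev, nxt in zip(words, words[1:] + words[:1]):
        followers[prev][nxt] += 1
    model = {}
    for prev, counts in followers.items():
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        width = len(ranked).bit_length() - 1  # bits this step can carry
        model[prev] = (ranked[: 1 << width], width)
    return model, words[0]


def encipher(bits, model, start):
    """Turn a list of 0/1 ints into words by using bits to pick followers."""
    words, prev, pos, idle = [], start, 0, 0
    while pos < len(bits):
        choices, width = model[prev]
        idle = idle + 1 if width == 0 else 0
        if idle > len(model):  # stuck in a cycle that never branches
            raise ValueError("corpus has no branching reachable from here")
        chunk = bits[pos:pos + width]
        chunk = chunk + [0] * (width - len(chunk))  # zero-pad the final chunk
        index = int("".join(map(str, chunk)), 2) if width else 0
        prev = choices[index]
        words.append(prev)
        pos += width
    return " ".join(words)


def decipher(text, model, start, n_bits):
    """Replay the walk and recover which follower index each word was."""
    bits, prev = [], start
    for word in text.split():
        choices, width = model[prev]
        index = choices.index(word)
        if width:
            bits.extend(int(b) for b in format(index, f"0{width}b"))
        prev = word
    return bits[:n_bits]  # drop the zero padding added by encipher
```

Even this toy version shows both drawbacks: words that carry zero bits still get emitted (verbosity), and nothing keeps the output grammatical.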
One of the lessons of writing Plainsight is that spam can be used to contain secret messages. Send enough gibberish to enough people, with your intended recipient included, and you'll look like a spammer--not a spy.
cat ciphertext | fold -s
which was the case, of a light. And, his hand. "BALLARAT." only applicant?"
decline be walking we do, the point of the little man in a strange, her
husband's hand, going said road, path but you do know what I have heard of you,
I found myself to get away from home and for the ventilator little cold night,
and I he had left my friend Sherlock of our visitor and he had an idea was not
to abuse step I of you, I knew what I was then the first signs it is the
daughter, at least a fellow-countryman. had come. as I have already explained,
the garden. what you can see a of importance. your hair. a picture upon of the
money which had brought a you have a little good deal in way: out to my wife
and hurry." made your hair. a charge me a series events, and excuse no sign his
note-book has come away and in my old Sherlock was already down to do with the
twisted
Now, decipher that ciphertext:
cat ciphertext | plainsight -m decipher -f sherlock.txt > deciphered
cat deciphered
Meet at Union Square at noon. The password is FuriousGreen.
-- Example 2 (binary data)
$ dd if=/dev/urandom of=/dev/stdout bs=1 count=10 | plainsight -m encipher -f 1984.txt
10+0 records in
10+0 records out
10 bytes (10 B) copied, 9e-05 s, 111 kB/s
Adding models:
Model: 1984.txt added in 0.89s (context == 2)
input is "<stdin>", output is "<stdout>"
enciphering: 100%|#####################################################################################################################################################################|474.67 B/s | Time: 0:00:00
which is a war is real, the proles used mind on the telescreen. He could see through all right to. You have read what said. 'Yes,' only in the Ministry
One serious use case is to seed the generator with a spam email corpus. This lets you generate messages that look like spam. Example:
wget https://spamassassin.apache.org/publiccorpus/20030228_spam.tar.bz2
tar -jxvf 20030228_spam.tar.bz2
cat spam/0* > spam-corpus.txt
echo "The Magic Words are Squeamish Ossifrage" | plainsight -m encipher -f spam-corpus.txt > spam_ciphertext
$ cat spam_ciphertext
(8.11.6/8.11.6) 3 (Normal) Internet can send e-mails until to transfer 26 10 [127.0.0.1] also include address from the most logical, mail business for your Car have a many our portals ESMTP Thu, 29 1.0 this letter on internet, <a style=3D"color: 0px; text/plain; cellspacing=3D"0" how quoted-printable about receiving you would like width=3D"15%" width=3D"15%" border="0" width="511" Date: Tue, 27 Thu, 19 26 because zzzz@localhost.spamassassin.taint.org for
$ cat spam_ciphertext | plainsight -m decipher -f spam-corpus.txt
Adding models:
Model: spam-corpus.txt added in 2.57s (context == 2)
input is "<stdin>", output is "<stdout>"
deciphering: 100%|#####################################################################################################################################################################|543.84 B/s | Time: 0:00:00
The Magic Words are Squeamish Ossifrage
I think the problem is that unlike ":)", too often people use passive-aggressive ";)"s, not meaning them in the friendly way they were originally intended but with a slightly bitter aftertaste.
But this is getting a bit off-topic I suppose... :)
Encode your secret message as a bunch of numbers, xor them with your OTP, and then look up users with those numbers and simply re-tweet their most recent message.
Now your message is encoded in the userids of who you retweet.
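A minimal Python sketch of that scheme, with hypothetical function names. It assumes user IDs can be treated as 32-bit numbers, that every resulting number happens to belong to a real account (not guaranteed), that sender and receiver already share the pad, and that the message has no trailing NUL bytes, since those are used as padding.

```python
CHUNK = 4  # bytes packed into each user ID (assumes a 32-bit ID space)


def message_to_user_ids(message: bytes, pad: bytes) -> list[int]:
    """XOR the message with the one-time pad and split it into ID-sized numbers."""
    padded = message + b"\x00" * (-len(message) % CHUNK)
    if len(pad) < len(padded):
        raise ValueError("one-time pad is too short")
    cipher = bytes(m ^ p for m, p in zip(padded, pad))
    return [
        int.from_bytes(cipher[i:i + CHUNK], "big")
        for i in range(0, len(cipher), CHUNK)
    ]


def user_ids_to_message(user_ids: list[int], pad: bytes) -> bytes:
    """Rebuild the ciphertext from the retweeted IDs and XOR the pad back off."""
    cipher = b"".join(uid.to_bytes(CHUNK, "big") for uid in user_ids)
    plain = bytes(c ^ p for c, p in zip(cipher, pad))
    return plain.rstrip(b"\x00")
```

The receiver would read the IDs off the retweets in posting order, so the ordering of the timeline is part of the channel.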
Might be less obtrusive to favorite instead of retweeting.
(Edit: This assumes you can get a list of favorited tweets ordered by when they were favorited. I just noticed the web interface orders them by time posted, not time favorited.)
They're publishing the ciphertext, which is the plaintext message xor the OTP (aka the key). The ciphertext would be the user IDs of all recent retweets.
The receiving party would have to know the OTP to decrypt the message.
good suggestion. you probably want to encode some redundancy so you can pick plausibly interesting people to retweet; if your xor'd number lands on an obvious bot, it's still suspicious.
I don't really see how Twitter could identify these as "Gibberish". They're normal English sentences, albeit meaningless because they lack any real context.
How could or why would Twitter be able to identify these seemingly undetectable tweets as spam? They don't look like gibberish.
How do you think Twitter tells the difference between a bot and a human? I get that they are sentences, but they aren't contextually relevant; they don't look like tweets, they look like spam.
If you randomly start posting "Four score and seven years ago" and the rest of the speech out of order, you are going to get flagged.
I used to build tweet spam bots, I am very familiar with how they get detected.
Put quotes around the whole thing and Twitter will think you're just another 13 year old girl who thinks she's clever by incessantly quoting literature.
> I used to build tweet spam bots, I am very familiar with how they get detected.
I would have thought that the content of a tweet is quite low down on their detection list. IP address, user account, frequency of tweets etc would be much higher.
Very nice! I worked on a similar steganographic system (not for tweets, though) that you can find here: https://github.com/hmoraldo/markovTextStego (the repository has both the source code and a link to a paper explaining how it works and how it differs from other approaches).
At Twitter Peak Hype, when journalists were writing silly things like: "Twitter is nothing less than a new internet protocol!", I had a perverse fantasy of implementing TCP/T(weet).
If someone were to do this, it would effectively subsume all further "I implemented $X on Twitter" posts. Sort of like showing that a language is Turing complete.
There's no obvious way of telling where a "secret" sequence would begin and end.
For now, it might be best left as a coordination issue, similar to the choice of corpus: e.g., you know that you'll be tweeting secretly n times a day, at these specific times only, etc.
Or you could use https://scrambls.com/ which uses strong crypto and works for Facebook or any site. Keep in mind, though, that any short message protocol is vulnerable to cryptanalysis.
For this to be secure (one-time-pad) you can't reuse the corpus. That's a big enough pain that I doubt people will actually do it. Which means you can start decoding their tweets once you collect enough.