Can you hide data in text?

O

12

6

I wish to put some text on a page and hide some data in that text. Does anybody know of any methods / patterns that have been used in the past to solve this problem?

Example: I have the following text: "The cat sat on the dog and was happy."

I also have the number 123. I want to hide this number in that sentence such that the sentence can be placed on a web page and only someone in the know would be able to find the data.

Orthogonal answered 6/12, 2008 at 0:3 Comment(0)

S

4

HTML makes it quite easy to do this, actually. No need for really cunning amounts of steganography, etc. Let's see:

This sentence embeds 123 and then stops embedding.

This sentence embeds 0102 and then stops embedding.

(We'll have to see whether it actually works in markdown, but I suspect so.) Admittedly it's pretty obvious if you know that there's something to look for, but I think you'll agree it's not obvious to casual observers.

I've left it as a little puzzle to work out the scheme, but add a comment if you want it to be explicitly explained.

Skater answered 6/12, 2008 at 8:18 Comment(2)

Be sure to enable compression on your HTTP server if you do this! – Hueyhuff 6/12, 2008 at 8:24

Yes, if you're transmitting significant amounts of data it could get somewhat unwieldy. – Skater 6/12, 2008 at 8:29

A

11

Of course this can be done.

What you are describing is, broadly speaking, called steganography.

For instance, you could encode each digit as the number of words you count before reaching a word that contains the letter B, in which case 123 could be encoded as:

You belong to the beautiful group of people being elite.

The thing is, the person wanting to decode your message must know your algorithm.

Edit: I notice that my numbers are off by one. Start counting at 0 and you'll see the number 123.
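A minimal sketch of a decoder for this word-counting idea, assuming that "seeing the letter B" means reaching a word that contains a `b` (upper or lower case) and that the count restarts at 0 after each such word. The function name `decode_b_count` is made up for illustration.

```python
def decode_b_count(sentence: str) -> str:
    """Recover digits by counting words until one contains the letter B."""
    digits = []
    count = 0  # counting starts at 0, as noted in the edit above
    for word in sentence.split():
        if "b" in word.lower():
            # A word with a B closes the current digit.
            digits.append(str(count))
            count = 0
        else:
            count += 1
    # Words after the last B word carry no data and are ignored.
    return "".join(digits)


print(decode_b_count("You belong to the beautiful group of people being elite."))
# prints 123
```

Writing the cover sentence is the hard part: the encoder has to find natural-sounding text with B words in exactly the right positions.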

Amuse answered 6/12, 2008 at 0:10 Comment(0)

D

3

There are very complicated approaches to this problem, but you can probably get by with a very simple one. For example, define an adjective for every number:

0. beautiful
1. harmless
2. evil
3. colorful
4. weird

and so on. Now select sentences of your choice and put placeholders into the sentences where adjectives belong.

"The {adj} cat sat on the {adj} dog and the {adj} cat was happy."

Your number is 123, so your sentence is

"The harmless cat sat on the evil dog and the colorful cat was happy."

A parser can easily take the sentence, split it up into words, find adjectives on the table above, and convert them back to numbers.

The -> ?
harmless -> 1
cat -> ?
sat -> ?
on -> ?
the -> ?
evil -> 2
:

at the end you have 123 again.
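The encode and parse steps above could be sketched roughly like this. The table contains only the five adjectives listed in this answer (digits 5 to 9 are left undefined here, so they raise a `KeyError`), and the names `ADJECTIVES`, `encode` and `decode` are made up for illustration.

```python
import re

# Only the adjectives given in the answer; 5-9 are not defined there.
ADJECTIVES = {
    "0": "beautiful",
    "1": "harmless",
    "2": "evil",
    "3": "colorful",
    "4": "weird",
}
DIGITS = {word: digit for digit, word in ADJECTIVES.items()}


def encode(template: str, number: str) -> str:
    """Fill each {adj} placeholder, left to right, with the adjective for one digit."""
    if template.count("{adj}") != len(number):
        raise ValueError("template needs exactly one {adj} per digit")
    sentence = template
    for digit in number:
        sentence = sentence.replace("{adj}", ADJECTIVES[digit], 1)
    return sentence


def decode(sentence: str) -> str:
    """Split into words and keep only those found in the adjective table."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return "".join(DIGITS[word] for word in words if word in DIGITS)


template = "The {adj} cat sat on the {adj} dog and the {adj} cat was happy."
hidden = encode(template, "123")
print(hidden)          # The harmless cat sat on the evil dog and the colorful cat was happy.
print(decode(hidden))  # 123
```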

As soon as people know that there is information hidden in the sentence, the algorithm is easily broken. You can make it harder to break if you add variation by defining multiple adjectives per number. Instead of

1. harmless

you can define

1. harmless/stupid/blue/fashionable

When you need to encode 1, randomly pick any of the words above. Since they all map to the number 1, the reverse parser won't care which word is printed there; the result will always be 1. This randomization makes the algorithm harder to reverse engineer.
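A small sketch of this randomized variant, under the assumption that only digit 1 has the extra words given above; the other digits keep their single adjective from the earlier table, and 5 to 9 remain undefined. `SYNONYMS`, `encode_random` and `decode_any` are hypothetical names.

```python
import random
import re

# Digit 1 uses the four alternatives from the answer; the rest have one word each.
SYNONYMS = {
    "0": ["beautiful"],
    "1": ["harmless", "stupid", "blue", "fashionable"],
    "2": ["evil"],
    "3": ["colorful"],
    "4": ["weird"],
}
# Reverse table: every alternative maps back to its digit.
WORD_TO_DIGIT = {word: digit for digit, words in SYNONYMS.items() for word in words}


def encode_random(template: str, number: str) -> str:
    """Fill each {adj} placeholder with a randomly chosen word for the digit."""
    sentence = template
    for digit in number:
        sentence = sentence.replace("{adj}", random.choice(SYNONYMS[digit]), 1)
    return sentence


def decode_any(sentence: str) -> str:
    """Map every known adjective back to its digit, whichever alternative was used."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return "".join(WORD_TO_DIGIT[w] for w in words if w in WORD_TO_DIGIT)


template = "The {adj} cat sat on the {adj} dog and the {adj} cat was happy."
hidden = encode_random(template, "123")
print(hidden)              # the first adjective varies from run to run
print(decode_any(hidden))  # 123
```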

Dita answered 8/12, 2008 at 10:50 Comment(0)

P

2

I think at a high level what you are talking about is steganography. http://en.wikipedia.org/wiki/Steganography

The section on modern techniques should get you started: http://en.wikipedia.org/wiki/Steganography#Modern_steganographic_techniques

Patronizing answered 6/12, 2008 at 0:8 Comment(0)

D

1

I think what you're looking for is something called Steganography. Corinna John has an excellent collection of articles on the subject up on CodeProject.

http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=475133

Drowse answered 6/12, 2008 at 0:8 Comment(1)

To add.. if you follow the links at CodeProject, you'll get to her homepage.. which seems focused on Do-It Yourself Steganography... binary-universe.net – Patronizing 6/12, 2008 at 5:52

D

0

The approach Jon Skeet mentioned is very similar to Matthew Kwan's "SNOW" approach. Both of them hide small amounts of arbitrary information in text without adding, deleting, or changing any of the words in the source text. Both encode the secret message in normally-irrelevant, normally-invisible whitespace -- extra space and tab characters between words and at the ends of lines.
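To illustrate the general whitespace idea, here is a rough sketch. It is not the scheme from the puzzle answer and not SNOW's actual format. It assumes a hypothetical convention where a single space between two words means bit 0 and a double space means bit 1, with the number written in binary and left-padded with zeros to fill every gap. The names `embed` and `extract` are made up.

```python
import re


def embed(cover: str, number: int) -> str:
    """Hide a non-negative integer in the gaps between words (assumes a non-empty cover)."""
    words = cover.split()
    gaps = len(words) - 1
    bits = format(number, "b").zfill(gaps)
    if len(bits) > gaps:
        raise ValueError("cover text has too few word gaps for this number")
    out = [words[0]]
    for bit, word in zip(bits, words[1:]):
        out.append(("  " if bit == "1" else " ") + word)
    return "".join(out)


def extract(text: str) -> int:
    """Read one bit per gap: two spaces -> 1, one space -> 0."""
    gaps = re.findall(r" +", text)
    bits = "".join("1" if len(gap) == 2 else "0" for gap in gaps)
    return int(bits, 2)


hidden = embed("The cat sat on the dog and was happy.", 123)
print(repr(hidden))
print(extract(hidden))  # 123
```

Browsers collapse runs of spaces when rendering HTML, so the extra spaces are visible only in the page source, which is where a reader in the know would have to look.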

Discant answered 6/12, 2008 at 0:3 Comment(0)

C

0

There may be an algorithm that can turn that sentence into 123, but I think in general you're going to need to accept some modifications to the text if you need to store any possible numerical value!

Corset answered 6/12, 2008 at 0:7 Comment(0)

G

0

If the 'text' was actually an image, then you could hide data in that using steganography - the data is hidden in the binary image file without affecting the way the image looks.

Gnome answered 6/12, 2008 at 0:10 Comment(1)

Hiding data in images is just one branch of steganography. – Roup 8/12, 2008 at 14:55

L

0

According to this thread:

Prof. Mikhail Atallah et al. here at Purdue did a lot of research on watermarking text.

The approach uses TMRs (Text Meaning Representation) of phrases to encode bits by performing minor transformations positioning the TMR at a certain distance from a defined canonical form.

(another method to watermark text is presented here)

It may be another way to hide text within text, alongside the steganographic methods described in the other answers.

Lindquist answered 6/12, 2008 at 0:14 Comment(0)

M

0

Here is a prototype that converts encrypted data into a "natural" text message.

http://herosys.net/w/project/text-steganography-hide-text-in-spam-sms

It converts source text like "See U at east door of University, tomorrow 8 am" into a short text message that looks like spam:

"Best house ever! you should never miss it. 1000-3000 square ft. $15-80 per square ft. Call 123-456-7890".

The algorithm: create a grammar diagram and a candidate table for each word. It works like BASE64, except that the index table changes according to your predefined context.

Mccarter answered 28/2, 2013 at 23:36 Comment(1)

First link is 404. – Arjun 29/4, 2016 at 19:14

C

-1

Well, you could try something like this...not sure if that's exactly what you're looking for, though.

Calendra answered 6/12, 2008 at 0:6 Comment(0)

Z

-1

I have two schemes with good security but with the trade-off of fairly low stegabit embedding rates. One of them is extremely simple but embeds only 1 bit per line of arbitrary user-supplied text, while the other, which requires the user to compose cover texts under the guidance of the software, achieves an embedding rate in the range [0.5, 1.0] bits per word. See my home page mok-kong-shen.de

Zhang answered 6/5, 2017 at 14:1 Comment(2)

This looks more like a link-only answer. Please summarise the relevant information here for a complete answer and provide the link at the end for additional reading/references/context. – Meryl 6/5, 2017 at 16:3

@Reti43: Thanks. One scheme, named EMAILSTEGANO, modifies the number of words per line in a text (emails etc.) so that the number of words in a line mod 2, i.e. the parity, gives the stegabit. The other takes a large English word list (such lists are downloadable) and shuffles it with a session-dependent secret key to obtain two sublists of approximately equal size. Words in one sublist denote 0 and words in the other denote 1. Common words such as "to" and "in" are excluded from these lists. The user is asked to replace a word with another of his own choice whenever that word falls in the wrong sublist for the current stegabit. – Zhang 7/5, 2017 at 13:42
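The line-parity idea can be illustrated with a short sketch. This is not the EMAILSTEGANO implementation; it only assumes that each line carries one bit equal to its word count mod 2. The re-wrapping strategy, the `base` line length and the names `embed_bits` and `extract_bits` are hypothetical.

```python
def embed_bits(words: list[str], bits: str, base: int = 6) -> str:
    """Re-wrap words so that line i has a word count whose parity equals bits[i]."""
    lines = []
    pos = 0
    for bit in bits:
        # Use `base` words if its parity already matches, otherwise one more.
        n = base if base % 2 == int(bit) else base + 1
        if pos + n > len(words):
            raise ValueError("cover text too short for this many bits")
        lines.append(" ".join(words[pos:pos + n]))
        pos += n
    if pos < len(words):
        # Leftover words go on trailing lines that carry no data.
        lines.append(" ".join(words[pos:]))
    return "\n".join(lines)


def extract_bits(text: str, n_bits: int) -> str:
    """Read one bit per line from the parity of its word count."""
    lines = text.splitlines()[:n_bits]
    return "".join(str(len(line.split()) % 2) for line in lines)


cover = ("word " * 40).split()
wrapped = embed_bits(cover, "1011")
print(extract_bits(wrapped, 4))  # 1011
```

In this sketch the receiver has to know how many bits to read, since any trailing lines still have a parity of their own.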
