Sunday, September 04, 2016

How to solve every substitution code very quickly.

Recently I helped to solve a mysterious notebook written in an unknown writing system. You can read the details over at Klausis Krypto Kolumne.

To decode this manuscript I used a little trick I learned during my ARG activities.

Many readers have since asked me how it works. Here is my description.

So let's say you've got a strange letter full of unknown symbols.

Let us assume for a moment that it's just a substitution cipher, where every letter is replaced by its own symbol and nothing more.

We need a key! There are many ways to solve such a substitution cipher, e.g. using letter frequencies.
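For comparison, here is a minimal sketch of the frequency approach. It assumes you have already transcribed the message by hand into a list of arbitrary symbol IDs; the IDs and the helper name `symbol_frequencies` are made up for illustration.

```python
from collections import Counter


def symbol_frequencies(symbols):
    """Return (symbol, count) pairs, most frequent symbol first."""
    return Counter(symbols).most_common()


# Hypothetical transcription: every unknown symbol is written as an arbitrary ID.
message = ["01", "02", "03", "04", "01", "02", "05", "01", "03"]
ranking = symbol_frequencies(message)

# The most frequent symbols are the first candidates for common letters
# such as "e" or "t" in English. A short message gives only a rough hint.
for symbol, count in ranking:
    print(symbol, count)
```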

But my solution is perhaps the quickest one. A dirty one. Almost cheating, and brute force. But it is the most efficient if you are short on time, if the time bomb is ticking and you are humanity's only hope to save the world.

What do we need?

For decoding we need:

  1. a graphics editor (I used Photoshop, but MS Paint will do as well)
  2. Adobe Acrobat Professional (not Reader!). Perhaps you can use another program with a text form function, but Adobe Acrobat Professional worked pretty well for me (again: not Reader!)

Step 1: Symbol extraction.

First, since we assume an unknown monoalphabetic cipher here, you have to find all unique symbols in the coded message, so that we can build our unknown alphabet even without knowing which symbol corresponds to which letter.

I used Photoshop for this extraction.

As you see, I took every unique symbol from the coded text and placed these symbols below the message. This will be useful later.

So now we have a list of unique symbols under the message. We can save this composition as a JPG file or even export it to PDF.
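If you prefer to keep track of the symbols in text form as well, the same extraction can be sketched in a few lines. This assumes a hand-made transcription in which every symbol gets an arbitrary placeholder label; the labels and the function name are hypothetical.

```python
def unique_symbols(transcription):
    """Return each distinct symbol once, in order of first appearance."""
    seen = set()
    alphabet = []
    for symbol in transcription:
        if symbol not in seen:
            seen.add(symbol)
            alphabet.append(symbol)
    return alphabet


# Hypothetical transcription; the labels are arbitrary placeholders.
transcription = ["circle", "hook", "dot", "circle", "bar", "hook"]
alphabet = unique_symbols(transcription)
print(alphabet)
```

Keeping the order of first appearance mirrors the symbol row placed below the message.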

Step 2. Letter allocation.

Before we begin, let's hope the encoded message has the following properties:

  • the code is case-insensitive (i.e. the same symbol is used for a and A)
  • punctuation and spaces are not part of the code (in our case you can clearly see commas and spaces between words, which makes decoding pretty simple)
  • the cipher is monoalphabetic (i.e. the whole text is coded with the same letter allocations)
  • Side note to the encoders: you know what you have to do to make the code trickier ;-)

Now we open the PDF in Adobe Acrobat Professional (not Reader!). I have Adobe Acrobat X, but this should work with any version.

We will now attach a text form field to every unique symbol.

Go to Tools => Forms => Create
> choose "Use an existing file"
> choose "Use the current document"
> delete all form fields Acrobat may have detected automatically, because they are pretty misleading.

Now choose "Add New Field" => "Text Field" 

With this tool, create a text field under the first symbol in the symbol row below the coded message. Give it the field name "01".

Repeat this for every symbol, giving the fields the names 02, 03, 04 etc.
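The naming scheme itself is easy to reproduce outside Acrobat. The sketch below assumes the same kind of hand-made transcription with placeholder labels (all names are hypothetical) and numbers the unique symbols "01", "02", and so on, then rewrites the message as a sequence of field names.

```python
def assign_field_names(alphabet):
    """Map each unique symbol to a zero-padded field name: "01", "02", ..."""
    return {symbol: f"{index:02d}" for index, symbol in enumerate(alphabet, start=1)}


# Hypothetical symbol row and transcription; the labels are placeholders.
alphabet = ["circle", "hook", "dot", "bar"]
transcription = ["circle", "hook", "dot", "circle", "bar", "hook"]

field_names = assign_field_names(alphabet)
fields_in_message = [field_names[symbol] for symbol in transcription]
print(fields_in_message)
```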

Now copy the text field "01" and paste it over (or, if you have generous spacing, under) the first symbol. Please check the field name: it should be "01" as well. You can paste the text field "01" over/under every matching symbol in the text (find the symbols I forgot ;-)):

The first letter is done. Now it's up to you to apply the appropriate text fields to the appropriate symbols. At the end the whole code should look like this:

Now, I don't know what size the image of your coded message has. To make the text readable, select all fields, go to Properties and switch "Font Size" to "Auto".

Now choose "Close Form Editing" and save the PDF.

Step 3. Brute force!

You can now try out various letter combinations. First you have to work out which language is encoded here. There are various ways to do this, such as searching for characteristic letter sequences. German and English should be easy to recognize by their articles, "the" and "der/die/das" respectively.

Here we have a prominent three-letter sequence:

The word lengths suggest that the language is probably not German. Let's try English, and let's try the article "the". Now type "t", "h" and "e" into the PDF under the first three symbols. You can do this because there is now a text field under every symbol.

The trick: since all occurrences of the same symbol carry a text field with the same field name, Acrobat Professional treats them as one and the same field, so each letter you type appears everywhere in the document.
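The same propagation can be imitated with a plain dictionary, which may help if you do not have Acrobat Professional at hand. This is only a sketch: the field sequences below are a made-up three-word example, not the real message, and `render` is a hypothetical helper.

```python
def render(words, key, unknown="*"):
    """Show the message with every guessed field filled in everywhere."""
    return " ".join(
        "".join(key.get(field, unknown) for field in word) for word in words
    )


# Hypothetical message: each word is a list of field names.
words = [
    ["01", "02", "03"],
    ["01", "02", "04", "01"],
    ["01", "02", "03", "05", "03"],
]

# Guessing "the" for the first word fills these letters in all other words too.
key = {"01": "t", "02": "h", "03": "e"}
text = render(words, key)
print(text)
```

One dictionary entry per field name plays the role of the shared form field: changing a single guess changes every occurrence at once.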

Now we not only have "t", "h" and "e" spread over the whole message, we also have a new recognizable sequence "th*t", where * should stand for "a".

Please note the symbol row under the message: we now have an automatically filled alphabet list, even if it is not sorted alphabetically.

Now we've got two new almost complete words:

  • the*e, where * should be "r"
  • and "a**": "a" plus two identical letters. Let's assume the writer wasn't vulgar enough for * = "s" (you can try it if you want, though); instead you can try "add" / "all" / "ann"

You can now try to solve the message yourself. Feel free to post your solution in the comments. Here is a form-ready PDF:
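Guessing words like these can also be automated against a word list. The sketch below is one possible way to do it, not part of the Acrobat workflow: the field sequences, the tiny word list and the function name `candidates` are all assumptions for illustration. A word fits if it agrees with the letters already in the key, gives identical fields identical letters, and does not reuse a letter that belongs to another field.

```python
def candidates(word_fields, key, wordlist):
    """Return the words that are consistent with the key and the field pattern."""
    matches = []
    for word in wordlist:
        if len(word) != len(word_fields):
            continue
        trial = dict(key)            # tentative key for this word only
        taken = set(key.values())    # letters already assigned to some field
        fits = True
        for field, letter in zip(word_fields, word):
            if field in trial:
                if trial[field] != letter:
                    fits = False
                    break
            elif letter in taken:
                fits = False
                break
            else:
                trial[field] = letter
                taken.add(letter)
        if fits:
            matches.append(word)
    return matches


# Hypothetical state after guessing "the" and "that".
key = {"01": "t", "02": "h", "03": "e", "04": "a"}
wordlist = ["all", "add", "and", "ate", "att", "these", "there"]

double_letter = candidates(["04", "06", "06"], key, wordlist)
almost_done = candidates(["01", "02", "03", "05", "03"], key, wordlist)
print(double_letter, almost_done)
```

With a real dictionary file instead of the short list, this narrows each partly solved word down to a handful of guesses that you can then try in the PDF.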

That's it, folks.

Now you are ready to crack any substitution code, whatever the language. Just try and try again. Voynich manuscript, anybody?

P.S. The writing system I used here was developed by Matt Groening for Futurama.
