Analyzing digram frequencies
Links and useful resources
Concept summary and lesson
- What is a digram / digraph / bigram?
- Frequencies of letter digrams in English text
- Using maps to count digrams
Examples/demo
Last time, we counted letters in a piece of text. That by itself is a pretty useful tool for breaking classical ciphers, but we can do even better. It turns out that English shows a much stronger pattern when you look at pairs of letters (digrams, also called digraphs or bigrams) rather than single letters. If you look at this link: digraph frequencies in English text, you'll see an interesting thing: they give frequencies for only the 22 most common digrams, even though there are
That's good news for us as codebreakers, because it lets us rule out a lot of potential solutions based on how unlikely their digrams are. Say you've found 10 candidate keys for a simple substitution cipher, and all of them give believable letter frequencies. You can use any of them to "decipher" the message, but only one of them will actually be correct. You could then look at the digram frequencies each candidate produces, and you'll probably be able to rule out almost all of the bad ones immediately! That's because digrams are much more sparsely distributed than single letters: a small number of digrams account for a large share of English text, and many of the odd ones essentially never appear in correct English.
So today we're going to write a function that will give us the count of all digrams in a piece of text. It's going to work
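As a rough illustration of counting digrams with a map, here is a minimal sketch in Python, where a dict plays the role of the map. The language, the function name count_digrams, and the choice to drop spaces and punctuation before pairing letters (so digrams can span word boundaries) are all assumptions made for this example, not necessarily how the lesson's own function works.

```python
def count_digrams(text):
    """Return a dict mapping each two-letter digram to how often it occurs.

    Assumption for this sketch: only the letters A-Z are kept (after
    upper-casing), so digrams may span word boundaries.
    """
    # Keep only the letters, ignoring spaces, digits, and punctuation.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]

    counts = {}
    # Pair each letter with the one that follows it.
    for first, second in zip(letters, letters[1:]):
        digram = first + second
        # get() returns 0 the first time we see a digram.
        counts[digram] = counts.get(digram, 0) + 1
    return counts


if __name__ == "__main__":
    counts = count_digrams("THE THEME")
    # Print digrams from most to least frequent.
    for digram, n in sorted(counts.items(), key=lambda item: -item[1]):
        print(digram, n)
```

With the input "THE THEME", the letters are THETHEME, so the digrams TH and HE are each counted twice, while ET, EM, and ME are each counted once. If you would rather not count pairs that straddle a space, one option is to split the text into words first and count within each word.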
Media resources
- YouTube search for "What is a digram?"
- YouTube search for "frequencies of letter digrams in English text"
- YouTube search for "Using maps to count digrams"