Tinkering with Entropy

A quick session in Shannon Entropy expressed in bytes.

Jun 30, 2023

Buckle up; we’re diving in deep today. I just finished the book, A Mind at Play (affiliate link1), about Claude Shannon, someone I’ve written about before and will write about again. It got me thinking about his work on entropy in information theory, and I thought I’d brush off some code I’d written a while back to calculate the “Shannon Entropy” of a byte slice.

This snippet is relatively short, efficient, and complete. It’s written in Go.

func entropy(b []byte) float64 {
	var score float64
	var freqArray [256]float64
	for i := 0; i < len(b); i++ {
		freqArray[b[i]]++
	}
	len := float64(len(b))
	for i := 0; i < 256; i++ {
		if freqArray[i] != 0 {
			freq := freqArray[i] / len
			score -= freq * math.Log2(freq)
		}
	}
	return score / 8
}

I like this code because it’s deceptively simple and uses a trick of the array index to quickly allow tallying the totals of all bytes to perform the frequency analysis. Before we get into the nuts and bolts of why measuring entropy is interesting, I want to review what this function is doing.

There are two for loops in this function. The first loops through all bytes and counts up the occurrence of each byte by value.

	var freqArray [256]float64
	for i := 0; i < len(b); i++ {
		freqArray[b[i]]++
	}

This works because any byte can only be one of 256 values, so we can "hack" the array index value for our counter. All this does is keep track of the number of times we "see" a number, so if 5 bytes in 50 contain the value of "25", freqArray[25] == 5. The number of iterations of this loop scales linearly with the length of the byte slice we are evaluating.

The second loop is where the interesting work happens. Here we implement the Shannon entropy algorithm.

	len := float64(len(b))
	for i := 0; i < 256; i++ {
		if freqArray[i] != 0 {
			freq := freqArray[i] / len
			score -= freq * math.Log2(freq)
		}
	}

The big takeaway is that no matter how long our byte slice is, this loop runs through a fixed set of 256 iterations. Here we calculate the score by keeping a running total of the subtracted frequency of each value times Log Base 2. To adjust this for bytes, we divide this score by eight before returning the value, which gives us a score between 0 and 1.

So what is entropy?

You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.
John von Neumann

Legend has it that John von Neumann pitched the term entropy to Shannon while Shannon wrote his seminal work A Mathematical Theory of Communication.

The Shannon entropy score is a number between 0 and 1 that measures a message's average level of surprise. A message with low entropy contains no information (such as a message of “ “ or “0000000”). Such a message would have a score of 0. A “perfect” entropy message would evenly distribute all possible values. Such a message would have a score of 1.

An example of this is flipping a fair coin one million times. The “entropy” contained in this message is very high (close to 1, probably something like 0.999976). This is because if someone were to be asked to try to “guess” the exact pattern of 1 million coin flips, the message itself can’t reveal underlying patterns or redundancy. You can think of a high entropy score for a data set with low “guess-ability.” This is why tools like cryptographically secure pseudo-random number generators are concerned with high entropy scores.

I can demonstrate this with the code I wrote earlier. Instead of coin flips, we will use bytes. A Byte is an 8-bit number with a value between 0 and 255. We can think of a random byte equivalent to 8 random coin flips. The entropy score we’re measuring here is the level of “surprise” we get in the bytes of some arbitrary data.

entropy([]byte("     "))
0.0000000000

A message of “ “ gives us an entropy score of 0, as expected. It doesn’t contain any “surprises.”

entropy(randBytes(1_000_000))
0.9999804355

A message containing one million randomly generated bytes gives us a high entropy score of 0.9999804355. That’s nice; we’d hope Go’s "crypto/rand" library is pumping out high entropy outputs.

But what about a human-readable message? Let’s try “hello, world”.

entropy([]byte("hello, world")
0.3777569011

That’s a pretty low score. This message is only 12 bytes long but contains three “l”s and two “o”s. Those redundancies do not produce “surprising” information, so the entropy score is low due to the message length and redundant bytes in the message.

It is essential to cover message length as well. A short message length will get a lower entropy score even with a high entropy input. If we run a set of experiments with our entropy tool on randomly generated bytes from 16 bytes to 1024 bytes, we see that the amount of “surprising” data steadily increases.

entropy: 0.5000000000 for:   16 random bytes
entropy: 0.6171875000 for:   32 random bytes
entropy: 0.7211818603 for:   64 random bytes
entropy: 0.8232230957 for:  128 random bytes
entropy: 0.9021275742 for:  256 random bytes
entropy: 0.9485006656 for:  512 random bytes
entropy: 0.9782479345 for: 1024 random bytes

Tinkering with the Shannon entropy algorithm in a language I enjoy and a data type I understood helped me a lot in understanding how it really works. If you are interested in running this code yourself. I’ve made it available in the go playground and as a public Github Gist.

This Week’s Notable Links

Finding a Needle in the Ocean by Avi Loeb

I don’t know what to think about all the UFO whistleblower news, but I can tell you that Avi Loeb’s work is fantastic. Looking for evidence of extraterrestrial metal alloys left over from interstellar meteors deep in the ocean is fascinating and worth reading about.

Amazon Prime Video’s Microservices Move Doesn’t Lead to a Monolith after All

A few weeks ago, a post about how Amazon abandoned microservices and went back to a monolith made the rounds. It turns out that wasn’t true. Both articles are interesting reads.

Power LED Attack - Computerphile

Recently, Security researchers published their work on a Power LED attack that could steal cryptographic keys. This Computerphile video covers the topic well and makes the basic concepts approachable.

The Beautifully Strange World of Outsider Music

This video is a couple of years old but new to me. I’ve known of a few of the artists outlined here such as Tiny Tim, Wesley Willis, Daniel Johnston (as any self-respecting Austinite would) , but I enjoyed how this was framed, and I’d never heard of the term “outsider music” to describe these artists.

I earn from qualifying purchases as an Amazon Associate.

Functionally Imperative

Discussion about this post

Ready for more?