With the recent Heartbleed vulnerability scramble, there have been discussions about detecting the vulnerability in the wild. This is a quick write-up to explain entropy testing and to show what the characteristics of encrypted vs. unencrypted traffic really are.
To start with, we'll analyze a binary as it sits on a system versus that same binary after being encrypted. In this example I used /bin/bash. Below is a quick Perl script that counts how often each byte value from 0x00 to 0xff occurs in its input.
BASH Binary Unencrypted
bsmall@bsmall-vm-debian:~/Documents/entropy_example$ perl -e 'do { local $/ = undef; $blah = <STDIN>; }; %b; foreach my $i (split(//, $blah)) { $b{$i}++; } foreach my $k (0..255) { printf("%02x,%0d\n", $k, $b{chr($k)} ? $b{chr($k)} : 0) } ' < bash
00,159246
01,16628
02,7081
03,4459
04,21912
05,4461
06,2125
07,1898
08,31585
...
BASH Binary Encrypted with GPG
bsmall@bsmall-vm-debian:~/Documents/entropy_example$ perl -e 'do { local $/ = undef; $blah = <STDIN>; }; %b; foreach my $i (split(//, $blah)) { $b{$i}++; } foreach my $k (0..255) { printf("%02x,%0d\n", $k, $b{chr($k)} ? $b{chr($k)} : 0) } ' < bash.gpg
00,1839
01,1723
02,1749
03,1797
04,1790
05,1860
06,1743
07,1749
08,1777
...
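The goal of the script is to dump a table that we can use to generate a histogram and get a visual representation of how often each byte value occurs in the binary. Right away we can see that the spread of counts differs drastically between the two files. Per the output, the encrypted binary has 1,839 occurrences of the byte 0x00, whereas the unencrypted binary has 159,246. The encrypted counts also sit close together: bash.gpg is 451,478 bytes long, so a perfectly uniform distribution would give each of the 256 values about 1,764 occurrences, and the counts shown above all fall within roughly 100 of that figure.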
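For readers who would rather not decode the Perl one-liner, the same table can be produced with a few lines of Python. This is only a sketch of the counting step, not the script used above; the function names `byte_histogram` and `format_table` and the file name in the usage note are made up for illustration.

```python
import sys
from collections import Counter


def byte_histogram(data):
    """Return a 256-entry list: index is the byte value, entry is its count."""
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    return [counts.get(value, 0) for value in range(256)]


def format_table(histogram):
    """Render the histogram as 'hex,count' lines, like the Perl output."""
    return "\n".join(
        "{:02x},{}".format(value, count) for value, count in enumerate(histogram)
    )


# Only runs when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        print(format_table(byte_histogram(handle.read())))
```

Saved under a hypothetical name such as byte_histogram.py, it could be run as `python byte_histogram.py bash.gpg`, and the resulting CSV can be fed to whatever plotting tool is at hand.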
Plotted as a histogram, the results look like this:
The visual representation of the data shows a clear difference in the distribution of bytes. Running these binaries through one of the few algorithms listed here, we can measure the entropy in each file as a score:
bsmall@bsmall-vm-debian:~/Documents/entropy_example$ python entropy_test.py < bash.gpg
.[Results].
Length 451478
Entropy: 7.99960058661
bsmall@bsmall-vm-debian:~/Documents/entropy_example$ python entropy_test.py < bash
.[Results].
Length 941252
Entropy: 6.36049396951
These scores line up with the confidence tables described in this document.
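The source of entropy_test.py is not shown here, so the following is only a guess at what such a script might compute. It assumes the score is Shannon entropy in bits per byte, which would be consistent with the results above: the maximum possible value is 8, and the encrypted file scores just under it. The function name `shannon_entropy` is hypothetical.

```python
import math
import sys
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0).

    Assumption: this mirrors the kind of score printed above; it is not
    the actual entropy_test.py.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # count of each byte value that occurs
    # H = -sum(p * log2(p)) over the byte values that actually occur
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


# Only runs when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        payload = handle.read()
    print("Length", len(payload))
    print("Entropy:", shannon_entropy(payload))
```

A file in which all 256 byte values are equally likely scores exactly 8, and a file made of a single repeated byte scores 0. Real binaries, with their long runs of 0x00 padding and repeated instruction patterns, land somewhere in between.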