We can ask, how big is the message after the first gap s is removed. At that point the number of gaps decreases by one, and the total of the gap widths decreases by s. Subtracting the two sizes gives the number of bits it took to encode s. Let's call this function nbitsC() for Combinatorics:
(1000000 +232-1)! (999999 +232-1-s)! nbitsC(s)=log2(-----------------) - log2(-----------------) 1000000!(232-1)! 999999!(232-1-s)!Of course this is a division (the inverse of the fraction of messages that start with s):
(1000000 +232-1)! 999999!(232-1-s)! nbitsC(s)=log2(-----------------------------------) 1000000!(232-1)! (999999 +232-1-s)!Separating the first factor of the first factorial...
(1000000 +232-1) (999999 +232-1)! 999999!(232-1-s)! = log2(---------------------------------------------------) 1000000!(232-1)! (999999 +232-1-s)!Canceling and rearranging a bit...
1000000+232-1 (999999 +232-1 )! (232-1-s)! = log2(-------------) + log2(----------------------------) 1000000 (999999 +232-1-s)! (232-1 )!When s = 0, the right term is zero, and the constant on the left is the number of "overhead" bits per codeword:
1000000+232-1 nbitsC(0) = log2(-------------) ~= 12.0687673 1000000When s is small, the right term is approximately
999999 +232-1-(s+1)/2 nbitsC(s)-12.0687673 = log2(---------------------) * s 232-1-(s+1)/2which is fairly linear until s gets close to 232, as the following table (calculated here) shows:
s |
nbitsC(s) |
(nbitsC(s) - nbitsC(0)) / s |
nbitsC(M) - nbitsC(s) |
1 |
12.069 |
0.00035095214844 | 13511271.050 |
10 |
12.072 |
0.00033569335938 | 13511271.047 |
100 |
12.102 |
0.00033584594727 | 13511271.016 |
1000 |
12.405 |
0.00033586120605 | 13511270.714 |
4295* |
13.511 |
0.00033585678296 | 13511269.607 |
10000 |
15.427 |
0.00033586120605 | 13511267.691 |
100000 |
45.656 |
0.00033586761475 | 13511237.463 |
1000000 |
347.972 |
0.00033590325928 | 13510935.147 |
10000000 |
3374.626 |
0.00033625573883 | 13507908.492 |
100000000 |
33995.614 |
0.00033983545578 | 13477287.504 |
1000000000 |
382343.724 |
0.00038233165474 | 13128939.395 |
2000000000 |
904038.383 |
0.00045201315731 | 12607244.735 |
3000000000 |
1729352.866 |
0.00057644693227 | 11781930.253 |
232-1000000001 |
2102103.944 |
0.00063797048254 | 11409179.175 |
232-100000001 |
5417560.281 |
0.00129143991621 | 8093722.838 |
232-10000001 |
8676830.945 |
0.00202494401440 | 4834452.174 |
232-1000001 |
11511294.910 |
0.00268080356709 | 1999988.208 |
232-100001 |
13027846.132 |
0.00303334961682 | 483436.986 |
232-10001 |
13430353.688 |
0.00312700236496 | 80929.431 |
232-1001 |
13499880.228 |
0.00314318372525 | 11402.891 |
232-101 |
13509814.720 |
0.00314549612081 | 1468.399 |
232-11 |
13511105.594 |
0.00314579660999 | 177.525 |
232-2 |
13511263.187 |
0.00314583329591 | 19.932 |
232-1 |
13511283.119 |
0.00314583793585 | 0.000 |
This page was made with a
php script
and a Python program.
Last change October 26, 2010.
Up to