(This section is taken from an earlier draft of squozen_code.html where I was using nbits() instead of P().)

We can ask, how big is the message after the first gap s is removed. At that point the number of gaps decreases by one, and the total of the gap widths decreases by s. Subtracting the two sizes gives the number of bits it took to encode s. Let's call this function nbitsC() for Combinatorics:

               (1000000 +232-1)!         (999999 +232-1-s)!
nbitsC(s)=log2(-----------------) - log2(-----------------)
                1000000!(232-1)!          999999!(232-1-s)!
Of course this is a division (the inverse of the fraction of messages that start with s):
               (1000000 +232-1)!  999999!(232-1-s)!
nbitsC(s)=log2(-----------------------------------)
                1000000!(232-1)! (999999 +232-1-s)!       
Separating the first factor of the first factorial...
       (1000000 +232-1) (999999 +232-1)!  999999!(232-1-s)!
= log2(---------------------------------------------------)
        1000000!(232-1)!                 (999999 +232-1-s)!       
Canceling and rearranging a bit...
       1000000+232-1         (999999 +232-1  )! (232-1-s)!
= log2(-------------) + log2(----------------------------)
          1000000            (999999 +232-1-s)! (232-1  )!
When s = 0, the right term is zero, and the constant on the left is the number of "overhead" bits per codeword:
                 1000000+232-1
nbitsC(0) = log2(-------------) ~= 12.0687673
                   1000000 
When s is small, the right term is approximately
                            999999 +232-1-(s+1)/2
nbitsC(s)-12.0687673 = log2(---------------------) * s
                               232-1-(s+1)/2
which is fairly linear until s gets close to 232, as the following table (calculated
here) shows:

s

nbitsC(s)

(nbitsC(s) - nbitsC(0)) / s

nbitsC(M) - nbitsC(s)

12.069 0.00035095214844 13511271.050
10  12.072 0.00033569335938 13511271.047
100  12.102 0.00033584594727 13511271.016
1000  12.405 0.00033586120605 13511270.714
4295* 13.511 0.00033585678296 13511269.607
10000  15.427 0.00033586120605 13511267.691
100000  45.656 0.00033586761475 13511237.463
1000000  347.972 0.00033590325928 13510935.147
10000000  3374.626 0.00033625573883 13507908.492
100000000  33995.614 0.00033983545578 13477287.504
1000000000  382343.724 0.00038233165474 13128939.395
2000000000  904038.383 0.00045201315731 12607244.735
3000000000  1729352.866 0.00057644693227 11781930.253
232-1000000001  2102103.944 0.00063797048254 11409179.175
232-100000001  5417560.281 0.00129143991621 8093722.838
232-10000001  8676830.945 0.00202494401440 4834452.174
232-1000001  11511294.910 0.00268080356709 1999988.208
232-100001  13027846.132 0.00303334961682 483436.986
232-10001  13430353.688 0.00312700236496 80929.431
232-1001  13499880.228 0.00314318372525 11402.891
232-101  13509814.720 0.00314549612081 1468.399
232-11  13511105.594 0.00314579660999 177.525
232-2  13511263.187 0.00314583329591 19.932
232-1  13511283.119 0.00314583793585 0.000
*the average gap width

This page was made with a php script and a Python program. Last change October 26, 2010.

--Steve Witham

Up to