Walrus coder vs. Arithmetic coder Efficiency Comparisons

Empirical tests using test1_12.out

Testing 12-bit versions of both coders. Each test holds p_A constant and feeds pseudorandom input data matching that p_A until the entropy of the input data reaches 10000 bits. For a given p_A, the same pseudorandom input is fed to each coder. Tests are run with values of p_A from 2^-1 down to 2^-16.
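The input-generation step above can be sketched as follows. This is a hypothetical helper (the actual test harness behind test1_12.out is not shown here); it draws Bernoulli symbols with P(A) = p_A, using the per-symbol Shannon entropy to decide how many symbols reach the 10000-bit target, and a fixed seed so both coders see identical input.

```python
import math
import random

def make_test_input(p_a, target_entropy_bits=10000.0, seed=0):
    """Generate pseudorandom binary symbols with P(A) = p_a until the
    total input entropy reaches target_entropy_bits.
    (Hypothetical helper, not the actual test1_12.out harness.)"""
    # Shannon entropy per symbol of a binary source with P(A) = p_a.
    h = -(p_a * math.log2(p_a) + (1 - p_a) * math.log2(1 - p_a))
    n_symbols = math.ceil(target_entropy_bits / h)
    rng = random.Random(seed)  # fixed seed: same input for every coder
    return ['A' if rng.random() < p_a else 'B' for _ in range(n_symbols)]

# e.g. one test case: p_A = 2^-4
symbols = make_test_input(2 ** -4)
```

Note that as p_A shrinks, the entropy per symbol drops, so far more symbols are needed to accumulate the same 10000 bits of entropy.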

The red "Old_Arith_coder" line below shows the performance of Arithmetic_coder without its avoid_straddle feature. The blue line shows its performance with the feature. Note that most of the difference occurs where lg(p_A) <= -12, the nominal resolution of the coders being tested; beyond that point the green (Walrus) and blue (new arithmetic) lines are identical.

Generating Markov models for constant p_A

With a slightly simplified (and slightly less efficient) "pessimistic" picture of how the Walrus coder works, it's easy to build a Markov model that gives expected and lower-bound efficiencies. In this pessimistic picture, the state_width alone identifies the state, and there can be more than one transition between the same pair of in- and out-states.

Transition table format

At this stage it's a list of transition tuples. Nothing is guaranteed about the order of the list:

[
  (in_state_width, probability, out_state_width, n_of_bits_out),
  ...
]
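A concrete fragment of such a list might look like the following. The widths, probabilities, and bit counts are illustrative only, not taken from the actual Walrus coder; the check simply confirms that the probabilities leaving each in-state sum to 1.

```python
from collections import defaultdict

def check_transitions(transitions):
    """Sanity check: probabilities out of each in-state must sum to 1."""
    totals = defaultdict(float)
    for in_w, p, out_w, n_bits in transitions:
        totals[in_w] += p
    return all(abs(t - 1.0) < 1e-9 for t in totals.values())

# Hypothetical fragment for p_A = 1/2 (made-up values, for illustration).
transitions = [
    # (in_state_width, probability, out_state_width, n_of_bits_out)
    (0, 0.5, 1, 0),   # widen the state, nothing emitted yet
    (0, 0.5, 0, 1),   # back to start: a loop ends, 1 bit out
    (1, 0.5, 2, 0),
    (1, 0.5, 0, 2),
    (2, 0.5, 0, 2),   # two distinct transitions between the same
    (2, 0.5, 0, 3),   # in/out states, differing only in bits emitted
]
```

The last two entries show why the format allows duplicates: the same in/out pair can appear more than once, distinguished only by the other tuple fields.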

From transitions to loops

With the pessimistic picture and a fixed p_A, the transitions that are not to the start state form a tree (with the start state at the root). Each transition back to start is the end of a message or "loop" from start to start. The transitions need to be sorted, and then all the loops can be found. For each loop, its probability, the entropy of the input it encodes, and the number of bits it outputs can be computed.

These allow easy calculation of the coding efficiency for each loop, and the expected coding efficiency for the pessimistic picture as a whole with the given fixed p_A. The worst-case efficiency for p_A is that of the worst loop. Below the tables of numbers is a section where they're graphed.
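The loop enumeration and efficiency calculation can be sketched as below. This is a minimal version under stated assumptions: the start state is taken to have state_width 0, a loop's input entropy is -lg(its probability) since p_A is fixed, expected efficiency is taken as expected entropy per loop over expected bits per loop, and the sample table is made up, not the real coder's.

```python
import math
from collections import defaultdict

START = 0  # assume the start state has state_width 0

def find_loops(transitions):
    """Walk the transition tree from start, collecting every
    start-to-start loop as (probability, entropy_bits, bits_out).
    With fixed p_A, a loop's input entropy is -lg(its probability)."""
    children = defaultdict(list)
    for in_w, p, out_w, n_bits in transitions:
        children[in_w].append((p, out_w, n_bits))
    loops = []
    def walk(state, prob, bits):
        for p, out_w, n_bits in children[state]:
            if out_w == START:
                loops.append((prob * p, -math.log2(prob * p), bits + n_bits))
            else:
                walk(out_w, prob * p, bits + n_bits)
    walk(START, 1.0, 0)
    return loops

def expected_and_worst_efficiency(loops):
    # Expected efficiency: expected entropy per loop over expected bits
    # per loop.  Worst case: the efficiency of the single worst loop.
    e_h = sum(p * h for p, h, b in loops)
    e_b = sum(p * b for p, h, b in loops)
    return e_h / e_b, min(h / b for _, h, b in loops)

# Illustrative table for p_A = 1/2 (made-up values, for illustration).
transitions = [
    (0, 0.5, 1, 0), (0, 0.5, 0, 1),
    (1, 0.5, 2, 0), (1, 0.5, 0, 2),
    (2, 0.5, 0, 3), (2, 0.5, 0, 4),
]
loops = find_loops(transitions)
```

Because the non-start transitions form a tree, the walk terminates and visits each loop exactly once, and the loop probabilities sum to 1.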