Squozen Sort
version 0.2
To clarify some assumptions: we'll read the numbers in ASCII decimal from standard input, one line per number. They'll be unsigned longs. There may be duplicates. We'll have some fixed overhead, but the main arrays we sort in will total two million bytes. The results will be printed on standard output.
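As a concrete illustration of that input format, here is a minimal reading routine in C. The name read_one and the use of a 32-bit value (matching the buffer arithmetic below) are assumptions for this page, not necessarily how sort_n.c reads its input.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one number in the format described above: ASCII decimal, one per
 * line, on standard input.  Returns 1 on success, 0 at end of input.
 * (read_one is a made-up name for this sketch.) */
static int read_one(uint32_t *out)
{
    char line[64];

    if (!fgets(line, sizeof line, stdin))
        return 0;
    *out = (uint32_t)strtoul(line, NULL, 10);
    return 1;
}
```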
Each gap between consecutive sorted values is stored with a Rice code:

[1...] 0 bbbb bbbb bbbb

That is, zero or more one bits, a zero, and twelve binary bits. The leading ones represent gap / 4096 in unary, and the binary bits encode gap % 4096. For a million gaps that add up to 2^32, the code uses at most
2^32/4096 + 1,000,000 * (1 + 12) = 14,048,576 bits = 1,756,072 bytes,

which is within 4% of the theoretical minimum mentioned above. It's easy to verify that the Rice code works well here, but the choice wasn't obvious or intuitive to me beforehand, so this page shows a systematic path to it.
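Here is a minimal sketch of encoding and decoding one gap with this code, assuming a simple bit-cursor helper. The names (BitCursor, rice_put, rice_get) and the bit ordering are assumptions for this page, not necessarily what sort_n.c does.

```c
#include <stddef.h>
#include <stdint.h>

#define RICE_K 12                       /* 12 binary bits per gap      */
#define RICE_M (1u << RICE_K)           /* 4096                        */

typedef struct {                        /* hypothetical bit cursor     */
    uint8_t *buf;
    size_t   bitpos;
} BitCursor;

static void put_bit(BitCursor *c, int bit)
{
    uint8_t mask = (uint8_t)(1u << (c->bitpos & 7));
    if (bit)
        c->buf[c->bitpos >> 3] |= mask;
    else
        c->buf[c->bitpos >> 3] &= (uint8_t)~mask;
    c->bitpos++;
}

static int get_bit(BitCursor *c)
{
    int bit = (c->buf[c->bitpos >> 3] >> (c->bitpos & 7)) & 1;
    c->bitpos++;
    return bit;
}

/* gap/4096 one bits, a zero bit, then gap%4096 in twelve binary bits */
static void rice_put(BitCursor *c, uint32_t gap)
{
    for (uint32_t q = gap >> RICE_K; q > 0; q--)
        put_bit(c, 1);
    put_bit(c, 0);
    for (int i = RICE_K - 1; i >= 0; i--)
        put_bit(c, (gap >> i) & 1);
}

static uint32_t rice_get(BitCursor *c)
{
    uint32_t q = 0, r = 0;
    while (get_bit(c))
        q++;                            /* unary part: gap / 4096      */
    for (int i = 0; i < RICE_K; i++)
        r = (r << 1) | (uint32_t)get_bit(c);
    return q * RICE_M + r;              /* reassemble the gap          */
}
```

A gap of 4095 costs 13 bits (one zero bit plus twelve binary bits); a gap of 8191 costs 15 bits (two unary ones, a zero, and twelve binary bits).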
If we allocate a buffer for the final sorted list, the space we have left is...
2,000,000 - 1,756,072 = 243,928 bytes

...enough for a buffer of 60,982 32-bit numbers.
We will fill and sort the small buffer 1,000,000 / 60,982 < 17 times (but see below), and each time merge it into the big buffer. That means that in addition to the work of sorting the small bufferfuls, we do about 1,000,000 * 17 / 2 = 8.5 * n comparisons in the merges, which is about half the number of comparisons within the sorts. But we also do about that many compressions and decompressions, which are more expensive.
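One "fill and sort the small buffer" step might look like the sketch below, reusing the read_one() sketch above. Again, the names are assumptions for this page rather than sort_n.c's.

```c
#include <stdint.h>
#include <stdlib.h>

/* Compare two 32-bit values for qsort. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);           /* avoids overflow of x - y    */
}

/* Fill small[] with up to cap numbers from the input, then sort them.
 * Returns the count read; 0 means the input is exhausted. */
static size_t fill_and_sort(uint32_t *small, size_t cap)
{
    size_t n = 0;
    while (n < cap && read_one(&small[n]))
        n++;
    qsort(small, n, sizeof small[0], cmp_u32);
    return n;
}
```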
The growth of the big buffer over merge passes looks like a triangle, which suggests O( n^2 ) work, but the base of the triangle is set by the ratio of the two buffer sizes, not by the size of the input. As long as we're working in memory and the big and small buffers' capacities are in a k:1 ratio, the total work is
O( n * (k/2 + log( n/k )) ),

which is still O( n log n ): there are about k merge passes averaging n/2 comparisons each, plus k sorts of n/k items each. To merge the sorted numbers into a single buffer that can only fit the final result, first slide the compressed data to the end of the big buffer. Then start uncompressing the old data, merging it with the small buffer of sorted but uncompressed data, compressing the result, and filling in the big buffer from the beginning. Old compressed data is read and consumed, leaving space for the new data that's been merged.
The slack space between the new data and the remaining old data shrinks, because data from the small buffer is being inserted as well. But because the big buffer is big enough for the final result, we know that the total of the new and remaining old data will fit without overlap: the same calculation of the total size is valid at every intermediate point.
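Putting the last two paragraphs together, here is a sketch of one merge pass, built on the hypothetical BitCursor / rice_put / rice_get helpers from the earlier sketch. It assumes the old compressed data has already been slid to the end of big[] and relies on the no-overlap argument above to guarantee the output cursor never overtakes the input cursor; it is not sort_n.c's actual code.

```c
/* One merge pass: old compressed data (old_count numbers) starts at bit
 * offset old_bitpos near the end of big[]; small[] holds small_count
 * sorted, uncompressed values.  New compressed data is written from the
 * front of big[].  Returns the new total count. */
static size_t merge_pass(uint8_t *big, size_t old_bitpos, size_t old_count,
                         const uint32_t *small, size_t small_count)
{
    BitCursor in  = { big, old_bitpos };  /* read old data near the end  */
    BitCursor out = { big, 0 };           /* write new data at the front */
    uint32_t prev_out = 0;                /* last value written          */
    uint32_t old_val  = 0;
    size_t   i = 0, j = 0;

    if (old_count > 0)
        old_val = rice_get(&in);          /* first gap is the value itself */

    while (i < old_count || j < small_count) {
        uint32_t v;
        if (j == small_count || (i < old_count && old_val <= small[j])) {
            v = old_val;                  /* take from the old data      */
            if (++i < old_count)
                old_val += rice_get(&in); /* gaps decode cumulatively    */
        } else {
            v = small[j++];               /* take from the small buffer  */
        }
        rice_put(&out, v - prev_out);     /* store the gap, not the value */
        prev_out = v;
    }
    return old_count + small_count;
}
```

Duplicates simply encode as a gap of zero, which the Rice code handles in 13 bits.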
One more optimization: the last merge step doesn't need its results recompressed into the big buffer; they only need to be sent to the output. That means the big buffer's capacity can be reduced by the capacity of the small buffer. That in turn means the small buffer can be larger, which means the big buffer can be smaller, and so forth. I didn't look for a closed form or efficient way to find the squeezed buffer size; I just iterate until I reach a fixed point. For the original problem of one million longs in two million bytes, this reduces ~17 passes of ~61K items each to ~10 passes of ~103K.
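Here is one way to compute that fixed point, as a sketch using the worst-case compressed size from the formula above; the names and the standalone main() are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define TOTAL_BYTES 2000000UL           /* memory for both buffers     */
#define N_INPUT     1000000UL           /* numbers to sort             */

/* Worst-case Rice-coded size, in bytes, of m sorted 32-bit numbers:
 * at most 2^32/4096 = 1,048,576 unary bits plus 13 bits per number.  */
static unsigned long compressed_bytes(unsigned long m)
{
    unsigned long bits = 1048576UL + 13UL * m;
    return (bits + 7) / 8;              /* round up to whole bytes     */
}

int main(void)
{
    unsigned long small_cap = 0, prev = (unsigned long)-1;

    /* The big buffer only has to hold the compressed result of the
     * next-to-last pass (N_INPUT - small_cap numbers), because the last
     * pass streams straight to the output.  Iterate until the small
     * buffer's capacity stops changing. */
    while (small_cap != prev) {
        prev = small_cap;
        unsigned long big_bytes   = compressed_bytes(N_INPUT - small_cap);
        unsigned long small_bytes = TOTAL_BYTES - big_bytes;
        small_cap = small_bytes / sizeof(uint32_t);
    }
    printf("small buffer: %lu numbers, about %lu passes\n",
           small_cap, (N_INPUT + small_cap - 1) / small_cap);
    return 0;
}
```

Because the small buffer's capacity only grows from one iteration to the next and is bounded by the total memory, the loop always settles on a fixed point after a few steps.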
That compares to 18.5 seconds for Unix's sort -n, but that's a little unfair since Unix sort works on the data as strings, sorts by arbitrary combinations of fields, and can sort a bigger variety of number formats.
| File | Description |
| --- | --- |
| Makefile (1K) | for C programs, index.html, combos.html and test files |
| combino.py_txt (3K) | calculates table for combos.html |
| combos.html (10K) | derivation and table for combinatorial math in squozen_code.html |
| combos.php.txt (4K) | php source for combos.html |
| factorial.py_txt (1K) | Python procedures for n! and log-base-2( n! ) |
| gamma.py_txt (10K) | Python gamma and log-gamma functions used by factorial.py |
| log2_factorial (1K) | executable Python log-base-2( n! ) for large n |
| pr_urands.c (2K) | generate random test files |
| sort_n.c (17K) | the squozen sort program |
| squozen_code.html (21K) | arriving at the Rice code step by step |
| time_test (1K) | shows speed vs. buffer size; also compares to Unix sort |
| time_test.out (14K) | test results when run on my machine |
| urands.txt.gz (5000K) | random test file (gnuzip'ped) |
| urands_w.txt.gz (4603K) | random multiples of 4096, sort_n's worst case (gnuzip'ped) |
This page was made with a php script. Last change January 28, 2021.
--Steve Witham