...simply by using LittleEndianInputBitStream::read_bit() instead of
read_bits(1). This puts us on the fast path for single-bit reads.
There's still lots of money on the table for bigger optimizations to
claim here, just picking an embarrassingly low-hanging fruit. :^)