For some reason, we were decoding the source color twice for every pixel
in the inner-most loop of `blit_filtered`. This makes sure we only
decode the source color once, and rearranges the code to improve
readability.
For my synthetic font rendering benchmark, this improves glyph rendering
performance by ~9%.