Use templates to specialize draw_scaled_bitmap() so we don't have to blend()
for sources without alpha, and also inline the GraphicsBitmap::get_pixel()
logic so we don't have to branch on the bitmap format on every iteration.
This is another ~30% speedup on top of the previous changes. :^)
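
A rough sketch of the idea, assuming simplified stand-in types (Bitmap,
Format, fetch(), draw_scaled_impl() and the nearest-neighbor scaling are all
hypothetical here, not the actual Painter/GraphicsBitmap code). The format
and alpha checks are resolved once, outside the loop, by picking a template
instantiation. Inside the loop, `if constexpr` removes both the per-pixel
format branch and the blend() call for opaque sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real bitmap types.
enum class Format { RGB32, RGBA32 };

struct Bitmap {
    Format format;
    int width;
    int height;
    std::vector<uint32_t> pixels; // 0xAARRGGBB, row-major

    // Generic accessor: branches on the format on every call.
    uint32_t get_pixel(int x, int y) const
    {
        uint32_t p = pixels[y * width + x];
        return format == Format::RGB32 ? (p | 0xff000000u) : p;
    }
};

// Inlined, format-specialized fetch: the branch is resolved at compile time.
template<Format F>
uint32_t fetch(const Bitmap& b, int x, int y)
{
    uint32_t p = b.pixels[y * b.width + x];
    if constexpr (F == Format::RGB32)
        return p | 0xff000000u; // RGB32 has no alpha, treat as opaque
    else
        return p;
}

// Source-over blend of an ARGB color onto an opaque destination pixel.
uint32_t blend(uint32_t dst, uint32_t src)
{
    uint32_t a = src >> 24;
    auto mix = [a](uint32_t d, uint32_t s) { return (s * a + d * (255 - a)) / 255; };
    uint32_t r = mix((dst >> 16) & 0xff, (src >> 16) & 0xff);
    uint32_t g = mix((dst >> 8) & 0xff, (src >> 8) & 0xff);
    uint32_t b = mix(dst & 0xff, src & 0xff);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

// Nearest-neighbor scale of src over all of dst; one instantiation per case.
template<bool has_alpha, Format F>
void draw_scaled_impl(Bitmap& dst, const Bitmap& src)
{
    for (int y = 0; y < dst.height; ++y) {
        int sy = y * src.height / dst.height;
        for (int x = 0; x < dst.width; ++x) {
            int sx = x * src.width / dst.width;
            uint32_t color = fetch<F>(src, sx, sy);
            uint32_t& out = dst.pixels[y * dst.width + x];
            if constexpr (has_alpha)
                out = blend(out, color);
            else
                out = color; // opaque source: plain copy, no blend()
        }
    }
}

// Dispatch on the format once, before entering the pixel loop.
void draw_scaled_bitmap(Bitmap& dst, const Bitmap& src)
{
    assert(src.width > 0 && src.height > 0);
    if (src.format == Format::RGBA32)
        draw_scaled_impl<true, Format::RGBA32>(dst, src);
    else
        draw_scaled_impl<false, Format::RGB32>(dst, src);
}
```

The trade-off is one compiled copy of the loop per (alpha, format) pair in
exchange for a branch-free inner loop.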