LibTextCodec/Latin1: Iterate over input string with u8 instead of char

Using char causes bytes equal to or greater than 0x80 to be treated as negative values, which produces incorrect results when they are implicitly cast to u32.

For example, `atob` in LibWeb uses this decoder to convert non-ASCII values to UTF-8. Non-ASCII values are >= 0x80, so the decoder produced incorrect results in such cases:

```js
Uint8Array.from(atob("u660"), c => c.charCodeAt(0));
```

This used to produce [253, 253, 253] instead of [187, 174, 180].

Required by Cloudflare's IUAM challenges.
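
The widening problem can be reproduced outside of LibTextCodec. The following is a minimal standalone sketch, not code from this commit: the function names are hypothetical, `uint8_t`/`uint32_t` stand in for AK's `u8`/`u32`, and `signed char` is used explicitly because the signedness of plain `char` is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widens a byte the way the old loop did on platforms
// where plain char is signed.
inline uint32_t widen_via_signed_char(uint8_t byte)
{
    auto ch = static_cast<signed char>(byte); // 0xBB becomes -69
    return static_cast<uint32_t>(ch);         // sign-extended to 0xFFFFFFBB
}

// Hypothetical helper: widens a byte the way the fixed loop does.
inline uint32_t widen_via_u8(uint8_t byte)
{
    return static_cast<uint32_t>(byte); // zero-extended, value preserved
}

// 0xBB (187) is the first byte that atob("u660") decodes to.
inline void demonstrate_widening()
{
    // Signed widening yields a value far above the largest code point (0x10FFFF).
    assert(widen_via_signed_char(0xBB) == 0xFFFFFFBBu);
    // Unsigned widening yields U+00BB, the intended Latin-1 code point.
    assert(widen_via_u8(0xBB) == 0xBBu);
    // Bytes below 0x80 are unaffected either way.
    assert(widen_via_signed_char(0x41) == widen_via_u8(0x41));
}
```

Only bytes with the high bit set differ between the two paths, which is why pure ASCII input never exposed the bug.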
2026-04-14 08:35:52 +00:00 · 2023-02-28 03:47:40 +00:00
parent 1c918e826c
commit e864444fe3
1 changed files with 1 additions and 1 deletions
--- a/Userland/Libraries/LibTextCodec/Decoder.cpp
+++ b/Userland/Libraries/LibTextCodec/Decoder.cpp
@@ -353,7 +353,7 @@ ErrorOr<String> UTF16LEDecoder::to_utf8(StringView input)

 ErrorOr<void> Latin1Decoder::process(StringView input, Function<ErrorOr<void>(u32)> on_code_point)
 {
-    for (auto ch : input) {
+    for (u8 ch : input) {
        // Latin1 is the same as the first 256 Unicode code_points, so no mapping is needed, just utf-8 encoding.
        TRY(on_code_point(ch));
    }