Files
ladybird/Userland/Libraries/LibJS/Runtime/PrimitiveString.cpp
Timothy Flynn 0c42aece36 LibJS: Transcode UTF-8 strings to UTF-16 and add UTF-16 accessors
LibJS parses JavaScript as UTF-8, so when creating a string, we must
transcode it to UTF-16 to handle encoded surrogate pairs.

For example, consider the following string:
    "\ud83d\ude00"

The UTF-8 encoding of this surrogate pair is:
    0xf0 0x9f 0x98 0x80

However, LibJS will currently store the two surrogates individually as
UTF-8 encoded bytes, rather than combining the pair:
    0xed 0xa0 0xb8, 0xed 0xb8 0x80

These are not equivalent. So, as String.prototype becomes UTF-16 aware,
this encoding will no longer work for abstractions like strict equality.
2021-07-22 09:10:44 +02:00

2.3 KiB