ladybird/Userland/Libraries/LibUnicode/Segmentation.cpp
Timothy Flynn fa96811a22 LibUnicode: Skip over emoji sequences in grapheme boundary segmentation
The emoji sequence rule in the grapheme segmentation spec (GB11 in UAX #29)
is a bit tricky:

    \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}

Our current strategy of tracking a boolean to indicate if we are in an
emoji sequence was causing us to break up emoji made of multiple sub-
sequences. For example, in the "family: man, woman, girl, boy" sequence:

    U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

We would break at indices 0 (correctly) and 6 (incorrectly).

Instead of tracking a boolean, it's quite a bit simpler to reason about
emoji sequences by just skipping past them entirely. Note that in cases
like the above emoji, we skip one sub-sequence at a time.
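
As a rough illustration of the skipping approach described above, here is a
minimal standalone sketch. It is not the code from Segmentation.cpp: the
helper names (is_extended_pictographic, is_extend, skip_emoji_sub_sequence,
grapheme_boundaries) are hypothetical, and the property checks are toy range
tests standing in for real Unicode property lookups. It also only handles the
emoji rule plus trailing Extend/ZWJ code points, not the full set of grapheme
rules.

```cpp
#include <cstddef>
#include <vector>

// ASSUMPTION: toy stand-ins for Unicode property lookups. Real code would
// consult the Unicode Character Database instead of hard-coded ranges.
static bool is_extended_pictographic(char32_t cp)
{
    return cp >= 0x1F400 && cp <= 0x1F4FF; // covers the code points used in the example only
}

static bool is_extend(char32_t cp)
{
    return cp >= 0xFE00 && cp <= 0xFE0F; // variation selectors only
}

static constexpr char32_t zero_width_joiner = 0x200D;

// Tries to match one sub-sequence:
//     \p{Extended_Pictographic} Extend* ZWJ \p{Extended_Pictographic}
// starting at `index`. On a match, returns the index of the trailing
// pictographic (so no break is reported before it). Otherwise returns `index`.
static size_t skip_emoji_sub_sequence(std::vector<char32_t> const& code_points, size_t index)
{
    if (index >= code_points.size() || !is_extended_pictographic(code_points[index]))
        return index;

    size_t j = index + 1;
    while (j < code_points.size() && is_extend(code_points[j]))
        ++j;

    if (j >= code_points.size() || code_points[j] != zero_width_joiner)
        return index;
    ++j;

    if (j >= code_points.size() || !is_extended_pictographic(code_points[j]))
        return index;
    return j;
}

// Returns boundary indices (in code points), including 0 and the end index.
static std::vector<size_t> grapheme_boundaries(std::vector<char32_t> const& code_points)
{
    std::vector<size_t> boundaries;
    if (code_points.empty())
        return boundaries;

    boundaries.push_back(0);
    size_t i = 0;

    while (i < code_points.size()) {
        // Skip one sub-sequence at a time until no further match is found.
        for (size_t next = skip_emoji_sub_sequence(code_points, i); next != i;
             next = skip_emoji_sub_sequence(code_points, i)) {
            i = next;
        }

        // Step past the current code point, plus any trailing Extend or ZWJ.
        ++i;
        while (i < code_points.size()
            && (is_extend(code_points[i]) || code_points[i] == zero_width_joiner)) {
            ++i;
        }

        boundaries.push_back(i);
    }

    return boundaries;
}
```

With this sketch, the family sequence from the example is matched as three
consecutive sub-sequences (0 to 2, 2 to 4, 4 to 6), so the only boundaries
reported are the start and the end of the input, and no break lands at
index 6.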
2023-02-25 22:23:39 +01:00