Files
ladybird/Userland/Libraries/LibRegex/RegexByteCode.cpp
Timothy Flynn 2212aa2388 LibRegex: Support non-ASCII whitespace characters when matching \s or \S
ECMA-262 defines \s as:

    Return the CharSet containing all characters corresponding to a code
    point on the right-hand side of the WhiteSpace or LineTerminator
    productions.

The LineTerminator production is simply: U+000A, U+000D, U+2028, or
U+2029. Unfortunately there isn't a Unicode property that covers just
those code points.

The WhiteSpace production is: U+0009, U+000B, U+000C, U+FEFF, or any
code point with the Space_Separator general category.

If the Unicode generators are disabled, this will fall back to ASCII
space code points.
2022-02-05 22:30:10 +03:30

39 KiB