Andreas Kling
3fefc7f3e9
LibWeb: Tweak CSS parser to swallow backslash-escaped characters
...
This isn't the correct way of doing this, but at least it allows the
parsing to progress a bit further in some cases.
2020-06-25 16:52:38 +02:00
Andreas Kling
4b2ac34725
LibWeb: Move the offset, margin and padding boxes into LayoutStyle
2020-06-24 18:06:21 +02:00
Andreas Kling
5744dd43c5
LibWeb: Remove default Length constructor and add make_auto()/make_px()
...
To prepare for adding an undefined/empty state for Length, let's first
move away from Length() creating an auto value.
2020-06-24 11:08:46 +02:00
Andreas Kling
d0312f6208
LibWeb: Handle empty inputs to the CSS parser
...
Empty inputs -> empty outputs.
2020-06-23 20:06:45 +02:00
Andreas Kling
3a5af6ef61
LibWeb: Remove hacky old ways of running <script> element contents
...
Now that we're using the new HTML parser, we don't have to do the weird
"run the script when inserted into the document, uhh, or when the text
content of the script element changes" dance.
Instead, we just follow the spec, and scripts run the way they should.
2020-06-23 16:45:01 +02:00
Andreas Kling
c33d17d363
LibWeb: Fix tokenization of attributes with URL query strings in them
...
<a href="/foo&amp=bar"> was being tokenized into <a href="/foo&=bar">.
The spec mentions this but I had overlooked it. The bug happens because
we interpreted the "&amp" as a named character reference.
2020-06-23 16:45:01 +02:00
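The rule in question appears to be the HTML spec's historical exception for named character references inside attribute values: a match that does not end in ";" and is followed by "=" or an ASCII alphanumeric is left as literal text. A minimal sketch of that check, using a hypothetical helper name and standard C++ types rather than LibWeb's tokenizer:

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper, not LibWeb's API. `matched` is the text that matched
// a named character reference (for example "&amp" or "&amp;") and `next` is
// the input character that follows it.
bool should_keep_reference_as_text(bool in_attribute, std::string_view matched, char next)
{
    // The exception only applies inside attribute values, and only when the
    // matched reference lacks its terminating semicolon.
    if (!in_attribute || matched.empty() || matched.back() == ';')
        return false;
    return next == '=' || std::isalnum(static_cast<unsigned char>(next)) != 0;
}
```

Under this sketch, "&amp" followed by "=" inside an attribute value stays as written, while "&amp;" is still decoded.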
Andreas Kling
07d976716f
LibWeb: Remove most uses of the old HTML parser
...
The only remaining client of the old parser is the fragment parser used
by the Element.innerHTML setter. We'll need to implement a bit more
stuff in the new parser before we can switch that over.
2020-06-21 22:29:05 +02:00
Andreas Kling
dd7cd92de4
LibWeb: Fix two typo bugs in table parsing
...
These were flushed out by the earlier fix to "table scope". Without the
bad implementation of table scopes, ACID2 stopped parsing correctly.
2020-06-21 17:49:02 +02:00
Andreas Kling
15b5dfc794
LibWeb: A </table> inside <tbody> is not a parse error
...
This condition was backwards. Fixes parsing of google.com.
2020-06-21 17:42:00 +02:00
Andreas Kling
1c2b6b074e
LibWeb: Fix misunderstood implementation of "table" and "select" scopes
...
These "stack of open elements" scopes are not supposed to include the
base list of element types.
2020-06-21 17:42:00 +02:00
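As a rough illustration of the scope check described above: the HTML spec walks the stack of open elements from the top and stops at a scope marker, and table scope uses only html, table and template as markers. The function names and the use of tag-name strings below are assumptions made for the sketch, not LibWeb's actual code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Walk the stack of open elements from the most recently opened element
// downward. Succeed on reaching `target`; fail on reaching a scope marker.
bool has_element_in_scope(const std::vector<std::string>& open_elements,
    const std::string& target, const std::vector<std::string>& scope_markers)
{
    for (auto it = open_elements.rbegin(); it != open_elements.rend(); ++it) {
        if (*it == target)
            return true;
        if (std::find(scope_markers.begin(), scope_markers.end(), *it) != scope_markers.end())
            return false;
    }
    return false;
}

bool has_element_in_table_scope(const std::vector<std::string>& open_elements,
    const std::string& target)
{
    // Table scope uses only its own markers, not the general scope list.
    return has_element_in_scope(open_elements, target, { "html", "table", "template" });
}
```

With the stack html, body, table, tbody, tr, td, a lookup for "table" succeeds in table scope, but it would fail if "td" from the general scope list were also treated as a marker.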
Andreas Kling
966bc05fef
LibWeb: Implement more of the foster parenting algorithm in the parser
2020-06-21 17:42:00 +02:00
stelar7
5eb39a5f61
LibWeb: Update parser with more insertion modes :^)
...
Implements handling of InHeadNoScript, InSelectInTable, InTemplate,
InFrameset, AfterFrameset, and AfterAfterFrameset.
2020-06-21 10:13:31 +02:00
Andreas Kling
6242e029ed
LibWeb: Make Element::tag_name() return a const FlyString&
...
The more generic virtual variant is renamed to node_name() and now only
Element has tag_name(). This removes a huge amount of String ctor/dtor
churn in selector matching.
2020-06-16 19:09:14 +02:00
Andreas Kling
49cd03be95
LibWeb: Fix broken parsing of </form> during "in body" insertion
2020-06-15 20:31:19 +02:00
Andreas Kling
2f26d4c6a1
LibWeb: Fix broken parsing of </select> during "in select" insertion
2020-06-15 19:57:20 +02:00
Andreas Kling
17d26b92f8
LibWeb: Just ignore <script> elements that failed to load the script
...
We're never gonna be able to run them if we can't load them so just
let it go.
2020-06-15 18:37:48 +02:00
Luke
a01478c858
LibWeb: Fully implement HTML parser "in table" insertion mode
...
Also fixes some little mistakes in the "in body" insertion mode
that I found whilst cross-referencing.
2020-06-14 14:07:07 +02:00
Luke
6532c1e2fa
LibWeb: Implement HTML parser "in column group" insertion mode
2020-06-14 14:07:07 +02:00
Luke
2241b09cd0
LibWeb: Implement HTML parser "in caption" insertion mode
2020-06-14 14:07:07 +02:00
Luke
a1838f676e
LibWeb: Implement all CDATA tokenizer states
...
Even though we haven't implemented any switches to these states yet,
we may as well have them ready for when we do implement the switches.
2020-06-14 13:47:19 +02:00
Luke
821312729a
LibWeb: Fully implement all DOCTYPE tokenizer states
...
Also fixes TagOpen having a separate emit and reconsume in
ANYTHING_ELSE.
2020-06-14 13:47:19 +02:00
Luke
ab1df177d8
LibWeb: Fully implement all comment tokenizer states
2020-06-14 13:47:19 +02:00
Andreas Kling
47df0cbbc8
LibWeb: Fix broken tokenization of hexadecimal character references
...
We were interpreting 'A'-'F' as decimal digits which didn't work right.
2020-06-13 13:46:12 +02:00
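For illustration, a sketch of how hexadecimal digits in a reference such as "&#x41;" can be accumulated into a code point. Treating 'A' like a decimal digit (subtracting '0') would give 17 instead of 10, which is one way such a bug can arise. The helper names are hypothetical, and rejecting values above U+10FFFF is a simplification of what the spec does.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Map one hexadecimal digit to its numeric value, or nothing if invalid.
std::optional<uint32_t> hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return static_cast<uint32_t>(c - '0');
    if (c >= 'a' && c <= 'f')
        return static_cast<uint32_t>(10 + (c - 'a'));
    if (c >= 'A' && c <= 'F')
        return static_cast<uint32_t>(10 + (c - 'A'));
    return std::nullopt;
}

// Parse the digits between "&#x" and ";" into a code point.
std::optional<uint32_t> parse_hex_reference(std::string_view digits)
{
    if (digits.empty())
        return std::nullopt;
    uint32_t code_point = 0;
    for (char c : digits) {
        auto digit = hex_digit_value(c);
        if (!digit.has_value())
            return std::nullopt;
        code_point = code_point * 16 + *digit;
        // Simplification: bail out past the Unicode range instead of
        // substituting a replacement character.
        if (code_point > 0x10FFFF)
            return std::nullopt;
    }
    return code_point;
}
```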
Andreas Kling
483b371a7b
LibWeb: Parse and match the :visited pseudo-class (always fails)
...
If we don't do this, something like "a:visited" is parsed as "a" which
may then take precedence over a previous "a:link" etc.
2020-06-13 00:23:30 +02:00
Andreas Kling
fdfda6dec2
AK: Make string-to-number conversion helpers return Optional
...
Get rid of the weird old signature:
- int StringType::to_int(bool& ok) const
And replace it with sensible new signature:
- Optional<int> StringType::to_int() const
2020-06-12 21:28:55 +02:00
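A small sketch of the same signature shape in standard C++, using std::optional and a free function in place of AK::Optional and the StringType member. It is an analogy, not the AK implementation.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Returns the parsed value, or an empty optional on any failure, so the
// caller no longer needs a separate `bool& ok` out-parameter.
std::optional<int> to_int(std::string_view text)
{
    int value = 0;
    const char* begin = text.data();
    const char* end = begin + text.size();
    auto result = std::from_chars(begin, end, value);
    // Reject empty input, out-of-range values, and trailing garbage.
    if (result.ec != std::errc() || result.ptr != end)
        return std::nullopt;
    return value;
}
```

A caller can then write `if (auto n = to_int(input); n.has_value())` and use `*n`, keeping the success check and the value together.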
Andreas Kling
bd33bfd120
LibWeb: Whine about unrecognized CSS properties in debug log
2020-06-12 14:15:55 +02:00
Andreas Kling
03da686aa2
LibWeb: Ignore backslashes (\) in attribute selectors
...
This makes us at least parse selectors like [foo=bar\ baz] correctly.
The current solution here is quite hackish but the real fix will come
when we implement a spec-compliant CSS parser.
2020-06-10 15:50:07 +02:00
Andreas Kling
65c4e5cacf
LibWeb: Parse and match basic "contains" attribute selectors (~=)
2020-06-10 15:43:41 +02:00
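For reference, [att~=val] matches when the attribute value, read as a whitespace-separated list of words, contains exactly "val". A minimal sketch with a hypothetical function name; it splits on whatever the stream treats as whitespace, which is a simplification.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not LibWeb's selector engine.
bool attribute_contains_word(const std::string& attribute_value, const std::string& word)
{
    // An empty selector value never matches anything.
    if (word.empty())
        return false;
    std::istringstream stream(attribute_value);
    std::string token;
    // Tokens never contain whitespace, so a `word` that contains
    // whitespace can never match either.
    while (stream >> token) {
        if (token == word)
            return true;
    }
    return false;
}
```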
Andreas Kling
e836f09094
LibWeb: Fix parser interpreting "&quot;" as "&quot"
...
There was a logic mistake in the entity parser that chose the shorter
matching entity instead of the longer. Fix this and make the entity
lists constexpr while we're here.
2020-06-10 10:34:28 +02:00
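A sketch of longest-match entity lookup over a constexpr table. The four-entry table and the names below are illustrative assumptions; the real entity list is far larger.

```cpp
#include <array>
#include <optional>
#include <string_view>

struct Entity {
    std::string_view name; // text following the '&'
    char32_t code_point;
};

// Tiny hypothetical table, for illustration only.
constexpr std::array<Entity, 4> entities { {
    { "quot", U'"' },
    { "quot;", U'"' },
    { "amp", U'&' },
    { "amp;", U'&' },
} };

// `input` is the text immediately after '&'. Among all entries that match
// the start of the input, prefer the longest one.
std::optional<Entity> longest_entity_match(std::string_view input)
{
    std::optional<Entity> best;
    for (const auto& entity : entities) {
        bool matches = input.substr(0, entity.name.size()) == entity.name;
        if (matches && (!best.has_value() || entity.name.size() > best->name.size()))
            best = entity;
    }
    return best;
}
```

Preferring the longer entry means the trailing ";" is consumed as part of the entity instead of being left behind as text.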
Andreas Kling
9b17bf3dcd
LibWeb: Use HTML::TagNames globals in the new HTML parser
2020-06-07 23:53:16 +02:00
Andreas Kling
1d94ca7cfc
LibWeb: Fix codepoint_from_entity() never returning an error
...
If we don't find a matching entity, return an empty Optional.
2020-06-07 19:13:56 +02:00
Andreas Kling
ab4c03ce2d
LibWeb: Fix tokenizer swallowing an extra token after a named entity
2020-06-07 19:09:03 +02:00
Andreas Kling
731685468a
LibWeb: Start fleshing out support for relative CSS units
...
This patch introduces support for more than just "absolute px" units in
our Length class. It now also supports "em" and "rem", which are units
relative to the font-size of the current layout node and the <html>
element's layout node respectively.
2020-06-07 17:55:46 +02:00
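To illustrate how the relative units described above resolve, here is a simplified sketch. The struct and function are hypothetical stand-ins, not LibWeb's Length class, and the two font sizes are passed in directly instead of being read from layout nodes.

```cpp
// Hypothetical, simplified stand-in for a CSS length.
struct Length {
    enum class Unit { Px, Em, Rem };
    double value;
    Unit unit;
};

// Resolve a length to absolute pixels.
// node_font_size_px: font-size of the current layout node.
// root_font_size_px: font-size of the <html> element's layout node.
double to_px(const Length& length, double node_font_size_px, double root_font_size_px)
{
    switch (length.unit) {
    case Length::Unit::Px:
        return length.value;
    case Length::Unit::Em:
        return length.value * node_font_size_px;
    case Length::Unit::Rem:
        return length.value * root_font_size_px;
    }
    return 0.0; // not reached for valid units
}
```

For example, 2em on a node with a 10px font resolves to 20px, while 2rem with a 16px root font resolves to 32px regardless of the node's own font size.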
Andreas Kling
be6abce44f
LibWeb: Handle EOF tokens during "text" insertion
2020-06-06 16:36:18 +02:00
Luke
61d5bec739
LibWeb: Fully implement all script tokenizer states
...
Also fixes RAWTEXTLessThanSign having a separate emit and reconsume.
2020-06-06 09:55:15 +02:00
Andreas Kling
3337365000
LibWeb: Parse param/source/track start tags during "in body" insertion
2020-06-05 21:59:46 +02:00
Andreas Kling
b4591f0037
LibWeb: Fix parsing of "<textarea></textarea>"
...
When handling a "textarea" start tag, we have to ignore the next token
if it's an LF ('\n'). However, we were not switching the tokenizer
state before fetching the lookahead token, and this caused us to force
the tokenizer into the RCDATA state too late, effectively getting it
stuck in that state for way longer than it should be.
Fixes #2508.
2020-06-05 12:05:42 +02:00
Andreas Kling
4e71684a3a
LibWeb: Fix missing tokenizer state change in RCDATALessThanSign
...
We can't RECONSUME_IN after we've used EMIT_CHARACTER since we'll have
returned from the function.
2020-06-05 12:02:30 +02:00
Andreas Kling
b59f4632d5
LibWeb: Unbreak character reference and DOCTYPE parsing post-UTF-8
...
Oops, these were still using the byte-offset cursor. My goodness is it
unergonomic to index into UTF-8 strings, but Dr. Bugaev says it's good.
There is lots of room for improvement here. Just like the rest of the
tokenizer and parser. We'll have to do a few optimization passes over
them once they mature.
2020-06-04 22:09:36 +02:00
Andreas Kling
b6288163f1
LibWeb: Make the new HTML parser parse input as UTF-8
...
We already convert the input to UTF-8 before starting the tokenizer,
so all this patch had to do was switch the tokenizer to use a Utf8View
for its input (and to emit 32-bit codepoints).
2020-06-04 21:12:17 +02:00
Andreas Kling
19190267a6
LibWeb: Fix incorrectly consumed characters after reference tokens
...
The NumericCharacterReferenceEnd tokenizer state should not advance
the input stream.
2020-06-04 16:49:21 +02:00
Andreas Kling
ca33bc7895
LibWeb: Fix tokenization of tags with empty attributes
...
We were neglecting to emit start tags for tags where the last attribute
had no value.
Also fix a parse error TODO that I hit while looking at this.
2020-06-04 12:00:09 +02:00
Kyle McLean
b9549078cc
LibWeb: Handle "html" end tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
a3bf3a5d68
LibWeb: Handle "xmp" start tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
c70bd0ba58
LibWeb: Handle "nobr" start tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
22521e57fd
LibWeb: Handle "form" end tag during "in body" if stack of open elements does not contain "template"
2020-06-04 09:09:33 +02:00
Kyle McLean
4edd0643a6
LibWeb: Handle NULL character during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
5e3972a946
LibWeb: Parse "body" end tags during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
1ad81e4833
LibWeb: Parse "br" end tags during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
9fca4b56d3
LibWeb: Parse end tags for "applet", "marquee", and "object" during "in body"
2020-06-04 09:09:33 +02:00