Now that the interpreter is unrolled, we can advance the program counter
manually based on the current instruction type.
This makes most instructions a bit smaller. :^)
This commit adds a HANDLE_INSTRUCTION macro that expands to everything
needed to handle a single instruction (invoking the handler function,
checking for exceptions, and advancing the program counter).
This gives a ~15% speed-up on a for loop microbenchmark, and makes
basically everything faster.
Instead of storing a BasicBlock* and forcing the size of Label to be
sizeof(BasicBlock*), we now store the basic block index as a u32.
This means the final version of the bytecode is able to keep labels
at sizeof(u32), shrinking the size of many instructions. :^)
Instead of storing source offsets with each instruction, we now keep
them in a side table in Executable.
This shrinks each instruction by 8 bytes, further improving locality.
Instead of keeping bytecode as a set of disjoint basic blocks on the
malloc heap, bytecode is now a contiguous sequence of bytes(!)
The transformation happens at the end of Bytecode::Generator::generate()
and the only really hairy part is rerouting jump labels.
This required solving a few problems:
- The interpreter execution loop had to change quite a bit, since we
were storing BasicBlock pointers all over the place, and control
transfer was done by redirecting the interpreter's current block.
- Exception handlers & finalizers are now stored per-bytecode-range
in a side table in Executable.
- The interpreter now has a plain program counter instead of a stream
iterator. This actually makes error stack generation a bit nicer
since we just have to deal with a number instead of reaching into
the iterator.
This yields a 25% performance improvement on this microbenchmark:
for (let i = 0; i < 1_000_000; ++i) { }
But basically everything gets faster. :^)
Before this change, all JumpFoo instructions inherited from Jump, which
forced the unconditional Jump to have an unusued "false target" member.
Also, labels were unnecessarily wrapped in Optional<>.
By defining each jump instruction separately, they all shrink in size,
and all ambiguity is removed.
If all lexical declaration use local variables then there is no need
to allocate declarative environment.
With this change we skip ~3x more environment allocations on Github.
We already had fast access to own properties via shape-based IC.
This patch extends the mechanism to properties on the prototype chain,
using the "validity cell" technique from V8.
- Prototype objects now have unique shape
- Each prototype has an associated PrototypeChainValidity
- When a prototype shape is mutated, every prototype shape "below" it
in any prototype chain is invalidated.
- Invalidation happens by marking the validity object as invalid,
and then replacing it with a new validity object.
- Property caches keep a pointer to the last seen valid validity.
If there is no validity, or the validity is invalid, the cache
misses and gets repopulated.
This is very helpful when using JavaScript to access DOM objects,
as we frequently have to traverse 4+ prototype objects before finding
the property we're interested in on e.g EventTarget or Node.
We can now tell the difference between an own property access and a
subsequent (automatic) prototype chain access.
This will be used to implement caching of prototype chain accesses.
If a function has the following properties:
- uses only local variables and registers
- does not use `this`
- does not use `new.target`
- does not use `super`
- does not use direct eval() calls
then it is possible to entirely skip function environment allocation
because it will never be used
This change adds gathering of information whether a function needs to
access `this` from environment and updates `prepare_for_ordinary_call()`
to skip allocation when possible.
For now, this optimisation is too aggressively blocked; e.g. if `this`
is used in a function scope, then all functions in outer scopes have to
allocate an environment. It could be improved in the future, although
this implementation already allows skipping >80% of environment
allocations on Discord, GitHub and Twitter.
This does two things:
* Clear exceptions when transferring control out of a finalizer
Otherwise they would resurface at the end of the next finalizer
(see test the new test case), or at the end of a function
* Pop one scheduled jump when transferring control out of a finalizer
This removes one old FIXME
Before this change both ExecutionContext and CallFrame were created
before executing function/module/script with a couple exceptions:
- executable created for default function argument evaluation has to
run in function's execution context.
- `execute_ast_node()` where executable compiled for ASTNode has to be
executed in running execution context.
This change moves all members previously owned by CallFrame into
ExecutionContext, and makes two exceptions where an executable that does
not have a corresponding execution context saves and restores registers
before running.
Now, all execution state lives in a single entity, which makes it a bit
easier to reason about and opens opportunities for optimizations, such
as moving registers and local variables into a single array.
For this case to work correctly in the current bytecode world:
func(a, a++)
We have to put the function arguments in temporaries instead of allowing
the postfix increment to modify `a` in place.
This fixes a problem where jQuery.each() would skip over items.
The following command was used to clang-format these files:
clang-format-18 -i $(find . \
-not \( -path "./\.*" -prune \) \
-not \( -path "./Base/*" -prune \) \
-not \( -path "./Build/*" -prune \) \
-not \( -path "./Toolchain/*" -prune \) \
-not \( -path "./Ports/*" -prune \) \
-type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")
There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
This allows to skip iterating through all allocated blocks in
`find_min_and_max_block_addresses()`.
With this change `collect_garbage()` in profiles of Discord goes down
from 17% to 8%.
There's no need to capture things as Handle when using HeapFunction.
In this case, it was even creating a strong reference cycle, which ended
up leaking.
currently crashes with an assertion failure in `String::repeated` if
malloc can't serve a `count * input_size` sized request, so add
`String::repeated_with_error` to propagate the error.
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
When running with --log-all-js-exceptions, we will print the message
and backtrace for every single JS exception that is thrown, not just
the ones nobody caught.
This can sometimes be very helpful in debugging sites that swallow
important exceptions.
This fixes the relevant warnings when running LibJSGCVerifier. Note that
the analysis is only performed over LibJS-adjacent code, but could be
performed over the entire codebase. That will have to wait for a future
commit.
This lets us avoid false positives when a GCPtr-wrapped member is only
a weak reference which is automatically updated by the GC when the
member's gc state is updated.