GoSuB Browser Progress, pt23

Created at 2023-11-06 08:04:27

We've finally passed all the tree-construction parser tests!

Well, almost. We're at 99.89%: 4 tests remain, and they require a JavaScript scripting engine like V8 or SpiderMonkey. It will be a long time before we can implement that, so for now the parser, as a functional entity, is done.

However, we still have a lot of tinkering to do. I tried to parse the top 100K domains (taken from the Majestic Million list) to see how well the parser would behave. We already hit problems within the first 10 domains :-)

It turns out that the apple.com site contains a large blob of JavaScript data. Our tokenizer doesn't handle this well: each character is tokenized separately and emitted to the parser. There, each token flows through the parser and ends up being merged into a temporary store, which is converted to a text blob once the script end tag is found.
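To make the overhead concrete, here is a minimal sketch of that flow. The `Token` enum and both function names are made up for illustration and are not the actual Gosub types; the sketch only assumes what is described above: one token per character, merged back into a buffer until the script end tag shows up.

```rust
/// Hypothetical token type for illustration; not the real Gosub API.
#[derive(Debug, PartialEq)]
enum Token {
    Text(String),
    EndTag(String),
}

/// Tokenizer side: emits one `Text` token per character of the script body,
/// followed by the script end tag.
fn tokenize_per_char(script_body: &str) -> Vec<Token> {
    let mut tokens: Vec<Token> = script_body
        .chars()
        .map(|c| Token::Text(c.to_string()))
        .collect();
    tokens.push(Token::EndTag("script".to_string()));
    tokens
}

/// Parser side: merges every text token into a temporary buffer and returns
/// it as one text blob once the script end tag is seen.
/// Returns `None` if no script end tag is found.
fn collect_script_text(tokens: &[Token]) -> Option<String> {
    let mut pending = String::new();
    for token in tokens {
        match token {
            Token::Text(t) => pending.push_str(t),
            Token::EndTag(name) if name.as_str() == "script" => return Some(pending),
            Token::EndTag(_) => {}
        }
    }
    None
}
```

Calling `tokenize_per_char("var x = 1;")` yields 11 tokens (10 characters plus the end tag), and `collect_script_text` then has to stitch them back into the original string. For a script blob of a few hundred kilobytes, that means hundreds of thousands of tiny tokens and allocations for what ends up as a single text node.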

This is a lot of overhead that we do not want. Therefore, we are looking into making the tokenizer "greedy": if it can add more characters to the current token, it will. The result is that only a single token (containing the whole script blob) is emitted to the parser, which will save a lot of time. Both the tokenizer and parser are ready to receive multi-character tokens, since this was the original mode of working. That changed because we encountered some edge cases where we need single newlines or spaces. These edge cases must be dealt with separately, but we might be able to handle them correctly in the tokenizer.
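A rough sketch of what "greedy" could look like for script data is shown below. This is not the Gosub implementation: the `Token` enum and `tokenize_script_greedy` are hypothetical names, and the end-tag detection is deliberately simplified to a case-insensitive match on the literal `</script>`, ignoring the other script-data states and the newline and space edge cases mentioned above.

```rust
/// Hypothetical token type for illustration; not the real Gosub API.
#[derive(Debug, PartialEq)]
enum Token {
    Text(String),
    EndTag(String),
}

/// Returns true if `rest` begins with `</script>` (ASCII case-insensitive).
/// Simplification: a real tokenizer also has to handle whitespace, `/` and
/// attributes after the tag name.
fn at_script_end_tag(rest: &str) -> bool {
    const END_TAG: &str = "</script>";
    rest.get(..END_TAG.len())
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case(END_TAG))
}

/// Greedy tokenizer for script contents: keeps appending characters to the
/// current text token for as long as it can, and only emits it when the
/// script end tag (or the end of input) is reached.
fn tokenize_script_greedy(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut rest = input;

    while let Some(c) = rest.chars().next() {
        if at_script_end_tag(rest) {
            // Flush the accumulated text as one token, then emit the end tag.
            if !current.is_empty() {
                tokens.push(Token::Text(std::mem::take(&mut current)));
            }
            tokens.push(Token::EndTag("script".to_string()));
            return tokens;
        }
        current.push(c);
        rest = &rest[c.len_utf8()..];
    }

    // End of input without an end tag: still emit what was collected.
    if !current.is_empty() {
        tokens.push(Token::Text(current));
    }
    tokens
}
```

With this approach, `tokenize_script_greedy("var x = 1;</script>")` produces just two tokens: one `Text` token holding the whole script body and one `EndTag`. The parser then receives the blob in one piece instead of rebuilding it character by character.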

In the meantime, a lot of work is being done on the CSS3 parser, and we are making great progress on getting initial functionality ready that we can connect to the DOM interface. But again, we can't really do anything with this before we have a script engine implemented.

gosub rust

About jaytaph

Codemuser extraordinaire

Loves building crazy and insane stuff. Happiest when left alone. All I wanted was a Pepsi, just a Pepsi.
Joined: March 24, 2023