GoSuB Browser Progress, pt23
We've finally passed all the tree-construction parser tests!
Well, almost. We're at 99.89%: the 4 remaining tests require a JavaScript engine such as V8 or SpiderMonkey. It will be a long time before we can implement that, so for now the parser as a functional entity is done.
However, we still have a lot of tinkering to do. I've tried to parse the top 100K domains (taken from the Majestic Million list) to see how well the parser behaves. We already hit problems within the first 10 domains :-)
It turns out that the apple.com site contains a large blob of JavaScript data. Our tokenizer doesn't handle this well: each character is tokenized separately and emitted to the parser. There, it flows through the parser and ends up being merged into a temporary store, which is converted to a text blob once the script end tag is found.
This is a lot of overhead that we do not want. Therefore, we are looking into making the tokenizer "greedy": if it can add more characters to the current token, it will. The parser then receives a single token containing the whole script blob, which saves a lot of time. Both the tokenizer and the parser can already handle multi-character tokens, since this was the original mode of working. We moved away from it because we encountered some edge cases where we need single newlines or spaces. These edge cases must be dealt with separately, but we might be able to handle them correctly in the tokenizer.
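To make the difference concrete, here is a minimal sketch in Rust of per-character versus greedy tokenizing of script data. It is not the actual GoSuB code: the `Token` type and the functions `find_script_end`, `tokenize_per_char`, and `tokenize_greedy` are hypothetical names made up for this illustration. It also assumes, as a simplification, that script data ends at the first literal `</script>`, which ignores the escaped-script states and the whitespace edge cases mentioned above.

```rust
// Hypothetical token type for this sketch; not the real GoSuB token type.
#[derive(Debug, PartialEq)]
enum Token {
    Text(String),
    EndTag(String),
}

// Byte offset of the first "</script>" (ASCII case-insensitive), if any.
// Simplifying assumption: the real HTML tokenizer rules are more involved.
fn find_script_end(input: &str) -> Option<usize> {
    const CLOSE: &[u8] = b"</script>";
    input
        .as_bytes()
        .windows(CLOSE.len())
        .position(|w| w.eq_ignore_ascii_case(CLOSE))
}

// Per-character mode: one Text token for every character of script data.
fn tokenize_per_char(input: &str) -> Vec<Token> {
    let end = find_script_end(input).unwrap_or(input.len());
    let mut tokens: Vec<Token> = input[..end]
        .chars()
        .map(|c| Token::Text(c.to_string()))
        .collect();
    if end < input.len() {
        tokens.push(Token::EndTag("script".to_string()));
    }
    tokens
}

// Greedy mode: keep extending the current token for as long as possible,
// so the whole script body is emitted as a single Text token.
fn tokenize_greedy(input: &str) -> Vec<Token> {
    let end = find_script_end(input).unwrap_or(input.len());
    let mut tokens = Vec::new();
    if end > 0 {
        tokens.push(Token::Text(input[..end].to_string()));
    }
    if end < input.len() {
        tokens.push(Token::EndTag("script".to_string()));
    }
    tokens
}
```

For a script body of N characters, the per-character version hands the parser N text tokens that it has to merge again, while the greedy version hands it one. On a multi-kilobyte script blob that is the overhead we want to get rid of.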
In the meantime, a lot of work has been done on the CSS3 parser, and we are making good progress on initial functionality that we can connect to the DOM interface. But again, we can't really do anything with this before we have a script engine implemented.
About jaytaph
Codemuser extraordinaire