GoSuB Browser Progress, pt9
Now that the tokenizer is done, I seem to be able to tokenize most of the HTML files I've tested. That's good!
So I'm spending my time on creating the initial parser of the system. This is quite similar to the tokenizer, except it works on the tokens generated by the tokenizer and not on a raw input stream. Basically it's just another big state machine that keeps track of flags and stacks (like the stack of open elements), making sure the token input follows the HTML5 rules.
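To give an idea of what "a state machine that works on tokens" looks like, here is a rough toy sketch in Python. It is not the actual GoSuB code: the `Node` class, the `(kind, value)` token tuples and the three mode names are assumptions I made up for illustration, and the real HTML5 spec has many more insertion modes and rules than this.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: str = ""
    children: list = field(default_factory=list)


def parse(tokens):
    """Toy tree builder that consumes (kind, value) tokens, not characters."""
    document = Node("#document")
    open_elements = [document]  # stack of currently open elements
    mode = "before html"        # the current state of the parser

    def insert(name):
        node = Node(name)
        open_elements[-1].children.append(node)
        open_elements.append(node)

    for kind, value in tokens:
        handled = False
        while not handled:
            if mode == "before html":
                # A missing <html> tag is implied; the token is then reprocessed.
                handled = (kind, value) == ("start", "html")
                insert("html")
                mode = "before body"
            elif mode == "before body":
                # Same trick for a missing <body> tag.
                handled = (kind, value) == ("start", "body")
                insert("body")
                mode = "in body"
            else:  # "in body"
                if kind == "start":
                    insert(value)
                elif kind == "text":
                    open_elements[-1].children.append(Node("#text", value))
                elif kind == "end":
                    # Close up to the matching element, never past <body>.
                    if any(n.name == value for n in open_elements[3:]):
                        while open_elements.pop().name != value:
                            pass
                handled = True
    return document


def dump(node, depth=0):
    """Render the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


tree = parse([("start", "p"), ("text", "hi"), ("end", "p")])
print("\n".join(dump(tree)))
```

Even in this tiny version you can see the pattern: the same token is handled differently depending on the current mode, and a token can be "reprocessed" after the parser has inserted the elements the author left out.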
This is normally a big hassle to write by hand, which is why many tokenizers and parsers are actually generated by tools like bison or ANTLR. In those cases, you define the rules in a dedicated grammar language, and the tool generates the complete tokenizer / parser source code for you.
Anyway, we are doing things manually, and there are a lot of new things I'm learning about (like foster parenting, and obscure rules about where certain tags may or may not appear). All this should result in a parser that outputs an initial node tree, which will later become the DOM tree.
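Foster parenting, for example, is what happens when content shows up directly inside a table where it isn't allowed: instead of becoming a child of the table, the node gets inserted just before the table in the table's parent. A minimal sketch of only that common case follows. The `Node` class and `foster_insert` helper are hypothetical names for illustration, and the case where the table has no parent is left out.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)


def foster_insert(table, node):
    """Insert node immediately before table in the table's parent."""
    parent = table.parent
    # Find the table by identity, not equality.
    index = next(i for i, c in enumerate(parent.children) if c is table)
    node.parent = parent
    parent.children.insert(index, node)


body = Node("body")
table = Node("table")
body.append(table)

# Text found directly inside <table> is fostered out in front of it.
stray_text = Node("#text")
foster_insert(table, stray_text)
print([child.name for child in body.children])
```

So for input like `<table>x</table>`, the text ends up as a sibling in front of the table rather than inside it.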
Fortunately, there are parser tests available in the html5lib-tests suite, so I can check the parser for correctness. For now, it's a loop of reading the spec, implementing each state, going over each state a million times to see where I screwed up, fixing it, fixing it again, refactoring, and fixing it once more. Hopefully in a few days I'll have a working parser.
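The tree-construction tests in html5lib-tests are plain-text `.dat` files, where each test has sections such as `#data`, `#errors` and `#document`, and the expected tree is written with `| ` prefixed lines. Below is a small sketch of splitting one test into its sections. The `parse_dat_test` function and the sample are my own illustration (the sample is hand-written, including its error line, not copied from the suite), and it ignores edge cases such as input data that itself contains a line equal to a section header.

```python
# Section headers used by the tree-construction test format.
HEADERS = {
    "#data", "#errors", "#new-errors", "#document-fragment",
    "#script-off", "#script-on", "#document",
}

SAMPLE = """#data
<p>hi
#errors
(1,3): illustrative-error-line
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "hi"
"""


def parse_dat_test(text):
    """Split a single test into {section name: list of lines}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line in HEADERS:
            current = line[1:]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


test = parse_dat_test(SAMPLE)
print(test["data"])
print("\n".join(test["document"]))
```

From there the idea is simple: feed the `data` section through the tokenizer and parser, serialize the resulting node tree in the same `| ` format, and compare it line by line with the `document` section.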
About jaytaph
Codemuser extraordinaire