GoSuB Browser Progress, pt9
Now that the tokenizer is done, I can tokenize most of the HTML files I've tested. That's good!
So I'm spending my time on the initial parser of the system. This is quite similar to the tokenizer, except it works on the tokens the tokenizer generates instead of on an input stream. Basically it's just another big state machine that sets flags, pushes things onto stacks, and makes sure the token stream follows the HTML5 rules.
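To give an idea of the shape of this state machine: the spec calls the parser states "insertion modes", and every token is dispatched based on the current mode, sometimes being handed back for reprocessing after a mode switch. Here's a minimal sketch in Rust (which the project uses) with made-up `Token` and `InsertionMode` types, covering only three of the spec's twenty-plus insertion modes; the actual GoSuB code will look quite different:

```rust
#[derive(Debug)]
enum Token {
    StartTag(String),
    EndTag(String),
    Eof,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum InsertionMode {
    Initial,
    BeforeHtml,
    BeforeHead,
    InHead,
    // ...the real spec has over twenty of these
}

struct TreeBuilder {
    mode: InsertionMode,
    open_elements: Vec<String>, // the stack of open elements
}

impl TreeBuilder {
    fn new() -> Self {
        TreeBuilder { mode: InsertionMode::Initial, open_elements: Vec::new() }
    }

    // Feed one token; keep reprocessing while a step hands it back.
    fn process(&mut self, token: Token) {
        let mut pending = Some(token);
        while let Some(t) = pending.take() {
            pending = self.step(t);
        }
    }

    // One transition of the state machine. Returning Some(token) means
    // "reprocess this token in the new insertion mode", like the spec does.
    fn step(&mut self, token: Token) -> Option<Token> {
        match self.mode {
            InsertionMode::Initial => {
                // The real spec handles the DOCTYPE here; this sketch moves on.
                self.mode = InsertionMode::BeforeHtml;
                Some(token)
            }
            InsertionMode::BeforeHtml => match token {
                Token::StartTag(ref name) if name == "html" => {
                    self.open_elements.push("html".to_string());
                    self.mode = InsertionMode::BeforeHead;
                    None
                }
                other => {
                    // Anything else: insert an implicit <html> and reprocess.
                    self.open_elements.push("html".to_string());
                    self.mode = InsertionMode::BeforeHead;
                    Some(other)
                }
            },
            InsertionMode::BeforeHead => match token {
                Token::StartTag(ref name) if name == "head" => {
                    self.open_elements.push("head".to_string());
                    self.mode = InsertionMode::InHead;
                    None
                }
                other => {
                    // Same trick: insert an implicit <head> and reprocess.
                    self.open_elements.push("head".to_string());
                    self.mode = InsertionMode::InHead;
                    Some(other)
                }
            },
            InsertionMode::InHead => None, // sketch ends here
        }
    }
}
```

Note how even `<head>hello` (no `<html>` tag at all) ends up with both `html` and `head` on the stack: the reprocessing trick is how the spec quietly inserts all the elements you didn't write.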
This is normally a big hassle to write by hand, which is why most tokenizers and parsers are actually generated by tools like bison or ANTLR. There, you define your rules in a custom grammar language, and the tool generates complete tokenizer/parser source code for you. HTML5, however, specifies its parsing algorithm (including all the error recovery) imperatively rather than as a formal grammar, so those tools don't really help here.
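For comparison, input for such a generator looks roughly like this: a hypothetical yacc/bison-style fragment for a toy tag language, not anything from the real HTML5 rules (which, as noted, can't be expressed this way):

```
/* Toy grammar fragment, illustration only */
element : '<' NAME attrs '>' content '</' NAME '>'
        | '<' NAME attrs '/>'
        ;
attrs   : /* empty */
        | attrs NAME '=' STRING
        ;
```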
Anyway, we are doing things manually, and I'm learning about a lot of new things along the way (like foster parenting, and obscure rules about where certain tags may or may not appear). All this should result in a parser that outputs an initial node tree, which will later become the DOM tree.
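Foster parenting is a nice example of how weird these rules get: content that appears directly inside a `<table>` but isn't table-related gets moved ("fostered") to just before the table. Roughly (tree sketched by hand, not output from a real parser):

```
<table><div>hello</div></table>

becomes

<body>
  <div>hello</div>   <!-- fostered out, placed before the table -->
  <table></table>
</body>
```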
Fortunately, there are parser tests available in the html5lib-tests suite, so I can check the parser for correctness. For now it's a cycle of reading the spec, implementing each state, going over each state a million times to see where I screwed up, fixing it, fixing it again, refactoring it, fixing it again, and hopefully in a few days I'll have a working parser.
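The tree-construction tests in that suite are plain `.dat` files with `#data`, `#errors`, and `#document` sections, where the expected tree is dumped with each node prefixed by `| ` and indented two spaces per depth level. Something like this (an illustrative case, not copied verbatim from the suite):

```
#data
<p>One<p>Two
#errors
(error list omitted)
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"
|     <p>
|       "Two"
```

Note how the second `<p>` implicitly closes the first, so they end up as siblings; comparing my parser's tree dump against these files catches exactly this kind of subtlety.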
March 24, 2023