GoSuB Browser Progress, pt3
Finally done with the named character references. This took too long because I could not get the business logic right. Different code bases parse (incorrect) entities in different ways, and the HTML5 standard isn't always clear about it.
I've decided to base the code on the golang.org/x/net/html parser, which probably isn't completely correct, but handles most of the test cases I have.
Now I need to fill in the rest of the data states and actually allow the tokenizer to emit tokens. I also need to get my head around the fact that Rust doesn't deal with memory the way Go does: there is no garbage collector, so it is more like C/C++, where you need to think about it yourself (although Rust seems to be more capable of detecting issues).
So where in Go it might be OK to copy strings all over the place, in Rust we should probably work with references and copy strings at the last moment, especially when dealing with long tokens like TextTokens that can contain a lot of text. In those cases, you want the tokenizer to walk over all chars but only create the token at the last moment, or maybe better: copy everything up to the point where we encounter a replacement (like a named entity), add that entity to the list, and continue. That would minimize the number of copies, I guess. I'm not sure. I'm just drunk...
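A minimal sketch of that "copy until we hit a replacement" idea, not the actual tokenizer code. It assumes only the simple `&name;` form, and `lookup_entity` and `decode_text` are hypothetical names with a four-entry stand-in for the real named character reference table. If no replacement is ever found, nothing is copied at all and the input slice is handed back as-is.

```rust
use std::borrow::Cow;

/// Hypothetical, tiny stand-in for the real named character reference table.
fn lookup_entity(name: &str) -> Option<&'static str> {
    match name {
        "amp" => Some("&"),
        "lt" => Some("<"),
        "gt" => Some(">"),
        "quot" => Some("\""),
        _ => None,
    }
}

/// Walks over `input` once and only copies when a replacement is found.
fn decode_text(input: &str) -> Cow<'_, str> {
    let mut out: Option<String> = None; // allocated lazily, on the first replacement
    let mut copied_up_to = 0; // start of the plain run not yet copied into `out`
    let mut pos = 0; // current scan position (byte offset)

    while let Some(offset) = input[pos..].find('&') {
        let amp = pos + offset;
        // Take everything up to the next ';' as the candidate entity name.
        let replacement = input[amp + 1..]
            .find(';')
            .and_then(|len| lookup_entity(&input[amp + 1..amp + 1 + len]).map(|r| (r, len)));

        match replacement {
            Some((text, len)) => {
                let buf = out.get_or_insert_with(|| String::with_capacity(input.len()));
                buf.push_str(&input[copied_up_to..amp]); // one bulk copy of the plain run
                buf.push_str(text); // then the replacement itself
                pos = amp + 1 + len + 1; // skip past `&name;`
                copied_up_to = pos;
            }
            // Not a known entity: leave the '&' in place and keep scanning.
            None => pos = amp + 1,
        }
    }

    match out {
        Some(mut buf) => {
            buf.push_str(&input[copied_up_to..]); // copy the tail after the last entity
            Cow::Owned(buf)
        }
        None => Cow::Borrowed(input), // no replacements, so no copy at all
    }
}
```

Calling `decode_text("a &lt; b")` gives an owned `"a < b"`, while text without any known entity comes back as a borrowed slice of the original input. Whether this beats simply pushing char by char would need measuring; the sketch only shows that the number of copies can be tied to the number of replacements rather than the length of the text.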
About jaytaph
Codemuser extraordinaire