GoSuB Browser Progress, pt3

Created at 2023-08-16 10:23:51 (1 year ago)

Finally done with the named character references. This took too long because I could not get the business logic right. Different code bases parse (incorrect) entities in different ways, and the HTML5 standard isn't always that clear about it.

I've decided to base the code on the golang.org/x/net/html parser, which probably isn't completely correct, but passes most of the test cases I have.

Now I need to fill in the rest of the data states, and actually allow the tokenizer to emit tokens. I also need to get my head around the fact that Rust isn't a language that deals with memory like Go does, but is more like C/C++, where you need to manage it yourself (although Rust seems to be more capable of detecting issues).

So where in Go it might be OK to copy strings around the place, here we should probably work with pointers and copy strings at the last moment, especially when dealing with long tokens like TextTokens that can contain a lot of text. In those cases, you want the tokenizer to walk over all chars, but only create the token at the last moment, or maybe better: copy everything up to the point where we encounter a replacement (like a named entity), add that entity to the list, and continue. That would minimize the number of copies, I guess. I'm not sure... I'm just drunk...
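
To make the "copy only when we hit a replacement" idea a bit more concrete, here is a rough sketch using `std::borrow::Cow`. This is not the actual GoSuB tokenizer code: `lookup_entity` and `decode_text` are made-up names, the entity table is a four-entry stand-in for the real named character reference list, and the matching is far simpler than what the HTML5 spec asks for (it only accepts entities terminated by `;`). The point is only that text without entities is never copied, and text with entities is copied once, in chunks.

```rust
use std::borrow::Cow;

/// Hypothetical lookup: a tiny stand-in for the full HTML5 named
/// character reference table.
fn lookup_entity(name: &str) -> Option<&'static str> {
    match name {
        "amp" => Some("&"),
        "lt" => Some("<"),
        "gt" => Some(">"),
        "quot" => Some("\""),
        _ => None,
    }
}

/// Decode known entities in `input`. If nothing needs replacing, the
/// input is returned borrowed and no allocation happens at all.
fn decode_text(input: &str) -> Cow<'_, str> {
    let mut out: Option<String> = None; // allocated lazily on the first replacement
    let mut copied_up_to = 0; // byte offset of input already flushed into `out`
    let mut pos = 0; // byte offset where the next search starts

    while let Some(rel) = input[pos..].find('&') {
        let amp = pos + rel;
        // Simplification: only entities terminated by ';' are recognised.
        let replacement = input[amp + 1..].find(';').and_then(|semi_rel| {
            let name = &input[amp + 1..amp + 1 + semi_rel];
            lookup_entity(name).map(|text| (text, amp + 1 + semi_rel + 1))
        });
        match replacement {
            Some((text, end)) => {
                let buf = out.get_or_insert_with(|| String::with_capacity(input.len()));
                // Copy the untouched chunk before the entity, then the replacement.
                buf.push_str(&input[copied_up_to..amp]);
                buf.push_str(text);
                copied_up_to = end;
                pos = end;
            }
            // Unknown or unterminated: leave the '&' in place and move on.
            None => pos = amp + 1,
        }
    }

    match out {
        Some(mut buf) => {
            // Flush whatever follows the last replacement.
            buf.push_str(&input[copied_up_to..]);
            Cow::Owned(buf)
        }
        None => Cow::Borrowed(input),
    }
}
```

Calling `decode_text("plain text")` hands back a `Cow::Borrowed` pointing at the original slice, while `decode_text("a &lt; b")` builds one owned `String`. A real tokenizer would probably keep offsets into the input buffer instead of a `&str`, but the copying pattern would be the same.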

rust gosub
