GoSuB Browser Progress, pt1
A few days in, and I've learned a lot. Rust is less explicit than, for instance, Go and relies more on syntax, which means it can be hard to figure out what something does or how to do something. The lack of null is also still strange to me. I'm still trying to figure out the correct Rust idioms for certain problems: reading a character from a stream, or doing some cleanup when nothing is found, seems to need a lot of match / Ok() / Err() boilerplate.
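To make the "read a character, handle nothing found" case concrete, here is a minimal sketch of how it could look with less boilerplate. The function names `read_byte` and `skip_to_tag` are made up for illustration and are not taken from the GoSuB code. It assumes that "nothing found" can be modelled as `None` inside an `Option`, and that I/O errors can simply be passed up to the caller with the `?` operator.

```rust
use std::io::{self, Read};

/// Reads the next byte from `reader` (hypothetical helper).
/// Returns Ok(None) at end of input, which takes the place of a null value.
fn read_byte<R: Read>(reader: &mut R) -> io::Result<Option<u8>> {
    let mut buf = [0u8; 1];
    // `?` returns early with the error, so no explicit match on Err is needed.
    let n = reader.read(&mut buf)?;
    Ok(if n == 0 { None } else { Some(buf[0]) })
}

/// Counts the bytes before the first '<', consuming the '<' itself
/// (hypothetical helper). Stops quietly when the input runs out.
fn skip_to_tag<R: Read>(reader: &mut R) -> io::Result<usize> {
    let mut skipped = 0;
    // `while let` keeps looping for as long as a byte is available.
    while let Some(b) = read_byte(reader)? {
        if b == b'<' {
            break;
        }
        skipped += 1;
    }
    Ok(skipped)
}
```

Anything that implements `Read` works here, including a plain byte slice, so `skip_to_tag(&mut &b"ab<p>"[..])` can be tried without touching a file.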
I do the coding in CLion. It seems to be the only IDE that can actually do step debugging, but it's very slow at loading variables. Running it on WSL probably doesn't help.
The code itself isn't much yet, but I've managed to read an HTML stream from a file into my own inputstream object (is that the correct Rust name? I guess not). This stream is a stream of characters decoded according to the encoding given. Later on, there should be a way to detect the encoding from the stream itself. If you change the stream from UTF-8 to ASCII, all UTF-8 characters that span multiple bytes are replaced with a '?', so at least that part is seemingly working. For now I don't need to worry about the streams anymore and can focus on tokenizing and parsing the HTML stream.
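The '?' replacement can be sketched roughly like this. This is not the actual inputstream code: the `Encoding` enum and the `decode` function are hypothetical names, and the sketch assumes each multi-byte character becomes exactly one '?'.

```rust
/// Encodings supported by this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Utf8,
    Ascii,
}

/// Decodes raw bytes into characters for the given encoding (hypothetical helper).
fn decode(bytes: &[u8], encoding: Encoding) -> Vec<char> {
    // Invalid UTF-8 sequences are turned into U+FFFD by from_utf8_lossy.
    let text = String::from_utf8_lossy(bytes);
    match encoding {
        Encoding::Utf8 => text.chars().collect(),
        // Every character outside the ASCII range becomes a single '?'.
        Encoding::Ascii => text
            .chars()
            .map(|c| if c.is_ascii() { c } else { '?' })
            .collect(),
    }
}
```

With this, decoding the bytes of "café" as `Encoding::Ascii` gives the characters of "caf?", while `Encoding::Utf8` leaves the text untouched.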
I've seen that Gecko uses its own, highly optimized encoding system. I'm not planning on using that for now, but maybe we can look into it at a later stage.
I'm currently in the process of tokenizing characters from the given input stream. This procedure is well documented in the HTML standard: https://html.spec.whatwg.org/multipage/parsing.html#the-input-byte-stream.
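To give an idea of what tokenizing means here, below is a deliberately tiny sketch. It is nowhere near the state machine the standard describes: it assumes well-formed input, ignores attributes, comments, doctypes and character references, and the `Token` type and `tokenize` function are hypothetical names, not the GoSuB API.

```rust
/// The three kinds of tokens this sketch knows about (hypothetical type).
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Splits a string into start tags, end tags and text runs.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    let mut text = String::new();

    while let Some(c) = chars.next() {
        if c != '<' {
            text.push(c);
            continue;
        }
        // A '<' ends any pending text run.
        if !text.is_empty() {
            tokens.push(Token::Text(std::mem::take(&mut text)));
        }
        // A '/' directly after '<' marks an end tag.
        let is_end = chars.peek() == Some(&'/');
        if is_end {
            chars.next();
        }
        // Collect the lowercased tag name up to the closing '>'.
        let mut name = String::new();
        for c in chars.by_ref() {
            if c == '>' {
                break;
            }
            name.push(c.to_ascii_lowercase());
        }
        tokens.push(if is_end {
            Token::EndTag(name)
        } else {
            Token::StartTag(name)
        });
    }
    // Flush text that follows the last tag.
    if !text.is_empty() {
        tokens.push(Token::Text(text));
    }
    tokens
}
```

For the input `<P>hi</p>` this produces a start tag `p`, the text `hi`, and an end tag `p`. The real tokenizer has to cope with far messier input, which is exactly why the standard spells out every state.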
For now I will focus on HTML5 / UTF-8 input, which probably covers about 95% of sites. I'm not even sure a "real" browser should support anything else. I know there are still a lot of sites that are not UTF-8 or not HTML5 compliant, but a browser should probably warn the user that such a site is old. If I have to spend the majority of my time fixing other people's issues, it's not worth it, to be honest. If your site isn't compliant, we return an error. Let's get rid of old sites that are non-compliant.
Also, I've decided on the name GoSuB for now, as ChatGPT said that it would stand for: Gateway to Optimized Searching and Unlimited Browsing.
About jaytaph
Codemuser extraordinaire