LOL HTML (Low Output Latency HTML) is a streaming HTML rewriter/parser with a CSS-selector-based API. It is designed to modify HTML on the fly with minimal buffering. It can quickly handle very large documents and operate in environments with limited memory resources. The crate serves as the back-end for the HTML rewriting functionality of Cloudflare Workers, but it can also be used as a standalone library with a convenient API for a wide variety of HTML rewriting and analysis tasks.

Internally, parsing is split between a tag scanner and a full lexer. The parser switches back to the tag scanner as soon as the input leaves the scope of all selector matches. The tag scanner may also occasionally switch the parser to the lexer when it needs additional tag information for the parsing feedback simulation. Having two parser implementations for the same grammar increases development costs and is error-prone, because the implementations can drift out of sync. We minimize these risks with a small macro-based Rust DSL that is similar in spirit to Ragel.
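The switching between the tag scanner and the lexer can be illustrated with a toy state machine. This is a minimal sketch under simplifying assumptions, not lol_html's implementation: the names `Mode`, `next_tag`, and `scan` are hypothetical, the "selector" is reduced to a single tag name, and comments, attribute quoting, and the parsing feedback simulation are ignored. Text is only tokenized while the input is inside a matched element; once the input leaves every match, the sketch drops back to scanning for tags only.

```rust
// Toy illustration of dual-mode parsing. All names are hypothetical;
// this is not lol_html's internal API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    TagScanner, // cheap mode: only looks for tags
    Lexer,      // full mode: also produces text tokens
}

/// Finds the next tag at or after `from`.
/// Returns (is_end_tag, lowercase_tag_name, tag_start, tag_end).
/// Deliberately naive: no comments, no quoted `>` inside attributes.
fn next_tag(input: &str, from: usize) -> Option<(bool, String, usize, usize)> {
    let start = from + input[from..].find('<')?;
    let end = start + input[start..].find('>')? + 1;
    let inner = &input[start + 1..end - 1];
    let (is_end, rest) = match inner.strip_prefix('/') {
        Some(r) => (true, r),
        None => (false, inner),
    };
    let name: String = rest
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric())
        .collect::<String>()
        .to_ascii_lowercase();
    Some((is_end, name, start, end))
}

/// Collects text found inside elements named `target` and records
/// every mode switch that happens along the way.
fn scan(input: &str, target: &str) -> (Vec<String>, Vec<Mode>) {
    let mut mode = Mode::TagScanner;
    let mut switches = vec![mode];
    let mut depth = 0usize; // nesting depth of matched elements
    let mut texts = Vec::new();
    let mut pos = 0;
    while let Some((is_end, name, start, end)) = next_tag(input, pos) {
        // Only the lexer mode emits text between tags.
        if mode == Mode::Lexer && start > pos {
            texts.push(input[pos..start].to_string());
        }
        if name == target {
            if is_end {
                depth = depth.saturating_sub(1);
            } else {
                depth += 1;
            }
        }
        // Inside any match: lexer. Outside all matches: tag scanner.
        let wanted = if depth > 0 { Mode::Lexer } else { Mode::TagScanner };
        if wanted != mode {
            mode = wanted;
            switches.push(mode);
        }
        pos = end;
    }
    (texts, switches)
}

fn main() {
    let html = "<p>skip</p><div>keep <b>me</b></div><p>skip</p>";
    let (texts, switches) = scan(html, "div");
    println!("{:?}", texts); // prints ["keep ", "me"]
    println!("{:?}", switches); // prints [TagScanner, Lexer, TagScanner]
}
```

Tracking a depth counter rather than a boolean keeps the sketch correct for nested elements with the same name, assuming well-formed input.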
Features
- Lexer: a regular full parser that produces output for all types of content it encounters
- Tag scanner: looks for start and end tags and skips parsing the rest of the content
- LOL HTML’s tag scanner is typically twice as fast as LazyHTML, and the lexer has performance comparable to LazyHTML
- Each selector component can be matched using only the element's start tag token
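The point that a selector component can be decided from the start tag alone can be illustrated with a small sketch. The types `StartTag` and `Component`, and the supported forms (tag name, `[attr]`, `.class`), are hypothetical simplifications chosen for this example, not lol_html's selector engine. Every check reads only the tag name and attributes, which are all present in the start tag token, so no lookahead into the element's content is needed.

```rust
// Hypothetical types for illustration; this is not lol_html's selector API.
#[derive(Debug)]
struct StartTag {
    name: String,
    attributes: Vec<(String, String)>,
}

impl StartTag {
    /// Looks up an attribute value by (case-sensitive) name.
    fn attr(&self, name: &str) -> Option<&str> {
        self.attributes
            .iter()
            .find(|(k, _)| k == name)
            .map(|(_, v)| v.as_str())
    }
}

/// One simplified compound selector component, e.g. `a[href].nav`.
#[derive(Debug)]
struct Component {
    tag: Option<String>,           // None acts like the universal selector `*`
    required_attr: Option<String>, // `[attr]`: attribute must be present
    class: Option<String>,         // `.class`: class list must contain it
}

impl Component {
    /// Decides the match from the start tag token alone.
    fn matches(&self, tag: &StartTag) -> bool {
        let name_ok = self
            .tag
            .as_deref()
            .map_or(true, |n| n.eq_ignore_ascii_case(&tag.name));
        let attr_ok = self
            .required_attr
            .as_deref()
            .map_or(true, |a| tag.attr(a).is_some());
        let class_ok = self.class.as_deref().map_or(true, |c| {
            tag.attr("class")
                .map_or(false, |v| v.split_ascii_whitespace().any(|x| x == c))
        });
        name_ok && attr_ok && class_ok
    }
}

fn main() {
    let link = StartTag {
        name: "A".to_string(),
        attributes: vec![
            ("href".to_string(), "/docs".to_string()),
            ("class".to_string(), "nav main".to_string()),
        ],
    };
    let selector = Component {
        tag: Some("a".to_string()),
        required_attr: Some("href".to_string()),
        class: Some("nav".to_string()),
    };
    println!("{}", selector.matches(&link)); // prints "true"
}
```

Because nothing after the start tag is consulted, a streaming rewriter can make this decision immediately and choose whether to switch into the more expensive parsing mode before any of the element's content arrives.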