When using a stylesheets with >3000 rules, computing style for an element iterates all rules.
With a process that requires computed style for multiple elements, this quickly becomes an issue.
In my use case, HtmlSerializer must compute styles for elements to check "display" property if they are block.
Quickfix suggest:
When only a css property value is required, I suggest using a map with <css property=""> key to <rule> value for quick rule access (+ some user accesible method in HTMLElement class) .</rule></css>
Please try the latest Snapshot I did some optimizations already.
And can you please outline your idea a bit more....
Last edit: RBRi 2018-03-04
Looked at CssStyleSheet.java, but didn't found any relevant optimization.
Property to rule list lookup map idea:
The HtmlSerializer computes all styles for a given element and gets only the 'display' value.
Instead of iterating all rules, only the rules containing the 'display' are relevant.
Thus a map can be computed at stylesheet initialization for quick rules lookup, eg:
Selector optimization idea (some reference)
Similarly, lookup maps can be computed for each key selector type (id, class, tag, universal).
Eg, when computing styles for an element, if it has a class, rules are collected from classToRules map, same for id and so forth.
For compound selectors (descendants, etc), only the last selector (key selector) is used for computing the lookup maps (sel "ul li" - "li" is added to tagToRules map).
After rules are collected, the whole selectors must be rechecked if they match the html element.
Both of the above ideas may be merged when computing the 'display' style for an element, first retrieving rules that contain 'display' property, and intersecting them with the rules collected with matching key selectors.
Of course these optimizations trade memory for performance and may have impact.
Another idea is to investigate firefox source code for their optimization.
The optimization was done somewhere else. Why not simply try and see if it helps in your case.
Tested witn 2.30-SNAPSHOT, and found same performance for computing text with large css stylesheet.
Computing text is ~150 times faster when ignoring styles.
Let me know if you want my test source code.
Test code will be great, if you like send it to my privat adress.
Thanks
Last edit: RBRi 2018-03-04
Test source code: https://github.com/cdalexndr/HtmlUnitCssExample_1950
Just run "gradlew test"
Ok, looks like we need to renovate in this field a bit. Have done some checks and i think it is worth the effort; but will require some time.
As first step i have created a fork of the current cssparser and did some cleanup. This will land in the snapshot releases soon.
The plan ist:
* more css parser cleanup
* change the parser to return the selector-parts in reverse order and in a more technical way
* change htmlunit to be able to deal with the new selectors
* create some kind of selector index to be able to look at less selectors if we need the style for an element
* maybe we can move and cache the calculation of the selector specificy to the cssparser
Hope this will help to improve the performance in many cases.
Contributions are welcome.
Hi Alex,
have done some major changes in the css selector handling and finally found some time to use the profiler with your sample.
Your sample is that slow, because the chaching in HtmlUnit does not work with file content so far. Because of this the css was reparsed at every run and this was the reason for your bad results.
Hopefully this is fixed now, chaching works for file content also.
I have also implemented a minimal first version of the selector index. Work fine so far (at least our test suite is happy) and had a minor positive effect for my real world tests (using wetator). If you like you can test the latest snapshot build and/or help with making the index more clever.
Hope you like it, every feedback is welcome.
Have done another update, now this is again notable faster for your sample.
Waiting for your feedback....
Just downloaded snapshot and found a bug:
CSSStyleSheet.java (line 1600):
returns uppercase tag name, but the index contains lowercase tag names, and the following fails and returns no selectors:
CSSStyleSheetImpl.java (SelectorEntriesIterator ctr):
I will return with more feedback.
Thanks for working on this issue.
Last edit: Alex 2018-03-09
Will fix this soon, thanks
On Fri, 09 Mar 2018 16:06:37 -0000 Alex wrote:
Related
Bugs: #1950
Current optimization lowers search count from 3300 rules to 800 wildcard rules + 400 other rules + element rules.
Will try to implement property filter and rule intersection with current optimization and come back with a patch. This will greatly benefit HtmlSerializer that only needs 'display' property.
Last edit: Alex 2018-03-10
Made the patch with this optimization and some cleanup.
Computed styles cache now stores selected property, and can add more when they are required or the whole element computes styles.
Because
CSS2Propertiescontains style logic, I made an unsafe bridge between property value (StyleAttributes.Definition) and the required method to extract it fromCSS2Propertiesand if used incorrectly will cause errors (todo: how to fix this).Currently only HtmlSerializer usage.
Requires merge with the above case-sensitive bug fix (added TODO in code).
Note that my test output is 5 NYI failing tests, and this patch doesn't add any new tests.
Hope it's ok.
Thanks will have a look at this later on, at the moment there is too mutch work on my desk.
Any news regarding my patch? Should you change status from "pending" to not overlook it?