Menu

#1950 CSS computed style too slow for large stylesheets

2.29
pending
RBRi
css (3)
1
2018-03-14
2018-02-10
Alex
No

When using a stylesheets with >3000 rules, computing style for an element iterates all rules.
With a process that requires computed style for multiple elements, this quickly becomes an issue.
In my use case, HtmlSerializer must compute styles for elements to check "display" property if they are block.

Quickfix suggest:
When only a css property value is required, I suggest using a map with <css property=""> key to <rule> value for quick rule access (+ some user accesible method in HTMLElement class) .</rule></css>

Related

Bugs: #1950

Discussion

  • RBRi

    RBRi - 2018-02-11

    Please try the latest Snapshot I did some optimizations already.
    And can you please outline your idea a bit more....

     

    Last edit: RBRi 2018-03-04
  • Alex

    Alex - 2018-02-11

    Looked at CssStyleSheet.java, but didn't found any relevant optimization.

    Property to rule list lookup map idea:
    The HtmlSerializer computes all styles for a given element and gets only the 'display' value.
    Instead of iterating all rules, only the rules containing the 'display' are relevant.
    Thus a map can be computed at stylesheet initialization for quick rules lookup, eg:

    CSSStyleSheet.java
    ...
    Map<String, CSSRuleList> propertyToRules;
    ...
        public void modifyIfNecessary(final ComputedCSSStyleDeclaration style, final Element element, final String pseudoElement, final String propertyName) {
            final CSSRuleList rules = propertyToRules.get(propertyName);
            modifyIfNecessary(style, element, pseudoElement, rules, new HashSet<String>());
        }
    

    Selector optimization idea (some reference)
    Similarly, lookup maps can be computed for each key selector type (id, class, tag, universal).
    Eg, when computing styles for an element, if it has a class, rules are collected from classToRules map, same for id and so forth.
    For compound selectors (descendants, etc), only the last selector (key selector) is used for computing the lookup maps (sel "ul li" - "li" is added to tagToRules map).
    After rules are collected, the whole selectors must be rechecked if they match the html element.

    Both of the above ideas may be merged when computing the 'display' style for an element, first retrieving rules that contain 'display' property, and intersecting them with the rules collected with matching key selectors.

    Of course these optimizations trade memory for performance and may have impact.

    Another idea is to investigate firefox source code for their optimization.

     
    • RBRi

      RBRi - 2018-02-12

      Looked at
      CssStyleSheet.java, but didn't
      found any relevant optimization.

      The optimization was done somewhere else. Why not simply try and see if it helps in your case.

       
      • Alex

        Alex - 2018-02-12

        Tested witn 2.30-SNAPSHOT, and found same performance for computing text with large css stylesheet.
        Computing text is ~150 times faster when ignoring styles.
        Let me know if you want my test source code.

         
        • RBRi

          RBRi - 2018-02-12

          Test code will be great, if you like send it to my privat adress.
          Thanks

          RBRi
          
           

          Last edit: RBRi 2018-03-04
          • Alex

            Alex - 2018-02-12

            Test source code: https://github.com/cdalexndr/HtmlUnitCssExample_1950
            Just run "gradlew test"

             
  • RBRi

    RBRi - 2018-02-17
    • status: open --> accepted
    • assigned_to: RBRi
     
  • RBRi

    RBRi - 2018-02-17

    Ok, looks like we need to renovate in this field a bit. Have done some checks and i think it is worth the effort; but will require some time.
    As first step i have created a fork of the current cssparser and did some cleanup. This will land in the snapshot releases soon.

    The plan ist:
    * more css parser cleanup
    * change the parser to return the selector-parts in reverse order and in a more technical way
    * change htmlunit to be able to deal with the new selectors
    * create some kind of selector index to be able to look at less selectors if we need the style for an element
    * maybe we can move and cache the calculation of the selector specificy to the cssparser

    Hope this will help to improve the performance in many cases.
    Contributions are welcome.

     
  • RBRi

    RBRi - 2018-03-05

    Hi Alex,
    have done some major changes in the css selector handling and finally found some time to use the profiler with your sample.
    Your sample is that slow, because the chaching in HtmlUnit does not work with file content so far. Because of this the css was reparsed at every run and this was the reason for your bad results.
    Hopefully this is fixed now, chaching works for file content also.

    I have also implemented a minimal first version of the selector index. Work fine so far (at least our test suite is happy) and had a minor positive effect for my real world tests (using wetator). If you like you can test the latest snapshot build and/or help with making the index more clever.

    Hope you like it, every feedback is welcome.

    RBRi
    
     
  • RBRi

    RBRi - 2018-03-07

    Have done another update, now this is again notable faster for your sample.
    Waiting for your feedback....

     
  • RBRi

    RBRi - 2018-03-07
    • status: accepted --> pending
     
  • Alex

    Alex - 2018-03-09

    Just downloaded snapshot and found a bug:
    CSSStyleSheet.java (line 1600):

    final String elementName = element.getLocalName();
    

    returns uppercase tag name, but the index contains lowercase tag names, and the following fails and returns no selectors:
    CSSStyleSheetImpl.java (SelectorEntriesIterator ctr):

    sel = (List)index.elementSelectors_.get(elementName);
    

    I will return with more feedback.
    Thanks for working on this issue.

     

    Last edit: Alex 2018-03-09
    • RBRi

      RBRi - 2018-03-11

      Will fix this soon, thanks

      On Fri, 09 Mar 2018 16:06:37 -0000 Alex wrote:

      Just downloaded snapshot and found a bug:
      CSSStyleSheet.java (line 1600):
      ~~~
      final String elementName = element.getLocalName();
      ~~~
      returns uppercase tag name, but the index contains lowercase tag names, and the
      following fails and returns no selectors:
      CSSStyleSheet.java (line 296):
      ~~~
      sel = (List)index.elementSelectors_.get(elementName);
      ~~~

      I will return with more feedback.
      Thanks for working on this issue.


      ** [bugs:#1950] CSS computed style too slow for large stylesheets**

      Status: pending
      Group: 2.29
      Labels: css
      Created: Sat Feb 10, 2018 11:04 PM UTC by Alex
      Last Updated: Wed Mar 07, 2018 11:03 AM UTC
      Owner: RBRi

      When using a stylesheets with >3000 rules, computing style for an element iterates all
      rules.
      With a process that requires computed style for multiple elements, this quickly becomes
      an issue.
      In my use case, HtmlSerializer must compute styles for elements to check "display"
      property if they are block.

      Quickfix suggest:
      When only a css property value is required, I suggest using a map with <css property=""> key
      to <rule> value for quick rule access (+ some user accesible method in HTMLElement
      class) .</rule></css>


      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/htmlunit/bugs/1950/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

       

      Related

      Bugs: #1950

  • Alex

    Alex - 2018-03-10

    Current optimization lowers search count from 3300 rules to 800 wildcard rules + 400 other rules + element rules.
    Will try to implement property filter and rule intersection with current optimization and come back with a patch. This will greatly benefit HtmlSerializer that only needs 'display' property.

     

    Last edit: Alex 2018-03-10
    • Alex

      Alex - 2018-03-12

      Made the patch with this optimization and some cleanup.
      Computed styles cache now stores selected property, and can add more when they are required or the whole element computes styles.
      Because CSS2Properties contains style logic, I made an unsafe bridge between property value (StyleAttributes.Definition) and the required method to extract it from CSS2Properties and if used incorrectly will cause errors (todo: how to fix this).
      Currently only HtmlSerializer usage.
      Requires merge with the above case-sensitive bug fix (added TODO in code).

      Note that my test output is 5 NYI failing tests, and this patch doesn't add any new tests.

      Hope it's ok.

       
  • RBRi

    RBRi - 2018-03-14

    Thanks will have a look at this later on, at the moment there is too mutch work on my desk.

     
    • Alex

      Alex - 2018-04-24

      Any news regarding my patch? Should you change status from "pending" to not overlook it?

       

Log in to post a comment.