#6 Efficiency: limit parsing in JML, lazily process files

Status: open
Owner: nobody
Priority: 6
Updated: 2013-10-27
Created: 2002-12-19
Creator: David Cok
Private: No

The jml tools might run more rapidly when operating on
individual files if they were more selective about how
much parsing and checking they do of referenced files.

For example, suppose T1 uses T2 in its method
interfaces, and T2 uses T4 in its interfaces but only
uses T3 in the body of a method. Then compiling T1
should not require that T3 be parsed and checked. [javac
does compile and generate class files for all referenced
files, including T3 in this scenario.]
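
As a minimal sketch (plain Java, with the placeholder names above standing
in for hypothetical files), the dependency shape looks like this:

    // T1.java
    public class T1 {
        // T2 appears in T1's method interface, so T2's own interface
        // must be parsed and checked in order to compile T1.
        public T2 make() { return new T2(); }
    }

    // T2.java
    public class T2 {
        // T4 appears in T2's interface ...
        public T4 convert() { return new T4(); }

        // ... while T3 is used only inside a method body, so compiling
        // T1 should not force T3 to be parsed and checked.
        public int size() { return new T3().count(); }
    }

    // T3.java
    public class T3 { public int count() { return 0; } }

    // T4.java
    public class T4 { }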

Taking this approach with respect to class or model
references within the specifications would also limit
the number of classes that need to be parsed and
typechecked when processing an individual file. Since
these steps are relatively time-consuming, the user
experience of the jml tools could be improved.

This would imply that a file could be parsed in a
"complete" mode, an "interface-only" mode, or an
"interface + specification" mode, adding some
complexity but perhaps providing some productivity
improvement. There would be no productivity improvement
when running jml on every file in a project at once.
Note that the ESC/Java documentation asserts that it
makes such a distinction (at least between complete
and interface parsing).
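
One minimal way such modes might be represented (the enum below is purely
hypothetical, not something in the existing code base):

    // Hypothetical sketch of the three parsing depths suggested above.
    enum ParseMode {
        COMPLETE,            // parse and check everything, including method bodies
        INTERFACE_ONLY,      // signatures, modifiers, and supertypes only
        INTERFACE_AND_SPEC   // signatures plus JML specifications, no bodies
    }

Whichever representation is used, the driver would pick the shallowest
mode that still supplies the information needed for the file actually
named on the command line.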

cf. #656252

Discussion

  • Joseph Kiniry

    Joseph Kiniry - 2002-12-19

    As I discussed with Gary recently, I think that the
    user-perceived performance of the JML tool suite is
    critical. Currently, when using JML on non-trivial code
    bases, the typechecking and compilation times make the
    tools unusable.

    I've started doing some profiling to determine what the
    primary culprits are, but initial results indicate that it
    simply comes down to slow over-parsing, as indicated by
    David's bug report here.

    For example, if I process only a single package "P" using
    jmlc, then I think that only the classes within P and its
    subpackages should ever be parsed by jmlc/mjc, presuming all
    dependent classes are already compiled by other means.

    When I have more details on profiling I will post them in
    this forum.

     
  • Gary T. Leavens

    Gary T. Leavens - 2003-02-13

    Some later results indicate a significant amount of time is
    due to the use of Vector instead of ArrayList, as we are
    paying for synchronization overhead.
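
    For illustration only (modern Java syntax, not code from the JML
    repository), the cost difference comes from Vector's synchronized
    methods versus ArrayList's unsynchronized ones:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.Vector;

        public class ListOverheadDemo {
            public static void main(String[] args) {
                // Every Vector method acquires the Vector's monitor, even in
                // single-threaded code such as a compiler pass.
                List<String> slower = new Vector<>();
                slower.add("T1");

                // ArrayList offers the same List API with no built-in locking,
                // so single-threaded use avoids the synchronization overhead.
                List<String> faster = new ArrayList<>();
                faster.add("T1");
            }
        }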

     
  • Gary T. Leavens

    Gary T. Leavens - 2003-02-13
    • labels: --> 477027
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-22
    • labels: 477027 -->
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-22
    • summary: efficiency: limited parsing in JML --> Efficiency: limit parsing in JML
    • milestone: --> 296847
    • labels: --> checker (i.e., the jml tool)
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-22
    • milestone: 296847 --> self_reported
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-29
    • priority: 5 --> 6
    • summary: Efficiency: limit parsing in JML --> Efficiency: limit parsing in JML, lazily process files
    • assigned_to: nobody --> cclifton
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-29

    Clyde Ruby and I were discussing why the JML type checker
    parses so many files and we came across the same idea
    that David has suggested in his initial comment. If JML were
    to process files more lazily, then it could completely avoid
    parsing (and checking) many files. That is, if T1 uses T2 in
    its interfaces, then we don't parse T2 until we need some
    information from it (such as the type of one of its methods).
    Or even if we do process T2, we shouldn't process the files it
    uses (and so on) as we do currently. In a word, the current
    checker uses eager evaluation to parse files; we should try to
    avoid work by not processing files at all if possible.
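
    A minimal sketch of the lazy, memoized style being proposed (the
    TypeInfo and loadAndCheck names are hypothetical, not real JML code):

        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical sketch of lazy type loading: a referenced type is
        // parsed only the first time information about it is actually
        // requested, and nothing it mentions is loaded recursively.
        class LazyTypeTable {
            private final Map<String, TypeInfo> loaded = new HashMap<>();

            TypeInfo lookup(String qualifiedName) {
                return loaded.computeIfAbsent(qualifiedName, this::loadAndCheck);
            }

            private TypeInfo loadAndCheck(String qualifiedName) {
                // Parse and interface-check the source of this one type only.
                return new TypeInfo(qualifiedName);
            }
        }

        class TypeInfo {
            final String name;
            TypeInfo(String name) { this.name = name; }
        }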

    Curt, I'm assigning this to you so you can give Clyde some
    tips (in a comment here) on how this might be done, if you
    think it's reasonable. After you comment, assign this to
    Clyde. (Of course, anyone else is welcome to volunteer for
    this.)

     
  • Curtis Clifton

    Curtis Clifton - 2003-05-01
    • assigned_to: cclifton --> cruby
     
  • Curtis Clifton

    Curtis Clifton - 2003-05-01

    Gary and I discussed this over lunch yesterday. This note captures the
    highlights of our discussion. I am assigning this task to Clyde as
    requested.

    Each pass of the compiler assumes that all other types being processed
    by the compiler have been processed through at least the preceding
    pass. If a type is referenced that has not been loaded, then that type is
    loaded (always from source in the JML tools) and processed up through
    the preceding pass. The newly loaded type is then processed on all
    subsequent passes. It seems exceedingly difficult to change this
    invariant, as it is scattered through the entire code base.
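
    As a purely hypothetical illustration of that invariant (not the real
    mjc/JML driver), the lazy loading step might be pictured like this:

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Hypothetical illustration: before pass N touches any unit, every
        // loaded unit has already completed pass N - 1.
        class PassDriver {
            private final Map<String, Integer> lastCompletedPass = new HashMap<>();
            private final List<String> workList = new ArrayList<>();

            // Called when pass `currentPass` meets a reference to a type
            // that has not been loaded yet.
            void loadLazily(String typeName, int currentPass) {
                if (!lastCompletedPass.containsKey(typeName)) {
                    // Parse from source, then catch the new type up through
                    // the preceding pass so the invariant still holds.
                    for (int pass = 1; pass < currentPass; pass++) {
                        runSinglePass(typeName, pass);
                    }
                    lastCompletedPass.put(typeName, currentPass - 1);
                    workList.add(typeName); // processed on all later passes
                }
            }

            private void runSinglePass(String typeName, int pass) {
                // Placeholder for the parsing/checking work of each pass.
            }
        }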

    The current implementation of the JML checker already avoids the
    problem described in the original feature request: a type (like T3 above)
    that is only referenced within the body of a method of a recursively
    checked class (like T2) will not be processed. However, it may also be
    possible to avoid full-blown parsing and interface checking for a type
    (like T4) that is referenced in the interface of a recursively checked class
    (like T2).

    If it can be implemented, it is likely that this optimization will be very
    effective. The primary benefit comes from the secondary effects---most
    types named in T1's interface will need to have their interfaces checked,
    but most additional types named in T2's interface will not have to be
    processed at all. This should result in significant pruning of the graph
    of processed files.

    To achieve this optimization, we would need a utility that did a
    quick-and-dirty parse of the file that should declare T4. This parse
    would only look for a package clause and top-level class (and generic
    function) declarations. If the file was found to declare T4, then the use
    of T4 in the interface of T2 would be assumed to be valid; additional
    parsing and interface checking of T4 would only be required if the file
    named on the command line (T1) relied on the interface of T4. On the
    other hand, if the file was not found to declare T4 via the
    quick-and-dirty parse, we would probably want to do a full-blown parse
    on it, because T4 might be a nested type, for example.
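
    A rough sketch of such a quick-and-dirty scan (purely illustrative; a
    real implementation would reuse the existing lexer rather than a
    regular expression):

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // Hypothetical helper: does this file declare a top-level type with
        // the given simple name?  No full parse tree is built.
        class QuickScan {
            // Approximates "top level" by requiring the declaration to start
            // at column 0 of a line.
            private static final Pattern TOP_LEVEL_TYPE = Pattern.compile(
                "^(?:public\\s+)?(?:abstract\\s+|final\\s+)*(?:class|interface)\\s+(\\w+)",
                Pattern.MULTILINE);

            static boolean declaresTopLevelType(Path source, String simpleName)
                    throws IOException {
                Matcher m = TOP_LEVEL_TYPE.matcher(Files.readString(source));
                while (m.find()) {
                    if (m.group(1).equals(simpleName)) {
                        return true;   // e.g. matched "class T4" at top level
                    }
                }
                return false;  // caller falls back to a full parse (T4 may be nested)
            }
        }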

    The separation of processing modes between quick-and-dirty parsing
    and full interface checking might be implemented by having the
    JmlContext classes implement different behavior for catchUp when
    contexts are created for recursively processed compilation units (like T2)
    vs. directly processed ones (like T1). Another challenge will be keeping
    coherent information in the caches; we would probably need a new
    cache for quick-and-dirty information that is separate from the current
    caches.

    Finally, I wonder if there are some cases where we can ignore all JML
    annotations in recursively processed files. If so, we could let mjc handle
    these types, reading from bytecode if available (and up-to-date) or
    parsing just the MultiJava code and treating JML annotations as
    comments.
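
    For context, JML annotations already live in specially marked comments,
    so a plain Java compiler such as mjc can skip them simply by treating
    them as ordinary comments (the Account class below is just a generic
    example, not taken from any real code base):

        public class Account {
            private int balance;

            //@ requires amount > 0;
            //@ ensures balance == \old(balance) + amount;
            public void deposit(int amount) {
                balance += amount;
            }
        }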

     
  • Gary T. Leavens

    Gary T. Leavens - 2006-05-18
    • assigned_to: cruby --> nobody
     
