#6 Efficiency: limit parsing in JML, lazily process files

Status: open
Owner: nobody
Priority: 6
Updated: 2013-10-27
Created: 2002-12-19
Creator: David Cok
Private: No

The jml tools might run more rapidly when operating on
individual files if they were more selective about how
much parsing and checking they do of referenced files.

For example, suppose T1 uses T2 in its method
interfaces, and T2 uses T4 in its interfaces but only
uses T3 in the body of a method. Then compiling T1
should not require that T3 be parsed and checked. [javac
does compile and generate class files for all referenced
files, including T3 in this scenario.]
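
As a minimal sketch (plain Java, with the placeholder names above standing
in for hypothetical files), the dependency shape looks like this:

    // T1.java
    public class T1 {
        // T2 appears in T1's method interface, so T2's own interface
        // must be parsed and checked in order to compile T1.
        public T2 make() { return new T2(); }
    }

    // T2.java
    public class T2 {
        // T4 appears in T2's interface ...
        public T4 convert() { return new T4(); }

        // ... while T3 is used only inside a method body, so compiling
        // T1 should not force T3 to be parsed and checked.
        public int size() { return new T3().count(); }
    }

    // T3.java
    public class T3 { public int count() { return 0; } }

    // T4.java
    public class T4 { }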

Taking this approach with respect to class or model
references within the specifications would also limit
the number of classes that need to be parsed and
typechecked when processing an individual file. Since
these steps are relatively time-consuming, the user
experience of the jml tools could be improved.

This would imply that a file could be parsed in a
"complete" mode, an "interface-only" mode, or an
"interface + specification" mode, adding some
complexity but perhaps providing some productivity
improvement. There would be no productivity improvement
when running jml on every file in a project at once.
Note that the ESC/Java documentation asserts that it
makes such a distinction (at least between complete
and interface parsing).
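
One minimal way such modes might be represented (the enum below is purely
hypothetical, not something in the existing code base):

    // Hypothetical sketch of the three parsing depths suggested above.
    enum ParseMode {
        COMPLETE,            // parse and check everything, including method bodies
        INTERFACE_ONLY,      // signatures, modifiers, and supertypes only
        INTERFACE_AND_SPEC   // signatures plus JML specifications, no bodies
    }

Whichever representation is used, the driver would pick the shallowest
mode that still supplies the information needed for the file actually
named on the command line.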

cf. #656252

Discussion

  • Joseph Kiniry

    Joseph Kiniry - 2002-12-19

    As I discussed with Gary recently, I think that the
    user-perceived performance of the JML tool suite is
    critical. Currently, when using JML on non-trivial code
    bases, the typechecking and compilation times make the
    tools unusable.

    I've started doing some profiling to determine what the
    primary culprits are, but initial results indicate that it
    simply comes down to slow over-parsing, as indicated by
    David's bug report here.

    For example, if I process only a single package "P" using
    jmlc, then I think that only the classes within P and its
    subpackages should ever be parsed by jmlc/mjc, presuming all
    dependent classes are already compiled by other means.

    When I have more details on profiling I will post them in
    this forum.

     
  • Gary T. Leavens

    Gary T. Leavens - 2003-02-13

    Some later results indicate a significant amount of time is
    due to the use of Vector instead of ArrayList, as we are
    paying for synchronization overhead.
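
    For illustration only (modern Java syntax, not code from the JML
    repository), the cost difference comes from Vector's synchronized
    methods versus ArrayList's unsynchronized ones:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.Vector;

        public class ListOverheadDemo {
            public static void main(String[] args) {
                // Every Vector method acquires the Vector's monitor, even in
                // single-threaded code such as a compiler pass.
                List<String> slower = new Vector<>();
                slower.add("T1");

                // ArrayList offers the same List API with no built-in locking,
                // so single-threaded use avoids the synchronization overhead.
                List<String> faster = new ArrayList<>();
                faster.add("T1");
            }
        }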

     
  • Gary T. Leavens

    Gary T. Leavens - 2003-02-13
    • labels: --> 477027
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-22
    • labels: 477027 -->
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-22
    • summary: efficiency: limited parsing in JML --> Efficiency: limit parsing in JML
    • milestone: --> 296847
    • labels: --> checker (i.e., the jml tool)
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-22
    • milestone: 296847 --> self_reported
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-29
    • priority: 5 --> 6
    • summary: Efficiency: limit parsing in JML --> Efficiency: limit parsing in JML, lazily process files
    • assigned_to: nobody --> cclifton
     
  • Gary T. Leavens

    Gary T. Leavens - 2003-04-29

    Clyde Ruby and I were discussing why the JML type checker
    parses so many files and we came across the same idea
    that David has suggested in his initial comment. If JML were
    to process files more lazily, then it could completely avoid
    parsing (and checking) many files. That is, if T1 uses T2 in
    its interfaces, then we don't parse T2 until we need some
    information from it (such as the type of one of its methods).
    Or even if we do process T2, we shouldn't process the files it
    uses (and so on) as we do currently. In a word, the current
    checker uses eager evaluation to parse files; we should try to
    avoid work by not processing files at all if possible.
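
    A minimal sketch of the lazy, memoized style being proposed (the
    TypeInfo and loadAndCheck names are hypothetical, not real JML code):

        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical sketch of lazy type loading: a referenced type is
        // parsed only the first time information about it is actually
        // requested, and nothing it mentions is loaded recursively.
        class LazyTypeTable {
            private final Map<String, TypeInfo> loaded = new HashMap<>();

            TypeInfo lookup(String qualifiedName) {
                return loaded.computeIfAbsent(qualifiedName, this::loadAndCheck);
            }

            private TypeInfo loadAndCheck(String qualifiedName) {
                // Parse and interface-check the source of this one type only.
                return new TypeInfo(qualifiedName);
            }
        }

        class TypeInfo {
            final String name;
            TypeInfo(String name) { this.name = name; }
        }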

    Curt, I'm assigning this to you so you can give Clyde some
    tips (in a comment here) on how this might be done, if you
    think it's reasonable. After you comment, assign this to
    Clyde. (Of course, anyone else is welcome to volunteer for
    this.)

     
  • Curtis Clifton

    Curtis Clifton - 2003-05-01
    • assigned_to: cclifton --> cruby
     
  • Curtis Clifton

    Curtis Clifton - 2003-05-01

    Gary and I discussed this over lunch yesterday. This note captures the
    highlights of our discussion. I am assigning this task to Clyde as
    requested.

    Each pass of the compiler assumes that all other types being processed
    by the compiler have been processed through at least the preceding
    pass. If a type is referenced that has not been loaded, then that type is
    loaded (always from source in the JML tools) and processed up through
    the preceding pass. The newly loaded type is then processed on all
    subsequent passes. It seems exceedingly difficult to change this
    invariant, as it is scattered through the entire code base.
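
    As a purely hypothetical illustration of that invariant (not the real
    mjc/JML driver), the lazy loading step might be pictured like this:

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Hypothetical illustration: before pass N touches any unit, every
        // loaded unit has already completed pass N - 1.
        class PassDriver {
            private final Map<String, Integer> lastCompletedPass = new HashMap<>();
            private final List<String> workList = new ArrayList<>();

            // Called when pass `currentPass` meets a reference to a type
            // that has not been loaded yet.
            void loadLazily(String typeName, int currentPass) {
                if (!lastCompletedPass.containsKey(typeName)) {
                    // Parse from source, then catch the new type up through
                    // the preceding pass so the invariant still holds.
                    for (int pass = 1; pass < currentPass; pass++) {
                        runSinglePass(typeName, pass);
                    }
                    lastCompletedPass.put(typeName, currentPass - 1);
                    workList.add(typeName); // processed on all later passes
                }
            }

            private void runSinglePass(String typeName, int pass) {
                // Placeholder for the parsing/checking work of each pass.
            }
        }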

    The current implementation of the JML checker already avoids the
    problem described in the original feature request: a type (like T3 above)
    that is only referenced within the body of a method of a recursively
    checked class (like T2) will not be processed. However, it may also be
    possible to avoid full-blown parsing and interface checking for a type
    (like T4) that is referenced in the interface of a recursively checked class
    (like T2).

    If it can be implemented, it is likely that this optimization will be very
    effective. The primary benefit comes from the secondary effects---most
    types named in T1's interface will need to have their interfaces checked,
    but most additional types named in T2's interface will not have to be
    processed at all. This should result in significant pruning of the graph
    of processed files.

    To achieve this optimization, we would need a utility that did a
    quick-and-dirty parse of the file that should declare T4. This parse
    would only look for a package clause and top-level class (and generic
    function) declarations. If the file was found to declare T4, then the use
    of T4 in the interface of T2 would be assumed to be valid; additional
    parsing and interface checking of T4 would only be required if the file
    named on the command line (T1) relied on the interface of T4. On the
    other hand, if the file was not found to declare T4 via the
    quick-and-dirty parse, we would probably want to do a full-blown parse
    on it, because T4 might be a nested type, for example.
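
    A rough sketch of such a quick-and-dirty scan (purely illustrative; a
    real implementation would reuse the existing lexer rather than a
    regular expression):

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // Hypothetical helper: does this file declare a top-level type with
        // the given simple name?  No full parse tree is built.
        class QuickScan {
            // Approximates "top level" by requiring the declaration to start
            // at column 0 of a line.
            private static final Pattern TOP_LEVEL_TYPE = Pattern.compile(
                "^(?:public\\s+)?(?:abstract\\s+|final\\s+)*(?:class|interface)\\s+(\\w+)",
                Pattern.MULTILINE);

            static boolean declaresTopLevelType(Path source, String simpleName)
                    throws IOException {
                Matcher m = TOP_LEVEL_TYPE.matcher(Files.readString(source));
                while (m.find()) {
                    if (m.group(1).equals(simpleName)) {
                        return true;   // e.g. matched "class T4" at top level
                    }
                }
                return false;  // caller falls back to a full parse (T4 may be nested)
            }
        }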

    The separation of processing modes between quick-and-dirty parsing
    and full interface checking might be implemented by having the
    JmlContext classes implement different behavior for catchUp when
    contexts are created for recursively processed compilation units (like T2)
    vs. directly processed ones (like T1). Another challenge will be keeping
    coherent information in the caches; we would probably need a new
    cache for quick-and-dirty information that is separate from the current
    caches.

    Finally, I wonder if there are some cases where we can ignore all JML
    annotations in recursively processed files. If so, we could let mjc handle
    these types, reading from bytecode if available (and up-to-date) or
    parsing just the MultiJava code and treating JML annotations as
    comments.
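
    For context, JML annotations already live in specially marked comments,
    so a plain Java compiler such as mjc can skip them simply by treating
    them as ordinary comments (the Account class below is just a generic
    example, not taken from any real code base):

        public class Account {
            private int balance;

            //@ requires amount > 0;
            //@ ensures balance == \old(balance) + amount;
            public void deposit(int amount) {
                balance += amount;
            }
        }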

     
  • Gary T. Leavens

    Gary T. Leavens - 2006-05-18
    • assigned_to: cruby --> nobody
     
