The jml tools might run more rapidly when operating on
individual files if they were more selective about how
much parsing and checking they perform on referenced
files.
For example, suppose T1 uses T2 in its method
interfaces, and T2 uses T4 in its interfaces but only
uses T3 in the body of a method. Then compiling T1
should not require that T3 be parsed and checked. [javac
does compile and generate class files for all referenced
files, including T3 in this scenario.]
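The distinction can be illustrated with a minimal sketch; the class bodies below are hypothetical, only the T1/T2/T3/T4 reference pattern follows the scenario above:

```java
// T1 references T2 only in its method interface.
class T1 {
    T2 getHelper() { return new T2(); }
}

// T2 references T4 in its interface, but T3 only inside a method
// body, so compiling T1 needs T4's interface but not T3 at all.
class T2 {
    T4 field;
    void run() {
        T3 local = new T3();  // body-only reference
        local.work();
    }
}

class T3 { void work() {} }
class T4 {}
```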
Taking the same approach with class and model references
within specifications would also limit the number of
classes that need to be parsed and typechecked when
processing an individual file. Since these steps are
relatively time-consuming, this could noticeably improve
the user experience of the jml tools.
This implies that a file could be parsed in a "complete"
mode, an "interface-only" mode, or an "interface +
specification" mode. That adds some complexity but could
yield a productivity improvement, although none when
running jml on every file in a project at once.
Note that the ESC/Java documentation states that it uses
such a distinction (at least between complete and
interface parsing).
cf. #656252
Logged In: YES
user_id=42011
As I discussed with Gary recently, I think that the
user-perceived performance of the JML tool suite is
critical. Currently, when using JML on non-trivial code
sets, typechecking and compilation times make the tools nearly unusable.
I've started doing some profiling to determine what the
primary culprits are, but initial results indicate that it
simply comes down to slow over-parsing, as indicated by
David's bug report here.
For example, if I process only a single package "P" using
jmlc, then I think that only the classes within P and its
subpackages should ever be parsed by jmlc/mjc, presuming all
dependent classes are already compiled by other means.
When I have more details on profiling, I will post them
in this forum.
Logged In: YES
user_id=633675
Some later results indicate a significant amount of time is
due to the use of Vector instead of ArrayList, as we are
paying for synchronization overhead.
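For reference, the overhead comes from Vector's synchronized methods; a minimal illustration of the difference (the class and method names here are generic, not taken from the JML code base):

```java
import java.util.List;
import java.util.Vector;

class ListChoice {
    // Every Vector method is synchronized: each call acquires the
    // object's monitor, even in single-threaded compiler passes
    // where no locking is needed.
    static int sumSynchronized(Vector<Integer> v) {
        int sum = 0;
        for (int i = 0; i < v.size(); i++) sum += v.get(i);
        return sum;
    }

    // ArrayList (used here via the List interface) offers the same
    // operations without any locking overhead.
    static int sumUnsynchronized(List<Integer> l) {
        int sum = 0;
        for (int i = 0; i < l.size(); i++) sum += l.get(i);
        return sum;
    }
}
```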
Logged In: YES
user_id=633675
Clyde Ruby and I were discussing why the JML type checker
parses so many files and we came across the same idea
that David has suggested in his initial comment. If JML were
to process files more lazily, then it could completely avoid
parsing (and checking) many files. That is, if T1 uses T2 in
its interfaces, then we don't parse T2 until we need some
information from it (such as the type of one of its methods).
Or even if we do process T2, we shouldn't process the files it
uses (and so on), as we do currently. In short, the current
checker parses files eagerly; we should try to avoid work by
not processing files at all when possible.
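One way to express this laziness is a placeholder that defers the expensive parse until a client actually asks for interface information. This is only a sketch; `LazyType` and its members are hypothetical, not the actual jmlc/mjc API:

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: a type entry whose expensive parse runs only
// when interface information (here, a method's type) is demanded.
class LazyType {
    private final String name;
    private final Supplier<Map<String, String>> parser; // method -> type
    private Map<String, String> methods;  // null until first demand

    LazyType(String name, Supplier<Map<String, String>> parser) {
        this.name = name;
        this.parser = parser;
    }

    String name() { return name; }

    String methodType(String method) {
        if (methods == null) {
            methods = parser.get(); // parse on first demand only
        }
        return methods.get(method);
    }

    boolean parsed() { return methods != null; }
}
```

With this shape, a type like T2 that is merely named in an interface costs nothing until the checker actually queries one of its members.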
Curt, I'm assigning this to you so you can give Clyde some
tips (in a comment here) on how this might be done, if you
think it's reasonable. After you comment, assign this to
Clyde. (Of course, anyone else is welcome to volunteer for
this.)
Logged In: YES
user_id=635640
Gary and I discussed this over lunch yesterday. This note captures the
highlights of our discussion. I am assigning this task to Clyde as
requested.
Each pass of the compiler assumes that all other types being processed
by the compiler have been processed through at least the preceding
pass. If a type is referenced that has not been loaded, then that type is
loaded (always from source in the JML tools) and processed up through
the preceding pass. The newly loaded type is then processed in all
subsequent passes. It seems exceedingly difficult to change this
invariant, as it is scattered throughout the entire code base.
The current implementation of the JML checker already avoids the
problem described in the original feature request: a type (like T3 above)
that is only referenced within the body of a method of a recursively
checked class (like T2) will not be processed. However, it may also be
possible to avoid full-blown parsing and interface checking for a type
(like T4) that is referenced in the interface of a recursively checked class
(like T2).
If it can be implemented, it is likely that this optimization will be very
effective. The primary benefit comes from the secondary effects---most
types named in T1's interface will need to have their interfaces checked,
but most additional types named in T2's interface will not have to be
processed at all. This should result in significant pruning of the graph
of processed files.
To achieve this optimization, we would need a utility that did a quick-
and-dirty parse of the file that should declare T4. This parse would only
look for a package clause and top-level class (and generic function)
declarations. If the file was found to declare T4 then the use of T4 in
the interface of T2 would be assumed to be valid; additional parsing and
interface checking of T4 would only be required if the file named on the
command-line (T1) relied on the interface of T4. On the other hand, if
the file was not found to declare T4 via the quick-and-dirty parse, we
would probably want to do a full-blown parse on it, because T4 might be
a nested type, for example.
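Such a quick-and-dirty scan could look roughly like the following. This is a deliberate simplification under stated assumptions: a real scan must cope with comments, string literals, and brace nesting, and the regular expressions here are only illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scan source text only for the package clause
// and top-level type declarations, without building a parse tree.
class QuickScan {
    private static final Pattern PKG =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Anchoring at line start means indented (nested) declarations
    // are ignored, which is the desired behavior for this heuristic.
    private static final Pattern DECL =
        Pattern.compile("^(?:public\\s+|final\\s+|abstract\\s+)*"
                        + "(?:class|interface)\\s+(\\w+)", Pattern.MULTILINE);

    static String packageName(String source) {
        Matcher m = PKG.matcher(source);
        return m.find() ? m.group(1) : "";
    }

    static List<String> topLevelTypes(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = DECL.matcher(source);
        while (m.find()) names.add(m.group(1));
        return names;
    }
}
```

If the expected type shows up in `topLevelTypes`, the use can be provisionally accepted; if not, a full-blown parse is still required, as noted above.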
The separation of processing modes between quick-and-dirty parsing
and full interface checking might be implemented by having the
JmlContext classes implement different behavior for catchUp when
contexts are created for recursively processed compilation units (like T2)
vs. directly processed ones (like T1). Another challenge will be keeping
coherent information in the caches; we would probably need a new
cache for quick-and-dirty information that is separate from the current
caches.
Finally, I wonder if there are some cases where we can ignore all JML
annotations in recursively processed files. If so, we could let mjc handle
these types, reading from bytecode if available (and up-to-date) or
parsing just the MultiJava code and treating JML annotations as
comments.