[SED-ML-discuss] Summary of nested proposal breakout at COMBINE 2012

Dear all,

Dagmar, Frank & I had a discussion at COMBINE 2012 about the nested 
proposal. I'm going to try to summarise it here for the benefit of 
others and posterity. Hopefully as a result we can call a vote on the 
proposal very soon now :)

I emphasise that this is primarily my description of the discussion, but 
Frank and Dagmar have had a chance to check a draft for drastic errors.

The main issues discussed were (at a high level - some had sub-issues):

 1. Whether the proposal is at a stage where it's ready to be voted upon
 2. How to deal with multiple subTasks of a repeatedTask
 3. How to refer to variables in the protocol, in particular referring
    to ranges

There were also a couple of minor issues, which I'll deal with first.

  * The step attribute on oneStep (see
    http://sourceforge.net/mailarchive/message.php?msg_id=29653864).
    The consensus was that the current spec means what's written - the step is
    fixed, and will remain so until we support parameterising
    simulations, which can be a separate proposal.
  * I asked about the type="log" attribute on uniformRange, since the
    same thing can be done with functionalRange.  Frank explained that
    the consensus at HARMONY was that it's a common use case and a
    short-hand should be provided.  We do need to specify in the spec
    which logarithm is meant (natural? base 10?).
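
On the question of which log: a minimal sketch in Python, assuming (this is an assumption, not something the proposal states) that type="log" means values evenly spaced in log space between the start and end values. The helper log_range and its parameters are hypothetical names used only for illustration. Under that reading the base cancels out, so natural and base-10 logs give the same points; the base would only matter if start and end were interpreted as exponents.

```python
import math


def log_range(start, end, steps, base=math.e):
    """Hypothetical helper: steps + 1 values from start to end,
    evenly spaced in log space (assumed meaning of type="log")."""
    log_start = math.log(start, base)
    log_end = math.log(end, base)
    increment = (log_end - log_start) / steps
    return [base ** (log_start + i * increment) for i in range(steps + 1)]


# Same end points and step count, two different bases.
natural = log_range(1.0, 1000.0, 3)
base10 = log_range(1.0, 1000.0, 3, base=10)

# Up to floating-point rounding, both lists hold the same values.
same = all(math.isclose(a, b) for a, b in zip(natural, base10))
```

If the spec instead intends start and end to be exponents, the two bases give different ranges, which is one reason to state the intended meaning explicitly.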

        1. Are we ready to vote?

There are some issues that could do with tidying up (see e.g. below!) 
but the core idea is definitely, in our opinion, something that should 
be added to SED-ML.  It is hence worth voting to get approval for this, 
and we can tidy up loose ends as we prepare the changes for the 
specification, schema, etc.

As a general point, we want to avoid adding more complexity to the 
proposal in the process of tidying up. It's better to have many small 
complementary proposals than one large one. This may imply deferring 
some features in the original proposal to a later stage, if we decide 
that they are not sufficiently well motivated and described at present. 
This also applies to issues discussed below which are not currently part 
of the nested proposal - we do not want to add further features to this 
proposal! On the other hand, I think it is worth considering that if 
there are two potential ways of representing something, and one of those 
allows more possibilities in the future, then it might be better to go 
with that option.

        2. Multiple subTasks of a repeatedTask

The current proposal allows a repeatedTask to have multiple subTasks, 
with a notion of task dependency to allow for ordered execution, but 
doesn't include motivating use cases or examples. (Indeed, only one of 
the examples has a sub-task at all!) Do we need multiple subTasks within 
the nested proposal, or should this feature be deferred for a later 
proposal? This is closely tied to the question of whether (and how much) 
to change the Task class hierarchy in adding repeatedTask.

My talk 
(http://co.mbine.org/events/COMBINE_2012/agenda?q=system/files/2012-08-15-combine-cooper-sedml-proposals.pdf) 
suggested splitting out support for multiple subtasks into another task 
class: CombinedTask.  This would allow (almost) all the issues discussed 
in this section to be considered under a /separate proposal specifically 
for CombinedTask/, and we could proceed with RepeatedTask without 
multiple sub-tasks.  The /only/ question remaining for the nested 
proposal under this approach would be whether to make RepeatedTask 
inherit from the current Task, or give them a common base class.

  * Frank's preference is to *extend* SED-ML L1V1 so that the nested
    task could be used in tools *as is*, without requiring a re-design
    of the language.
  * My main concern with this is that if RepeatedTask inherits from Task
    then it needs to have simulationReference and modelReference.
      o There is a use for modelReference as a convenience, although it
        would in principle be possible for tools to use the subTask's
        model (assuming only one subTask; see also below).
      o Including simulationReference allows you to have a short-cut for
        the simple (and probably common) case where you want to repeat a
        task that's just running a given simulation on a given model
        (and not including a subTask at all). On the downside, it
        requires giving an empty simulationReference when you do have a
        subTask, which looks ugly, and means you have to change the
        language schema anyway to allow this.
  * Something I've thought of since: you might in general want multiple
    models referenced in a task, allowing you to (e.g.) parameterise one
    model based on another. But let's not include this in nested!

Several things still need to be figured out when dealing with multiple 
sub-tasks, hence my preference for deferring these to later proposal(s).

  * How to specify dependencies between tasks (and what these mean) - we
    didn't talk about this.
  * How to refer to the results arising from different sub-tasks (slide
    16 of my talk).
  * Which model the parent task refers to (if not given explicitly and
    the sub-tasks use different models).

With modelReference, there's a question over what this means if the 
subTask is a combinedTask with different models for each of /its/ 
subTasks.  Then there could be multiple model references.  I think (as 
alluded to above) that we might need to re-design the referencing scheme 
more widely if we allowed this, for example putting modelReference 
and/or taskReference on the variable element to specify which subTask's 
model you're interested in.  (See also slide 16 of my talk.)  It might 
no longer make sense to use the task's model implicitly when you have a 
variable reference within a task (as opposed to in a dataGenerator or 
model change).

Getting results with multiple sub-tasks is troublesome.  Currently the 
nested proposal leaves this undefined and describes it as a tool issue.  
(The same problem arises with pure repeated tasks, but there it's easier 
to define a suitable behaviour.) I think it's a bad idea in principle for 
a specification to leave something undefined and up to implementations, 
if you can avoid it - it makes reliable exchange much harder.  Currently 
this just affects how the outputs (plots & results) consume the results; 
if you chain post-processing (see below) it affects that too.  For the 
nested proposal itself, I'd just suggest changing Frank's example for 
what to do with the results into a recommendation for tools to follow.

In summary, we need to make a decision over the class hierarchy, and, if 
we include multiple sub-tasks within the nested proposal, then we need 
some motivating examples and careful consideration of the implications.

        3. Referring to variables in the protocol, in particular
        referring to ranges

The setValue element needs to be able to reference range values in order 
to compute the new value, so it's just a matter of whether you do that 
in a way that allows for other uses too in future proposals.

The nested proposal currently has several alternative mechanisms in the 
spec:

 1. Using an index attribute on functionalRange
 2. Using <variable target="#id">
 3. Using an XPath expression into the SED-ML
 4. Possibly implicitly making all range ids addressable with <ci> in
    the math

We agreed that (3) should be dropped.  It's ugly, and there's great 
potential for confusion with XPath addressing a model.
We also agreed that (4) shouldn't be allowed.

Slide 17 of my talk proposes a variation of (2): adding an idref 
attribute to variable, as an additional alternative to the target 
attribute (just as we have a symbol attribute already). This then 
selects the part of the protocol with that id.

_Examples_
1) In functionalRange:
<functionalRange id="range_function">
     <listOfVariables>
         <variable id="index" idref="range_counter"/>
     </listOfVariables>
     ....

2) In setValue:
<setValue target="...">
     <listOfVariables>
         <variable id="range_value" idref="range_function"/>
     </listOfVariables>
     ....

3) In chaining post-processing (*NB*: I'm *not* suggesting including 
this in the nested proposal!):
<dataGenerator id="datagen2">
     <listOfVariables>
         <variable id="v" idref="datagen1"/>
     </listOfVariables>
     <m:math>
         <m:apply>
             <m:plus/>
             <m:cn>5</m:cn>
             <m:ci>v</m:ci>
         </m:apply>
     </m:math>
</dataGenerator>

And you could still allow option (1), with something like 
<functionalRange id="range_function" range="range_counter"> as a 
short-hand for the common case.  (I've renamed the index attribute to 
range, for consistency with the corresponding attribute on repeatedTask 
itself, and on setValue.)

We discussed what happens if datagen1 is multidimensional. Currently 
dataGenerators are populated by sweeping through elements, so while 
potentially multidimensional they are not yet.  I term this behaviour an 
implicit map.  Frank wondered if you needed to be able to refer to the 
n'th entry of a dataGenerator with this extension. I don't think we do 
yet - the chained dataGenerator is defined by sweeping through the 
original one, just as the original one sweeps through values of the 
model variable.
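
To illustrate the implicit map, here is a rough sketch in Python. It assumes results are held as (possibly nested) lists; the function implicit_map is a hypothetical name, and datagen1 / datagen2 simply mirror example 3 above. This is not how any particular tool stores results.

```python
def implicit_map(function, values):
    """Apply function at every point of values, sweeping through
    nested lists so the result keeps the shape of the input."""
    if isinstance(values, list):
        return [implicit_map(function, item) for item in values]
    return function(values)


# datagen1 as a one-dimensional result; datagen2 computes 5 + v at each point.
datagen1 = [0.0, 0.5, 1.0]
datagen2 = implicit_map(lambda v: 5 + v, datagen1)

# The same definition applied to a two-dimensional result needs no indexing.
datagen1_2d = [[1, 2], [3, 4]]
datagen2_2d = implicit_map(lambda v: 5 + v, datagen1_2d)
```

Because the chained generator is defined pointwise, nothing in it needs to refer to the n'th entry of the original, whatever its dimensionality.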

We also discussed needing to restrict what could be selected to 
constructs which make sense: probably just ranges and data generators 
(at present). We could reference parameters (all ids are global) but 
it's probably best not to allow referencing parameters of another 
dataGenerator or similar: both because it's harder to implement, and 
because we might want to make ids local in the future.
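
One way a tool might enforce such a restriction, sketched in Python with the standard xml.etree.ElementTree module. The function resolve_idref, the ALLOWED_TAGS set, and the simplified, un-namespaced document fragment are all assumptions made for illustration; the element names are taken from the examples above.

```python
import xml.etree.ElementTree as ET

# Assumed whitelist: only ranges and data generators may be selected.
ALLOWED_TAGS = {"uniformRange", "functionalRange", "dataGenerator"}


def resolve_idref(root, idref):
    """Hypothetical resolver: return the element with the given id,
    rejecting ids that are missing, duplicated, or not selectable."""
    matches = [el for el in root.iter() if el.get("id") == idref]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one element with id {idref!r}")
    target = matches[0]
    if target.tag not in ALLOWED_TAGS:
        raise ValueError(f"idref {idref!r} points at <{target.tag}>")
    return target


# Simplified fragment without namespaces or required attributes.
doc = ET.fromstring("""
<sedML>
  <uniformRange id="range_counter"/>
  <functionalRange id="range_function">
    <listOfVariables>
      <variable id="index" idref="range_counter"/>
    </listOfVariables>
  </functionalRange>
  <dataGenerator id="datagen1">
    <listOfParameters>
      <parameter id="p1" value="2"/>
    </listOfParameters>
  </dataGenerator>
</sedML>
""")

range_element = resolve_idref(doc, "range_counter")
```

Here resolve_idref(doc, "p1") raises ValueError, since a parameter of a dataGenerator is not in the allowed set.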

Chaining dataGenerators is something that only really becomes useful 
when you have more complex functionality in the post-processing, such as 
is allowed by my MathML extensions. My point is just that, given we 
might want something like this in the future, it makes sense for changes 
done now to be done in a way that would make such things easier, unless 
it makes our life more difficult now.

Best wishes,
Jonathan