[SED-ML-discuss] Referencing nested data in repeated tasks

Hi, everyone!  At COMBINE yesterday, we had a discussion about (among other
things) the fact that 'taskReference' and 'modelReference' are sometimes
not sufficient to uniquely identify a single model:  a repeated task may
refer to multiple other tasks, which in turn may refer to multiple other
tasks, which may refer to a variety of models, some of which may be the
same model used in different simulations.  The example I used was (in my
'phraSED-ML' syntax):

t1 = run sim1 on mod1
t2 = run sim1 on mod2
t3 = run sim2 on mod1
t4 = repeat [t1, t2, t3] for t1.mod1.S1 in [1,3,6]
t5 = repeat [t1, t2, t3] for t3.mod1.S1 in [3,6,10]
t6 = repeat [t4, t5] for t4.t1.mod1.S1 in [0,1,10]

Here, the last line cannot currently be expressed in SED-ML:  if you want
to change the value of that particular instance of mod1, you really need to
know both that it comes from task t1 and that t1 is being run from task t4.
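
As a rough illustration, the model instances in the example can be
enumerated in a few lines of Python.  The tables and function names below
are hypothetical (they are not part of SED-ML or phraSED-ML), and the
matching rule, where a reference matches if the named task appears anywhere
on the path to the model, is only an assumption about how a
'taskReference'/'modelReference' pair might be resolved.

```python
# Hypothetical tables describing the example; not a SED-ML or phraSED-ML API.
SIMPLE_TASKS = {"t1": "mod1", "t2": "mod2", "t3": "mod1"}
REPEATED_TASKS = {
    "t4": ["t1", "t2", "t3"],
    "t5": ["t1", "t2", "t3"],
    "t6": ["t4", "t5"],
}

def instance_paths(task):
    """Return every model instance reachable from `task` as a path tuple."""
    if task in SIMPLE_TASKS:
        return [(task, SIMPLE_TASKS[task])]
    paths = []
    for sub in REPEATED_TASKS[task]:
        for path in instance_paths(sub):
            paths.append((task,) + path)
    return paths

def matches(root, task_ref, model_ref):
    """Instances under `root` selected by one task reference plus one model reference.

    Assumed rule: the model name must match and the task must appear
    somewhere on the path leading to it.
    """
    return [p for p in instance_paths(root)
            if p[-1] == model_ref and task_ref in p[:-1]]

for path in matches("t6", "t1", "mod1"):
    print(".".join(path))   # t6.t4.t1.mod1 and t6.t5.t1.mod1
for path in matches("t6", "t4", "mod1"):
    print(".".join(path))   # t6.t4.t1.mod1 and t6.t4.t3.mod1
```

Under that assumption, neither 't1' nor 't4' alone selects a single copy of
mod1 inside t6; only the full path t4.t1.mod1 does.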

Similarly, when plotting data, the same issue is present.  We concluded
that in that situation, we really need a way to get at particular results,
and Frank suggested using a 'slicing' system similar to what is currently
proposed for external data.

However, this system seemed more difficult to implement in the 'repeated
task' context.  This was also a made-up example designed to test the limits
of SED-ML (I created it as a test case as I was implementing support for
phraSED-ML), and the restriction can be worked around by creating copies of
models and using those instead.  So we thought we could simply say in the
spec something like "If multiple instantiated models match what is
referenced by the combination of a 'taskReference' and 'modelReference',
the assignment applies to all instances of that model."

However, I realized last night that there is another problem:  what happens
when an ambiguous reference to a symbol is used to assign a value *to*
another variable?  Extending my example above:

t6 = repeat [t4, t5] for t4.t1.mod1.S1 in [0,1,10], mod2.S3 = mod1.S1

'mod2.S3' is ambiguous, which is OK:  we just apply the assignment to all
copies of mod2.S3.  But which version of mod1.S1 do we use?  If we say 'use
any of them', we sacrifice reproducibility.  But what are our alternatives?

I think we have two options:

1) Create a similar slicing system for repeated task variables, just like
Frank proposed for plotting variables and for external data.
2) Declare that the above situation is invalid SED-ML.

Either one is fine by me, as neither is ambiguous.  The second is
obviously much easier to implement.  And again, if there were any use cases
for this situation, you could resolve it by making multiple copies of the
ambiguous models.
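
For what it's worth, option 2 looks easy to check mechanically.  Here is a
minimal Python sketch, assuming hypothetical lookup tables for the example
tasks (none of these names come from SED-ML or phraSED-ML):  an assignment
is accepted only if the model being read from has exactly one instantiated
copy under the repeated task.

```python
# Hypothetical tables describing the example; not a SED-ML or phraSED-ML API.
SIMPLE_TASKS = {"t1": "mod1", "t2": "mod2", "t3": "mod1"}
REPEATED_TASKS = {
    "t4": ["t1", "t2", "t3"],
    "t5": ["t1", "t2", "t3"],
    "t6": ["t4", "t5"],
}

def count_instances(task, model):
    """Count how many instantiated copies of `model` run under `task`."""
    if task in SIMPLE_TASKS:
        return 1 if SIMPLE_TASKS[task] == model else 0
    return sum(count_instances(sub, model) for sub in REPEATED_TASKS[task])

def assignment_is_valid(task, source_model):
    """Option 2: the model read on the right-hand side must be unique."""
    return count_instances(task, source_model) == 1

print(count_instances("t6", "mod2"))      # 2 copies: writing to all is well defined
print(count_instances("t6", "mod1"))      # 4 copies: which one do we read from?
print(assignment_is_valid("t6", "mod1"))  # False, so 'mod2.S3 = mod1.S1' is rejected
```

The same count could also be used to reject ambiguous assignment targets,
if we decide to make that case illegal as well.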

If we're going to declare this situation illegal, I would also be OK with
declaring the first case illegal too, instead of saying 'apply the
assignment to all copies of the model'.  But what do others think?

-Lucian