From: Jonathan C. <jon...@cs...> - 2012-09-21 09:04:47
Dear all,

Dagmar, Frank & I had a discussion at COMBINE 2012 about the nested proposal. I'm going to try to summarise it here for the benefit of others and posterity. Hopefully as a result we can call a vote on the proposal very soon now :) I emphasise that this is primarily my description of the discussion, but Frank and Dagmar have had a chance to check a draft for drastic errors.

The main issues discussed were (at a high level - some had sub-issues):

1. Whether the proposal is at a stage where it's ready to be voted upon
2. How to deal with multiple subTasks of a repeatedTask
3. How to refer to variables in the protocol, in particular referring to ranges

There were also a couple of minor issues, which I'll deal with first.

* The step attribute on oneStep (see http://sourceforge.net/mailarchive/message.php?msg_id=29653864). Consensus that the current spec means what's written - the step is fixed, and will remain so until we support parameterising simulations, which can be a separate proposal.
* I asked about the type="log" attribute on uniformRange, which can be done with functionalRange. Frank explained that the consensus at HARMONY was that it's a common use case and a short-hand should be provided. We do need to specify which log in the spec (natural? base 10?).

1. Are we ready to vote?

There are some issues that could do with tidying up (see e.g. below!) but the core idea is definitely, in our opinion, something that should be added to SED-ML. It is hence worth voting to get approval for this, and we can tidy up loose ends as we prepare the changes for the specification, schema, etc.

As a general point, we want to avoid adding more complexity to the proposal in the process of tidying up. It's better to have many small complementary proposals than one large one. This may imply deferring some features in the original proposal to a later stage, if we decide that they are not sufficiently well motivated and described at present.
This also applies to issues discussed below which are not currently part of the nested proposal - we do not want to add further features to this proposal! On the other hand, I think it is worth considering that if there are two potential ways of representing something, and one of those allows more possibilities in the future, then it might be better to go with that option.

2. Multiple subTasks of a repeatedTask

The current proposal allows a repeatedTask to have multiple subTasks, with a notion of task dependency to allow for ordered execution, but doesn't include motivating use cases or examples. (Indeed, only one of the examples has a sub-task at all!) Do we need multiple subTasks within the nested proposal, or should this feature be deferred for a later proposal?

This is closely tied to the question of whether (and how much) to change the Task class hierarchy in adding repeatedTask. My talk (http://co.mbine.org/events/COMBINE_2012/agenda?q=system/files/2012-08-15-combine-cooper-sedml-proposals.pdf) suggested splitting out support for multiple subtasks into another task class: CombinedTask. This would allow (almost) all the issues discussed in this section to be considered under a /separate proposal specifically for CombinedTask/, and we could proceed with RepeatedTask without multiple sub-tasks. The /only/ question remaining for the nested proposal under this approach would be whether to make RepeatedTask inherit from the current Task, or give them a common base class.

* Frank's preference is to *extend* SED-ML L1V1 so that the nested task could be used in tools *as is*, without requiring a re-design of the language.
* My main concern with this is that if it inherits then it needs to have simulationReference and modelReference.
  o There is a use for modelReference as a convenience, although it would in principle be possible for tools to use the subTask's model (assuming only one subTask; see also below).
  o Including simulationReference allows you to have a short-cut for the simple (and probably common) case where you want to repeat a task that's just running a given simulation on a given model (and not including a subTask at all). On the down side, it requires giving an empty simulationReference when you do have a subTask, which looks ugly, and means you have to change the language schema anyway to allow this.
* Something I've thought of since: you might in general want multiple models referenced in a task, allowing you to (e.g.) parameterise one model based on another. But let's not include this in nested!

Several things still need to be figured out when dealing with multiple sub-tasks, hence my preference for deferring these to later proposal(s).

* How to specify dependencies between tasks (and what these mean) - we didn't talk about this.
* How to refer to the results arising from different sub-tasks (slide 16 of my talk).
* Which model the parent task refers to (if not given explicitly and the sub-tasks use different models).

With modelReference, there's a question over what this means if the subTask is a combinedTask with different models for each of /its/ subTasks. Then there could be multiple model references. I think (as alluded to above) that we might need to re-design the referencing scheme more widely if we allowed this, for example putting modelReference and/or taskReference on the variable element to specify which subTask's model you're interested in. (See also slide 16 of my talk.) It might no longer make sense to use the task's model implicitly when you have a variable reference within a task (as opposed to in a dataGenerator or model change).

Getting results with multiple sub-tasks is troublesome. Currently the nested proposal leaves this undefined and describes it as a tool issue. (It happens anyway with pure repeated tasks, but it's easier to define a suitable behaviour here.)
I think it's a bad idea in principle to leave something undefined and up to implementations in a specification though, if you can avoid it - it makes reliable exchange much harder. Currently this just affects how the outputs (plots & results) consume the results; if you chain post-processing (see below) it affects that too. For the nested proposal itself, I'd just suggest changing Frank's example for what to do with the results into a recommendation for tools to follow.

In summary, we need to make a decision over the class hierarchy, and, if we include multiple sub-tasks within the nested proposal, then we need some motivating examples and careful consideration of the implications.

3. Referring to variables in the protocol, in particular referring to ranges

The setValue element needs to be able to reference range values in order to compute the new value, so it's just a matter of whether you do that in a way that allows for other uses too in future proposals. The nested proposal currently has several alternative mechanisms in the spec:

1. Using an index attribute on functionalRange
2. Using <variable target="#id">
3. Using an XPath expression into the SED-ML
4. Possibly implicitly making all range ids addressable with <ci> in the math

We agreed that (3) should be dropped. It's ugly, and there's great potential for confusion with XPath addressing a model. We also agreed that (4) shouldn't be allowed.

Slide 17 of my talk proposes a variation of (2): adding an idref attribute to variable, as an additional alternative to the target attribute (just as we have a symbol attribute already). This then selects the part of the protocol with that id.

_Examples_

1) In functionalRange:

    <functionalRange id="range_function">
      <listOfVariables>
        <variable id="index" idref="range_counter"/>
      </listOfVariables>
      ....

2) In setValue:

    <setValue target="...">
      <listOfVariables>
        <variable id="range_value" idref="range_function"/>
      </listOfVariables>
      ....

3) In chaining post-processing (*NB*: I'm *not* suggesting to include this in the nested proposal!):

    <dataGenerator id="datagen2">
      <listOfVariables>
        <variable id="v" idref="datagen1"/>
      </listOfVariables>
      <m:math>
        <m:apply>
          <m:plus/>
          <m:cn>5</m:cn>
          <m:ci>v</m:ci>
        </m:apply>
      </m:math>
    </dataGenerator>

And you could still allow option (1), with something like

    <functionalRange id="range_function" range="range_counter">

as a short-hand for the common case. (I've changed the index attribute to be called range, for consistency with the range attribute on repeatedTask itself, and on setValue.)

We discussed what happens if datagen1 is multidimensional. Currently dataGenerators are populated by sweeping through elements, so while potentially multidimensional they are not yet. I term this behaviour an implicit map. Frank wondered if you needed to be able to refer to the nth entry of a dataGenerator with this extension. I don't think we do yet - the chained dataGenerator is defined by sweeping through the original one, just as the original one sweeps through values of the model variable.

We also discussed needing to restrict what could be selected to constructs which make sense: probably just ranges and data generators (at present). We could reference parameters (all ids are global) but it's probably best not to allow referencing parameters of another dataGenerator or similar: both because it's harder to implement, and because we might want to make ids local in the future.

Chaining dataGenerators is something that only really becomes useful when you have more complex functionality in the post-processing, such as is allowed by my MathML extensions. My point is just that, given we might want something like this in the future, it makes sense for changes done now to be done in a way that would make such things easier, unless it makes our life more difficult now.

Best wishes,
Jonathan