#267 Allow SBML functions to reference global variables

Rejected-No_action
closed
nobody
5
2014-08-13
2014-06-04
No

Often when one is writing a physically based model, there can be tens or hundreds of physical constants. It is extremely tedious to add these to the argument list of a function.

The SBML spec says that functions should be treated as 'macro expansions'. This is perfectly fine: it implies that SBML functions are dynamically scoped, and a macro expansion should have no problem dealing with a globally scoped variable.

Consider the following macro in C:

// here the var y is passed in and x is a global,
// (or exists in whatever scope block the macro is expanded in)
#define foo(y) x + y

// now we can have global var like this:
int x = 1;

// and we can call the macro in a function like this:
int func() { return foo(2); }

The macro expands inline to "x + 2", and this expression is then resolved in the calling code, where the symbol 'x' happens to be a global variable. This is standard macro behavior.

So, I see no problem with SBML functions simply expanding wherever they are referenced, with any symbol that is not in their argument list automatically resolving to a global variable (or a local parameter).

Similarly, if a function is used in a kinetic law that has local parameters, then when the function expands there, any symbols that are not in the argument list but match local parameters resolve to those local parameters.
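A minimal Python sketch of the macro-style expansion described above, assuming a toy representation (not any real SBML library) in which expressions are nested tuples and a call to a user-defined function is written `(name, *args)`. The names `FUNCS`, `substitute`, `expand`, and `evaluate` are hypothetical. Free symbols such as `x` survive the expansion untouched and are resolved wherever the expanded expression is evaluated, so a kinetic law's local parameter shadows the global of the same name.

```python
import math
from collections import ChainMap

# Hypothetical toy representation: numbers, symbol names (str), or (op, *args) tuples.
# foo(y) := x + y, where 'x' is NOT in the argument list.
FUNCS = {"foo": (["y"], ("plus", "x", "y"))}


def substitute(expr, mapping):
    """Replace bound parameter names; any other symbol is left alone."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *[substitute(a, mapping) for a in args])
    return expr


def expand(expr):
    """Inline every user-defined function call, macro-style."""
    if isinstance(expr, tuple):
        op, *args = expr
        args = [expand(a) for a in args]
        if op in FUNCS:
            params, body = FUNCS[op]
            return expand(substitute(body, dict(zip(params, args))))
        return (op, *args)
    return expr


def evaluate(expr, scope):
    """Evaluate an expanded expression, resolving symbols in the given scope."""
    if isinstance(expr, str):
        return scope[expr]
    if isinstance(expr, tuple):
        op, *args = expr
        values = [evaluate(a, scope) for a in args]
        if op == "plus":
            return sum(values)
        if op == "times":
            return math.prod(values)
        raise ValueError(f"unknown operator {op!r}")
    return expr


global_params = {"x": 1}
expanded = expand(("foo", 2))                 # ("plus", "x", 2)
print(evaluate(expanded, global_params))      # 3: x resolves to the global
kinetic_law_scope = ChainMap({"x": 10}, global_params)
print(evaluate(expanded, kinetic_law_scope))  # 12: a local parameter x shadows the global
```

In this sketch the same `foo(2)` yields different values depending on where it is expanded, which is exactly the dynamic-scoping behavior at issue in this ticket.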

Discussion

  • Andy Somogyi

    Andy Somogyi - 2014-06-04

    Also, I fully support functions accessing global variables in libRoadRunner, and I suspect most SBML engines out there that expand functions inline would also have no problem with global variables in function definitions.

     
  • Lucian Smith

    Lucian Smith - 2014-06-04
    • labels: --> Level 3 Version 1 Core
    • Group: Accept-conformance-implications --> Reported-Proposed
     
  • Michael Hucka

    Michael Hucka - 2014-06-04

    This rule was introduced between L2v1 (which allowed global references inside functions) and L2v2 (which introduced the principle that everything has to be passed in explicitly via parameters). The reason was to prevent easy-to-make errors. One of the nastier possibilities in L2v2 arose from values having different units in different contexts. For example, events in L2v2 had their own optional time units attribute, so if you wrote a function that does something like “<csymbol>time</csymbol> + 50” and used it in different contexts such as event assignments, you might or might not get the expected results depending on where you called the function. If the time units were seconds in one place and milliseconds elsewhere, it would yield numerically identical results but contextually different units.

    We no longer have the potential for different units of time on different constructs in L3, but the principle of avoiding dynamic scoping was still deemed useful. As you may know, dynamic scoping is well known in computer science to be a source of subtle programming errors, and it is often difficult or impossible to detect unintentional uses of dynamic scoping in code.

    Here’s an example of what can go wrong when you allow global references from within functions in SBML models. Suppose a model has a large number of global variables and one of them is called k. Suppose there’s a function called ‘f’ that computes f(x) = x + k. Note that the function effectively hides the reference to ‘k’ inside itself; there’s no indication from the name of the function or its parameters about what’s going on inside. Suppose the model calls f(x) inside the kinetic law math of many reactions, and some of the reactions define local parameters named ‘k’. Did the author really mean to use the local parameter value of k or the global one? If the global k was meant, software tools will not be able to flag this as a potential error if the units match; there’s no way to tell what the user’s intent was in this case. Worse, a user may not even realize they are getting a different ‘k’.

    The bottom line is that the goal is to avoid the potential for subtle, difficult to debug errors in models.

    Now, that said, I can see that it would be annoying to have the case of a function that has a dozen parameters, and having to insert them repeatedly in different places. Perhaps the model could be written to use an assignment rule, to group some intermediate results and make it possible to pass fewer parameters to the functions?

     
    • Andy Somogyi

      Andy Somogyi - 2014-06-05

      I agree that dynamic scoping is potentially error prone; that is likely why relatively few languages support it (Perl is the only one that comes to mind). The reason I brought up dynamic scoping is that, if one considers macro expansions, they are effectively dynamically scoped functions.

      Lexical scoping is a far better option, much less error prone and easier to implement.

      So, how about letting functions access global variables with lexical scoping? Implementation-wise, this is very easy to do, and lexical scoping is something most people are familiar with, as nearly every language I can think of is lexically scoped.

      It is of course possible to rewrite everything as assignment rules, but in models with a large number of physical constants and fairly complex functions, it becomes ever more cumbersome to repeat the same function hundreds of times, especially considering that there are 20 or so physical constants and only 2 arguments.

      I would argue that it is more error prone to pass the same 20 constants into a function hundreds of times than to simply reference them as global constants.

      Note that if referencing globals via lexical scoping were allowed, functions would of course remain side-effect free, as there is no mechanism to change anything except the return value.

       
      Last edit: Lucian Smith 2014-06-05
      • Lucian Smith

        Lucian Smith - 2014-06-05

        I don't see how lexical scoping is any different from dynamic scoping: exactly the same thing happens, and exactly the same drawbacks apply.

        There are other ways around your problem, too: if these are actual constants, you can just put the numbers themselves into the function definition, and annotate them with their semantic meaning. You could also provide a front-end to your users that translated a simplified form of the equations into the more formally restricted version that SBML requires.

         
        • Andy Somogyi

          Andy Somogyi - 2014-06-05

          Lexical scoping is completely different from dynamic scoping; lexical scoping is what nearly every extant language uses.

          It's actually even simpler in SBML, as you can only define functions at the global level.

          Here, a function may reference arguments that are passed in, or any model symbols. The huge difference is that if a function is used within another function, or within a kinetic law with local parameters, it can only access those variables if they are explicitly passed in as arguments.

          With lexical scoping you have the following:

          var x = 1;

          // foo is defined here; since x is in the scoping block where foo
          // is defined, foo can automatically access it.
          function foo(y) { return x + y; }

          function bar() {
              var x = 5;      // a new local x that shadows the global one
              return foo(1);  // returns 2, because this local x is not visible to foo
          }

          With dynamic scoping, on the other hand (as with macros, or Perl), the above example would return 6. The difference is that lexical scoping resolves a function's free names in the scope where the function is defined, whereas dynamic scoping resolves them in the scope where it is called.

          Lexical (also called static) scoping is much easier to understand, less error prone, and much easier to implement, especially in a compiled language.

          In any case, allowing SBML functions to access only global variables would make documents much easier to write and read, and would be less error prone than having to pass dozens of parameters to functions.

          Writing the numbers directly into the functions would not help much either, as the same parameters are used by a large number of functions, and it would make the functions exceedingly difficult to read and error prone.

          I've run into a number of SBML documents that do reference globals from SBML functions, so presumably a number of simulators out there support this behavior. That's one of the reasons I added global variable support to libRoadRunner: so that it would be compatible with these existing documents, which were already used in CompuCell3D.
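The pseudocode above can be reproduced in Python, which is lexically scoped, alongside a hand-rolled emulation of dynamic scoping. The `DynamicEnv` class, `foo_dyn`, and `bar_dyn` are hypothetical helpers written only for this sketch: they keep an explicit stack of call frames and resolve each free name in the most recent frame that binds it, which is what a dynamically scoped language (or a macro expansion) effectively does.

```python
# Lexical scoping: Python resolves the free name 'x' in foo where foo is defined.
x = 1


def foo(y):
    return x + y  # 'x' refers to the module-level x above


def bar():
    x = 5  # a new local binding that foo cannot see
    return foo(1)


class DynamicEnv:
    """Hypothetical emulation of dynamic scoping with an explicit stack of frames."""

    def __init__(self, global_bindings):
        self.frames = [dict(global_bindings)]

    def lookup(self, name):
        # Search from the most recent call frame back to the globals.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)

    def call(self, fn, **bindings):
        # Push the callee's bindings, run it, and pop them again.
        self.frames.append(bindings)
        try:
            return fn(self)
        finally:
            self.frames.pop()


def foo_dyn(env):
    return env.lookup("x") + env.lookup("y")


def bar_dyn(env):
    return env.call(foo_dyn, y=1)


env = DynamicEnv({"x": 1})
print(bar())                   # 2: lexical, foo sees the global x
print(env.call(bar_dyn, x=5))  # 6: dynamic, foo sees bar's x
```

The only difference between the two results is where the lookup for `x` starts: the definition site (lexical) or the innermost active call (dynamic).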

           
          Last edit: Lucian Smith 2014-06-05
          • Michael Hucka

            Michael Hucka - 2014-06-09

            > I've run into a number of SBML documents that do reference globals from SBML functions, so presumably a number of simulators out there support this behavior. That's one of the reasons I added global variable support to libRoadRunner: so that it would be compatible with these existing documents, which were already used in CompuCell3D.

            What Level+Version combination does CompuCell3D do this for? If it is doing it for Level 2 Version 2 or above, then (I know I'm going to come off sounding like a pedantic jerk for saying this) the truth is that it, and other tools that do this, do not actually conform to the SBML specifications. Moreover, modifying other software to support incorrect behaviors like this is detrimental to the goals of SBML, because it perpetuates the error. By perpetuating the error rather than working to correct it, the consequence is reduced interoperability between tools and reduced reproducibility of models. Even more unfortunately, it doesn't matter if a change in scoping behavior were to be introduced in L3v2, because those existing models will always be wrong for the Level+Version combination in which they were written, and the tool's behavior (if unchanged) will also be wrong for that Level+Version combination.

            The right thing is to alert the developers of the tool that they've made an error in interpreting the spec.

            Would it be possible for libRoadRunner to at least have a flag, such that this is not its default behavior, and CompuCell3D has to do something special to get it? This would at least prevent other tools using libRoadRunner from perpetuating the error. I'm especially worried about new tools that use libRoadRunner blindly.

            I'm surprised CompuCell3D allows this, because I thought they used libSBML, and libSBML performs explicit validation on user-defined functions and should catch references to identifiers that are not passed in to the function. Do you know anything more about how CompuCell3D is accomplishing this?

             
            • Endre Somogyi

              Endre Somogyi - 2014-06-09

              Previously, CompuCell3D used a version of SOSLib, and I've never looked at the code. I do know that it used libSBML to parse the SBML, but other than that, I don't know much about it. I would assume that it just expands the function inline like many other solvers.

              I could add an option, not a big deal.

              But the thing with standards, language standards in particular, is that a particular compiler introduces an extension, others find it useful, and it gets adopted. There have been numerous C and C++ extensions introduced individually by GCC and Microsoft that were deemed useful and adopted by others.

              Here, I (and others) found the SOSLib extension very useful, so I implemented it, and similarly, I found the rateOf extension in COPASI useful, so I implemented it as well.

               
              Last edit: Lucian Smith 2014-06-09
              • Michael Hucka

                Michael Hucka - 2014-06-27

                > But the thing with standards, language standards in particular, is that a particular compiler introduces an extension, others find it useful, and it gets adopted. There have been numerous C and C++ extensions introduced individually by GCC and Microsoft that were deemed useful and adopted by others. Here, I (and others) found the SOSLib extension very useful, so I implemented it

                I know I'm going to come off sounding argumentative, but this is exactly the kind of thing that leads to incompatibility between software, which degrades interoperability and leads to people blaming SBML.

                When someone implements an extension that involves the addition of something that software can recognize (say, a #pragma, or a csymbol), that's one thing. Then, a software tool can detect that a different behavior is intended, and act accordingly.

                But what we have here with global references in SBML FunctionDefinitions is a behavior that's different from what is defined in the specification, without an indication in the SBML file that a difference was purposefully intended. No reader of the file can know that the software that wrote the file did this intentionally, because in this situation, there's no way to tell an error or incomplete model from a supposedly-correct model.

                SBML has a way for tools to handle extensions: use the annotation element. That's exactly the kind of experimentation annotation is intended for.

                 
                Last edit: Michael Hucka 2014-06-27
                • Chris Myers

                  Chris Myers - 2014-06-28

                  I have mixed feelings about this one. While I do like declaring interfaces on functions and avoiding global variables, Andy makes a pretty valid point: if functions are thought of as just macro expansions, then it does appear that they should be able to refer to global variables. Perhaps I'm missing something, but I'm not sure that I see the complication. One should always be able to inline functions (indeed, libSBML has a helper function to do this), and when you inline a function, any ids used in the function but not listed in its arguments just get left alone.

                  Mike: I'm not sure what you mean by a "behavior that's different from what is defined". If you have a function f(y) = x+y, and you pass the value 2, you get x+2, a symbolic answer. This is the way I would interpret this. It is then up to the rest of the model to ensure that the value of x takes a defined value. Can you give an example of what you mean?

                  Thanks,

                  Chris

                   
                  Last edit: Lucian Smith 2014-07-07
                  • Michael Hucka

                    Michael Hucka - 2014-06-28

                    "Behavior different than what is defined" in the SBML specification. Simple: the specification says that global references are not allowed. Tools that allow references to globals are doing something that is defined differently by the SBML specification. The specification says functions containing references to identifiers not defined by the arguments passed in, are invalid. These tools are not producing valid models.

                     
                    Last edit: Michael Hucka 2014-06-28
                    • Chris Myers

                      Chris Myers - 2014-06-28

                      Oh, but what if the specification is changed to allow them to reference global variables? I think that is the point here. Namely, Andy would like us to change the specification to allow this for L3V2.

                      Chris

                       
                      Last edit: Lucian Smith 2014-07-07
                      • Michael Hucka

                        Michael Hucka - 2014-06-28

                        No, my point is that their tools do that now, with existing SBML. No matter what we do for L3v2, these tools are always going to be non-conformant unless they change their behavior. That would be okay if they weren't also producing invalid SBML without even informing the user. By doing this, they are actually worsening interoperability between SBML software.

                        Separately, upthread, we discussed why it is a bad idea to allow global references. I stand by my arguments that it's undesirable. But that's a subject of discussion, and potentially a vote to change how it's done in L3v2.

                        (And I agree that this, the question of allowing globals, is the original topic of this issue in the tracker. I am only replying to your [Chris] question about what is "different from", and to the follow-up implication that things would be fine if we make a change for L3v2. A change to the spec would only enable that going forward.)

                         
                        Last edit: Michael Hucka 2014-06-28
                        • Chris Myers

                          Chris Myers - 2014-06-28

                          I agree with you there. Non-conformance is a big problem. The problem with these messages on the sbml-specifications list is that they don't always include the context, so one can lose the thread.

                          My comments were only about whether or not it is a good idea to allow this for L3V2, and I can see it as potentially being useful. It is less problematic than recursive functions :-).

                          Chris

                           
                          Last edit: Michael Hucka 2014-06-28
                          • Michael Hucka

                            Michael Hucka - 2014-06-28

                            Whew! Clarity.

                            Returning to Andy's original request: I agree that having this capability would be convenient from one perspective, but it was deliberately avoided in SBML for (IMHO) pretty good reasons. We can revisit those reasons if people want.

                             
  • Michael Hucka

    Michael Hucka - 2014-06-04

    P.S. It would be right to point out that what SBML defines with its user-defined functions is not, in fact, like macros, so the description in the spec is misleading in a way. We should maybe try to improve that in the L3v2 spec.

     
  • Brett Olivier

    Brett Olivier - 2014-06-07

    One problem I have with this is that functions are widely supported, and this change would require significant implementation changes.

    While passing large numbers of constants to a function can be irritating, is this not a potential use case for an array of constants (e.g., using the Arrays package) that could be passed to a function as a single argument?

     
    • Andy Somogyi

      Andy Somogyi - 2014-06-07

      A number of solvers already allow functions to access global variables.

      I've just verified that LibSBMLSim, 'Systems Biology Simulation Core Library', and libRoadRunner allow global variables. I'm assuming that SOSlib also supports global variables, as it was the previous solver in CompuCell3D, and one of the reasons I initially supported global variables in libRoadRunner is that a number of existing SBML files had functions accessing globals.

      However, COPASI issues an error message when globals are present.

      I've tested the solvers with the attached test model.

       
      Last edit: Lucian Smith 2014-06-07
      • Andy Somogyi

        Andy Somogyi - 2014-06-07

        Attachments evidently get stripped out; here is the test model:

        <?xml version="1.0" encoding="UTF-8"?>
        <!-- Created by libAntimony version v2.5 on 2014-06-03 17:26 with libSBML version 5.9.0. -->
        <sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
          <model id="functest" name="functest">
            <listOfFunctionDefinitions>
              <functionDefinition id="test">
                <math xmlns="http://www.w3.org/1998/Math/MathML">
                  <lambda>
                    <apply>
                      <plus/>
                      <ci> x </ci>
                      <cn type="integer"> 1 </cn>
                    </apply>
                  </lambda>
                </math>
              </functionDefinition>
            </listOfFunctionDefinitions>
            <listOfParameters>
              <parameter id="x" value="0" constant="false"/>
              <parameter id="y" constant="false"/>
            </listOfParameters>
            <listOfRules>
              <rateRule variable="x">
                <math xmlns="http://www.w3.org/1998/Math/MathML">
                  <apply>
                    <sin/>
                    <csymbol encoding="text" definitionURL="http://www.sbml.org/sbml/symbols/time"> time </csymbol>
                  </apply>
                </math>
              </rateRule>
              <assignmentRule variable="y">
                <math xmlns="http://www.w3.org/1998/Math/MathML">
                  <apply>
                    <ci> test </ci>
                  </apply>
                </math>
              </assignmentRule>
            </listOfRules>
          </model>
        </sbml>
        
         
        Last edit: Lucian Smith 2014-06-07
      • Michael Hucka

        Michael Hucka - 2014-06-09

        I'm currently writing questions to the developers of those tools. All of them violate SBML validation rule #20304 if they allow that function definition with no error. That validation rule has been present in the specifications since L2v2.
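        A rough sketch of the kind of check described here, using only Python's standard `xml.etree.ElementTree`. It follows rule 20304 as we understand it (inside a function's lambda, any `<ci>` that is not the first child of an `<apply>` must name one of that lambda's `<bvar>`s); it is an illustration, not libSBML's validator. The `free_identifiers` helper is hypothetical, and the trimmed model pairs the attached `test` function with two conforming functions, `addone` and `addtwo`, invented for this sketch.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core", "m": MATHML}
CI = f"{{{MATHML}}}ci"
APPLY = f"{{{MATHML}}}apply"


def free_identifiers(sbml_text):
    """Return {function id: [names]} for <ci> references that are not bvars of that lambda."""
    root = ET.fromstring(sbml_text)
    report = {}
    for fd in root.iterfind(".//sbml:functionDefinition", NS):
        lam = fd.find("m:math/m:lambda", NS)
        if lam is None:
            continue
        bvar_cis = list(lam.iterfind("m:bvar/m:ci", NS))
        bound = {ci.text.strip() for ci in bvar_cis}
        # A <ci> that is the first child of an <apply> names a function being called.
        call_heads = [a[0] for a in lam.iter(APPLY) if len(a) and a[0].tag == CI]
        free = [
            ci.text.strip()
            for ci in lam.iter(CI)
            if not any(ci is other for other in bvar_cis + call_heads)
            and ci.text.strip() not in bound
        ]
        if free:
            report[fd.get("id")] = free
    return report


MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="functest">
    <listOfFunctionDefinitions>
      <functionDefinition id="test">
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <lambda>
            <apply><plus/><ci> x </ci><cn type="integer"> 1 </cn></apply>
          </lambda>
        </math>
      </functionDefinition>
      <functionDefinition id="addone">
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <lambda>
            <bvar><ci> x </ci></bvar>
            <apply><plus/><ci> x </ci><cn type="integer"> 1 </cn></apply>
          </lambda>
        </math>
      </functionDefinition>
      <functionDefinition id="addtwo">
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <lambda>
            <bvar><ci> x </ci></bvar>
            <apply><ci> addone </ci><apply><ci> addone </ci><ci> x </ci></apply></apply>
          </lambda>
        </math>
      </functionDefinition>
    </listOfFunctionDefinitions>
  </model>
</sbml>"""

print(free_identifiers(MODEL))  # {'test': ['x']}
```

        Only `test` is reported: its `x` is neither a bound variable nor the name of a called function, while `addtwo` may call `addone` by name.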

         
      • Michael Hucka

        Michael Hucka - 2014-06-09

        By the way, thanks for bringing this up, and for going out of your way to examine the behavior of other libraries and providing a test case. I realize it's time and effort for you, and while I'm going on complaining about how the tool behavior is an error, I really appreciate that you did this. This problem probably would have gone unnoticed for a lot longer otherwise, because there is apparently no test for this situation in the SBML Test Suite. (I will also look at updating the test suite to have a way to catch this situation.)

         
  • Lucian Smith

    Lucian Smith - 2014-06-09

    What people do with invalid models that they import is basically their own business; some people like to take the browser's approach to HTML, namely, accept as much as possible. This has consequences, but it's a social issue, not so much a spec issue. If a tool produces an invalid model, that can be more obviously flagged as a bug, if there's no way for the user to know they did something wrong. So the question becomes: what tools let a user sit down to create a model that is then invalid SBML? Simulators are not, in general, model creation tools, so something like libRoadRunner is out of the running here.

    I am leery of claiming that simulator behavior should be a guide in this case, because the behavior of invalid SBML models is undefined. Anything that a simulator does when it encounters an invalid model is just a personal choice on the part of its developer, and might even be a side-effect of the method they use to deal with functions: in other words, they might never have put any conscious thought into whether (say) a spurious 'X' in a function referenced in a kinetic law should reference the local or global 'X'. When Brett says that it would be a lot of work to change this behavior, that applies even to simulators that already happen to behave as we might want: they, too, would have to work out the implications of supporting newly defined behavior that used to be undefined.

    Given all of that, I think this could easily boil down to being a tool issue. For convenience, some model creation software could allow users to omit explicitly passing in global variables, and then programmatically fill in the missing arguments at export time. This would minimize error on the user's part (as Andy wants), but also ensure that the produced SBML is explicit and fully exchangeable. As an example, in Antimony, I could let the user write:

    function test(x)
    x + 1
    end

    model foo()
    x' = test()
    end

    and the call to 'test' would automatically fill in the missing arguments, so that the produced SBML would be valid:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Created by libAntimony version v2.5.2 on 2014-06-09 09:59 with libSBML version 5.10.0. -->
    <sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
      <model id="foo" name="foo">
        <listOfFunctionDefinitions>
          <functionDefinition id="addone">
            <math xmlns="http://www.w3.org/1998/Math/MathML">
              <lambda>
                <bvar>
                  <ci> x </ci>
                </bvar>
                <apply>
                  <plus/>
                  <ci> x </ci>
                  <cn type="integer"> 1 </cn>
                </apply>
              </lambda>
            </math>
          </functionDefinition>
        </listOfFunctionDefinitions>
        <listOfParameters>
          <parameter id="x" constant="false"/>
        </listOfParameters>
        <listOfRules>
          <rateRule variable="x">
            <math xmlns="http://www.w3.org/1998/Math/MathML">
              <apply>
                <ci> addone </ci>
                <ci> x </ci>
              </apply>
            </math>
          </rateRule>
        </listOfRules>
      </model>
    </sbml>
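    The argument-filling step described above could look roughly like this Python sketch. The names `fill_missing_args` and `render_call` are hypothetical and are not how Antimony actually implements it: every trailing parameter the user omitted at the call site is filled with the model symbol of the same name, and an error is raised if no such symbol exists, so the exported call is always explicit.

```python
def fill_missing_args(func_name, params, supplied, model_symbols):
    """Complete a call by passing same-named model symbols for omitted trailing parameters."""
    if len(supplied) > len(params):
        raise ValueError(f"{func_name}: expected at most {len(params)} arguments")
    filled = list(supplied)
    for name in params[len(supplied):]:
        if name not in model_symbols:
            raise ValueError(f"{func_name}: no model symbol to pass for parameter {name!r}")
        filled.append(name)
    return filled


def render_call(func_name, args):
    """Render the completed call in infix form, e.g. test(x)."""
    return f"{func_name}({', '.join(args)})"


model_symbols = {"x", "k1", "k2"}
# The user wrote test(); the exporter fills in x.
print(render_call("test", fill_missing_args("test", ["x"], [], model_symbols)))  # test(x)
# The user passed only the first argument; the constants k1 and k2 are filled in.
print(render_call("rate", fill_missing_args("rate", ["s", "k1", "k2"], ["x"], model_symbols)))  # rate(x, k1, k2)
```

    Supplied arguments still bind positionally; only the omitted trailing parameters are inferred by name, and a parameter with no matching model symbol is reported rather than silently left free.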
    
     
  • Lucian Smith

    Lucian Smith - 2014-06-26

    Just as a note, since I'm commenting on all the other open issues: I am not planning to incorporate this change into SVN at this time, but we still need clear votes from the editors one way or the other before the issue can be closed. If someone wants me to write this up to see what it would look like in the spec before they vote, however, I can do that.

     
  • Brett Olivier

    Brett Olivier - 2014-06-27

    I think we lose more than we gain from this (function portability, clear debugging, etc.), even though it is a tempting idea.

    I disagree with this proposed change.

     
  • Frank Bergmann

    Frank Bergmann - 2014-06-28

    Like I said before, I'm against adding this feature. It does not add anything to the standard in terms of expressiveness, but it makes models more difficult to validate and will cause issues for all tools that did not implement function definitions (FDs) as simple preprocessor defines.

    So I'm against adding this.

     