Re: [SED-ML-discuss] Question about model IDs


> Herbert was asking about SED-ML use cases. We don't really have this information on the SED-ML website but probably should; it would help clarify its purpose to newcomers.
> 

Agreed. Do you want to volunteer to take a first stab at the text (perhaps in a Google Doc)? Then we can massage it and put it online.

> David, is there any reason why SED-ML files can't use SVN or Mercurial URLs to access the resources? In this case an archive file isn't really needed: just exchange a SED-ML file, which could itself be under version control within a project.
> 

No matter what, I'd prefer for the archive not to go away. It will always be useful to have a snapshot (including all necessary data, models, …) to pass along. But I'm actually not against having URLs to SVN / git / Mercurial repositories. It is an overhead, but if the community decides that it is needed, why not? Here are the potential issues that would need to be resolved:

1. We need to identify the type of repository (SVN | HG | GIT | …).
2. We might have to deal with authentication (though in a way this could hit us already today with the use of URLs).

The naive solution would be to put the type of repository into the URI; however, since the repositories can use different protocols, this might be tough, unless we use a syntax like:

svn+<actual repository url>
hg+<actual repository url>
git+<actual repository url>
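
To make the prefix idea concrete, here is a minimal sketch of how a tool might split such a URI into the repository type and the actual repository URL. It assumes Python; the function name split_repository_uri and the KNOWN_VCS set are hypothetical and only illustrate the three prefixes listed above. Nothing here is part of SED-ML.

```python
from urllib.parse import urlsplit

# Hypothetical set of recognised prefixes, taken from the three examples above.
KNOWN_VCS = {"svn", "hg", "git"}


def split_repository_uri(uri):
    """Split 'hg+https://host/path' into ('hg', 'https://host/path').

    Returns (None, uri) when no known repository prefix is present,
    so ordinary URLs and relative paths pass through unchanged.
    """
    scheme = urlsplit(uri).scheme  # urlsplit lower-cases the scheme
    prefix, sep, transport = scheme.partition("+")
    if sep and prefix in KNOWN_VCS and transport:
        # Drop the "<vcs>+" prefix to recover the actual repository URL.
        return prefix, uri[len(prefix) + 1:]
    return None, uri


if __name__ == "__main__":
    for example in ("hg+https://models.example.org/w/demo",
                    "svn+ssh://host.example.org/repo",
                    "http://example.org/model.xml"):
        print(example, "->", split_repository_uri(example))
```

Because "+" is a legal character in a URI scheme, the prefixed form stays a syntactically valid URI, and the transport protocol (http, https, ssh, ...) is kept separate from the repository type. Authentication is not addressed by this sketch.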

Thoughts?
Frank

> Best wishes,
> 
> Richard
> 
> 
> 
> 
> On 6 Oct 2011, at 22:28, David Nickerson wrote:
> 
>> The idea of an archive file works well when you have a single
>> transferable set of data that gets shipped from user to user with no
>> need to track or merge changes from different users (i.e., one
>> developer many users). DOCX archives work because Word provides very
>> sophisticated edit tracking and merging functionality - is this
>> functionality something that SED-ML tools want to develop and support?
>> 
>> An alternative, that we have been using successfully in the Physiome
>> Model Repository, is the idea of using mercurial repositories as a
>> "workspace" in which you contain all data related to a piece of work.
>> This piece of work could be anything from a complete model with
>> simulation descriptions and outputs, experimental data used to fit the
>> model, etc. to a single sub-component of a generic model constituent
>> designed as part of a component library. While we use mercurial, any
>> versioning system could be used instead. The advantage of this
>> approach is that the user is able to embed repositories within
>> repositories (subrepos in mercurial, submodules in git, externals in
>> svn) and manage the versioning of embedded repositories such that the
>> user can choose whether to track latest changes or fix on a specific
>> version that is known to be "correct".
>> 
>> We find that this allows modellers to easily[1] ship their work
>> between each other while maintaining a decent provenance record of the
>> developments. While it is not always going to be optimal to include
>> all types of data in a single type of repository, mercurial at least
>> (and probably the other systems), allows embedding different types of
>> repository within a repository. We're starting now to think about how
>> this would work when you start wanting to include datasets that are
>> huge (e.g., large image collections used to fit patient specific
>> cardiac models, or the simulation results of several million cell
>> models in a stomach). But at least in terms of the work currently
>> being done with CellML models, mercurial repositories are proving
>> sufficient.
>> 
>> When a user simply wants to grab a particular piece of work from the
>> repository for their own use, the website (and soon webservices)
>> provide a way to download a static archive containing the contents of
>> the workspace. Again, we're still trying to work out what to do in the
>> case where a workspace links to datasets that can easily be multiple
>> gigabytes in size. And we are certainly open to any help on this
>> issue.
>> 
>> Cheers,
>> David.
>> 
>> [1] There is currently a lack of tool support for this, so users
>> pretty much have to manage their mercurial workspaces using mercurial
>> directly. This is not always ideal and is also something we are
>> thinking about and working on to some degree for certain applications.
>> 
>> On Fri, Oct 7, 2011 at 9:41 AM, Richard Adams <ric...@ed...> wrote:
>>> 
>>> For those unfamiliar with the concept of the SEDX archive format: It's a
>>> zipped archive of a SED-ML file and >=1 model file. Would an equally simple
>>> solution be acceptable to incorporate other data files as well?
>>> 
>>> For example, a zip with subfolders
>>> models/
>>> simulations/
>>> data/
>>> diagrams/
>>> results/
>>> 
>>> Each resource could refer to other resources via relative URIs. E.g., in the
>>> scenario above, a SED-ML file would refer to ../models/model.xml. Of course,
>>> within an archive, external public resources can still be referred to if
>>> need be - for example, if a dataset is very large.
>>> 
>>> In this approach all the folders are 'source' folders (i.e., inputs to
>>> computational tasks) except for results/, which could be used to contain the
>>> results of an experiment if need be.
>>> 
>>> In SBSI we have the concept of a 'project' for modelling resources, with a
>>> defined folder structure largely similar to that proposed above, to enable
>>> tooling to reliably link and detect resource files, and allow export of
>>> zipped archives for exchange between collaborators. I'm not especially
>>> advocating its exact usage but just to say it's worked well for us.
>>> 
>>> Best wishes
>>> Richard
>>> 
>>> On 6 Oct 2011, at 17:05, Nicolas Le Novère wrote:
>>> 
>>> It seems to me that the need of an archive is pervasive in all the M&S
>>> efforts. We want the models and the simulation descriptions together, but
>>> also the simulation results or the data-sets used to complement or fit the
>>> models, and of course the SBGN-ML files.
>>> 
>>> And we are going in the same direction in DDMoRe, where the need is clearer
>>> since the mathematics only represent part of the "model".
>>> 
>>> After all, it is just like ODF or DOCX.
>>> 
>>> I therefore propose to move that discussion to combine-discuss.
>>> 
>>> On 06/10/11 16:43, Herbert Sauro wrote:
>>> 
>>> Thank you Richard, that is exactly what I wanted to know; I will read appendix
>>> D. The zip file idea has been floated around for a while (since 2004 I
>>> think, by Cliff Shaffer in particular), but in the past there was never
>>> anything to zip with the SBML model until the advent of SED-ML.
>>> 
>>> Herbert
>>> 
>>> Sent from my iPad, hence excuse my typos.
>>> 
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure contains a
>>> definitive record of customers, application performance, security
>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>> sense of it. Business sense. IT sense. Common sense.
>>> http://p.sf.net/sfu/splunk-d2dcopy1
>>> _______________________________________________
>>> SED-ML-discuss mailing list
>>> SED...@li...
>>> https://lists.sourceforge.net/lists/listinfo/sed-ml-discuss
>>> 
>>> 
>>> --
>>> Nicolas LE NOVERE, Computational Systems Neurobiology, EMBL-EBI, WTGC,
>>> Hinxton CB101SD UK, Mob:+447833147074, Tel:+441223494521 Fax:468,
>>> Skype:n.lenovere, AIM:nlenovere, twitter:@lenovere
>>> http://www.ebi.ac.uk/~lenov/, http://www.ebi.ac.uk/compneur/
>>> 
>>> 
>>> 
>>> 
>>> Dr Richard Adams
>>> Software Development Team Leader,
>>> Centre For Systems Biology Edinburgh
>>> University of Edinburgh
>>> Tel: 0131 651 9019
>>> email : ric...@ed...
>>> Web: http://csbe.bio.ed.ac.uk/adams.php
>>> 
>>> 
>>> 
>>> 
>>> 
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>>> 
>>> 
>> 
>> 
> 
> Dr Richard Adams
> Software Development Team Leader,
> Centre For Systems Biology Edinburgh
> University of Edinburgh 
> Tel: 0131 651 9019
> email : ric...@ed...
> Web: http://csbe.bio.ed.ac.uk/adams.php
> 
> 
> 
> 
> 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.