Re: [sbml-flux] [sbml-discuss] Storing gene expression values in SBML


Dear Miguel and Robert,

Thanks for starting this discussion; context-specific models are becoming an essential topic and improved support for their standardized development is therefore indispensable.

I am adding the FBC mailing list to the list of recipients of this e-mail because the geneProduct information and so-called gene-protein-reaction (GPR) rules are defined in this extension package for SBML rather than in the core. From my perspective, it makes sense to continue discussing this issue on the more specific mailing list.

As I understand it, you would like to add gene expression data to the model and thus create an array of structurally identical models that differ only in the expression levels of specific gene products. Please note that maintaining such an array of nearly identical models can become complicated in the long term because changes and updates to specific processes need to be synchronized across many or even all models. It is, therefore, probably better to have just one general model (or just a handful of them) that is put into context when evaluated with respect to given data.

A few projects to encode numerical values with relevance to biology have been launched, such as the Systems Biology Results Markup Language (SBRML) and the Numerical Markup Language (NuML). Maybe somebody on this list can give an update on the current state of SBRML or NuML?

An idea could be to link numbers in these formats (e.g., NuML or SBRML, or similar) to the SBML file and wrap them all in a COMBINE archive together with a SED-ML file that specifies how to run the simulation. In this way, the SBML file could continue to be restricted to the actual model, separate from specific measurement or evaluation details, i.e., we would have a general model that becomes context-specific when evaluated with respect to the data files, rather than having an array of context-specific model variants.
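
To illustrate the difference, here is a minimal Python sketch of one general model that is put into context only when evaluated against separate data sets. Everything in it is an assumption made for illustration: the toy GPR rules, the gene and reaction identifiers, the threshold, and the convention of scoring AND as the minimum and OR as the maximum of the gene values. It is not the procedure of any particular extraction method.

```python
# One general model: reaction id -> GPR rule.
# A rule is either a gene id (str) or a tuple ("and" | "or", sub_rule, ...).
# All identifiers below are hypothetical.
GENERAL_MODEL = {
    "R1": ("and", "g1", "g2"),
    "R2": ("or", "g2", "g3"),
    "R3": "g4",
}


def gpr_score(rule, expression):
    """Score a GPR rule; assumed convention: AND -> min, OR -> max."""
    if isinstance(rule, str):
        # Genes missing from the data set are treated as not expressed.
        return expression.get(rule, 0.0)
    op, *parts = rule
    scores = [gpr_score(part, expression) for part in parts]
    return min(scores) if op == "and" else max(scores)


def active_reactions(model, expression, threshold=1.0):
    """Return the sorted ids of reactions whose GPR score reaches the threshold."""
    return sorted(
        rid for rid, rule in model.items()
        if gpr_score(rule, expression) >= threshold
    )


# Two separate data sets (e.g., read from external data files), same model.
sample_a = {"g1": 5.0, "g2": 0.2, "g3": 3.0, "g4": 0.0}
sample_b = {"g1": 5.0, "g2": 4.0, "g3": 0.0, "g4": 2.0}

print(active_reactions(GENERAL_MODEL, sample_a))
print(active_reactions(GENERAL_MODEL, sample_b))
```

The model itself is never modified here; only the data passed to the evaluation changes, so an update to a GPR rule has to be made in one place only.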

For first tests, however, a quick solution could be to define a custom annotation for the geneProduct objects in the FBC package, write the additional numerical values there, and parse them out again when needed. This approach would require some implementation work and possibly a change in how you run the simulation. Besides, other tools would not understand the custom annotation. However, it could perhaps lead to the development of an extended standard, for instance, in the form of a new package for gene expression values or similar.
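
A rough sketch of what such a custom annotation could look like, and how it could be parsed out again, is given below. The annotation namespace (http://example.org/gene-expression), the element name "expression", and its attributes "value" and "unit" are invented for this example and are not part of SBML or FBC; only the SBML core and FBC version 2 namespaces are real. The sketch uses Python's standard XML parser rather than an SBML library.

```python
import xml.etree.ElementTree as ET

FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"
# Hypothetical namespace for the custom annotation (not a standard).
EXPR_NS = "http://example.org/gene-expression"

# Toy SBML fragment: G_1 carries a custom expression annotation, G_2 does not.
SBML_SNIPPET = f"""<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:fbc="{FBC_NS}" level="3" version="1" fbc:required="false">
  <model id="toy" fbc:strict="true">
    <fbc:listOfGeneProducts>
      <fbc:geneProduct fbc:id="G_1" fbc:label="gene1">
        <annotation>
          <expr:expression xmlns:expr="{EXPR_NS}" value="12.5" unit="TPM"/>
        </annotation>
      </fbc:geneProduct>
      <fbc:geneProduct fbc:id="G_2" fbc:label="gene2"/>
    </fbc:listOfGeneProducts>
  </model>
</sbml>
"""


def read_expression(sbml_text):
    """Map geneProduct ids to (value, unit) for annotated gene products."""
    root = ET.fromstring(sbml_text)
    values = {}
    for gene_product in root.iter(f"{{{FBC_NS}}}geneProduct"):
        node = gene_product.find(f".//{{{EXPR_NS}}}expression")
        if node is not None:
            gene_id = gene_product.get(f"{{{FBC_NS}}}id")
            values[gene_id] = (float(node.get("value")), node.get("unit"))
    return values


print(read_expression(SBML_SNIPPET))
```

Gene products without the annotation are simply skipped, so a file written this way should remain readable by tools that ignore unknown annotations.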

Cheers
Andreas

> Am 23.06.2019 um 01:26 schrieb Robert Phair <rp...@in...>:
> 
> If we limit our discussion to enzymatically catalyzed reactions and transport processes mediated by integral membrane proteins, and if we assume those enzymes and transporters are, themselves, species in your CSM that act as modifiers of the target reaction, then the gene expression data (transcripts per million, say) could be stored in the initial value attribute for the species. You will also have, I suppose, some mapping from this TPM value to the abundance of the corresponding protein, enzyme or transporter. This mapping could be included in the kinetic law for the corresponding reaction by partitioning the Vmax into its component parts, protein abundance and turnover number. Obviously this involves many assumptions, but everyone recognizes that quantifying large numbers of mRNAs is far easier than quantifying the corresponding proteins and their post-translational modifications.
> 
> Regards,
> Robert D Phair PhD  |  Chief Science Officer  |  Integrative Bioinformatics Inc
> Mountain View, CA, USA
> www.integrativebioinformatics.com
> 
> ProcessDB: Mechanistic modeling for systems biology
> 
> 
> On Sat, Jun 22, 2019 at 2:36 PM Miguel Ponce de Leon <mig...@gm...> wrote:
> Hi everyone,
> 
> I've been working in constraint-based metabolic modeling for a while, and more recently I've started to work with context-specific models (CSMs) as well as with model-extraction methods, to study cancer metabolism.
> 
> In order to extract a CSM from a reference model such as Recon2.2 or Recon3D, an omics data set, in general gene expression, is used to identify active reactions. This mapping step is performed using the so-called gene-protein-reaction (GPR) rules. Finally, an optimization procedure, commonly referred to as the model-extraction method (e.g., GIMME, INIT, FASTCORE, CORDA), is used to extract a submodel that contains those reactions identified as active as well as some other reactions needed to fill gaps. Recently, we have found that in virtually all available model-extraction methods the GPRs are not contextualized in the produced model, which means that when the model is stored, part of the context is lost (for further details, see https://www.biorxiv.org/content/10.1101/593277v1.abstract). Moreover, the gene expression values used to reconstruct the CSM are generally not provided, which prevents other users from knowing which genes were treated as expressed and which were considered not expressed. Altogether, these flaws affect the predictive capabilities of the CSMs, especially when performing in-silico gene knockouts.
> 
> My collaborators and I have been working on this issue and have come to the idea that the SBML standard could support the storage of this information; that is to say, when a CSM is generated, the gene expression values are also stored in the SBML file. In the same way that reactions have an attribute to store a flux, or metabolites one to store a concentration, it would be very useful to have a gene attribute to store an expression value (e.g., TPM, FPKM), to provide a user of such a model with the information needed to perform correct simulations.
> 
> I do not know if the current SBML standard supports such a capability. If it does, we would like to know the correct way to store gene expression in an SBML file. If the storage of gene expression is currently not supported by the SBML standard, we would like to propose this possibility for future releases.
> 
> Best regards,
> Miguel Ponce-de-Leon
> Life Science Department - Barcelona Supercomputing Center

With best regards

Dr. Andreas Draeger
Assistant Professor
---
University of Tübingen
Institute for Biomedical Informatics (IBMI)
Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens
Sand 14 · Office #C320 · 72076 Tübingen · Germany
Phone: +49-7071-29-70459 · Fax: +49-7071-29-5152
Web: http://systems-biology.info · Twitter: @dr_drae
YouTube: https://www.youtube.com/c/systemsbiology
