From: Ian H. <ih...@be...> - 2007-02-20 23:48:01
Mitch,

Thanks for all this; I'm a little behind you & Chris on the discussion of RDF and semantic wikis and so on, but I did have a vague hand-wavy comment regarding this:

> As I see it, the main point of having a genome wiki is to make genomic
> data editable.

I would broaden this slightly & say that the main point of having a "genome wiki", whatever that actually ends up being, is to serve community annotation needs, and that "making genomic data editable" is a key step in this direction.

There are some important use cases we should look at, illustrating how people are going about doing community annotation in practice. These include...

(1) The "AAA wiki" for Drosophila comparative annotation:
http://rana.lbl.gov/drosophila/wiki/index.php/Main_Page

(2) The honeybee genome project (advanced as a model for community annotation; there is a workshop on this right before CSHL Biology of Genomes; actually going to BOTH could be a really good idea):
http://www.genome.org/cgi/content/full/16/11/1329
http://meetings.cshl.edu/meetings/honeyb07.shtml
http://meetings.cshl.edu/meetings/genome07.shtml
[scratch going to both; Biology of Genomes is oversubscribed]

(3) The GONUTS gene ontology wiki:
http://gowiki.tamu.edu/GO/wiki/index.php/Main_Page

These all offer slightly different perspectives on the problem. The genome annotation projects in particular reveal a wider array of data than just GFF files: there are alignments, protein sequences, GO terms, associations, phenotypes and various other data that need a place to "hang". In my experience one of the problems with wikis is that there are no fixed slots to put things; of course this anarchy is a strength too, but it does make it hard to find stuff. A semantic wiki might help somewhat, in that searching it becomes easier.

In any case I view all of these issues as somewhat downstream, as you say:

> My first priority at the moment is to try and get some kind of
> persistent feature upload/display working; my hope is that we'll have
> thought through the IDspace issues by the time we get to implementing
> that part.

I agree: I think this does all need some thinking through; but if we can make a reasonably robust/intuitive persistent version of GFF upload (or perhaps, eventually, a persistent version of the current "transient" upload functionality that is built into GBrowse, with all its fancy glyph display & grouping options) then we will have made a significant step in framing these questions about richer meta-content. More importantly perhaps, we will have a real tool that could fit into these existing kinds of genome annotation effort, and then we can start to prioritize future improvements in the best possible way: via direct feedback from users. :-)

Ian

Mitch Skinner wrote:
> Sorry for the brain dump earlier--here's a shorter, better-digested
> version.
>
> As I see it, the main point of having a genome wiki is to make genomic
> data editable. It's important to note that making *data* editable is
> different from making *documents* editable--I expect data to be
> interpretable using software, but while documents can be managed by
> software, actually interpreting them using software is definitely an
> unsolved problem. The data/document distinction is reflected in the
> difference between a semantic wiki and a regular wiki--in a semantic
> wiki the content contains handles for software to grab onto, but the
> slippery, hard-to-parse natural language content of a non-semantic wiki
> is much, much harder for software to pull information out of.
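
[Interjecting here, just to make the "handles for software to grab onto" point concrete for myself: below is a minimal sketch, using rdflib with completely made-up namespaces and property names, of the kind of statement a semantic wiki page could expose as triples rather than as free text. Not a proposal for the actual data model, just an illustration.]

    # Minimal sketch: a wiki-page fact expressed as RDF triples that
    # software can query, instead of a sentence a human has to read.
    # All URIs and property names below are invented for illustration.
    from rdflib import Graph, Namespace, Literal

    WIKI = Namespace("http://example.org/genomewiki/")   # hypothetical
    GO = Namespace("http://example.org/go/")             # hypothetical

    g = Graph()
    gene = WIKI["Dmel_CG1234"]   # pretend wiki page for a gene

    # "This gene is annotated with GO:0006355, added by a community curator"
    g.add((gene, WIKI["hasGOTerm"], GO["GO_0006355"]))
    g.add((gene, WIKI["curatedBy"], Literal("some_annotator")))

    for subj, pred, obj in g.triples((gene, None, None)):
        print(subj, pred, obj)
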
>
> For data editing, lots of UIs exist already, of course. There's an army
> of Visual Basic programmers out there putting editing interfaces in
> front of relational databases. However, those data-editing UIs (and the
> databases behind them) are relatively inflexible; if some new situation
> arises and you want to store some new kind of information then you're
> SOL until your local programmer can get around to adding support for it.
> This is the reason for the appalling success of Excel and Access as
> data-management systems. Having done data-management work in the
> biological trenches literally right next to the lab benches, I can tell
> you that this is an ongoing pain point. Flexibility is especially
> important in a community annotation context, where you want people to be
> able to add information without having to agree on a data model first.
>
> So the semantic wiki and its RDF data model occupy a nice middle ground
> between fast and efficient but relatively inflexible relational
> databases and the document-style wiki that's flexible but not really
> queryable. The data content of a semantic wiki is more useful than pure
> natural language wiki content because you can pull data out of the
> semantic wiki and do something with it, like adding graphical
> decorations to features that have certain kinds of wiki annotations.
> Generic software that handles RDF (like Piggy Bank) can also make use of
> the semantic wiki data.
>
> To some extent we can have our cake and eat it too by integrating RDF
> data stores ("triplestores") with relational databases. You can start
> out with a fast, efficient relational skeleton that's already supported
> by lots of software (like chado) and then hang whatever new kinds of
> information you want off of it. The new kinds of information go into
> the triplestore, and at query time, data from the relational tables and
> from the triplestore can be blended together.
>
> Over time, I expect some kinds of new information to get better
> understood. Once there is consensus on how a particular kind of
> information should be modeled, it can be moved from the triplestore into
> a set of relational tables. When this happens, it's possible to keep
> the same client-side RDF view of the data, with the only differences
> being that the whole system gets faster, and software for processing and
> analyzing the new data gets easier to write.
>
> So, if you buy all this, then IMO the next steps in this area are:
>
> 1. Evaluate RDF/relational integration tools. The main contenders
> appear to be D2R and Virtuoso. D2R is nice because it works with
> existing databases. Virtuoso is nice because it has good
> relational/triplestore integration. Whether it's easier to integrate
> D2R with a triplestore or port chado to Virtuoso is an open question.
>
> 2. Get Semantic MediaWiki to talk to the chosen triplestore.
>
> 3. Figure out how the namespaces/idspaces ought to work. We want to
> have a system that's flat enough that it's easy for people to make wiki
> links between entities, but deep enough that IDs from various
> sources/applications don't step on each other.
>
> My first priority at the moment is to try and get some kind of
> persistent feature upload/display working; my hope is that we'll have
> thought through the IDspace issues by the time we get to implementing
> that part.
>
> Regards,
> Mitch
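
P.S. On the "blended together at query time" point: here's a rough sketch, again with rdflib and invented URIs, of the sort of query I imagine steps (1) and (2) enabling -- feature coordinates coming from a chado-style relational mapping, annotations coming from wiki-contributed triples, and one SPARQL query over both. The real mechanics (D2R or Virtuoso) would obviously look different; this is just to show the shape of the thing.

    # Rough sketch: one SPARQL query over data that, in the real system,
    # would come partly from a relational mapping (e.g. D2R over chado)
    # and partly from wiki-contributed triples.  All URIs are invented.
    from rdflib import Graph, Namespace, Literal

    FEAT = Namespace("http://example.org/chado/feature/")   # hypothetical
    WIKI = Namespace("http://example.org/genomewiki/")      # hypothetical

    g = Graph()

    # Pretend these triples were exposed from relational tables:
    g.add((FEAT["CG1234"], FEAT["start"], Literal(10500)))
    g.add((FEAT["CG1234"], FEAT["end"], Literal(12000)))

    # ...and this one was added through the semantic wiki:
    g.add((FEAT["CG1234"], WIKI["phenotypeNote"],
           Literal("wing defect in a pretend RNAi screen")))

    query = """
        PREFIX feat: <http://example.org/chado/feature/>
        PREFIX wiki: <http://example.org/genomewiki/>
        SELECT ?f ?start ?note WHERE {
            ?f feat:start ?start .
            ?f wiki:phenotypeNote ?note .
        }
    """
    for row in g.query(query):
        print(row.f, row.start, row.note)
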
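
P.P.S. On the persistent GFF upload piece: purely as a sketch of the minimal per-feature record we'd need to hang on to from an uploaded file (the columns are just standard GFF; the field names and file name are my own), something like:

    # Minimal sketch of turning uploaded GFF lines into records we could
    # persist.  Column order is standard GFF; the dict keys are just mine.
    def parse_gff_line(line):
        (seqid, source, ftype, start, end,
         score, strand, phase, attrs) = line.rstrip("\n").split("\t")
        return {
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": attrs,   # e.g. "ID=...;Name=..." in GFF3
        }

    with open("upload.gff") as handle:
        features = [parse_gff_line(line)
                    for line in handle
                    if line.strip() and not line.startswith("#")]
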