| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006 |  |  |  |  |  |  |  |  |  |  | 21 | 3 |
| 2007 | 15 | 34 | 20 | 19 | 15 | 15 | 10 | 6 | 3 | 1 |  | 3 |
| 2008 |  | 1 |  | 1 |  |  |  |  |  | 2 | 1 |  |
| 2009 | 3 |  | 27 | 1 |  | 1 | 16 | 19 | 55 | 51 | 15 | 10 |
| 2010 | 11 | 3 | 22 | 13 | 9 | 23 | 59 | 63 | 24 | 46 | 20 | 14 |
| 2011 | 16 | 16 | 4 | 9 | 3 | 5 | 1 | 3 | 6 | 7 |  | 5 |
| 2012 | 6 | 37 | 24 | 24 | 19 | 26 | 14 | 21 | 27 | 16 | 43 | 42 |
| 2013 | 24 | 26 | 31 | 56 | 82 | 79 | 30 | 76 | 40 | 85 | 105 | 136 |
| 2014 | 92 | 84 | 48 | 84 | 80 | 46 | 104 | 70 | 74 | 53 | 36 | 3 |
| 2015 | 10 | 37 | 52 | 30 | 101 | 42 | 32 | 25 | 50 | 60 | 74 | 41 |
| 2016 | 26 | 42 | 89 | 26 | 50 | 66 | 54 | 65 | 57 | 9 | 42 | 7 |
| 2017 | 37 | 24 | 22 | 22 | 39 | 57 | 10 | 39 | 17 | 43 | 18 | 32 |
| 2018 | 31 | 29 | 23 | 31 | 13 | 21 | 32 | 42 | 25 | 36 | 16 | 5 |
| 2019 | 35 | 25 | 13 | 3 | 9 | 9 | 22 | 19 | 4 | 5 | 3 | 1 |
| 2020 | 9 | 22 | 13 | 7 | 4 | 8 | 9 | 13 | 24 | 8 | 21 | 10 |
| 2021 | 9 | 4 | 33 | 9 | 7 | 1 | 8 | 14 | 15 | 10 | 10 | 2 |
| 2022 | 8 | 14 | 17 | 6 | 37 | 20 | 7 | 17 | 2 | 8 | 11 |  |
| 2023 | 6 |  | 3 | 6 | 10 | 16 | 2 | 3 | 18 | 9 | 8 | 14 |
| 2024 | 5 | 2 | 11 | 10 | 4 | 2 | 4 |  |  | 5 | 8 |  |
| 2025 | 3 |  | 3 | 7 | 5 | 3 |  |  |  |  |  |  |
From: Mitch S. <mit...@be...> - 2007-03-23 01:27:16
I've put up Dmel 3R here: http://genome.biowiki.org/gbrowse/dmel51/prototype_gbrowse.html

It rendered in less than 8 hours on one CPU, and the tiles and HTML take up around 13 gigabytes on disk. Memory usage maxed out at 342 MB. The 3R chromosome arm is the largest in Dmel at 28 Mb, or about 20 times the size of chromosome 4. So to do human we just have to scale up another order of magnitude. The job is CPU-bound and we've got a fair amount of CPU to throw at it, but if memory usage grows linearly with the amount of sequence then we may start running into problems. We could do some more work on rendering on demand, and that would help, but you still have to hold all the features in memory at the same time to do layout, so we may not be able to reduce peak memory usage very much that way. Maybe we can partition the layout job if there are empty regions of the chromosome.

One problem with the current approach is that the HTML files get quite large at the lower (more zoomed-out) zoom levels. For the Genes track the entire_landmark HTML file is 1.6 megabytes; for CDS it's 7.7 megabytes. The entire_landmark HTML for all tracks totals 24 megabytes, which is how much you would have to download to actually view the entire_landmark zoom level. Maybe we should have some kind of limit above which we turn off the HTML. Also, if we had a way of lazily loading feature information, the HTML could be a lot smaller.

With a large number of features the browser starts to slow down; for me this becomes noticeable around the 1 Mbp zoom level on Windows. On other platforms Firefox starts to bog down at higher (less dense) zoom levels. Again, if we turn off the HTML above some feature-density threshold this should be less of a problem.

Mitch
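As a rough illustration of the scaling question raised above, here is a back-of-envelope script that extrapolates the 3R numbers (28 Mb, ~13 GB of tiles and HTML, 342 MB peak RAM, ~8 CPU-hours) linearly to a human chromosome 1-sized sequence of roughly 250 Mb. The linear-memory assumption is exactly the unknown flagged in this message, and the chromosome size is approximate, so treat the output as a rough guess rather than a prediction.

    #!/usr/bin/env perl
    # Linear extrapolation only; real growth may be worse (layout, bumping).
    use strict;
    use warnings;

    my %dmel_3r = ( mb => 28, disk_gb => 13, ram_mb => 342, cpu_hours => 8 );
    my $human_chr1_mb = 250;   # approximate size of human chromosome 1

    my $factor = $human_chr1_mb / $dmel_3r{mb};
    printf "scale factor:        %.1fx\n",       $factor;
    printf "disk, if linear:     ~%.0f GB\n",    $dmel_3r{disk_gb}   * $factor;
    printf "peak RAM, if linear: ~%.1f GB\n",    $dmel_3r{ram_mb}    * $factor / 1024;
    printf "CPU time, one core:  ~%.0f hours\n", $dmel_3r{cpu_hours} * $factor;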
From: Ian H. <ih...@be...> - 2007-03-22 21:44:43
Todd -- makes sense -- thanks for the info.

Ian
From: Todd H. <ha...@cs...> - 2007-03-22 21:42:15
Hi Scott, Ian -

Here's my take on Knoppix/liveCD, having just rolled out a VMWare virtualization project for WormBase.

With the goal of creating an easy system for archiving, as well as a way for "bioinformatically naive" users to run WormBase locally, I decided against Knoppix for a number of reasons, in order of importance. I reserve the right to be completely wrong on any of these points since it's been almost a year since I touched Knoppix.

1. Knoppix/liveCD approaches are limited to media size. WormBase was too big for a DVD. This might not be a limit for GMOD.

2. Networking. I couldn't figure out how to bridge Knoppix to the host.

3. VMXs are easy and free to create and simple to use. The release of VMServer last fall means that a licensed commercial copy of VMWare Workstation is no longer required to create VMXs. They run on Mac, Windows, and Linux very well with *almost* double-click simplicity.

4. Knoppix is most easily run from a CD, not the hard drive. This didn't allow for suitable performance of a WormBase application.

5. Knoppix isn't open virtualization. It's a specific (Debian-based) distro of Linux. I wanted to run a different distribution.

Todd
From: andy l. (RI) <and...@bb...> - 2007-03-22 21:22:32

> Can I ask why use VMware instead of e.g. a liveCD or Knoppix
> type arrangement?

In our experience there are several advantages and one disadvantage to using VMWare.

As Scott has mentioned, the ability to create and distribute VMWare images is a significant plus, although the images are (or can be) large and take time to move around even on a LAN.

Another significant plus is the ability to take snapshots and roll back to them in the event of a disaster (or for any other reason). For example, you could set up a group of machines, run a course that involved all the users installing stuff within the machine, and then roll each machine back in a matter of seconds to restore each virtual machine to its pristine state, ready for the next group. I often use a VM to try out a new configuration, safe in the knowledge that I can go back to where I started with *no* pain at all.

It is possible to run the virtual machines as separate servers (i.e. separate from the host OS IP address) too, although our sysadmins are not keen on us doing that using DHCP-allocated addresses for some reason, which makes our management a bit harder (we have to configure each machine with a unique hard-coded IP address, which makes the distribution more complicated than it should be).

With the VMPlayer, there is also no need to buy licenses for every client machine, so the costs are now pretty reasonable.

The main disadvantage in our hands is performance. I get reasonable performance (as good as I would expect given the processor) on my cra**y laptop, but moving the same VM onto a higher-specced desktop machine does not - in our experience - give a proportionate increase in performance. I have no idea why this should be the case.

Later,
Andy
From: Ian H. <ih...@be...> - 2007-03-22 19:39:46
Yeah, I guess running VMware is a bit more appealing than rebooting the computer, for the casually interested user.

Ian
From: Scott C. <cai...@gm...> - 2007-03-22 19:36:56
On Thu, 2007-03-22 at 12:29 -0700, Ian Holmes wrote:
> Scott, this sounds great.
>
> I would love to see our ongoing extensions to GBrowse (Ajax, wiki...) in
> something like this.
>
> Can I ask why use VMware instead of e.g. a liveCD or Knoppix type
> arrangement?

Um, I dunno, VMware is the technology that Lincoln told me about--I've done virtually no research other than to try it out and find that it works. The other day I was running a Linux host while listening to iTunes running inside of a Windows guest, streaming from my wife's Mac. The ability to easily move between a host OS and the guest VM is pretty appealing.

I guess one thing I like about VMware is the ease of creation. If I create a VM that has Chado and GBrowse installed in it, and someone else wants to add Apollo, all they have to do is get my VM, add it and resave it (at least in theory; I haven't tried that out).

Scott

--
Scott Cain, Ph. D.                          cai...@gm...
GMOD Coordinator (http://www.gmod.org/)     216-392-3087
Cold Spring Harbor Laboratory
From: Ian H. <ih...@be...> - 2007-03-22 19:29:58
Scott, this sounds great.

I would love to see our ongoing extensions to GBrowse (Ajax, wiki...) in something like this.

Can I ask why use VMware instead of e.g. a liveCD or Knoppix type arrangement?

Ian

Scott Cain wrote:
> Hello,
>
> Recently, the people in my lab have been experimenting a lot with VMware
> for creating virtual machines. I am excited about the potential uses
> for these virtual machines for several purposes, among them the ability
> to create a consistent platform for teaching (like giving a classroom a
> disk with the VMware player and a linux VM with GBrowse preinstalled to
> give a tutorial), and for creating virtual machines that could be used
> as a 'test server' that people interested in GMOD software could get and
> try out without having to set up a server to do it. That is what this
> email is about.
>
> My goal is two-fold: first I am looking for volunteers. If anybody
> would like to under take the task of creating a virtual machine and
> populate it with GMOD software and sample data, I would be thrilled to
> give guidance and moral support.
>
> Second, I want to get feedback on what should be installed. There is
> both the question of sample data and software. Here are my thoughts so
> far:
>
> Chado and related core components
> XORT
> GBrowse
> Apollo
> CMap
> Turnkey/gmod-web
> BLAST graphic viewer
>
> Also, there are some things 'on the bubble' that I haven't decided
> whether they should be installed:
>
> Sybil
> Flash GViewer
> Textpresso
> BioMart
> DAS2 server
>
> Any thoughts on these?
>
> Thanks for your time,
> Scott
From: Mitch S. <mit...@be...> - 2007-02-26 22:46:35
I wrote:
> So far, I've been testing my changes by doing diffs of the tiles; I'm
> pretty sure I've only committed changes that generate tiles that are
> bit-exactly the same as before. The tiles that I've generated with this
> approach aren't the same bit-for-bit, but they do look the same (with
> one exception: right ends of genes are now getting rendered correctly).
> I think the difference is in the palette, so the tiles could still be
> correct even if they're different. So I'm not yet fully convinced that
> it's rendering exactly correctly, but it does look right to me.

I got the image diff program that the cairo project uses for its tests, which handles palette differences without a problem. It also tells you how many pixels are different, which is a useful metric IMO. I used it to compare the gbrowse-ajax-tiledimage version of tile generation with my experimental panel-based one, and after a few days of fiddling with the per-tile panel bounds, it's close. It produces the same images pixel-for-pixel as TiledImage, with the exception of the 2kbp and 1kbp zoom levels, where the gridlines are shifted by one pixel.

I spent some time trying to chase this down, without much success. My best guess is that since 1kbp and 2kbp are right on a rounding boundary (1 and 0.5 bases per pixel, respectively) there's some kind of rounding difference. At any rate, while the difference is annoying, I think it's close enough that I'm going to go ahead and commit this. For the time being I plan to continue with this approach, unless something comes up that shows it's unworkable.

Mitch
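For anyone who wants to reproduce this kind of comparison without the cairo test tool, here is a minimal sketch of a palette-insensitive tile diff using the same GD module the tile generator uses. The script and its output format are illustrative only; this is not the program referred to above, just the idea of counting pixels whose resolved RGB values differ.

    #!/usr/bin/env perl
    # Rough tile comparison: counts pixels whose RGB values differ, so two
    # PNGs with different palettes but identical appearance compare as equal.
    use strict;
    use warnings;
    use GD;

    my ($file_a, $file_b) = @ARGV;
    my $img_a = GD::Image->newFromPng($file_a) or die "can't read $file_a\n";
    my $img_b = GD::Image->newFromPng($file_b) or die "can't read $file_b\n";

    my ($wa, $ha) = $img_a->getBounds;
    my ($wb, $hb) = $img_b->getBounds;
    die "size mismatch: ${wa}x${ha} vs ${wb}x${hb}\n"
        unless $wa == $wb && $ha == $hb;

    my $diff = 0;
    for my $y (0 .. $ha - 1) {
        for my $x (0 .. $wa - 1) {
            # rgb() resolves a palette index to actual color values
            my @ca = $img_a->rgb($img_a->getPixel($x, $y));
            my @cb = $img_b->rgb($img_b->getPixel($x, $y));
            $diff++ if "@ca" ne "@cb";
        }
    }
    print "$diff differing pixels out of ", $wa * $ha, "\n";
    exit($diff ? 1 : 0);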
From: Mitch S. <mit...@be...> - 2007-02-21 22:36:36
Since I'm going to go ahead with some fairly invasive changes, I've created a tag in the server directory that marks the current TiledImage approach. Also, just before that, I committed a change that fixes the right end of plus-strand genes.

At some point I intend to update the demo with new images, but it's not at the top of my list.

Mitch
From: Ian H. <ih...@be...> - 2007-02-20 23:48:01
Mitch,

Thanks for all this; I'm a little behind you & Chris on the discussion of RDF and semantic wikis and so on, but I did have a vague hand-wavy comment regarding this:

> As I see it, the main point of having a genome wiki is to make genomic
> data editable.

I would broaden this slightly and say that the main point of having a "genome wiki", whatever that actually ends up being, is to serve community annotation needs, and that "making genomic data editable" is a key step in this direction.

There are some important use cases we should look at, illustrating how people are going about doing community annotation in practice. These include...

(1) The "AAA wiki" for Drosophila comparative annotation:
http://rana.lbl.gov/drosophila/wiki/index.php/Main_Page

(2) The honeybee genome project (advanced as a model for community annotation; there is a workshop on this right before CSHL Biology of Genomes; actually going to BOTH could be a really good idea):
http://www.genome.org/cgi/content/full/16/11/1329
http://meetings.cshl.edu/meetings/honeyb07.shtml
http://meetings.cshl.edu/meetings/genome07.shtml
[scratch going to both; Biology of Genomes is oversubscribed]

(3) The GONUTS gene ontology wiki:
http://gowiki.tamu.edu/GO/wiki/index.php/Main_Page

These all offer slightly different perspectives on the problem. The genome annotation projects in particular reveal a wider array of data than just GFF files. There are alignments, protein sequences, GO terms, associations, phenotypes and various other data that need a place to "hang". In my experience one of the problems with wikis is that there are no fixed slots to put things: of course this anarchy is a strength too, but it does make it hard to find stuff. A semantic wiki might help somewhat, in that searching it becomes easier.

In any case I view all of these issues as somewhat downstream, as you say:

> My first priority at the moment is to try and get some kind of
> persistent feature upload/display working; my hope is that we'll have
> thought through the IDspace issues by the time we get to implementing
> that part.

I agree: I think this does all need some thinking through; but if we can make a reasonably robust/intuitive persistent version of GFF upload (or perhaps, eventually, a persistent version of the current "transient" upload functionality that is built into GBrowse, with all its fancy glyph display & grouping options) then we will have made a significant step in framing these questions about richer meta-content. More importantly perhaps, we will have a real tool that could fit into these existing kinds of genome annotation effort, and then we can start to prioritize future improvements in the best possible way: via direct feedback from users.

:-)

Ian
From: Hilmar L. <hl...@gm...> - 2007-02-20 04:19:46
It might be interesting to have a look at 'WikiProteins' (of which there only seems to be a flash demo yet):

http://www.wikiprofessional.info/

This was featured in a Nature news article. Apparently it's coming out of a company called Knewco:

http://www.knewco.com/

The people of that company also presented at KR-MED 2006 ('An Online Ontology: WiktionaryZ'), see http://ontoworld.org/wiki/WiktionaryZ. There is an RDF export.

-hilmar

--
Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net
From: Andrew U. <and...@gm...> - 2007-02-19 21:56:15
Hi all... I just committed a prototype on-demand tile rendering CGI script (get-tile.pl). The idea is to fill a database with primitives first (the fast step), and only render the tiles as the client requests them (the slow step).

The script requires an htaccess Apache config file (provided), placed in the same directory as your "tileinfo.xml" file, that launches the CGI only when a tile file can't be found. So the client doesn't care, and doesn't even have to know, whether the tile is rendered or not: if a tile is on disk, it is returned; if not, the CGI script renders and returns it (also saving it to disk for next time). Thanks to Mitch for the suggestion of doing it this way.

This is just a rough prototype and could be optimized and improved a LOT ("grep -i todo get-tile.pl" to see how). A major problem with it right now is that it fails to render GD::Font primitives on the fly. On some machines, rendering a GD::Font causes GD to crash when rendering the PNG. On other machines, you get weird output like this: http://biowiki.org/~avu/badtile.png (this is for the demo volvox database from the GBrowse tutorial, http://www.gmod.org/nondrupal/tutorial/tutorial.html), which makes it seem pretty clear why GD is crashing... but I'm not sure where those refs are failing to be dereferenced.

Now I know that GD::Font primitives are being stored and recalled correctly. For example, if I run generate-tiles.pl with "-m 1" to fill the database ONLY, then later run it with "-m 2" to render tiles based on a prior fill, they render just fine. It is something about how the CGI restores them that's a problem. I've tried to debug it with no success... alas, I'm pressed for time and can't really finish the job right... so I'm turning in what I have and leaving this on the back burner for now, but if anyone wants to take a stab at it, it would be welcome.

Andrew
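For anyone who wants to experiment with the same pattern, here is a minimal sketch of the serve-or-render fallback logic. This is not the actual get-tile.pl: the parameter name, cache directory, and render_tile() routine are placeholders, and the real script additionally restores graphics primitives from the database before drawing.

    #!/usr/bin/env perl
    # Hypothetical sketch of an on-demand tile CGI: serve a cached PNG if it
    # exists, otherwise render it, save it for next time, and return it.
    use strict;
    use warnings;
    use CGI qw(param);
    use GD;

    my $tile_dir = '/var/www/tiles';          # assumed cache location
    my $tile     = param('tile') || '';
    die "bad tile name\n" unless $tile =~ /^[\w.-]+\.png$/;   # no path tricks

    my $path = "$tile_dir/$tile";
    unless (-e $path) {
        my $png = render_tile($tile);         # placeholder for the slow step
        open my $out, '>', $path or die "can't write $path: $!";
        binmode $out;
        print {$out} $png;
        close $out;
    }

    open my $in, '<', $path or die "can't read $path: $!";
    binmode $in;
    local $/;                                 # slurp the whole file
    my $data = <$in>;
    print "Content-type: image/png\r\n\r\n", $data;

    sub render_tile {
        # In the real script this would pull primitives from the database
        # and replay them with GD; returning a blank tile keeps the sketch
        # self-contained.
        my $img = GD::Image->new(1000, 100);
        $img->colorAllocate(255, 255, 255);   # first color is the background
        return $img->png;
    }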
From: Mitch S. <mit...@be...> - 2007-02-19 09:19:54
Sorry for the brain dump earlier--here's a shorter, better-digested version.

As I see it, the main point of having a genome wiki is to make genomic data editable. It's important to note that making *data* editable is different from making *documents* editable--I expect data to be interpretable using software, but while documents can be managed by software, actually interpreting them using software is definitely an unsolved problem. The data/document distinction is reflected in the difference between a semantic wiki and a regular wiki--in a semantic wiki the content contains handles for software to grab onto, but the slippery, hard-to-parse natural language content of a non-semantic wiki is much, much harder for software to pull information out of.

For data editing, lots of UIs exist already, of course. There's an army of Visual Basic programmers out there putting editing interfaces in front of relational databases. However, those data-editing UIs (and the databases behind them) are relatively inflexible; if some new situation arises and you want to store some new kind of information then you're SOL until your local programmer can get around to adding support for it. This is the reason for the appalling success of Excel and Access as data-management systems. Having done data-management work in the biological trenches literally right next to the lab benches, I can tell you that this is an ongoing pain point. Flexibility is especially important in a community annotation context, where you want people to be able to add information without having to agree on a data model first.

So the semantic wiki and its RDF data model occupy a nice middle ground between fast and efficient but relatively inflexible relational databases and the document-style wiki that's flexible but not really queryable. The data content of a semantic wiki is more useful than pure natural language wiki content because you can pull data out of the semantic wiki and do something with it, like adding graphical decorations to features that have certain kinds of wiki annotations. Generic software that handles RDF (like Piggy Bank) can also make use of the semantic wiki data.

To some extent we can have our cake and eat it too by integrating RDF data stores ("triplestores") with relational databases. You can start out with a fast, efficient relational skeleton that's already supported by lots of software (like Chado) and then hang whatever new kinds of information you want off of it. The new kinds of information go into the triplestore, and at query time, data from the relational tables and from the triplestore can be blended together.

Over time, I expect some kinds of new information to get better understood. Once there is consensus on how a particular kind of information should be modeled, it can be moved from the triplestore into a set of relational tables. When this happens, it's possible to keep the same client-side RDF view of the data, with the only differences being that the whole system gets faster, and software for processing and analyzing the new data gets easier to write.

So, if you buy all this, then IMO the next steps in this area are:

1. Evaluate RDF/relational integration tools. The main contenders appear to be D2R and Virtuoso. D2R is nice because it works with existing databases. Virtuoso is nice because it has good relational/triplestore integration. Whether it's easier to integrate D2R with a triplestore or port Chado to Virtuoso is an open question.

2. Get Semantic MediaWiki to talk to the chosen triplestore.

3. Figure out how the namespaces/idspaces ought to work. We want to have a system that's flat enough that it's easy for people to make wiki links between entities, but deep enough that IDs from various sources/applications don't step on each other.

My first priority at the moment is to try and get some kind of persistent feature upload/display working; my hope is that we'll have thought through the IDspace issues by the time we get to implementing that part.

Regards,
Mitch
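To make the "RDF as flexible middle ground" point concrete, here is a tiny dependency-free Perl illustration of the triple model: arbitrary new attributes can be attached to a feature without changing any schema, and simple pattern queries still work. This is only a toy in-memory example with made-up identifiers, not D2R, Virtuoso, or any real triplestore API.

    #!/usr/bin/env perl
    # Toy triplestore: each fact is a (subject, predicate, object) triple,
    # so new kinds of information need no schema change at all.
    use strict;
    use warnings;

    my @triples = (
        [ 'gene:Adh',     'has_curator_note', 'possible alternate promoter' ],
        [ 'gene:Adh',     'discussed_in',     'PMID:12345678' ],   # made-up ID
        [ 'gene:Adh',     'phenotype',        'ethanol sensitivity' ],
        [ 'gene:CG12345', 'has_curator_note', 'dubious gene model' ],
    );

    # Pattern query: undef means "match anything" in that position.
    sub match {
        my ($s, $p, $o) = @_;
        return grep {
            (!defined $s || $_->[0] eq $s) &&
            (!defined $p || $_->[1] eq $p) &&
            (!defined $o || $_->[2] eq $o)
        } @triples;
    }

    # Everything anyone has said about Adh:
    for my $t (match('gene:Adh', undef, undef)) {
        print join("\t", @$t), "\n";
    }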
From: Andrew U. <and...@gm...> - 2007-02-18 21:50:08
On 2/18/07, Mitch Skinner <mit...@be...> wrote:
> Nice. Since this is a wishlist item, I believe you get to be the one to
> cross it off. Maybe a "completed items" section on the wishlist?
>
> Mitch

I like the <strike> tag myself.

OK, I moved some code around a bit to make it easier to put a GUI wrap on track reordering. All one needs to do is call:

    cif.moveTrack(trackName, newPosition);

and the track order (including the track buttons) will be changed accordingly. E.g. look at how the current event handler for the "Move" button (the function TrackControlComponent_moveTrackButtonHandler() in TrackControlComponent.js) does it.

The demo at genome.biowiki.org isn't updated with the new code since there's no outward functionality change, but it's been committed.

Andrew
From: Mitch S. <mit...@be...> - 2007-02-16 18:08:32
I wrote:
> For yeast_chr1, it takes 3 and a half minutes to render all tracks + all
> zooms and uses 93MB of RAM. This is about 1/4 the space and less than
> 1/4 of the time taken by current CVS HEAD with in-memory primitive
> storage. For Drosophila chr. 4 (all tracks + zooms), it takes 31 and a
> half minutes and uses 200 MB of RAM, which is less than 1/6 the space
> and about 40% of the time taken by CVS HEAD.

Also, I'm not sure how much I believe DProf anymore, but here's what the profile looks like for Dmel chr. 4 mRNA at zoom 1. It's odd because I don't recall GD::Image::_new taking up so much time before. I'd expect it to increase in relative terms since I've been working on other stuff, but I think it's increased in absolute terms.

Total Elapsed Time = 143.3461 Seconds
  User+System Time = 142.6761 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c Name
 67.3   96.13  96.130  11492   0.0084 0.0084 GD::Image::_new
 23.0   32.89  32.890   8656   0.0038 0.0038 GD::Image::png
 5.35   7.631   7.631 115396   0.0000 0.0000 GD::Image::line
 4.43   6.321   9.205    864   0.0073 0.0107 Bio::Graphics::Glyph::collides
 3.44   4.910   4.924 115080   0.0000 0.0000 Bio::Graphics::Panel::map_pt
 3.12   4.456   4.456   1276   0.0035 0.0035 Bio::Graphics::Glyph::_collision_keys
 1.44   2.049   2.193  12817   0.0002 0.0002 main::writeHTML
 1.43   2.040   2.040   8656   0.0002 0.0002 GD::Image::copy
 1.10   1.567   3.156    412   0.0038 0.0077 Bio::Graphics::Glyph::add_collision
 1.08   1.546  138.26      1   1.5461 138.26 main::renderTileRange
 0.36   0.510   0.510  11492   0.0000 0.0000 GD::Image::DESTROY
From: Mitch S. <mit...@be...> - 2007-02-16 04:08:25
|
Over the last week or so I've been experimenting with a different way of doing the rendering. Performance-wise, it takes significantly less time and space. Correctness-wise, I haven't found any problems but checking it is a bit difficult. Code-elegance wise, it's worse, but I think that with some Panel api changes it could be cleaned up a lot by moving most of the code into a Panel subclass (it currently fiddles in odd ways with some of the Panel's state). For yeast_chr1, it takes 3 and a half minutes to render all tracks + all zooms and uses 93MB of RAM. This is about 1/4 the space and less than 1/4 of the time taken by current CVS HEAD with in-memory primitive storage. For Drosophila chr. 4 (all tracks + zooms), it takes 31 and a half minutes and uses 200 MB of RAM, which is less than 1/6 the space and about 40% of the time taken by CVS HEAD. I've put a set of tiles generated with this code here: http://genome.biowiki.org/gbrowse/dmel-noti/prototype_gbrowse.html http://genome.biowiki.org/gbrowse/yeast_chr1-noti/prototype_gbrowse.html and I'd appreciate any reports of incorrectly rendered tiles there. The rest of this email is a description of why I took this approach and how it's done. If you just want to render big chromosomes without reading all the details, then I'll be committing this code soon, either to HEAD or on a branch. There were two things that pointed me in this direction: the gridline thing, and the empty tile thing. The gridline thing was when I tried to avoid storing gridline primitives by just drawing the first tile's gridlines on every tile, without going through TiledImage. At first I thought I had to use the first tile's gridlines because otherwise the gridlines would have been drawn off-tile (because without going through TiledImage I didn't have TiledImage's primitive position translation functionality). The problem was that the first tile's gridlines were _different_ from the rest, because of the Panel's edge behavior at the first gridline. This all could have been solved (by copying and adjusting the gridline code, if nothing else), but there's a similar issue with "global feature" tracks like DNA and translation, and the ruler. For those, and for the gridlines, I wanted to be able to generate just one rendering tile's worth of primitives at a time, and not have to store all of the primitives for the entire chromosome, which take up a lot of space on these primitive-intensive tracks. My solution was to create a Panel for each rendering tile, and use that to draw the gridlines and global features. One problem with this approach is what happens when some primitive runs off the end of a rendering tile (which is one of the main reasons TiledImage exists in the first place). For the DNA and translation tracks, there are no labels or other primitives that extend in unpredictable ways, so it's not a big problem there. For the ruler (where the labels do have some extra width), my solution was to have the per-tile Panel extend a short distance (currently 100px) beyond the rendering tile on both sides. This way, any primitive which extends less than 100px off-tile does get rendered correctly on both sides of the tile boundary. The distance is set by the "$global_padding" variable in my experimental version of generate-tiles.pl. The empty tile thing had to with the fact that GD::Image::png was taking up a lot of time in the profiles I was generating. 
I figured I could use the glyph boxes used by the imagemap code to figure out which tiles were blank, and avoid generating those (by hardlinking the file name to a previous blank tile). This worked, and it gave a speedup on CVS HEAD of up to 12% on some tracks, but it got me thinking: if I knew the pixel span of each glyph in advance (which is what those boxes provide) then for each rendering tile I could use that information to only render the glyphs that overlap the tile. I also spent some time reading the panel code, and I realized that I could get the TiledImage primitive position translation functionality almost for free by giving the Panel a negative pad_left value. And only rendering a tile's worth of glyphs at a time did something similar to TiledImage's "only render the current tile's worth of primitives" functionality, without having to store any primitives. For non-global features I'm still using a chromosome-wide Panel to do the bumping, so the layout is still right. So this approach doesn't use TiledImage (or BatchTiledImage, or DBPrimStorage, or MemoryPrimStorage) at all. That's a fairly radical change, IMO, but it's the only way I see to really scale to large chromosomes. I do like the fact that TiledImage is a nice clean abstraction, but there's no way to store a human chr. 1's worth of primitives in memory, and even if you had an infinitely fast disk storage method for primitives the (de)serialization overhead would still kill you, as far as I can tell. Actually, now that I think about it, I remember Data::Dumper (serialization) taking a nontrivial amount of time in the database primitive storage profiling I did last year, but I'm not sure about eval (deserialization). As for whether we should ditch TiledImage, I think there are two remaining questions: rendering on demand and correctness. If this approach can do both of those things, then I think it's the way we should go. I believe this can be applied to the rendering-on-demand scenario if we use mod_perl and do the layout step on startup. This would take a fair amount of RAM but it's only necessary for tracks that haven't been fully rendered yet. One plus of that approach is that handling single new features gets easier. Storing the layout in a database is theoretically possible but saving and restoring that information seems pretty complicated, unless we just serialize the entire panel. So far, I've been testing my changes by doing diffs of the tiles; I'm pretty sure I've only committed changes that generate tiles that are bit-for-bit the same as before. The tiles that I've generated with this approach aren't the same bit-for-bit, but they do look the same (with one exception: right ends of genes are now getting rendered correctly). I think the difference is in the palette, so the tiles could still be correct even if they're different. So I'm not yet fully convinced that it's rendering exactly correctly, but it does look right to me. If you're curious, I've appended the code for the meat of the tile rendering below. The things to pay attention to here are how the per-tile panel is set up, any if statement that checks $is_global, and the $small_tile_gd->copy call near the end. @per_tile_glyphs is an array of arrays; for each rendering tile, it has an array of the glyphs that overlap that tile. Comments? Mitch

for (my $x = $first_large_tile; $x <= $last_large_tile; $x++) {
  my $large_tile_gd;
  my $pixel_offset = (0 == $x) ? 0 : $global_padding;

  # we want to skip rendering the whole tile if it's blank, but only if
  # there's a blank tile to which to hardlink that's already rendered
  if (defined($per_tile_glyphs[$x]) || (!defined($blankTile))) {
    # rendering tile bounds in pixel coordinates
    my $rtile_left = ($x * $rendering_tilewidth) - $pixel_offset;
    my $rtile_right = (($x + 1) * $rendering_tilewidth) + $global_padding - 1;

    # rendering tile bounds in bp coordinates
    my $first_base = ($rtile_left / $big_panel->scale) + 1;
    my $last_base = int(($rtile_right / $big_panel->scale) + 1);
    #print "pixel_offset: $pixel_offset first_base: $first_base last_base: $last_base " . tv_interval($start_time) . "\n";

    # set up the per-rendering-tile panel, with the right
    # bp coordinates and pixel width
    my %tpanel_args = %$panel_args;
    $tpanel_args{-start} = $first_base;
    $tpanel_args{-end} = $last_base;
    $tpanel_args{-stop} = $last_base;
    $tpanel_args{-width} = $rtile_right - $rtile_left + 1;
    my $tile_panel = Bio::Graphics::Panel->new(%tpanel_args);

    if ($is_global) {
      # for global features we can just render everything
      # using the per-tile panel
      my @segments = $CONFIG->name2segments($landmark_name . ":" . $first_base . ".." . $last_base, $db, undef, 1);
      my $small_segment = $segments[0];
      $tile_panel->add_track($small_segment, @$track_settings);
      $large_tile_gd = $tile_panel->gd();
    } else {
      # add generic track to the tile panel, so that the
      # gridlines have the right height
      $tile_panel->add_track(-glyph => 'generic', @$track_settings, -height => $image_height);
      $large_tile_gd = $tile_panel->gd();
      #print "got tile panel gd " . tv_interval($start_time) . "\n";

      if (defined $per_tile_glyphs[$x]) {
        # some glyphs call set_pen on the big_panel;
        # we want that to go to the right GD object
        $big_panel->{gd} = $large_tile_gd;

        # move rendering onto the tile
        $big_panel->pad_left(-$rtile_left);

        # draw the glyphs for the current rendering tile
        foreach my $glyph (@{$per_tile_glyphs[$x]}) {
          # sometimes the glyph positions itself
          # using the panel's pad_left, sometimes
          # it just uses the x-coordinate it gets
          # in the draw method.  We want them both
          # to be -$rtile_left.
          $glyph->draw($large_tile_gd, -$rtile_left, 0);
        }
      }
    }
    $tile_panel->finished;
    $tile_panel = undef;
  }

  # now to break up the large tile into small tiles and write them to PNG on disk...
  SMALLTILE: for (my $y = 0; $y < $small_per_large; $y++) {
    my $small_tile_num = $x * $small_per_large + $y;
    if ( ($small_tile_num >= $first_tile) && ($small_tile_num <= $last_tile) ) {  # do we print it?
      my $outfile = "${tile_prefix}${small_tile_num}.png";
      if (!$is_global) {
        writeHTML($tile_prefix, $x, $y, $small_tile_num,
                  $tilewidth_pixels, $image_height, $track_num,
                  $html_current_outdir, $per_tile_glyphs[$x]);
        if (!defined($nonempty_smalltiles[$x]{$y})) {
          if (defined($blankTile)) {
            #print "linking $outfile to $blankTile\n";
            link $blankTile, $outfile
              or die "could not link blank tile: $!\n";
            next SMALLTILE;
          } else {
            $blankTile = $outfile;
          }
        }
      }
      open (TILE, ">${outfile}")
        or die "ERROR: could not open ${outfile}!\n";
      my $small_tile_gd = GD::Image->new($tilewidth_pixels, $image_height, 0);
      $small_tile_gd->copy($large_tile_gd, 0, 0,
                           $y * $tilewidth_pixels + $pixel_offset, 0,
                           $tilewidth_pixels, $image_height);
      print TILE $small_tile_gd->png
        or die "ERROR: could not write to ${outfile}!\n";
      warn "done printing ${outfile}\n" if $verbose >= 2;
    }
  }
}
|
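The excerpt above doesn't show how @per_tile_glyphs gets built. A minimal sketch of that bucketing step is below; it is illustrative rather than the committed code, and @track_glyphs and glyph_pixel_bounds() are hypothetical stand-ins for the track's glyph list and for however the imagemap-style pixel boxes are obtained from the chromosome-wide panel.

# Sketch only: assign each glyph to every rendering tile its pixel span overlaps.
my @per_tile_glyphs;
foreach my $glyph (@track_glyphs) {
  my ($left, $right) = glyph_pixel_bounds($glyph);   # pixel span on the big panel
  my $first = int($left  / $rendering_tilewidth);
  my $last  = int($right / $rendering_tilewidth);
  for (my $t = $first; $t <= $last; $t++) {
    push @{ $per_tile_glyphs[$t] }, $glyph;          # glyph overlaps rendering tile $t
  }
}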
From: Andrew U. <and...@gm...> - 2007-02-14 22:01:32
|
Hi everyone... I came across a recent development that is useful if we decide to go with database storage of graphics primitives - an algorithm+data structure called Nested Containment List (NCList). abstract: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl647v1 The goal is to optimize 1-D range queries for databases storing intervals - exactly like our queries for all primitives overlapping some tile (that is, if we remove the y-coord query, which is useless anyway - tiles should always be the height of the entire track, reducing it to a 1-D problem). The authors claim that they can improve query and database construction time by 5 to 500 times over MySQL multi-column indexing, binning, and PostgreSQL R-trees. Their benchmarks look pretty damn good, and I wonder how it would perform on our datasets. The database update performance seems to be an unexplored issue, so this should only be applied to static datasets, which is pretty much the case with us - if our feature set ever changes, we need to re-do the layout and rebuild the primitives database completely, and layout/database building is the quick step compared to querying anyway (that is, if BioPerl can handle layout for really large chromosomes). They have a C implementation and a Python module implementation, so getting this to work for us might not be instantaneous. But if their claims are correct, this could outperform in-memory storage in speed. Also, memory use for in-memory storage may become a problem for really large chromosomes, but the NCList implementation seems to be focused on being RAM-efficient and disk-intensive, yet still able to be fast... I really have no idea how well a Perl program will work when it runs out of RAM and starts swapping to disk, but the NCList tests seem to show that their disk-intensive approach is still pretty fast. Of course, only benchmarks on our data will tell for sure. Oh yeah, their results in the paper are shown for the low-RAM, disk-intensive implementation in Python. There is also a RAM-intensive version that would probably be faster, but would require machines with more RAM... the C implementation they have will of course be even faster. Their algorithm would actually be applicable to any feature database outside of what we're doing, so take note. Andrew |
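To make the query in question concrete: the operation NCList accelerates is the plain 1-D interval overlap test. The naive in-memory scan below is shown for comparison only - it is not NCList, and the names are illustrative.

# Return every stored primitive whose interval overlaps the query range
# [$qstart, $qend].  Each primitive is assumed to be a hashref with 'start'
# and 'end' keys; NCList's contribution is answering this without scanning
# the whole set.
sub overlapping_primitives {
  my ($qstart, $qend, @primitives) = @_;
  return grep { $_->{start} <= $qend && $_->{end} >= $qstart } @primitives;
}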
From: Mitch S. <mit...@be...> - 2007-02-13 23:48:57
|
Chris Mungall wrote: > Existing relational databases can be wrapped using tools such as D2RQ. > There are definitely efficiency considerations. I'm exploring some > alternatives of the home-grown variety but don't have anything to > report yet. I think writing back to a non-generic schema from RDF is > difficult, but I'm not sure we need to do this. Well, I was vaguely thinking of having a semantic wiki be the interface to editing all of the data. For example, from chado we could generate semantic wiki text something like this: ============= Feature [http://genome.biowiki.org/genomes/scer/I#foo foo] is a [[feature type::SOFA:gene]] on [[reference sequence::SGD:I]] from [[start base:=5000bp]] to [[end base:=6000bp]]. It is involved with [[go term::GO:0019202|amino acid kinase activity]]. ============= This is using the Semantic Wikipedia syntax: http://ontoworld.org/wiki/Help:Annotation So when someone edits that wiki text and saves it, I was hoping that the right relational table<->RDF mapping (e.g., with D2R) would take the attributes that came from chado originally and automagically put them back in the right place back in chado. In other words, the D2R mapping (or possibly the semantic wiki software) could be taught to treat the "feature type", "reference sequence", "start base", "end base", and "go term" attributes specially in the RDF->DB direction. If this isn't already implemented, I think it's worthwhile to do. Also, if there were attributes that didn't have a "treat specially" mapping, they would automagically go into a triplestore. I'm hoping not to have to implement that myself but I think it's worthwhile as well. This is a big part of what "genome wiki" means to me--being able to edit all of the information (both from chado and from the triplestore), hopefully all through the same interface. Also, if this kind of editing is already implemented then that saves us from having to implement a custom genomic information editor in the browser. If we wanted to, later on we might implement some kind of click&drag editing interface in the browser, or somehow plug in AmiGO for adding GO terms but that would be optional. I agree with all of the things you say below, but it seems like you're mostly talking about the us->community direction. Querying chado+triplestore seems relatively straightforward <fingers crossed>, and annotation uploading makes sense to me (e.g., gmod_bulk_load_gff3.pl); it's the editing that I'm more worried about. I had the impression (hope) that it was mostly implemented and we could just wire it all together in a smart way but if not I'd be inclined to take a stab at it. Mitch > We want a community-based way of sharing data that fits neatly into 1d > feature paradigm, and we want this to be fast, standards-based and > interoperable with current genomics tools, so genomics datamodels and > exchange formats will continue to play a part. We may also want a way > of exposing the inherent semantics in those little boxes to computers > that don't speak genomics. It's unclear exactly who gains, when and > how, but the cost is not so high (avenues include: SPARQL queries for > genome databases; Das2rdf; use of microformats and rdf in gbrowse > display_). > > Then there are the annotations on these little boxes; statements about > the underlying biological entities. On the one hand this is the wild > untrammelled frontier - these entities may be linked to other entities > which are themselves described by a composites of other interlinked > entities. 
We can take a ride traversing these links through multiple > levels of biological granularity, from atomic structures through to > anatomical structures, physiological processes, phenotypes, > environments, life-forms living in the hydrothermal vents on Jupiter's > moons... OK, perhaps RDF can't deliver on the astrobiology quite yet, > but it seems that this open-ended world beyond genomics is a good > reason to try RDF. > > Orthogonal to this is the "reification" model. Even in our wiki-esque > community model we want to conform to good annotation practice and > encourage all links to be accompanied with provenance, evidence and so > on. > > What does this mean in terms of implementation? It could be fairly > simple. GBrowse could be augmented by a 3rd party triple-store. The > primary datastore would continue to be the genomics schema of choice, > eg chado, but freeform 3rd party annotations on features could go in > the triple-store. I have a few ideas about how this could be layered > on top of a gbrowse type display, and you have the advantage of > transparency to generic semweb software, to the extent it exists in > usable forms at the moment. > > This seems a fairly low risk approach to the community annotation > store problem. In fact, other approaches will be higher risk as they > will require rolling your own technology. Triplestores can be slow for > complex multi-join queries but I think many of your use cases will > involve simple neighbourhood graphs. Queries such as "find all genes > upstream of genes in a pathway implicated in disease X with function > Y" will perform dreadfully if you take the ontological closure into > account. We're working on technology for this in the Berkeley > Ontologies Project but you shouldn't place any dependencies on this yet. > > Well I've gone on a bit and haven't really covered all the bases - my > recommendation is to proceed enthusiastically but cautiously. As you > can see I'm part gung ho about rdf/semweb and part skeptical. The > basic idea of linking by URIs is simple and cool and powerful. > ironically, I think it is the semantic part that is somewhat lacking > with the lack of scalable OWL support, but this is changing.... > |
From: Ian H. <ih...@be...> - 2007-02-13 21:01:36
|
Jason Stajich blogs it here: http://fungalgenomes.org/blog/2007/02/wikis-for-genome-reannotation/ |
From: Andrew U. <and...@gm...> - 2007-02-13 18:46:09
|
Hi everyone... It seems that PLoS has started up a beta test of a community forum for annotating and discussing peer-reviewed papers... as it is related to what we're trying to do for genomes, I'm forwarding it to y'all (see below). Andrew ---------- Forwarded message ---------- From: PLoS ONE <ne...@li...> Date: Feb 12, 2007 12:40 PM Subject: PLoS ONE beta now live! To: and...@gm... Dear Colleague, PLoS ONE beta<http://lists.plos.org/lt.php?id=eklUBwtcVAUfUwAHSVADCQUCUQ%3D%3D>, a new way of communicating peer-reviewed science and medicine, is now launched. Before your first visit, I want to let you know about the inherent challenges of this project and the philosophy that compels PLoS to confront them. We want to speed up scientific progress and believe that scientific debate is as important as the investigation itself. PLoS ONE is a forum where research can be both shared and commented upon - we have launched it as a beta website so that the whole scientific community can help us develop the features. What makes the site beta? Not the content, which features peer-reviewed research from hundreds of authors across a diverse range of scientific disciplines. It's the additional tools and functionality surrounding these papers that will be continually refined and developed in response to user feedback. It is this union of continually evolving user tools provided by the Topaz publishing platform<http://lists.plos.org/lt.php?id=eklUBwtcVAYfUwAHSVADCQUCUQ%3D%3D> and extensive content that will make PLoS ONE a success. The first beta release of PLoS ONE features tools that allow users to annotate articles and participate in discussion threads. Our goal is to spark lively discussion online and we'd like to invite you to participate. Future updates will include user ratings for both papers and the comments made about them, personalized content alerts and much more. Connect The first 100 people (first come, first served) to sign up<http://lists.plos.org/lt.php?id=eklUBwtcVAcfUwAHSVADCQUCUQ%3D%3D> for content alerts from this email will receive a free t-shirt. Signing up is a smart move because it allows you to both receive email content alerts tailored to your specific fields of interest and make use of the interactive tools, like annotation, as they become available. We will be watching with interest to see how our new platform and software responds to high volumes of traffic and encourage you to give your feedback on your first experience via the site itself. To stay involved: • Sign up for content alerts<http://lists.plos.org/lt.php?id=eklUBwtcVAcfUwAHSVADCQUCUQ%3D%3D> First 100 people receive a PLoS ONE t-shirt (first come, first served). • Submit your work<http://lists.plos.org/lt.php?id=eklUBwtcVAAfUwAHSVADCQUCUQ%3D%3D> Visit the PLoS ONE Journal Management System to upload your article. See you online. Chris Surridge Managing Editor PLoS ONE - - - - - - - - - - - - - - - - - - - - To unsubscribe from PLoS Announcements visit: this link<http://lists.plos.org/lt.php?id=eklUBwtcVAEfUwAHSVADCQUCUQ%3D%3D> If you need assistance, please send an email to web...@pl.... Public Library of Science 185 Berry Street, Suite 3100 San Francisco, CA 94107 USA |
From: Chris M. <cj...@fr...> - 2007-02-13 07:22:46
|
Hi Mitch. Wow, that's quite a lot packed into that email! In a good way. You ask some good questions, and I certainly don't know the answer to all of them. RDF is certainly no panacea. There are definitely strikes against it. The way it is layered on top of XML is problematic (there are other syntaxes to choose from, some quite pleasant like n3, but this all just serves to make the barrier for entry higher). Tools and libraries can insulate you from this, to an extent. The layering of OWL (the web ontology language) onto RDF is also tricky, and at best RDF is quite a low-level way of expressing OWL. All relations in RDF are binary; the subject-predicate-object triple: you can say "Socrates_beard has_color white", but if you want to time-index this to, say, 400 BC, you have to introduce ontologically problematic entities such as "socrates beard in 400 BC". This isn't a big deal for the semantic web for various reasons, but is important for accurate representation of biological entities that exist in time. Having said that, RDF is definitely our best shot at exposing some amount of database semantics in a maximally accessible and interoperable way with a minimum amount of coordination and schema churn. (I've seen a lot of grand interoperation schemes come and go over the years so this is actually quite a strong statement). Note that Chado isn't so far off RDF with its various subject-predicate-object linking tables; we just chose to go with more of a hybrid approach; Chado is intended to control the range of what can be stated more than is possible with RDF and related technologies. The result is in principle quite easy to map to RDF, giving some of the benefits of both. I don't think it's an either/or thing when it comes to RDF vs domain specific exchange formats. You correctly identified the tradeoffs with, for example, DAS2 vs RDF; whilst those tradeoffs exist there is room for both to live side-by-side. Now this may not be true forever - I'm longing for the day when it is possible to specify the semantics of data in a way that is computable and efficient, but we're not quite there yet. This doesn't mean we can't make a start, and some kind of RDF encoding of chado-style feature graphs and feature location graphs would be a good start. This would give us a way of wrapping DAS2 and genomics databases that the wider semantic web can understand. You identify reification - statements about statements - as key for annotations - I agree. You may also want to check out Named Graphs. Unfortunately the tool support for either is not very mature yet; you can still use reification, just in a low-level way. I'm not so worried by the seeming higher-order aspects of reification. But I won't go into this, as it's fairly abstruse, and I'm not sure I believe myself, which is a kind of curious higher-order statement in itself. Existing relational databases can be wrapped using tools such as D2RQ. There are definitely efficiency considerations. I'm exploring some alternatives of the home-grown variety but don't have anything to report yet. I think writing back to a non-generic schema from RDF is difficult, but I'm not sure we need to do this. OK, before we get too carried away we should check what problems we are trying to solve. Annotation means different things to different people (and something slightly different in the semantic web world unfortunately).
We want a community-based way of sharing data that fits neatly into 1d feature paradigm, and we want this to be fast, standards-based and interoperable with current genomics tools, so genomics datamodels and exchange formats will continue to play a part. We may also want a way of exposing the inherent semantics in those little boxes to computers that don't speak genomics. It's unclear exactly who gains, when and how, but the cost is not so high (avenues include: SPARQL queries for genome databases; Das2rdf; use of microformats and rdf in gbrowse display_). Then there are the annotations on these little boxes; statements about the underlying biological entities. On the one hand this is the wild untrammelled frontier - these entities may be linked to other entities which are themselves described by a composites of other interlinked entities. We can take a ride traversing these links through multiple levels of biological granularity, from atomic structures through to anatomical structures, physiological processes, phenotypes, environments, life-forms living in the hydrothermal vents on Jupiter's moons... OK, perhaps RDF can't deliver on the astrobiology quite yet, but it seems that this open-ended world beyond genomics is a good reason to try RDF. Orthogonal to this is the "reification" model. Even in our wiki-esque community model we want to conform to good annotation practice and encourage all links to be accompanied with provenance, evidence and so on. What does this mean in terms of implementation? It could be fairly simple. GBrowse could be augmented by a 3rd party triple-store. The primary datastore would continue to be the genomics schema of choice, eg chado, but freeform 3rd party annotations on features could go in the triple-store. I have a few ideas about how this could be layered on top of a gbrowse type display, and you have the advantage of transparency to generic semweb software, to the extent it exists in usable forms at the moment. This seems a fairly low risk approach to the community annotation store problem. In fact, other approaches will be higher risk as they will require rolling your own technology. Triplestores can be slow for complex multi-join queries but I think many of your use cases will involve simple neighbourhood graphs. Queries such as "find all genes upstream of genes in a pathway implicated in disease X with function Y" will perform dreadfully if you take the ontological closure into account. We're working on technology for this in the Berkeley Ontologies Project but you shouldn't place any dependencies on this yet. Well I've gone on a bit and haven't really covered all the bases - my recommendation is to proceed enthusiastically but cautiously. As you can see I'm part gung ho about rdf/semweb and part skeptical. The basic idea of linking by URIs is simple and cool and powerful. ironically, I think it is the semantic part that is somewhat lacking with the lack of scalable OWL support, but this is changing.... On Feb 12, 2007, at 11:03 AM, Mitch Skinner wrote: > This is sort of a brain dump; I'm not sure what I really think about > this but I'm hoping for some discussion. This email therefore > meanders > a bit, which is dangerous given that people are already not reading my > email all the way through, but some decisions in this area need to be > made in the near future and I want to have some thoughts written down > about them. 
> > Also, given that this is somewhat fuzzy in my head at the moment > there's > some risk of going into architecture-astronaut mode and getting > lost in > abstruse philosophical questions. However, given that there are > people > out there that are in the middle of implementing that abstruse > stuff, if > we want to piggyback on their work then we have to have some idea > about > what we want/need. So there are some concrete and immediate things to > consider. > > Also, I know there are some people on this list that know more about > this stuff than I do, so hopefully rather than feeling patronized > they'll respond to tell me what's up. > > I've been thinking about how to integrate the relatively stable, > well-understood, structured parts of the annotations with the less > well > understood, less structured aspects. For example, a feature > usually has > a start and and end point on some reference sequence: there are a few > complications (0-based, 1-based, interbase) but generally speaking > this > is pretty basic and widespread and baked into a variety of > software. A > highly structured data store like a relational database is a good > choice > for this kind of information; knowing the structure of your > information > allows you to store and query it very efficiently. A relational > database is kind of like the chain saw of data management, if the > chain > saw were mounted on an extremely precise industrial robot. > > On the other hand, there are other things that are harder to predict. > Given that there's new research going on all the time that's producing > new kinds of data, it'll be a while before there's a chado module for > storing those. It's a bad idea to try and design a database schema to > store this information now when it's not so well (or widely) > understood > (c.f. organism vs. taxonomy in chado), but we do want to store it > (right?), so IMO we also have to have something less structured than a > relational database schema. > > It's certainly possible to have too little structure, though--every > time > I hear someone complain about feeling too restricted by a relational > schema I want to tell them, "hey, I've got a perfectly general format > for storing data: a stream of bits". Having a restriction on the data > is just the flip side of knowing something about the data. We do want > to be able to efficiently query the data; free text search is nice but > even in the google age we still have to wade through lots of > irrelevant > results. And we want to be able to write software to process the data > without having to solve the problem of natural language understanding. > > So, like Goldilocks, we want to find just the right amount of > structure. > Papa bear is clearly a relational database; mama bear is XML (or > possibly a non-semantic wiki), the document-oriented history of which > makes them a little soupy for my taste though this could be debated > (and > I would be happy to if anyone wants to); and baby bear is RDF. I > don't > want to write an RDF-advocacy essay, especially since there's already > been so much unfulfilled Semantic Web hype. I just want to say that I > think it's Just Right structure-wise. And there's a decently large > and > growing number of tools for dealing with it. 
> > If you're not familiar with RDF, here's the wikipedia introduction: > ============ > Resource Description Framework (RDF) is a family of World Wide Web > Consortium (W3C) specifications originally designed as a metadata > model > but which has come to be used as a general method of modeling > knowledge, > through a variety of syntax formats. > > The RDF metadata model is based upon the idea of making statements > about > resources in the form of subject-predicate-object expressions, called > triples in RDF terminology. The subject denotes the resource, and the > predicate denotes traits or aspects of the resource and expresses a > relationship between the subject and the object. For example, one > way to > represent the notion "The sky has the color blue" in RDF is as a > triple > of specially formatted strings: a subject denoting "the sky", a > predicate denoting "has the color", and an object denoting "blue". > ============== > > If you buy this so far, then the main problem to consider is how to > integrate the stuff that fits well in a relational database (feature, > reference sequence, start, end) with the stuff that doesn't (? need > some > examples). In Goldilocks terms I want to have papa bear and baby bear > all rolled into one. In web terms I want both relational and > semi-structured data to play a role in generating the > representation for > a single resource (e.g., to serve the data for a single feature > entity I > want to query both chado (or BioSQL?) and an RDF triplestore and > combine > the results into an RDF graph). > > So I've been doing some googling and I've noticed that there are some > systems for taking a relational database and serving RDF. Chris, > how do > you like D2R so far? Do you think chado and BioSQL would work equally > well with it, or is one better than the other? It appears that it > doesn't integrate directly with a triplestore, is that right? If the > client is only aware of RDF, how do we insert and update information? > And how do we make sure that information that's added via RDF ends > up in > the right place in the relational tables? > > In my googling I've also come across samizdat > http://www.nongnu.org/samizdat/ > which appears to do the relational table/triplestore integration > thing. > However, it doesn't appear to support SPARQL. And judging by the > mailing list the community there seems pretty small. > > One of the really interesting aspects of samizdat is that it uses RDF > reification to do moderation-type stuff. RDF reification, if > you're not > familiar, allows you to make RDF statements about other RDF > statements. > For example, without reification you could make statements like > "the sky > has the color blue"; reification allows you to say "Mitch says (the > sky > has the color blue)"--the original statement gets reified into the > space > of subjects and objects and can then participate in other RDF > statements. > > This all sounds fairly abstruse to me, but IMO it's pretty much > exactly > what we would want in a community annotation system. We want to store > data with some structure but not too much (RDF) and we also want to > take > those bits of data and allow people to make statements about their > source and quality ("annotation foo is from the holmes lab", > "annotation > foo is computationally-generated", "annotation bar was manually > curated", "(annotation bar was manually curated) by so-and-so"). 
And > then we want to take that information about how good a bit of data is > and use it to filter or highlight features in the browser or > something. > "show me all the features I've commented on", "show me all the > features > from so-and-so", "show me all the features approved by members of my > group", "click these buttons to increase/decrease the quality score > for > this feature", "show me only features with a quality score above > 6", and > so on. > > Reification seems like a somewhat more obscure part of the RDF > spec, so > I'm not sure how well it's supported in RDF tools in general, or > even to > what extent it needs to be specifically supported. Specifically, I > need > to try and figure out if the wiki editing in Semantic MediaWiki can be > used to enter RDF statements using reification. Or maybe we need to > develop some specialized UI for this in any case. > > As I understand it, one drawback of reification is that you're taking > something that was first-order and making it higher-order, which tends > to throw lots of computational tractability guarantees out the window. > But I don't know what specifically we'd be giving up there. I > wonder if > we'd be better off avoiding reification and trying to collapse all > meta-statements onto their referents somehow (e.g., instead of "Mitch > says (the sky is blue)" have something like "the sky is blue" and "the > sky was color-determined by Mitch"). > > Also, I was originally vaguely thinking of trying to squeeze RDF into > the DAS2 feature property mechanism but I'm wondering whether or > not it > would just be better to dispense with DAS2 entirely and just use > RDF to > describe feature boundaries, type, relationships and whatever else > DAS2 > covers. I thought DAS2 had some momentum but in trying to get the > gmod > das2 server running I actually came across what appears to be a syntax > error in one of its dependencies (MAGE::XML::Writer from CPAN) so I'm > having doubts about how much it's actually getting used. What > would be > the pros and cons of doing a SPARQL query via D2R<->chado vs. a DAS2 > query against chado? IMO the main relevant considerations are query > flexibility, query performance, and how easy it is to do in javascript > with XHR. I think I'm going to experiment a little with D2R and > Virtuoso and see how things go. > > I believe representing everything with RDF serves Chris' goal of being > "semantically transparent", which allows for lots of interesting > integration scenarios ("mashups"). And I agree, it's one of those > things that buys you lots of power almost for free. RDF is certainly > more widely supported than DAS2 is. > > Also, even though I'm relatively ignorant I'd like to respond to this: > http://www.bioontology.org/wiki/index.php/OBD:SPARQL- > GO#Representing_the_links_between_genes_and_GO_types > and say that although I'm not exactly sure what "interoperation" means > here, it seems to me that given a feature URI anyone can make an RDF > statement about that concrete feature instance. And all the > assertions > that have been made about classes can be "exploded" onto the > individual > instances, right?. So concrete instances seem to me to be the more > interoperable way to go. I suppose that if you do everything with > individuals it's hard to go back and make assertions about > classes--whats's a specific use case for that? > > I guess the thing that worries me about making universal assertions in > biology is that there are so many exceptions. 
In math/logic/CS you > can > make universally quantified assertions about abstractions because you > make up the abstractions and construct systems using them. The > classes/abstractions that you create are endogenous to the > systems. But > in biology the abstractions are exogenous; the cell doesn't care about > the central dogma (e.g., with ncRNA). So classes/abstractions in > biology will generally have to grow hairs and distinctions over time, > and then what happens to the concrete instances that have been tagged > with a certain class name? They have to be manually reclassified, > AFAICS. Hence the continuing presence of cvterms where is_obsolete is > true. > > So I guess I'm saying that I think with community annotation it's fine > for people to make statements about concrete instances rather than > classes, and I believe that they'll generally find it easier to do so. > I suppose the question of what's "natural" is one to do user > testing on > eventually. If we do in fact "let a thousand flowers bloom" then a > good > query/search engine can still give us digestible pieces to work with, > right? I hope. > > Sorry for the length and stream-of-consciousness-ness. I'm sure a lot > of what I'm saying is not new, but I think we have to have these > discussions. Unless this is already well-settled territory and > someone > can point me to a review paper. > Mitch > > > ---------------------------------------------------------------------- > --- > Using Tomcat but need to do more? Need to support web services, > security? > Get stuff done quickly with pre-integrated technology to make your > job easier. > Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo > http://sel.as-us.falkag.net/sel? > cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Gmod-ajax mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-ajax > |
From: Mitch S. <mit...@be...> - 2007-02-12 19:04:29
|
This is sort of a brain dump; I'm not sure what I really think about this but I'm hoping for some discussion. This email therefore meanders a bit, which is dangerous given that people are already not reading my email all the way through, but some decisions in this area need to be made in the near future and I want to have some thoughts written down about them. Also, given that this is somewhat fuzzy in my head at the moment there's some risk of going into architecture-astronaut mode and getting lost in abstruse philosophical questions. However, given that there are people out there that are in the middle of implementing that abstruse stuff, if we want to piggyback on their work then we have to have some idea about what we want/need. So there are some concrete and immediate things to consider. Also, I know there are some people on this list that know more about this stuff than I do, so hopefully rather than feeling patronized they'll respond to tell me what's up. I've been thinking about how to integrate the relatively stable, well-understood, structured parts of the annotations with the less well understood, less structured aspects. For example, a feature usually has a start and an end point on some reference sequence: there are a few complications (0-based, 1-based, interbase) but generally speaking this is pretty basic and widespread and baked into a variety of software. A highly structured data store like a relational database is a good choice for this kind of information; knowing the structure of your information allows you to store and query it very efficiently. A relational database is kind of like the chain saw of data management, if the chain saw were mounted on an extremely precise industrial robot. On the other hand, there are other things that are harder to predict. Given that there's new research going on all the time that's producing new kinds of data, it'll be a while before there's a chado module for storing those. It's a bad idea to try and design a database schema to store this information now when it's not so well (or widely) understood (cf. organism vs. taxonomy in chado), but we do want to store it (right?), so IMO we also have to have something less structured than a relational database schema. It's certainly possible to have too little structure, though--every time I hear someone complain about feeling too restricted by a relational schema I want to tell them, "hey, I've got a perfectly general format for storing data: a stream of bits". Having a restriction on the data is just the flip side of knowing something about the data. We do want to be able to efficiently query the data; free text search is nice but even in the google age we still have to wade through lots of irrelevant results. And we want to be able to write software to process the data without having to solve the problem of natural language understanding. So, like Goldilocks, we want to find just the right amount of structure. Papa bear is clearly a relational database; mama bear is XML (or possibly a non-semantic wiki), the document-oriented history of which makes them a little soupy for my taste though this could be debated (and I would be happy to if anyone wants to); and baby bear is RDF. I don't want to write an RDF-advocacy essay, especially since there's already been so much unfulfilled Semantic Web hype. I just want to say that I think it's Just Right structure-wise. And there's a decently large and growing number of tools for dealing with it.
If you're not familiar with RDF, here's the wikipedia introduction: ============ Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling knowledge, through a variety of syntax formats. The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue". ============== If you buy this so far, then the main problem to consider is how to integrate the stuff that fits well in a relational database (feature, reference sequence, start, end) with the stuff that doesn't (? need some examples). In Goldilocks terms I want to have papa bear and baby bear all rolled into one. In web terms I want both relational and semi-structured data to play a role in generating the representation for a single resource (e.g., to serve the data for a single feature entity I want to query both chado (or BioSQL?) and an RDF triplestore and combine the results into an RDF graph). So I've been doing some googling and I've noticed that there are some systems for taking a relational database and serving RDF. Chris, how do you like D2R so far? Do you think chado and BioSQL would work equally well with it, or is one better than the other? It appears that it doesn't integrate directly with a triplestore, is that right? If the client is only aware of RDF, how do we insert and update information? And how do we make sure that information that's added via RDF ends up in the right place in the relational tables? In my googling I've also come across samizdat http://www.nongnu.org/samizdat/ which appears to do the relational table/triplestore integration thing. However, it doesn't appear to support SPARQL. And judging by the mailing list the community there seems pretty small. One of the really interesting aspects of samizdat is that it uses RDF reification to do moderation-type stuff. RDF reification, if you're not familiar, allows you to make RDF statements about other RDF statements. For example, without reification you could make statements like "the sky has the color blue"; reification allows you to say "Mitch says (the sky has the color blue)"--the original statement gets reified into the space of subjects and objects and can then participate in other RDF statements. This all sounds fairly abstruse to me, but IMO it's pretty much exactly what we would want in a community annotation system. We want to store data with some structure but not too much (RDF) and we also want to take those bits of data and allow people to make statements about their source and quality ("annotation foo is from the holmes lab", "annotation foo is computationally-generated", "annotation bar was manually curated", "(annotation bar was manually curated) by so-and-so"). And then we want to take that information about how good a bit of data is and use it to filter or highlight features in the browser or something. 
"show me all the features I've commented on", "show me all the features from so-and-so", "show me all the features approved by members of my group", "click these buttons to increase/decrease the quality score for this feature", "show me only features with a quality score above 6", and so on. Reification seems like a somewhat more obscure part of the RDF spec, so I'm not sure how well it's supported in RDF tools in general, or even to what extent it needs to be specifically supported. Specifically, I need to try and figure out if the wiki editing in Semantic MediaWiki can be used to enter RDF statements using reification. Or maybe we need to develop some specialized UI for this in any case. As I understand it, one drawback of reification is that you're taking something that was first-order and making it higher-order, which tends to throw lots of computational tractability guarantees out the window. But I don't know what specifically we'd be giving up there. I wonder if we'd be better off avoiding reification and trying to collapse all meta-statements onto their referents somehow (e.g., instead of "Mitch says (the sky is blue)" have something like "the sky is blue" and "the sky was color-determined by Mitch"). Also, I was originally vaguely thinking of trying to squeeze RDF into the DAS2 feature property mechanism but I'm wondering whether or not it would just be better to dispense with DAS2 entirely and just use RDF to describe feature boundaries, type, relationships and whatever else DAS2 covers. I thought DAS2 had some momentum but in trying to get the gmod das2 server running I actually came across what appears to be a syntax error in one of its dependencies (MAGE::XML::Writer from CPAN) so I'm having doubts about how much it's actually getting used. What would be the pros and cons of doing a SPARQL query via D2R<->chado vs. a DAS2 query against chado? IMO the main relevant considerations are query flexibility, query performance, and how easy it is to do in javascript with XHR. I think I'm going to experiment a little with D2R and Virtuoso and see how things go. I believe representing everything with RDF serves Chris' goal of being "semantically transparent", which allows for lots of interesting integration scenarios ("mashups"). And I agree, it's one of those things that buys you lots of power almost for free. RDF is certainly more widely supported than DAS2 is. Also, even though I'm relatively ignorant I'd like to respond to this: http://www.bioontology.org/wiki/index.php/OBD:SPARQL-GO#Representing_the_links_between_genes_and_GO_types and say that although I'm not exactly sure what "interoperation" means here, it seems to me that given a feature URI anyone can make an RDF statement about that concrete feature instance. And all the assertions that have been made about classes can be "exploded" onto the individual instances, right?. So concrete instances seem to me to be the more interoperable way to go. I suppose that if you do everything with individuals it's hard to go back and make assertions about classes--whats's a specific use case for that? I guess the thing that worries me about making universal assertions in biology is that there are so many exceptions. In math/logic/CS you can make universally quantified assertions about abstractions because you make up the abstractions and construct systems using them. The classes/abstractions that you create are endogenous to the systems. 
But in biology the abstractions are exogenous; the cell doesn't care about the central dogma (e.g., with ncRNA). So classes/abstractions in biology will generally have to grow hairs and distinctions over time, and then what happens to the concrete instances that have been tagged with a certain class name? They have to be manually reclassified, AFAICS. Hence the continuing presence of cvterms where is_obsolete is true. So I guess I'm saying that I think with community annotation it's fine for people to make statements about concrete instances rather than classes, and I believe that they'll generally find it easier to do so. I suppose the question of what's "natural" is one to do user testing on eventually. If we do in fact "let a thousand flowers bloom" then a good query/search engine can still give us digestible pieces to work with, right? I hope. Sorry for the length and stream-of-consciousness-ness. I'm sure a lot of what I'm saying is not new, but I think we have to have these discussions. Unless this is already well-settled territory and someone can point me to a review paper. Mitch |
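To make the reification idea above concrete, here is the standard rdf:Statement vocabulary spelled out as plain data - a sketch with made-up example URIs, not tied to any particular triplestore or Perl RDF library.

# Each hashref is one subject-predicate-object triple (illustrative only).
my @triples = (
  # the base statement: "the sky has the color blue"
  { s => 'ex:sky',   p => 'ex:hasColor',   o => 'ex:blue' },
  # a node standing for that statement...
  { s => 'ex:stmt1', p => 'rdf:type',      o => 'rdf:Statement' },
  { s => 'ex:stmt1', p => 'rdf:subject',   o => 'ex:sky' },
  { s => 'ex:stmt1', p => 'rdf:predicate', o => 'ex:hasColor' },
  { s => 'ex:stmt1', p => 'rdf:object',    o => 'ex:blue' },
  # ...so that other statements can point at it: "Mitch says (the sky is blue)"
  { s => 'ex:mitch', p => 'ex:says',       o => 'ex:stmt1' },
);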
From: Ian H. <ih...@be...> - 2007-02-10 07:03:12
|
has been getting some press recently http://pipes.yahoo.com/ from a superficial look-over, it seems to be a Javascript-based GUI for creating the equivalent of Unix one-liners for common web datatypes (RSS, URLs...) just something to be aware of. we might want to play with it. or not. maybe this will be how mashups are done... or not. anyway, another beastie sharing the ecosystem with Simile, Piggybank et al, I guess. |
From: Mitch S. <mit...@be...> - 2007-02-08 23:31:46
|
I wrote: > Clearly, DProf is confused about some things (I'm pretty sure that > TiledImage::AUTOLOAD isn't taking 116% of the total runtime), but I'm > hoping that it's right about the general ranking. I've committed a change that uses closures to generate subs that do most of the work that AUTOLOAD was doing. This makes S. cerevisiae chrom 1 (all tracks + zoom levels) run about 38% faster. Here's the current profile (yeast_chr1 named gene zoom 1):

Total Elapsed Time = -1.21238 Seconds
  User+System Time = 0 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c Name
 0.00   5.304  5.304    2303   0.0023 0.0023 GD::Image::png
 0.00   1.380 10.355       1   1.3798 10.354 BatchTiledImage::renderTileRange
 0.00   1.234  1.234  230361   0.0000 0.0000 GD::Image::line
 0.00   1.194  1.194  230322   0.0000 0.0000 TiledImagePanel::map_pt
 0.00   0.583  3.346      73   0.0080 0.0458 TiledImage::renderTile
 0.00   0.558  0.558  460764   0.0000 0.0000 TiledImage::max
 0.00   0.489  0.489     491   0.0010 0.0010 Bio::Graphics::Glyph::_collision_keys
 0.00   0.413  0.413  115472   0.0000 0.0000 MemoryPrimStorage::__ANON__
 0.00   0.398  0.567     253   0.0016 0.0022 Bio::Graphics::Glyph::collides
 0.00   0.374  0.374    2377   0.0002 0.0002 GD::Image::_new
 0.00   0.308  0.308  460764   0.0000 0.0000 TiledImage::min

DProf is clearly still pretty confused. I think DProf tries to subtract out its own overhead, so maybe it's overestimating that and subtracting too much. But I'm still going to go on the assumption that the general ranking is about right. I know we're generating a pretty fair number of blank tiles at the higher zoom levels, so next I'm going to look into generating a blank tile ahead of time and just hardlinking to that whenever we're about to generate another blank one. Hopefully that will cut down on the time spent in GD::Image::png. I'm still not sure what to do about the gridlines; I'm going to leave that on the back burner for a bit. Mitch |
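For anyone wondering what "closures to generate subs" looks like in practice, the general shape of the technique is sketched below. This is illustrative rather than the actual committed change; recordPrimitive() is a hypothetical stand-in for the store-and/or-draw work that AUTOLOAD used to perform on every call.

# Install one generated sub per GD method up front, instead of dispatching
# every call through AUTOLOAD.  Each sub closes over $method, so the
# method-name handling happens once at setup time rather than per call.
foreach my $method (qw(line rectangle filledRectangle string)) {
  no strict 'refs';
  *{"TiledImage::$method"} = sub {
    my $self = shift;
    $self->recordPrimitive($method, @_);   # hypothetical stand-in
  };
}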
From: Seth C. <sjc...@lb...> - 2007-02-08 03:09:57
|
On Wed, 2007-02-07 at 17:21 -0800, Mitch Skinner wrote: > As I understand it, doing a mashup requires either server-side developer > help or a client-side plugin with special security privileges (like > Piggy Bank). Otherwise the javascript same-source requirement keeps you > from combining data from multiple sources, right? I think that there should be ways to get around this security stuff (did I just say that?). Since ajax is going back to the server for data by its nature, it's not that big of a leap to have the server act as a proxy to deliver some other service to the browser for a mash-up. Enter something like "kinase": http://toy.lbl.gov:9002/cgi-bin/amigo2/proxy-client.cgi A system of federated proxies that shared information might be interesting. The trick would then be having the information for each user stored so that different sites would be able to get access. Maybe a system like del.icio.us for biology metadata? -Seth |
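A minimal version of that proxy idea, for reference: the sketch below assumes a plain CGI environment with LWP available, and leaves out the host whitelisting and careful content handling a real deployment would need.

#!/usr/bin/perl
# Same-origin workaround sketch: the browser asks its own server to fetch a
# remote resource and relay it back.  Illustrative only; an open proxy like
# this should be restricted to known hosts before being exposed.
use strict;
use warnings;
use CGI qw(:standard);
use LWP::UserAgent;

my $url = param('url') or die "no url parameter\n";
my $ua  = LWP::UserAgent->new(timeout => 30);
my $res = $ua->get($url);
print header(-type => (scalar($res->header('Content-Type')) || 'text/plain'));
print $res->decoded_content;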