From: Hilmar L. <hl...@ne...> - 2012-01-31 18:49:41
|
Well, yeah, that's what I was suggesting with the AMI. -hilmar On Jan 31, 2012, at 1:18 PM, Rutger Vos wrote: > By the way, wouldn't it be a good idea if instead we could simply share an image of the entire treebase environment? I heard PFAM has just started doing that and it might be a good idea if we want to entice volunteer developers. > > On Fri, Jan 27, 2012 at 4:21 PM, William Piel <met...@gm...> wrote: > > On Jan 27, 2012, at 6:48 AM, Rutger Vos wrote: > >> Ok, I will look at that before trying to suck down too much. > > Or perhaps "means-test" each study: first request the NEXUS file, but if the NTAX x NCHAR is too big (e.g. over 1 million characters for S12156) then use nexml.org to convert the NEXUS to NeXML instead of requesting the NeXML directly from TreeBASE (you lose the metadata, but it's better than nothing). Otherwise, if the NEXUS is sufficiently small, make a second request for the NeXML. > > bp > > -- > Dr. Rutger A. Vos > Bioinformaticist > NCB Naturalis > Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands > Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands > http://rutgervos.blogspot.com -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <rut...@gm...> - 2012-01-31 18:19:01
|
By the way, wouldn't it be a good idea if instead we could simply share an image of the entire treebase environment? I heard PFAM has just started doing that and it might be a good idea if we want to entice volunteer developers. On Fri, Jan 27, 2012 at 4:21 PM, William Piel <met...@gm...> wrote: > > On Jan 27, 2012, at 6:48 AM, Rutger Vos wrote: > > Ok, I will look at that before trying to suck down too much. > > > Or perhaps "means-test" each study: first request the NEXUS file, but if > the NTAX x NCHAR is too big (e.g. over 1 million characters for S12156) > then use nexml.org to convert the NEXUS to NeXML instead of requesting > the NeXML directly from TreeBASE (you lose the metadata, but it's better > than nothing). Otherwise, if the NEXUS is sufficiently small, make a second > request for the NeXML. > > bp > > -- Dr. Rutger A. Vos Bioinformaticist NCB Naturalis Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands http://rutgervos.blogspot.com |
From: William P. <met...@gm...> - 2012-01-27 15:21:47
|
On Jan 27, 2012, at 6:48 AM, Rutger Vos wrote: > Ok, I will look at that before trying to suck down too much. Or perhaps "means-test" each study: first request the NEXUS file, but if the NTAX x NCHAR is too big (e.g. over 1 million characters for S12156) then use nexml.org to convert the NEXUS to NeXML instead of requesting the NeXML directly from TreeBASE (you lose the metadata, but it's better than nothing). Otherwise, if the NEXUS is sufficiently small, make a second request for the NeXML. bp |
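The "means-test" logic above lends itself to a small client-side script. A minimal sketch in Java follows -- hypothetical code, not part of TreeBASE: the PhyloWS URL pattern and the roughly one-million-character threshold come from this thread, while the class name, the command-line argument, and the summing of NCHAR across a study's matrices (mirroring the S12156 arithmetic elsewhere in the thread) are illustrative assumptions.

    import java.io.IOException;
    import java.net.URL;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical "means test" client: fetch the NEXUS first, estimate the matrix
    // size from its NCHAR declarations, and only ask TreeBASE for NeXML when the
    // study is small enough; otherwise convert the NEXUS locally (e.g. via nexml.org).
    public class MeansTest {
        static final long MAX_CHARS = 1000000L; // threshold suggested in the thread

        static String fetch(String url) throws IOException {
            Scanner s = new Scanner(new URL(url).openStream(), "UTF-8");
            try {
                return s.useDelimiter("\\A").hasNext() ? s.next() : "";
            } finally {
                s.close();
            }
        }

        // Sum NCHAR over every matrix block found in the returned NEXUS.
        static long totalChars(String nexus) {
            Matcher m = Pattern.compile("NCHAR\\s*=\\s*(\\d+)", Pattern.CASE_INSENSITIVE).matcher(nexus);
            long total = 0;
            while (m.find()) {
                total += Long.parseLong(m.group(1));
            }
            return total;
        }

        public static void main(String[] args) throws IOException {
            String study = args.length > 0 ? args[0] : "S1925"; // TB2 study id
            String base = "http://purl.org/phylo/treebase/phylows/study/TB2:" + study;
            String nexus = fetch(base + "?format=nexus");     // first request: NEXUS
            long chars = totalChars(nexus);
            if (chars > MAX_CHARS) {
                System.out.println(study + " has ~" + chars
                        + " characters; skip the NeXML request and convert the NEXUS instead.");
            } else {
                String nexml = fetch(base + "?format=nexml"); // second request: NeXML
                System.out.println(study + ": fetched " + nexml.length() + " bytes of NeXML.");
            }
        }
    }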
From: William P. <wil...@ya...> - 2012-01-25 21:05:28
|
On Jan 25, 2012, at 12:16 AM, William Piel wrote: > Both TreeBASE's tallest dataset (~3,000 taxa) and its widest dataset (~110,000 characters) download just fine Actually, I spoke too soon. Mattison just noticed that a lot of nexml requests are getting hung up after he asked for "S12156?format=nexml". On Jan 25, 2012, at 2:41 PM, Mattison Ward wrote: > FYI - I tried this url to make sure everything was working > http://purl.org/phylo/treebase/phylows/study/TB2:S12156?format=nexml > > Then I tried this - > http://purl.org/phylo/treebase/phylows/study/TB2:S12329?format=nexml > > Neither of these recent entries started downloading in the 5 or so > minutes that I waited. > > Other downloads seem to be working ok. It turns out that study S12156 has two matrices -- one that is 876,159 characters wide, and the other that is 352,120 characters wide, so a NeXML download for this study is asking for 1,228,279 characters. Oddly enough, the same data downloaded as "?format=nexus" works fine... so the problem is with the efficiency of generating the NeXML serialization. But another strange effect is that all subsequent requests for NeXML files are also blocked -- i.e. once you clog the system with a massive NeXML request, others get hung too. So, for example, S1205 works fine on dev: http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1205?format=nexml ... but remains clogged on production, due to (I guess) several open requests for S12156. Rutger -- maybe before trying to suck down the database, you should first look into why our NeXML serializations are hanging? I notice that some things make NeXML much more verbose -- e.g. when CHARSETs are present -- could those be a problem? bp
17 current requests (JavaMelody):
Thread | Request | Elapsed time (ms) | Mean time (ms) | Cpu time (ms) | Mean cpu time (ms) | Hits sql | Mean hits sql | Time sql (ms) | Mean time sql (ms) | Executed method
http-8080-Processor6 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,929,163 | 6,418 | 1,711 | 476 | 9 | 152 | 23,009 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor17 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,794,021 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor50 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,715,306 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor9 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,628,908 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor16 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,420,516 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor7 | /search/downloadAStudy.html?id=12329&format=nexml GET | 3,386,509 | 6,418 | 738 | 476 | 9 | 152 | 335 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor3 | /search/downloadAStudy.html?id=12329&format=nexml GET | 3,331,477 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor49 | /search/downloadAStudy.html?id=12329&format=nexml GET | 3,239,040 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor15 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,045,588 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor10 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,013,936 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor1 | /search/downloadAStudy.html?id=12156&format=nexml GET | 2,968,063 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor19 | /search/downloadAStudy.html?id=10149&format=nexml GET | 2,934,114 | 6,418 | 0 | 476 | 6 | 152 | 260 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor20 | /search/downloadAStudy.html?id=12329&format=nexml GET | 2,863,803 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor4 | /search/downloadAStudy.html?id=736&format=nexml GET | 2,797,320 | 6,418 | 0 | 476 | 11 | 152 | 439 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor47 | /search/downloadAStudy.html?id=1205&format=nexml GET | 1,477,133 | 6,418 | 0 | 476 | 9 | 152 | 151 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor8 | /search/downloadAStudy.html?id=1205&format=nexml GET | 955,541 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor43 | /search/downloadAStudy.html?id=1205&format=nexml GET | 698,686 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method) |
From: William P. <wil...@ya...> - 2012-01-25 05:17:05
|
On Jan 24, 2012, at 3:12 PM, Hilmar Lapp wrote: > and we know already that the queries for some studies will time out if you use the REST API. That certainly was true at one time, but we have since made fixes that should have solved those problems. Rod Page's attempt to suck down all of TreeBASE did encounter studies that were timing out -- and he sent me a list of them. But later, when I tried to fetch them, they downloaded fine. So I think the problem was one of hitting the application in rapid fire, with an overall performance slowdown resulting from the cumulative effects of this rapid fire, and as a result certain studies were timing out on him. Hence my suggestion that Rutger purposely throttle his scripts. Both TreeBASE's tallest dataset (~3,000 taxa) and its widest dataset (~110,000 characters) download just fine: tallest: http://purl.org/phylo/treebase/phylows/study/TB2:S11686?format=nexus widest: http://purl.org/phylo/treebase/phylows/study/TB2:S12064?format=nexus And this works to get a list of all URIs. So unless there are specific cases of corrupt data (which there probably are), or the cumulative effects of excessive web service load cause subsequent time-outs, I don't anticipate any fundamental problems. (And if the former, we'd like to hear about which ones are corrupt). So I think this is worth the experiment, on the understanding that Rutger might need to halt what he's doing should we discover that he has a crippling effect on the service. bp |
From: Hilmar L. <hl...@ne...> - 2012-01-24 22:44:19
|
And to add to my previous response, a useful byproduct of such an effort could be a shared AMI, and in fact if you load up the Postgres dump to S3, you could slice up the file dump generation to run in parallel on multiple EC2 nodes. This could also be a nice target for an Education & Research grant from AWS, the next round of which, I think, is due in the first or second week of February. -hilmar Sent with a tap. On Jan 24, 2012, at 11:07 AM, William Piel <wil...@ya...> wrote: > > On Jan 24, 2012, at 7:53 AM, Rutger Vos wrote: > >> Hi all, >> >> I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? >> >> Rutger > > I think this is a good idea, given that it will build up a war-chest of cached data. (In fact, maybe we should first extend the expire date on the cache so that this lasts longer?) Perhaps it will also catch datasets that are problematic. > > Google Analytics shows that activity is lowest on the weekend -- no surprise there. But maybe it would be better to do it during the week so that it's easy to intervene if the application gets locked up. Also, it might make sense to throttle the download process intentionally (e.g. interspersing requests with the "sleep" function in perl) so that the application has ample time for garbage collection, etc, and so as not to impact the system too much. Finally, even if you're not capturing NEXUS, maybe it would help to download NEXUS as well, as the NEXUS cache is also valuable to build up. > > bp |
From: Hilmar L. <hl...@ne...> - 2012-01-24 22:44:19
|
I do agree that having downloadable dumps of the TreeBASE content in different formats would be a good idea - in fact it was one of the deliverables of the just declined ABI grant. So if you want to put this in place now without support, that's cool of course. The problem is though that contrary to the plans in the grant you wouldn't be doing this here based on a NoSQL document store and SOLR index, but from the relational database, and we know already that the queries for some studies will time out if you use the REST API. So I think the best way to accomplish this would be to dump the PostgreSQL database and reload it on a different server, where you can then generate the NEXUS and NeXML dumps. -hilmar Sent with a tap. On Jan 24, 2012, at 11:07 AM, William Piel <wil...@ya...> wrote: > > On Jan 24, 2012, at 7:53 AM, Rutger Vos wrote: > >> Hi all, >> >> I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? >> >> Rutger > > I think this is a good idea, given that it will build up a war-chest of cached data. (In fact, maybe we should first extend the expire date on the cache so that this lasts longer?) Perhaps it will also catch datasets that are problematic. > > Google Analytics shows that activity is lowest on the weekend -- no surprise there. But maybe it would be better to do it during the week so that it's easy to intervene if the application gets locked up. Also, it might make sense to throttle the download process intentionally (e.g. interspersing requests with the "sleep" function in perl) so that the application has ample time for garbage collection, etc, and so as not to impact the system too much. Finally, even if you're not capturing NEXUS, maybe it would help to download NEXUS as well, as the NEXUS cache is also valuable to build up. > > bp |
From: William P. <wil...@ya...> - 2012-01-24 16:07:40
|
On Jan 24, 2012, at 7:53 AM, Rutger Vos wrote: > Hi all, > > I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? > > Rutger I think this is a good idea, given that it will build up a war-chest of cached data. (In fact, maybe we should first extend the expire date on the cache so that this lasts longer?) Perhaps it will also catch datasets that are problematic. Google Analytics shows that activity is lowest on the weekend -- no surprise there. But maybe it would be better to do it during the week so that it's easy to intervene if the application gets locked up. Also, it might make sense to throttle the download process intentionally (e.g. interspersing requests with the "sleep" function in perl) so that the application has ample time for garbage collection, etc, and so as not to impact the system too much. Finally, even if you're not capturing NEXUS, maybe it would help to download NEXUS as well, as the NEXUS cache is also valuable to build up. bp |
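A sketch of the throttled harvest being discussed here follows -- hypothetical client code: the PhyloWS URLs and the idea of warming both the NEXUS and NeXML caches come from the thread, while the 10-second sleep, the study_ids.txt input file, and the output layout are assumptions made for illustration.

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    // Hypothetical throttled harvester: fetch every study in both formats, sleeping
    // between requests so the application has time for garbage collection and other work.
    public class ThrottledDump {
        public static void main(String[] args) throws Exception {
            // One TB2 study id per line, e.g. S1925 (an assumed input file).
            List<String> ids = Files.readAllLines(Paths.get("study_ids.txt"), StandardCharsets.UTF_8);
            Path outDir = Paths.get("dump");
            Files.createDirectories(outDir);
            for (String id : ids) {
                for (String format : new String[] { "nexus", "nexml" }) { // warm both caches
                    String url = "http://purl.org/phylo/treebase/phylows/study/TB2:" + id + "?format=" + format;
                    try {
                        InputStream in = new URL(url).openStream();
                        Files.copy(in, outDir.resolve(id + "." + format));
                        in.close();
                    } catch (Exception e) {
                        System.err.println("Failed " + id + " (" + format + "): " + e.getMessage());
                    }
                    Thread.sleep(10000); // throttle: an assumed 10-second pause between requests
                }
            }
        }
    }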
From: Rutger V. <rut...@gm...> - 2012-01-24 14:32:58
|
On Tue, Jan 24, 2012 at 3:29 PM, Mattison Ward <mat...@ne...> wrote: > tb-stage and tb-dev have caching enabled. > > From about 3 PM to 4 PM EST yesterday, the load on tb-production went > through the roof from database activity even with caching enabled. > Mmmm... can you tell where it's coming from? I haven't started yet. Maybe it's Rod Page: he's been putting some of his code for his phyloinformatics course online. > On Tue, Jan 24, 2012 at 7:53 AM, Rutger Vos <rut...@gm...> wrote: > > Hi all, > > > > I've had a request from one of Enrico Pontelli's students for a complete > > dump in NeXML of TreeBASE. I would like to have one as well for my own > > purposes. Because we now have caching this may not be as big a problem as > > previously, though most studies will not yet have been serialized to > > NeXML since the start of caching so we still need to be careful. On the plus > > side: once we've done this we will have all of them in cache so all > > subsequent requests should be more snappy. Can we come up with a reasonable > > waiting time between requests so we don't kill the server? Is there a quiet > > time during which this can best be done? Do tb-stage or tb-dev also have > > caches? > > > > Rutger > > > > -- > > Dr. Rutger A. Vos > > http://rutgervos.blogspot.com > > -- > Mattison Ward > NESCent at Duke University > 2024 W. Main Street, Suite A200 > Durham, NC 27705-4667 > 919-668-4585 (desk) > 919-668-4551 (alternate) > 919-668-9198 (fax) > -- Dr. Rutger A. Vos http://rutgervos.blogspot.com |
From: Mattison W. <mat...@ne...> - 2012-01-24 14:30:12
|
tb-stage and tb-dev have caching enabled. From about 3 PM to 4 PM EST yesterday, the load on tb-production went through the roof from database activity even with caching enabled. -Mattison On Tue, Jan 24, 2012 at 7:53 AM, Rutger Vos <rut...@gm...> wrote: > Hi all, > > I've had a request from one of Enrico Pontelli's students for a complete > dump in NeXML of TreeBASE. I would like to have one as well for my own > purposes. Because we now have caching this may not be as big a problem as > previously, though most studies will not yet have been serialized to > NeXML since the start of caching so we still need to be careful. On the plus > side: once we've done this we will have all of them in cache so all > subsequent requests should be more snappy. Can we come up with a reasonable > waiting time between requests so we don't kill the server? Is there a quiet > time during which this can best be done? Do tb-stage or tb-dev also have > caches? > > Rutger > > -- > Dr. Rutger A. Vos > http://rutgervos.blogspot.com > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
From: Rutger V. <rut...@gm...> - 2012-01-24 12:53:30
|
Hi all, I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet ever have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? Rutger -- Dr. Rutger A. Vos http://rutgervos.blogspot.com |
From: Hilmar L. <hl...@ne...> - 2012-01-02 02:46:27
|
Would this be useful for TreeBASE to use, at least at the free level? Maybe at a minimum it could help with some of the bandwidth generated by crawlers. https://www.cloudflare.com/overview -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <R....@re...> - 2011-12-14 21:37:20
|
Something like David's suggestion should go on the TreeBASE web pages, so I'm forwarding it to get it on the radar. ---------- Forwarded message ---------- From: David Maddison <dav...@sc...> Date: Wed, Dec 14, 2011 at 11:11 AM Subject: [Prfboard] suggested blurb for page To: prf...@ne... Suggestion for the blurb at the bottom of the ToLWeb page. An equivalent one could go on the TreeBase home page: The Tree of Life Web Project is governed/administered/managed by the <a href="http://phylofoundation.org">Phyloinformatics Research Foundation</a>, a non-profit organization devoted to promotion of research and maintenance of the core databases of relevance to phylogenetic biology, their associated software code, and other resources in support of the field of phyloinformatics. --------------------------------- David R. Maddison Department of Zoology 3029 Cordley Hall Oregon State University Corvallis, OR 97331 USA dav...@sc... http://david.bembidion.org http://mesquiteproject.org http://macclade.org http://tolweb.org (541) 737 2834 _______________________________________________ Prfboard mailing list Prf...@ne... https://lists.nescent.org/mailman/listinfo/prfboard -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com |
From: William P. <wil...@ya...> - 2011-11-29 22:12:50
|
Hi Mattison, What you suggest looks fine with me, i.e. set it to cache the following: CacheEnable disk /treebase-web/search/downloadAStudy.html CacheEnable disk /treebase-web/search/study/anyObjectAsRDF.rdf CacheEnable disk /treebase-web/search/study/summary.html but in addition, we should set it to cache these: CacheEnable disk /treebase-web/search/downloadANexusFile.html CacheEnable disk /treebase-web/search/downloadAMatrix.html CacheEnable disk /treebase-web/search/downloadATree.html CacheEnable disk /treebase-web/search/downloadAnAnalysisStep.html But we don't want it to cache these: http://treebase.org/treebase-web/search/studySearch.html http://treebase.org/treebase-web/search/treeSearch.html http://treebase.org/treebase-web/search/matrixSearch.html http://treebase.org/treebase-web/search/taxonSearch.html .. because the results are unstable. Is there any easy way that I can trigger de-caching for an object or a set of objects? For example, it is not unusual for someone to upload data to TreeBASE, trigger it to have the data released to the public, and then suddenly recant and ask that the data be withheld. Typically, I'll just go in and toggle the status of their submission back to "private." But if we're caching everything, it might take a few months before the data are really unavailable to the public. I suppose if I had privileges to access the cache directory I could delete the offending objects. bp On Nov 29, 2011, at 3:41 PM, Mattison Ward wrote: > Hi Bill. > > These links resolve respectively to: > > http://treebase-dev.nescent.org/treebase-web/search/study/anyObjectAsRDF.rdf?namespacedGUID=TB2:S1925 > http://treebase-dev.nescent.org/treebase-web/search/study/summary.html?id=1925 > http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexml > http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexus > > Mod_cache would need to be set to something like this > > CacheEnable disk /treebase-web/search/downloadAStudy.html > CacheEnable disk /treebase-web/search/study/anyObjectAsRDF.rdf > CacheEnable disk /treebase-web/search/study/summary.html > > but I don't see an obvious way to restrict caching to TB2 objects. Querystrings cannot be used to enable or disable caching for a specific page using mod_cache. > |
From: Mattison W. <mat...@ne...> - 2011-11-29 20:42:17
|
Hi Bill. These links resolve respectively to: http://treebase-dev.nescent.org/treebase-web/search/study/anyObjectAsRDF.rdf?namespacedGUID=TB2:S1925 http://treebase-dev.nescent.org/treebase-web/search/study/summary.html?id=1925 http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexml http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexus Mod_cache would need to be set to something like this CacheEnable disk /treebase-web/search/downloadAStudy.html CacheEnable disk /treebase-web/search/study/anyObjectAsRDF.rdf CacheEnable disk /treebase-web/search/study/summary.html but I don't see an obvious way to restrict caching to TB2 objects. Querystrings cannot be used to enable or disable caching for a specific page using mod_cache. On Thu, Nov 17, 2011 at 2:11 PM, William Piel <wil...@ya...> wrote: > > On Nov 17, 2011, at 12:25 PM, Mattison Ward wrote: > > I will test this on treebasedev. > > > Thanks. To test it you could do the following: > > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925 > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=html > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexml > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexus > > And in each case you should see a new record stored in cacheRoot, each in > a different format. > > bp > > > > > On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...> wrote: > >> >> On Nov 17, 2011, at 10:02 AM, William Piel wrote: >> >> > Therefore, if you also know of an Apache plugin that will cache results >> for "/phylows/study/TB2:", that would greatly help. >> >> Looking at mod_cache, I wonder if this would work: >> >> cacheRoot c:/cacheroot >> cacheEnable disk /treebase-web/phylows/study/TB2: >> cacheDirLevels 1 >> cacheDirLength 20 >> cacheMinFileSize 1 >> cacheMaxFileSize 50000000 >> cacheIgnorecacheControl Off >> cacheIgnoreNoLastMod On >> cacheMaxExpire 2592000 >> >> ... resulting in a one-month cache on all TB2 objects. What's unclear to >> me is whether the cacheEnable string allows substrings or whether it needs >> to end in "/". If that's a limitation, are there third-party plugins that >> can cache using wildcards? >> >> bp >> > > > -- > Mattison Ward > NESCent at Duke University > 2024 W. Main Street, Suite A200 > Durham, NC 27705-4667 > 919-668-4585 (desk) > 919-668-4551 (alternate) > 919-668-9198 (fax) > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
From: William P. <wil...@ya...> - 2011-11-28 20:03:13
|
Okay, I think I figured out the source of the problem -- and it's not related to any settings changes (etc) that happened at the Tomcat / Postgres end. Turns out that when the original submitter had uploaded his 800,000 character dataset, he didn't have any "character sets" exposed -- i.e. the notation that indicates the beginning and end of each gene. Later, this notation was introduced in subsequent uploads (that then failed). And then when I've been testing smaller uploads (e.g. 80,000 characters) I also used charsets, and it was getting hung up. Just now I uploaded the same 80,000 character dataset but without the charset, and boom -- right away it uploaded fine. A charset only has a small effect on the database. For example, my 80,000-character file with a charset only required that three records be created (one in the table charset, another in the table charset_colrange, and a third in the table columnrange). However, for Mesquite (the program that parses all incoming data), it probably means building up a lot of data in memory. My guess, then, is that our version of Mesquite is choking on large files that have charsets. The file with charsets is still choking up treebasedev (see here), so feel free to kill the current request that's hung there. thanks, bp On Nov 28, 2011, at 12:28 PM, Mattison Ward wrote: > Ok - pushed. > > On Mon, Nov 28, 2011 at 11:56 AM, William Piel <wil...@ya...> wrote: > > On Nov 28, 2011, at 11:52 AM, Mattison Ward wrote: > >> Hi Bill and Harry. >> >> A little before 10:55 AM the load went very high on treebase production. A little before 11:44 the tomcat service stopped responding. I restarted it. The tomcat error logs are attached. >> >> -- >> Mattison Ward > > Thanks. > > Maybe it is time to do a push to production so that we can monitor activity with JavaMelody, and seeing as some bugs have been addressed. > > bp |
From: William P. <wil...@ya...> - 2011-11-28 16:09:08
|
On Nov 28, 2011, at 9:43 AM, Mattison Ward wrote: > The only change in response to the large number of API hits was to limit the number of requests per second. I just disabled that setting on both systems. Please try another upload. > > Mattison Thanks Mattison. Production is feeling sluggish right now, so I'm uploading to dev instead. bp |
From: Mattison W. <mat...@ne...> - 2011-11-28 14:43:52
|
Hi Bill. No changes to any settings except increasing the max heap size on tomcat on production from 3 GB to 4 GB. No updates to Tomcat, but regular updates to Apache and the Linux OS do occur on an ongoing basis. The only change in response to the large number of API hits was to limit the number of requests per second. I just disabled that setting on both systems. Please try another upload. Mattison On Thu, Nov 24, 2011 at 12:00 AM, William Piel <wil...@ya...>wrote: > > So after failing to upload several large files to production, I tried > uploading a large file to dev. (10 taxa x 66472 characters). The upload > page did a proxy time-out, but the request kept going on and on. Here it > shows it some 6 hours later, consuming a lot of memory. > > It's odd because I don't think this problem used to happen. TreeBASE has > files that are considerably larger that this one -- e.g. we had a > submission on September 7. And I think a much larger file was uploaded > about two weeks ago. > > So something weird has happened, with the result that TreeBASE is now > underperforming. > > In two hours from now this wii all be reset because dev will be refreshed > from production. But clearly there is a problem (a) because tasks that > TreeBASE used to be able to do, presently it cannot do, and (b) because > what it cannot do seems to tie up a lot of memory and CPU, crippling it. > > Mattison: Can you think of any changes in terms of memory allocation (or > recent upgrades) that may be affecting performance? e.g. were SQL timeouts > shortened to deal with the hits we were getting on the API last week? > > It's all very vexing. > > bp > > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
From: William P. <wil...@ya...> - 2011-11-24 18:45:44
|
Just a follow-up to this problem, with some more details: On November 12 2011 17:53:02 GMT, a user successfully uploaded a "Fig4" data file with 10 taxa x 352,120 characters. On November 22 2011 22:03:41 GMT, he successfully uploaded his "Fig5" data file with 10 taxa x 876,159 characters (even bigger!). But starting November 23, the very same "Fig4" data file wouldn't upload. Even by subdividing this file into smaller files (e.g. 10 x 60,000 each), these still cause the system to choke. So what happened around November 22-23 that has hobbled our ability to ingest large files? Did we, for example, shorten the SQL timeout? (in response to heavy API hits). bp On Nov 24, 2011, at 12:00 AM, William Piel wrote: > > So after failing to upload several large files to production, I tried uploading a large file to dev. (10 taxa x 66472 characters). The upload page did a proxy time-out, but the request kept going on and on. Here it shows it some 6 hours later, consuming a lot of memory. > > It's odd because I don't think this problem used to happen. TreeBASE has files that are considerably larger than this one -- e.g. we had a submission on September 7. And I think a much larger file was uploaded about two weeks ago. > > So something weird has happened, with the result that TreeBASE is now underperforming. > > In two hours from now this will all be reset because dev will be refreshed from production. But clearly there is a problem (a) because tasks that TreeBASE used to be able to do, presently it cannot do, and (b) because what it cannot do seems to tie up a lot of memory and CPU, crippling it. > > Mattison: Can you think of any changes in terms of memory allocation (or recent upgrades) that may be affecting performance? e.g. were SQL timeouts shortened to deal with the hits we were getting on the API last week? > > It's all very vexing. > > bp |
From: William P. <wil...@ya...> - 2011-11-24 05:00:20
|
So after failing to upload several large files to production, I tried uploading a large file to dev. (10 taxa x 66472 characters). The upload page did a proxy time-out, but the request kept going on and on. Here it shows it some 6 hours later, consuming a lot of memory. It's odd because I don't think this problem used to happen. TreeBASE has files that are considerably larger than this one -- e.g. we had a submission on September 7. And I think a much larger file was uploaded about two weeks ago. So something weird has happened, with the result that TreeBASE is now underperforming. In two hours from now this will all be reset because dev will be refreshed from production. But clearly there is a problem (a) because tasks that TreeBASE used to be able to do, presently it cannot do, and (b) because what it cannot do seems to tie up a lot of memory and CPU, crippling it. Mattison: Can you think of any changes in terms of memory allocation (or recent upgrades) that may be affecting performance? e.g. were SQL timeouts shortened to deal with the hits we were getting on the API last week? It's all very vexing. bp |
From: Mattison W. <mat...@ne...> - 2011-11-21 16:40:07
|
The TB2 requests resolve to requests with querystrings such as /treebase-web/search/downloadAStudy.html?id=1925&format=nexml This results in this error on Apache "cache: /treebase-web/search/studySearch.html?query=prism.publicationName=Nature&format=null&recordSchema=null not cached. Reason: Query string present but no explicit expiration time" Mod_cache will not work with querystrings from Tomcat until an expire time is explicitly set within your application. Here are three articles I found with suggestions on how to do that. http://raibledesigns.com/rd/entry/adding_expires_headers_with_oscache (download now http://java.net/downloads/oscache/OSCache%202.4.1/oscache-2.4.1.jar) http://www.tomred.net/java-tomcat-set-expires-headers.html http://juliusdev.blogspot.com/2008/06/tomcat-add-expires-header.html -Mattison On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...>wrote: > > On Nov 17, 2011, at 10:02 AM, William Piel wrote: > > > Therefore, if you also know of an Apache plugin that will cache results > for "/phylows/study/TB2:", that would greatly help. > > Looking at mod_cache, I wonder if this would work: > > cacheRoot c:/cacheroot > cacheEnable disk /treebase-web/phylows/study/TB2: > cacheDirLevels 1 > cacheDirLength 20 > cacheMinFileSize 1 > cacheMaxFileSize 50000000 > cacheIgnorecacheControl Off > cacheIgnoreNoLastMod On > cacheMaxExpire 2592000 > > ... resulting in a one-month cache on all TB2 objects. What's unclear to > me is whether the cacheEnable string allows substrings or whether it needs > to end in "/". If that's a limitation, are there third-party plugins that > can cache using wildcards? > > bp > > > > > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
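Following the approach in the linked articles, one way to attach an explicit expiration from inside the Tomcat webapp is a servlet filter that sets the Expires and Cache-Control headers before the response is committed. A minimal sketch follows -- hypothetical: the class name, the one-month lifetime, and the web.xml mapping restricted to the download URLs are illustrative assumptions, not existing TreeBASE code.

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical filter: give responses an explicit expiration so mod_cache will
    // cache URLs that carry query strings. Map it in web.xml to the download URLs only.
    public class ExpiresHeaderFilter implements Filter {
        private static final long ONE_MONTH_MS = 30L * 24 * 60 * 60 * 1000;

        public void init(FilterConfig cfg) throws ServletException {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletResponse resp = (HttpServletResponse) res;
            // Headers must be set before the response is committed, hence before chain.doFilter().
            resp.setDateHeader("Expires", System.currentTimeMillis() + ONE_MONTH_MS);
            resp.setHeader("Cache-Control", "public, max-age=" + (ONE_MONTH_MS / 1000));
            chain.doFilter(req, res);
        }
    }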
From: Rutger V. <R....@re...> - 2011-11-18 12:15:21
|
Mmmmm... it applies an xslt stylesheet, but because of CDAO's verbose design apparently this doesn't always work so well. We could hide the rdf links for the time being? CDAO is slated to undergo further design to make it more scalable. On Thu, Nov 17, 2011 at 7:15 PM, William Piel <wil...@ya...> wrote: > > Hi Rutger: > > Do you know what the deal is with format=rdf ? > > For example: > > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=rdf > > This seems to tie TreeBASE in a knot, not returning anything but causing a spike in CPU and Memory, as judged by javamelody: > > http://treebasedev.nescent.org/treebase-web/monitoring > > bp > > > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure > contains a definitive record of customers, application performance, > security threats, fraudulent activity, and more. Splunk takes this > data and makes sense of it. IT sense. And common sense. > http://p.sf.net/sfu/splunk-novd2d > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com |
From: William P. <wil...@ya...> - 2011-11-17 19:15:14
|
Hi Rutger: Do you know what the deal is with format=rdf ? For example: http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=rdf This seems to tie TreeBASE in a knot, not returning anything but causing a spike in CPU and Memory, as judged by javamelody: http://treebasedev.nescent.org/treebase-web/monitoring bp |
From: William P. <wil...@ya...> - 2011-11-17 19:11:59
|
On Nov 17, 2011, at 12:25 PM, Mattison Ward wrote: > I will test this on treebasedev. Thanks. To test it you could do the following: http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925 http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=html http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexml http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexus And in each case you should see a new record stored in cacheRoot, each in a different format. bp > On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...> wrote: > > On Nov 17, 2011, at 10:02 AM, William Piel wrote: > > > Therefore, if you also know of an Apache plugin that will cache results for "/phylows/study/TB2:", that would greatly help. > > Looking at mod_cache, I wonder if this would work: > > cacheRoot c:/cacheroot > cacheEnable disk /treebase-web/phylows/study/TB2: > cacheDirLevels 1 > cacheDirLength 20 > cacheMinFileSize 1 > cacheMaxFileSize 50000000 > cacheIgnorecacheControl Off > cacheIgnoreNoLastMod On > cacheMaxExpire 2592000 > > ... resulting in a one-month cache on all TB2 objects. What's unclear to me is whether the cacheEnable string allows substrings or whether it needs to end in "/". If that's a limitation, are there third-party plugins that can cache using wildcards? > > bp > > > > > > > > -- > Mattison Ward > NESCent at Duke University > 2024 W. Main Street, Suite A200 > Durham, NC 27705-4667 > 919-668-4585 (desk) > 919-668-4551 (alternate) > 919-668-9198 (fax) > |
From: Mattison W. <mat...@ne...> - 2011-11-17 17:25:36
|
I will test this on treebasedev. On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...>wrote: > > On Nov 17, 2011, at 10:02 AM, William Piel wrote: > > > Therefore, if you also know of an Apache plugin that will cache results > for "/phylows/study/TB2:", that would greatly help. > > Looking at mod_cache, I wonder if this would work: > > cacheRoot c:/cacheroot > cacheEnable disk /treebase-web/phylows/study/TB2: > cacheDirLevels 1 > cacheDirLength 20 > cacheMinFileSize 1 > cacheMaxFileSize 50000000 > cacheIgnorecacheControl Off > cacheIgnoreNoLastMod On > cacheMaxExpire 2592000 > > ... resulting in a one-month cache on all TB2 objects. What's unclear to > me is whether the cacheEnable string allows substrings or whether it needs > to end in "/". If that's a limitation, are there third-party plugins that > can cache using wildcards? > > bp > > > > > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |