From: Hilmar L. <hl...@ne...> - 2012-01-31 18:49:41
|
Well, yeah, that's what I was suggesting with the AMI. -hilmar On Jan 31, 2012, at 1:18 PM, Rutger Vos wrote: > By the way, wouldn't it be a good idea if instead we could simply share an image of the entire treebase environment? I heard PFAM has just started doing that and it might be a good idea if we want to entice volunteer developers. > > On Fri, Jan 27, 2012 at 4:21 PM, William Piel <met...@gm...> wrote: > > On Jan 27, 2012, at 6:48 AM, Rutger Vos wrote: > >> Ok, I will look at that before trying to suck down too much. > > Or perhaps "means-test" each study: first request the NEXUS file, but if the NTAX x NCHAR is too big (e.g. over 1 million characters for S12156) then use nexml.org to convert the NEXUS to NeXML instead of requesting the NeXML directly from TreeBASE (you lose the metadata, but it's better than nothing). Otherwise, if the NEXUS is sufficiently small, make a second request for the NeXML. > > bp > > -- > Dr. Rutger A. Vos > Bioinformaticist > NCB Naturalis > Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands > Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands > http://rutgervos.blogspot.com -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <rut...@gm...> - 2012-01-31 18:19:01
|
By the way, wouldn't it be a good idea if instead we could simply share an image of the entire treebase environment? I heard PFAM has just started doing that and it might be a good idea if we want to entice volunteer developers. On Fri, Jan 27, 2012 at 4:21 PM, William Piel <met...@gm...> wrote: > > On Jan 27, 2012, at 6:48 AM, Rutger Vos wrote: > > Ok, I will look at that before trying to suck down too much. > > > Or perhaps "means-test" each study: first request the NEXUS file, but if > the NTAX x NCHAR is too big (e.g. over 1 million characters for S12156) > then use nexml.org to convert the NEXUS to NeXML instead of requesting > the NeXML directly from TreeBASE (you lose the metadata, but it's better > than nothing). Otherwise, if the NEXUS is sufficiently small, make a second > request for the NeXML. > > bp > > -- Dr. Rutger A. Vos Bioinformaticist NCB Naturalis Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands http://rutgervos.blogspot.com |
From: William P. <met...@gm...> - 2012-01-27 15:21:47
|
On Jan 27, 2012, at 6:48 AM, Rutger Vos wrote: > Ok, I will look at that before trying to suck down too much. Or perhaps "means-test" each study: first request the NEXUS file, but if the NTAX x NCHAR is too big (e.g. over 1 million characters for S12156) then use nexml.org to convert the NEXUS to NeXML instead of requesting the NeXML directly from TreeBASE (you lose the metadata, but it's better than nothing). Otherwise, if the NEXUS is sufficiently small, make a second request for the NeXML. bp |
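The "means-test" logic above lends itself to a small client-side script. A minimal sketch in Java follows -- hypothetical code, not part of TreeBASE: the PhyloWS URL pattern and the roughly one-million-character threshold come from this thread, while the class name, the command-line argument, and the summing of NCHAR across a study's matrices (mirroring the S12156 arithmetic elsewhere in the thread) are illustrative assumptions.

    import java.io.IOException;
    import java.net.URL;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical "means test" client: fetch the NEXUS first, estimate the matrix
    // size from its NCHAR declarations, and only ask TreeBASE for NeXML when the
    // study is small enough; otherwise convert the NEXUS locally (e.g. via nexml.org).
    public class MeansTest {
        static final long MAX_CHARS = 1000000L; // threshold suggested in the thread

        static String fetch(String url) throws IOException {
            Scanner s = new Scanner(new URL(url).openStream(), "UTF-8");
            try {
                return s.useDelimiter("\\A").hasNext() ? s.next() : "";
            } finally {
                s.close();
            }
        }

        // Sum NCHAR over every matrix block found in the returned NEXUS.
        static long totalChars(String nexus) {
            Matcher m = Pattern.compile("NCHAR\\s*=\\s*(\\d+)", Pattern.CASE_INSENSITIVE).matcher(nexus);
            long total = 0;
            while (m.find()) {
                total += Long.parseLong(m.group(1));
            }
            return total;
        }

        public static void main(String[] args) throws IOException {
            String study = args.length > 0 ? args[0] : "S1925"; // TB2 study id
            String base = "http://purl.org/phylo/treebase/phylows/study/TB2:" + study;
            String nexus = fetch(base + "?format=nexus");     // first request: NEXUS
            long chars = totalChars(nexus);
            if (chars > MAX_CHARS) {
                System.out.println(study + " has ~" + chars
                        + " characters; skip the NeXML request and convert the NEXUS instead.");
            } else {
                String nexml = fetch(base + "?format=nexml"); // second request: NeXML
                System.out.println(study + ": fetched " + nexml.length() + " bytes of NeXML.");
            }
        }
    }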
From: William P. <wil...@ya...> - 2012-01-25 21:05:28
|
On Jan 25, 2012, at 12:16 AM, William Piel wrote: > Both TreeBASE's tallest dataset (~3,000 taxa) and its widest dataset (~110,000 characters) download just fine Actually, I spoke too soon. Mattison just noticed that a lot of nexml requests are getting hung up after he asked for "S12156?format=nexml". On Jan 25, 2012, at 2:41 PM, Mattison Ward wrote: > FYI - I tried this url to make sure everything was working > http://purl.org/phylo/treebase/phylows/study/TB2:S12156?format=nexml > > Then I tried this - > http://purl.org/phylo/treebase/phylows/study/TB2:S12329?format=nexml > > Neither of these recent entries started downloading in the 5 or so > minutes that I waited. > > Other downloads seem to be working ok. It turns out that study S12156 has two matrices -- one that is 876,159 characters wide, and the other that is 352,120 characters wide, so a NeXML download for this study is asking for 1,228,279 characters. Oddly enough, the same data downloaded as "?format=nexus" works fine... so the problem is with the efficiency of generating the NeXML serialization. But another strange effect is that all subsequent requests for NeXML files are also blocked -- i.e. once you clog the system with a massive NeXML request, others get hung too. So, for example, S1205 works fine on dev: http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1205?format=nexml ... but remains clogged on production, due to (I guess) several open requests for S12156. Rutger -- maybe before trying to suck down the database, you should first look into why our NeXML serializations are hanging? I notice that some things make NeXML much more verbose -- e.g. when CHARSETs are present -- could those be a problem? bp
17 current requests (JavaMelody):
Thread | Request | Elapsed time (ms) | Mean time (ms) | Cpu time (ms) | Mean cpu time (ms) | Hits sql | Mean hits sql | Time sql (ms) | Mean time sql (ms) | Executed method
http-8080-Processor6 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,929,163 | 6,418 | 1,711 | 476 | 9 | 152 | 23,009 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor17 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,794,021 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor50 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,715,306 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor9 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,628,908 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor16 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,420,516 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor7 | /search/downloadAStudy.html?id=12329&format=nexml GET | 3,386,509 | 6,418 | 738 | 476 | 9 | 152 | 335 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor3 | /search/downloadAStudy.html?id=12329&format=nexml GET | 3,331,477 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor49 | /search/downloadAStudy.html?id=12329&format=nexml GET | 3,239,040 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor15 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,045,588 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor10 | /search/downloadAStudy.html?id=12156&format=nexml GET | 3,013,936 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor1 | /search/downloadAStudy.html?id=12156&format=nexml GET | 2,968,063 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor19 | /search/downloadAStudy.html?id=10149&format=nexml GET | 2,934,114 | 6,418 | 0 | 476 | 6 | 152 | 260 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor20 | /search/downloadAStudy.html?id=12329&format=nexml GET | 2,863,803 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor4 | /search/downloadAStudy.html?id=736&format=nexml GET | 2,797,320 | 6,418 | 0 | 476 | 11 | 152 | 439 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor47 | /search/downloadAStudy.html?id=1205&format=nexml GET | 1,477,133 | 6,418 | 0 | 476 | 9 | 152 | 151 | 2,104 | java.util.HashMap.getEntry(HashMap.java:364)
http-8080-Processor8 | /search/downloadAStudy.html?id=1205&format=nexml GET | 955,541 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method)
http-8080-Processor43 | /search/downloadAStudy.html?id=1205&format=nexml GET | 698,686 | 6,418 | 0 | 476 | 0 | 152 | 0 | 2,104 | java.lang.Object.wait(Native Method) |
From: William P. <wil...@ya...> - 2012-01-25 05:17:05
|
On Jan 24, 2012, at 3:12 PM, Hilmar Lapp wrote: > and we know already that the queries for some studies will time out if you use the REST API. That certainly was true at one time, but we have since made fixes that should have solved those problems. Rod Page's attempt to suck down all of TreeBASE did encounter studies that were timing out -- and he sent me a list of them. But later, when I tried to fetch them, they downloaded fine. So I think the problem was one of hitting the application in rapid fire, with an overall performance slowdown resulting from the cumulative effects of this rapid fire, and as a result certain studies were timing out on him. Hence my suggestion that Rutger purposely throttle his scripts. Both TreeBASE's tallest dataset (~3,000 taxa) and its widest dataset (~110,000 characters) download just fine: tallest: http://purl.org/phylo/treebase/phylows/study/TB2:S11686?format=nexus widest: http://purl.org/phylo/treebase/phylows/study/TB2:S12064?format=nexus And this works to get a list of all URIs. So unless there are specific cases of corrupt data (which there probably are), or the cumulative effects of excessive web service load cause subsequent time-outs, I don't anticipate any fundamental problems. (And if the former, we'd like to hear about which ones are corrupt). So I think this is worth the experiment, on the understanding that Rutger might need to halt what he's doing should we discover that he has a crippling effect on the service. bp |
From: Hilmar L. <hl...@ne...> - 2012-01-24 22:44:19
|
And to add to my previous response, a useful byproduct of such an effort could be a shared AMI, and in fact if you load up the Postgres dump to S3, you could slice up the file dump generation to run in parallel on multiple EC2 nodes. This could also be a nice target for an Education & Research grant from AWS, the next round of which, I think, is due in the first or second week of February. -hilmar Sent with a tap. On Jan 24, 2012, at 11:07 AM, William Piel <wil...@ya...> wrote: > > On Jan 24, 2012, at 7:53 AM, Rutger Vos wrote: > >> Hi all, >> >> I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? >> >> Rutger > > I think this is a good idea, given that it will build up a war-chest of cached data. (In fact, maybe we should first extend the expire date on the cache so that this lasts longer?) Perhaps it will also catch datasets that are problematic. > > Google Analytics shows that activity is lowest on the weekend -- no surprise there. But maybe it would be better to do it during the week so that it's easy to intervene if the application gets locked up. Also, it might make sense to throttle the download process intentionally (e.g. interspersing requests with the "sleep" function in perl) so that the application has ample time for garbage collection, etc, and so as not to impact the system too much. Finally, even if you're not capturing NEXUS, maybe it would help to download NEXUS as well, as the NEXUS cache is also valuable to build up. > > bp |
From: Hilmar L. <hl...@ne...> - 2012-01-24 22:44:19
|
I do agree that having downloadable dumps of the TreeBASE content in different formats would be a good idea - in fact it was one of the deliverables of the just declined ABI grant. So if you want to put this in place now without support, that's cool of course. The problem is though that contrary to the plans in the grant you wouldn't be doing this here based on a NoSQL document store and SOLR index, but from the relational database, and we know already that the queries for some studies will time out if you use the REST API. So I think the best way to accomplish this would be to dump the PostgreSQL database and reload it on a different server, where you can then generate the NEXUS and NeXML dumps. -hilmar Sent with a tap. On Jan 24, 2012, at 11:07 AM, William Piel <wil...@ya...> wrote: > > On Jan 24, 2012, at 7:53 AM, Rutger Vos wrote: > >> Hi all, >> >> I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? >> >> Rutger > > I think this is a good idea, given that it will build up a war-chest of cached data. (In fact, maybe we should first extend the expire date on the cache so that this lasts longer?) Perhaps it will also catch datasets that are problematic. > > Google Analytics shows that activity is lowest on the weekend -- no surprise there. But maybe it would be better to do it during the week so that it's easy to intervene if the application gets locked up. Also, it might make sense to throttle the download process intentionally (e.g. interspersing requests with the "sleep" function in perl) so that the application has ample time for garbage collection, etc, and so as not to impact the system too much. Finally, even if you're not capturing NEXUS, maybe it would help to download NEXUS as well, as the NEXUS cache is also valuable to build up. > > bp |
From: William P. <wil...@ya...> - 2012-01-24 16:07:40
|
On Jan 24, 2012, at 7:53 AM, Rutger Vos wrote: > Hi all, > > I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? > > Rutger I think this is a good idea, given that it will build up a war-chest of cached data. (In fact, maybe we should first extend the expire date on the cache so that this lasts longer?) Perhaps it will also catch datasets that are problematic. Google Analytics shows that activity is lowest on the weekend -- no surprise there. But maybe it would be better to do it during the week so that it's easy to intervene if the application gets locked up. Also, it might make sense to throttle the download process intentionally (e.g. interspersing requests with the "sleep" function in perl) so that the application has ample time for garbage collection, etc, and so as not to impact the system too much. Finally, even if you're not capturing NEXUS, maybe it would help to download NEXUS as well, as the NEXUS cache is also valuable to build up. bp |
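A sketch of the throttled harvest being discussed here follows -- hypothetical client code: the PhyloWS URLs and the idea of warming both the NEXUS and NeXML caches come from the thread, while the 10-second sleep, the study_ids.txt input file, and the output layout are assumptions made for illustration.

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    // Hypothetical throttled harvester: fetch every study in both formats, sleeping
    // between requests so the application has time for garbage collection and other work.
    public class ThrottledDump {
        public static void main(String[] args) throws Exception {
            // One TB2 study id per line, e.g. S1925 (an assumed input file).
            List<String> ids = Files.readAllLines(Paths.get("study_ids.txt"), StandardCharsets.UTF_8);
            Path outDir = Paths.get("dump");
            Files.createDirectories(outDir);
            for (String id : ids) {
                for (String format : new String[] { "nexus", "nexml" }) { // warm both caches
                    String url = "http://purl.org/phylo/treebase/phylows/study/TB2:" + id + "?format=" + format;
                    try {
                        InputStream in = new URL(url).openStream();
                        Files.copy(in, outDir.resolve(id + "." + format));
                        in.close();
                    } catch (Exception e) {
                        System.err.println("Failed " + id + " (" + format + "): " + e.getMessage());
                    }
                    Thread.sleep(10000); // throttle: an assumed 10-second pause between requests
                }
            }
        }
    }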
From: Rutger V. <rut...@gm...> - 2012-01-24 14:32:58
|
On Tue, Jan 24, 2012 at 3:29 PM, Mattison Ward <mat...@ne...> wrote: > tb-stage and tb-dev have caching enabled. > > From about 3 PM to 4 PM EST yesterday, the load on tb-production went > through the roof from database activity even with caching enabled. > Mmmm... can you tell where it's coming from? I haven't started yet. Maybe it's Rod Page: he's been putting some of his code for his phyloinformatics course online. > On Tue, Jan 24, 2012 at 7:53 AM, Rutger Vos <rut...@gm...> wrote: > > Hi all, > > > > I've had a request from one of Enrico Pontelli's students for a complete > > dump in NeXML of TreeBASE. I would like to have one as well for my own > > purposes. Because we now have caching this may not be as big a problem as > > previously, though most studies will not yet have been serialized to > > NeXML since the start of caching so we still need to be careful. On the plus > > side: once we've done this we will have all of them in cache so all > > subsequent requests should be more snappy. Can we come up with a reasonable > > waiting time between requests so we don't kill the server? Is there a quiet > > time during which this can best be done? Do tb-stage or tb-dev also have > > caches? > > > > Rutger > > > > -- > > Dr. Rutger A. Vos > > http://rutgervos.blogspot.com > > -- > Mattison Ward > NESCent at Duke University > 2024 W. Main Street, Suite A200 > Durham, NC 27705-4667 > 919-668-4585 (desk) > 919-668-4551 (alternate) > 919-668-9198 (fax) > -- Dr. Rutger A. Vos http://rutgervos.blogspot.com |
From: Mattison W. <mat...@ne...> - 2012-01-24 14:30:12
|
tb-stage and tb-dev have caching enabled. From about 3 PM to 4 PM EST yesterday, the load on tb-production went through the roof from database activity even with caching enabled. -Mattison On Tue, Jan 24, 2012 at 7:53 AM, Rutger Vos <rut...@gm...> wrote: > Hi all, > > I've had a request from one of Enrico Pontelli's students for a complete > dump in NeXML of TreeBASE. I would like to have one as well for my own > purposes. Because we now have caching this may not be as big a problem as > previously, though most studies will not yet have been serialized to > NeXML since the start of caching so we still need to be careful. On the plus > side: once we've done this we will have all of them in cache so all > subsequent requests should be more snappy. Can we come up with a reasonable > waiting time between requests so we don't kill the server? Is there a quiet > time during which this can best be done? Do tb-stage or tb-dev also have > caches? > > Rutger > > -- > Dr. Rutger A. Vos > http://rutgervos.blogspot.com > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
From: Rutger V. <rut...@gm...> - 2012-01-24 12:53:30
|
Hi all, I've had a request from one of Enrico Pontelli's students for a complete dump in NeXML of TreeBASE. I would like to have one as well for my own purposes. Because we now have caching this may not be as big a problem as previously, though most studies will not yet ever have been serialized to NeXML since the start of caching so we still need to be careful. On the plus side: once we've done this we will have all of them in cache so all subsequent requests should be more snappy. Can we come up with a reasonable waiting time between requests so we don't kill the server? Is there a quiet time during which this can best be done? Do tb-stage or tb-dev also have caches? Rutger -- Dr. Rutger A. Vos http://rutgervos.blogspot.com |
From: Hilmar L. <hl...@ne...> - 2012-01-02 02:46:27
|
Would this be useful for TreeBASE to use, at least at the free level? Maybe at a minimum it could help with some of the bandwidth generated by crawlers. https://www.cloudflare.com/overview -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <R....@re...> - 2011-12-14 21:37:20
|
Something like David's suggestion should go on the TreeBASE web pages, so I'm forwarding it to get it on the radar. ---------- Forwarded message ---------- From: David Maddison <dav...@sc...> Date: Wed, Dec 14, 2011 at 11:11 AM Subject: [Prfboard] suggested blurb for page To: prf...@ne... Suggestion for the blurb at the bottom of the ToLWeb page. An equivalent one could go on the TreeBase home page: The Tree of Life Web Project is governed/administered/managed by the <a href="http://phylofoundation.org">Phyloinformatics Research Foundation</a>, a non-profit organization devoted to promotion of research and maintenance of the core databases of relevance to phylogenetic biology, their associated software code, and other resources in support of the field of phyloinformatics. --------------------------------- David R. Maddison Department of Zoology 3029 Cordley Hall Oregon State University Corvallis, OR 97331 USA dav...@sc... http://david.bembidion.org http://mesquiteproject.org http://macclade.org http://tolweb.org (541) 737 2834 _______________________________________________ Prfboard mailing list Prf...@ne... https://lists.nescent.org/mailman/listinfo/prfboard -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com |
From: William P. <wil...@ya...> - 2011-11-29 22:12:50
|
Hi Mattison, What you suggest looks fine with me, i.e. set it to cache the following: CacheEnable disk /treebase-web/search/downloadAStudy.html CacheEnable disk /treebase-web/search/study/anyObjectAsRDF.rdf CacheEnable disk /treebase-web/search/study/summary.html but in addition, we should set it to cache these: CacheEnable disk /treebase-web/search/downloadANexusFile.html CacheEnable disk /treebase-web/search/downloadAMatrix.html CacheEnable disk /treebase-web/search/downloadATree.html CacheEnable disk /treebase-web/search/downloadAnAnalysisStep.html But we don't want it to cache these: http://treebase.org/treebase-web/search/studySearch.html http://treebase.org/treebase-web/search/treeSearch.html http://treebase.org/treebase-web/search/matrixSearch.html http://treebase.org/treebase-web/search/taxonSearch.html .. because the results are unstable. Is there any easy way that I can trigger de-caching for an object or a set of objects? For example, it is not unusual for someone to upload data to TreeBASE, trigger it to have the data released to the public, and then suddenly recant and ask that the data be withheld. Typically, I'll just go in and toggle the status of their submission back to "private." But if we're caching everything, it might take a few months before the data are really unavailable to the public. I suppose if I had privileges to access the cache directory I could delete the offending objects. bp On Nov 29, 2011, at 3:41 PM, Mattison Ward wrote: > Hi Bill. > > These links resolve respectively to: > > http://treebase-dev.nescent.org/treebase-web/search/study/anyObjectAsRDF.rdf?namespacedGUID=TB2:S1925 > http://treebase-dev.nescent.org/treebase-web/search/study/summary.html?id=1925 > http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexml > http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexus > > Mod_cache would need to be set to something like this > > CacheEnable disk /treebase-web/search/downloadAStudy.html > CacheEnable disk /treebase-web/search/study/anyObjectAsRDF.rdf > CacheEnable disk /treebase-web/search/study/summary.html > > but I don't see an obvious way to restrict caching to TB2 objects. Querystrings cannot be used to enable or disable caching for a specific page using mod_cache. > |
From: Mattison W. <mat...@ne...> - 2011-11-29 20:42:17
|
Hi Bill. These links resolve respectively to: http://treebase-dev.nescent.org/treebase-web/search/study/anyObjectAsRDF.rdf?namespacedGUID=TB2:S1925 http://treebase-dev.nescent.org/treebase-web/search/study/summary.html?id=1925 http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexml http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=1925&format=nexus Mod_cache would need to be set to something like this CacheEnable disk /treebase-web/search/downloadAStudy.html CacheEnable disk /treebase-web/search/study/anyObjectAsRDF.rdf CacheEnable disk /treebase-web/search/study/summary.html but I don't see an obvious way to restrict caching to TB2 objects. Querystrings cannot be used to enable or disable caching for a specific page using mod_cache. On Thu, Nov 17, 2011 at 2:11 PM, William Piel <wil...@ya...> wrote: > > On Nov 17, 2011, at 12:25 PM, Mattison Ward wrote: > > I will test this on treebasedev. > > > Thanks. To test it you could do the following: > > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925 > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=html > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexml > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexus > > And in each case you should see a new record stored in cacheRoot, each in > a different format. > > bp > > > > > On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...> wrote: > >> >> On Nov 17, 2011, at 10:02 AM, William Piel wrote: >> >> > Therefore, if you also know of an Apache plugin that will cache results >> for "/phylows/study/TB2:", that would greatly help. >> >> Looking at mod_cache, I wonder if this would work: >> >> cacheRoot c:/cacheroot >> cacheEnable disk /treebase-web/phylows/study/TB2: >> cacheDirLevels 1 >> cacheDirLength 20 >> cacheMinFileSize 1 >> cacheMaxFileSize 50000000 >> cacheIgnorecacheControl Off >> cacheIgnoreNoLastMod On >> cacheMaxExpire 2592000 >> >> ... resulting in a one-month cache on all TB2 objects. What's unclear to >> me is whether the cacheEnable string allows substrings or whether it needs >> to end in "/". If that's a limitation, are there third-party plugins that >> can cache using wildcards? >> >> bp >> > > > -- > Mattison Ward > NESCent at Duke University > 2024 W. Main Street, Suite A200 > Durham, NC 27705-4667 > 919-668-4585 (desk) > 919-668-4551 (alternate) > 919-668-9198 (fax) > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
From: William P. <wil...@ya...> - 2011-11-28 20:03:13
|
Okay, I think I figured out the source of the problem -- and it's not related to any settings changes (etc) that happened at the Tomcat / Postgres end. Turns out that when the original submitter had uploaded his 800,000 character dataset, he didn't have any "character sets" exposed -- i.e. the notation that indicates the beginning and end of each gene. Later, this notation was introduced in subsequent uploads (that then failed). And then when I've been testing smaller uploads (e.g. 80,000 characters) I also used charsets, and it was getting hung up. Just now I uploaded the same 80,000 character dataset but without the charset, and boom -- right away it uploaded fine. A charset only has a small effect on the database. For example, my 80,000-character file with a charset only required that three records be created (one in the table charset, another in the table charset_colrange, and a third in the table columnrange). However, for Mesquite (the program that parses all incoming data), it probably means building up a lot of data in memory. My guess, then, is that our version of Mesquite is choking on large files that have charsets. The file with charsets is still choking up treebasedev (see here), so feel free to kill the current request that's hung there. thanks, bp On Nov 28, 2011, at 12:28 PM, Mattison Ward wrote: > Ok - pushed. > > On Mon, Nov 28, 2011 at 11:56 AM, William Piel <wil...@ya...> wrote: > > On Nov 28, 2011, at 11:52 AM, Mattison Ward wrote: > >> Hi Bill and Harry. >> >> A little before 10:55 AM the load went very high on treebase production. A little before 11:44 the tomcat service stopped responding. I restarted it. The tomcat error logs are attached. >> >> -- >> Mattison Ward > > Thanks. > > Maybe it is time to do a push to production so that we can monitor activity with JavaMelody, and seeing as some bugs have been addressed. > > bp |
From: William P. <wil...@ya...> - 2011-11-28 16:09:08
|
On Nov 28, 2011, at 9:43 AM, Mattison Ward wrote: > The only change in response to the large number of API hits was to limit the number of requests per second. I just disabled that setting on both systems. Please try another upload. > > Mattison Thanks Mattison. Production is feeling sluggish right now, so I'm uploading to dev instead. bp |
From: Mattison W. <mat...@ne...> - 2011-11-28 14:43:52
|
Hi Bill. No changes to any settings except increasing the max heap size on tomcat on production from 3 GB to 4 GB. No updates to Tomcat, but regular updates to Apache and the Linux OS do occur on an ongoing basis. The only change in response to the large number of API hits was to limit the number of requests per second. I just disabled that setting on both systems. Please try another upload. Mattison On Thu, Nov 24, 2011 at 12:00 AM, William Piel <wil...@ya...>wrote: > > So after failing to upload several large files to production, I tried > uploading a large file to dev. (10 taxa x 66472 characters). The upload > page did a proxy time-out, but the request kept going on and on. Here it > shows it some 6 hours later, consuming a lot of memory. > > It's odd because I don't think this problem used to happen. TreeBASE has > files that are considerably larger that this one -- e.g. we had a > submission on September 7. And I think a much larger file was uploaded > about two weeks ago. > > So something weird has happened, with the result that TreeBASE is now > underperforming. > > In two hours from now this wii all be reset because dev will be refreshed > from production. But clearly there is a problem (a) because tasks that > TreeBASE used to be able to do, presently it cannot do, and (b) because > what it cannot do seems to tie up a lot of memory and CPU, crippling it. > > Mattison: Can you think of any changes in terms of memory allocation (or > recent upgrades) that may be affecting performance? e.g. were SQL timeouts > shortened to deal with the hits we were getting on the API last week? > > It's all very vexing. > > bp > > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
From: William P. <wil...@ya...> - 2011-11-24 18:45:44
|
Just a follow-up to this problem, with some more details: On November 12 2011 17:53:02 GMT, a user successfully uploaded a "Fig4" data file with 10 taxa x 352,120 characters. On November 22 2011 22:03:41 GMT, he successfully uploaded his "Fig5" data file with 10 taxa x 876,159 characters (even bigger!). But starting November 23, the very same "Fig4" data file wouldn't upload. Even by subdividing this file into smaller files (e.g. 10 x 60,000 each), these still cause the system to choke. So what happened around November 22-23 that has hobbled our ability to ingest large files? Did we, for example, shorten the SQL timeout? (in response to heavy API hits). bp On Nov 24, 2011, at 12:00 AM, William Piel wrote: > > So after failing to upload several large files to production, I tried uploading a large file to dev. (10 taxa x 66472 characters). The upload page did a proxy time-out, but the request kept going on and on. Here it shows it some 6 hours later, consuming a lot of memory. > > It's odd because I don't think this problem used to happen. TreeBASE has files that are considerably larger than this one -- e.g. we had a submission on September 7. And I think a much larger file was uploaded about two weeks ago. > > So something weird has happened, with the result that TreeBASE is now underperforming. > > In two hours from now this will all be reset because dev will be refreshed from production. But clearly there is a problem (a) because tasks that TreeBASE used to be able to do, presently it cannot do, and (b) because what it cannot do seems to tie up a lot of memory and CPU, crippling it. > > Mattison: Can you think of any changes in terms of memory allocation (or recent upgrades) that may be affecting performance? e.g. were SQL timeouts shortened to deal with the hits we were getting on the API last week? > > It's all very vexing. > > bp |
From: William P. <wil...@ya...> - 2011-11-24 05:00:20
|
So after failing to upload several large files to production, I tried uploading a large file to dev. (10 taxa x 66472 characters). The upload page did a proxy time-out, but the request kept going on and on. Here it shows it some 6 hours later, consuming a lot of memory. It's odd because I don't think this problem used to happen. TreeBASE has files that are considerably larger than this one -- e.g. we had a submission on September 7. And I think a much larger file was uploaded about two weeks ago. So something weird has happened, with the result that TreeBASE is now underperforming. In two hours from now this will all be reset because dev will be refreshed from production. But clearly there is a problem (a) because tasks that TreeBASE used to be able to do, presently it cannot do, and (b) because what it cannot do seems to tie up a lot of memory and CPU, crippling it. Mattison: Can you think of any changes in terms of memory allocation (or recent upgrades) that may be affecting performance? e.g. were SQL timeouts shortened to deal with the hits we were getting on the API last week? It's all very vexing. bp |
From: Mattison W. <mat...@ne...> - 2011-11-21 16:40:07
|
The TB2 requests resolve to requests with querystrings such as /treebase-web/search/downloadAStudy.html?id=1925&format=nexml This results in this error on Apache "cache: /treebase-web/search/studySearch.html?query=prism.publicationName=Nature&format=null&recordSchema=null not cached. Reason: Query string present but no explicit expiration time" Mod_cache will not work with querystrings from Tomcat until an expire time is explicitly set within your application. Here are three articles I found with suggestions on how to do that. http://raibledesigns.com/rd/entry/adding_expires_headers_with_oscache (download now http://java.net/downloads/oscache/OSCache%202.4.1/oscache-2.4.1.jar) http://www.tomred.net/java-tomcat-set-expires-headers.html http://juliusdev.blogspot.com/2008/06/tomcat-add-expires-header.html -Mattison On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...>wrote: > > On Nov 17, 2011, at 10:02 AM, William Piel wrote: > > > Therefore, if you also know of an Apache plugin that will cache results > for "/phylows/study/TB2:", that would greatly help. > > Looking at mod_cache, I wonder if this would work: > > cacheRoot c:/cacheroot > cacheEnable disk /treebase-web/phylows/study/TB2: > cacheDirLevels 1 > cacheDirLength 20 > cacheMinFileSize 1 > cacheMaxFileSize 50000000 > cacheIgnorecacheControl Off > cacheIgnoreNoLastMod On > cacheMaxExpire 2592000 > > ... resulting in a one-month cache on all TB2 objects. What's unclear to > me is whether the cacheEnable string allows substrings or whether it needs > to end in "/". If that's a limitation, are there third-party plugins that > can cache using wildcards? > > bp > > > > > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |
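Following the approach in the linked articles, one way to attach an explicit expiration from inside the Tomcat webapp is a servlet filter that sets the Expires and Cache-Control headers before the response is committed. A minimal sketch follows -- hypothetical: the class name, the one-month lifetime, and the web.xml mapping restricted to the download URLs are illustrative assumptions, not existing TreeBASE code.

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical filter: give responses an explicit expiration so mod_cache will
    // cache URLs that carry query strings. Map it in web.xml to the download URLs only.
    public class ExpiresHeaderFilter implements Filter {
        private static final long ONE_MONTH_MS = 30L * 24 * 60 * 60 * 1000;

        public void init(FilterConfig cfg) throws ServletException {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletResponse resp = (HttpServletResponse) res;
            // Headers must be set before the response is committed, hence before chain.doFilter().
            resp.setDateHeader("Expires", System.currentTimeMillis() + ONE_MONTH_MS);
            resp.setHeader("Cache-Control", "public, max-age=" + (ONE_MONTH_MS / 1000));
            chain.doFilter(req, res);
        }
    }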
From: Rutger V. <R....@re...> - 2011-11-18 12:15:21
|
Mmmmm... it applies an xslt stylesheet, but because of CDAO's verbose design apparently this doesn't always work so well. We could hide the rdf links for the time being? CDAO is slated to undergo further design to make it more scalable. On Thu, Nov 17, 2011 at 7:15 PM, William Piel <wil...@ya...> wrote: > > Hi Rutger: > > Do you know what the deal is with format=rdf ? > > For example: > > http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=rdf > > This seems to tie TreeBASE in a knot, not returning anything but causing a spike in CPU and Memory, as judged by javamelody: > > http://treebasedev.nescent.org/treebase-web/monitoring > > bp > > > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure > contains a definitive record of customers, application performance, > security threats, fraudulent activity, and more. Splunk takes this > data and makes sense of it. IT sense. And common sense. > http://p.sf.net/sfu/splunk-novd2d > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com |
From: William P. <wil...@ya...> - 2011-11-17 19:15:14
|
Hi Rutger: Do you know what the deal is with format=rdf ? For example: http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=rdf This seems to tie TreeBASE in a knot, not returning anything but causing a spike in CPU and Memory, as judged by javamelody: http://treebasedev.nescent.org/treebase-web/monitoring bp |
From: William P. <wil...@ya...> - 2011-11-17 19:11:59
|
On Nov 17, 2011, at 12:25 PM, Mattison Ward wrote: > I will test this on treebasedev. Thanks. To test it you could do the following: http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925 http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=html http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexml http://purl.org/phylo/treebase/dev/phylows/study/TB2:S1925?format=nexus And in each case you should see a new record stored in cacheRoot, each in a different format. bp > On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...> wrote: > > On Nov 17, 2011, at 10:02 AM, William Piel wrote: > > > Therefore, if you also know of an Apache plugin that will cache results for "/phylows/study/TB2:", that would greatly help. > > Looking at mod_cache, I wonder if this would work: > > cacheRoot c:/cacheroot > cacheEnable disk /treebase-web/phylows/study/TB2: > cacheDirLevels 1 > cacheDirLength 20 > cacheMinFileSize 1 > cacheMaxFileSize 50000000 > cacheIgnorecacheControl Off > cacheIgnoreNoLastMod On > cacheMaxExpire 2592000 > > ... resulting in a one-month cache on all TB2 objects. What's unclear to me is whether the cacheEnable string allows substrings or whether it needs to end in "/". If that's a limitation, are there third-party plugins that can cache using wildcards? > > bp > > > > > > > > -- > Mattison Ward > NESCent at Duke University > 2024 W. Main Street, Suite A200 > Durham, NC 27705-4667 > 919-668-4585 (desk) > 919-668-4551 (alternate) > 919-668-9198 (fax) > |
From: Mattison W. <mat...@ne...> - 2011-11-17 17:25:36
|
I will test this on treebasedev. On Thu, Nov 17, 2011 at 11:14 AM, William Piel <wil...@ya...>wrote: > > On Nov 17, 2011, at 10:02 AM, William Piel wrote: > > > Therefore, if you also know of an Apache plugin that will cache results > for "/phylows/study/TB2:", that would greatly help. > > Looking at mod_cache, I wonder if this would work: > > cacheRoot c:/cacheroot > cacheEnable disk /treebase-web/phylows/study/TB2: > cacheDirLevels 1 > cacheDirLength 20 > cacheMinFileSize 1 > cacheMaxFileSize 50000000 > cacheIgnorecacheControl Off > cacheIgnoreNoLastMod On > cacheMaxExpire 2592000 > > ... resulting in a one-month cache on all TB2 objects. What's unclear to > me is whether the cacheEnable string allows substrings or whether it needs > to end in "/". If that's a limitation, are there third-party plugins that > can cache using wildcards? > > bp > > > > > -- Mattison Ward NESCent at Duke University 2024 W. Main Street, Suite A200 Durham, NC 27705-4667 919-668-4585 (desk) 919-668-4551 (alternate) 919-668-9198 (fax) |