Thread: [Xbrlapi-developer] Difficulties working with the rendering example
From: Matthew D. <ro...@gm...> - 2011-07-08 19:50:35
Hi Geoff,

Following your suggestion, I have started to work with aspects and dimensions. I have been using your rendering example (org.xbrlapi.data.bdbxml.examples.render, Run.java) as a template, since it uses fact sets and aspect values in a number of ways. After some problems with my own code, I attempted to run your Run.java example as is, on http://www.sec.gov/Archives/edgar/data/93751/000119312510036481/stt-20091231.xml, which is in the data store. It takes a long time to run (upwards of a half hour), and fails with an Out of Memory Error after line 394 (once it starts building childConcepts). My memory is set to 2G (out of 4G system memory), which I should think would be sufficient for a single instance document.

As mentioned above, I have also been running into problems with my own code, attempting to load all of the Facts in a particular instance into a FactSet. Specifically, if I retrieve all Facts from an instance, and attempt to use addFacts() to add those Facts to the FactSet, I get an Out of Memory Error. I have taken to loading the Facts one by one into the FactSet, which bypasses the memory error, but is extremely slow: each Fact takes many minutes (while writing, I have watched one take over 10 minutes). This does not make sense to me: if all of the Facts can be retrieved from the instance in a few seconds, I do not see why adding the Facts to the FactSet should take so long. In an attempt to speed the process up, I ran your utility to persist relationships (org.xbrlapi.data.bdbxml.examples.utilities, PersistAllRelationshipsInStore.java), but this did not help. I am beginning to suspect that there is something wrong with my installation, but I am not sure how to figure out what.

Do you have any idea why these functions are eating so much memory and CPU on my system? I don't think that I will be able to use dimensions and aspects for my project (which will analyze over 500 annual reports) if I cannot solve these issues.

Regards,
Matt
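The loading pattern described above looks roughly like the sketch below. It is illustrative only: the way the Instance is fetched (getRootFragmentForDocument, getAllFacts) and the package path of the FactSet type are assumptions about the API and may need adjusting to the XBRLAPI version and aspects package in use; only addFacts()/addFact() and the FactSet itself are taken from the post.

import java.net.URI;
import java.util.List;

import org.xbrlapi.Fact;
import org.xbrlapi.Instance;
import org.xbrlapi.aspects.FactSet;
import org.xbrlapi.data.Store;

public class FactSetLoadingSketch {

    /**
     * Adds every fact in one XBRL instance to a FactSet. The store and the
     * factSet are assumed to have been set up already, much as Run.java does.
     */
    public static void loadInstanceFacts(Store store, FactSet factSet, URI instanceUri)
            throws Exception {

        // Assumption: the instance's root fragment can be retrieved by document URI.
        Instance instance = store.<Instance>getRootFragmentForDocument(instanceUri);

        // Assumption: getAllFacts() returns all facts in the instance, including
        // facts nested inside tuples.
        List<Fact> facts = instance.getAllFacts();

        // Bulk addition, as discussed in the post; the exact collection type the
        // method accepts may differ. Adding facts one at a time with addFact(fact)
        // avoids the memory error but is far slower.
        factSet.addFacts(facts);
    }
}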
From: Geoff S. <ge...@ga...> - 2011-07-10 04:41:17
Matthew,

I have tried running the Rendering example using a fresh BDBXML data store. The data took a few minutes to load (less than 10) and then the rendering process was done in about 1-2 minutes. I used 2 GB of memory and did not come up against any constraints.

Note that the freemarker template does need a couple of small changes to get it to work with the updated freemarker library and to reflect the change in the name of the serializeToString method of the Store interface. To get the fixed version of the Freemarker template, you can update from SVN or just download it directly from http://tinyurl.com/3aubjlj

The rendering result is in a file in SVN, "result.html", that is also accessible from the SVN browse facility.

Without more information I am not going to be able to troubleshoot your problem. Give it a go with the revised freemarker template in SVN and let me know if the problems persist. If they do, you might need to do some of your own timing evaluations to point us in the right direction.

Regards

Geoff Shuetrim
From: Matthew D. <ro...@gm...> - 2011-07-14 15:49:56
Hi Geoff,

I think that the problems I am having with my large datastore are caused by aspect models and linkroles being based on stores, not instances. As a result, when I invoke these functions, the program loads every relationship in the data store, and not just those for a particular instance. Understandably, this crushes the system.

Is there a way to build aspect models and use linkroles from a particular instance only? I could not find a way in the documentation, but I could easily have missed it. If not, is there an easy way to create a kind of "virtual store", which pulls documents and persisted relationships for a particular instance out of the main store, but can be treated as a store object? If not, I will look into how to do this, and would appreciate any guidance that you might have.

Regards,
Matt

On Mon, Jul 11, 2011 at 5:29 PM, Matthew DeAngelis <ro...@gm...> wrote:

> Hi Geoff,
>
> Using the new material with a fresh, single-item BDBXML database works as
> expected. Using the new material with my existing database (which contains
> over 500 instance and related documents) continues to have (some) of the
> same problems. As such, I'm going to guess that the slowdown and tendency
> for memory overruns is due to database performance. I will try rebuilding
> my database again to see if there is anything wrong with it.
>
> If it turns out that the size is the problem, I will have to look into
> tuning BDBXML databases. Have you noticed scaling issues with these
> databases in the past? From your experience, do you think that an eXist
> database would fare better with a large volume of data?
>
> Thanks for helping me troubleshoot. Now that I can at least work with a
> single document quickly, I am starting to make progress.
>
> Regards,
> Matt
From: Geoff S. <ge...@ga...> - 2011-07-14 21:37:04
Matthew,

What you have found is indeed a feature of the XBRLAPI. The API is designed to work with large stores of data, seeing through the boundaries of XBRL instances. For data like the SEC filings, where many filings make a range of changes to the metadata in taxonomies, in ways that are often contradictory and sometimes inconsistent, this means that the traditional approach to working with linkbase networks of relationships can result in useless network analysis. It can also mean that things take a long time.

I spent some time, back when the SEC was looking for a visualisation tool, thinking about how to deal with specific instances only, and that led to some new features in the API.

Specifically, you can specify that query results in a store have to come from a specified set of documents (where those documents are those in a DTS of interest). That is done with the data store's setFilteringURIs method.

What URIs should you use? Find that out by using the data store's getMinimumDocumentSet methods.

That works, but not with the kind of performance that I would like (basically because it makes XQuery where clauses too lengthy). I would be interested in your mileage with it, but I am guessing it will be inadequate.

You could write the documents in the target set of URIs to a separate data store and work from there. I have never tried that, but it may perform OK.

I hope that at least clarifies what your situation is. Any progress you can make on this will definitely be useful to others, if not to me, should I be able to incorporate the features into the XBRLAPI. Let me know if I can help in any way.

Regards

Geoff Shuetrim
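In code, the filtering approach described here amounts to something like the following minimal sketch. The method names setFilteringURIs and getMinimumDocumentSet come from the message above, but their exact signatures (and whether they take and return List<URI>) are assumptions that should be checked against the Store javadoc for the version in use.

import java.net.URI;
import java.util.List;

import org.xbrlapi.data.Store;

public class DtsFilteringSketch {

    /**
     * Restricts subsequent queries against the store to the documents in the
     * DTS of a single instance document.
     */
    public static void filterToInstanceDts(Store store, URI instanceUri) throws Exception {

        // Assumption: getMinimumDocumentSet(URI) returns the URIs of all documents
        // reachable from the given document -- its discoverable taxonomy set.
        List<URI> dtsDocuments = store.getMinimumDocumentSet(instanceUri);

        // From here on, query results only come from the listed documents.
        // Note the caveat above: this lengthens XQuery where clauses, so it can be slow.
        store.setFilteringURIs(dtsDocuments);
    }
}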
From: Geoff S. <ge...@ga...> - 2011-07-14 22:10:59
Matthew,

Something else to consider: the rendering example does a lot more than just aspect analysis. The biggest impact on your processing is its work exploring presentation relationship networks, which get quite massive in a huge data store where everyone feels like tailoring the one network to their own purposes.

What I suggest is that you cut that work out and use only the aspect analysis component of the code. You won't get the pretty printing of the data, but you will get a nice FactSet that can be rapidly extracted to a relational database, as I think you were working towards a while ago.

Regards

Geoff Shuetrim
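For a FactSet that has already been populated with one instance's facts (as in the earlier sketch), the aspect-only extraction suggested here might look roughly like the code below. Every getter named in it (getFacts, getAspectValues, getIndex, getAspectId, getId) is an assumption about the aspects API rather than a confirmed signature, so adapt it against the aspect-analysis portion of Run.java.

import org.xbrlapi.Fact;
import org.xbrlapi.aspects.AspectValue;
import org.xbrlapi.aspects.FactSet;

public class AspectExtractionSketch {

    /**
     * Walks a populated FactSet and prints one row per (fact, aspect value) pair,
     * a shape that maps directly onto a relational table, with no traversal of
     * presentation networks.
     */
    public static void dumpAspectValues(FactSet factSet) throws Exception {

        for (Fact fact : factSet.getFacts()) { // assumption: getFacts() returns the loaded facts

            // Assumption: getAspectValues(fact) returns the aspect values for one fact.
            for (AspectValue value : factSet.getAspectValues(fact)) {

                // Assumptions: getIndex() identifies the fact fragment; getAspectId()
                // and getId() identify the aspect and its value.
                System.out.println(fact.getIndex() + "\t"
                        + value.getAspectId() + "\t" + value.getId());
            }
        }
    }
}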