|
From: Diogo FC P. <djo...@gm...> - 2015-02-18 17:53:13
|
Hello. First, congratulations on the release of this exciting project! I have a knowledge base about cancer patients with 6,240,880 triples, loaded on an open-source Virtuoso server. Before loading, we materialized inferences using Pellet - it took almost 3 days on a server with 30 GB of RAM. I have a DL-Learner configuration file (based on the Actors example) which queries the SPARQL endpoint. But when I set the recursion depth to 4, DL-Learner ends up taking too much memory and throws an exception. I have two questions: 1) Is it possible to configure what DL-Learner queries? There are data and object properties which could be excluded from the analysis. Is there a way to filter them? 2) Can I turn off the built-in inference provided by DL-Learner (considering that everything is already materialized in my KB)? Thanks! -- diogo patrão |
|
From: Lorenz B. <spo...@st...> - 2015-02-19 11:26:09
|
Hello Diogo, thanks a lot. Can you share the conf file with us, and maybe give us access to the endpoint or share the dump? Recursion depth 4 usually leads to a large amount of imported data for each given pos/neg example, so there might indeed be a memory issue. Do you get just an OOM exception or something else? Can you share the error stack trace? 1) Actually, it's not possible to configure a black or white list of allowed properties, but we could add this of course. 2) I'm not sure what exactly you mean by that. Do you mean the initialization step of the reasoner before the learning algorithm is started? Kind regards, Lorenz |
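(Note: such a property filter is not available at this point, so the option below is purely hypothetical - it is not part of the DL-Learner conf syntax and only sketches what a black list next to the other sparql component options might look like:

// hypothetical, NOT currently supported - illustration of a possible property black list
sparql.ignoredProperties = {
"http://example.org/ontology#someDataProperty",
"http://example.org/ontology#someObjectProperty"
}

Until something like this exists, the main lever for the size of the extracted fragment is the recursion depth discussed below.)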
|
From: Diogo FC P. <djo...@gm...> - 2015-02-19 12:26:48
|
Hello Lorenz
Thanks for your fast response!
2) I read in the manual that DL-Learner has a built-in reasoner; it could
be Pellet, HermiT, FaCT++, or the basic OWL API reasoner. I was wondering whether
it can be turned off to save memory, as my endpoint contains all needed
inferences materialized.
Thanks!
Below is the configuration file. I'm sorry, but I can't share the SPARQL
endpoint, because it's hosted on our intranet. Besides this, I changed the
*cli* script to increase -Xmx to 25000M.
prefixes = [ ("of","http://cipe.accamargo.org.br/ontologias/ontofamily.owl#")
]
// SPARQL options
sparql.type = "SPARQL endpoint fragment"
sparql.url = "http://192.18.0.125:8890/sparql"
sparql.defaultGraphURIs = {""}
sparql.recursionDepth = 4
//TODOREFACTOR check if predefinedFilter works at all
//predefined filter (1 = YAGO based learning)
// 2 = SKOS, more options are needed then though: replacePredicate, breakSuperClassRetrievalAfter
sparql.predefinedFilter = "YAGO"
// the set of objects as starting point for fragment selection
// (should be identical to the set of examples)
sparql.instances = {
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7166750",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7166750",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente8057650",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10741180",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7007779",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7007779",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente50024160",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente8063290",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10180430",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente6049338",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10119100",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10119120",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10170120"
}
reasoner.type = "fast instance checker"
reasoner.sources = {sparql}
lp.type = "posNegStandard"
lp.positiveExamples = {
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7166750",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7166750",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente8057650",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10741180",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7007779",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente7007779"
}
lp.negativeExamples = {
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente50024160",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente8063290",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10180430",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente6049338",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10119100",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10119120",
"http://cipe.accamargo.org.br/ontologias/recruit.owl#paciente10170120"
}
lp.reasoner = reasoner
alg.type = "celoe"
---
Below is the exception:
Exception encountered during context initialization - cancelling refresh attempt
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'alg': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire method: public void org.dllearner.core.AbstractCELA.setReasoner(org.dllearner.core.AbstractReasonerComponent); nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'reasoner': Cannot resolve reference to bean 'sparql' while setting bean property 'sources' with key [0]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'sparql': Initialization of bean failed; nested exception is java.lang.OutOfMemoryError: Java heap space
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:298)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1148)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
at
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
at
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
at
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:191)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:636)
at
org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:934)
at
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479)
at
org.dllearner.configuration.spring.DefaultApplicationContextBuilder.buildApplicationContext(DefaultApplicationContextBuilder.java:60)
at org.dllearner.cli.CLI.main(CLI.java:259)
Caused by: org.springframework.beans.factory.BeanCreationException: Could
not autowire method: public void
org.dllearner.core.AbstractCELA.setReasoner(org.dllearner.core.AbstractReasonerComponent);
nested exception is
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name 'reasoner': Cannot resolve reference to bean 'sparql' while
setting bean property 'sources' with key [0]; nested exception is
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name 'sparql': Initialization of bean failed; nested exception is
java.lang.OutOfMemoryError: Java heap space
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.inject(AutowiredAnnotationBeanPostProcessor.java:618)
at
org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:295)
... 12 more
Caused by: org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'reasoner': Cannot resolve reference to bean
'sparql' while setting bean property 'sources' with key [0]; nested
exception is org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'sparql': Initialization of bean failed; nested
exception is java.lang.OutOfMemoryError: Java heap space
at
org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:334)
at
org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:108)
at
org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveManagedSet(BeanDefinitionValueResolver.java:371)
at
org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:161)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1419)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1160)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
at
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
at
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
at
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:191)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:921)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:864)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:779)
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.inject(AutowiredAnnotationBeanPostProcessor.java:575)
... 14 more
Caused by: org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'sparql': Initialization of bean failed; nested
exception is java.lang.OutOfMemoryError: Java heap space
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:529)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
at
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
at
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
at
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:191)
at
org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328)
... 29 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:234)
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at
org.apache.jena.atlas.json.io.parser.TokenizerJSON.allBetween(TokenizerJSON.java:575)
at
org.apache.jena.atlas.json.io.parser.TokenizerJSON.parseToken(TokenizerJSON.java:137)
at
org.apache.jena.atlas.json.io.parser.TokenizerJSON.hasNext(TokenizerJSON.java:59)
at
org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
at
org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
at
org.apache.jena.atlas.json.io.parser.JSONParserBase.nextToken(JSONParserBase.java:107)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:75)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseArray(JSONP.java:146)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:98)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
at
org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
at org.apache.jena.atlas.json.io.parser.JSONP.parse(JSONP.java:50)
at
org.apache.jena.atlas.json.io.parser.JSONParser.parse(JSONParser.java:58)
at
org.apache.jena.atlas.json.io.parser.JSONParser.parse(JSONParser.java:40)
at org.apache.jena.atlas.json.JSON._parse(JSON.java:141)
at org.apache.jena.atlas.json.JSON.parse(JSON.java:37)
at
com.hp.hpl.jena.sparql.resultset.JSONInput.parse(JSONInput.java:125)
at
com.hp.hpl.jena.sparql.resultset.JSONInput.process(JSONInput.java:109)
at
com.hp.hpl.jena.sparql.resultset.JSONInput.fromJSON(JSONInput.java:66)
at
com.hp.hpl.jena.query.ResultSetFactory.fromJSON(ResultSetFactory.java:346)
at
org.dllearner.kb.sparql.SparqlQuery.convertJSONtoResultSet(SparqlQuery.java:300)
at
org.dllearner.kb.sparql.SPARQLTasks.queryAsRDFNodeTuple(SPARQLTasks.java:413)
at
org.dllearner.kb.aquisitors.SparqlTupleAquisitor.retrieveTupel(SparqlTupleAquisitor.java:70)
at
org.dllearner.kb.aquisitors.TupleAquisitor.getTupelForResource(TupleAquisitor.java:65)
at
org.dllearner.kb.extraction.InstanceNode.expand(InstanceNode.java:68)
An Error Has Occurred During Processing.
Terminating DL-Learner...and writing stacktrace to: log/error.log
--
diogo patrão
|
|
From: Jens L. <le...@in...> - 2015-02-20 06:39:48
|
Hello, On 19.02.2015 at 13:26, Diogo FC Patrao wrote: > > Below is the configuration file. I'm sorry, but I can't share the SPARQL > endpoint, because it's hosted on our intranet. Besides this, I changed > the /cli/ script to increase -Xmx to 25000M. Is the data itself confidential? Otherwise, you could also share the dump behind it via Dropbox etc. (not necessarily publicly; just sharing with Lorenz or me would be sufficient, as it could save us some time looking into the problem - we can sign NDAs as well if needed). We can then load it into an endpoint here for testing. Also, in the conf file it may be good to specify some termination criterion (e.g. 5 minutes via alg.maxExecutionTimeInSeconds = 300) to avoid the algorithm running forever. (If it doesn't find a perfect solution, it will otherwise indeed always run out of memory at some point.) Recursion depth 4 could be quite high depending on the data. Trying lower depths first would be something to test. (It depends on how deeply nested you expect the learned constructs to be.) Generally, we are currently looking into various approaches and algorithms related to scalability (also across several machines), so if you would like to involve us in the cancer patient use case, we'd be more than happy to do so and could run classifications on larger machines here. For us, it would be a good additional test case to verify whether the improvements we are planning at the moment lead to good results. Kind regards, Jens -- Dr. Jens Lehmann AKSW Group, Department of Computer Science, University of Leipzig Homepage: http://www.jens-lehmann.org GPG Key: http://jens-lehmann.org/jens_lehmann.asc |
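(As a rough sketch, these suggestions would translate into a few extra lines in the conf file quoted earlier in this thread; the option names are the ones mentioned in the messages above, and the concrete values are only illustrative:

// stop the search after 5 minutes instead of running until memory is exhausted
alg.type = "celoe"
alg.maxExecutionTimeInSeconds = 300
// start with a smaller fragment and raise the depth only if the learned expressions need it
sparql.recursionDepth = 2

With such a time limit, CELOE reports the best class expressions found within the budget rather than searching indefinitely.)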
|
From: Diogo FC P. <djo...@gm...> - 2015-02-20 11:54:42
|
Hi Jens, Yes, the data is confidential for two reasons: (1) it contains real patient identification and (2) it is being used in a research project that is not yet published. So I can't share it as it is yet, but I can generate a random ontology following the same topology as the original file to reproduce the problem. I already tried lower recursion levels, but I think the interesting stuff will in fact be at level 4. However, the ontology has many circular paths, so I guess the number of connections will grow exponentially with the recursion depth. Does DL-Learner implement some sort of loop detection? I'll talk with the project PI to see how she feels about a cooperation on this, but I think it would be extremely and mutually beneficial. I'll contact you in private later. Thanks! -- diogo patrão |