From: Diogo FC P. <djo...@gm...> - 2015-02-20 11:54:42
Hi Jens,

Yes, the data is confidential for two reasons: (1) it contains real
patient identification and (2) it is being used in a research project
that has not been published yet. So I can't share it as it is for now.
But I can generate a random ontology following the same topology as the
original file to reproduce the problem.

I already tried lower recursion levels, but I think the interesting
stuff will in fact be at level 4. However, the ontology has many
circular paths, so I guess the number of connections will rise
exponentially with the recursion level. Does dl-learner implement some
sort of loop detection?

I'll talk with the project PI to find out how she feels about a
cooperation on this, but I think it would be extremely and mutually
beneficial. I'll contact you in private later.

Thanks!

--
diogo patrão

On Fri, Feb 20, 2015 at 4:39 AM, Jens Lehmann <le...@in...> wrote:
>
> Hello,
>
> On 19.02.2015 at 13:26, Diogo FC Patrao wrote:
> >
> > Below is the configuration file. I'm sorry, but I can't share the sparql
> > endpoint, because it's hosted on our intranet. Besides this, I changed
> > the script /cli/ to increase Xmx to 25000M.
>
> Is the data itself confidential? Otherwise, you could also share the
> dump behind it via dropbox etc. (not necessarily public, just sharing
> with Lorenz or me would be sufficient as it could save us some time to
> look into the problem - we can sign NDAs as well if needed). We can then
> load it into an endpoint here for testing.
>
> Also in the conf file, it may be good to specify some termination
> criterion (e.g. 5 minutes via alg.maxExecutionTimeInSeconds = 300) to
> avoid the algorithm running forever. (If it doesn't find a perfect
> solution, it will indeed always run out of memory at some point otherwise.)
>
> Recursion depth 4 could be quite high depending on the data. Trying
> lower depths first would be something to test. (It depends on how deeply
> nested you expect the learned constructs to be.)
>
> Generally, we are currently looking into various approaches and
> algorithms related to scalability (also across several machines), so if
> you would like to involve us in the cancer patient use case, we'd be more
> than happy to do so and could run classifications on larger machines here.
> For us, it would be a good additional test case to verify whether the
> improvements we are planning at the moment lead to good results.
>
> Kind regards,
>
> Jens
>
> --
> Dr. Jens Lehmann
> AKSW Group, Department of Computer Science, University of Leipzig
> Homepage: http://www.jens-lehmann.org
> GPG Key: http://jens-lehmann.org/jens_lehmann.asc
>
> _______________________________________________
> dl-learner-discussion mailing list
> dl-...@li...
> https://lists.sourceforge.net/lists/listinfo/dl-learner-discussion
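
Note on the loop-detection question raised in this thread: the sketch below only illustrates why circular paths matter for a depth-limited extraction. It is not DL-Learner's implementation, and the thread does not say how DL-Learner handles cycles. The function names (`extract_fragment`, `count_walks`) and the toy graph are hypothetical. The first function expands each node at most once by keeping a visited set. The second counts every walk up to the depth limit without such a check, which is the exponential growth described above.

```python
from collections import deque


def extract_fragment(graph, start, max_depth):
    """Depth-limited breadth-first expansion with a visited set.

    graph maps a node to a list of its outgoing neighbours.
    Each node is expanded at most once, so cycles cannot inflate the result.
    """
    visited = {start}
    frontier = deque([(start, 0)])
    edges = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand further
        for neighbour in graph.get(node, ()):
            edges.append((node, neighbour))
            if neighbour not in visited:  # loop detection
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return visited, edges


def count_walks(graph, start, max_depth):
    """Count all walks of length <= max_depth, with no loop detection."""
    count = 0
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        count += 1
        if depth < max_depth:
            for neighbour in graph.get(node, ()):
                stack.append((neighbour, depth + 1))
    return count


if __name__ == "__main__":
    # Hypothetical toy graph in which every node links to the other two.
    toy = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    nodes, edges = extract_fragment(toy, "a", 4)
    print(len(nodes), len(edges))      # nodes and edges found with a visited set
    print(count_walks(toy, "a", 4))    # walks enumerated without one
```

On this toy graph the visited-set version stops after touching each of the three nodes once, while the unchecked version keeps doubling the number of walks at every additional level.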