|
From: Jens L. <le...@in...> - 2015-02-20 06:39:48
|
Hello,

On 19.02.2015 at 13:26, Diogo FC Patrao wrote:
> Below is the configuration file. I'm sorry, but I can't share the SPARQL
> endpoint, because it's hosted on our intranet. Besides this, I changed
> the cli script to increase -Xmx to 25000M.

Is the data itself confidential? Otherwise, you could also share the dump behind it via Dropbox etc. (not necessarily public; sharing it with Lorenz or me would be sufficient, as it could save us some time looking into the problem - we can sign NDAs as well if needed). We could then load it into an endpoint here for testing.

Also, in the conf file it may be good to specify a termination criterion (e.g. 5 minutes via alg.maxExecutionTimeInSeconds = 300) to avoid the algorithm running forever. (If it doesn't find a perfect solution, it will otherwise always run out of memory at some point.)

Recursion depth 4 could be quite high depending on the data. Trying lower depths first would be worth testing. (It depends on how deeply nested you expect the learned constructs to be.)

Generally, we are currently looking into various approaches and algorithms related to scalability (also across several machines), so if you would like to involve us in the cancer patient use case, we'd be more than happy to do so and could run classifications on larger machines here. For us, it would be a good additional test case to verify whether the improvements we are planning at the moment lead to good results.

Kind regards,

Jens

--
Dr. Jens Lehmann
AKSW Group, Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc
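P.S. (editor's sketch): the two suggestions above - a termination criterion and a lower recursion depth - could look roughly like this in a DL-Learner conf file. Only `alg.maxExecutionTimeInSeconds = 300` is taken from the message itself; the `alg.type` value and the `sparql.recursionDepth` option name are assumptions modelled on typical DL-Learner example configurations and may need adjusting to the actual component names in your conf.

```
// Hedged sketch, not a complete conf file: only maxExecutionTimeInSeconds
// is confirmed above; other names are assumptions.
alg.type = "celoe"
alg.maxExecutionTimeInSeconds = 300   // stop after 5 minutes rather than running forever

// assumed option name for the SPARQL fragment extraction component:
// start with a shallow depth and only raise it if the learned
// class expressions need deeper nesting
sparql.recursionDepth = 2
```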