Re: [DL-Learner discussion] tips on large knowledge bases

Hi Jens

Yes, the data is confidential for two reasons: (1) it contains real patient
identification and (2) it is being used in a research project that is not
yet published. So I can't share it as it is. But I can generate a random
ontology with the same topology as the original file to reproduce the
problem.

I already tried lower recursion levels, but I think the interesting
stuff will in fact be at level 4. However, the ontology has many circular
paths, so I guess the number of connections will rise exponentially with
the recursion level. Does DL-Learner implement some sort of loop detection?
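
To make the question concrete, here is a minimal Python sketch of what I
mean by loop detection. It is not DL-Learner code and says nothing about how
DL-Learner actually extracts fragments; the function names and the toy graph
are made up for illustration. It compares a depth-limited traversal that
remembers visited nodes with one that follows every path blindly.

```python
from collections import deque


def extract_fragment(graph, start, max_depth):
    """Collect nodes reachable from `start` within `max_depth` hops.

    `graph` maps a node to a list of neighbour nodes. The `seen` set acts
    as the loop detection: each node is queued at most once, so cycles
    cannot multiply the work.
    """
    seen = {start}
    queue = deque([(start, 0)])
    expansions = 0
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        expansions += 1
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen, expansions


def count_walks(graph, start, max_depth):
    """Count node expansions when every path is followed, with no loop detection."""
    if max_depth == 0:
        return 0
    return 1 + sum(
        count_walks(graph, neighbour, max_depth - 1)
        for neighbour in graph.get(start, [])
    )


# Hypothetical toy graph in which every node links to every other node.
toy = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

nodes, with_detection = extract_fragment(toy, "a", 4)
without_detection = count_walks(toy, "a", 4)
print(len(nodes), with_detection, without_detection)
```

With the visited set, the work is bounded by the number of nodes no matter
how deep the recursion goes. Without it, each extra level multiplies the
work by the number of outgoing links.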

I'll talk with the project PI to see how she feels about a cooperation on
this, but I think it would be extremely beneficial to both sides. I'll
contact you in private later.

Thanks!

--
diogo patrão

On Fri, Feb 20, 2015 at 4:39 AM, Jens Lehmann <
le...@in...> wrote:

>
> Hello,
>
> On 19.02.2015 at 13:26, Diogo FC Patrao wrote:
> >
> > Below is the configuration file. I'm sorry, but I can't share the SPARQL
> > endpoint, because it's hosted on our intranet. Besides this, I changed
> > the script /cli/ to increase -Xmx to 25000M.
>
> Is the data itself confidential? Otherwise, you could also share the
> dump behind it via dropbox etc. (not necessarily public, just sharing
> with Lorenz or me would be sufficient as it could save us some time to
> look into the problem - we can sign NDAs as well if needed). We can then
> load it into an endpoint here for testing.
>
> Also in the conf file, it may be good to specify some termination
> criterion (e.g. 5 minutes via alg.maxExecutionTimeInSeconds = 300) to
> avoid the algorithm running forever. (If it doesn't find a perfect
> solution, it will indeed always run out of memory at some point otherwise.)
>
> Recursion depth 4 could be quite high depending on the data. Trying
> lower depths first would be something to test. (It depends on how deeply
> nested you expect the learned constructs to be.)
>
> Generally, we are currently looking into various approaches and
> algorithms related to scalability (also across several machines), so if
> you like to involve us in the cancer patient use case, we'd be more than
> happy to do so and could run classifications on larger machines here.
> For us, it would be a good additional test case to verify whether the
> improvements we are planning at the moment lead to good results.
>
> Kind regards,
>
> Jens
>
> --
> Dr. Jens Lehmann
> AKSW Group, Department of Computer Science, University of Leipzig
> Homepage: http://www.jens-lehmann.org
> GPG Key: http://jens-lehmann.org/jens_lehmann.asc
>
>
>
> _______________________________________________
> dl-learner-discussion mailing list
> dl-...@li...
> https://lists.sourceforge.net/lists/listinfo/dl-learner-discussion
>