Re: [Dbpedia-developers] Dbpedia Framework and Spring

Hi Sebastian,

very nice. I had a closer look at your example and extended it to
also handle the dependencies of extractors. I used mixin traits
instead of structural typing. It's a matter of taste but I find this
solution clearer and it should also make it possible to write
extractors in Java. I attached a maven module with the extended
example.

Extractors are still implemented by inheriting from the Extractor
trait, but can additionally define their dependencies.
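
For readers without the attachment, here is a minimal sketch of how an extractor might declare its dependencies through mixin traits. It is not the code from the attached module: Page, HasLanguage, HasOntology, LabelExtractor and TypeExtractor are hypothetical names chosen for illustration, and the "triples" are plain strings.

```scala
// All names here are hypothetical and only illustrate the idea.
final case class Page(title: String, text: String)

// Each dependency is a small named trait instead of a structural type.
trait HasLanguage { def language: String }
trait HasOntology { def ontologyClasses: Set[String] }

trait Extractor {
  def extract(page: Page): Seq[String]
}

// Needs only the language, so no ontology has to be loaded for it.
class LabelExtractor(deps: HasLanguage) extends Extractor {
  def extract(page: Page): Seq[String] =
    Seq(s"${page.title} rdfs:label ${page.title}@${deps.language}")
}

// Needs both dependencies, expressed as a compound type.
class TypeExtractor(deps: HasLanguage with HasOntology) extends Extractor {
  def extract(page: Page): Seq[String] =
    deps.ontologyClasses.toSeq.sorted
      .filter(c => page.text.contains(c))
      .map(c => s"${page.title} rdf:type $c")
}

object DependencyDemo {
  def main(args: Array[String]): Unit = {
    // One environment object provides everything the extractors ask for.
    val env = new HasLanguage with HasOntology {
      val language = "en"
      lazy val ontologyClasses = Set("City", "Person") // only built on first use
    }
    val page = Page("Berlin", "Berlin is a City in Germany")
    val extractors = Seq(new LabelExtractor(env), new TypeExtractor(env))
    extractors.flatMap(_.extract(page)).foreach(println)
  }
}
```

Because the dependencies are ordinary named traits, they compile to plain interfaces, which is what should make them usable from Java as well.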

In the example there are two configurations:
- WikipediaProfile: This specifies the extraction profile, which defines
  the available dependencies (e.g. mappings). In addition we could
  define a Wiktionary profile.
- MyExtractionConfig: This specifies a specific extraction job (e.g.
  wikisource, extractors).
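
The split between a profile and a job configuration could be sketched roughly as follows. Again, this is an assumption-laden illustration and not the attached module: only the names WikipediaProfile and MyExtractionConfig come from the example, while Page, MappingsDependency, MappingExtractor, ExtractionConfig and the sample data are hypothetical.

```scala
// Hypothetical stand-ins for the framework types.
final case class Page(title: String, template: String)

trait Extractor {
  def extract(page: Page): Seq[String]
}

// A dependency that a profile can make available (assumed name).
trait MappingsDependency {
  def mappings: Map[String, String]
}

// An extractor that declares the mappings as its only dependency.
class MappingExtractor(deps: MappingsDependency) extends Extractor {
  def extract(page: Page): Seq[String] =
    deps.mappings.get(page.template).map(c => s"${page.title} rdf:type $c").toSeq
}

// The profile defines which dependencies exist for Wikipedia dumps.
// A Wiktionary profile would be a second trait of the same shape.
trait WikipediaProfile extends MappingsDependency {
  lazy val mappings: Map[String, String] =
    Map("Infobox settlement" -> "Settlement") // made-up sample mapping
}

// A job names its source and its extractors and knows how to run them.
trait ExtractionConfig {
  def source: Seq[Page]
  def extractors: Seq[Extractor]
  def run(): Seq[String] =
    for (page <- source; extractor <- extractors; triple <- extractor.extract(page))
      yield triple
}

// The concrete job: mixes in the profile and passes itself to the extractors.
object MyExtractionConfig extends ExtractionConfig with WikipediaProfile {
  lazy val source: Seq[Page] =
    Seq(Page("Berlin", "Infobox settlement"), Page("Foo", "Infobox unknown"))
  lazy val extractors: Seq[Extractor] = Seq(new MappingExtractor(this))
}

object ProfileDemo {
  def main(args: Array[String]): Unit =
    MyExtractionConfig.run().foreach(println)
}
```

The members are lazy vals so that a dependency is only built if some extractor in the job actually uses it.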

The example is still very crude and needs to be worked out in detail.

Cheers,
Robert

On Wed, Oct 13, 2010 at 1:17 AM, Sebastian Hellmann
<hel...@in...> wrote:
> Hi,
> I looked at
> http://jonasboner.com/2008/10/06/real-world-scala-dependency-injection-di.html
> and the first alternative: Using structural typing
> seems to be quite similar to what is done in Spring.
> I still have to admit that I didn't understand 90% of the rest that is
> written in the post, though.
> Jonas will start working on some adaptations during the next month (at
> least for the Wiktionary things).
> @Robert Do you think this would be the way to go?
> I tried to apply it:
>
> class Extractionjob(env: {
>   val source: Source
>   val destination: Destination
>   val extractor: List[Extractor]
> })
>
> object Config {
>     lazy val source = XMLSource.fromFile(new File("mediawiki_test_dump.xml"),
>         _.namespace == WikiTitle.Namespace.File)
>
>     lazy val extractor = List(new TestExtractor(1, 20, "myOption"))
>     lazy val destination = new StringDestination
>
>     lazy val extractionjob = new Extractionjob(this)
> }
>
> It looks very slim and efficient ;)
> I will be on holiday from 21.10.2010 until 20.11.2010, so it would be nice,
> if we'd decide right now.
> Cheers,
> Sebastian
>
>
> On 12.10.2010 13:03, Robert Isele wrote:
>
> Hi Sebastian,
>
> I also agree that we should generalize the DBpedia Framework. In my
> opinion, its biggest drawback is the lack of configurability. E.g. at
> the moment each extractor takes an ExtractionContext object in its
> constructor, which contains the complete configuration even if most
> extractors only need a part of it (e.g. some extractors don't need the
> ontology in which case it does not need to be loaded). We could gain
> much flexibility, if we make this more configurable.
>
> I see two ways to achieve this:
> 1) Using Spring. As I understand, this would make it possible to
> configure the complete extraction process including all extractors
> using an XML configuration file.
> 2) Making the API more flexible. e.g. letting the extractors define
> which data they need in a static way (e.g. by using the Cake pattern
> [1]). The configuration would then be a small Scala script.
>
> I will have to take a deeper look into this. The biggest drawback
> I see with Spring is that it might not fit smoothly into Scala
> and might make the configuration more complicated than necessary.
> e.g. comparing the configuration of the XMLFileSource:
> Spring:
>     <bean id="testSource" class="org.dbpedia.extraction.XMLFileSource">
>         <constructor-arg index="0">
>             <value>file:mediawiki_test_dump.xml</value>
>         </constructor-arg>
>         <constructor-arg index="1">
>             <list value-type="java.lang.Integer">
>                 <value>0</value>
>             </list>
>         </constructor-arg>
>     </bean>
>
> Scala:
> XMLSource.fromFile(new File("mediawiki_test_dump.xml"), _.namespace ==
> WikiTitle.Namespace.File)
>
> For me, the second version looks much clearer and more descriptive. I
> understand that using the implementation language itself for
> configuration, instead of XML, may sound unusual in the Java World
> (although it is advocated by Google Guice [2]). But I think it is much
> cleaner and more flexible, than Spring's way of replicating the Java
> Beans Model in XML. While this may be a good idea in Java, I think
> Scala, with its more concise syntax and better type system, would
> provide a perfect way to configure a specific extraction script.
>
> As I don't know much about Spring yet, especially in the context of
> using it together with Scala, I will take a deeper look into it in the
> next days. As we are planning to make another release of DBpedia in a
> few weeks, I can also commit some time to improving the framework,
> but will discuss it on the list before doing any bigger
> refactoring.
>
> [1]
> http://jonasboner.com/2008/10/06/real-world-scala-dependency-injection-di.html
> [2] http://code.google.com/p/google-guice/
>
> On Mon, Oct 11, 2010 at 11:53 PM, Sebastian Hellmann
> <hel...@in...> wrote:
>
>
> Hi,
> today, I tested whether the framework is compatible with Spring, and it works:
> See the Wiktionary module:
> wiktionary/src/main/resources/config.xml
> wiktionary/src/main/scala/org.dbpedia.extraction.wiktionary.Extract line 32
> wiktionary/src/main/scala/org.dbpedia.extraction.XMLFileSource
> (in XMLFileSource, note that I didn't manage to do the conversion
> correctly on line 19)
> wiktionary/src/main/scala/org.dbpedia.extraction.mappings/TestExtractor
>
> By default, all resources instantiated with Spring are singletons.
> This might be useful for stuff like the ontology source or the commons.
> These can be injected easily into the extractors without the
> ExtractionContext.
> We would gain quite some flexibility with that. One of the LOD2 tasks is
> about generalizing the Framework software,
> which would be easily achieved if we had a Wiktionary dump ;)
>
> - @Robert/Max Please review and give feedback ASAP, I'm only here for 10
> more days.
> - Could somebody add Jonas to the developers list... I seem to have
> forgotten the password.
>
> Cheers,
> Sebastian
>
>
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
> ------------------------------------------------------------------------------
> Beautiful is writing same markup. Internet Explorer 9 supports
> standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2 & L3.
> Spend less time writing and  rewriting code and more time creating great
> experiences on the web. Be a part of the beta today.
> http://p.sf.net/sfu/beautyoftheweb
> _______________________________________________
> Dbpedia-developers mailing list
> Dbp...@li...
> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers