From: Pablo N. M. <pab...@gm...> - 2014-12-01 15:42:50
Hi Ruben,

Do you think this could improve on the current Gender extractor that Max and I created? We'd love to have it improved. Why don't you send a pull request over there?
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/GenderExtractor.scala

I also like your idea of using this for anomaly detection. I wonder whether we already have a standard way for the DEF to output suggested "negative triples", that is, a set of "negative extractors" that suggest which triples should be deleted. I think Heiko and Dimitris have played with ideas related to this?

Cheers,
Pablo

On Mon, Dec 1, 2014, 04:07 Ruben Verborgh <rub...@ug...> wrote:
> Dear all,
>
> This weekend, I quickly experimented with gender extraction from the Dutch Wikipedia.
> A summary of the approach and results is available here:
> http://ruben.verborgh.org/blog/2014/11/30/distinguishing-between-frank-and-nancy/
>
> The highlights are:
> - I extracted 52,686 gender indications with high confidence out of 80,499 “people” articles.
>   44,614 (85%) are men; 8,072 (15%) are women.
> - A brief manual check didn't reveal any errors (yet).
> - The algorithm can also help to improve data quality.
>   For instance, the article “27th government of Israel” is incorrectly marked as a person. Its results are:
>   27e regering van Israël { male: 0.5, female: 0.0 }, compared to, for instance:
>   A.H. Nijhoff { male: 3.5, female: 33.3 }.
>   This indicates that the “Person” label might be incorrect.
>
> The resulting software and datasets are on GitHub:
> - https://github.com/RubenVerborgh/DBpediaGender
> - https://github.com/RubenVerborgh/DBpediaGenderResults
> The approach has now also been tested on the English version;
> the results are in the above repository.
>
> Please let me know any feedback and/or questions.
>
> Best,
>
> Ruben
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbp...@li...
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
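The score pairs in Ruben's message suggest a simple decision rule: assign a gender only when one side clearly dominates, and treat near-zero evidence on both sides as a hint that the "Person" type may be wrong, which is the kind of suggestion a "negative extractor" could emit. The Python sketch below is not Ruben's algorithm (that is described in the blog post and the DBpediaGender repository); it only post-processes score pairs of the form { male: x, female: y }. The thresholds min_total and min_share, the functions classify and suggest_negative_triples, and the choice to return plain (subject, predicate, object) tuples are all hypothetical assumptions, not an existing DEF output format.

```python
from dataclasses import dataclass
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO_PERSON = "http://dbpedia.org/ontology/Person"


@dataclass(frozen=True)
class GenderScores:
    """Per-article evidence scores, in the { male, female } shape from the post."""
    male: float
    female: float


def classify(scores: GenderScores, min_total: float = 2.0, min_share: float = 0.8) -> str:
    """Map a score pair to a label.

    min_total and min_share are hypothetical thresholds, not values from the post.
    """
    total = scores.male + scores.female
    if total < min_total:
        # Almost no gendered evidence either way: possibly not a person at all.
        return "not-a-person?"
    if scores.male / total >= min_share:
        return "male"
    if scores.female / total >= min_share:
        return "female"
    # Evidence exists but neither side dominates clearly enough.
    return "uncertain"


def suggest_negative_triples(resource: str, scores: GenderScores) -> List[Tuple[str, str, str]]:
    """Hypothetical 'negative extractor': return triples suggested for deletion."""
    if classify(scores) == "not-a-person?":
        return [(resource, RDF_TYPE, DBO_PERSON)]
    return []


if __name__ == "__main__":
    examples = {
        "27e regering van Israël": GenderScores(male=0.5, female=0.0),
        "A.H. Nijhoff": GenderScores(male=3.5, female=33.3),
    }
    for name, s in examples.items():
        print(name, classify(s))
```

With these assumed thresholds, the two articles quoted above come out as "not-a-person?" and "female" respectively; the first would yield one suggested deletion of its dbo:Person type triple.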