|
From: Tomaž Š. <tom...@ta...> - 2012-05-16 10:12:33
|
You don't need to change Wikiprep for that. Categories are just like normal pages in Wikipedia, so to find a category's name, look up the title of the page with that ID, either in the original XML dump or in the Wikiprep output (gum.xml). There are not that many categories, so you can simply hold the ID-to-title mapping in memory and apply it in your indexer.

Regards
Tomaž

On 05/15/2012 01:20 PM, vineet yadav wrote:
> Hi,
> I want to create a Lucene index of Wikipedia. I want to index the
> Wikipedia category names and store them in a separate field, so I want
> to store category names in the hgw.xml file and use it for indexing.
> Wikiprep gives category IDs instead of category names. Can you point
> out what changes I need to make to get category names?
> Thanks
> Vineet Yadav
|
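The in-memory mapping suggested in the reply could look roughly like the sketch below. It is only an illustration, not part of Wikiprep: it assumes the original MediaWiki XML dump layout (a `<page>` element with direct `<title>` and `<id>` children), assumes category pages carry the English `Category:` title prefix, and uses the hypothetical names `load_category_titles` and `SAMPLE`. The Wikiprep output files may be laid out differently, so the element names would need adjusting there.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def load_category_titles(source, prefix="Category:"):
    """Return {page_id: category_name} for every category page in a dump.

    `source` is a filename or a binary file object. The `prefix` default
    assumes an English-language dump.
    """
    mapping = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = page_id = None
        # Look at direct children only, so <revision><id> is not picked up.
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "id" and page_id is None:
                page_id = child.text
        if title and page_id and title.startswith(prefix):
            mapping[int(page_id)] = title[len(prefix):]
        elem.clear()  # drop the page's children to keep memory use low
    return mapping


# Tiny made-up dump fragment, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page><title>Category:Physics</title><id>12</id><revision><id>900</id></revision></page>
  <page><title>Albert Einstein</title><id>736</id><revision><id>901</id></revision></page>
</mediawiki>"""

category_titles = load_category_titles(io.BytesIO(SAMPLE))

# In the indexer: translate the category IDs of an article into names,
# skipping IDs that have no category page in the dump.
category_ids = [12, 99]
names = [category_titles[c] for c in category_ids if c in category_titles]
print(names)
```

The resulting names can then be written to a separate field of each document at indexing time. A plain dictionary is used here because, as noted in the reply, the number of categories is small enough to fit in memory.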