From: Emmanuel E. <ke...@ki...> - 2017-08-03 21:29:55
Hi,

I want to announce the publication of new datasets that make it easy to build selections of Wikipedia articles. Any developer or technically minded person can use this data to create a subset of Wikipedia.

You can find the data here: http://download.kiwix.org/wp1/ (or via FTP). The repository will be updated every month thanks to a few scripts published here: https://github.com/openzim/wp1_selection_tools. Of course, everything is free software.

For each Wikipedia language edition, you will find TSV tables containing the usual indicators of importance for each article, such as the number of interlanguage links, the number of links pointing to the article, and pageviews, all gathered in one file. For the English Wikipedia, you additionally get the WikiProject importance/quality evaluations.

If you are really lazy, there is a "score" file which mixes all these indicators to give a single score per article. The methodology is described here: https://github.com/openzim/wp1_selection_tools.

For example, if you want the TOP1000 articles of a Wikipedia, just take the first thousand lines of the "score" file to get your list of articles.

All this work has been done to allow the creation of ZIM files of the TOP Wikipedia articles. It has also been done to make possible the creation of ZIM extension files, a concept we want to develop to improve our WikiMed Android apps. Both will appear before the end of the year. Stay tuned!

Regards
Emmanuel

-- 
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
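
PS: the "take the first thousand lines of the score file" step from the message above could be sketched roughly as follows in Python. This is only an illustration under assumptions the message does not confirm: that the "score" file is tab-separated, already sorted with the best articles first, and that the article title sits in the first column. The helper name `top_articles` and the sample rows are made up for the example.

```python
import csv
import io
from itertools import islice


def top_articles(lines, n=1000, title_column=0):
    """Return the titles found in the first n non-empty rows of a TSV listing.

    Assumes (not confirmed by the announcement) that rows are already
    sorted best-first and that the title is in column `title_column`.
    """
    # QUOTE_NONE: treat quote characters in titles as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    non_empty_rows = (row for row in reader if row)
    return [row[title_column] for row in islice(non_empty_rows, n)]


# Made-up sample standing in for a real "score" file.
sample = io.StringIO("Earth\t9870\nWater\t9650\nSun\t9100\n")
top_two = top_articles(sample, n=2)
print(top_two)
```

With a real download, the same function should work on a file object opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever you saved the "score" file for your language. Check the column layout of the actual file first, since the one assumed here may not match.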