
Creating ZIM files from Wikipedia data dumps.


  • Anonymous
    2013-11-10

    Can someone please clarify whether this workflow makes sense, and how images can be incorporated into it:

    1. Download a Wikipedia data dump
    2. Use the Perl script from Wikipedia to import the data into MySQL
    3. Create a MediaWiki installation and link it to the imported MySQL data
    4. Run a script to create static HTML files from the MediaWiki installation
    5. Use a script to package the static files into ZIM files (a rough sketch of this step follows below)
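
    For step 5, something along these lines might work. This is a minimal sketch, assuming the openZIM zimwriterfs tool is installed and that the static HTML directory contains an index.html and a favicon.png; the option names should be double-checked against your zimwriterfs version, and the paths are hypothetical:

    ```python
    import subprocess
    from pathlib import Path

    # Hypothetical paths; adjust to your own layout.
    HTML_DIR = Path("/var/www/static-wiki")  # output of the static-HTML step
    ZIM_FILE = Path("wikipedia_en.zim")

    def build_zim(html_dir: Path, zim_file: Path) -> None:
        """Package a directory of static HTML into a ZIM file via zimwriterfs."""
        cmd = [
            "zimwriterfs",
            "--welcome=index.html",   # entry page, relative to html_dir
            "--favicon=favicon.png",
            "--language=eng",
            "--title=Wikipedia (offline)",
            "--description=Static Wikipedia snapshot",
            "--creator=Wikipedia",
            "--publisher=me",
            str(html_dir),
            str(zim_file),
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        build_zim(HTML_DIR, ZIM_FILE)
    ```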

    There are some image data dumps available, but how do you link them to the article data? Or are they already set up correctly if we place them in the right directory of the MediaWiki installation? Also, the total size of this data will be very large; is there an easy way to filter it so that only specific categories of images are downloaded? Could I write a script that selects articles based on some filtering criteria (importance, size, references, access statistics) and downloads only the media relevant to them (see the sketch below), or would that be too complicated?
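
    On the filtering question, one pragmatic shortcut is to query the live MediaWiki web API instead of the image dumps to find out which image files each selected article actually uses, and then download only those. A minimal sketch follows; keep_article is a placeholder filter you would replace with your own criteria (importance, references, access statistics, ...):

    ```python
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def images_used_by(title: str) -> list[str]:
        """List the image file names used by one article (MediaWiki API)."""
        params = {
            "action": "query",
            "titles": title,
            "prop": "images",
            "imlimit": "max",
            "format": "json",
        }
        data = requests.get(API, params=params).json()
        names = []
        for page in data["query"]["pages"].values():
            for img in page.get("images", []):
                names.append(img["title"])  # e.g. "File:Example.jpg"
        return names

    def keep_article(title: str, size_bytes: int) -> bool:
        """Placeholder filter: keep reasonably large articles only."""
        return size_bytes >= 20_000

    if __name__ == "__main__":
        # Usage: run your filter over the article list first, then fetch
        # only the images referenced by the articles you kept.
        for name in images_used_by("Python (programming language)"):
            print(name)
    ```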

    Thanks

    • The approach is valid. Recursively fetching the content of a Wikipedia category is challenging, though, and in many cases probably won't give good results (see the sketch below).
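
      To make the difficulty concrete, here is a minimal sketch of such a recursive traversal over the live MediaWiki API (API pagination via cmcontinue is omitted for brevity). Even with cycle detection and a depth limit, Wikipedia's category graph fans out very quickly and pulls in many loosely related articles, which is why results are often poor:

      ```python
      import requests

      API = "https://en.wikipedia.org/w/api.php"

      def category_members(cat: str) -> list[dict]:
          """First page of members of a category (articles and subcategories)."""
          params = {
              "action": "query",
              "list": "categorymembers",
              "cmtitle": cat,      # e.g. "Category:Physics"
              "cmlimit": "max",
              "format": "json",
          }
          return requests.get(API, params=params).json()["query"]["categorymembers"]

      def walk(cat: str, seen: set[str], depth: int, max_depth: int = 2) -> set[str]:
          """Collect article titles under a category, guarding against the two
          classic problems: category cycles (seen) and explosive fan-out (max_depth)."""
          articles: set[str] = set()
          if cat in seen or depth > max_depth:
              return articles
          seen.add(cat)
          for m in category_members(cat):
              if m["title"].startswith("Category:"):
                  articles |= walk(m["title"], seen, depth + 1, max_depth)
              else:
                  articles.add(m["title"])
          return articles

      if __name__ == "__main__":
          titles = walk("Category:Physics", seen=set(), depth=0)
          print(len(titles), "articles found")
      ```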


