

Share dump

luc
2013-02-24
2014-07-17
  • luc
    2013-02-24

    Hello,
    Can somebody share a valid pair of

    enwiki-20110722-pages-articles.xml.bz2
    and
    enwiki-20110722-csv.tar.gz

    I'd like to test my installation before building a new CSV summary (a long process).

    Thank you in advance.
    Best regards

     
    • lioo
      2013-07-27

      Do you have 20110722.xml? If you do, please share it.
      Thank you.

       
  • Alaa Alahmadi
    2013-02-25

    Hi

    I have it, but it will take some time to upload.

    Alaa

     
    • lioo
      2013-07-27

      Do you have 20110722.xml? Can you send it to me?
      Thank you.

       
  • luc
    2013-02-27

    Hi,
    Can you put it on an FTP server or share it as one big (zipped) file?

    Best regards

     
  • Hi,
    Yup, I need it as well. Can you share it?

     
  • luc
    2013-03-07

    Thank you so much for sharing! Is it a problem to use a Wikipedia dump from 2008 with a CSV summary from 2011?

    Best regards

     
  • Alaa Alahmadi
    2013-03-08

    Sorry about this, I uploaded the old file for Wikipedia Miner 1. You should download the same Wikipedia dump that the CSV summary was built from. You can find it at the link below; it is named enwiki-20110722-pages-articles.xml.

    http://dumps.wikimedia.org/enwiki/

    Best regards

     
  • luc
    2013-03-08

    This is exactly the key point!
    enwiki-20110722-pages-articles.xml is no longer available!

    So I need to either rebuild the CSV files or find a matching pair of CSV summary and dump.

    Could anyone provide one?

    Best regards

     
  • Sarah
    2013-04-09

    Hi,

    Did you get the enwiki-20110722-pages-articles.xml.bz2 file by any chance? If so, could you kindly share it? :)
    Thanks

     
  • Guillaume
    2013-04-15

    Hi,

    I also need a dump with its corresponding CSV summary.
    It seems I'm not the only one, so it would be very kind of someone to share a matching pair.

    Best regards.

     
  • Guillaume
    2013-04-25

    I finally managed to extract the CSV summaries of a recent Wikipedia dump (I don't remember the exact date of the dump…).

    If someone needs it, I can upload it to an FTP server or an online service of your choice (9 GB for the dump and 5.8 GB uncompressed for the summaries).

     
  • Kyle
    2013-05-05

    Hi Muonique, that would be awesome if you could share the summaries! I have sent you some info via SF message and I can host publicly after receiving it.

     
  • Guillaume
    2013-05-07

    No problem for sharing but I didn't receive your message.

     
    • Amrita Lakshmi
      2013-07-16

      Hi Guillaume,
      Which Wikipedia dump did you extract the CSV summaries from? I don't mean the exact date, but is it a 2013 dump?
      Also, what hardware resources did you need for the extraction? I'm trying to get an idea of how long it would take to process one of the recent Wikipedia dumps, how big a Hadoop cluster I would need, how much memory each node should have, etc.

      Thanks in advance.

       
  • Guillaume
    2013-08-06

    Hi,

    I extracted the latest dump available in April 2013.
    It took about 2 days on a single node (8-core Xeon processor) and a few hours on 30 nodes (4-core processors). Sending the data to each node and the reduce phase were the main bottlenecks on the grid.

     
    • Mridul
      2013-11-05

      Hi Guillaume,

      It would be really great if you could share the files. I am working on a local test case that I have to present to people, and I am using the wiki dump and the CSV summary dump for it. However, I am not able to get the XML dump for either of the CSV dumps available here (http://heanet.dl.sourceforge.net/project/wikipedia-miner/data/).

      It would be really cool if we made an effort to get it uploaded on SourceForge, as it would solve the problem for a lot of people around here. Please reply soon, as it's quite urgent.

      Thanks
      Mridul

       
  • rah kah
    2013-08-28

    Hi Guillaume,
    I really need a CSV summary for a recent dump; if you have one, could you please share it with me? Or the enwiki-20110722-pages-articles.xml file: I couldn't find it anywhere, and I really need it.

     
    Last edit: rah kah 2013-08-28
  • Han Xiao
    2014-07-17

    Hi.

    I am trying to get the system working, but the troublesome parts are:

    1. Extracting the summary doesn't work for me.
    2. Where a summary exists, the matching Wikipedia dump does not.

    Are any dumps, along with their corresponding summaries, available for sharing?