
#353 Allow import in multiple sessions

Milestone: PFE
Status: pfe
Owner: nobody
Labels: import (31)
Expected release: v2.*
Updated: 2015-01-09
Created: 2014-10-09
Creator: Anonymous
Private: No

I still haven't succeeded in updating my copy of Wikidata, as it would take about 12 hours, and I don't want to run my computer for that long (of course I could, but I just don't want to). For some reason lbzip2 is slower than bzip2 for me, so that doesn't help either. Would it be possible to import in multiple sessions? I imagine it would be possible to split the compressed dump into several smaller files with bzip2recover and import them one after another, which would allow pausing after each of them. --Schnark
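A minimal Python sketch of why splitting can work, assuming each piece is a complete, independent bzip2 stream (bzip2recover writes each recovered block out as its own small .bz2 file). Each piece decompresses without any state from the others, so pieces can be handled in separate sessions, and joining the outputs reproduces the original. The helper names are hypothetical, and here the pieces are made with `bz2.compress` instead of bzip2recover.

```python
import bz2
from typing import List


def compress_in_pieces(data: bytes, piece_size: int) -> List[bytes]:
    """Compress `data` as independent bzip2 streams of at most `piece_size` input bytes each.

    Stand-in for the per-block files bzip2recover writes; each piece is a complete stream.
    """
    return [bz2.compress(data[i:i + piece_size]) for i in range(0, len(data), piece_size)]


def decompress_in_sessions(pieces: List[bytes]) -> bytes:
    """Decompress each piece separately (as if in its own session) and join the results."""
    outputs = []
    for piece in pieces:
        # A complete stream needs nothing from the pieces that came before it.
        outputs.append(bz2.decompress(piece))
    return b"".join(outputs)


if __name__ == "__main__":
    dump = b"".join(b"<page><id>%d</id></page>\n" % i for i in range(10_000))
    pieces = compress_in_pieces(dump, 64 * 1024)
    assert decompress_in_sessions(pieces) == dump
    # The concatenated pieces also form a valid multi-stream .bz2 file.
    assert bz2.decompress(b"".join(pieces)) == dump
    print(len(pieces), "pieces")
```

One caveat the sketch glosses over: piece boundaries fall on compression-block boundaries, not on `<page>` boundaries, so a real importer would have to carry a partial page over from one session to the next.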

Discussion

  • gnosygnu

    gnosygnu - 2014-10-10

    > Would it be possible to import in multiple sessions? I can imagine that it is possible to split the compressed dump by bzip2recover into several smaller files and import them one after another, allowing me to pause after each of them.

    I think this is still going to be tough. The main problem is that I do some things at the end (like generating indexes). If the import is done in parts, then I need some way for the user to signify "this is the end" and then have XOWA kick off the indexes (and some other minor tasks).

    In the end, I just need to add a general feature for "suspend-import-and-resume-later". I've had a few people ask about it, but it's another high-priority item that I haven't had time for.
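The "suspend-import-and-resume-later" idea, combined with the point above about work that must run only at the end, could look roughly like the checkpointed loop below. This is a minimal sketch under stated assumptions, not XOWA's implementation: the import is assumed to be pre-split into named parts, `import_part` and `finalize` (standing in for index generation) are hypothetical callbacks, and progress lives in a small JSON checkpoint file. Finalization runs once, in whichever session completes the last part, so the user never has to signal "this is the end" by hand.

```python
from __future__ import annotations

import json
from collections.abc import Callable
from pathlib import Path


def run_import_session(
    parts: list[str],
    checkpoint: Path,
    import_part: Callable[[str], None],  # hypothetical: imports one part
    finalize: Callable[[], None],        # hypothetical: builds indexes, etc.
    max_parts: int | None = None,
) -> bool:
    """Import up to `max_parts` pending parts, then finalize if nothing is left.

    Returns True once finalization has run (in this or an earlier session).
    """
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"done": [], "finalized": False}
    done = set(state["done"])

    def save() -> None:
        state["done"] = sorted(done)
        checkpoint.write_text(json.dumps(state))

    imported = 0
    for part in parts:
        if part in done:
            continue  # finished in an earlier session
        if max_parts is not None and imported >= max_parts:
            break  # the user chose to pause here
        import_part(part)
        done.add(part)
        imported += 1
        save()  # persist progress after every part

    # End-of-import work runs exactly once, after the last part is in.
    if done.issuperset(parts) and not state["finalized"]:
        finalize()
        state["finalized"] = True
        save()
    return state["finalized"]
```

If a session is killed between importing a part and writing the checkpoint, that part is imported again next time, so `import_part` would need to tolerate running twice on the same part.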

    In the meantime, I'm trying to upload Wikidata to archive.org. Unfortunately, archive.org has been slow recently, and the 7.7 GB upload timed out an hour ago (after 12 hours). I'm going to give it another try tonight and will post again in this ticket when it's there. Hopefully this will (a) be faster and (b) tide you over until I get this ticket done.

    > For some reason lbzip2 is slower than bzip2 for me, so this doesn't help, either.

    Out of curiosity, have you tried just doing a straightforward unzip to .xml and then importing the .xml? Wikidata takes about 90 to 120 minutes for me. I usually run it overnight.
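A minimal Python sketch of the "unzip to .xml first" step. It streams, so the whole dump never has to fit in memory. The function name is hypothetical. `bz2.open` also reads concatenated (multi-stream) .bz2 files. On the command line, `bzip2 -dk` does the same job.

```python
import bz2
import shutil
from pathlib import Path


def decompress_dump(src: Path, dst: Path, chunk_size: int = 1 << 20) -> int:
    """Stream-decompress a .bz2 dump into a plain .xml file and return its size in bytes."""
    # bz2.open decompresses incrementally as the file is read.
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)  # copy in 1 MiB chunks by default
    return dst.stat().st_size
```

For example, `decompress_dump(Path("wikidatawiki.xml.bz2"), Path("wikidatawiki.xml"))` would leave the original .bz2 in place and write the .xml next to it, ready to be imported. Both file names are placeholders.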

     
  • gnosygnu

    gnosygnu - 2014-10-10
    • labels: --> import
    • status: new --> pfe
    • Expected release: --> v2.*
    • Milestone: v1.8.* --> PFE
     
  • Anselm D

    Anselm D - 2014-10-12

    Hi Schnark!
    How many processor cores does your computer have?

    lbzip2 -dkc uses all cores, and the complete import gets slower if it uses more than two. So with lbzip2, one or two cores should be optimal. This is the command for two:

    lbzip2 -dkcn2
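Anselm's suggestion can be expressed as a small Python wrapper. The flags are standard lbzip2 options: `-d` decompresses, `-k` keeps the input, `-c` writes to stdout, and `-n` sets the number of worker threads. The cap of two threads reflects only his observation about import speed. Treat it as a hypothetical default to tune for your own disk and CPU. The function names are illustrative, and the last function requires lbzip2 on the PATH.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional


def suggested_threads(cpu_count: Optional[int] = None, cap: int = 2) -> int:
    """Use all cores, but no more than `cap` (2, per Anselm's observation)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cpus, cap))


def lbzip2_decompress_cmd(src: Path, threads: int) -> List[str]:
    """Build an lbzip2 command that decompresses `src` to stdout with `threads` workers."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # -d: decompress, -k: keep the .bz2 file, -c: write to stdout, -n: worker threads
    return ["lbzip2", "-d", "-k", "-c", "-n", str(threads), str(src)]


def decompress_with_lbzip2(src: Path, dst: Path, threads: int) -> None:
    """Run lbzip2 (must be on PATH) and write the decompressed XML to `dst`."""
    with open(dst, "wb") as out:
        subprocess.run(lbzip2_decompress_cmd(src, threads), stdout=out, check=True)
```

On a machine with at least two cores, `decompress_with_lbzip2(Path("wikidatawiki.xml.bz2"), Path("wikidatawiki.xml"), suggested_threads())` corresponds to `lbzip2 -dkcn2 wikidatawiki.xml.bz2 > wikidatawiki.xml`. The file names are placeholders.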

    Can you tell me a little bit more about your system? HD, filesystem, memory, processor, OS?

     
  • gnosygnu

    gnosygnu - 2014-10-13

    > In the meantime, I'm trying to upload Wikidata to archive.org. Unfortunately, archive.org has been slow recently, and the 7.7 GB upload timed out an hour ago (after 12 hours).

    Just an update: this failed two more times over the weekend. Since the 20141009 dump is now built, I'm going to try uploading that sometime tomorrow. I don't know what to do about the archive.org timeout, though.

     
  • gnosygnu

    gnosygnu - 2014-10-14

    I finally managed to upload Wikidata to archive.org this morning: https://archive.org/details/Xowa_wikidatawiki_latest

    This is the version from the 2014-10-09 dump. I haven't had a chance to proof it, but initial usage shows it is fine.

     
  • Anonymous

    Anonymous - 2014-11-05

    Sorry for not responding earlier, I had too many other things to do (and still have).

    @Anselm: I have two cores, but -n2 still seems to improve the speed drastically.

    I'll try and see whether my download manager finally succeeds in downloading the XOWA dump from archive.org, or whether I am now able to import the XML dump myself. --Schnark

     
  • Anonymous

    Anonymous - 2015-01-07

    Using -n2 allowed me to import Wikidata quite fast. So an option to import in multiple sessions isn't really necessary, though I still think it would be nice to have. --Schnark

     
  • gnosygnu

    gnosygnu - 2015-01-09

    Cool. Thanks for the follow-up. I think multiple sessions is still a way off (I'm still bogged down in Android). Out of curiosity, did you ever end up downloading the Wikidata dump? Do you think this is useful enough to update periodically in the future?

     
