#303 Import: Importing from offline file doesn't default to Search v2 (Categories don't work)

Milestone: v1.2.*
Status: closed
Owner: nobody
Expected release: v1.2.2
Updated: 2015-03-10
Created: 2014-01-08
Creator: Anonymous
Private: No

For me, all de.wikipedia categories are empty (sqlite, v2). Other wikis work as expected, e.g. de.wikibooks (still xdat, v1). My last de.wikipedia import was with 0.12.2.0 (IIRC), and I currently use 1.1.1.1. --Schnark

Discussion

  • gnosygnu
    2014-01-09

    • labels: --> category, import
    • status: new --> investigating
    • Milestone: PFE --> v1.1.*
     
  • gnosygnu
    2014-01-09

    Hmmm.... I tried now with dewiki, and it upgraded from v1 to v2. I tested with de.wikipedia.org/wiki/Kategorie:Präsident_der_Vereinigten_Staaten. (the page lists the presidents correctly by their sort key)

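    Two points:

    • Try to reupgrade again. Reupgrading v2 will rebuild the entire category databases for v2 (i.e.: you don't have to have v1 to upgrade to v2)
    • Do you remember how you upgraded? There is a known issue with upgrading to v2 if you don't have the category files already in place. See https://sourceforge.net/p/xowa/discussion/general/thread/109b49a9/#e1bf
      Then again, I'm guessing that you would have had the category files in place (i.e.: categorylinks.sql and page_props.sql in /xowa/wiki/de.wikipedia.org)

    If the reupgrade fixes it, I'll keep it in mind for later investigation.
    If the reupgrade fails, check the log and see if there are any errors.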

    Thanks.

     

  • Anonymous
    2014-01-10

    Try to reupgrade again. Reupgrading v2 will rebuild the entire category databases for v2 (i.e.: you don't have to have v1 to upgrade to v2)

    I tried to update, but this wasn't successful. The progress log ends with

    loading dump file: 0000000027.csv
    loading category_registry files: /home/michael/bin/xowa/wiki/de.wikipedia.org/tmp/import.sql.category_registry/

    session.txt ends with

    20140109_130708.515 wiki.init.bgn:de.wikipedia.org
    20140109_130708.920 wiki.init.db_mgr
    20140109_130709.015 wiki.init.lang
    20140109_130709.268 wiki.init.css
    20140109_130709.270 wiki.init.done
    20140109_130709.547 cmd bgn: import.sql.category_registry
    20140109_130712.569 cmd end: import.sql.category_registry 3s 20f
    20140109_130712.570 cmd bgn: import.sql.categorylinks

    The files in tmp/import.sql.category_registry/ look sensible, the last one 5.csv ends with

    Žíp|!,8ZZ
    ŽŪR-Funktionär|!."]&
    ǁKaras|!-_@\
    ǁKhara_Hais|!-a+4
    ’s-Hertogenbosch|!&Pi1

    When I try to exit XOWA the warning about a running import is shown, but even after several hours no progress appears to happen.

    Do you remember how you upgraded? There is a known issue with upgrading to v2 if you don't have the category files already in place.

    The two files were in the de.wikipedia.org directory, and I chose cat v2 when I imported de.wikipedia.org. (Note: I actually never update a wiki. I rename the de.wikipedia.org folder to de.wikipedia.old, create a new one, put the files in it, run the import process, and delete the old folder after I have confirmed that the import process was successful.) I remember the alert box with "Script done" appeared.

    But I noticed another strange thing in that import: Though I have search v2 in the options, the wiki was imported with search v1, and I had to update manually (which worked). --Schnark

     
  • gnosygnu
    2014-01-11

    Thanks for the incredibly detailed breakdown.

    "Short" answer: I'm honestly at a loss why it doesn't work. If you still have the files and the time, try to reimport the wiki. I did this twice more today and it worked. See the section at the bottom.

    Longer answer:

    The progress log ends with
    loading dump file: 0000000027.csv
    loading category_registry files: /home/michael/bin/xowa/wiki/de.wikipedia.org/tmp/import.sql.category_registry/

    That's pretty interesting. Just so you know, there are three steps to the category.v2 import process:

    • Parse the .sql file and generate smaller .txt files from it
    • Sort the .txt files
    • Insert the .txt files into the database.

    Your statements indicate that it finished the "parse" stage (you have all 27 files) and failed sometime during the "sort" or the "insert" stage.

    Out of curiosity, check the following directory: /xowa/wiki/de.wikipedia.org/tmp/import.sql.categorylinks/sort
    You should have 27 files, with 0000000027.csv ending with the following:

    Übersetzer|p|WIEMKEN, HELMUT|!.'Ag|#9;3D|
    Śródmieście_(Warschau)|p|Q22|!.$c$|#97HD|
    Świdwin|p|SWIDWIN|!.&tG|#9;)D|
    Świnoujście|p|FLUSSIGGASTERMINAL SWINEMUNDE|!.&Jo|#9:+V|
    

    If you don't have this, then the Sort failed. I'd be pretty surprised if it failed, as it was old code that was used heavily in the xdat wiki generation. I haven't changed it in months.
    If this is not complete, let me know the last file, and the last lines in it. Also, check to make sure you have enough disk space / file permissions (I really can't think of anything else)
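
The sort-directory check described above can be scripted. This is a minimal sketch in Python, not part of XOWA: the function name `summarize_sort_dir` is hypothetical, and it assumes the chunk files are UTF-8 text with zero-padded numeric names (as in 0000000027.csv), so that sorting by name gives the import order.

```python
from pathlib import Path


def summarize_sort_dir(sort_dir, tail=4):
    """Return (csv_count, last_file_name, last_lines) for a sort directory.

    Assumption: chunk files are named with zero-padded numbers
    (e.g. 0000000027.csv), so a plain name sort matches import order.
    """
    files = sorted(Path(sort_dir).glob("*.csv"))
    if not files:
        return 0, None, []
    last = files[-1]
    # errors="replace" keeps the check going even if the last chunk
    # was cut off in the middle of a multi-byte character.
    lines = last.read_text(encoding="utf-8", errors="replace").splitlines()
    return len(files), last.name, lines[-tail:]
```

Calling `summarize_sort_dir("/xowa/wiki/de.wikipedia.org/tmp/import.sql.categorylinks/sort")` would give the file count, the name of the last file, and its last four lines, which is the information asked for above.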

    If you do have this, then it may have failed while creating the database. Is there a /xowa/wiki/de.wikipedia.org/de.wikipedia.org.004.sqlite3?
    If there isn't, then somehow it failed to create (which would be strange).

    If the file does exist, open it up in sqlite and run the following:

    sqlite> select count(*) from categorylinks;
    --8358673
    

    If the table doesn't exist, then XOWA / SQLite failed in a weird way.
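
The same database check can be done without the sqlite shell. The sketch below uses Python's standard sqlite3 module; the function name `check_categorylinks` and the status strings are made up for illustration. It looks at the file first, because `sqlite3.connect` would otherwise create an empty database at a missing path.

```python
import sqlite3
from pathlib import Path


def check_categorylinks(db_path):
    """Return (status, row_count); status is 'no_file', 'no_table' or 'ok'."""
    path = Path(db_path)
    if not path.is_file():
        # Do not connect: that would silently create an empty database file.
        return "no_file", None
    conn = sqlite3.connect(str(path))
    try:
        found = conn.execute(
            "SELECT 1 FROM sqlite_master "
            "WHERE type = 'table' AND name = 'categorylinks'"
        ).fetchone()
        if found is None:
            return "no_table", None
        count = conn.execute("SELECT count(*) FROM categorylinks").fetchone()[0]
        return "ok", count
    finally:
        conn.close()
```

A result of `("no_file", None)` corresponds to the "failed to create" case above, and `("no_table", None)` to the case where the file exists but the table does not.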

    Finally, the next lines after those you cited are the following:

    loading dump file: 0000000000.csv
    inserting category row: 100000
    inserting category row: 200000
    inserting category row: 300000
    

    The files in tmp/import.sql.category_registry/ look sensible, the last one 5.csv ends with

    Yup. This is what's on my machine.

    (Note: I actually never update a wiki...)

    Your way is much cleaner, as you guarantee that you always start with a clean slate, and can fall back on the old copy if something goes wrong. I'm hoping you only do this b/c of Category 2 issues. I don't know of anything else that would require a fallback (for example, failing to parse the main articles.xml).

    Though I have search v2 in the options, the wiki was imported with search v1, and I had to update manually (which worked)

    This is odd. It worked when I tried it tonight. However, for the record, I rarely use Help:Import/Script.

    • When testing, I use Help:Import/Script (which is always category v1)
    • When building image databases, I use the command-line (home/wiki/Help:Import/Command-line/Script). It is always category v2, but the mechanism is slightly different (Import/Script adds a few more steps)

    Import process

     

  • Anonymous
    2014-01-11

    While you wrote your answer, things changed on my side, so I currently can only answer your question about /xowa/wiki/de.wikipedia.org/de.wikipedia.org.004.sqlite3: No, that file is missing.

    Now Wiki maintenance shows:

    de.wikipedia.org Error Error 2013-12-29 Dump in progress sqlite3 v0 v1

    There is no .004.sqlite3 file, and trying to access de.wikipedia fails (with some error message I didn't write down, but which complained about the fact that there is a file .005.sqlite3, but no .004.sqlite3). Trying to re-import didn't do anything after showing "preparing import: de.wikipedia.org" in the progress bar. I tried to revert to an older version of XOWA, but even 0.12.2 didn't import it, though it is exactly the same dump that I had already imported successfully. The (manually) uncompressed XML file looks good. Later I found out that the import process works when I manually delete the 000.sqlite3 file, but I haven't had time to see what I get in the end.

    I'm hoping you only do this b/c of Category 2 issues.

    Some time ago I had problems with corrupted downloads of the XML-dumps, but I'm too lazy to check the md5 sums, so I just import and hope it works, and re-download when it doesn't.
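
For reference, checking an md5 sum before importing takes only a few lines. This is a minimal sketch using Python's standard hashlib; the function names are hypothetical, and the expected value would be whatever checksum is published alongside the dump file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat even for multi-gigabyte dumps.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

If `matches_md5` returns False, the download is corrupt and a re-download is needed before any import is attempted.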

    I'll try to import everything from scratch and report on the outcome on Monday. --Schnark

     
  • gnosygnu
    2014-01-12

    I currently can only answer your question about /xowa/wiki/de.wikipedia.org/de.wikipedia.org.004.sqlite3: No, that file is missing.

    Hmm... So it failed before the inserts.

    Trying to re-import didn't do anything after showing "preparing import: de.wikipedia.org" in the progress bar

    Yeah, this is a defect in XOWA. If you have a corrupt wiki (and it sounds like you do), then you will need to manually delete the wiki. Otherwise the import will fail in strange ways.

    I'll look at this in the future.

    Some time ago I had problems with corrupted downloads of the XML-dumps

    Phew. I was worried that there were import issues that you hadn't told me about.

    I'll try to import everything from scratch and report on the outcome on Monday.

    Ok. A clean import really should work. (it worked for me 3x)

     

  • Anonymous
    2014-01-13

    I re-imported (with XOWA v1.1.1.1) using the following script:

    // import wiki from dump file
    app.setup.cmds.cmd_add("wiki.dump_file", "/windows/system/dump.xml", "de.wikipedia.org", "unzip");

    // upgrade category to version 2
    app.setup.cmds.dump_add_many( 'de.wikipedia.org', 'latest', '', 'wiki.category2.build');

    with all files in place, and search v2 as default in the options. What I got was (probably as before) a wiki with working (!) category v2, but with search v1. I'll attach the log; apart from errors due to the missing internet connection, there is a strange SQL error, which might be the root cause of the problems. I did not update to search v2 manually, as this somehow probably destroyed the category system. --Schnark

     
  • gnosygnu
    2014-01-14

    Thanks for the script excerpt as well as the session.txt

    there is a strange SQL error, which might be the root cause of the problems

    Oops. There is a SQL error, but it is benign. "failed to save file: ttl=Bodensee_Raddampfer_Schiff.JPG" occurs b/c it is trying to download a file that's not in /xowa/file/de.wikipedia.org. XOWA tries to download missing files, but since you've got "download disabled", I think it fails badly. I'll look at this more later.

    The error is caught, logged, and ignored. Note that the categorylinks still completes. The next line is: "cmd end: import.sql.categorylinks 15m 24s 82f"

    a wiki with working (!) category v2

    That's good, no? I can't explain what happened the first time, but hopefully it was a fluke?

    but with search v1

    Ah, this is an issue. I was able to reproduce this now (and I never checked this before).

    The problem is that "read from file" doesn't default to Search v2. "download" does.

    Also, another user reported a similar issue: http://www.reddit.com/r/xowa/comments/1uikht/release_rollup_v111_rollup_release_offline/. I didn't realize that he was using a dump file.

    I'll fix this for v1.1.3 or v1.1.4.

    I did not update to search v2 manually, as this somehow probably destroyed the category system.

    I tried it now, and it worked fine. I did this on Help:Import/Script with the following:

    • Changed Language to German
    • Changed "Search system" to "only update to v2"
    • Clicked Import now (you can also use the script app.setup.cmds.dump_add_many( 'de.wikipedia.org', 'latest', '', 'wiki.search2.build');)

    If you have hesitations, you can back up the wikis before you try.

    Just so you know, updating to search v2 does the following steps:

    • Opens up the .000 database
    • Generates a new database (.005), and creates the search tables there
    • Reads every title in the page table and generates search data
    • When finished
      • registers the new database in the xowa_db table
      • sets a flag that says search is now v2

    So the only way it could mess up the category system is if it somehow fails badly in the xowa_db save (which it didn't do when I tried now, and shouldn't do in general).
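
The steps above can be modelled in a few lines. This is a toy sketch only, not XOWA's code: the table names `page` and `xowa_db` come from the description, but every column name, the `search_word` table, the `xowa_cfg` table and the `search.version` key are assumptions invented for illustration. It shows the ordering that matters: the new database is registered, and the flag set, only after the search data is complete.

```python
import sqlite3


def make_toy_core(path, pages):
    """Create a stand-in for the .000 database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
        conn.execute("CREATE TABLE xowa_db (db_url TEXT, db_type TEXT)")
        conn.execute("CREATE TABLE xowa_cfg (cfg_key TEXT PRIMARY KEY, cfg_val TEXT)")
        conn.executemany("INSERT INTO page VALUES (?, ?)", pages)
    conn.close()


def upgrade_search(core_path, search_path):
    """Build search rows in a new database, then register it in the core one."""
    core = sqlite3.connect(core_path)      # stands in for the .000 database
    search = sqlite3.connect(search_path)  # stands in for the new .005 database
    try:
        search.execute("CREATE TABLE search_word (word TEXT, page_id INTEGER)")
        # Read every title in the page table and derive search rows from it.
        for page_id, title in core.execute("SELECT page_id, page_title FROM page"):
            for word in title.lower().split("_"):
                if word:
                    search.execute(
                        "INSERT INTO search_word VALUES (?, ?)", (word, page_id)
                    )
        search.commit()
        # When finished: register the new database and set the v2 flag.
        # Both writes are committed together by the "with" block.
        with core:
            core.execute(
                "INSERT INTO xowa_db (db_url, db_type) VALUES (?, 'search')",
                (search_path,),
            )
            core.execute(
                "INSERT OR REPLACE INTO xowa_cfg (cfg_key, cfg_val) "
                "VALUES ('search.version', '2')"
            )
    finally:
        search.close()
        core.close()
```

In this model, skipping or losing the final registration step leaves the core database with no record of the search database, even though the search data itself was fully built.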

     
  • gnosygnu
    2014-01-14

    • labels: category, import --> category, import, search
    • summary: Categories don't work --> Import: Importing from offline file doesn't default to Search v2 (Categories don't work)
    • status: investigating --> in-progress
    • Expected release: --> v1.1.4
     
  • Anselm D
    2014-01-14

    That's good, no? I can't explain what happened the first time, but hopefully it was a fluke?

    but with search v1

    Ah, this is an issue. I was able to reproduce this now (and I never checked this before).

    For your information: I had this issue too:

    xowa / Discussion / General Discussion:upgrade search to version 2
    https://sourceforge.net/p/xowa/discussion/general/thread/7e0b1cf5/

     
  • gnosygnu
    2014-01-15

    Yup. I indirectly cited this thread above:

    There is a known issue with upgrading to v2 if you don't have the category files already in place. See https://sourceforge.net/p/xowa/discussion/general/thread/109b49a9/#e1bf

    109b49a9 has a link to your 7e0b1cf5

     

  • Anonymous
    2014-01-15

    Strange. Now the update to search v2 worked as expected. So let's just ignore the issue and hope it won't happen again. Nevertheless I'll attach again the session.txt, which includes an "error while executing script: err= class java.lang.NullPointerException null <java.lang.NullPointerException>" during the update (though I'm not sure whether this is enough information to let you do anything about it). --Schnark

     
  • gnosygnu
    2014-01-16

    Strange. Now the update to search v2 worked as expected.

    Ok. Thanks for confirming.

    I'm not sure what happened either. I'm going to set aside a chunk of time for v1.1.4 and Help:Import/Script. In addition to addressing known issues, I'll also put in better logging / automated-testing. Hopefully that will help if the issue arises again.

    Nevertheless I'll attach again the session.txt, which includes an "error while executing script: err= class java.lang.NullPointerException null "

    Yeah, it's my own fault that the message is so embarrassingly useless. My best guess is....

    • You went to Help:Wiki_maintenance (it's the only thing that loads multiple disparate wikis at once)
    • You clicked on the "Search" button and then "Run script" (there's only a .8 second lag between "wiki.init.done" and "error while", but I'm going to assume you're a fast clicker.)
    • The error occurred just before the "Search" kicked off (the "Upgrade Search" is executed via a script)

    Needless to say, I can't reproduce this here.

    I'm hoping this will be covered by the better logging I mentioned above. Otherwise, I wouldn't worry about it.

    Let me know if you run across anything else. In the meantime, I'll update this ticket again when I return to it next week.

     
    Last edit: gnosygnu 2014-01-16

  • Anonymous
    2014-01-16

    Yes, I updated through Help:Wiki_maintenance. The error might be related to my backup: For simplicity I copied the de.wikipedia.org folder to de.wikipedia.old, which was listed among the wikis, but of course with errors, as it wasn't recognized as a valid wiki. So perhaps that caused the class java.lang.NullPointerException. --Schnark

     
  • gnosygnu
    2014-01-17

    I copied the de.wikipedia.org folder to de.wikipedia.old

    Thanks! And very good deduction.

    I was able to reproduce this now by renaming my de.wikipedia.org to de.wikipedia.old. The NullPointerException occurs because it is trying to find an entry for "de.wikipedia.old" from the html (http://dumps.wikimedia.org/backup-index.html).

    It is benign, but I fixed it for v1.1.3
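
The failure mode described above (looking up a folder name that the dump index does not list) can be illustrated with a small Python sketch. It is not the XOWA fix: the function names and the dictionary standing in for the parsed index page are hypothetical.

```python
def find_dump_status(index_entries, wiki_domain):
    """Return the status for wiki_domain, or None if the index has no entry."""
    return index_entries.get(wiki_domain)


def describe_wiki(index_entries, wiki_domain):
    """Format a status line, handling folders the dump index does not know."""
    status = find_dump_status(index_entries, wiki_domain)
    if status is None:
        # e.g. a backup folder such as de.wikipedia.old: no dump index lists it,
        # so the missing entry is handled explicitly instead of dereferenced.
        return f"{wiki_domain}: not listed in dump index"
    return f"{wiki_domain}: {status}"
```

With `{"de.wikipedia.org": "Dump in progress"}` as the index, `describe_wiki` returns a normal status line for de.wikipedia.org and a "not listed" line for de.wikipedia.old, rather than failing on the missing entry.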

     
  • gnosygnu
    2014-02-03

    I didn't get a chance to look at anything import related. I'll try to take a look at this again this week.

     
  • gnosygnu
    2014-02-03

    • Expected release: v1.1.4 --> v1.2.2
    • Milestone: v1.1.* --> v1.2.*
     
  • gnosygnu
    2014-02-08

    • status: in-progress --> done
     
  • gnosygnu
    2014-02-08

    I fixed this for v1.2.1. The import was actually populating the search v2 database, but it never "saved" the database, so it got overwritten with the category database.

     
  • gnosygnu
    2014-02-10

    • status: done --> closed
     

