Since the XML dumps have gotten bigger, the extraction scripts have started failing, or at least thrashing memory until they slow to a crawl. The problem areas are building the anchor_summary and page_links_in files.
I have exactly the same kind of problem, and it's not resolved in the latest version of patchWikipediaData.pl.