Is it possible to get all the pages from the "Portal" namespace (e.g. http://en.wikipedia.org/wiki/Portal:Contents/Portals and http://en.wikipedia.org/wiki/Portal:Java )? My goal is to derive a hierarchical tree of all portal pages, starting from the root portal page above. If so, what would be the best way to achieve this with wikipedia-miner? Thanks!
Unfortunately this isn't something I've catered for. The toolkit currently only extracts Article, Category and Template namespaces, so you would have to alter the extraction jobs (and re-run them) to get this information.
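If re-running the extraction is too heavy, one possible workaround outside the toolkit is to ask the live MediaWiki API for the page titles directly. The sketch below is only that: it does not use wikipedia-miner at all, it assumes the Portal namespace has ID 100 (as on English Wikipedia), and the helper names `build_params` and `iter_portal_titles` and the User-Agent string are made up for illustration. It returns a flat list of titles; the hierarchy would still have to be derived separately, for example from the links on each portal page.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
PORTAL_NAMESPACE = 100  # assumption: Portal namespace ID on English Wikipedia


def build_params(continue_token=None):
    """Build the query parameters for one list=allpages request."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": str(PORTAL_NAMESPACE),
        "aplimit": "500",
        "format": "json",
    }
    if continue_token:
        # Resume where the previous batch of results stopped.
        params["apcontinue"] = continue_token
    return params


def iter_portal_titles():
    """Yield every page title in the Portal namespace, one batch at a time."""
    token = None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(build_params(token))
        request = urllib.request.Request(
            url, headers={"User-Agent": "portal-lister-sketch/0.1"}  # hypothetical
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for page in data["query"]["allpages"]:
            yield page["title"]
        # The API includes a "continue" block while more results remain.
        token = data.get("continue", {}).get("apcontinue")
        if not token:
            break
```

Calling `list(iter_portal_titles())` would make the network requests and collect the titles; nothing is fetched until the generator is iterated.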