A collection of Python scripts to create and handle an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
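
As a rough illustration of what a regular-expression-based approach to MediaWiki markup can look like, here is a minimal sketch. It is not taken from wikipedia2XML: the patterns, the function names `strip_markup` and `to_xml`, and the `<article title="...">` element layout are all assumptions made for this example, and real MediaWiki markup (templates, tables, nested links, references) needs far more handling than this.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for illustration only; not the project's actual rules.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")   # [[target|label]] -> label
EMPHASIS = re.compile(r"'{2,5}")                        # ''italic'' / '''bold''' quotes
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$", re.MULTILINE)  # == Title == -> Title


def strip_markup(wikitext: str) -> str:
    """Reduce a small subset of MediaWiki markup to plain text."""
    text = LINK.sub(r"\1", wikitext)
    text = EMPHASIS.sub("", text)
    text = HEADING.sub(r"\2", text)
    return text


def to_xml(title: str, wikitext: str) -> str:
    """Wrap the cleaned text in a single (assumed) <article> element."""
    article = ET.Element("article", {"title": title})
    article.text = strip_markup(wikitext)
    # ElementTree escapes &, < and > in text and attribute values.
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    sample = "'''Python''' is a [[programming language|language]] used on [[Linux]]."
    print(to_xml("Python", sample))
```

Building the output through `xml.etree.ElementTree` rather than string concatenation keeps the result well-formed even when article text contains characters such as `&` or `<`.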

License

MIT License

Additional Project Details

Intended Audience

Developers, Science/Research

User Interface

Console/Terminal

Programming Language

Python

Registered

2008-03-30