A collection of Python scripts for creating and handling an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
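
To illustrate what a regular-expression-based approach to MediaWiki markup can look like, here is a minimal sketch. It is not taken from the wikipedia2XML sources: the pattern names, the `strip_markup` and `to_corpus_element` helpers, and the `<doc title="...">` element layout are all assumptions made for this example, and only a handful of markup constructs are handled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns covering only a few common MediaWiki constructs.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                      # non-nested {{templates}}
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")         # [[target]] or [[target|label]]
HEADING = re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE)  # == Heading ==
QUOTES = re.compile(r"'{2,5}")                                # ''italic'', '''bold'''


def strip_markup(wikitext: str) -> str:
    """Reduce wikitext to plain text (assumed, simplified rules)."""
    text = TEMPLATE.sub("", wikitext)   # drop templates entirely
    text = LINK.sub(r"\1", text)        # keep the visible link label
    text = HEADING.sub(r"\1", text)     # keep the heading text only
    text = QUOTES.sub("", text)         # remove bold/italic quote runs
    return text.strip()


def to_corpus_element(title: str, wikitext: str) -> ET.Element:
    """Wrap the cleaned text in a hypothetical <doc> corpus element."""
    doc = ET.Element("doc", {"title": title})
    doc.text = strip_markup(wikitext)
    return doc


sample = "'''Python''' is a [[programming language|language]].{{citation needed}}"
element = to_corpus_element("Python", sample)
xml_string = ET.tostring(element, encoding="unicode")
print(xml_string)
```

Real wikitext also contains nested templates, tables, references, and HTML, which simple patterns like these do not cover; a production parser needs many more rules or a different strategy for nested constructs.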

License

MIT License

Additional Project Details

Intended Audience

Science/Research, Developers

User Interface

Console/Terminal

Programming Language

Python

Related Categories

Python Scientific Engineering, Python Wiki Software

Registered

2008-03-30