A collection of python scripts to create and handle an XML corpus (a large collection of text for linguistic purpose) from an original Wikipedia database backup dump. It includes a regular expression based parser for the MediaWiki markup language.
License
MIT LicenseFollow wikipedia2XML
Other Useful Business Software
Go From AI Idea to AI App Fast
Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
Rate This Project
Login To Rate This Project
User Reviews
Be the first to post a review of wikipedia2XML!