wikipedia2XML is a collection of Python scripts that create and handle an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
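
To illustrate the general idea of regular-expression-based markup parsing, here is a minimal sketch that converts a small subset of MediaWiki markup (headings, links, bold/italic quotes) into XML elements. It is not the project's actual code: the pattern names, the functions `strip_inline` and `wikitext_to_xml`, and the `<article>`, `<heading>` and `<p>` element names are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns covering only a small subset of MediaWiki markup.
BOLD_ITALIC = re.compile(r"'{2,5}(.+?)'{2,5}")             # ''italic'' / '''bold'''
PIPED_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")  # [[target|label]] / [[target]]
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")         # == Heading ==


def strip_inline(text):
    """Replace links with their visible label and drop bold/italic quotes."""
    text = PIPED_LINK.sub(r"\1", text)
    return BOLD_ITALIC.sub(r"\1", text)


def wikitext_to_xml(title, wikitext):
    """Build an <article> element with <heading> and <p> children (assumed schema)."""
    article = ET.Element("article", {"title": title})
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            # The number of '=' signs gives the heading level.
            level = str(len(match.group(1)))
            node = ET.SubElement(article, "heading", {"level": level})
            node.text = strip_inline(match.group(2))
        else:
            ET.SubElement(article, "p").text = strip_inline(line)
    return article


if __name__ == "__main__":
    sample = "== History ==\n'''Wikipedia''' is a [[wiki|free]] [[encyclopedia]]."
    print(ET.tostring(wikitext_to_xml("Wikipedia", sample), encoding="unicode"))
```

Real MediaWiki markup also contains templates, tables, references and nested constructs that simple patterns like these do not handle, so a full parser needs considerably more rules.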

License

MIT License

Additional Project Details

Intended Audience

Science/Research, Developers

User Interface

Console/Terminal

Programming Language

Python

Related Categories

Python Scientific Engineering, Python Wiki Software

Registered

2008-03-30