A collection of Python scripts for creating and handling an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
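
To illustrate what a regular-expression-based approach to MediaWiki markup can look like, here is a minimal sketch. It is not taken from the wikipedia2XML code: the function names `strip_markup` and `to_xml`, the `article` element, and the three patterns are assumptions chosen for illustration, and real MediaWiki markup (templates, tables, nested links) needs far more handling than this.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical patterns covering only a few common MediaWiki constructs.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")          # [[target|label]] or [[target]]
HEADING = re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE)  # == Heading ==
QUOTES = re.compile(r"'{2,5}")                                  # ''italic'', '''bold'''


def strip_markup(wikitext):
    """Reduce a fragment of wiki markup to plain text (illustrative only)."""
    text = LINK.sub(r"\1", wikitext)   # keep the link label, or the target if no label
    text = HEADING.sub(r"\1", text)    # keep the heading text without the = signs
    text = QUOTES.sub("", text)        # drop bold/italic quote runs
    return text


def to_xml(title, wikitext):
    """Wrap the cleaned text in a hypothetical <article> element."""
    body = escape(strip_markup(wikitext))  # escape &, < and > for XML
    return "<article title={}>{}</article>".format(quoteattr(title), body)


if __name__ == "__main__":
    sample = "'''Python''' is a [[programming language|language]] from [[Guido van Rossum]]."
    print(to_xml("Python", sample))
```

A regex pass like this is simple and fast on large dumps, but because regular expressions cannot track arbitrary nesting, constructs such as nested templates usually need repeated passes or a dedicated parser.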

License

MIT License

Additional Project Details

Intended Audience

Developers, Science/Research

User Interface

Console/Terminal

Programming Language

Python

Related Categories

Python Scientific Engineering, Python Wiki Software

Registered

2008-03-30