A collection of Python scripts to create and handle an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.
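To illustrate what a regular-expression-based approach to MediaWiki markup can look like, here is a minimal sketch. It is not taken from wikipedia2XML: the pattern names, the functions `strip_markup` and `to_xml`, and the `<doc title="...">` element are all assumptions made for illustration, and only a small subset of the markup (links, bold/italic quotes, headings) is handled.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical patterns covering a small subset of MediaWiki markup.
# [[target|label]] or [[target]]: keep only the visible text.
LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# Runs of 2 to 5 apostrophes mark italic and bold text.
QUOTES = re.compile(r"'{2,5}")
# == Heading == lines, levels 2 to 6; keep only the heading text.
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def strip_markup(wikitext: str) -> str:
    """Return plain text with the supported markup removed."""
    text = LINK.sub(r"\1", wikitext)
    text = QUOTES.sub("", text)
    text = HEADING.sub(r"\2", text)
    return text


def to_xml(title: str, wikitext: str) -> str:
    """Wrap the cleaned text in an assumed <doc> element, escaping XML specials."""
    return "<doc title=%s>%s</doc>" % (quoteattr(title), escape(strip_markup(wikitext)))
```

Regular expressions of this kind cannot handle arbitrarily nested constructs such as templates inside templates, so a sketch like this is only suitable for simple, flat markup.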
License
MIT License