Tree [112303] eden / script.module.xmltodict /

File Date Author Commit
lib 2012-10-08 amet amet [e81401] [script.module.xmltodict] -v 0.2.0
tests 2012-10-08 amet amet [e81401] [script.module.xmltodict] -v 0.2.0
LICENSE.txt 2012-10-08 amet amet [e81401] [script.module.xmltodict] -v 0.2.0
README.md 2012-10-08 amet amet [e81401] [script.module.xmltodict] -v 0.2.0
addon.xml 2012-10-08 amet amet [e81401] [script.module.xmltodict] -v 0.2.0
changelog.txt 2012-10-08 amet amet [e81401] [script.module.xmltodict] -v 0.2.0

Read Me


xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":



>>> import xmltodict
>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['#text']
u'element as well'
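To make the mapping above easier to see, here is a rough sketch of the same conventions built on the standard library's `xml.etree.ElementTree`. It is not xmltodict's implementation; `element_to_dict` and `parse_sketch` are made-up names, and the sketch assumes attributes go under `@`-prefixed keys, repeated tags become lists, and text that sits next to attributes goes under `#text`.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Approximate the attribute/text/list conventions (hypothetical helper)."""
    node = {}
    # Attributes are stored under '@'-prefixed keys.
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Children are keyed by tag; a repeated tag is collected into a list.
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # A leaf without attributes collapses to its text (or None if empty).
        return text or None
    if text:
        # Text next to attributes or children is kept under '#text'.
        node["#text"] = text
    return node


def parse_sketch(xml_string):
    """Parse a whole document into nested dicts (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_dict(root)}


SAMPLE = """
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
"""

doc = parse_sketch(SAMPLE)
print(doc["mydocument"]["@has"])
print(doc["mydocument"]["and"]["many"])
print(doc["mydocument"]["plus"]["#text"])
```

The sketch ignores text that follows a child element (mixed content) and loads the whole document at once, so it only illustrates the shape of the result.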

It's very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:


>>> from gzip import GzipFile
>>> def handle_artist(_, artist):
...     print artist['name']
...
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
King Crimson
Chris Potter
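The idea behind the streaming mode can be sketched with the standard library alone. The example below is an illustration built on `xml.etree.ElementTree.iterparse`, not the way xmltodict does it internally; `stream_items` is a made-up name and the sample data is a tiny stand-in for a real dump. It hands every element found at `item_depth` to a callback and then discards that element's contents.

```python
import io
import xml.etree.ElementTree as ET


def stream_items(source, item_depth, item_callback):
    """Call item_callback(tag, fields) for each element at item_depth."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        # An "end" event means the element and its children are complete.
        if depth == item_depth:
            fields = {child.tag: child.text for child in elem}
            item_callback(elem.tag, fields)
            # Drop the finished item's contents so memory use stays low.
            elem.clear()
        depth -= 1


# Hypothetical stand-in for a large dump; a real one would be a file object.
SAMPLE = b"""<artists>
  <artist><name>A Perfect Circle</name></artist>
  <artist><name>King Crimson</name></artist>
  <artist><name>Chris Potter</name></artist>
</artists>"""

names = []
stream_items(io.BytesIO(SAMPLE), item_depth=2,
             item_callback=lambda _, artist: names.append(artist["name"]))
print(names)
```

With `item_depth=2` the callback fires once per child of the root element, which matches the one-artist-at-a-time output shown above.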

It can also be used from the command line to pipe objects to a script like this:

import sys, marshal

while True:
    try:
        _, article = marshal.load(sys.stdin)
    except EOFError:
        break  # the piped stream has ended
    print article['title']

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...
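To see what travels through such a pipe, here is a small round trip with the standard `marshal` module. It is only a sketch: `write_items` and `read_items` are made-up helpers, the records are invented, and the first element of each pair is reduced to a plain string, which is an assumption about the record layout rather than a description of it. An in-memory buffer stands in for the pipe.

```python
import io
import marshal


def write_items(items, stream):
    """Write each (path, item) pair as one marshalled record."""
    for pair in items:
        marshal.dump(pair, stream)


def read_items(stream):
    """Yield records until the stream is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return  # no more records


# Hypothetical records; real ones would come from the XML dump.
records = [
    ("page", {"title": "AccessibleComputing"}),
    ("page", {"title": "Anarchism"}),
]

buffer = io.BytesIO()  # stands in for the pipe between the two processes
write_items(records, buffer)
buffer.seek(0)

titles = [article["title"] for _, article in read_items(buffer)]
print(titles)
```

Records are simply written back to back, so the reader keeps calling `marshal.load` until it hits `EOFError`. Note that the marshal format is tied to the Python version, so both ends of the pipe should run the same interpreter.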

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...

Ok, how do I get it?

You just need to

$ pip install xmltodict