From: Waylan L. <wa...@gm...> - 2012-06-01 00:13:43
You might want to check out https://github.com/mrcoles/readmd
Not sure if it will do what you want.

Another option would be to run your text through markdown, then run markdown's output through an HTML parser. Then use the parse tree's "return text only" feature to get raw text with no markup.

I'd start with either the html5lib [1] or lxml [2] HTML parsers. I prefer lxml, but it is a C library and not so easy to install. Once you get it working, though, it is worth the effort.

[1]: https://code.google.com/p/html5lib/
[2]: http://lxml.de/

Hope that helps.

Waylan

On Thu, May 31, 2012 at 5:59 PM, Mandaris <man...@gm...> wrote:
> I'm new to python and I'm trying to make a script that will strip out the
> meta data and give me the raw text.
>
> --
> Valediction,
> Mandaris Moore III
>
> _______________________________________________
> Python-markdown-discuss mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
>

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg
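
A minimal sketch of the second suggestion in the reply (render with markdown, then keep only the text nodes of the resulting HTML). It rests on a few assumptions. The standard library's html.parser stands in for html5lib or lxml, so nothing beyond the markdown package has to be installed. The helper names html_to_text and markdown_to_text are made up for illustration. The "meta" extension is enabled on the guess that "meta data" in the question means a Python-Markdown metadata header.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string with all markup removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


def markdown_to_text(source):
    """Render Markdown to HTML, then strip the HTML down to raw text."""
    # Third-party dependency, imported lazily: pip install markdown
    import markdown

    # Assumption: "meta" is the metadata extension; it removes a leading
    # "Key: value" header block from the rendered output.
    html = markdown.markdown(source, extensions=["meta"])
    return html_to_text(html)
```

With lxml installed, html_to_text could instead be a one-liner around lxml.html.fromstring(html).text_content(), which is closer to what the reply recommends.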