Re: [Pyparsing] Problem with eastern european characters when scraping data from the European Parli

On Thursday, June 10, 2010 13:27:11 Thomas Jensen wrote:
> Dear PyParser Experts
> 
> I am trying to scrape a lot of data from the European Parliament
> website for a research project. The first step is to create a list of
> all parliamentarians; however, due to the many Eastern European names
> and the accents they use, I get a lot of missing entries. Here is an
> example of what is giving me trouble (notice the accents at the end
> of the family name):

I would suggest you use BeautifulSoup for this instead of pyparsing. Pyparsing 
is great, but parsing HTML is a solved problem, and making a hand-written 
parser robust against real-world markup takes a *lot* of effort.
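
To illustrate, here is a minimal sketch of pulling accented names out of a
page with BeautifulSoup. The markup, the "mep" class and the names are
invented for the example and are not taken from the European Parliament
site, so the selector would need adjusting to the real pages.

```python
# -*- coding: utf-8 -*-
# Assumes the third-party package is installed: pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical markup; the real site's structure will differ.
SAMPLE_HTML = u"""
<ul>
  <li><a class="mep" href="/mep/1">KOVÁCS, Anna</a></li>
  <li><a class="mep" href="/mep/2">NOWAK, Łukasz</a></li>
  <li><a class="mep" href="/mep/3">ŠIMEK, Jiří</a></li>
</ul>
"""


def extract_names(html):
    """Return the text of every <a class="mep"> element as Unicode strings."""
    # "html.parser" is the parser bundled with Python, so no extra
    # dependency beyond BeautifulSoup itself is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", class_="mep")]


names = extract_names(SAMPLE_HTML)
for name in names:
    print(name)
```

BeautifulSoup hands back Unicode text, so the accented characters survive
without any special character classes in a grammar. If the page arrives as
raw bytes in a known encoding, the constructor's from_encoding argument can
be used to state it explicitly.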

Diez

Re: [Pyparsing] Problem with eastern european characters when scraping data from the European Parliament Website