Re: [Pyparsing] Problem with eastern european characters when scraping data from the European Parli
From: Diez B. R. <de...@we...> - 2010-06-10 12:57:00
On Thursday, June 10, 2010 13:27:11 Thomas Jensen wrote:
> Dear Pyparsing experts,
>
> I am trying to scrape a lot of data from the European Parliament
> website for a research project. The first step is to create a list of
> all parliamentarians; however, because of the many Eastern European
> names and the accents they use, I get a lot of missing entries. Here is
> an example of what is giving me trouble (notice the accents at the end
> of the family name):

I would suggest you use BeautifulSoup for this instead of pyparsing.
Pyparsing is great, but parsing HTML is a solved problem, and getting it
robust yourself actually requires a *lot* of effort.

Diez
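A minimal sketch of the "use a real HTML parser" idea, under a few loud assumptions: the markup (one `<a class="name">` per member) is hypothetical and not the actual European Parliament page structure, and the sketch swaps in the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. The point is that a real parser hands back Unicode text with entities such as `&Aacute;` already decoded, so accented family names come through intact instead of falling outside a hand-written grammar.

```python
from html.parser import HTMLParser


class NameExtractor(HTMLParser):
    """Collect the text of every <a class="name"> element (hypothetical markup)."""

    def __init__(self):
        # convert_charrefs=True decodes &Aacute; and similar into Unicode characters
        super().__init__(convert_charrefs=True)
        self.names = []
        self._in_name = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and ("class", "name") in attrs:
            self._in_name = True
            self._buf = []

    def handle_data(self, data):
        if self._in_name:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_name:
            self.names.append("".join(self._buf).strip())
            self._in_name = False


def extract_names(html_text):
    """Return the member names found in an already-decoded HTML string."""
    parser = NameExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.names


# Hypothetical sample: one name uses an HTML entity, the other raw Unicode.
sample = (
    '<ul><li><a class="name" href="#">Jan NOV&Aacute;K</a></li>'
    '<li><a class="name" href="#">Anna DVO\u0158\u00c1KOV\u00c1</a></li>'
    "<li>not a name</li></ul>"
)
print(extract_names(sample))
```

Feed the parser text, not bytes: decode the downloaded page with its declared charset first (e.g. `page_bytes.decode("utf-8")`, assuming the page is UTF-8). BeautifulSoup gives the same Unicode output with a friendlier search API, and can use `html.parser` as its backend.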