[Htmlparser-user] Parsing ISO-8859-1 --> storing in UTF-8?
From: Jan H. <jan...@gm...> - 2006-10-15 20:47:02
Hi guys,

I'm trying to parse a website that is encoded in ISO-8859-1, and I need to store the extracted link texts in UTF-8. My code looks like this:

<code>
Parser myParser = new Parser();
myParser.setURL(url);
// "myLinkFilter" is a filter I created that matches LinkNodes
NodeList myLinkNodeList = myParser.parse(myLinkFilter);
Node myLinkNode = myLinkNodeList.elementAt(0);
LinkTag linkTag = (LinkTag) myLinkNode;
String linkText = linkTag.getLinkText();
</code>

The problem is that certain characters (such as the low quotation marks: „Quote“) are converted to question marks. So I tried code like this:

<code>
String isoString = linkTag.getLinkText();
String utf8String = null;
try {
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
} catch (UnsupportedEncodingException e) {
    // do something...
}
</code>

But this still returns question marks in utf8String. Any ideas what I need to change?

Thanks and regards,
Jan Hempel
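For illustration, here is a minimal, self-contained sketch of why the round trip above may keep producing question marks. It does not use htmlparser at all. The class and method names (EncodingSketch, viaLatin1, toUtf8) are hypothetical, and it assumes Java 7 or later for StandardCharsets. The characters „ (U+201E) and “ (U+201C) have no code point in ISO-8859-1, so getBytes with that charset substitutes '?' for them before the UTF-8 step ever runs. A Java String is already Unicode internally, so a UTF-8 conversion is probably only needed at the point where the text is written out as bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only; not part of htmlparser.
public class EncodingSketch {

    // Reproduces the round trip from the post: encode as ISO-8859-1,
    // then decode those bytes as UTF-8. Characters that ISO-8859-1
    // cannot represent are replaced by '?' during getBytes.
    static String viaLatin1(String s) {
        byte[] latin1Bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // Encodes the string directly as UTF-8 bytes, which is what would be
    // stored in a file or database column that expects UTF-8.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Low and high quotation marks written as escapes so the source
        // file's own encoding does not matter.
        String linkText = "\u201EQuote\u201C";

        // The quotation marks are lost in the ISO-8859-1 step.
        System.out.println(viaLatin1(linkText));

        // Encoding straight to UTF-8 and decoding again preserves them.
        byte[] utf8Bytes = toUtf8(linkText);
        String restored = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(restored.equals(linkText));
    }
}
```

If the string returned by getLinkText() already contains question marks, the characters were lost earlier, while the page bytes were being decoded, and no later conversion can bring them back. In that case the charset used to read the page is the thing to check. One possibility, stated here only as an assumption, is that the page declares ISO-8859-1 but actually contains Windows-1252 bytes, since Windows-1252 does include these quotation marks.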