[Htmlparser-user] Encoding issue
From: Rupanu R. <rup...@ya...> - 2007-09-26 06:07:33
Hello there,

I copied and pasted the code you gave, but there seems to be an issue with encoding. I am trying to read a non-Unicode .htm/.html file, extract its contents, and write them to a text file. Here is the code:

*********************************
String inputfile = args[0];
Parser parser = new Parser (inputfile);
StringBean sb = new StringBean ();
parser.visitAllNodesWith (sb);
String content = sb.getStrings();
String outputfilename = "E:\\outputfile.txt";
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(outputfilename)); //, "UTF8"
osw.write(content);
osw.close();
**********************************************

And here is the exception I get:

org.htmlparser.util.EncodingChangeException: character mismatch (new: ? [0xfeff] != old: [0xefï]) for encoding change from ISO-8859-1 to UTF-8 at character offset 0

I then wrote the following code, which served my purpose to some extent. Could you please explain what the issue was, and how I can handle the encoding of an .htm/.html file that is saved offline on my hard drive?

***************
StringExtractor strext = new StringExtractor(input);
String content = strext.extractStrings(false);
String outputfilename = "output.txt";
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(outputfilename), "UTF8");
osw.write(content);
*************
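
A note on the exception text: one plausible reading is that the file begins with a UTF-8 byte order mark. The bytes EF BB BF decode to U+FEFF as UTF-8, while the first byte 0xEF read as ISO-8859-1 is "ï", which matches the two characters reported at offset 0. The sketch below uses only the standard Java library, not the HTMLParser API, to check for that mark and decode a saved file accordingly. The class name `BomAwareReader`, the helpers `hasUtf8Bom` and `decode`, and the ISO-8859-1 fallback are all hypothetical choices for illustration; the real encoding of the file is not known from the post.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of HTMLParser.
class BomAwareReader {

    /** Returns true if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] raw) {
        return raw.length >= 3
                && (raw[0] & 0xFF) == 0xEF
                && (raw[1] & 0xFF) == 0xBB
                && (raw[2] & 0xFF) == 0xBF;
    }

    /**
     * Decodes as UTF-8 (dropping the mark) when a BOM is present,
     * otherwise decodes with the caller-supplied fallback charset.
     */
    static String decode(byte[] raw, Charset fallback) {
        if (hasUtf8Bom(raw)) {
            return new String(raw, 3, raw.length - 3, StandardCharsets.UTF_8);
        }
        return new String(raw, fallback);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: BomAwareReader <input.html> <output.txt>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // Assumption: files without a BOM are ISO-8859-1; adjust as needed.
        String text = decode(raw, StandardCharsets.ISO_8859_1);

        // Always name the output charset instead of relying on the platform default.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(args[1]), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }
}
```

Naming the charset on both the read and the write side avoids depending on the platform default, which is probably why the second snippet with "UTF8" behaved better than the first.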