[Htmlparser-user] Encoding issue

Hello there,

Well, I copied and pasted the code you gave, but there seems to be an encoding issue. I am trying to read a non-Unicode htm/html file, extract its contents, and write them to a text file.
Here's the code:
*********************************
String inputfile = args[0];
Parser parser = new Parser(inputfile);
StringBean sb = new StringBean();
parser.visitAllNodesWith(sb);
String content = sb.getStrings();
String outputfilename = "E:\\outputfile.txt";
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(outputfilename));    //, "UTF8"
osw.write(content);
osw.close();
**********************************************
and here is the exception I get
org.htmlparser.util.EncodingChangeException: character mismatch (new: ? [0xfeff] != old:  [0xefï]) for encoding change from ISO-8859-1 to UTF-8 at character offset 0
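
My guess at what the message is saying, which I have not confirmed against the parser source: the two values look like the start of a UTF-8 byte order mark (bytes EF BB BF). Read as ISO-8859-1 the first character is 0xef, but read as UTF-8 it is 0xfeff, so the character at offset 0 would differ between the two encodings. The sketch below uses only the JDK; the class name and the sample bytes are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomMismatchDemo {
    // Hypothetical sample: a UTF-8 byte order mark (EF BB BF) followed by "<html>".
    static final byte[] SAMPLE = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'h', 't', 'm', 'l', '>'
    };

    // Decode the bytes with the given charset and return the first character.
    static char firstChar(byte[] data, Charset charset) {
        return new String(data, charset).charAt(0);
    }

    public static void main(String[] args) {
        // Same bytes, two different first characters depending on the charset.
        System.out.printf("ISO-8859-1: 0x%x%n", (int) firstChar(SAMPLE, StandardCharsets.ISO_8859_1));
        System.out.printf("UTF-8:      0x%x%n", (int) firstChar(SAMPLE, StandardCharsets.UTF_8));
    }
}
```

If that reading is right, the file is not really "non-Unicode" at its first bytes, which would explain why the switch is reported at character offset 0.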

However, I then wrote the following code, which served my purpose to some extent. But could you please explain what the issue was in the first version, and how I can handle the encoding of an htm/html file that is saved offline on my hard drive?

***************
StringExtractor strext = new StringExtractor(input);
String content = strext.extractStrings(false);

String outputfilename = "output.txt";
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(outputfilename), "UTF8");
osw.write(content);
*************
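
One workaround I am considering, sketched with the JDK only: read the saved file as bytes, treat a leading UTF-8 byte order mark as the signal to decode as UTF-8 (dropping the mark), otherwise fall back to a charset I choose, and always name the charset when writing. The class and method names are hypothetical, ISO-8859-1 as the fallback is just an assumption about my files, and this only covers decoding and writing, not the HTML text extraction itself.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomAwareIo {
    // Decode as UTF-8 (without the mark) when the data starts with EF BB BF;
    // otherwise decode with the caller-supplied fallback charset.
    static String decode(byte[] data, Charset fallback) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom
                ? new String(data, 3, data.length - 3, StandardCharsets.UTF_8)
                : new String(data, fallback);
    }

    // Write text with an explicit charset instead of the platform default,
    // and close the writer so buffered output is flushed.
    static void writeUtf8(Path out, String content) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(out), StandardCharsets.UTF_8)) {
            w.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input file, args[1]: output file
        String text = decode(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        writeUtf8(Paths.get(args[1]), text);
    }
}
```

The try-with-resources block also closes the writer. Without a close or flush, the end of the output may never reach the file.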
