Re: [Htmlparser-user] format problem of text file after convertion of html to text file
From: Somik R. <so...@ya...> - 2003-02-08 06:03:25
Try:

    thewriter.write(HTMLParserUtils.removeEscapeCharacters(node.toPlainTextString()));

That should make it better.

Regards,
Somik

----- Original Message -----
From: "ChennaDulla" <che...@go...>
To: <htm...@li...>
Sent: Friday, February 07, 2003 6:26 AM
Subject: [Htmlparser-user] format problem of text file after convertion of html to text file

> Hi,
>
> I downloaded the htmlparser1.2 zip and put the htmlparser.jar
> file under lib on my server and the org folder under
> web_inf. It works fine for converting HTML to a text file,
> but the problem is the format of the text file.
> When I look at the text file after conversion, the formatting is very bad.
> Why is this happening? No particular format is applied
> when writing into the text file.
> Here is the code I am using to convert HTML to a text file:
>
> import org.htmlparser.util.HTMLEnumeration;
> import org.htmlparser.util.HTMLParserException;
> import org.htmlparser.HTMLNode;
> import org.htmlparser.HTMLParser;
> import java.io.*;
> import java.util.Properties;
>
> public class StringExtractor {
>     // String htmlFile = "/export/a.html";
>     public StringExtractor() {
>     }
>     public void extractStrings(String htmlFile) throws HTMLParserException {
>         try {
>             HTMLParser parser = new HTMLParser(htmlFile);
>             BufferedWriter thewriter = new BufferedWriter(new FileWriter("/export/d.txt"));
>             HTMLNode node;
>             StringBuffer results = new StringBuffer();
>             for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
>                 node = e.nextHTMLNode();
>                 thewriter.write(node.toPlainTextString());
>             }
>             thewriter.close();
>         } catch (IOException e) {
>             System.out.println("error in ConvertJspToHtml.java===" + e);
>         }
>     }
> }
>
> What changes do I have to make to get the HTML file in a readable
> format? If I run the code above, the text file is generated, but
> the format doesn't look good.
> Any help on this, please.
> I am sending one file as an attachment; the output I get
> in the text file looks like that.
>
> Thanks.
>
> > -----Original Message-----
> > From: htm...@li...
> > [mailto:htm...@li...] On Behalf Of
> > dha...@or...
> > Sent: Thursday, February 06, 2003 11:47 PM
> > To: htm...@li...
> > Subject: RE: [Htmlparser-user] strip comments HTML source
> >
> > << File: BDY.RTF >> << File: BDY.RTF >>
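
For readers who want to see what this kind of cleanup amounts to without the htmlparser 1.2 jar at hand, here is a minimal standalone sketch. It assumes that removeEscapeCharacters drops tab, carriage-return and line-feed characters, which this thread does not actually state. The class PlainTextCleanup and its methods are hypothetical names made up for illustration and are not part of htmlparser. The sketch also writes each non-empty fragment on its own line, which is one possible way to get a more readable file, not necessarily what the library does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in htmlparser.
public class PlainTextCleanup {

    // Assumed stand-in for HTMLParserUtils.removeEscapeCharacters:
    // drops tab, carriage-return and line-feed characters.
    static String stripControlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\t' && c != '\r' && c != '\n') {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Collapses any run of whitespace into one space and trims both ends.
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Cleans each fragment and writes the non-empty ones one per line.
    static String joinFragments(List<String> fragments) {
        StringBuilder out = new StringBuilder();
        for (String fragment : fragments) {
            String cleaned = collapseWhitespace(stripControlChars(fragment));
            if (!cleaned.isEmpty()) {
                out.append(cleaned).append(System.lineSeparator());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fragments such as node.toPlainTextString() might return for each node.
        List<String> fragments = Arrays.asList("\n\t", "  Page   title ", "First paragraph.\r\n");
        System.out.print(joinFragments(fragments));
    }
}
```

In the extractor from the original message, the equivalent change would be to pass each node's plain text through the cleanup before calling thewriter.write, and to skip fragments that end up empty.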