Thread: [Htmlparser-user] StringExtractor from String
From: Tiago F. <tia...@gm...> - 2006-05-16 20:21:30
Hi all!

I have an HTML document in a string, and I want to strip the HTML tags and so on. But the StringExtractor only accepts files and links.

The variable "origem" is a string containing an HTML document:

    try {
        StringExtractor temp = new StringExtractor(origem);
        origem = temp.extractStrings(false);
    } catch (org.htmlparser.util.ParserException e) {
        new Log("ParserException (" + e.getMessage() + ")");
        origem = e.getMessage();
    }

Any ideas?
Thanks!
[]s
FlycKER
From: Derrick O. <der...@ro...> - 2006-05-16 23:31:05
I believe you can get the Parser the StringExtractor is using and then use setInputHTML() to make your string the source.

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
From: Ian M. <ian...@gm...> - 2006-05-17 09:13:23
You want this:

    Parser parser = new Parser();
    parser.setInputHTML(source);
    NodeList nodes = parser.parse(null);

(catching the ParserException it can throw)

Ian
From: Tiago F. <tia...@gm...> - 2006-05-17 14:02:46
Thanks, it's working. Here is the complete code:

    StringBean sb = new StringBean();
    Parser parser = new Parser();
    parser.setInputHTML(origem); // String with HTML
    parser.visitAllNodesWith(sb);
    sb.setLinks(false);
    String s = sb.getStrings();
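For anyone who finds this thread later, the working snippet can be wrapped in a small helper, roughly as sketched below. This is only an illustration under a few assumptions: the HTML Parser jar is on the classpath, the class name HtmlText and method name strip are made up for this example and are not part of the library, and setLinks(false) is moved ahead of the visit so the bean is configured before it collects any text.

```java
import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.util.ParserException;

// Hypothetical helper; the class and method names are not from the thread or the library.
class HtmlText {

    // Returns the text content of an HTML document held in a String.
    static String strip(String html) throws ParserException {
        StringBean sb = new StringBean();
        sb.setLinks(false);           // do not append link URLs to the extracted text

        Parser parser = new Parser();
        parser.setInputHTML(html);    // use the String itself as the parser's source
        parser.visitAllNodesWith(sb); // StringBean is a NodeVisitor that gathers text nodes

        return sb.getStrings();
    }
}
```

A call would look like `String text = HtmlText.strip(origem);`, with the caller deciding how to handle the ParserException rather than storing the error message in the result, as the original attempt did.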