htmlparser-user Mailing List for HTML Parser (Page 9)
Brought to you by:
derrickoswald
From: Miguel A. M. <mig...@gm...> - 2012-12-03 11:56:52
Hello Jovana,

Although I am not the owner of the website, I have collaborated on it. As far as I know, all the content is freely available to everyone to use, translate, and share, as long as you cite the original source.

Hope it helps.

Kind regards,
Miguel
From: Jovana M. <jo...@we...> - 2012-12-02 20:12:25
Dear Sir or Madam,

My name is Jovana. I found your page about EncodingChangeException extremely interesting and would like to spread the word to people from the former Yugoslavia.

Here's the URL of your page: http://htmlparser.sourceforge.net/faq.html

Would you mind if I translated your page into Serbo-Croatian and posted it on our site? My purpose is to help people from the former Yugoslavia better understand some very useful information about computer science.

Some quick info about myself: I was born in Yugoslavia, Europe. The former Yugoslavia consisted of what are now fully independent states: Serbia, Montenegro, Croatia, Bosnia & Herzegovina, Slovenia, and Macedonia, which are all united by the Serbo-Croatian language. I'm currently studying Computer Science at the University of Belgrade, Serbia.

With kind regards,

Jovana Milutinovich
http://science.webhostinggeeks.com/
jo...@we...
Tel: +381 63 8049100
From: Miguel A. M. <mig...@gm...> - 2012-08-27 08:23:18
Hello Ernest,

This is the function I use to extract the text. I hope it helps you.

    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.htmlparser.Parser;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.visitors.TextExtractingVisitor;

    // Returns the text of the page at the given URL, or null if parsing fails.
    public StringBuilder textExtractor(String url) {
        StringBuilder textInPage = null;
        try {
            Parser parser = new Parser(url);
            TextExtractingVisitor visitor = new TextExtractingVisitor();
            parser.visitAllNodesWith(visitor);
            textInPage = new StringBuilder(visitor.getExtractedText());
        } catch (ParserException ex) {
            // HTMLAnalizer is the class this method lives in.
            Logger.getLogger(HTMLAnalizer.class.getName()).log(Level.SEVERE, null, ex);
        }
        return textInPage;
    }

Regards,
Miguel
From: Ernest C. <ern...@gm...> - 2012-08-24 19:14:07
Hi,

I use the parser a lot for work. One thing I've noticed is that many news articles have comment sections containing plain text, but the parser doesn't pick them up. What is it about the comment sections that makes them unreadable? Is there a different class I should be using?

Thank you,
Ernest

On Wed, Aug 17, 2011 at 4:25 PM, ernest cronin <ern...@gm...> wrote:
> Hi,
>
> I have been trying to use the parser for some time and I have been unable
> to get it to do exactly what I want, which is to gather only the plaintext
> without javascript or style stuff. Here is the code I've been running:
>
>     import org.htmlparser.Parser;
>     import org.htmlparser.util.ParserException;
>     import org.htmlparser.visitors.TextExtractingVisitor;
>
>     public class Test
>     {
>         public static void main (String[] args)
>         {
>             try
>             {
>                 // args[0] is the URL or file to parse
>                 Parser parser = new Parser (args[0]);
>                 TextExtractingVisitor visitor = new TextExtractingVisitor();
>                 parser.visitAllNodesWith(visitor);
>                 String textInPage = visitor.getExtractedText();
>                 System.out.println(textInPage);
>             }
>             catch (ParserException pe)
>             {
>                 pe.printStackTrace ();
>             }
>         }
>     }
>
> I could really use some help with this!
>
> Thanks,
> Ernest
From: Aniket P <ani...@gm...> - 2012-08-09 17:15:57
Hello,

Can anyone help me with my work? I am stuck. I am parsing a page using htmlparser. The page may contain a call to a particular function. Say the function is f(a,b,c) {a+b+c;} and it is called somewhere in the page as f(p,q,r). How can I find out that the page contains a call to 'f'? Is there any facility for identifying a call and the function it targets?

Please help me; this is urgent.
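A rough way to approach the question above, assuming the script text has already been pulled out of the page as a plain string, is to scan it with a regular expression for the function name followed by an argument list. The names CallFinder and findCalls are made up for illustration and are not part of HTML Parser. The pattern ignores nested parentheses, string literals and comments, so treat it as a sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Parser.
final class CallFinder {

    /** Returns the raw argument text of each call to functionName found in script. */
    static List<String> findCalls(String script, String functionName) {
        // Group 1 is present when the match is a declaration ("function f(...)").
        // Group 2 is the argument list (nested parentheses are not supported).
        // The lookbehind rejects longer identifiers (xf) and method calls (obj.f).
        Pattern pattern = Pattern.compile(
                "(\\bfunction\\s+)?(?<![\\w$.])"
                        + Pattern.quote(functionName)
                        + "\\s*\\(([^()]*)\\)");
        List<String> calls = new ArrayList<>();
        Matcher matcher = pattern.matcher(script);
        while (matcher.find()) {
            if (matcher.group(1) == null) { // skip the declaration itself
                calls.add(matcher.group(2).trim());
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        String script = "function f(a,b,c) { return a+b+c; }\n"
                + "var total = f(p,q,r);";
        System.out.println(findCalls(script, "f")); // prints [p,q,r]
    }
}
```

A regular expression cannot tell a real call from the same text inside a string or comment; a proper answer would need a JavaScript parser, which is outside what this sketch attempts.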
From: Miguel A. M. <mig...@gm...> - 2012-08-08 15:42:16
Hello AniketP,

I had the same problem, but with the bold and italic tags (<b> and <i> respectively). Here is my solution for <i> tags.

Create a class for the tag you are interested in that extends CompositeTag:

    import org.htmlparser.tags.CompositeTag;

    public class ItalicTag extends CompositeTag {
        // Tag names handled by this class; change this as appropriate.
        private static final String[] mIds = new String[] {"I"};

        public ItalicTag() {
        }

        public String[] getIds() {
            return mIds;
        }

        public String[] getEnders() {
            return mIds;
        }

        public String[] getEndTagEnders() {
            return new String[0];
        }
    }

In your main class:

    import org.htmlparser.Node;
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.PrototypicalNodeFactory;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.util.SimpleNodeIterator;

    PrototypicalNodeFactory factory = new PrototypicalNodeFactory(); // create a factory
    factory.registerTag(new ItalicTag()); // register your new tag
    try {
        Parser parser = new Parser(URL);
        parser.setNodeFactory(factory);
        NodeFilter tagFilter = new NodeClassFilter(ItalicTag.class);
        NodeList list = parser.extractAllNodesThatMatch(tagFilter);
        for (Node node : list.toNodeArray()) {
            // extract the content between the tags (<i> </i>)
            String texto = extractText(node);
        }
    } catch (ParserException ex) {
        // do something
    }

The extractText method:

    /**
     * Gets the text that is enclosed between tags. To do that,
     * it studies the child nodes of the tag recursively.
     *
     * @param studiedNode the node whose text is wanted
     * @return text between nested tags
     */
    public String extractText(Node studiedNode) {
        String text = "";
        boolean exit = false;
        NodeList children = studiedNode.getChildren();
        if (children == null) {
            // a node without children has nothing to extract
            return text;
        }
        for (SimpleNodeIterator e = children.elements(); e.hasMoreNodes() && !exit;) {
            Node node = e.nextNode();
            if (node instanceof CompositeTag) {
                text = extractText(node);
            } else {
                // stop at the first child that is not a composite tag
                if (node != null) {
                    text = node.getText();
                }
                exit = true;
            }
        }
        return text.trim();
    }

I hope this helps.
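The recursive method in the message above stops at the first child that is not a composite tag. If the goal is instead the full text of an element with mixed content, such as <i>some <b>bold</b> text</i>, one option is to append the text of every descendant. Below is a minimal, library-independent sketch of that idea; Node here is a hypothetical stand-in for a parsed node, not HTML Parser's Node interface, and NestedText and collectText are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NestedText {

    /** Hypothetical stand-in for a parsed node: a text leaf or an element with children. */
    static final class Node {
        final String text;                             // null for elements
        final List<Node> children = new ArrayList<>(); // empty for text leaves

        private Node(String text) {
            this.text = text;
        }

        static Node text(String value) {
            return new Node(value);
        }

        static Node element(Node... kids) {
            Node node = new Node(null);
            node.children.addAll(Arrays.asList(kids));
            return node;
        }
    }

    /** Concatenates the text of every descendant, in document order. */
    static String collectText(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString().trim();
    }

    private static void append(Node node, StringBuilder out) {
        if (node.text != null) {
            out.append(node.text);
            return;
        }
        for (Node child : node.children) {
            append(child, out);
        }
    }

    public static void main(String[] args) {
        // Models <i>some <b>bold</b> text</i>
        Node italic = Node.element(
                Node.text("some "),
                Node.element(Node.text("bold")),
                Node.text(" text"));
        System.out.println(collectText(italic)); // prints "some bold text"
    }
}
```

Passing a StringBuilder down the recursion avoids overwriting text gathered from earlier siblings, which is the main difference from returning a single string per call.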
From: Aniket P <ani...@gm...> - 2012-08-08 08:57:17
Hello all,

I am currently using htmlparser in my work. I want to extract the <script> </script> part of a page and, more specifically, the individual functions inside <script> </script>. After that I need to execute those functions. Can anyone please show me how to do that?
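For the first half of the question, listing the functions declared inside a script block, a regular expression over the script text can be enough for simple pages. The sketch below assumes the text between <script> and </script> is already available as a string. FunctionLister and declaredFunctions are hypothetical names, and only plain "function name(params)" declarations are recognised. Running the functions would need a separate JavaScript engine, which this sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Parser.
final class FunctionLister {

    // Matches "function name(params)"; function expressions and
    // arrow functions are deliberately not covered.
    private static final Pattern DECLARATION =
            Pattern.compile("\\bfunction\\s+([A-Za-z_$][\\w$]*)\\s*\\(([^)]*)\\)");

    /** Maps each declared function name to its parameter list, in source order. */
    static Map<String, String> declaredFunctions(String script) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = DECLARATION.matcher(script);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String script = "function f(a, b, c) { return a + b + c; }\n"
                + "function greet(name) { alert('hi ' + name); }";
        for (Map.Entry<String, String> entry : declaredFunctions(script).entrySet()) {
            System.out.println(entry.getKey() + " takes (" + entry.getValue() + ")");
        }
    }
}
```

A LinkedHashMap keeps the functions in the order they appear in the script, which makes the result easier to compare with the page source.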
From: jaaf64 <jaa...@zo...> - 2012-06-17 05:06:07
Thanks a lot. I eventually guessed so. But in that case, what about my problem: getEndTag() returns null for the h1, h2, strong, em and p tags (see the example below)?

On 16/06/2012 17:42, Derrick Oswald wrote:
> Composite simply means the tag acts as a nest for other tags, <html> ... </html>.
From: Derrick O. <der...@gm...> - 2012-06-16 15:42:08
Composite simply means the tag acts as a nest for other tags, <html> ... </html>.
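One way to picture "acts as a nest for other tags": the parser reads a flat sequence of start tags, text and end tags, and a composite tag is a node that collects everything up to its matching end tag as its children. The sketch below is a standalone illustration of that idea with hypothetical names (NestingSketch, Composite). It is not HTML Parser's implementation, and it does not explain why the library treats a given tag as composite or not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class NestingSketch {

    /** Hypothetical composite node: holds whatever appears between its start and end tag. */
    static final class Composite {
        final String name;
        final List<Object> children = new ArrayList<>(); // Composite or String (text)
        boolean hasEndTag;                               // set once the matching end tag is seen

        Composite(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name + children;
        }
    }

    /** Nests a flat token list (start tags, end tags, text) into a tree. */
    static Composite nest(List<String> tokens) {
        Composite root = new Composite("root");
        Deque<Composite> open = new ArrayDeque<>();
        open.push(root);
        for (String token : tokens) {
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                // Close the innermost open tag only if the names match.
                if (open.size() > 1 && open.peek().name.equals(name)) {
                    open.pop().hasEndTag = true;
                }
            } else if (token.startsWith("<") && token.endsWith("/>")) {
                // Self-closing tag such as <br/>: a child with nothing nested inside.
                open.peek().children.add(new Composite(token.substring(1, token.length() - 2)));
            } else if (token.startsWith("<")) {
                Composite tag = new Composite(token.substring(1, token.length() - 1));
                open.peek().children.add(tag);
                open.push(tag);
            } else {
                open.peek().children.add(token);
            }
        }
        return root;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "<p>", "Phrase 1 p2.", "<strong>", "Phrase in",
                "<em>", "bold", "</em>", "</strong>", "2 p2", "</p>");
        System.out.println(nest(tokens));
    }
}
```

In this model a tag that is never pushed onto the stack, like the self-closing case, has no end tag of its own, while a nested tag records its end tag when the matching close arrives.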
From: jaaf64 <jaa...@zo...> - 2012-06-15 17:23:32
(resent from the right address)

Hi everybody,

I am new to html parsing and I have trouble understanding the meaning of "composite tag". At first I thought that it was a tag such as <tagname></tagname>, as opposed to the non-composite version <tagname/>.

I wrote a short html file to check it. Here it is:

    <html>
    <head></head>
    <body>
    <br/>
    text just afer html
    <br/>
    <H1>Title 1 1</H1>
    Text 1
    <h2>Title 2</h2>
    <p>Phrase 1 p1. Phrase 2 p1</p>
    <br/>
    <p>Phrase 1 p2.<strong>Phrase in<em>bold</em> </strong> 2 p2</p>
    </body>
    </html>

Then I started my visitTag(Tag tag) method this way:

    public void visitTag(Tag tag) {
        System.out.println("Visit tag : " + tag.getTagName());
        // report whether the parser attached an end tag to this tag
        if (tag.getEndTag() != null) {
            System.out.println("getEndTag returns:" + tag.getEndTag().getTagName());
        } else {
            System.out.println("getEndTag returns null");
        }
    }

I get this:

    Visit tag : HTML
    getEndTag returns:HTML
    Visit tag : HEAD
    getEndTag returns:HEAD
    visit endTag :/head
    Visit tag : BODY
    getEndTag returns:BODY
    Visit tag : BR
    getEndTag returns null
    Visit tag : BR
    getEndTag returns null
    Visit tag : H1
    getEndTag returns null
    visit endTag :/H1
    Visit tag : H2
    getEndTag returns null
    visit endTag :/h2
    Visit tag : P
    getEndTag returns null
    visit endTag :/p
    Visit tag : BR
    getEndTag returns null
    Visit tag : P
    getEndTag returns null
    Visit tag : STRONG
    getEndTag returns null
    Visit tag : EM
    getEndTag returns null
    visit endTag :/em
    visit endTag :/strong
    visit endTag :/p
    visit endTag :/body
    11202: Info: Load project source files: 329ms
    visit endTag :/html

Apparently, getEndTag() returns null for the h1, h2, strong, em and p tags. It seems I don't understand the word "composite"! Could somebody help me?
From: Derrick O. <der...@gm...> - 2012-03-30 19:48:40
|
Oh, I see the problem now. You need the recursive flag as the second argument to extractAllNodesThatMatch:

    public NodeList extractAllNodesThatMatch (NodeFilter filter, boolean recursive)

On Wed, Mar 21, 2012 at 22:05, Randy Paries <rtp...@gm...> wrote:
> Hello,
>
> I have the snippet of HTML below and I need to get the content of
> the <h3 id="h3_2"> element. There are a bunch of these container divs
> with unique ids in my file. I can get the divs and their inner HTML
> just fine, but I cannot figure out how to get what's between the h3 tags.
>
> This snippet of code works for divs but not for the h3: it finds the h3
> with the correct id, but I cannot figure out how to get the innerHTML,
> i.e. what's between the <h3> tags.
>
> Thanks for any help.
>
>     // tag is the container_2 node
>     innerparser = new Parser();
>     innerparser.setInputHTML(tag.toHtml());
>     innerparser.setEncoding("UTF-8");
>     innerNodes = innerparser.extractAllNodesThatMatch(new TagNameFilter("h3"));
>     for (int x = 0; x < innerNodes.size(); x++) {
>         TagNode itag = (TagNode) innerNodes.elementAt(x);
>         String innerIdAttribute = itag.getAttribute("id");
>         if (innerIdAttribute != null && innerIdAttribute.equals("h3_" + num)) {
>             System.out.println("id-->" + innerIdAttribute);
>             h3Data = itag.toHtml();
>         }
>     }
>
>     <div class="container" id="container_2">
>         <h3 id="h3_2">Adding a few</h3>
>         <div class="maindiv" id="div_2">
>             ...new articles in here just to flesh it out.
>         </div><!--end of div_2-->
>         <div class="stardiv" id="star_2">
>             <a class="aEdit" href="javascript:editSection('div_2',2);"><img src="images/edit.png" border=0></a>
>             <a class="aDelete" href="javascript:deleteSection('container_2',2);"><img src="images/delete.png" border=0></a>
>         </div><!--end of star_2-->
>     </div>
|
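To illustrate what a recursive flag changes in general, here is a minimal sketch on a toy node tree shaped like the container_2 snippet. The `RecursiveSearchSketch` class, its `Node` type and the `find` and `sampleTree` methods are hypothetical and written only for this example; they are not the HTML Parser library's classes and make no claim about how the library implements extractAllNodesThatMatch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy node tree; NOT the HTML Parser library's classes. */
public class RecursiveSearchSketch {

    /** A hypothetical element with a name, an id, and child elements. */
    static final class Node {
        final String name;
        final String id;
        final List<Node> children = new ArrayList<>();

        Node(String name, String id) {
            this.name = name;
            this.id = id;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Collects nodes called `name`; descends into children only if `recursive`. */
    static List<Node> find(List<Node> nodes, String name, boolean recursive) {
        List<Node> out = new ArrayList<>();
        for (Node n : nodes) {
            if (n.name.equalsIgnoreCase(name)) {
                out.add(n);
            }
            if (recursive) {
                out.addAll(find(n.children, name, true));
            }
        }
        return out;
    }

    /** Builds a tree shaped like the container_2 snippet from the question. */
    static List<Node> sampleTree() {
        Node container = new Node("div", "container_2")
                .add(new Node("h3", "h3_2"))
                .add(new Node("div", "div_2"))
                .add(new Node("div", "star_2"));
        List<Node> top = new ArrayList<>();
        top.add(container);
        return top;
    }

    public static void main(String[] args) {
        List<Node> top = sampleTree();
        System.out.println(find(top, "h3", false).size()); // prints 0: top level only
        System.out.println(find(top, "h3", true).size());  // prints 1: whole tree
    }
}
```

In this toy, the h3 sits one level below the only top-level node, so the non-recursive search misses it and the recursive search finds it.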
From: Randy P. <rtp...@gm...> - 2012-03-21 21:05:09
|
Hello,

I have the snippet of HTML below and I need to get the content of the <h3 id="h3_2"> element. There are a bunch of these container divs with unique ids in my file. I can get the divs and their inner HTML just fine, but I cannot figure out how to get what's between the h3 tags.

This snippet of code works for divs but not for the h3: it finds the h3 with the correct id, but I cannot figure out how to get the innerHTML, i.e. what's between the <h3> tags.

Thanks for any help.

    // tag is the container_2 node
    innerparser = new Parser();
    innerparser.setInputHTML(tag.toHtml());
    innerparser.setEncoding("UTF-8");
    innerNodes = innerparser.extractAllNodesThatMatch(new TagNameFilter("h3"));
    for (int x = 0; x < innerNodes.size(); x++) {
        TagNode itag = (TagNode) innerNodes.elementAt(x);
        String innerIdAttribute = itag.getAttribute("id");
        if (innerIdAttribute != null && innerIdAttribute.equals("h3_" + num)) {
            System.out.println("id-->" + innerIdAttribute);
            h3Data = itag.toHtml();
        }
    }

    <div class="container" id="container_2">
        <h3 id="h3_2">Adding a few</h3>
        <div class="maindiv" id="div_2">
            ...new articles in here just to flesh it out.
        </div><!--end of div_2-->
        <div class="stardiv" id="star_2">
            <a class="aEdit" href="javascript:editSection('div_2',2);"><img src="images/edit.png" border=0></a>
            <a class="aDelete" href="javascript:deleteSection('container_2',2);"><img src="images/delete.png" border=0></a>
        </div><!--end of star_2-->
    </div>
|