htmlparser-user Mailing List for HTML Parser (Page 10)

Brought to you by: derrickoswald

htmlparser-user — The user mailing list for users of the htmlparser library


Re: [Htmlparser-user] Check for HTML file

From: Derrick O. <der...@gm...> - 2012-01-05 15:19:12

Hi Steve,

The HTTP header information can be inspected using the ConnectionMonitor
and/or ConnectionManager classes, but misconfigured or malicious web
servers often declare one MIME type in the header while serving a
different type in the content.

Derrick

On Wed, Jan 4, 2012 at 8:51 PM, Stefan Schindler <sch...@gm...> wrote:
> Hi,
> I was wondering, if there is the possibility to check, IF the file to
> inspect is a html file (and not, for instance, pdf).
>
> Greets,
> Steve
>
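
A minimal sketch of the check discussed above, using only the JDK rather than the htmlparser classes: it accepts a response as HTML only when the Content-Type header says so and the first bytes of the body do not contradict it. The class HtmlSniffer and its method names are hypothetical, and the body heuristics are an assumption, not an exhaustive detector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper (not part of htmlparser): guesses whether a response is HTML. */
final class HtmlSniffer {
    private HtmlSniffer() {}

    /** True if the Content-Type header names an HTML media type. */
    static boolean headerSaysHtml(String contentType) {
        if (contentType == null) return false;
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return type.equals("text/html") || type.equals("application/xhtml+xml");
    }

    /** True if the first bytes of the body look like markup rather than, say, a PDF. */
    static boolean bodyLooksLikeHtml(byte[] head) {
        String start = new String(head, StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        if (start.startsWith("%pdf-")) return false; // PDF files begin with "%PDF-"
        return start.startsWith("<!doctype html")
                || start.startsWith("<html")
                || start.contains("<head")
                || start.contains("<body");
    }

    /** Trust the header only when the body does not contradict it. */
    static boolean looksLikeHtml(String contentType, byte[] head) {
        return headerSaysHtml(contentType) && bodyLooksLikeHtml(head);
    }
}
```

The header value would come from URLConnection.getContentType() and the bytes from the start of the input stream; a server that labels a PDF as text/html is then rejected by the body check.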

Re: [Htmlparser-user] Help using StringBean

From: Derrick O. <der...@gm...> - 2012-01-05 15:15:22

Hi Ido,

The project is not very active any more. Version 2.1 was an upgrade
for people building with Maven, but had no substantial changes.
It's a remarkably stable project. There were over 60,000 downloads
last year and only 8 opened tickets.

It seems you have already started editing the code locally.
If you upload the patch (you seem to have a specific line in a
specific file) to the patches area, it can be tracked and others can
benefit from your effort.
In the fullness of time it may be incorporated into a release.
Alternatively, if you log a bug with the test case that is failing it
could also help.
   http://sourceforge.net/tracker/?group_id=24399

Derrick

On Wed, Jan 4, 2012 at 4:29 PM, Ido Barav <ido...@sy...> wrote:
> I'm trying to use stringbean to extract text from a short html.
>
>
>
> I have the following problem:
>
>    When looking at an html that starts with 1 letter in one paragraph, and
> then it ends and another paragraph starts, then a CR is not added.
>
> I think the carriagereturn adding function has a bug there (It should be an
> || instead of the second &&).
>
> My questions are:
>
> 1.       Is the project still active? I've seen a 2.1 version hidden
> somewhere, but can't see any update on the sourceforge update. (I don't want
> to start installing patches and editing the code locally).
>
> 2.       I actually wish to read an html and when encountering a text tag,
> extract the text from it, while using the text editing capabilities of
> StringBean. Is there any good way to do this?
>
>
>
> Thanks,
>
> Ido
>
>

[Htmlparser-user] Check for HTML file

From: Stefan S. <sch...@gm...> - 2012-01-04 19:51:22

Hi,
I was wondering, if there is the possibility to check, IF the file to
inspect is a html file (and not, for instance, pdf).

Greets,
Steve

[Htmlparser-user] Help using StringBean

From: Ido B. <ido...@sy...> - 2012-01-04 15:45:14

I'm trying to use StringBean to extract text from a short HTML document.

I have the following problem:
   When an HTML document starts with a paragraph containing a single letter, and that paragraph ends and another one starts, no CR is added between them.
I think the carriage-return function has a bug there (the second && should be an ||).
My questions are:

1.       Is the project still active? I've seen a 2.1 version hidden somewhere, but can't see any update on SourceForge. (I don't want to start installing patches and editing the code locally.)

2.       I actually want to read an HTML document and, when encountering a text tag, extract the text from it while using the text-editing capabilities of StringBean. Is there a good way to do this?

Thanks,
Ido

Re: [Htmlparser-user] performing onclick

From: Jessop, I. R <isa...@hp...> - 2011-10-17 21:04:35

The onclick event triggers JavaScript, in this case a function called redirectUrl,
and passes it a reference to the HTML element, in this case the image tag.

My guess (and it is only a guess, as I don't have the source of the page you are parsing)
is that this function handles all the image clicks on the page and uses the reference passed to determine which URL to redirect to.

In order to "perform the onclick" you would need to parse out the
JavaScript function, determine what action it would take (a URL redirect) when this image is clicked, and then do the redirect yourself.

Isaac Jessop


From: tubin gen [mailto:fac...@gm...]
Sent: Monday, October 17, 2011 1:51 PM
To: htm...@li...
Subject: [Htmlparser-user] performing onclick

I was using HTML Parser to parse some HTML. My HTML has an image;
here is the HTML:

<img src="Repository/Movie%20Section/Telugu%20Movies/Gundamma%20G-Gundamma%20Gari%20Krishnulu%20VCD_T.jpg" id="Movies_dlMovies_ctl14_imgMovieImage" class="imgStyle" onclick="javascript:return redirectUrl(this);" alt="Gundamma Gari Krishnulu">


This img tag has an onclick handler, so when I click the image a new page is opened whose URL is not in the HTML but is generated by the function. Using htmlparser, can I perform the onclick on this image?

If not, what library should I use to perform the onclick?
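
As a first step towards the approach described in the reply, the handler can at least be located without running any JavaScript. The sketch below uses only java.util.regex on the tag's source text; OnclickInspector and its methods are hypothetical names, and the regular expressions assume a simple quoted attribute like the one in the question. It reports which function the handler calls, not what that function does.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the function named in an onclick attribute. */
final class OnclickInspector {
    // Captures the value of onclick="..." or onclick='...'.
    private static final Pattern ONCLICK = Pattern.compile(
            "\\bonclick\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Captures the first identifier that is directly followed by "(".
    private static final Pattern CALL = Pattern.compile("([A-Za-z_$][\\w$]*)\\s*\\(");

    private OnclickInspector() {}

    /** Returns the raw handler text, if the tag has an onclick attribute. */
    static Optional<String> onclickValue(String tagHtml) {
        Matcher m = ONCLICK.matcher(tagHtml);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Returns the name of the first function the handler calls, if any. */
    static Optional<String> calledFunction(String tagHtml) {
        Optional<String> js = onclickValue(tagHtml);
        if (!js.isPresent()) return Optional.empty();
        Matcher m = CALL.matcher(js.get());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For the img tag in the question this yields redirectUrl; the body of that function still has to be read (or executed by a tool with a JavaScript engine) to learn the target URL.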

[Htmlparser-user] performing onclick

From: tubin g. <fac...@gm...> - 2011-10-17 20:51:18

I was using HTML Parser to parse some HTML. My HTML has an image;
here is the HTML:

<img
src="Repository/Movie%20Section/Telugu%20Movies/Gundamma%20G-Gundamma%20Gari%20Krishnulu%20VCD_T.jpg"
id="Movies_dlMovies_ctl14_imgMovieImage" class="imgStyle"
onclick="javascript:return redirectUrl(this);" alt="Gundamma Gari
Krishnulu">


This img tag has an onclick handler, so when I click the image a new page
is opened whose URL is not in the HTML but is generated by the function.
Using htmlparser, can I perform the onclick on this image?

If not, what library should I use to perform the onclick?

Re: [Htmlparser-user] Help

From: Asutosh P. <as...@gm...> - 2011-10-04 05:32:37

Hi Zhou Yang,
I don't know what your requirement is, but I used HTML Parser to get
the body content of an HTML file.

The code sample is below. I hope it helps.






******************************************************************

    // logger and CLASS_NAME are assumed to be fields of the enclosing class.
    public static String getBodyOfResumeAsText(String path) {
        final String METHOD_NAME = "getBodyOfResumeAsText :";
        String plainText = "";
        NodeFilter filter = null;
        Parser parser = null;
        try {
            // Parse the file or URL at 'path' and keep only the <body> element.
            parser = new Parser(path);
            filter = new TagNameFilter("body");
            plainText = parser.parse(filter).asString();
            // Turn every whitespace character into a space, then squeeze runs of spaces.
            plainText = plainText.replaceAll("\\r\\n|\\r|\\n|\\s|\\s+", " ");
            plainText = plainText.replaceAll(" {2,}", " ");
            logger.debug(CLASS_NAME + METHOD_NAME + ":generating plainText for :" + path);
            logger.debug(CLASS_NAME + METHOD_NAME + ":plainText :" + plainText);
        } catch (Exception e) {
            e.printStackTrace();
        }

        return plainText;
    }

******************************************************************

2011/10/3 <zho...@si...>

> Hello,
>
> There is a sentence, "Although some example programs are provided that
> may be useful as they stand...", on the HTML Parser home page. But I can't
> find the example programs on the HTML Parser web site. Could you send me a
> link or the examples' source? Thank you very much.
>
> I'm a Chinese student and my English is poor; if there is something
> wrong in my email, please forgive me. I will try my best to improve my
> English.
>
> I think HTML Parser is great; it has helped me so much. Thank you very
> much again.
>
>                      Zhou Yang
>
>                   Oct 3 2011
>
>
Thanks & Regards

Asutosh .
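
The whitespace clean-up in the sample above can be expressed as a single pass. This is a small sketch using only java.lang.String; the class name Whitespace is hypothetical. Because \s already covers spaces, tabs, carriage returns and line feeds, replacing each run of \s with one space should give the same result as converting each whitespace character to a space and then squeezing repeated spaces.

```java
/** Hypothetical helper: collapses every run of whitespace into one space. */
final class Whitespace {
    private Whitespace() {}

    static String collapse(String text) {
        // "\\s+" matches one or more spaces, tabs, CRs or LFs in a row.
        return text.replaceAll("\\s+", " ");
    }
}
```

Leading and trailing whitespace is reduced to a single space rather than removed; call trim() on the result if that matters.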

Re: [Htmlparser-user] Help

From: Derrick O. <der...@gm...> - 2011-10-03 19:00:44

Just check for:
    public static void main (String[] args)
signatures in the source code.

2011/10/3 <zho...@si...>

> Hello,
>
> There is a sentence, "Although some example programs are provided that
> may be useful as they stand...", on the HTML Parser home page. But I can't
> find the example programs on the HTML Parser web site. Could you send me a
> link or the examples' source? Thank you very much.
>
> I'm a Chinese student and my English is poor; if there is something
> wrong in my email, please forgive me. I will try my best to improve my
> English.
>
> I think HTML Parser is great; it has helped me so much. Thank you very
> much again.
>
>                      Zhou Yang
>
>                   Oct 3 2011
>
>
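
One way to act on that suggestion is to scan the unpacked source tree for main methods. The following is a sketch using only the JDK; MainFinder, hasMain and filesWithMain are hypothetical names, and the pattern is an assumption that covers the common "public static void main (String[] args)" and "String args[]" spellings but not every legal variant (for example "static public").

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical helper: lists Java source files that declare a main method. */
final class MainFinder {
    private static final Pattern MAIN = Pattern.compile(
            "\\bpublic\\s+static\\s+void\\s+main\\s*\\(\\s*String\\b[^)]*\\)");

    private MainFinder() {}

    /** True if the given source text contains a main-method signature. */
    static boolean hasMain(String javaSource) {
        return MAIN.matcher(javaSource).find();
    }

    /** Walks 'root' and returns every .java file whose text contains a main method. */
    static List<Path> filesWithMain(Path root) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p) && p.toString().endsWith(".java")) {
                    String text = new String(Files.readAllBytes(p), StandardCharsets.ISO_8859_1);
                    if (hasMain(text)) hits.add(p);
                }
            }
        }
        return hits;
    }
}
```

Each file reported this way is a candidate example program that can be run from the command line.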

[Htmlparser-user] Help

From: <zho...@si...> - 2011-10-03 13:37:27

Hello,
     There is a sentence, "Although some example programs are provided that may be useful as they stand...", on the HTML Parser home page. But I can't find the example programs on the HTML Parser web site. Could you send me a link or the examples' source? Thank you very much.
    I'm a Chinese student and my English is poor; if there is something wrong in my email, please forgive me. I will try my best to improve my English.
    I think HTML Parser is great; it has helped me so much. Thank you very much again.
                     Zhou Yang
                  Oct 3 2011

Re: [Htmlparser-user] parser help

From: Derrick O. <der...@gm...> - 2011-08-18 18:40:42

Did you try the StringBean?
Same code except:
    StringBean visitor = new StringBean ();
    parser.visitAllNodesWith (visitor);
    String textInPage = visitor.getStrings ();

Or you can use some of its other facilities (it will make its own
parser if you don't supply one), as shown in the mainline:
    StringBean sb = new StringBean ();
    sb.setLinks (false);
    sb.setReplaceNonBreakingSpaces (true);
    sb.setCollapse (true);
    sb.setURL (args[0]);
    System.out.println (sb.getStrings ());



On Wed, Aug 17, 2011 at 10:25 PM, ernest cronin <ern...@gm...> wrote:

> Hi,
>
> I have been trying to use the parser for some time and I have been unable
> to get it to do exactly what I want, which is to gather only the plaintext
> without javascript or style stuff. Here is the code I've been running:
>
>   public class Test
>    {
>       public static void main (String[] args)
>       {
>          try
>          {
>             Parser parser = new Parser (args[0]);
>      TextExtractingVisitor visitor = new TextExtractingVisitor();
>     parser.visitAllNodesWith(visitor);
>       String textInPage = visitor.getExtractedText();
>    System.out.println(textInPage);
>          }
>             catch (ParserException pe)
>             {
>                pe.printStackTrace ();
>             }
>       }
>     }
>
> I could really use some help with this!
>
> Thanks,
> Ernest
>
>

[Htmlparser-user] parser help

From: ernest c. <ern...@gm...> - 2011-08-17 20:25:40

Hi,

I have been trying to use the parser for some time and I have been unable to
get it to do exactly what I want, which is to gather only the plaintext
without javascript or style stuff. Here is the code I've been running:

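   import org.htmlparser.Parser;
   import org.htmlparser.util.ParserException;
   import org.htmlparser.visitors.TextExtractingVisitor;

   public class Test
   {
      public static void main (String[] args)
      {
         try
         {
            // args[0] is the URL or file to parse.
            Parser parser = new Parser (args[0]);
            TextExtractingVisitor visitor = new TextExtractingVisitor ();
            parser.visitAllNodesWith (visitor);
            String textInPage = visitor.getExtractedText ();
            System.out.println (textInPage);
         }
         catch (ParserException pe)
         {
            pe.printStackTrace ();
         }
      }
   }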

I could really use some help with this!

Thanks,
Ernest

Re: [Htmlparser-user] EncodingChangeException: character mismatch

From: Derrick O. <der...@gm...> - 2011-08-08 18:11:34

I don't think it's possible to help without a stack trace.
Are you sure you are checking for null if there are no links returned?

On Mon, Aug 8, 2011 at 4:08 PM, Krishna Arjun <kri...@gm...> wrote:

> Marcin <bigger <at> op.pl> writes:
>
> >
> > Dear Derrick,
> >
> > > >I get the following error:
> > > >
> > > >org.htmlparser.util.EncodingChangeException: character mismatch (new:
> ?
> > !=
> > > >old:
> > > >¬) for encoding change from ISO-8859-2 to ISO-8859-1 at character
> offset
> > > >4162
> > > >Output from LinkExtractor example.
> > > >
> > > >If I'll try-catch it I won't get any resoult. What can I do with it?
> >
> > > The exception is thrown because some of the nodes already given out are
> > > in error.  You can try a second time after discarding the information
> > > you've gained so far, like StringBean does:
> >
> > Thank you for answer but I it's no good solution :( Please try LinkBean
> > example with that code:
> >
> > import java.net.URL;
> > import org.htmlparser.beans.LinkBean;
> >
> > public class LinkDemo
> > {
> >     public static void main (String[] args)
> >     {
> >         LinkBean lb = new LinkBean ();
> >         lb.setURL ("http://www.puszta.pl");
> >         URL[] urls = lb.getLinks ();
> >         for (int i = 0; i < urls.length; i++)
> >             System.out.println (urls[i]);
> >     }
> > }
> >
> > Exception in thread "main" java.lang.NullPointerException
> >         at LinkDemo.main(LinkDemo.java:11)
> >
> > I can deal with that page with low level lexer but there must by a way to
> > extract links from pages with mixed up encodings with NodeVisitor. Is it?
> >
> > Greets,
> > B
> >
> >
>
>
> hi,
>
> this is regarding java.lang.nullpointerException
>
> i am extracting urls using LinkBean
>
> LinkBean lb = new LinkBean ();
> lb.setURL ("http://www.puszta.pl");
> URL[] urls = lb.getLinks ();
>
> Instead of "http://www.puszta.pl" i am giving input from DB. Here am
> repeatedly
> executing the above code to extract urls of given website name from DB. In
> this
> case, its get executing well for around 1500 inputs when it goes more than
> that
> it throws java.lang.nullpointerException error.
>
> I am trying to fix this problem since last one week but i didn't get. I
> shall be
> grateful to you if you provide me solution for this.
>
> Thank indeed,,,
>
>
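
The null check suggested in the reply can be made concrete. The quoted stack trace points at the loop that reads urls.length, which is consistent with getLinks() handing back null when a page cannot be fetched or parsed; that reading is an assumption here. The sketch below is plain JDK code, and LinkArrays.orEmpty is a hypothetical helper, not part of htmlparser.

```java
import java.net.URL;

/** Hypothetical helper: lets callers loop over a link array that may be null. */
final class LinkArrays {
    private static final URL[] NONE = new URL[0];

    private LinkArrays() {}

    /** Returns the array unchanged, or an empty array when it is null. */
    static URL[] orEmpty(URL[] links) {
        return links == null ? NONE : links;
    }
}
```

A loop written as "for (URL u : LinkArrays.orEmpty (lb.getLinks ()))" then simply does nothing for a failed page, and the failure can be logged with the offending input instead of stopping a batch of 1500 sites.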

Re: [Htmlparser-user] EncodingChangeException: character mismatch

From: Krishna A. <kri...@gm...> - 2011-08-08 14:15:31

Marcin <bigger <at> op.pl> writes:

> 
> Dear Derrick,
> 
> > >I get the following error:
> > >
> > >org.htmlparser.util.EncodingChangeException: character mismatch (new: ?
> !=
> > >old:
> > >¬) for encoding change from ISO-8859-2 to ISO-8859-1 at character offset
> > >4162
> > >Output from LinkExtractor example.
> > >
> > >If I'll try-catch it I won't get any resoult. What can I do with it?
> 
> > The exception is thrown because some of the nodes already given out are
> > in error.  You can try a second time after discarding the information
> > you've gained so far, like StringBean does:
> 
> Thank you for answer but I it's no good solution :( Please try LinkBean
> example with that code:
> 
> import java.net.URL;
> import org.htmlparser.beans.LinkBean;
> 
> public class LinkDemo
> {
>     public static void main (String[] args)
>     {
>         LinkBean lb = new LinkBean ();
>         lb.setURL ("http://www.puszta.pl");
>         URL[] urls = lb.getLinks ();
>         for (int i = 0; i < urls.length; i++)
>             System.out.println (urls[i]);
>     }
> }
> 
> Exception in thread "main" java.lang.NullPointerException
>         at LinkDemo.main(LinkDemo.java:11)
> 
> I can deal with that page with low level lexer but there must by a way to
> extract links from pages with mixed up encodings with NodeVisitor. Is it?
> 
> Greets,
> B
> 
> 


Hi,

This is regarding the java.lang.NullPointerException.

I am extracting URLs using LinkBean:

LinkBean lb = new LinkBean ();
lb.setURL ("http://www.puszta.pl");
URL[] urls = lb.getLinks ();

Instead of "http://www.puszta.pl" I am supplying input from a database, repeatedly
executing the above code to extract the URLs for each website name stored there.
It runs fine for around 1500 inputs, but beyond that it throws a
java.lang.NullPointerException.

I have been trying to fix this problem for the past week without success. I would be
grateful if you could provide a solution.

Thanks indeed.

[Htmlparser-user] Using Proxy Configuration

From: Duh <edu...@ho...> - 2011-08-01 21:24:36

Hello, I've been trying to use the SiteCapturer with proxy settings. To do so I use this:
        ConnectionManager manager = new ConnectionManager ();
        manager.setProxyHost("...");
        manager.setProxyPort(8080);
        manager.setProxyUser("...");
        manager.setProxyPassword("...");
        mParser.setConnectionManager(manager);
But all I've got so far is this message: org.htmlparser.util.ParserException: Connection timed out: connect;
java.net.ConnectException: Connection timed out: connect.

How should I proceed to use the SiteCapturer application with a proxy?

Thanks
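
One way to narrow down a timeout like this is to test the proxy with plain java.net, outside htmlparser. The sketch below is only a diagnostic aid under stated assumptions: ProxyProbe and its methods are hypothetical, the host name in the usage note is a placeholder, and proxy authentication is not covered. If this probe also times out, the problem lies with the proxy address or the network rather than with SiteCapturer.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

/** Hypothetical diagnostic helper: fetches a URL through an explicit HTTP proxy. */
final class ProxyProbe {
    private ProxyProbe() {}

    /** Builds an HTTP proxy descriptor without resolving the host name yet. */
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Returns the HTTP status code obtained for 'url' when going through 'proxy'. */
    static int statusVia(Proxy proxy, String url) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection(proxy);
        c.setConnectTimeout(5000); // fail fast instead of waiting for the OS default
        c.setReadTimeout(5000);
        try {
            return c.getResponseCode();
        } finally {
            c.disconnect();
        }
    }
}
```

Usage would look like ProxyProbe.statusVia (ProxyProbe.httpProxy ("proxy.example.com", 8080), "http://htmlparser.sourceforge.net/"), with the real proxy host substituted for the placeholder.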

Re: [Htmlparser-user] Can't extract any div's, redux.

From: Derrick O. <der...@gm...> - 2011-07-31 13:51:21

Using the FilterBuilder tool <http://htmlparser.sourceforge.net/samples.html>
is a good way to play with filters.
Using that for a minute I got this code which fetches your storybook text:

import org.htmlparser.*;
import org.htmlparser.filters.*;
import org.htmlparser.beans.*;
import org.htmlparser.util.*;

public class StorytextFilter
{
    public static void main (String args[])
    {
        TagNameFilter filter0 = new TagNameFilter ();
        filter0.setName ("DIV");
        HasAttributeFilter filter1 = new HasAttributeFilter ();
        filter1.setAttributeName ("id");
        filter1.setAttributeValue ("storytext");
        NodeFilter[] array0 = new NodeFilter[2];
        array0[0] = filter0;
        array0[1] = filter1;
        AndFilter filter2 = new AndFilter ();
        filter2.setPredicates (array0);
        NodeFilter[] array1 = new NodeFilter[1];
        array1[0] = filter2;
        FilterBean bean = new FilterBean ();
        bean.setFilters (array1);
        if (0 != args.length)
        {
            bean.setURL (args[0]);
            System.out.println (bean.getNodes ().toHtml ());
        }
        else
            System.out.println ("Usage: java -classpath .;htmlparser.jar;htmllexer.jar StorytextFilter <url>");
    }
}


Then you can apply the StringBean to the NodeList using the visitor
pattern.


2011/7/30 Jan Sokołowski <net...@gm...>

> Thanks for answering! However, I'm afraid it didn't help me much :(
>
> So, all I've changed in the code is the nodeFilter object ( now
> constructed as new AndFilter(new TagNameFilter("div"),new
> HasAttributeFilter("storytext")); )
> Then, I do the
> for(NodeIterator e = parser.elements(); e.hasMoreNodes();){
>                e.nextNode().collectInto(nodeList, nodeFilter);
>            }
>
> And according to nodeLIst.toNodeArray().lenght, there are no matching
> nodes.
>
> Therefore, I don't have anything to pass to anything you've said, not
> to mention I don't know, for example, what a StringBean is (that
> means, I've read the javadoc on your page, but I don't have the
> foggiest idea how to use it there) (And why couldn't I use the
> toPlainTextString() method? I'd like to get the inner HTML of div
> without removing any tags there, which StringBean removes, as I've
> noticed, unless I've misunderstood it) :(
> I'd be very thankful if you could elaborate more on what should I do
> there to make it work, please.
>
> By the way, how do I respond to the posts on that mailing list? I
> can't find the response option anywhere?
>
>
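
A likely reason the filter in the quoted message matched nothing: as far as the htmlparser Javadoc describes it, the one-argument HasAttributeFilter constructor takes an attribute name, so HasAttributeFilter("storytext") looks for an attribute called storytext, whereas the generated code above asks for an attribute named id with the value storytext. The sketch below models that difference with plain java.util.function.Predicate over an attribute map; it is not the htmlparser API, AttributeFilters is a hypothetical name, and case-insensitive attribute names are ignored for brevity.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical model of "has attribute" versus "has attribute with value". */
final class AttributeFilters {
    private AttributeFilters() {}

    /** Accepts tags that carry an attribute with this name, whatever its value. */
    static Predicate<Map<String, String>> hasAttribute(String name) {
        return attrs -> attrs.containsKey(name);
    }

    /** Accepts tags whose attribute 'name' has exactly this value. */
    static Predicate<Map<String, String>> hasAttribute(String name, String value) {
        return attrs -> value.equals(attrs.get(name));
    }
}
```

For a tag like <div id="storytext">, the first form called with "storytext" rejects the tag, while the second form called with "id" and "storytext" accepts it; combining it with a tag-name test through Predicate.and mirrors what AndFilter does.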

[Htmlparser-user] Can't extract any div's, redux.

From: Jan S. <net...@gm...> - 2011-07-30 20:21:07

Thanks for answering! However, I'm afraid it didn't help me much :(

So, all I've changed in the code is the nodeFilter object ( now
constructed as new AndFilter(new TagNameFilter("div"),new
HasAttributeFilter("storytext")); )
Then, I do the
for(NodeIterator e = parser.elements(); e.hasMoreNodes();){
                e.nextNode().collectInto(nodeList, nodeFilter);
            }

And according to nodeList.toNodeArray().length, there are no matching nodes.

Therefore, I don't have anything to pass to anything you've said, not
to mention I don't know, for example, what a StringBean is (that
means, I've read the javadoc on your page, but I don't have the
foggiest idea how to use it there) (And why couldn't I use the
toPlainTextString() method? I'd like to get the inner HTML of div
without removing any tags there, which StringBean removes, as I've
noticed, unless I've misunderstood it) :(
I'd be very thankful if you could elaborate more on what should I do
there to make it work, please.

By the way, how do I respond to the posts on that mailing list? I
can't find the response option anywhere?

Re: [Htmlparser-user] Problem with HTMLParser - I can't extract any div's.

From: Derrick O. <der...@gm...> - 2011-07-30 06:14:22

You should maybe filter for new AndFilter (new TagNameFilter("div"), new
HasAttributeFilter("storytext"))
and then pass the resulting (single) node to the StringBean for extracting
the text:
nodelist.visitAllNodesWith (stringbean)
The contents of the string bean after that should be the text you're looking
for.

2011/7/29 Jan Sokołowski <net...@gm...>

> I've got a small problem there, and I'd like to ask you to help me, please.
> Ok, so I'm trying to use HTMLParser in my project, and there's the problem
> -
> Example page that I'm trying to process:
> http://www.fanfiction.net/s/7229512/1/A_Horse_With_No_Name
>
> Looking at the source code, there's a div with id and class
> 'storytext' within a div with id and class 'storytextp', and there's a
> lot of <p> tags within the 'storytext' div. I want to extract the
> contents of that 'storytext' div to plain text string.
> That's what I'm trying to do:
>          NodeList nodeList = new NodeList();
>            NodeFilter nodeFilter = new AndFilter(new
> TagNameFilter("div"),new HasChildFilter(new TagNameFilter("p")));
>
>            for(NodeIterator e = parser.elements(); e.hasMoreNodes();){
>                e.nextNode().collectInto(nodeList, nodeFilter);
>            }
>
>            System.out.println(nodeList.toNodeArray().length);
>
>            for(Node node : nodeList.toNodeArray()){
>                System.out.println(node.toPlainTextString());
>            }
>
> The result? Lenght of nodeList.toNodeArray is equal to zero.
> Therefore, it means that I'm screwing something up there. I also tried
> using RegexFilter("storytext"), but this isn't working anyway.
> The question is, how should I do it?
> Please, help, I've been trying to run it past the last week :p
>
>

[Htmlparser-user] Problem with HTMLParser - I can't extract any div's.

From: Jan S. <net...@gm...> - 2011-07-29 07:44:41

I've got a small problem there, and I'd like to ask you to help me, please.
Ok, so I'm trying to use HTMLParser in my project, and there's the problem -
Example page that I'm trying to process:
http://www.fanfiction.net/s/7229512/1/A_Horse_With_No_Name

Looking at the source code, there's a div with id and class
'storytext' within a div with id and class 'storytextp', and there's a
lot of <p> tags within the 'storytext' div. I want to extract the
contents of that 'storytext' div to plain text string.
That's what I'm trying to do:
            NodeList nodeList = new NodeList();
            NodeFilter nodeFilter = new AndFilter(new TagNameFilter("div"),
                    new HasChildFilter(new TagNameFilter("p")));

            for(NodeIterator e = parser.elements(); e.hasMoreNodes();){
                e.nextNode().collectInto(nodeList, nodeFilter);
            }

            System.out.println(nodeList.toNodeArray().length);

            for(Node node : nodeList.toNodeArray()){
                System.out.println(node.toPlainTextString());
            }

The result? The length of nodeList.toNodeArray() is zero.
Therefore, I must be doing something wrong there. I also tried
using RegexFilter("storytext"), but that doesn't work either.
The question is, how should I do it?
Please help, I've been trying to get this running for the past week :p

790 messages has been excluded from this view by a project administrator.
