htmlparser-developer Mailing List for HTML Parser (Page 33)

Brought to you by: derrickoswald

htmlparser-developer — The developer mailing list of the htmlparser project

[Htmlparser-developer] HTMLParser 1.0 released

From: Somik R. <so...@ya...> - 2002-01-03 20:04:50

Hi Folks,
    A new year present - HTMLParser 1.0 is released. We've finally made
the transition from alpha to a beta stage. Modifications henceforth
would only be of a maintenance nature and the API should remain constant.
    There are huge changes in the architecture, and lots of bug fixes.
Thanks a lot to Kaarle Kaila for some great support and ideas. Thanks
also to Rodney Foley, for some nice ideas for improvement. And thanks to
everyone else who's been supporting this project.
    Looking forward to your continuing support, and wishing you a very
happy new year.

Cheers,
Somik

Re: [Htmlparser-developer] Bugs fixed - pls check for release version

From: Somik R. <so...@ya...> - 2001-12-26 04:48:30

Merry Christmas to all (I do have it on my schedule :)

> I don't know if I have much to give here, but I would
> remind you to check that the binary file is OK and that its name is
> OK. I once suggested calling it HTMLParser.jar and not
> Parse.jar, as that is very close to the common XML-parser filename
> (Parser.jar) and does not say anything about what it is.

Thanks for the tips - I am planning to do a thorough job this time. And I
agree with you - changing the name is a good idea.

Regards,
Somik




Re: [Htmlparser-developer] Bugs fixed - pls check for release version

From: Kaarle K. <kaa...@ik...> - 2001-12-25 23:28:53

At 16:27 25.12.2001 +0900, Somik Raha wrote:
>Hi Folks,

hi!

And Merry Christmas for those of you who have it in your schedule!

I don't know if I have much to give here, but I would
remind you to check that the binary file is OK and that its name is
OK. I once suggested calling it HTMLParser.jar and not
Parse.jar, as that is very close to the common XML-parser filename
(Parser.jar) and does not say anything about what it is.

In 0.98, Parse.jar also had files in the wrong classes.

regards
Kaarle

>     The two bugs that I mentioned in my last mail are fixed. The Robot
> crawler now crawls thru Google very comfortably. The problem was the
> inclusion of placeholder images (which don't use any real image files).
> Also, a big bug in HTMLStyleScanner has been fixed - yahoo is getting
> parsed fine.
>     And a big internal change - I have incorporated parseParameters()
> (written by Kaarle Kaila) finally, and it works great with the Image
> and Link scanners. It has made the code of both the scanners much
> simpler to read.
>     Thanks Kaarle!
>
>     This is it for the release version. I need some help to make a
> decent release - I want to create proper docs this time. If you folks
> can pitch in, I'd be very grateful. Also, pls go thru the code, and see
> if u can find any glaring bugs or changes. CVS is updated - more
> testcases are added and all are passing.
>
> Cheers,
> Somik

---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844

[Htmlparser-developer] Bugs fixed - pls check for release version

From: Somik R. <so...@ya...> - 2001-12-25 07:32:33

Hi Folks,
    The two bugs that I mentioned in my last mail are fixed. The Robot
crawler now crawls thru Google very comfortably. The problem was the
inclusion of placeholder images (which don't use any real image files).
Also, a big bug in HTMLStyleScanner has been fixed - yahoo is getting
parsed fine.
    And a big internal change - I have incorporated parseParameters()
(written by Kaarle Kaila) finally, and it works great with the Image
and Link scanners. It has made the code of both the scanners much
simpler to read.
    Thanks Kaarle!

    This is it for the release version. I need some help to make a
decent release - I want to create proper docs this time. If you folks
can pitch in, I'd be very grateful. Also, pls go thru the code, and see
if u can find any glaring bugs or changes. CVS is updated - more
testcases are added and all are passing.

Cheers,
Somik

[Htmlparser-developer] Big Architecture Overhaul

From: Somik R. <so...@ya...> - 2001-12-24 09:35:04

Hi Folks,
    I have finally fulfilled my promise - a major overhaul of the design
is done - all test cases are passing. I have updated the latest code on
CVS. I've tried to keep the interface consistent, so user applications
won't break. The changes are mainly internal. However, one big change is
that you need to call registerScanners() on the parser object.
    No more confusing anonymous scanner registration. You can register
a scanner by calling parser.addScanner(some scanner object), and also
remove it again.
    Was able to do all this within an hour (thanks to the test cases).
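
A minimal, self-contained sketch of what explicit registration of this kind could look like. TagScanner, NamedScanner and ScannerRegistry are hypothetical names used only for illustration; they are not the HTMLParser classes, and only the add/remove idea is taken from the message above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tag scanner; not the HTMLParser interface.
interface TagScanner {
    boolean accepts(String tagName);
}

// Hypothetical scanner that recognises exactly one tag name.
final class NamedScanner implements TagScanner {
    private final String name;

    NamedScanner(String name) {
        this.name = name;
    }

    public boolean accepts(String tagName) {
        return name.equalsIgnoreCase(tagName);
    }
}

// Hypothetical parser shell that models only the scanner registry.
final class ScannerRegistry {
    private final List<TagScanner> scanners = new ArrayList<TagScanner>();

    // Registers a scanner object explicitly.
    void addScanner(TagScanner scanner) {
        scanners.add(scanner);
    }

    // Returns true if the scanner was registered and has now been removed.
    boolean removeScanner(TagScanner scanner) {
        return scanners.remove(scanner);
    }

    // True if any registered scanner claims the given tag.
    boolean handles(String tagName) {
        for (TagScanner s : scanners) {
            if (s.accepts(tagName)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the caller holds a reference to each scanner it registers, the same object can later be passed to removeScanner, which is not possible when scanners are created anonymously inside the parser.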

    Bad news though - I discovered two bugs (which, I verified, existed
earlier):

[1] When scanning yahoo.com, the parser goes into an infinite loop.
[2] In extractImageLocn(), there seems to be some problem in parsing
    dynamic links, in constructing relative paths.

Also extractImageLocn is badly in need of refactoring.
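
For readers wondering what "constructing relative paths" involves: the sketch below resolves a link against the URL of the page it was found on, using java.net.URI from the current Java standard library. This is not how extractImageLocn() works; the class name RelativeLinks and the example.com URLs are made up for illustration.

```java
import java.net.URI;

final class RelativeLinks {
    // Resolves a possibly relative link against the URL of the page it came from.
    // Absolute links are returned unchanged. URI.create throws
    // IllegalArgumentException for input that is not a valid URI (e.g. raw spaces).
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/dir/page.html";
        // prints http://www.example.com/img/logo.gif
        System.out.println(resolve(page, "../img/logo.gif"));
        // prints http://www.example.com/dir/view?id=3
        System.out.println(resolve(page, "view?id=3"));
        // prints http://other.example.com/a.gif
        System.out.println(resolve(page, "http://other.example.com/a.gif"));
    }
}
```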

I think we can look forward to a release of HTMLParser 1.0 pretty soon
with these two bugs fixed, and also incorporating parseParameters inside
the Scanners' logic. Looking forward to your comments (bug findings) and
help.

Cheers,
Somik

[Htmlparser-developer] Architecture Modified

From: Somik R. <so...@ya...> - 2001-11-13 16:56:19

Hi folks,
    I have modified the architecture, to include the change I spoke of
last. Now, the parser throws an exception if no scanners have been
registered. This feature can be turned off by setting a boolean flag,
but by default it is set to true.
    Also, a static method called registerScanners is now available in
HTMLParser, which will register some of the common scanners.
    Hopefully, this will alleviate much of the confusion being caused by
the scanner registration process.
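
A rough sketch of the fail-fast check described above, written as a self-contained class. CheckedParser, setRequireScanners, the exception type and the scanner names are assumptions made for illustration; the message does not say what the real flag or exception are called, nor what signature the static registerScanners method has.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser shell; only the registration check is modelled.
final class CheckedParser {
    private final List<String> scanners = new ArrayList<String>();
    // Mirrors the described default: the check is on unless switched off.
    private boolean requireScanners = true;

    void setRequireScanners(boolean require) {
        requireScanners = require;
    }

    void addScanner(String scannerName) {
        scanners.add(scannerName);
    }

    // Convenience helper that registers a few common scanners (names are made up).
    static void registerScanners(CheckedParser parser) {
        parser.addScanner("link");
        parser.addScanner("image");
    }

    // Stand-in for the parse entry point: fails fast when nothing is registered.
    // Returns the number of scanners that would take part in parsing.
    int parse() {
        if (requireScanners && scanners.isEmpty()) {
            throw new IllegalStateException(
                "No scanners registered; call registerScanners() first");
        }
        return scanners.size();
    }
}
```

Failing loudly by default turns a silent "nothing was recognised" result into an immediate, explainable error, while the flag keeps the scanner-less mode available to callers who want raw tags.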

Regards,
Somik

[Htmlparser-developer] parseParameters

From: Kaarle K. <kaa...@ik...> - 2001-10-24 19:24:00

I looked at the different classes in HTMLParser to see how to
utilize parseParameters in parsing the tags.

I created another evaluate method in HTMLTagScanner that
uses the parsed parameters and it seemed to work OK
at least in some cases.

In the scan method of HTMLTag you can see where I thought
the tag should be parsed. At the end of the code you can see the
methods for retrieving values from it, showing how I thought it could
be used. They would be usable after the scan method has been called.

Some of the problems I had during my tests were, I think,
with JspTags, for example. I don't know how similar that one
is to the rest.

I have not put these changes into CVS as the TestCases gave
some errors that I did not have time to check.

Should we make changes in this direction?

Kaarle

---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844

[Htmlparser-developer] (no subject)

From: Kaarle K. <kaa...@ik...> - 2001-10-24 18:50:16

hi!

I have made modifications to the following htmlParser modules:

com\kizna\html\tags\HTMLLinkNode.java
com\kizna\html\tags\HTMLTag.java
com\kizna\html\scanners\HTMLLinkScanner.java
com\kizna\htmlTests\HTMLTagTest.java

I have modified the classes so that the getText() and parseParameters()
methods in HTMLTag work even if LinkScanner is active.

Added some testcases too.

I hope it went OK!
It is now in CVS.

regards
Kaarle
---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844

[Htmlparser-developer] parseParameters with scanners activated

From: Kaarle K. <kaa...@ik...> - 2001-10-24 15:22:52

hi!

I have made modifications to the following htmlParser modules:

com\kizna\html\tags\HTMLLinkNode.java
com\kizna\html\tags\HTMLTag.java
com\kizna\html\scanners\HTMLLinkScanner.java
com\kizna\htmlTests\HTMLTagTest.java

I have modified the classes so that the getText() and parseParameters()
methods in HTMLTag work even if LinkScanner is active.

Added some testcases too.

I hope it went OK!
It is now in CVS.

regards
Kaarle

---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844

[Htmlparser-developer] Simple extracting of data from html-file

From: Kaarle K. <kaa...@ik...> - 2001-10-13 18:17:18

Hi!

I have added the parseParameters method to
parse the parameters of the tag.

parseParameters does not work yet if any of the
listener classes have been registered, but without
any listeners you can extract the data you need from your HTML file.

This example is also in the class comments.
It prints the HREF parameter of every A tag.

I use it myself for some more special extracts.

You can find this version of HTMLTag.java in the CVS repository.

regards
Kaarle Kaila


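        // Needs java.io.FileReader, java.io.IOException,
        // java.util.Enumeration and java.util.Hashtable,
        // plus the HTMLParser classes. path is the HTML file to read.
        try {
            HTMLReader in = new HTMLReader(new FileReader(path), 2048);
            HTMLParser p = new HTMLParser(in);
            Enumeration en = p.elements();
            while (en.hasMoreElements()) {
                Object node = en.nextElement();
                if (!(node instanceof HTMLTag)) {
                    continue; // skip anything that is not a tag
                }
                HTMLTag tag = (HTMLTag) node;
                // Table of the tag's parameters; the tag name is stored under TAGNAME.
                Hashtable h = tag.parseParameters();
                String tmp = (String) h.get(tag.TAGNAME);
                if (tmp != null && tmp.equalsIgnoreCase("A")) {
                    System.out.println("URL is :" + h.get("HREF"));
                }
            }
        } catch (IOException ie) {
            ie.printStackTrace();
        }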
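
To make the idea behind the example concrete, here is a small stand-alone sketch of turning the text of a tag into a table of parameters. It is not the parseParameters() code from HTMLTag.java: the class name TagParameters, the "TAGNAME" key, the regular expression and the choice to upper-case parameter names are all assumptions made for illustration.

```java
import java.util.Hashtable;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagParameters {
    // Hypothetical key under which the tag name itself is stored.
    static final String TAGNAME = "TAGNAME";

    // A name, optionally followed by '=' and a double-quoted,
    // single-quoted or bare value.
    private static final Pattern ATTRIBUTE = Pattern.compile(
        "([^\\s=]+)(?:\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|(\\S+)))?");

    // Parses the text between '<' and '>' into a table keyed by upper-cased names.
    // Parameters without a value are stored with an empty string.
    static Hashtable<String, String> parse(String tagContents) {
        Hashtable<String, String> table = new Hashtable<String, String>();
        Matcher m = ATTRIBUTE.matcher(tagContents.trim());
        boolean first = true;
        while (m.find()) {
            String name = m.group(1).toUpperCase(Locale.ROOT);
            if (first) {
                // The first token is the tag name, not a parameter.
                table.put(TAGNAME, name);
                first = false;
                continue;
            }
            String value = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3)
                         : m.group(4) != null ? m.group(4)
                         : "";
            table.put(name, value);
        }
        return table;
    }
}
```

With this sketch, TagParameters.parse("A HREF=\"http://www.example.com/\" target=_blank") yields a table in which TAGNAME maps to "A", HREF to "http://www.example.com/" and TARGET to "_blank", so the lookup pattern in the example above carries over unchanged.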


---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...

14 messages have been excluded from this view by a project administrator.
