htmlparser-announce Mailing List for HTML Parser (Page 3)

Brought to you by: derrickoswald

htmlparser-announce — Announcement list for releases


2002	Jan (6)	Feb	Mar (2)	Apr (1)	May	Jun (4)	Jul (3)	Aug (3)	Sep (1)	Oct (3)	Nov (2)	Dec (5)
2003	Jan (2)	Feb (4)	Mar (4)	Apr (3)	May (2)	Jun (1)	Jul	Aug	Sep	Oct	Nov	Dec
2004	Jan (1)	Feb (1)	Mar (1)	Apr	May	Jun	Jul (1)	Aug	Sep	Oct	Nov	Dec
2005	Jan	Feb	Mar	Apr (1)	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2006	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep (1)	Oct	Nov (1)	Dec (6)
2007	Jan	Feb (6)	Mar (6)	Apr (6)	May (1)	Jun (1)	Jul (1)	Aug (27)	Sep (7)	Oct (4)	Nov	Dec
2008	Jan	Feb (1)	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec (2)
2009	Jan	Feb	Mar (1)	Apr (15)	May (83)	Jun (72)	Jul (39)	Aug (14)	Sep (16)	Oct (30)	Nov (5)	Dec (4)
2010	Jan	Feb (1)	Mar (37)	Apr (57)	May (74)	Jun (66)	Jul (44)	Aug (54)	Sep (19)	Oct	Nov	Dec
2011	Jan	Feb	Mar	Apr (1)	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec


[Htmlparser-announce] Integration Release 1.3-20030323 is out

From: Somik R. <so...@ya...> - 2003-03-24 01:22:12

Hi Folks,
    This week's integration release has two important fixes :

Integration build 1.3 - 20030323
--------------------------------
[1] Fixed bug 702547 - single quotes parsed more robustly now
[2] Fixed bug 702614 - empty tags handled correctly now. Tag now has a
method isEmptyXmlTag().

#2 refers to tags like <tag/>.
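For illustration only, a check along the lines of isEmptyXmlTag() could inspect the raw tag text. The class EmptyTagCheck below is hypothetical (it is not the parser's code) and, for brevity, ignores quoting inside attribute values:

```java
public class EmptyTagCheck {
    /**
     * Returns true when the raw tag text is self-closing, such as "<tag/>" or "<br />".
     * End tags like "</tag>" and ordinary start tags return false.
     * Hypothetical helper: attribute quoting is not taken into account.
     */
    static boolean isEmptyXmlTag(String rawTag) {
        String t = rawTag.trim();
        if (!t.startsWith("<") || !t.endsWith(">") || t.startsWith("</")) {
            return false;
        }
        // Drop the closing '>' and any whitespace before it, then look for a trailing '/'.
        String body = t.substring(0, t.length() - 1).trim();
        return body.length() > 2 && body.endsWith("/");
    }
}
```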

Thanks to Joe Robbins for a fine bug report that helped us fix #1 faster.
Thanks also to Marc Novakowski for the other report.

Thanks are also due to Huang-Chun Yu for uncovering a serious bug in the
script scanning mechanism. The parser can currently handle script tags like:

<script>
<!--
    code here
-->
</script>

But when the tags are like:
<script>
    code here
</script>

the parser is unable to identify the code and treats it like regular tags.
Such pages are quite widespread and ought to be supported. I'd welcome any
ideas on solving this within the existing design; fresh ideas often bring a
better perspective. If you have some, feel free to join the developer list
(http://lists.sourceforge.net/lists/listinfo/htmlparser-developer) and post.
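One possible direction, offered as a hedged sketch rather than the parser's design: once a <script> start tag is seen, treat everything up to the next "</script" (matched case-insensitively) as opaque code, whether or not it is wrapped in <!-- -->. ScriptBodyScanner and its methods are hypothetical names, and the sketch assumes the start tag has no '>' inside its attribute values:

```java
public class ScriptBodyScanner {
    /** Case-insensitive search for needle in text from index from; -1 if absent. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = from; i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the raw body of the first script element: everything after the start
     * tag's '>' up to the next "</script", whether or not it is wrapped in <!-- -->.
     * Returns null when there is no complete script start tag.
     */
    static String extractScriptBody(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) {
            return null;
        }
        int gt = html.indexOf('>', open);
        if (gt < 0) {
            return null;
        }
        int bodyStart = gt + 1;
        int close = indexOfIgnoreCase(html, "</script", bodyStart);
        // An unterminated script swallows the rest of the input as code.
        return close < 0 ? html.substring(bodyStart) : html.substring(bodyStart, close);
    }
}
```

Because the body is never tokenized, a stray "<" in code such as "if (a < b)" can no longer be mistaken for a tag.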

Regards,
Somik

[Htmlparser-announce] Major Milestone: Integration Release 1.3-20030316 is out

From: Somik R. <so...@ya...> - 2003-03-16 21:36:46

Hi Folks,
    This is a major milestone release. A massive refactoring (which took two
weeks) has been completed, bringing all the robust error-handling cases into
CompositeTagScanner. This means that all tags that have children can now do
error correction uniformly. Form tags (and table tags too) should be robust.

    Table tags are not yet in the standard set of scanners (you still need
to add them manually). They should make the cut next week.
    We have a new method - registerDomScanners() in Parser - that allows you
to build html dom objects.

    Interesting fact: as a result of the refactorings, the LOC of the
scanners package has dropped from 1553 to 1355 (I was surprised at the
numbers).

    Documentation has been updated - we've started putting up answers by our
list members to common questions. Pls feel free to update the Wiki and
improve it. No login is required.

    From the change log:

Integration build 1.3 - 20030316
--------------------------------
[1] Added method finishedParsing() to NodeVisitor
[2] LinkScanner uses CompositeTagScanner.scan()
[3] BulletScanner added
[4] FormScanner uses CompositeTagScanner.scan()
[5] AppletScanner uses CompositeTagScanner.scan()
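To illustrate the idea behind a hook like finishedParsing(), here is a minimal sketch of a visitor that is notified once after the last node, so it can finalize its results. The Visitor interface, CountingVisitor and walk() are hypothetical stand-ins, not the parser's NodeVisitor API:

```java
import java.util.List;

public class VisitorSketch {
    /** Hypothetical visitor interface with a completion hook, like finishedParsing(). */
    interface Visitor {
        void visitTag(String tag);
        void visitText(String text);
        void finishedParsing();
    }

    /** Counts nodes and builds its report only once the walk is complete. */
    static class CountingVisitor implements Visitor {
        int tags;
        int texts;
        String report;

        public void visitTag(String tag) { tags++; }
        public void visitText(String text) { texts++; }
        public void finishedParsing() { report = "tags=" + tags + ", text=" + texts; }
    }

    /** Drives a visitor over a flat node list; entries starting with '<' count as tags. */
    static void walk(List<String> nodes, Visitor visitor) {
        for (String node : nodes) {
            if (node.startsWith("<")) {
                visitor.visitTag(node);
            } else {
                visitor.visitText(node);
            }
        }
        visitor.finishedParsing(); // called exactly once, after the last node
    }
}
```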

    We highly recommend an upgrade to this version.

Regards,
Somik

[Htmlparser-announce] Integration Release 1.3-20030302 is out

From: Somik R. <so...@ya...> - 2003-03-03 03:52:40

Hi Folks,
    In this week's release, the change log is :

Integration build 1.3 - 20030302
--------------------------------
[1] Fixed bug in LinkScanner
[2] Cleaned up StringNode interface
[3] Cleaned up RemarkNode interface
[4] Refactored Parser, created ParserHelper

Regards,
Somik

[Htmlparser-announce] Re: [Htmlparser-user] Integration Release 1.3-20030223 is out (API changes)

From: Somik R. <so...@ya...> - 2003-02-24 18:11:59

I was trying to integrate the changes of the latest
parser with some existing projects at work - and of
course, I had to modify the code to use the new API.

I had some suggestions, as I know many of you will be
facing the same issue. I use Eclipse, and I hope most
of you use a decent IDE that supports refactoring. Get
the parser into your IDE, and let all your other
project code refer to it (that's how it is set up in my
IDE). Then rename Parser to HTMLParser using your
refactoring tool, and rename it back to Parser: all
your existing code will automatically get fixed. Do
this for other classes like HTMLNode/Node, etc.,
and within minutes it should be done.

Regards,
Somik


[Htmlparser-announce] Integration Release 1.3-20030223 is out (API changes)

From: Somik R. <so...@ya...> - 2003-02-24 06:15:44

Hi Folks,
    This week's release is out. I've finally taken heed of all the feedback
I had been receiving about the terrible naming convention, and have removed
"HTML" from all class names. In addition, HTMLEnumeration is now
NodeIterator and SimpleEnumeration is SimpleNodeIterator. HTMLParser is just
Parser.

    This is a big step, so to make it easy for everyone, this release contains
no major bug fixes that would require you to upgrade right away. I apologize in
advance for any inconvenience caused - I hope you don't curse me too much for
having to modify your programs. I had the option of doing it in stages,
forcing you to modify some small thing in every release, or getting it over
with in one sweep. I chose the latter because there were too many changes, and
suffering over a long period of time didn't make sense. Hopefully, once you
have migrated to the new names, you will appreciate not having to type
"HTML" each time.

    The BodyScanner contributed by Dhaval Udani is finally in (Dhaval -
sorry for the delay).
    The interesting part is that the documentation accompanying the package
is now the latest version on the site - it has been extracted from a PhpWiki. I
think the extraction program might be useful for those who wish to provide
wiki content as offline documentation (any feedback on this is welcome).

    From the change log :
Integration build 1.3 - 20030223
--------------------------------
[1] Modification of documentation packaging
- the new documentation is actually produced
by a tiny program that converts wiki pages
into documentation (works with PhpWiki)
[2] Inclusion of BodyScanner, BodyTag
[3] HTMLVisitor is now NodeVisitor - and has an extra param to
visit itself
[4] HTMLParser is now Parser. No class has HTML prefix anymore.
[5] HTMLEnumeration is now NodeIterator, SimpleEnumeration is
SimpleNodeIterator

Regards,
Somik

[Htmlparser-announce] Integration Release 1.3-20030215 is out

From: Somik R. <so...@ya...> - 2003-02-16 04:33:26

Hi Folks,
    Integration release 1.3-20030215 is out.

From the change log:
Integration build 1.3 - 20030215
--------------------------------
[1] Added HtmlScanner
[2] Removed Table, Div and Span from registry of scanners,
can still be added individually
[3] Reference test directory of project home page to maybe cure some
sporadic errors in BeanTest.
[4] Added setAttribute method
[5] Cleaned up HTMLNode interface (removed TYPE, getType() and print())

With HtmlScanner, you can now get the entire page - sort of a DOM model in a
Html object. Useful for testing.

Regards,
Somik

[Htmlparser-announce] Integration release 1.3-20030202 is out

From: Somik R. <so...@ya...> - 2003-02-03 07:21:35

Hi Folks,
    Integration release 1.3-20030202 is out.

From the change log :

Integration build 1.3 - 20030202
--------------------------------
[1] Renamed HTMLCompositeTagScanner to CompositeTagScanner
[2] Renamed HTMLTag.getParameter() to HTMLTag.getAttribute()
[3] Added TableScanner
[4] Added HtmlPage
[5] Added SpanScanner
[6] Added assertType in HTMLParserTestCase
[7] Added TextExtractingVisitor
[8] Added non-recursive visiting (flag in HTMLVisitor)
[9] Added DivScanner
[10] Modified collectInto to use NodeList
[11] Added collectInto(NodeList, Class)
[12] CompositeTagScanner can handle single xml-like tags e.g. <div/>
[13] Fixed bug 678969 - StringParser was not going into ignore mode on
encountering double quotes
[14] Added LabelScanner

Dhaval Udani has contributed LabelScanner. (He has also contributed a
BodyScanner, which will make it into next week's release.)

We've shipped this time with two tests failing - both tests replicate the
same bug, 677874, "mishandling of double quotes". I made this release for
three reasons:
[1] This bug is not a new addition but was always there - it's a deep bug in
AttributeParser (previously known as ParameterParser) - and it might take a
little time to fix
[2] There are lots of new additions which we'd like to get out there - we
finally have a table scanner!
[3] Important bug fixes have been made which further stabilize the parser's
performance (and at least one user was desperately waiting for the fix)

Notable addition - HTMLNode.collectInto() has a new mode of operation,
using the class type.
Suppose you need to get to nodes (e.g. images) that are within a composite
(like a table); you can do:
NodeList imageList = new NodeList();
tableTag.collectInto(imageList, HTMLImageTag.class);

You can also do this directly from the parser, like so:
HTMLNode[] nodes = parser.extractAllNodesThatAre(HTMLLinkTag.class);
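The collecting-parameter idea behind collectInto(list, Class) can be sketched with a tiny hypothetical node hierarchy: each node adds itself if it is an instance of the requested class, then recurses into its children. The Node, TableTag, RowTag and ImageTag classes are illustrative stand-ins, not the parser's types, and java.util.List stands in for NodeList:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectIntoSketch {
    /** Hypothetical node type standing in for the parser's tag classes. */
    static class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Adds this node (if it is an instance of type) and all matching descendants. */
        void collectInto(List<Node> list, Class<?> type) {
            if (type.isInstance(this)) {
                list.add(this);
            }
            for (Node child : children) {
                child.collectInto(list, type);
            }
        }
    }

    static class TableTag extends Node {}
    static class RowTag extends Node {}
    static class ImageTag extends Node {}

    public static void main(String[] args) {
        Node table = new TableTag()
            .add(new RowTag().add(new ImageTag()))
            .add(new RowTag().add(new ImageTag()).add(new ImageTag()));
        List<Node> images = new ArrayList<>();
        table.collectInto(images, ImageTag.class);
        System.out.println(images.size() + " images"); // prints "3 images"
    }
}
```

Passing the list in, rather than returning a new one, lets a caller gather matches from several subtrees into a single collection.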

And here's some more news - we now have our own wiki (finally!). Go to
http://htmlparser.sourceforge.net/docs/
This is a free-for-all wiki. It is a little too much for me to write the
entire documentation on my own, so I'd highly appreciate it if the
user/developer community pitches in - that would be a great benefit for the
community. The current documentation on the site is already obsolete, and I
am going to take it down soon (hopefully by the next release).

Regards,
Somik

[Htmlparser-announce] Integration Release 1.3-20030125 is out

From: Somik R. <so...@ya...> - 2003-01-25 23:41:45

Hi Folks,
    The next integration release is out. From the change log :

Integration build 1.3 - 20030125
--------------------------------
[1] HTMLCompositeTagScanner now takes an array of match strings
[2] toHTML(HTMLRenderer ...) was replaced by UrlModifyingVisitor
[3] Fixed NullPointerException in HTMLScriptTag.toString()
[4] Fixed bug in HTMLStringNode (breaking up empty lines into separate
string nodes)
[5] Fixed thread safety issue and introduced parser helpers
[6] Fixed bug 664404 - spewing incorrect line breaks in
HTMLRemarkNode.toHTML()
[7] Added assertXmlEquals() in HTMLParserTestCase
[8] Added better option tag support
[9] Replaced instanceof with getType() mechanism - much faster
[10] Incorporated NodeList instead of Vector in HTMLCompositeTag
[11] Added HTMLRemarkNode support in Visitor
[12] Fixed bug 673379 (infinite loop on encountering links like
".someurl.html")

Among the notable additions is assertXmlEquals() - this is present to enable
us to perform XML testing. This method actually creates the parser and
performs a node-for-node comparison.

Reconstruction has improved a lot - you will find that the parser now does
not add unnecessary line breaks - and preserves the html as it came in.

One significant addition is the use of NodeList instead of Vector. The
integration has been performed, so there should be a significant performance
increase - check
http://htmlparser.sourceforge.net/performance/simpleEnumerationPerformance.html

In the coming week, we will be setting up a wiki on sourceforge, where we
can collaboratively create documentation - hopefully that will finally take
the burden out of the documentation process.

Regards,
Somik

[Htmlparser-announce] Integration Release 1.3-20030112 is out

From: Somik R. <so...@ya...> - 2003-01-13 04:50:14

Hi Folks,
    This week's integration release is out. This release has significant
contributions from Derrick Oswald and Josh Kerievsky. Derrick is building a
nice UI for the parser - and making tons of improvements. Thanks to Josh's
insight, we have done some major refactorings on the scanners - resulting in
a massive drop in code duplication. Here are some statistics - the scanners
package in the last release had 1693 lines of code. In the current release,
this has dropped to 1300 lines of code.

We have a new class HTMLCompositeTagScanner which does the hard work of
picking up child tags. Most scanners use this code. HTMLTagScanner too does
some useful work, and from this release, new scanners don't need to override
evaluate() or scan(). Take a look at the refactored scanner code and you
might be surprised by its size and simplicity.

    Here's the change log :

Integration build 1.3 - 20030112
--------------------------------
[1] Assume charset is correct for JVMs without Charset class to check it
[2] Beanize the parser
[3] Switch to swingui junit runner by default
[4] Half baked beans
[5] Fix javadoc warnings in JDK 1.4
[6] Added StringFindingVisitor + test code + new visitors packages
[7] Fixed bug 659723, but HTMLStringNode is not thread-safe anymore.
[8] JDK 1.2 compilability
[9] Modified HTMLEnumeration interface (made less verbose)
[10] Added HTMLCompositeTagScanner
[11] Refactored following scanners to use HTMLCompositeTagScanner :
    (i) HTMLStyleScanner
    (ii) HTMLSelectScanner
    (iii) HTMLFrameSetScanner
    (iv) HTMLTitleScanner
    (v) HTMLTextAreaScanner
    (vi) HTMLScriptScanner
    (vii) HTMLFrameSetScanner
[12] Made StringNode the last parse attempt, so now Reader tries in this
order:
remark
tag
endtag
string
(this will return more HTMLStringNode objects than it did before).
[13] Improve speed by performing tag/string triage based on '<' as next
character.
[14] Refactored HTMLTagScanner. The following scanners use refactored code:
    (i) HTMLBaseHREFScanner
    (ii) HTMLDoctypeScanner
    (iii) HTMLFrameScanner
    (iv) HTMLJspScanner
    (v) HTMLMetaTagScanner
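Items [12] and [13] above can be pictured together with a minimal sketch: try remark, tag, end tag and finally string, but send anything not starting with '<' straight to string. NodeTriage and its Kind enum are hypothetical, and the rule that a tag must start with a letter or '!' is an assumption made for the example:

```java
public class NodeTriage {
    enum Kind { REMARK, TAG, END_TAG, STRING }

    /**
     * Classifies the node that starts at index pos, trying remark, tag, end tag and
     * finally string. Only '<' can open a remark or tag, so any other character is
     * sent straight to STRING without running the other checks.
     */
    static Kind classify(String html, int pos) {
        if (html.charAt(pos) != '<') {
            return Kind.STRING;                 // fast path: plain text
        }
        if (html.startsWith("<!--", pos)) {
            return Kind.REMARK;
        }
        char next = pos + 1 < html.length() ? html.charAt(pos + 1) : ' ';
        if (Character.isLetter(next) || next == '!') {
            return Kind.TAG;                    // e.g. <p>, <!DOCTYPE ...>
        }
        if (next == '/') {
            return Kind.END_TAG;
        }
        return Kind.STRING;                     // a stray '<', as in "a < b"
    }
}
```

Falling back to STRING last is what makes the reader return more string nodes: anything that fails the other checks becomes text.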

Regards,
Somik

[Htmlparser-announce] Integration Release 1.3 - 20021228 is out

From: Somik R. <so...@ya...> - 2002-12-29 08:09:57

Hi Folks,
    The integration release for this week is out.
    You can download it from http://htmlparser.sourceforge.net

Integration build 1.3 - 20021228
--------------------------------
[1] Added URLConnection constructors to HTMLParser
[2] Honour charset parameter on HTTP header and in HTML meta tag
[3] Following tags now inherit from HTMLCompositeTag
 (i) HTMLFormTag
 (ii) HTMLLinkTag
 (iii) HTMLSelectTag
 (iv) HTMLFrameSetTag
 (v) HTMLTitleTag
 (vi) HTMLTextAreaTag
 (vii) HTMLStyleTag
 (viii) HTMLScriptTag
 (ix) HTMLAppletTag

[4] Performed Refactoring "Introduce Parameter Object" on HTMLTag,
HTMLCompositeTag, HTMLLinkTag, HTMLFormTag
[5] Refactored HTMLFormTag, pulling up the search methods into
HTMLCompositeTag
[6] Added HTMLVector, which can return HTMLSimpleEnumeration - a
no-exception flavor of HTMLEnumeration
[7] Refactored HTMLEnumeration - created new interface -
HTMLPeekingEnumeration
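For readers unfamiliar with the idea, a peeking iterator can look at the next node without consuming it. The sketch below is hypothetical (the real HTMLPeekingEnumeration and HTMLSimpleEnumeration interfaces may differ): the method names are illustrative, and peek() returns null instead of throwing when the iterator is exhausted:

```java
import java.util.List;
import java.util.NoSuchElementException;

public class PeekingIteratorSketch {
    /** Iterates a list; peek() looks at the next element without consuming it. */
    static class PeekingIterator<T> {
        private final List<T> items;
        private int index = 0;

        PeekingIterator(List<T> items) {
            this.items = items;
        }

        boolean hasMoreNodes() {
            return index < items.size();
        }

        /** Returns the next element without advancing; null when exhausted (no exception). */
        T peek() {
            return hasMoreNodes() ? items.get(index) : null;
        }

        /** Returns the next element and advances past it. */
        T nextNode() {
            if (!hasMoreNodes()) {
                throw new NoSuchElementException();
            }
            return items.get(index++);
        }
    }
}
```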

Notes : HTMLVector is not yet integrated with the tags. That should happen
in the next release.

Regards,
Somik

[Htmlparser-announce] HTMLParser 1.2 (Production Release) is out

From: Somik R. <so...@ya...> - 2002-12-22 03:24:26

Hi Folks,
    Finally, after 8 months of hard work, we have the next production
release of the parser. 1.2 has tons of bug fixes and features.

    The change log difference between 1.2 and 1.1 is too big to be listed in
this mail - check the change log when you are downloading (it's also in the
download package). Documentation has been considerably improved (the Sample
programs would be the place to start). There's a section on the patterns in
action as well. You can modify the rendering process for links and images,
as well as provide collecting parameters to pick up the nodes you want
(currently images and links are supported).

    Below is the change log (as compared to last week's integration release):
Production Release 1.2
----------------------
[1] Rewrote HTMLLinkProcessor.extract() so URL class does all the heavy
lifting
[2] Partially fixed bug 654746 - HTMLLinkScanner error, code review needed
[3] Rendering bug fixed - allowing uniform rendering for links and images
[4] Fixed bug 655917, made HTMLParameterParser.parseParameters() thread-safe
[5] Refactored HTMLFormTag (introduced POST and GET static members)
[6] Bug fixed in HTMLFormTag.getInputTag() (NullPointerException when input
tag has no name)
[7] Added ability to get textarea tag from HTMLFormTag.
[8] Added search capability in HTMLFormTag
[9] Fixed bug 655627 - JSP tags with < sign (for loops) were not being
parsed correctly
[10] Fixed bug 655603 - JSP tags within src of script not recognized
correctly when using
single apostrophes
[11] Fixed bug 655580 - JSP tags within title tags not recognized correctly
[12] Fixed bug 655599 - Erroneous end-of-line characters were being added in
string nodes
[13] Fixed bug 656870 - HTMLFormScanner goes into infinite loop if a
previous link has not been closed

    Thanks to Derrick Oswald and Dhaval Udani for their work on the last few
releases. Thanks to Joe Robins for pointing out an important bug in
HTMLFormScanner. A special mention for Dhaval - all his bug reports come
with testcases, making it really easy for us to reproduce and fix the bugs.

Regards,
Somik

[Htmlparser-announce] Candidate 6 is out

From: Somik R. <so...@ya...> - 2002-12-15 09:29:43

Hi Folks,
    Candidate 6 is out, and there are some goodies in this one. Thanks to
Derrick Oswald and Leslie Rohde (our two new developers), who have put in
their time.

From the Change Log :
Integration Build 1.2 - 20021215
--------------------------------
[1] Modified API of HTMLImageTag (refactored name of image loc), HTMLLinkTag
(added getters)
[2] Fixed bug 650457 - removeEscapeCharacters() incorrect
[3] Fixed bug 652263 - HTMLParser and null feedback
[4] Changed encoding used from 8859_4 to 8859_1
[5] HTMLRemarkNode returns string data in toPlainTextString() (This is a
rollback)
[6] Fixed bug 652746 - HTMLFormTag gets links correctly now
[7] Fixed bug 653720 - HTMLNode uses sun specific class
[8] Improved StringExtractor parser application
[9] Major design improvement, implemented Collection-Parameter pattern - in
HTMLNode.collectInto()
[10] Fixed reset crash bug. Reader providers have to explicitly call mark
and reset now. This is
now documented in HTMLParser.java.
[11] Fixed bug 649269 in HTMLLinkTag.isHttpLink(), now correctly identifies
relative links as Http
links.

A major API improvement has occurred - HTMLNode now has a new method -
collectInto(), which uses a collection parameter to collect nodes. A sample
program demonstrating this feature is at :
http://htmlparser.sourceforge.net/samples/linksEmbedded.html

Thanks to everyone who participated in the discussions and architecture
changes. There has been a rollback as well, we've taken out the mark and
reset mechanism, and this is now the responsibility of the reader supplier.

Cheers,
Somik

[Htmlparser-announce] Candidate 5 is out

From: Somik R. <so...@ya...> - 2002-12-09 01:28:26

Hi Folks,
    This week's release is Candidate 5. We've had talented developers
joining us over the weekend, hence, you can expect improvements in quality
in the coming weeks. Hopefully, we should have our production release ready
by New Year's...

From the change log :
Integration Build 1.2 - 20021208
---------------------------------
[1] Fixed bug in base href scanner - would always expect href
[2] Refactored HTMLFormScanner
[3] Refactored HTMLRenderer to use the Visitor pattern- enabling
connections with links and images
[4] HTMLStringNode returns a blank string in toPlainTextString()
[5] HTMLFormTag returns string information in toPlainTextString()

#5 is an important fix: we now won't lose any meaningful string info
contained inside forms when we issue calls like node.toPlainTextString().

Get the latest release from http://htmlparser.sourceforge.net

The site update is continuing at an even pace. There is a new section on
writing tests for HTMLParser. We're also trying to introduce a philosophy
called "Communicate with TestCases". If you've found a bug, write a testcase
for it, and submit that in your report. Of course, you dont have to do this,
but if you do, we'd be able to make the fix much faster (and motivated to
make the fix). Writing a testcase for the parser is super simple - you can
check the philosophy and an example on the documentation page.

http://htmlparser.sourceforge.net/design/index.html

Regards,
Somik

[Htmlparser-announce] Candidate Release 4 is out

From: Somik R. <so...@ya...> - 2002-12-02 02:56:54

Hi Folks,
    Candidate Release 4 is out. This actually contains a few minor API
changes which won't affect your application, but have been made to
improve the OO design of the system. HTMLFormScanner has been improved.
The major work in this release went into refactoring 201 testcases, so as
to make them more readable and follow the Once-And-Only-Once paradigm.
Well, the package size dropped about 12KB (after zipping), so you can
estimate how much refactoring was done. All tests are passing.

    From the Change Log:

Integration Build 1.2 - 20021201
--------------------------------
[1] Refactored HTMLNode, API improved, now HTMLNode stores
nodeBegin and nodeEnd.
[2] Refactored Testing framework - to reduce the code size
substantially.
[3] HTMLFormScanner improved to include Input, TextArea, Select and
Option scanners within

You can get it from http://htmlparser.sourceforge.net
There's an all-new Contributors Page (linked from the main site). Just
in case I missed anybody, or you have info to add, pls let me know.

Regards,
Somik

[Htmlparser-announce] HTMLParser 1.2 candidate 3 released

From: Somik R. <so...@ya...> - 2002-11-26 06:38:12

Hi Folks,
    Candidate 3 is out. You can get it from
http://htmlparser.sourceforge.net
    The website is getting an overhaul, though this is in progress. You will
find a new samples page.

    If anyone wishes to contribute a simple program to add to the catalog,
please feel free to come forward.

From the change log, in this release :

Integration Build 1.2 - 20021125
--------------------------------
[1] Incorporated Bug Fix for HTMLLinkProcessor to parse dynamic urls
[2] Refactored package names to org.htmlparser
[3] Added documentation
[4] Can handle url with spaces in it
[5] Fixed bug 643352 - going into infinite loop on bad img within link
[6] Refactored HTMLLinkTag - unnecessary boolean variables removed

Regards,
Somik

[Htmlparser-announce] Candidate Release 2 is out

From: Somik R. <so...@ya...> - 2002-11-09 18:44:13

Hi Folks,
    Candidate Release 2 is out.
    Changes are :
[1] Updated javadoc
[2] Added support for multiple calls to elements() [sequentially, not
in parallel]

The latter implies that you can complete one round of parsing, and make
another call to HTMLParser.elements() to begin another, without needing
to recreate the parser object.

Regards,
Somik

[Htmlparser-announce] HTMLParser Candidate Release 1 is out

From: Somik R. <so...@ya...> - 2002-10-31 12:12:38

Hi Folks,
    HTMLParser 20021031 (C1) is out. This is candidate release 1. If
there are no issues, then this will become a production release.

    There are bug fixes in this release, and some improvements. The most
important improvement allows renderers to be plugged in, so as to
allow customization of the functionality of toHTML(). Check the javadoc of
com.kizna.html.HTMLNode.

    Feedback will help us finalize this version, and is eagerly awaited.


Regards,
Somik

[Htmlparser-announce] Integration release 1.2-20021016 is out

From: Somik R. <so...@ya...> - 2002-10-16 10:59:16

Hi Folks,
    Integration release 1.2-20021016 is out. You can get it from
http://htmlparser.sourceforge.net

Here's the change log :

Integration Build 1.2 - 20021016
--------------------------------
[1] Fixed bug 621117 - JSP tags not recognized if within string node
[2] Fixed bug 617228 - Links with > symbol in query strings were not
being recognized.
[3] build.xml completely automatic - no manual changes needed before running
[4] build.xml included in release package, inside src.zip
[5] Refactored HTMLTag - design modified, introduced HTMLTagParser helper
class
[6] Optimized scanning process - 20% faster now

There have been some refactorings and optimizations in this release. Most
notably, the scanners are not enumerated sequentially anymore. Instead, they
are stored inside hashtables, and are identified by the first word that
occurs in a tag (in uppercase). Now, we have a default implementation of
evaluate() which returns true, and most of the scanners don't override it
if their evaluation is simply based on matching the first word. However, if
the matching logic is complex, then evaluate() should be overridden.

An additional method has been introduced in HTMLTagScanner which all
scanners have to override - getID() - which is used to register the
scanner into the hashtable (called only once) inside addScanner().
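The registry described above could be sketched as follows. ScannerRegistry, the two sample scanners, and the assumption that getID() returns a single uppercase word are all illustrative; the MetaScanner check only shows where more complex matching logic would go:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ScannerRegistry {
    /** Hypothetical scanner base class; evaluate() defaults to true. */
    static abstract class TagScanner {
        /** Uppercase first word of the tag this scanner handles, e.g. "A". */
        abstract String getID();

        /** Override only when matching needs more than the first word. */
        boolean evaluate(String tagContents) {
            return true;
        }
    }

    static class LinkScanner extends TagScanner {
        String getID() { return "A"; }
    }

    /** Illustrative scanner with extra matching logic beyond the first word. */
    static class MetaScanner extends TagScanner {
        String getID() { return "META"; }

        @Override
        boolean evaluate(String tagContents) {
            return tagContents.toLowerCase(Locale.ROOT).contains("content=");
        }
    }

    private final Map<String, TagScanner> scanners = new HashMap<>();

    /** Registers the scanner; getID() is called exactly once, here. */
    void addScanner(TagScanner scanner) {
        scanners.put(scanner.getID(), scanner);
    }

    /**
     * Finds the scanner for a tag, given the text between '<' and '>'. The lookup uses
     * the uppercased first word; evaluate() then has the final say. Returns null if none applies.
     */
    TagScanner find(String tagContents) {
        String firstWord = tagContents.trim().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        TagScanner scanner = scanners.get(firstWord);
        return (scanner != null && scanner.evaluate(tagContents)) ? scanner : null;
    }
}
```

A single hash lookup replaces trying every scanner in turn, which is where a speedup of this kind would come from.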

In addition feedback is being incorporated - you will find feedback if you
run the testcases.

The performance improvement is substantial - on running
com.kizna.htmlTests.PerformanceTest.java - I could see a reduction of 500 ms
(with all scanners registered) from 2500 ms to 2000 ms (run on the MySQL
installation guide page).

For developers (or folks who want to join) - the build script has been
included in the distribution (it is a whole lot more powerful now -
autodetects code version, etc..). Making your package ready for distribution
is exceedingly simple now - so do go ahead and explore.

Regards,
Somik

[Htmlparser-announce] New Integration Release out - v1.2-2002_10_02

From: Somik R. <so...@ya...> - 2002-10-02 03:18:41

Hi Folks,
    The latest integration release of HTMLParser has some bug fixes, but
the biggest improvement is the addition of a base ref scanner. Now,
pages with base ref urls can be easily picked up, and images and links
resolved accordingly.
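Resolving links against a base href can lean on the standard java.net.URL(URL context, String spec) constructor. This is a hedged sketch of the idea, not the base ref scanner's code; BaseHrefResolver is a hypothetical name:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class BaseHrefResolver {
    /**
     * Resolves a link or image URL against the page's base href, falling back to the
     * page's own URL when the page declares no base (baseHref == null).
     */
    static String resolve(String pageUrl, String baseHref, String link) {
        try {
            URL context = new URL(pageUrl);
            if (baseHref != null) {
                context = new URL(context, baseHref); // a base href may itself be relative
            }
            return new URL(context, link).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + link, e);
        }
    }
}
```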

    You can download it from http://htmlparser.sourceforge.net

Regards,
Somik

[Htmlparser-announce] Integration Release 2002_08_31 out

From: Somik R. <so...@ya...> - 2002-09-01 03:48:33

Hi Folks,
    2002_08_31 is out. Changes:
[1] Feedback integrated into the API. Not yet functional - but will be
over the next few releases. The API change has been put in early. This
is the last planned change in the API for production release - 1.2.
[2] End of Line String implemented across all scanners.

You can download it from http://htmlparser.sourceforge.net

Regards,
Somik

[Htmlparser-announce] HTMLParser 1.2-2002_08_26

From: Somik R. <so...@ya...> - 2002-08-26 01:46:18

Hi Folks,
    Integration Release 1.2-2002_08_26 is out. The major improvement is that
handling of newline characters is totally customizable. You can set your
own line separator ("\n" or "\r\n"), or have the parser auto-detect it
from your JVM. This is useful when you perform platform-specific
reconstructions using toHTML(). So, you won't see the funny characters
that occur due to cross-platform incompatibility of end-of-line
characters.
    For the complete change log, check the download page.
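A configurable line separator with JVM auto-detection could be sketched like this; LineEnds is a hypothetical class (not the parser's API), and System.lineSeparator() (Java 7+) supplies the auto-detected default:

```java
import java.util.regex.Matcher;

public class LineEnds {
    private final String separator;

    /** Uses the given separator, e.g. "\n" or "\r\n"; pass null to auto-detect from the JVM. */
    LineEnds(String separator) {
        this.separator = (separator != null) ? separator : System.lineSeparator();
    }

    String separator() {
        return separator;
    }

    /** Replaces every "\r\n", "\r" or "\n" in text with the configured separator. */
    String normalize(String text) {
        // "\r\n" is listed first so a Windows line end is replaced once, not twice.
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(separator));
    }
}
```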

Regards,
Somik

[Htmlparser-announce] HTMLParser v1.2-2002-08-11 released

From: Somik R. <so...@ya...> - 2002-08-10 08:12:31

Hi Folks,
    The next integration release (v1.2-2002-08-11) is out. It has
significant bug fixes and API changes.
    Check http://htmlparser.sourceforge.net
Regards,
Somik
**********************************
Somik Raha
System Architect
Kizna Corporation
Hiroo ON Bldg. 2F, 5-19-9 Hiroo,
Shibuya-ku, Tokyo,
150-0012, JAPAN
Phone : +81-3-5475-2646
Fax     : +81-3-3445-9089
Web   : http://www.kizna.com
Mail    : so...@ki...
**********************************

[Htmlparser-announce] New Integration Release is out

From: Somik R. <so...@ya...> - 2002-08-04 07:07:14

Hi Folks,
    HTMLParser 1.2-2002_08_04 is out. Major API changes have occurred -
chained exception handling, which will allow applications to handle
exceptions. Lots of important bug fixes done.
    Note: 1 known bug still exists in parseParameters() - so you will
see two failing testcases, but this bug is minor, and will be fixed in
the next release. We would appreciate feedback on the API changes in the
user list.
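A minimal sketch of chained exception handling: a low-level failure is wrapped in a parser-level checked exception that keeps the original as its cause, so an application can catch one type yet still inspect what went wrong. ParserException and parseWidth() here are hypothetical. The sketch relies on the Throwable(String, Throwable) constructor, which appeared in JDK 1.4; a JDK 1.2-compatible parser would have had to store the cause itself:

```java
public class ChainedExceptionSketch {
    /** Hypothetical checked exception that keeps the low-level cause. */
    static class ParserException extends Exception {
        ParserException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Wraps a low-level failure so callers catch one type yet can still inspect the cause. */
    static int parseWidth(String attributeValue) throws ParserException {
        try {
            return Integer.parseInt(attributeValue.trim());
        } catch (NumberFormatException e) {
            throw new ParserException("Bad width attribute: " + attributeValue, e);
        }
    }
}
```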

    Check http://htmlparser.sourceforge.net.

Regards,
Somik

[Htmlparser-announce] Integration Release 1.2-2002_07_28

From: Somik R. <so...@ya...> - 2002-07-28 07:20:37

Hi Folks,
    This release contains a lot of important bug fixes. You can get it
from http://htmlparser.sourceforge.net

Regards,
Somik

[Htmlparser-announce] Integration Release 2002_07_21 is out

From: Somik R. <so...@ya...> - 2002-07-21 06:06:03

Hi Folks,
    A new integration release is out - 2002-07-21. It contains 4 bug
fixes, and the code is refactored and a bit more optimized.

Regards,
Somik

664 messages have been excluded from this view by a project administrator.
