htmlparser-user Mailing List for HTML Parser (Page 81)

Brought to you by: derrickoswald

htmlparser-user — The user mailing list for users of the htmlparser library


[Htmlparser-user] Major Milestone: Integration Release 1.3-20030316 is out

From: Somik R. <so...@ya...> - 2003-03-16 21:36:46

Hi Folks,
    This is a major milestone release. A massive refactoring has been
completed (took two weeks) - which has brought all the robust error handling
cases into CompositeTagScanner. This means, all tags that have children will
be able to do error correction uniformly. Form tag (and table tags too)
should be robust.

    Table tags are not yet in the standard set of scanners (you still need
to add them manually). They should make the cut next week.
    We have a new method - registerDomScanners() in Parser - that allows you
to build html dom objects.

    Interesting fact: as a result of the refactoring, the LOC of the
scanners package has dropped from 1553 to 1355 (I was surprised at the
digits).

    Documentation has been updated - we've started putting up answers from
list members to common questions. Please feel free to update the Wiki and
improve it. No login is required.

    From the change log:

Integration build 1.3 - 20030316
--------------------------------
[1] Added method finishedParsing() to NodeVisitor
[2] LinkScanner uses CompositeTagScanner.scan()
[3] BulletScanner added
[4] FormScanner uses CompositeTagScanner.scan()
[5] AppletScanner uses CompositeTagScanner.scan()

    We highly recommend an upgrade to this version.

Regards,
Somik

Re: [Htmlparser-user] html code parsing

From: Derrick O. <Der...@ro...> - 2003-03-15 20:59:22

Guilherme,

I think what you need is in
src/org/htmlparser/util/Translate.java

Something like this should work:
String htmltext = Translate.encode (resultset.getString ("databasetext"));

If you have to do a lot of it though, you'll probably want to rewrite 
that method.
As it stands it allocates one Character for each character in the input 
string.
If you do want to rewrite it, you should probably instead adjust the 
Generate class
in the same package since the Translate.java source is created by 
running Generate.
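
For readers who only need the handful of characters from the original question, the idea behind an encode step can be sketched without the library. This is a minimal sketch and not the Translate implementation: the class name HtmlEscapeSketch and the method escape are made up for illustration, and escaping & as well is an assumption that goes beyond the three characters asked about. It appends into a single StringBuilder rather than creating one object per input character.

```java
/** Hypothetical helper for illustration; not part of htmlparser. */
final class HtmlEscapeSketch {

    /** Replaces &, <, > and " with their HTML entity references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample text taken from the question; safe to place inside a textarea.
        String fromDatabase = "a text with < won't work in a html";
        System.out.println("<textarea>" + escape(fromDatabase) + "</textarea>");
    }
}
```

In a JSP this would be called around the database value before it is written into the textarea.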

Derrick

>To: htm...@li...
>Date: Fri, 14 Mar 2003 20:40:12 +0000 (WET)
>From: Guilherme Zambon <gz...@sa...>
>Subject: [Htmlparser-user] html code parsing
>Reply-To: htm...@li...
>
>Is anyone using htmlparser to convert ", < and > from user input to
>&quot;, &lt; and &gt;?
>I have the following scenario:
>my database has texts with these characters (", < and >) and I have to
>put them from the database into a <textarea> in the HTML. Is there any
>taglib or other solution that lets me filter this database information
>so that it can be shown in an HTML form field?
>
>Thanks in advance,
>
>Guilherme Zambon
>
>Example of the code that I need to treat:
>
><textarea><%= rs.getString("databasetext") %></textarea>
>
>it generates something like
><textarea>a text with < won't work in a html</textarea>
>
>and I want something like
><textarea><sometag:encode string="<%=
>rs.getString("databasetext") %>" /></textarea>
>

[Htmlparser-user] html code parsing

From: Guilherme Z. <gz...@sa...> - 2003-03-14 20:40:22

Is anyone using htmlparser to convert ", < and > from user input to
&quot;, &lt; and &gt;?
I have the following scenario:
my database has texts with these characters (", < and >) and I have to
put them from the database into a <textarea> in the HTML. Is there any
taglib or other solution that lets me filter this database information
so that it can be shown in an HTML form field?

Thanks in advance,

Guilherme Zambon

Example of the code that I need to treat:

<textarea><%= rs.getString("databasetext") %></textarea>

it generates something like
<textarea>a text with < won't work in a html</textarea>

and I want something like
<textarea><sometag:encode string="<%=
rs.getString("databasetext") %>" /></textarea>


Re: [Htmlparser-user] Re: Parsing td tr and table

From: Somik R. <so...@ya...> - 2003-03-14 06:36:50

> 1) does this mean that I will do the same way for TableRowScanner and
> TableColumnScanner or will I extend those from TableScanner.
>
No - actually TableScanner takes care of related scanners (column and row).
So if you register TableScanner alone, you should be fine.

> 2) and should this work in cases like <td ...><img src=..></td>

Yes of course. Just make sure that you also call registerScanners(), if you
want to pick up image tags within the td.

Regards,
Somik
----- Original Message -----
From: "ja...@jo... Jokisalo" <jan...@ho...>
To: <htm...@li...>
Sent: Wednesday, March 12, 2003 9:52 PM
Subject: [Htmlparser-user] Re: Parsing td tr and table


> Thank you Somik!
>
> 1) does this mean that I will do the same way for TableRowScanner and
> TableColumnScanner or will I extend those from TableScanner.
>
> 2) and should this work in cases like <td ...><img src=..></td>
>
> Thanks for good product! --Janne
>
> ---------
> parser.registerScanners();
> parser.addScanner(new TableScanner(parser));
> Node[] tables =
> parser.extractAllNodesThatAre(TableTag.class);
> // you can cast each table to a TableTag and do
> // what you want..
>
> Regards,
> Somik

Re: [Htmlparser-user] problem parsing Chinese character website

From: Derrick O. <Der...@ro...> - 2003-03-13 12:30:07

I gave this problem a cursory look and it appears that the input stream 
opened with that charset doesn't return any lines.
I didn't have time to construct a simple test case and I'm not sure a 
US-English system is the best platform to test this.
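
One way to narrow this down without the parser is to check whether the runtime knows the charset name at all and whether a reader opened with it returns any lines. The sketch below is only a diagnostic under stated assumptions: the class CharsetProbe and the method countLines are hypothetical names, and whether names such as "gb2312" or "EUC-CN" are recognised depends on the Java runtime in use.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic for illustration; independent of htmlparser. */
final class CharsetProbe {

    /**
     * Decodes the bytes with the named charset and counts the lines read.
     * Returns -1 when the runtime does not recognise the charset name.
     */
    static int countLines(byte[] data, String charsetName) {
        boolean known;
        try {
            known = Charset.isSupported(charsetName);
        } catch (IllegalArgumentException e) {
            known = false; // null or syntactically illegal charset name
        }
        if (!known) {
            return -1;
        }
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName(charsetName)))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] page = "<h1>Hello</h1>\n<br>\n".getBytes(StandardCharsets.US_ASCII);
        // Results for the first two names vary by runtime, so none are claimed here.
        for (String name : new String[] {"gb2312", "EUC-CN", "UTF-8"}) {
            System.out.println(name + ": " + countLines(page, name));
        }
    }
}
```

A result of -1 or 0 for the charset the parser reports using would point at the runtime's charset support rather than at the parser.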

Derrick

>Message: 3
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] problem parsing Chinese character website
>Date: Tue, 11 Mar 2003 22:37:51 -0800
>Reply-To: htm...@li...
>
>Derrick, Amit - any ideas ?
>
>----- Original Message -----
>From: "Joe Lin" <gu...@ya...>
>To: <htm...@li...>
>Sent: Saturday, March 08, 2003 1:32 AM
>Subject: [Htmlparser-user] problem parsing Chinese character website
>
>
>  
>
>>Hi,
>>
>>It seems that the parser has a problem handling Chinese
>>characters. I experimented with a simple web page as
>>follows (I saved it as "test.html"):
>>
>><HTML>
>><HEAD>
>><TITLE>Hello</TITLE>
>><META http-equiv=Content-Type content="text/html;
>>charset=gb2312">
>></HEAD>
>><BODY bgColor=#ffffff>
>><h1>Hello</h1><br>
>></body>
>></html>
>>
>>I then run the parser as
>>java -jar htmlparser.jar file:test.html.
>>The parser output nothing but:
>>HTMLParser v1.3 (Integration Build Mar 02, 2003)
>>Parsing file:test.html
>>INFO: detected charset "gb2312", using "EUC-CN"
>>
>>Thanks for any help.
>>
>>Joe
>>
>>
>>    
>>

[Htmlparser-user] Re: Parsing td tr and table

From: <ja...@jo...> - 2003-03-13 05:52:49

Thank you Somik!

1) does this mean that I will do the same way for TableRowScanner and 
TableColumnScanner or will I extend those from TableScanner.

2) and should this work in cases like <td ...><img src=..></td>

Thanks for good product! --Janne

---------
parser.registerScanners();
parser.addScanner(new TableScanner(parser));
Node[] tables =
parser.extractAllNodesThatAre(TableTag.class);
// you can cast each table to a TableTag and do
// what you want..

Regards,
Somik




Re: [Htmlparser-user] FormScanner

From: Somik R. <so...@ya...> - 2003-03-13 05:21:44

> I'm just wondering if the limitation on the FormScanner (i.e. it can't parse
> a form without the end tag) has been successfully removed. Here is my
> suggestion on how to implement it. If a form tag has no end tag, the parser
> should treat the form as ending when it sees another form tag. If there is
> no other form tag in the HTML page, just parse until the end of the HTML
> code. Thanks. I hope you guys can improve this, as I really need this
> feature in my Harvester project. Thank you.

Working hard on this one... I can't believe how much bad code I have myself
written - one's bad code always comes back to haunt one! Refactoring
LinkScanner to use the CompositeTagScanner - and thereby let all composite
tag scanners handle broken tags uniformly.

Regards,
Somik

Re: [Htmlparser-user] Parsing td tr and table

From: Somik R. <so...@ya...> - 2003-03-12 22:58:29

parser.registerScanners();
parser.addScanner(new TableScanner(parser));
Node[] tables =
parser.extractAllNodesThatAre(TableTag.class);
// you can cast each table to a TableTag and do
// what you want..

Regards,
Somik
--- "ja...@jo... Jokisalo"
<jan...@ho...> wrote:
> Hi!
> 
> Is there any example of how to parse e.g. text
> inside td:s in a table and 
> img inside a table td. There are a lot of webpages
> with tables with this 
> kind of information.
> 
> Maybe one can do it with TableColumn,
> TableColumnScanner, TableRow, 
> TableRowScanner, TableScanner and TableTag but I
> have not figured out how.
> 
> Thanks / Janne

[Htmlparser-user] Parsing data inside td:s tr:s and table

From: <ja...@jo...> - 2003-03-12 20:42:29

Hello!

Is there any example of how to parse e.g. text inside td:s in a table and 
img inside a table td. There are a lot of webpages with tables with this 
kind of information.

Maybe one can do it with TableColumn, TableColumnScanner, TableRow, 
TableRowScanner, TableScanner and TableTag but I have not figured out how.

Thanks / Janne






[Htmlparser-user] FormScanner

From: Mohd-Taqiyuddin Z. <mt...@ec...> - 2003-03-12 20:41:23

Hi,

I'm just wondering if the limitation on the FormScanner (i.e. it can't parse a
form without the end tag) has been successfully removed. Here is my suggestion
on how to implement it. If a form tag has no end tag, the parser should
treat the form as ending when it sees another form tag. If there is no other
form tag in the HTML page, just parse until the end of the HTML
code. Thanks. I hope you guys can improve this, as I really need this feature
in my Harvester project. Thank you.
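
The rule suggested above can be sketched on a plain string. This is only an illustration of the boundary rule, assuming a naive text search for "<form" and "</form>"; it is not how FormScanner or CompositeTagScanner is implemented, and the names FormBoundarySketch and splitForms are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the suggested recovery rule; not htmlparser code. */
final class FormBoundarySketch {

    /** Case-insensitive indexOf that keeps offsets valid for the original text. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the raw text of each form. A form ends at its own end tag,
     * or at the next form tag, or at the end of the input, in that order.
     * Naive: any text starting with "<form" counts as a form tag.
     */
    static List<String> splitForms(String html) {
        List<String> forms = new ArrayList<>();
        int start = indexOfIgnoreCase(html, "<form", 0);
        while (start >= 0) {
            int nextOpen = indexOfIgnoreCase(html, "<form", start + 5);
            int close = indexOfIgnoreCase(html, "</form>", start);
            int end;
            if (close >= 0 && (nextOpen < 0 || close < nextOpen)) {
                end = close + "</form>".length(); // well formed: keep the end tag
            } else if (nextOpen >= 0) {
                end = nextOpen;                   // end tag missing: stop at the next form
            } else {
                end = html.length();              // last form never closed: run to the end
            }
            forms.add(html.substring(start, end));
            start = nextOpen;
        }
        return forms;
    }

    public static void main(String[] args) {
        for (String form : splitForms("<form a><input>x<FORM b><input></form>tail")) {
            System.out.println(form);
        }
    }
}
```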

[Htmlparser-user] Parsing td tr and table

From: <ja...@jo...> - 2003-03-12 20:39:49

Hi!

Is there any example of how to parse e.g. text inside td:s in a table and 
img inside a table td. There are a lot of webpages with tables with this 
kind of information.

Maybe one can do it with TableColumn, TableColumnScanner, TableRow, 
TableRowScanner, TableScanner and TableTag but I have not figured out how.

Thanks / Janne






Re: [Htmlparser-user] Node object and line number

From: Somik R. <so...@ya...> - 2003-03-12 20:33:31

Hi Marc,
   What you say makes sense. A node should know on which
line it began and on which line it ended.

   The reason we don't do this already is that we only used
the line information to pick up the next node, which is on
the same or the next line. Like you said, doing this is not hard
as the reader stores the line info. It should be in
one of the integration releases (do add this as a
feature request so we don't forget).
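
The idea can be sketched independently of the library: a scanner that counts newlines as it goes can stamp each tag with the line on which it starts and the line on which it ends. The class and field names below (LineTrackingSketch, TagAtLine, startLine, endLine) are hypothetical, and the tag detection is deliberately naive; this is not the NodeReader code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of line tracking; not the NodeReader implementation. */
final class LineTrackingSketch {

    /** A tag together with the 1-based lines on which it starts and ends. */
    static final class TagAtLine {
        final String text;
        final int startLine;
        final int endLine;

        TagAtLine(String text, int startLine, int endLine) {
            this.text = text;
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }

    /** Walks the input once, counting newlines and recording where each tag sits. */
    static List<TagAtLine> scan(String html) {
        List<TagAtLine> tags = new ArrayList<>();
        int line = 1;
        int tagStart = -1;    // index of the pending '<', or -1 when outside a tag
        int tagStartLine = 0;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<' && tagStart < 0) {
                tagStart = i;
                tagStartLine = line;
            } else if (c == '>' && tagStart >= 0) {
                tags.add(new TagAtLine(html.substring(tagStart, i + 1), tagStartLine, line));
                tagStart = -1;
            }
            if (c == '\n') {
                line++;
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        for (TagAtLine tag : scan("<a>\ntext\n<img\n src='x'>")) {
            System.out.println("lines " + tag.startLine + "-" + tag.endLine);
        }
    }
}
```

A validation error could then report the start line of the offending tag alongside the message.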

Regards,
Somik
--- Marc Novakowski <ma...@ke...> wrote:
> Hello,
> 
> I just thought I'd start out by thanking everyone
> who has worked on the htmlparser project.  I've been
> using it for only a few days now but the
> functionality it provides has saved me amazing
> amounts of work.  So far I have found it very easy
> to integrate into my project.
> 
> I am using the htmlparser library for what I'm
> guessing is a less traditional application.  I've
> integrated it into a custom servlet filter which
> takes a processed JSP page and parses it for
> "custom" tags which I've defined.  Using custom
> scanner and tag, I'm able to replace my "custom
> tags" with appropriate HTML/Javascript in the
> toHtml() method for each tag.  However, I'd like to
> add some validation to my code to ensure certain
> constraints are observed, such as certain tags which
> REQUIRE a "name" attribute to be defined.  I've done
> this easily enough by adding a "verify()" method to
> my custom tags and throwing a ParserException if a
> constraint is violated.
> 
> However, just throwing an exception does not help
> the webpage developer determine where the problem is
> in the HTML.  What would REALLY help me is if the
> Node object had a method on it called something like
> getLineNumber() which returned the line number at
> which that node was parsed.
> 
> I've looked at the source code and this seems
> feasible.  The NodeReader class keeps track of the
> current line number as it finds nodes in the HTML. 
> Maybe the constructor for a Node() object could take
> in one more argument, the lineNumber, so that it
> could expose that lineNumber in a public method.
> 
> Does this sound like a harebrained idea?  Has this
> ever come up before?
> 
> Thanks again,
> Marc Novakowski

[Htmlparser-user] Node object and line number

From: Marc N. <ma...@ke...> - 2003-03-12 19:15:28

Hello,

I just thought I'd start out by thanking everyone who has worked on the
htmlparser project.  I've been using it for only a few days now but the
functionality it provides has saved me amazing amounts of work.  So far
I have found it very easy to integrate into my project.

I am using the htmlparser library for what I'm guessing is a less
traditional application.  I've integrated it into a custom servlet
filter which takes a processed JSP page and parses it for "custom" tags
which I've defined.  Using a custom scanner and tag, I'm able to replace
my "custom tags" with appropriate HTML/Javascript in the toHtml() method
for each tag.  However, I'd like to add some validation to my code to
ensure certain constraints are observed, such as certain tags which
REQUIRE a "name" attribute to be defined.  I've done this easily enough
by adding a "verify()" method to my custom tags and throwing a
ParserException if a constraint is violated.

However, just throwing an exception does not help the webpage developer
determine where the problem is in the HTML.  What would REALLY help me
is if the Node object had a method on it called something like
getLineNumber() which returned the line number at which that node was
parsed.

I've looked at the source code and this seems feasible.  The NodeReader
class keeps track of the current line number as it finds nodes in the
HTML.  Maybe the constructor for a Node() object could take in one more
argument, the lineNumber, so that it could expose that lineNumber in a
public method.

Does this sound like a harebrained idea?  Has this ever come up before?

Thanks again,
Marc Novakowski

Re: [Htmlparser-user] Cookie

From: Bob L. <bob...@ya...> - 2003-03-12 16:41:01

In order to send cookies in your Http requests, all
you need to do is set the Cookie HTTP Header in the
URL Connection.  

Generally what I've done is first create a
HttpURLConnection, create some Cookie objects that are
needed, and set the HTTP Header using those objects
(See below for code to format the header value).  

Then I'll create the Parser using the URLConnection
something like this:

        DefaultHTMLParserFeedback feedback
            = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);

        HTMLReader reader = null;
        HTMLParser parser = null;
        String charset = HttpUtil.getCharacterSet(urlConn);

        InputStreamReader isr
            = new InputStreamReader(urlConn.getInputStream(), charset);
        reader = new HTMLReader(isr, 8192);
        parser = new HTMLParser(reader, feedback);

The HttpUtil.getCharacterSet method used above is
basically just taken from the method of the same name
in the HTMLParser class.  That method is protected, so
I had to duplicate it elsewhere.  


    /** Set cookies to send in a HttpURLConnection.<br>
     * This method should only be called before any parameters are posted
     * and before the connection is made.
     * @param urlConn the HttpURLConnection to send the cookies through
     * @param cookies the cookies to send
     */
    public static void postCookies(HttpURLConnection urlConn, Cookie[] cookies)
    {
        if ((cookies == null) || (cookies.length == 0))
        {
            return;
        }

        // A single Cookie header carries all of the name=value pairs.
        urlConn.setRequestProperty("Cookie", generateCookieHeader(cookies));
    }

    /** Generate a HTTP Cookie header value string from an array of cookies.
     * @param cookies the cookies which should be set in the header value
     * @return A string containing the HTTP Cookie header value
     */
    private static String generateCookieHeader(Cookie[] cookies)
    {
        StringBuffer buf = new StringBuffer();

        for (int i = 0; i < cookies.length; i++)
        {
            buf.append(cookies[i].getName());
            buf.append("=");
            buf.append(cookies[i].getValue());
            // Separate pairs with "; " and add nothing after the last one.
            if (i + 1 != cookies.length)
            {
                buf.append("; ");
            }
        }

        return buf.toString();
    }
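
For anyone without a Cookie class on the classpath, the same approach can be sketched with only java.net and a map of names to values. This is a sketch under assumptions: the class CookieHeaderSketch, its methods, the cookie names and the example.com URL are all made up for illustration. Opening the connection does not send the request; that happens once the input stream is read, so the header has to be set first.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-alone variant of the helper above, using only the JDK. */
final class CookieHeaderSketch {

    /** Joins name=value pairs with "; " as expected in a Cookie request header. */
    static String cookieHeader(Map<String, String> cookies) {
        StringBuilder buf = new StringBuilder();
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            if (buf.length() > 0) {
                buf.append("; ");
            }
            buf.append(cookie.getKey()).append('=').append(cookie.getValue());
        }
        return buf.toString();
    }

    /** Creates a connection carrying the cookies; nothing is sent yet. */
    static HttpURLConnection open(String url, Map<String, String> cookies) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (!cookies.isEmpty()) {
            conn.setRequestProperty("Cookie", cookieHeader(cookies));
        }
        return conn;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> cookies = new LinkedHashMap<>(); // keeps insertion order
        cookies.put("session", "abc123");
        cookies.put("lang", "en");
        HttpURLConnection conn = open("http://example.com/", cookies);
        // conn.getInputStream() would now perform the request with the header set.
        System.out.println(cookieHeader(cookies));
    }
}
```

The resulting connection's input stream can then be wrapped in an InputStreamReader and handed to the reader, as in the snippet earlier in this message.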

--- Shan Sivakolundhu <vss...@ya...> wrote:
> 
> Hi,
> 
> In order to access a particular site I need to have
> a cookie set. Is there any way I can set the cookie
> before I create a parser object? Just like ...
> 
> urlConnection.setRequestProperty("Cookie", cookieValue);
> 
> urlConnection.connect();
> 
>  
> 
> Regards,
> 
> Shan

[Htmlparser-user] Cookie

From: Shan S. <vss...@ya...> - 2003-03-12 16:13:54

Hi,

In order to access a particular site I need to have a cookie set. Is there any way I can set the cookie before I create a parser object? Just like ...

urlConnection.setRequestProperty("Cookie", cookieValue);

urlConnection.connect();

 

Regards,

Shan




Re: [Htmlparser-user] problem parsing Chinese character website

From: Somik R. <so...@ya...> - 2003-03-12 06:36:20

Derrick, Amit - any ideas ?

----- Original Message -----
From: "Joe Lin" <gu...@ya...>
To: <htm...@li...>
Sent: Saturday, March 08, 2003 1:32 AM
Subject: [Htmlparser-user] problem parsing Chinese character website


> Hi,
>
> It seems that the parser has a problem handling Chinese
> characters. I experimented with a simple web page as
> follows (I saved it as "test.html"):
>
> <HTML>
> <HEAD>
> <TITLE>Hello</TITLE>
> <META http-equiv=Content-Type content="text/html;
> charset=gb2312">
> </HEAD>
> <BODY bgColor=#ffffff>
> <h1>Hello</h1><br>
> </body>
> </html>
>
> I then run the parser as
> java -jar htmlparser.jar file:test.html.
> The parser output nothing but:
> HTMLParser v1.3 (Integration Build Mar 02, 2003)
> Parsing file:test.html
> INFO: detected charset "gb2312", using "EUC-CN"
>
> Thanks for any help.
>
> Joe

Re: [Htmlparser-user] Changing links embedded inside a script tag?

From: Somik R. <so...@ya...> - 2003-03-12 06:35:30

Hi Joe,
    Changing links in script is not yet supported.  Can you add it as a
feature request ?
    Inline javascript ought to be available from the attributes - though, we
don't have any tests yet. If we can get some help - it would speed us up.

Regards,
Somik
----- Original Message -----
From: "Joe Lin" <gu...@ya...>
To: <htm...@li...>
Sent: Tuesday, March 04, 2003 9:42 PM
Subject: [Htmlparser-user] Changing links embedded inside a script tag?


> Hi,
>
> I need to change links embedded inside the code of a
> script tag such as:
> <script language="Javascript">
> window.open("http://mysite/index.html");
> </script>
>
> There's only getScriptCode() in ScriptTag and no
> setScriptCode() available. Has anyone done changing
> links inside Javascript? Can you please suggest a good
> way to do this?
>
> Also, how about inline Java script such as
> <form ....>
> <input type="button" onClick="<script
> window.open..../>">
> </form>
>
> Thanks so much for the help!
>
> Joe

Re: [Htmlparser-user] Parsing From A String 2

From: Somik R. <so...@ya...> - 2003-03-12 06:25:55

Devin Gillman wrote:
>      I am also trying to parse a string containing html tags. I am just
> trying to pull the text from the string but I have been unsuccessful at it.

String myStringWithTags  = "<html><head>....</head><body>..</body></html>";
Parser parser = Parser.createParser(myStringWithTags);
TextExtractingVisitor visitor = new TextExtractingVisitor();
parser.visitAllNodesWith(visitor);
System.out.println(visitor.getExtractedText());

HTH.

Regards,
Somik

[Htmlparser-user] 'help'

From: Amit M. <am...@ve...> - 2003-03-11 10:50:05

Attachments: "amane.vcf

'help'

"Provocans ad volandum"
---------------------
+91-020-4367614
+91-0231-2663094
ami...@ya...
---------------------

----- Original Message -----
From: htm...@li...
Date: Tuesday, March 11, 2003 1:43 am
Subject: Htmlparser-user digest, Vol 1 #211 - 5 msgs

> Send Htmlparser-user mailing list submissions to
>    htm...@li...
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>    https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> or, via email, send a message with subject or body 'help' to
>    htm...@li...
> 
> You can reach the person managing the list at
>    htm...@li...
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Htmlparser-user digest..."
> 
> 
> Today's Topics:
> 
>   1. Parsing From A String 2 (Devin Gillman)
>   2. compilation problem (Gokcen Ogutcu)
>   3. RE: compilation problem (dha...@or...)
>   4. RE: compilation problem (Gokcen Ogutcu)
>   5. RE: compilation problem (Dave Knipp)
> 
> --__--__--
> 
> Message: 1
> From: "Devin Gillman" <obi...@ho...>
> To: htm...@li...
> Date: Mon, 10 Mar 2003 03:41:57 -0600
> Subject: [Htmlparser-user] Parsing From A String 2
> Reply-To: htm...@li...
> 
> Hi,
> 
>     I am also trying to parse a string containing html tags. I am 
> just 
> trying to pull the text from the string but I have been 
> unsuccessful at it. 
> I've tried creating a URL from the string and trying to use a 
> HTMLReader or 
> Reader to get at the information. I suppose I could write it to a 
> file, but 
> I would prefer not to have to go through all of that a short simple 
> string. 
> Nothing has worked for me yet. I am sure there is a simple way, but 
> I can't 
> seem to find it. Any help would be appreciated.
> 
> Thanks ahead of time,
> 
> Devin Gillman
> 
> 
> 
> 
> --__--__--
> 
> Message: 2
> Date: Mon, 10 Mar 2003 12:16:31 +0200 (EET)
> From: "Gokcen Ogutcu" <sca...@bi...>
> To: <htm...@li...>
> Subject: [Htmlparser-user] compilation problem
> Reply-To: htm...@li...
> 
> hello all,
> 
> I'm experiencing some compilation problems. I saved one of the
> examples that comes with the documentation and tried to compile it,
> just to give it a try, but it gave errors (the error message was
> "unable to resolve symbol"). I'm new to Java, but I think this error
> is raised when the compiler can't find the relevant packages or classes.
> The source file and the "org" dir were at the same level; I didn't
> touch the directory structure of the "htmlparser".
> What am I doing wrong? I'm using J2SE; maybe it requires "ant"?
> 
> thanks for your help,
> gokcen
> 
> 
> 
> 
> 
> --__--__--
> 
> Message: 3
> From: dha...@or...
> Date: Mon, 10 Mar 2003 15:50:59 +0530
> Subject: RE: [Htmlparser-user] compilation problem
> TO: htm...@li...
> Reply-To: htm...@li...
> 
> 
> You need to include "htmlparser.jar" in your classpath settings and
> then compile the example code. You do not need "ant".
> 
> Regards,
> 
> Dhaval Udani
> 
> 
> -----Original Message-----
> From: scapegoat [mailto:sca...@bi...]
> Sent: Monday, March 10, 2003 3:47 PM
> To: htmlparser-user
> Cc: scapegoat
> Subject: [Htmlparser-user] compilation problem
> 
> 
> hello all,
> 
> I'm experiencing some compilation problems. I saved one of the
> examples that comes with the documentation and tried to compile it,
> just to give it a try, but it gave errors (the error message was
> "unable to resolve symbol"). I'm new to Java, but I think this error
> is raised when the compiler can't find the relevant packages or classes.
> The source file and the "org" dir were at the same level; I didn't
> touch the directory structure of the "htmlparser".
> What am I doing wrong? I'm using J2SE; maybe it requires "ant"?
> 
> thanks for your help,
> gokcen
> 
> 
> 
> 
> 
> 
> 
> 
> --__--__--
> 
> Message: 4
> Date: Mon, 10 Mar 2003 15:28:56 +0200 (EET)
> Subject: RE: [Htmlparser-user] compilation problem
> From: "Gokcen Ogutcu" <sca...@bi...>
> To: <htm...@li...>
> Reply-To: htm...@li...
> 
> i have tried,
> 
> javac -classpath htmlparser.jar LinkExtractor.java
> 
> and
> 
> export CLASSPATH=$CLASSPATH:/home/x/htmlparser.jar
> javac LinkExtractor.java
> 
> and neither worked; i'm probably mistyping the commands above.
> what am i doing wrong?
> 
> thanks again,
> gokcen
> 
> > You need to include "htmlparser.jar" in your classpath settings and then
> > compile the example code. You do not need "ant".
> >
> > Regards,
> >
> > Dhaval Udani
> >
> 
> 
> 
> 
> --__--__--
> 
> Message: 5
> From: "Dave Knipp" <dav...@ho...>
> To: htm...@li...
> Subject: RE: [Htmlparser-user] compilation problem
> Date: Mon, 10 Mar 2003 08:06:06 -0600
> Reply-To: htm...@li...
> 
> Is your jar in the same folder as the entry point for your program? If
> not, you need to give the classpath the actual path to htmlparser.jar.
> If it is, I would suggest adding more entries to your classpath. For
> example, if you are compiling from the directory with your main in it,
> try something like this:
> 
> javac -classpath .;./htmlparser_location/htmlparser.jar YourClass.java
> 
> Just keep trying different configurations in your classpath and you are
> bound to get it to compile.
> 
> good luck,
> 
> Dave Knipp
> 
> 
> 
> --__--__--
> 
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> 
> 
> End of Htmlparser-user Digest
> 'help'

Re: [Htmlparser-user] parser termination callback

From: Somik R. <so...@ya...> - 2003-03-11 02:50:58

Hi Joe,
  One suggestion is to pass the stream to the visitor,
so that you can close it outside the visitor once
parsing has returned. However, it might be a good idea
to support a parseCompleted() event on the visitor
interface.
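
A minimal sketch of the first suggestion, under loud assumptions: TagVisitor and parse() below are hypothetical stand-ins, not the real htmlparser API, and a ByteArrayOutputStream replaces the FileOutputStream so the example runs on its own. The point it illustrates is that the caller owns the stream and closes it only after the parse call has returned, so tags that arrive after </HTML> can still be written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ParseCompletionSketch {

    /** Hypothetical stand-in for the library's visitor (assumption, not the real API). */
    interface TagVisitor {
        void visitEndTag(String tagName) throws IOException;
    }

    /** Hypothetical stand-in for the parser: reports each end tag in document order. */
    static void parse(String[] endTags, TagVisitor visitor) throws IOException {
        for (String tag : endTags) {
            visitor.visitEndTag(tag);
        }
    }

    /** Writes every end tag to a stream that is closed only after parse() returns. */
    static String dump(String[] endTags) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The caller owns the stream; try-with-resources closes it after parsing,
        // so the visitor never has to guess which tag is the last one.
        try (OutputStream out = sink) {
            parse(endTags, tag ->
                out.write(("</" + tag + ">\n").getBytes(StandardCharsets.UTF_8)));
        }
        return sink.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // "P" arrives after "HTML", as can happen with malformed pages.
        System.out.print(dump(new String[] {"B", "HTML", "P"}));
    }
}
```

With a real file, the same shape would apply: open the FileOutputStream before starting the parse, hand it to the visitor, and close it in the caller once parsing is done.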

Regards,
Somik
   
--- Joe Lin <gu...@ya...> wrote:
> Hi,
> 
> I wrote a visitor and registered it with the Parser.
> Basically I was parsing a web page and dumping the
> result to a file. I close the FileOutputStream in my
> visitEndTag like this:
> 
> public void visitEndTag(EndTag endTag)
> {
>   if ( endTag.getTagName().equalsIgnoreCase("HTML") )
>   {
>      // flush and close the file output stream
>   }
> }
> 
> However, my program is getting an IOException saying
> that the output stream is closed while I am still
> trying to write to it. I then realized that the "if"
> statement in my visitEndTag is not a correct signal
> for determining that the parser is done parsing. Can
> anyone please help me find out if there's any way that
> I can know the parser has finished parsing? Thanks.
> 
> Joe
> 

[Htmlparser-user] parser termination callback

From: Joe L. <gu...@ya...> - 2003-03-11 00:55:24

Hi,

I wrote a visitor and registered it with the Parser.
Basically I was parsing a web page and dumping the
result to a file. I close the FileOutputStream in my
visitEndTag like this:

public void visitEndTag(EndTag endTag)
{
  if ( endTag.getTagName().equalsIgnoreCase("HTML") )
  {
     // flush and close the file output stream
  }
}

However, my program is getting an IOException saying
that the output stream is closed while I am still
trying to write to it. I then realized that the "if"
statement in my visitEndTag is not a correct signal
for determining that the parser is done parsing. Can
anyone please help me find out if there's any way that
I can know the parser has finished parsing? Thanks.

Joe


RE: [Htmlparser-user] compilation problem

From: Dave K. <dav...@ho...> - 2003-03-10 14:06:16

Is your jar in the same folder as the entry point for your program? If
not, you need to give the classpath the actual path to htmlparser.jar.
If it is, I would suggest adding more entries to your classpath. For
example, if you are compiling from the directory with your main in it,
try something like this:

javac -classpath .;./htmlparser_location/htmlparser.jar YourClass.java

Just keep trying different configurations in your classpath and you are
bound to get it to compile.

good luck,

Dave Knipp
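
One detail worth checking with the command above: the separator between classpath entries is platform dependent. It is ";" on Windows and ":" on Unix-like systems, where an unquoted ";" would end the shell command. The sketch below is only an illustration, with ClasspathSketch and join() as hypothetical names; it builds the classpath with File.pathSeparator so the printed javac line matches the platform it runs on. The jar location and LinkExtractor.java are taken from the thread, not verified paths.

```java
import java.io.File;

public class ClasspathSketch {

    /** Joins classpath entries with the separator of the current platform. */
    static String join(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // "." keeps the current directory on the classpath next to the jar.
        String classpath = join(".", "htmlparser.jar");
        System.out.println("javac -classpath " + classpath + " LinkExtractor.java");
    }
}
```

Including "." matters because setting -classpath replaces the default classpath, which would otherwise contain the current directory.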

RE: [Htmlparser-user] compilation problem

From: Gokcen O. <sca...@bi...> - 2003-03-10 13:29:52

i have tried,

javac -classpath htmlparser.jar LinkExtractor.java

and

export CLASSPATH=$CLASSPATH:/home/x/htmlparser.jar
javac LinkExtractor.java

and neither worked; i'm probably mistyping the commands above.
what am i doing wrong?

thanks again,
gokcen

> You need to include "htmlparser.jar" in your classpath settings and then
> compile the example code. You do not need "ant".
>
> Regards,
>
> Dhaval Udani
>

RE: [Htmlparser-user] compilation problem

From: <dha...@or...> - 2003-03-10 10:44:09

Attachments: BDY.RTF

You need to include "htmlparser.jar" in your classpath settings and then
compile the example code. You do not need "ant".

Regards,

Dhaval Udani


-----Original Message-----
From: scapegoat [mailto:sca...@bi...]
Sent: Monday, March 10, 2003 3:47 PM
To: htmlparser-user
Cc: scapegoat
Subject: [Htmlparser-user] compilation problem


hello all,

i'm experiencing some compilation problems. i saved one of the examples
that comes with the documentation and tried to compile it, just to give it
a try, but it gave errors (the error message was "unable to resolve
symbol"). i'm new to java, but i think this error is raised when the
compiler can't find the relevant packages or classes.
the source file and the "org" dir were at the same level; i didn't touch
the directory structure of "htmlparser".
what am i doing wrong? i'm using j2se, maybe it requires "ant"?

thanks for your help,
gokcen






[Htmlparser-user] compilation problem

From: Gokcen O. <sca...@bi...> - 2003-03-10 10:17:22

hello all,

i'm experiencing some compilation problems. i saved one of the examples
that comes with the documentation and tried to compile it, just to give it
a try, but it gave errors (the error message was "unable to resolve
symbol"). i'm new to java, but i think this error is raised when the
compiler can't find the relevant packages or classes.
the source file and the "org" dir were at the same level; i didn't touch
the directory structure of "htmlparser".
what am i doing wrong? i'm using j2se, maybe it requires "ant"?

thanks for your help,
gokcen

790 messages have been excluded from this view by a project administrator.
