Thanks for your comment. I think there must be a typo in it,
since I cannot see how you get different outputs
from the same code. However, I think I can guess what you
mean.
If I understand correctly, in both IE and
Netscape at least, we have
docbase plus quotes.html ==> http://cbc.ca/business/quotes.html
docbase plus /quotes.html ==> http://cbc.ca/quotes.html
So it is not necessary to add the methods you
mentioned. All of this should be handled in the
implementation, and users should not have to worry about it.
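
To make the two cases above concrete, here is a minimal sketch using only
java.net.URL (not htmlparser). It assumes the docbase is
http://cbc.ca/business/ with a trailing slash; the class name
RelativeLinkDemo and the helper resolve are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RelativeLinkDemo
{
    // Hypothetical helper: resolve a link against a document base
    // the way java.net.URL does it.
    static String resolve (String docbase, String link)
    {
        try
        {
            return new URL (new URL (docbase), link).toExternalForm ();
        }
        catch (MalformedURLException e)
        {
            throw new IllegalArgumentException (e);
        }
    }

    public static void main (String[] args)
    {
        // Assumption: the docbase ends in a slash.
        String docbase = "http://cbc.ca/business/";

        // Relative link: appended to the docbase directory.
        System.out.println (resolve (docbase, "quotes.html"));
        // Root-relative link: replaces the whole docbase path.
        System.out.println (resolve (docbase, "/quotes.html"));
    }
}
```

With that docbase, the first call should give
http://cbc.ca/business/quotes.html and the second
http://cbc.ca/quotes.html.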
It seems to me that the problem I ran into in htmlparser is
not this one. It is more complicated. Let me
describe it in more detail:
Let
docbase="http://cbc.ca/business1/business2/.../businessn/".
If n is big, then htmlparser gives
docbase plus quotes.html ==> http://cbc.ca/business1/business2/.../business(n-1)/quotes.html
The "/businessn" segment is missing. For small n, however, the
parser works properly.
This is very strange. I believe that it is a bug.
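
For comparison, here is a rough sketch of the same check against plain
java.net.URL, which is only a reference point and not the htmlparser code
path. The class name DeepDocbaseCheck, the helpers deepDocbase and resolve,
and the upper bound of 50 segments are all assumptions chosen for
illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class DeepDocbaseCheck
{
    // Hypothetical helper: builds http://cbc.ca/business1/.../businessn/
    static String deepDocbase (int n)
    {
        StringBuilder sb = new StringBuilder ("http://cbc.ca/");
        for (int i = 1; i <= n; i++)
            sb.append ("business").append (i).append ('/');
        return sb.toString ();
    }

    // Hypothetical helper: resolve a link with java.net.URL only.
    static String resolve (String docbase, String link)
    {
        try
        {
            return new URL (new URL (docbase), link).toExternalForm ();
        }
        catch (MalformedURLException e)
        {
            throw new IllegalArgumentException (e);
        }
    }

    public static void main (String[] args)
    {
        // The bound of 50 is arbitrary; raise it to probe deeper paths.
        for (int n = 1; n <= 50; n++)
        {
            String docbase = deepDocbase (n);
            String expected = docbase + "quotes.html";
            String actual = resolve (docbase, "quotes.html");
            // Report any depth at which a path segment goes missing.
            if (!expected.equals (actual))
                System.out.println ("n=" + n + " differs: " + actual);
        }
        System.out.println ("done");
    }
}
```

I would expect java.net.URL to keep every segment at any depth, so a
difference at large n would point at the parser rather than at the URL class.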
Xue-Feng
--- Derrick Oswald <DerrickOswald@...> wrote:
> Xue-Feng,
>
> The URL 'error' you mention is actually just the way
> the Java URL class
> works. Without a terminating slash, the document
> base URL is assumed to
> be a resource, which is correctly removed by the
> algorithm. Here's an
> example:
>
> import java.io.IOException;
> import java.net.URL;
>
> public class UrlProblem
> {
>     public static void main (String[] args) throws IOException
>     {
>         URL docbase;
>         URL relative;
>
>         docbase = new URL ("http://cbc.ca/business");
>         relative = new URL (docbase, "quotes.html");
>         System.out.println (relative.toExternalForm ());
>         docbase = new URL ("http://cbc.ca/business/");
>         relative = new URL (docbase, "quotes.html");
>         System.out.println (relative.toExternalForm ());
>         // output:
>         // http://cbc.ca/quotes.html
>         // http://cbc.ca/business/quotes.html
>     }
> }
>
> This is why most URLs that reference the default
> document (usually index.html) end in a slash.
>
> Derrick
>
> Xue-Feng Yang wrote:
>
> >Input is in your message below.
> >
> >
> >Cheers,
> >
> >Xue-Feng
> >
> > --- Somik Raha <somik@...> wrote:
> >
> >>Hi Xue,
> >>
> >>--- Xue-Feng Yang <just4look@...> wrote:
> >>
> >>>I had made a small parser to replace Htmlparser for
> >>>our program. I would like to switch back to Htmlparser
> >>>in future. However, I am in no hurry on this now.
> >>>
> >>Sorry about that. I am personally of the opinion that
> >>if a simple text search program works - you should use
> >>it instead of the parser. Go for the parser only when
> >>you will really get business value out of it.
> >>
> >
> >I am an expert on parsers and compilers. I have written
> >parsers and compilers for commercial purposes. I could
> >certainly build an htmlparser myself. However, your open
> >source htmlparser is ongoing, so why write a second
> >one?
> >
> >More comments below.
> >
> >
> >
> >>>I am not clear what you mean by "couldn't find
> >>>issues". Do you mean that you cannot run my program to
> >>>find the problem? Or that you can run my program and
> >>>find the problem but couldn't locate the issue in the
> >>>htmlparser code?
> >>>
> >>Neither, but close to the latter. I meant I couldn't
> >>see any problem in the parser. Only if I see a problem
> >>can I proceed to fix it.
> >>
> >
> >So you mean you ran my code and saw the correct
> >printout? Really strange now!
> >
> >More comments below.
> >
> >
> >
> >>>On the other hand, I don't know much about the
> >>>architecture of htmlparser. It is much easier for me
> >>>to write my own parser. So is there someone who can
> >>>show me more about htmlparser first?
> >>>
> >>I wasn't asking you to delve into the code. It's hard,
> >>and a little crazy. I was asking you to write a
> >>failing test that involves "using" the parser to do
> >>what you want, and assert that it does it correctly.
> >>
> >>Are you familiar with JUnit?
> >>
> >
> >Sure, but why bother with it here?
> >
> >More comments below.
> >
> >
> >
> >>The best way to get familiar with the parser is to
> >>look at the docs. Read
> >>http://htmlparser.sourceforge.net/docs/index.php/TestDrivenDevelopment
> >>- writing unit tests is really easy - you should be
> >>able to get started here.
> >>
> >>>By the way, in addition to the issue "missing text",
> >>>I also found that htmlparser changes the links. Here
> >>>is an example: if you run my code, you will see
> >>>
> >>>linkString=http://dir.yahoo.com/Regional/Countries/Canada/Provinces_and_Territories/British_Columbia/Regional_Districts/Cities
> >>>textString=
> >>>
> >>>However, the link in html is
> >>>
=== message truncated ===