Re: [Htmlparser-developer] AI - to be or not to be
From: Somik R. <so...@ya...> - 2002-12-30 06:14:51
Sam Joseph wrote:
> I would have thought the solution to this would be to define under which
> circumstances a pair of apostrophes will indicate text. In the first
> two examples you are inside an ANCHOR tag, and in the third you are
> inside a SCRIPT tag. It seems to me that you should only be using
> apostrophes to indicate text strings inside a SCRIPT tag, no? If
> that's true then set your parsing behaviour differently depending on the
> tag type.

You are right. The script scanner could set a static flag in
HTMLStringNode that tells it to move into the ignore state when it
encounters an apostrophe, and turn the flag off when it is done. I'll
probably do that for now.

Regards,
Somik

----- Original Message -----
From: "Sam Joseph" <ga...@yh...>
To: <htm...@li...>
Sent: Sunday, December 29, 2002 10:09 PM
Subject: Re: [Htmlparser-developer] AI - to be or not to be

> Hi Somik,
>
> I would have thought the solution to this would be to define under which
> circumstances a pair of apostrophes will indicate text. In the first
> two examples you are inside an ANCHOR tag, and in the third you are
> inside a SCRIPT tag. It seems to me that you should only be using
> apostrophes to indicate text strings inside a SCRIPT tag, no? If
> that's true then set your parsing behaviour differently depending on the
> tag type.
>
> CHEERS
> SAM
>
> Somik Raha wrote:
>
> >Hi Folks,
> >Derrick Oswald just checked in a test case that fails. Here's the link tag:
> >
> ><a href="http://cbc.ca/artsCanada/stories/greatnorth271202"
> >class="lgblacku">Vancouver schools plan 'Great Northern Way'</a>
> >
> >We are in a quandary now. When we have cases like:
> >
> ><a href="something.html">Kaarle's Page</a>
> >
> >we should accept the apostrophe without doing anything special.
> >
> >When we get scripts like
> ><script>
> >  var code = '<sometag>';
> ></script>
> >
> >we should not take the tag symbols after code seriously, as they are part of
> >the string.
> >
> >Handling the last two cases causes a conflict with the first case, because the
> >last case is handled by checking whether a ' is followed by a <, and this
> >causes the first case to go into an ignoring mode.
> >
> >How do we handle this problem? Do we write smart code to handle this
> >particular situation? From human experience, even if we've not encountered
> >these cases, we know how to differentiate between a string node and a tag.
> >Can AI help us here? Also, please feel free to suggest any straightforward
> >solutions as well.
> >
> >Regards
> >Somik
> >
> >_______________________________________________
> >Htmlparser-developer mailing list
> >Htm...@li...
> >https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
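Sam's suggestion, which Somik adopts in his reply, is to treat apostrophes (and double quotes) as string delimiters only while the parser is inside a SCRIPT element. The Java sketch below is a hypothetical, self-contained illustration of that idea, not HTMLParser's actual code: the class `ContextualLexer`, its `inScript` flag and the `TAG`/`TEXT` token strings are all invented for this example. It uses a per-instance flag instead of the static variable on HTMLStringNode proposed in the thread, and it deliberately ignores complications such as '>' inside attribute values or JavaScript comments.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not HTMLParser's API): a tiny lexer that honours quotes
 * as string delimiters only while it is inside a SCRIPT element.
 */
class ContextualLexer {

    // Per-instance flag; the thread proposes a static flag on HTMLStringNode instead.
    private boolean inScript = false;

    public List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (inScript && (c == '\'' || c == '"')) {
                // Inside a script, a quote opens a string literal: copy it whole so
                // that any '<' or '>' inside it cannot start or end a tag.
                int j = i + 1;
                while (j < html.length() && html.charAt(j) != c) {
                    // Skip backslash escapes such as \' or \".
                    j += (html.charAt(j) == '\\') ? 2 : 1;
                }
                int end = Math.min(j + 1, html.length());
                text.append(html, i, end);
                i = end;
            } else if (c == '<' && (!inScript || startsWithIgnoreCase(html, i, "</script"))) {
                // Outside a script every '<' starts a tag; inside one, only </script> does.
                int close = html.indexOf('>', i);
                if (close < 0) { // unterminated tag: keep the rest as text
                    text.append(html, i, html.length());
                    break;
                }
                flushText(text, tokens);
                String tag = html.substring(i, close + 1);
                tokens.add("TAG " + tag);
                if (startsWithIgnoreCase(tag, 0, "<script")) {
                    inScript = true;  // turn string handling on
                } else if (startsWithIgnoreCase(tag, 0, "</script")) {
                    inScript = false; // and off again when the script ends
                }
                i = close + 1;
            } else {
                // Outside a script an apostrophe is ordinary text, as in "Kaarle's Page".
                text.append(c);
                i++;
            }
        }
        flushText(text, tokens);
        return tokens;
    }

    private static void flushText(StringBuilder text, List<String> tokens) {
        if (text.length() > 0) {
            tokens.add("TEXT " + text);
            text.setLength(0);
        }
    }

    private static boolean startsWithIgnoreCase(String s, int offset, String prefix) {
        return s.regionMatches(true, offset, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a href=\"something.html\">Kaarle's Page</a>",
            "<a href=\"http://cbc.ca/artsCanada/stories/greatnorth271202\" class=\"lgblacku\">"
                + "Vancouver schools plan 'Great Northern Way'</a>",
            "<script>var code = '<sometag>';</script>"
        };
        for (String sample : samples) {
            System.out.println(new ContextualLexer().tokenize(sample));
        }
    }
}
```

With this approach the apostrophes in "Kaarle's Page" and "'Great Northern Way'" stay ordinary link text, while '<sometag>' inside the script is kept as part of the script's text instead of being parsed as a tag. Keeping the flag on the lexer instance rather than in a static field also avoids interference if two parsers run at the same time.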