[Htmlparser-user] problem with regular expression filter
From: v.sudhakarreddy c. <sud...@gm...> - 2006-03-25 13:27:28
Hi,
I am using a RegexFilter to extract dates from an HTML document. When I
try to extract dates in formats like 23/4/2004, 21 march 2005, etc. using
the following regular expression, the RegexFilter does not work. Here is
the code:
try
{
    Parser parser = new Parser ("sample.html");
    RegexFilter filter = new RegexFilter (
        "([1-3][0-9]?)(th|rd|st|nd)?,? [\\s|-|/] "
        + "(jan|feb|mar|april|may|jun|jul|aug|sep|oct|nov|dec"
        + "|january|february|march|april|may|june|july|august"
        + "|september|october|november|december|[0-9][1-9]?),? "
        + "[\\s|-|/] ([0-9]|[0-9]) ([0-9]{2})? ,?");
    NodeList list = parser.extractAllNodesThatMatch (filter);
    int i = 0;
    while (i < list.size ()) {
        System.out.println ("date->" + (i + 1));
        String str = ((Node)list.elementAt (i)).toPlainTextString ();
        i++;
        System.out.println (str + "-");
    }
}
catch (ParserException e)
{
    e.printStackTrace ();
}
The content of sample.html is:
<html> <head></head> <body>
<b><font color=brown>Important Dates</font></b>
<ul>
<li>Last date to apply for registration and travel support: <b>21
March 2005</b>
<li>Notification regarding registration request: <b>23 March
2005</b>
</ul>
<hr>
</body> </html>
Can anyone tell me what is wrong with the above code? The same regular
expression worked correctly when extracting dates from a plain text file.
by
sudhakar
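
One way to narrow a problem like this down is to test a date pattern on its own with java.util.regex, outside the parser, against the text that appears in sample.html. The sketch below is only an illustration and is not the expression from the post: the class name DateRegexCheck, the DATE pattern and the findDates helper are all hypothetical. It assumes two things worth checking in the posted pattern as displayed: the month alternation lists only lowercase names while sample.html contains "March", and the literal spaces around each separator class require three separator characters between day and month. Here the pattern is compiled case-insensitively and uses a single separator class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRegexCheck {
    // Hypothetical simplified pattern (not the one from the post):
    // day, optional ordinal suffix, one or more separators (whitespace, '/' or '-'),
    // month name or number, separators, then a 4- or 2-digit year.
    static final Pattern DATE = Pattern.compile(
        "\\b([0-3]?[0-9])(?:st|nd|rd|th)?[\\s/-]+"
        + "(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?"
        + "|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?"
        + "|[0-9]{1,2})"
        + "[\\s/-]+([0-9]{4}|[0-9]{2})\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns every substring of text that the DATE pattern finds.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Text similar to the text nodes in sample.html, including the line
        // break between "21" and "March".
        String text = "Last date to apply: 21\nMarch 2005. "
            + "Notification: 23 March\n2005, also 23/4/2004";
        for (String d : findDates(text)) {
            // Collapse the embedded line break before printing.
            System.out.println("date->" + d.replaceAll("\\s+", " "));
        }
    }
}
```

If a pattern behaves as expected here but not inside the filter, the difference is more likely in how the filter applies it (for example case sensitivity, or which nodes it is matched against) than in the expression itself.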