[Htmlparser-user] problem with regular expression filter
From: v.sudhakarreddy c. <sud...@gm...> - 2006-03-25 13:27:28
Hi,

I am using the regular expression filter (RegexFilter) to extract dates from an HTML document. When I try to extract dates in formats such as 23/4/2004 or 21 march 2005 with the following regular expression, the RegexFilter does not work. Here is the code:

    try
    {
        Parser parser = new Parser ("sample.html");
        RegexFilter filter = new RegexFilter ("([1-3][0-9]?)(th|rd|st|nd)?,? [\\s|-|/] (jan|feb|mar|april|may|jun|jul|aug|sep|oct|nov|dec|january|february|march|april|may|june|july|august|september|october|november|december|[0-9][1-9]?),? [\\s|-|/] ([0-9]|[0-9]) ([0-9]{2})? ,?");
        NodeList list = parser.extractAllNodesThatMatch (filter);
        int i = 0;
        while (i < list.size ())
        {
            System.out.println ("date->" + (i + 1));
            String str = ((Node)list.elementAt (i)).toPlainTextString ();
            i++;
            System.out.println (str + "-");
        }
    }
    catch (ParserException e)
    {
        e.printStackTrace ();
    }

The content of sample.html is:

    <html>
    <head></head>
    <body>
    <b><font color=brown>Important Dates</font></b>
    <ul>
    <li>Last date to apply for registration and travel support: <b>21 March 2005</b>
    <li>Notification regarding registration request: <b>23 March 2005</b>
    </ul>
    <hr>
    </body>
    </html>

Can anyone tell me what is wrong with the above code? The same regular expression worked correctly when extracting dates from a simple text file.

by sudhakar
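
One way to narrow this down is to test the pattern with plain java.util.regex, outside HTMLParser. The sketch below is only an illustration, not a confirmed diagnosis. It assumes the pattern is exactly as posted, including the literal spaces around each `[\\s|-|/]` class and the lowercase month names. The class name `DateRegexCheck`, the helper methods, and the `LOOSE` pattern are hypothetical and were not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRegexCheck {

    // The pattern as posted, split across lines only for readability.
    static final String POSTED =
        "([1-3][0-9]?)(th|rd|st|nd)?,? [\\s|-|/] "
        + "(jan|feb|mar|april|may|jun|jul|aug|sep|oct|nov|dec|january|february|march|april"
        + "|may|june|july|august|september|october|november|december|[0-9][1-9]?),? "
        + "[\\s|-|/] ([0-9]|[0-9]) ([0-9]{2})? ,?";

    // Hypothetical looser alternative: case-insensitive via (?i), one or more
    // separator characters (whitespace, '/' or '-') with no extra literal
    // spaces, and a two- or four-digit year.
    static final String LOOSE =
        "(?i)\\b(?:3[01]|[12][0-9]|0?[1-9])(?:st|nd|rd|th)?,?[\\s/-]+"
        + "(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
        + "|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?"
        + "|1[0-2]|0?[1-9])"
        + ",?[\\s/-]+(?:[0-9]{4}|[0-9]{2})\\b";

    // True if the posted pattern is found anywhere in the text.
    static boolean postedFinds(String text) {
        return Pattern.compile(POSTED).matcher(text).find();
    }

    // Collects every substring of the text matched by the looser pattern.
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = Pattern.compile(LOOSE).matcher(text);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        // The posted pattern needs "space, separator, space" between the day
        // and a lowercase month, so this prints false.
        System.out.println(postedFinds("21 March 2005"));
        System.out.println(findDates("Deadline: 21 March 2005, earlier one: 23/4/2004"));
    }
}
```

If the posted pattern also fails in this standalone check, the problem is in the expression itself, not in the filter. Because `LOOSE` carries its case-insensitive flag inline, the same string could presumably be passed to the `RegexFilter` constructor unchanged, though that has not been verified here.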