You will probably need to modify your regular expression to match one or
more whitespace characters (spaces, tabs or line breaks) between the day,
month and year.
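
As a rough illustration of what I mean, here is a minimal sketch using only
java.util.regex. The class name DateRegexSketch, the constants SEP and
DATE_REGEX and the helper findDates are made up for this example, and the
pattern is a simplified stand-in for yours, not a drop-in replacement. It
accepts any run of whitespace (including a line break) or a "-" or "/"
between the parts, and uses the (?i) flag because the month names in your
HTML are capitalised.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRegexSketch {
    // Separator: either "-" or "/" with optional whitespace around it,
    // or one or more whitespace characters (space, tab, line break).
    static final String SEP = "(?:\\s*[-/]\\s*|\\s+)";

    // (?i) makes the month names case-insensitive.
    // Hypothetical simplified pattern: day, month (name or number), 2- or 4-digit year.
    static final String DATE_REGEX =
        "(?i)\\b([0-3]?[0-9])(?:st|nd|rd|th)?,?" + SEP
        + "(january|february|march|april|may|june|july|august"
        + "|september|october|november|december"
        + "|jan|feb|mar|apr|jun|jul|aug|sep|oct|nov|dec|[01]?[0-9]),?" + SEP
        + "((?:[0-9]{2}){1,2})\\b";

    // Returns every date-like substring found in the text.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(DATE_REGEX).matcher(text);
        while (m.find()) {
            // Collapse matched whitespace so a date split across lines reads as one line.
            found.add(m.group().replaceAll("\\s+", " "));
        }
        return found;
    }

    public static void main(String[] args) {
        // Mirrors the sample: the date is broken across two lines.
        String text = "travel support: 21\nMarch 2005, request: 23 March\n2005, old style 23/4/2004";
        System.out.println(findDates(text));
    }
}
```

If RegexFilter passes the pattern string to java.util.regex unchanged (which
I believe it does, but please check the Javadoc), DATE_REGEX should be usable
as the argument to new RegexFilter(...).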
v.sudhakarreddy ch wrote:
> Hi,
> I am using a regular expression filter to extract dates from an HTML
> document. When I try to extract dates in formats like 23/4/2004,
> 21 march 2005, etc. using the following regular expression, the
> RegexFilter does not work. I am also giving the code here.
>
> try
> {
> Parser parser = new Parser ("sample.html");
> RegexFilter filter = new RegexFilter
> ("([1-3][0-9]?)(th|rd|st|nd)?,? [\\s|-|/]
> (jan|feb|mar|april|may|jun|jul|aug|sep|oct|nov|dec|january|february|march|april|may|june|july|august|september|october|november|december|[0-9][1-9]?),?
> [\\s|-|/] ([0-9]|[0-9]) ([0-9]{2})? ,?");
> NodeList list = parser.extractAllNodesThatMatch (filter);
> int i=0;
> while(i<list.size()){
> System.out.println("date->" + (i+1));
> String str = ((Node)list.elementAt(i)).toPlainTextString();
> i++;
> System.out.println(str + "-");
> }
> }
> catch (ParserException e)
> { e.printStackTrace (); }
>
> code of sample.html is..
>
> <html> <head></head> <body>
>
> <b><font color=brown>Important Dates</font></b>
> <ul>
> <li>Last date to apply for registration and travel support: <b>21
> March 2005</b>
>
> <li>Notification regarding registration request: <b>23 March
> 2005</b>
> </ul>
>
>
> <hr>
> </body> </html>
>
>
> Can anyone tell me what is wrong with the above code? The same regular
> expression worked correctly for extracting dates from a simple text file.
>
> by
> sudhakar
>
>
>