Re: [Htmlparser-user] problem with regular expression filter
From: Derrick O. <Der...@Ro...> - 2006-03-25 18:50:32
You will probably need to modify your regular expression to match one or more whitespace characters between the day, month and year.

v.sudhakarreddy ch wrote:

> Hi,
> iam using Regular expression filter to extract dates from a html
> document. When i extract dates in
> the format like 23/4/2004 , 21 march 2005 etc.. using following
> regular expression Regex filter is not working. iam also giving the
> code here.
>
> try
> {
>     Parser parser = new Parser ("sample.html");
>     RegexFilter filter = new RegexFilter
> ("([1-3][0-9]?)(th|rd|st|nd)?,? [\\s|-|/]
> (jan|feb|mar|april|may|jun|jul|aug|sep|oct|nov|dec|january|february|march|april|may|june|july|august|september|october|november|december|[0-9][1-9]?),?
> [\\s|-|/] ([0-9]|[0-9]) ([0-9]{2})? ,?");
>     NodeList list = parser.extractAllNodesThatMatch (filter);
>     int i=0;
>     while(i<list.size()){
>         System.out.println("date->" + (i+1));
>         String str = ((Node)list.elementAt(i)).toPlainTextString();
>         i++;
>         System.out.println(str + "-");
>     }
> }
> catch (ParserException e)
> { e.printStackTrace (); }
>
> code of sample.html is..
>
> <html> <head></head> <body>
> <b><font color=brown>Important Dates</font></b>
> <ul>
> <li>Last date to apply for registration and travel support: <b>21
> March 2005</b>
> <li>Notification regarding registration request: <b>23 March
> 2005</b>
> </ul>
> <hr>
> </body> </html>
>
> can anyone tell me what was wrong with above code? The above regular
> expression worked correctly to extract the dates from simple text file..
>
> by
> sudhakar
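To illustrate the suggestion at the top of this reply, here is a minimal sketch of a date pattern that allows one or more whitespace characters (or `-` or `/`) between the day, month and year. It uses only `java.util.regex` rather than the HTML Parser classes, so it can be tried on plain strings first. The class name `DateRegexSketch`, the helper `findDates`, and the case-insensitive flag are assumptions made for this example and are not part of the original code. Note that the quoted pattern contains literal spaces around `[\\s|-|/]`, so it requires a space, then a separator, then another space, which "21 March 2005" does not contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of HTML Parser.
public class DateRegexSketch {
    // One or more whitespace characters, hyphens, or slashes between the parts.
    private static final String SEP = "[\\s/-]+";

    // Month as a name (short or long form) or as one or two digits.
    private static final String MONTH =
        "(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
        + "|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?"
        + "|\\d{1,2})";

    // Day (with optional ordinal suffix), month, then a 4- or 2-digit year.
    // CASE_INSENSITIVE is an assumption so that "March" and "march" both match.
    private static final Pattern DATE = Pattern.compile(
        "\\b(\\d{1,2})(?:st|nd|rd|th)?,?" + SEP + "(" + MONTH + "),?" + SEP
        + "(\\d{4}|\\d{2})\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns every substring of the text that looks like a date.
    public static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        // The line break inside the first date mimics text wrapped in the HTML source.
        String text = "Last date to apply: 21\n March 2005. Notification: 23/4/2004.";
        for (String date : findDates(text)) {
            System.out.println("date-> " + date);
        }
    }
}
```

If a pattern like this is handed to `RegexFilter` as a string, the inline flag `(?i)` at the start of the expression is one way to request case-insensitive matching without a separate flags argument. Whether that is needed depends on how the filter compiles its pattern, which this sketch does not verify.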