|
From: David K. <dav...@al...> - 2003-10-08 10:14:24
Attachments:
scanner_filtering.patch
|
Hi people,

I noticed that the filtering in DirectoryScanner, and the underlying ScannerWorker, was not actually implemented, and since I needed it I added some code that seems to work:

ScannerWorker:
- Added a private List of filters
- Did the TODO in addFilter to feed the above List
- Added a private method checkFilter(filter, string)
- Added code to acceptEntry that checks the supplied string against the supplied filter and any already configured filters in the above List

DirectoryScanner:
- Added a private String for the (configured) filter
- Added code to get the filter string if configured
- Added code to acceptFile to use the configured filter and actually do some filtering using the code now in ScannerWorker

The logic is: If no filter is specified and no filters are present in the List, the string matches, regardless. If filters are present, the string matches if it matches any of the filters. If a filter is also supplied in the method call, that filter is also checked, but only if it is not already present in the List.

I have tested all changes and they seem to work, and they are fully backwards compatible, i.e no changes need to be done to existing configurations - they work as before.

Example configuration snippet for a DirectoryScanner:

<scanner-name>.filter=.*\.xml

The above filter (obviously) matches files named *.xml and nothing else. It uses the regular expression support in the String class. So that might require 1.4-level Java support?

Patch attached. Please review, and apply if correct && useful.

Regards,

David Kinnvall |
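The any-match logic described in this message can be sketched roughly as follows. This is only an illustration and not the attached patch: the class name AnyMatchFilterSketch is made up, the code uses current Java collections rather than 1.4-era ones, and ignoring empty filter strings in addFilter is an assumption. It tests for empty strings with isEmpty() instead of ==, because == on Java strings compares references rather than contents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the any-match filter logic; not the real ScannerWorker. */
class AnyMatchFilterSketch {
    /** Configured regular expressions; an entry is accepted if any one matches. */
    private final List<String> filters = new ArrayList<>();

    /** Add a filter unless it is null, empty (assumption) or already present. */
    public void addFilter(String filter) {
        if (filter != null && !filter.isEmpty() && !filters.contains(filter)) {
            filters.add(filter);
        }
    }

    /** Accept if there are no filters at all, or if any filter matches the whole name. */
    public boolean acceptEntry(String filter, String name) {
        boolean hasExtra = filter != null && !filter.isEmpty();
        if (filters.isEmpty() && !hasExtra) {
            return true; // nothing configured and nothing supplied: match regardless
        }
        for (String configured : filters) {
            if (name.matches(configured)) { // String.matches tests the entire string
                return true;
            }
        }
        // The supplied filter is only checked if it is not already configured.
        return hasExtra && !filters.contains(filter) && name.matches(filter);
    }

    public static void main(String[] args) {
        AnyMatchFilterSketch worker = new AnyMatchFilterSketch();
        System.out.println(worker.acceptEntry(null, "notes.txt")); // no filters yet
        worker.addFilter(".*\\.xml"); // same expression as the config example
        System.out.println(worker.acceptEntry(null, "order.xml"));
        System.out.println(worker.acceptEntry(null, "notes.txt"));
    }
}
```

In a properties file the expression is written as .*\.xml; the doubled backslash above is only needed inside a Java string literal.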
|
From: Dejan K. <dej...@ya...> - 2003-10-08 13:19:13
|
Hi David,
I have just realized that filtering is not implemented
in the new scanner version. It was implemented in the 1.0
release, but since I rewrote the whole scanner for 1.1,
I obviously forgot to implement it.
Have you checked whether your code looks like the code in 1.0?
I will review your code as soon as possible. And
don't worry, my implementation also required 1.4,
since it used the java.util.regex package. There are some
other pieces of Babeldoc that require it...
Dejan
P.S. Are you sure you don't want to become a committer?
--- David Kinnvall <dav...@al...> wrote:
> Hi people,
>
> I noticed that the filtering in DirectoryScanner,
> and the
> underlying ScannerWorker, was not actually
> implemented, and
> since I needed it I added some code that seems to
> work:
>
> ScannerWorker:
> - Added a private List of filters
> - Did the TODO in addFilter to feed the above List
> - Added a private method checkFilter(filter, string)
> - Added code to acceptEntry that checks the supplied
> string against the supplied filter and any already
> configured filters in the above List
>
> DirectoryScanner:
> - Added a private String for the (configured) filter
> - Added code to get the filter string if configured
> - Added code to acceptFile to use the configured
> filter
> and actually do some filtering using the code now
> in
> ScannerWorker
>
> The logic is: If no filter is specified and no
> filters
> are present in the List, the string matches,
> regardless.
> If filters are present, the string matches if it
> matches
> any of the filters. If a filter is also supplied in
> the
> method call, that filter is also checked, but only
> if it
> is not already present in the List.
>
> I have tested all changes and they seem to work, and
> they
> are fully backwards compatible, i.e no changes need
> to be
> done to existing configurations - they work as
> before.
>
> Example configuration snippet for a
> DirectoryScanner:
>
> <scanner-name>.filter=.*\.xml
>
> The above filter (obviously) matches files named
> *.xml and
> nothing else. It uses the regular expression support
> in the
> String class. So that might require 1.4-level Java
> support?
>
> Patch attached. Please review, and apply if correct
> && useful.
>
> Regards,
>
> David Kinnvall
> > Index: com/babeldoc/scanner/ScannerWorker.java
>
===================================================================
> RCS file:
>
/cvsroot/babeldoc/babeldoc/modules/scanner/src/com/babeldoc/scanner/ScannerWorker.java,v
> retrieving revision 1.27
> diff -u -r1.27 ScannerWorker.java
> --- com/babeldoc/scanner/ScannerWorker.java 30 Sep
> 2003 14:37:21 -0000 1.27
> +++ com/babeldoc/scanner/ScannerWorker.java 8 Oct
> 2003 09:54:26 -0000
> @@ -65,6 +65,9 @@
> */
> package com.babeldoc.scanner;
>
> +import java.util.ArrayList;
> +import java.util.Iterator;
> +import java.util.List;
> import java.util.Map;
> import java.util.HashMap;
>
> @@ -114,6 +117,18 @@
> /** must the documents be submitted as binaries
> */
> private boolean binary;
>
> + /** List of filters to apply to document names.
> + * If there are any filters at least one must
> + * match for each document to be processed. If
> + * no filters are present every document will
> + * be processed. Note: An additional filter can
> + * also be provided manually through the method
> + * acceptEntry, and that filter, if specified,
> + * will be check in addition to the configured
> + * ones.
> + */
> + private List filters = new ArrayList();
> +
> public static final String SCANNER_KEY =
> "scanner";
> public static final String SCAN_DATE_KEY =
> "scan_date";
> public static final String SCAN_PATH_KEY =
> "scan_path";
> @@ -214,7 +229,11 @@
> }
>
> /**
> - * Does this worker accept this entry
> + * Does this worker accept this entry? The string
> + * is matched against the specified filter as
> well
> + * as against any already configured filters. The
> + * matching will result in true if any match is
> + * found or if there are no filters.
> *
> * @param filter filter string
> * @param string name to be filtered
> @@ -222,16 +241,63 @@
> * @return true if accepted - false otherwise
> */
> public boolean acceptEntry(String filter, String
> string) {
> - return true;
> + if(filters.isEmpty() && (filter == null ||
> filter == "")) {
> + return true;
> + } else {
> + if(!filters.isEmpty()) {
> + Iterator i = filters.iterator();
> + while(i.hasNext()) {
> + if(checkFilter((String)i.next(),
> string)) {
> + return true;
> + }
> + }
> + }
> + if(filter != null && filter != "") {
> + // Don't check filter again, if it's in
> filters.
> + if(filters.isEmpty() ||
> !filters.contains(filter)) {
> + if(checkFilter(filter, string)) {
> + return true;
> + }
> + }
> + }
> + return false;
> + }
> + }
> +
> + /**
> + * Check a filter against a string. If the filter
> is
> + * empty it is considered a match. If both are
> not
> + * empty and the string matches the regular
> expression
> + * of the filter it is considered a match.
> Otherwise
> + * it is considered NOT to be a match.
> + *
> + * @param filter Filter string to match string
> against
> + * @param string String to match against filter
> string
> + * @return boolean True if string matches filter
> + */
> + private boolean checkFilter(String filter, String
> string) {
> + if(filter == null || filter == "") {
> + return true;
> + } else {
> + if(string.matches(filter)) {
> + return true;
> + } else {
> + return false;
> + }
> + }
> }
>
> /**
> - * Add a filter
> + * Add a filter, unless it is already present.
> *
> * @param filter to be added
> */
> public void addFilter(String filter) {
> - //TODO: Implement this
> + synchronized(filters) {
> + if(filter != null &&
> !filters.contains(filter)) {
> + filters.add(filter);
> + }
> + }
> }
>
> /**
> Index:
> com/babeldoc/scanner/worker/DirectoryScanner.java
>
===================================================================
> RCS file:
>
/cvsroot/babeldoc/babeldoc/modules/scanner/src/com/babeldoc/scanner/worker/DirectoryScanner.java,v
> retrieving revision 1.23
> diff -u -r1.23 DirectoryScanner.java
> ---
> com/babeldoc/scanner/worker/DirectoryScanner.java 3
> Oct 2003 13:08:40 -0000 1.23
> +++
> com/babeldoc/scanner/worker/DirectoryScanner.java 8
> Oct 2003 09:54:26 -0000
> @@ -119,6 +119,15 @@
> */
> private int minimumFileAge = 0;
>
> + /** Filename filter, as regular expression, to
> apply
> + * to all scanned files. If not defined it
> will have
> + * no effect, i.e all files will match. If
> defined,
> + * only files matching the regular expression
> will
> + * be processed.
> + */
> + private String filter = null;
> +
> +
> /**
> * This method will scan for new documents. It
> will queue documents by
> * itself, so it will return null no matter how
> many documents found!
> @@ -179,14 +188,21 @@
> + getMinimumFileAge() + " ms");
> }
>
> - //Add filename filter if exist
> - addFilter(FILTER_FILENAME);
> +
>
setFilter(this.getInfo().getStrValue(FILTER_FILENAME));
> +
> + if(getFilter() != null && getFilter() !=
> "") {
> +
> LogService.getInstance().logInfo("Filename filter: "
> + + getFilter());
> + addFilter(getFilter());
> + }
> +
> }
>
> /**
> * release the held resource. Do nothing - no
> held resources.
> */
> public void relinquishResources() {
> + // noop
> }
>
> /**
> @@ -321,7 +337,7 @@
>
=== message truncated ===
|
|
From: David K. <dav...@al...> - 2003-10-08 14:03:51
|
Dejan Krsmanovic wrote:

> Hi David,

Hi Dejan!

> I have just realized that filtering is not implemented
> in the new scanner version. It was implemented in the 1.0
> release, but since I rewrote the whole scanner for 1.1,
> I obviously forgot to implement it.
> Have you checked whether your code looks like the code in 1.0?

I'm afraid not. I had no clue this was present earlier, so I just went ahead and did it. Wasn't much code, anyway...

> I will review your code as soon as possible. And
> don't worry, my implementation also required 1.4,
> since it used the java.util.regex package. There are some
> other pieces of Babeldoc that require it...

Ok, that's good to know for future work.

> Dejan
> P.S. Are you sure you don't want to become a committer?

Heh, yeah, for the time being. But don't close the door.

(You fixed the filename problem in the scanner, I see. Good, then I can scrap my own fix that I just did for it. :-) )

Regards,

David |
|
From: Michael A. <mic...@ze...> - 2003-10-08 13:54:42
|
OK, so for all of you that chewed glass while going through my original patch, apologies for the coding, it was pretty crap. I have cleaned it up, and will post a patch just as soon as Sourceforge cvs stops giving me "end of file" messages. Still light on javadoc, but much easier to read.

Cheers...

MikeA |
|
From: Dejan K. <dej...@nb...> - 2003-10-08 14:14:33
|
Michael, David, all,

since I have discovered that some parts of the code that existed in the 1.0 version of Babeldoc are missing from the current version, I cannot apply your patches right now. The problem is that mailscanner should use some methods from the scanner worker class, that is, methods for storing filters and checking whether an entry matches a given filter.

Now, David, could you check CVS branch 1_0 to see how it was implemented in 1.0 and see if you can implement it like that? The idea is that you should have different filters. For example, in MailScanner there could be filters for subject, e-mail address, etc. I see that your implementation uses the list of filters to check whether an entry can be accepted. It is OK to have more than one filter, but you should also be able to distinguish between them...

Michael, I can apply your patch, but what I would like to see is that MailboxScanner uses methods from ScannerWorker (addFilter in initialize and acceptEntry(filterName) in the doScan method).

I could do this myself, but currently I am pretty busy with other (non-babeldoc) stuff, so I am not sure when I can do it...

Thanks,
Dejan |
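The named-filter idea requested here could look roughly like the sketch below. It is a simplified illustration, not the 1.0 code: the class NamedFilterSketch and the filter names "subject" and "from" are invented, and the expression is passed in directly, whereas the version later posted in this thread reads it from the scanner configuration. The two behaviours it copies from that later patch's javadoc are that an empty or missing expression is stored as ".*" and that an unknown filter name rejects the entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of per-name filters; not the real ScannerWorker. */
class NamedFilterSketch {
    /** Compiled patterns keyed by filter name, so filters can be told apart. */
    private final Map<String, Pattern> patterns = new HashMap<>();

    /** Store a pattern under a name, replacing any pattern with the same name. */
    public void addFilter(String filterName, String patternExp) {
        // An empty or missing expression is treated as the match-all pattern.
        String exp = (patternExp == null || patternExp.isEmpty()) ? ".*" : patternExp;
        patterns.put(filterName, Pattern.compile(exp));
    }

    /** Accept only if the named filter exists and matches the whole text. */
    public boolean acceptEntry(String filterName, String text) {
        Pattern pattern = patterns.get(filterName);
        if (pattern == null) {
            return false; // no filter registered under this name
        }
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        NamedFilterSketch worker = new NamedFilterSketch();
        worker.addFilter("subject", "Invoice.*"); // hypothetical mail subject filter
        worker.addFilter("from", "");             // empty: accept every sender
        System.out.println(worker.acceptEntry("subject", "Invoice 42"));
        System.out.println(worker.acceptEntry("from", "someone@example.org"));
        System.out.println(worker.acceptEntry("size", "123"));
    }
}
```

Keeping one compiled Pattern per name lets a mail scanner check the subject and the sender separately, which a single flat list of filters cannot express.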
|
From: Michael A. <mic...@ze...> - 2003-10-08 14:41:11
|
Hi, Dejan,

I'll get the mailbox scanner done over the next day or two. Just need to look at how the ScannerWorker does things.

Cheers...

MikeA

PS: Anybody know how to get around Sourceforge's "end of file" cvs problem? I seem to be getting it all the time at the moment.

On Wed, 2003-10-08 at 15:13, Dejan Krsmanovic wrote:
> Michael, David, all,
> since I have discovered that some parts of the code that existed in the 1.0
> version of Babeldoc are missing from the current version, I cannot apply your
> patches right now. The problem is that mailscanner should use some methods
> from the scanner worker class, that is, methods for storing filters and
> checking whether an entry matches a given filter.
>
> Now, David, could you check CVS branch 1_0 to see how it was implemented in
> 1.0 and see if you can implement it like that? The idea is that you
> should have different filters. For example, in MailScanner there could be
> filters for subject, e-mail address, etc. I see that your implementation
> uses the list of filters to check whether an entry can be accepted. It is OK
> to have more than one filter, but you should also be able to distinguish
> between them...
>
> Michael, I can apply your patch, but what I would like to see is that
> MailboxScanner uses methods from ScannerWorker (addFilter in initialize and
> acceptEntry(filterName) in the doScan method).
>
> I could do this myself, but currently I am pretty busy with other
> (non-babeldoc) stuff, so I am not sure when I can do it...
>
> Thanks,
> Dejan
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Babeldoc-devel mailing list
> Bab...@li...
> https://lists.sourceforge.net/lists/listinfo/babeldoc-devel |
|
From: Dejan K. <dej...@nb...> - 2003-10-08 14:44:58
|
Note that the methods I mentioned are currently not in ScannerWorker. Please check the 1.0 branch to see how that works. I am also hoping that David will modify his implementation to be compatible with the 1.0 release!

Dejan

----- Original Message -----
From: "Michael Ansley" <mic...@ze...>
To: "Babeldoc Developer List" <bab...@li...>
Sent: Wednesday, October 08, 2003 4:41 PM
Subject: Re: [Babeldoc-devel] Scanners and filters...

> Hi, Dejan,
>
> I'll get the mailbox scanner done over the next day or two. Just need
> to look at how the ScannerWorker does things.
>
> Cheers...
>
> MikeA
>
> PS: Anybody know how to get around Sourceforge's "end of file" cvs
> problem? I seem to be getting it all the time at the moment. |
|
From: Michael A. <mic...@ze...> - 2003-10-08 19:42:04
|
Hi, Dejan,

Why was this stuff not carried from 1.0 into 1.2? I'd like to have a go at reimplementing it, but I need to know if it's just a case of reimplementing the functionality as it stands in 1.0, or changing it to accommodate some or other requirement.

Cheers...

MikeA

On Wed, 2003-10-08 at 15:44, Dejan Krsmanovic wrote:
> Note that the methods I mentioned are currently not in ScannerWorker.
> Please check the 1.0 branch to see how that works. I am also hoping that
> David will modify his implementation to be compatible with the 1.0 release!
>
> Dejan |
|
From: Dejan K. <dej...@nb...> - 2003-10-09 06:57:05
|
Hi Mike,

There was no real reason. I just forgot to put these methods in 1.1! The problem arose when I started to refactor the scanner package and re-implement some classes. The old scanner had many problems (I am not saying that the new one does not have any!), so I planned to change the scanner architecture a lot. Since that was happening at the same time as we prepared for the 1.0 release, I did it on a separate CVS branch. And probably some changes that were made (mostly by me!) on the 1.0 branch were not merged into 1.1.

Anyway, I have just copied the methods from the 1.0 branch to the current one (1.2). I have no time for testing now, so please check whether there are any problems with it! Try to add filtering functionality to MailboxScanner with this config.

Thanks,
Dejan

----- Original Message -----
From: "Michael Ansley" <mic...@ze...>
To: "Dejan Krsmanovic" <dej...@nb...>
Cc: "Babeldoc Developer List" <bab...@li...>
Sent: Wednesday, October 08, 2003 9:42 PM
Subject: Re: [Babeldoc-devel] Scanners and filters...

> Hi, Dejan,
>
> Why was this stuff not carried from 1.0 into 1.2? I'd like to have a go
> at reimplementing it, but I need to know if it's just a case of
> reimplementing the functionality as it stands in 1.0, or changing it to
> accommodate some or other requirement.
>
> Cheers...
>
> MikeA
>
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> Babeldoc-devel mailing list
> Bab...@li...
> https://lists.sourceforge.net/lists/listinfo/babeldoc-devel |
|
From: David K. <dav...@al...> - 2003-10-10 07:40:01
Attachments:
scanner_filtering2.patch
|
Dejan, Mike, list,

Dejan Krsmanovic wrote:

> Hi Mike,
[snip]
> Anyway, I have just copied the methods from the 1.0 branch to the current
> one (1.2). I have no time for testing now, so please check whether there
> are any problems with it! Try to add filtering functionality to
> MailboxScanner with this config.

I have tested the methods in my setup and they seem to work just fine. Filtering works, both with and without a specified filter. What's nicer is that I was able to drop my own filtering stuff, since these methods cover the needs I have.

I did some trivial cleanup:

- Added missing messages to messages.properties (please review!)
- Moved the patterns Hashtable to the top of the ScannerWorker file and added a brief javadoc about it
- Re-ordered the methods and accompanying javadocs that got a bit mixed up (addFilter and acceptEntry) and added a bit more javadoc text explaining how the filter logic works
- Really "new" in this patch is only my previously suggested addition of providing the DirectoryScanner's doneDirectory under the attribute "done_dir"

Patch, as described above, attached.

> Thanks,
> Dejan

/David |
|
From: Dejan K. <dej...@nb...> - 2003-10-10 08:11:34
|
Applied. Thanks David!
Dejan
----- Original Message -----
From: "David Kinnvall" <dav...@al...>
To: "Babeldoc Developer List" <bab...@li...>
Cc: "Dejan Krsmanovic" <dej...@nb...>
Sent: Friday, October 10, 2003 9:39 AM
Subject: Re: [Babeldoc-devel] Scanners and filters...
> Dejan, Mike, list,
>
> Dejan Krsmanovic wrote:
>
> > Hi Mike,
> [snip]
> > Anyway, I have just copied the methods from the 1.0 branch to the current
> > one (1.2). I have no time for testing now, so please check whether there
> > are any problems with it! Try to add filtering functionality to
> > MailboxScanner with this config.
>
> I have tested the methods in my setup and they seem to work just
> fine. Filtering works, both with and without a specified filter.
> What's nicer is that I was able to drop my own filtering stuff,
> since these methods cover the needs I have.
>
> I did some trivial cleanup:
>
> - Added missing messages to messages.properties (please review!)
> - Moved the patterns Hashtable to the top of the ScannerWorker
> file and added a brief javadoc about it
> - Re-ordered the methods and accompanying javadocs that got a
> bit mixed up (addFilter and acceptEntry) and added a bit more
> javadoc text explaining how the filter logic works
> - Really "new" in this patch is only my previously suggested
> addition of providing the DirectoryScanner's doneDirectory
> under the attribute "done_dir"
>
> Patch, as described above, attached.
>
> > Thanks,
> > Dejan
>
> /David
>
----------------------------------------------------------------------------
----
> Index: config/i18n/messages.properties
> ===================================================================
> RCS file:
/cvsroot/babeldoc/babeldoc/modules/scanner/config/i18n/messages.properties,v
> retrieving revision 1.16
> diff -u -r1.16 messages.properties
> --- config/i18n/messages.properties 1 Oct 2003 08:33:35 -0000 1.16
> +++ config/i18n/messages.properties 10 Oct 2003 07:27:59 -0000
> @@ -32,8 +32,11 @@
> scanner.ScannerThread.info.scanningPaused=Scanning paused
> scanner.ScannerThread.info.showConfig=ScannerThread initialized using
config {0}
>
> -
> +#ScannerWorker
> scanner.ScannerWorker.error.message=Error during scaning
> +scanner.ScannerWorker.debug.addingFilter=Adding filter '{1}': '{0}'
> +scanner.ScannerWorker.warn.noFilter=No filter called '{0}' found!
> +scanner.ScannerWorker.debug.match=Matching '{1}': {0}
>
> #DirectoryScanner
> scanner.DirectoryScanner.error.notDir=Configuration {0} has a value {1}
which is not an accessible directory!
> Index: src/com/babeldoc/scanner/ScannerWorker.java
> ===================================================================
> RCS file:
/cvsroot/babeldoc/babeldoc/modules/scanner/src/com/babeldoc/scanner/ScannerW
orker.java,v
> retrieving revision 1.28
> diff -u -r1.28 ScannerWorker.java
> --- src/com/babeldoc/scanner/ScannerWorker.java 9 Oct 2003 06:54:10 -0000
1.28
> +++ src/com/babeldoc/scanner/ScannerWorker.java 10 Oct 2003 07:27:59 -0000
> @@ -118,6 +118,9 @@
> /** must the documents be submitted as binaries */
> private boolean binary;
>
> + /** Used to filter what documents to accept for processing */
> + private Hashtable patterns = new Hashtable();
> +
> public static final String SCANNER_KEY = "scanner";
> public static final String SCAN_DATE_KEY = "scan_date";
> public static final String SCAN_PATH_KEY = "scan_path";
> @@ -217,15 +220,23 @@
> return this.valueObject;
> }
>
> +
> /**
> - * Does this worker accept this entry
> - *
> - * @param filter filter string
> - * @param string name to be filtered
> - *
> - * @return true if accepted - false otherwise
> + * Add named filter to use when deciding what documents
> + * to accept for processing. This method gets called by
> + * implementing subclasses to add filters specific to
> + * each scanner implementation.
> + *
> + * The filter is fetched from the scanner configuration
> + * and must be a valid Java regular expression according
> + * to the documentation for java.util.regex.Pattern
> + *
> + * An empty or non-existing pattern is interpreted and
> + * stored as ".*", i.e the match-all wildcard pattern.
> + *
> + * @param filterName Name of configured filter to add,
> + * replaces any existing pattern having the same name
> */
> - private Hashtable patterns = new Hashtable();
> protected void addFilter(String filterName) {
> String patternExp = (String)
this.getInfo().getOption(filterName).getValue();
> if ((patternExp==null) || patternExp.equals("")) {
> @@ -238,16 +249,29 @@
> "scanner.ScannerWorker.debug.addingFilter",
> patternExp,
> filterName));
> - }
> + }
> patterns.put(filterName, pattern);
> }
>
> +
> + /**
> + * Does this worker accept this entry when matched against
> + * the named pattern? If the pattern name does not exist,
> + * the entry is not accepted, else the entry is accepted
> + * if the entry matches the regular expression defined by
> + * the named pattern.
> + *
> + * @param patternName Name of pattern to use when matching
> + * @param text Text to match against the pattern
> + *
> + * @return true if accepted - false otherwise
> + */
> protected boolean acceptEntry(String patternName, String text) {
> Pattern pattern = (Pattern) patterns.get(patternName);
> if (pattern == null) {
> LogService.getInstance().logDebug(
> I18n.get("scanner.ScannerWorker.warn.noFilter", patternName));
> - return false;
> + return false;
> }
> Matcher matcher = pattern.matcher(text);
> boolean result = matcher.matches();
> @@ -302,7 +326,7 @@
> this.initialize();
>
> if (getLog().isDebugEnabled()) {
> - getLog().logDebug(this.getName() + " worker initalized
successfully");
> + getLog().logDebug(this.getName() + " worker initialized
successfully");
> }
>
> //Set status to stopped if worker should be ignored
> Index: src/com/babeldoc/scanner/worker/DirectoryScanner.java
> ===================================================================
> RCS file:
/cvsroot/babeldoc/babeldoc/modules/scanner/src/com/babeldoc/scanner/worker/D
irectoryScanner.java,v
> retrieving revision 1.24
> diff -u -r1.24 DirectoryScanner.java
> --- src/com/babeldoc/scanner/worker/DirectoryScanner.java 8 Oct 2003
13:39:07 -0000 1.24
> +++ src/com/babeldoc/scanner/worker/DirectoryScanner.java 10 Oct 2003
07:27:59 -0000
> @@ -100,6 +100,12 @@
> public static final String FILTER_FILENAME = "filter";
> public static final String MINIMUM_FILE_AGE = "minimumFileAge";
>
> + /**
> + * This is used to provide information to the pipeline stages
> + * about where documents processed by this scanner are moved.
> + */
> + public static final String DONE_DIR_KEY = "done_dir";
> +
> public DirectoryScanner() {
> super(new DirectoryScannerInfo());
> }
> @@ -305,7 +311,8 @@
> PipelineDocument.getMimeTypeForFile(file.getName())),
> new NameValuePair(SCAN_DATE_KEY, Long.toString(modified)),
> new NameValuePair(SCAN_PATH_KEY, file.getCanonicalPath()),
> - new NameValuePair(FILE_NAME_KEY, file.getName())});
> + new NameValuePair(FILE_NAME_KEY, file.getName()),
> + new NameValuePair(DONE_DIR_KEY,
getDoneDirectory())});
> } finally {
> fis.close();
> baos.close();
> @@ -315,7 +322,8 @@
> /**
> * Consult configuration if this file should be processed
> * or not. Current configurable constraints include the age
> - * of the file and a filename filter.
> + * of the file and a filename filter, both optional and by
> + * default all permissive.
> *
> * @param file The file to be checked against configuration
> * @return true If the file should be processed at this time
>
>
|
|
From: David K. <dav...@al...> - 2003-10-09 17:41:34
|
Dejan Krsmanovic wrote:

> Hi Mike,
[snip]
> Anyway, I have just copied the methods from the 1.0 branch to the current
> one (1.2). I have no time for testing now, so please check whether there
> are any problems with it! Try to add filtering functionality to
> MailboxScanner with this config.

I'll go ahead and try to retrofit whatever might be needed from my previous filter patch into the re-merged filter functionality now present in 1.2. Shouldn't be much work, from the looks of it, since it looks well thought out. Due to time constraints, it is likely that this won't happen until next week, though, if nobody beats me to it, that is. ;-)

> Thanks,
> Dejan

/David |
|
From: Michael A. <mic...@ze...> - 2003-10-09 17:52:52
|
Is anybody else having problems with anonymous CVS access? It's being a real pain in the ass right now, I can't get anything. Does anybody do nightly tarballs?

Cheers...

On Thu, 2003-10-09 at 18:41, David Kinnvall wrote:
> Dejan Krsmanovic wrote:
>
> > Hi Mike,
> [snip]
> > Anyway, I have just copied the methods from the 1.0 branch to the current
> > one (1.2). I have no time for testing now, so please check whether there
> > are any problems with it! Try to add filtering functionality to
> > MailboxScanner with this config.
>
> I'll go ahead and try to retrofit whatever might be needed from
> my previous filter patch into the re-merged filter functionality
> now present in 1.2. Shouldn't be much work, from the looks of it,
> since it looks well thought out. Due to time constraints, it is
> likely that this won't happen until next week, though, if nobody
> beats me to it, that is. ;-)
>
> > Thanks,
> > Dejan
>
> /David |