|
From: Ignacio G. <igc...@gm...> - 2007-10-31 14:03:27
|
Hello Brad, everyone,

I have been playing around with Wayback 1.0 for a couple of weeks, since it was released, and here is a list of my comments, questions and issues.

I will start by saying that I really like the changes that have been made, especially in the configuration aspect of the tool. It is now much easier to configure, to understand what each section does, and to set up the environment.

I have been able to set up three AccessPoints that access three different collections, and they all seem to work as expected. They are set up on port 8088, so changing the port is not an issue and can be done easily using the AccessPoint configuration. All three collections use CDX indexes, so this also works perfectly.

However, I was only able to make Wayback work using version 1.0.0 under the ROOT context. I downloaded and tried version 1.0.1, but it did not start due to errors in the configuration (even with the default setup). I do not think that using the ROOT context is a big issue, since the AccessPoints provide path control and differentiation, but it would be good if we could deploy Wayback under different contexts.

Also, I have found that if you try to access an AccessPoint location without the trailing slash '/', it does not work; a Not-Found (404) error is displayed instead. This means that http://xyz.com/myCollection/ displays the Wayback interface successfully, but http://xyz.com/myCollection does not. I do not know whether this should be corrected in the server configuration rather than in Wayback, but I thought I should let you know.

My next comments are regarding the exclusion and restriction mechanisms. Keep in mind that I am using version 1.0.0, so I do not know whether a working 1.0.1 has these issues resolved.

I was able to successfully implement an IP-based restriction on one of my collections, and it did block content for all IPs outside of the specified range.
However, I had some problems when trying to specify more than one <value> element in the IP <list>. I wanted to use two IP ranges, and there were some issues. I will have to test this more extensively, because it might be that Wayback does not update properly after a simple restart.

I also tried to implement a static exclusion using a plain text file, and I have to say that I was not able to make this work at all. I added this section to my wayback.xml file, by itself, outside any AccessPoint or Collection:

  <bean name="2004-exclusion-list"
      class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
    <property name="file" value="/vol/webcapture/wayback_indexes/el2004/exclude.txt" />
    <property name="checkInterval" value="10" />
  </bean>

Then, inside the desired AccessPoint, I added the following:

  <property name="exclusionFactory" ref="2004-exclusion-list" />

The Catalina log does not show any information about Wayback accessing the file, so I believe the configuration file parsed correctly, but Wayback chose to ignore the exclusion, and that is why it is not being applied.

My last question has to do with the integration of these two exclusion/restriction mechanisms. In some of my AccessPoints, I would like to block some URLs, but only for users outside of the specified IP range. Will I have to create two AccessPoints, one with the IP restriction that allows users to view the complete collection and a different one that blocks the content for everyone, or can I put them together in a single AccessPoint? Since I could not get the static exclusion working, I was not able to test whether these properties could be nested one inside the other, but I think this would be a very important option.
Otherwise, we would have to implement server-side redirection based on IP addresses to point users to the correct AccessPoint, and that would eliminate most of the benefit of integrating IP recognition inside Wayback.

This is what I have experienced up to this point. I will keep testing other aspects that we might use and report back with my findings.

Thank you.
|
From: Brad T. <br...@ar...> - 2007-11-03 01:32:50
|
Hi Ignacio,

Glad the new configuration system is working well for you.

I am still unable to reproduce the problem with the non-ROOT context; I will hopefully have more information on this issue on Monday.

I am also unable to reproduce the multiple IP address problem; perhaps adding some additional logging within this module will make it easier to track down. A restart of Tomcat should be all that's needed to reload a new wayback.xml configuration, so let me know if you're still having problems with this.

Re: the same collection exported with different Exclusion configurations, you'll need multiple AccessPoints:

AccessPoint A: exports Collection Foo with no Exclusions, and limits access, via authentication, to users within one of your IP ranges.

AccessPoint B: exports Collection Foo with your administrative list Exclusions, and has no authentication configuration.

Re: the administrative list exclusions, I just noticed that the documentation does not include the 'init-method="init"' attribute, which needs to be part of the StaticMapExclusionFilterFactory bean definition. I've just checked in this documentation change, and will push it live to the wayback website on Monday.

Let me know how this works for you.

Brad
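Putting Brad's two suggestions together, the relevant part of wayback.xml might look roughly like the following fragment. This is an untested sketch, not a configuration verified against Wayback 1.0.x: the exclusion bean is Ignacio's original definition plus init-method="init"; the AccessPoint bean names and the abstract "accessPointTemplate" parent are hypothetical stand-ins (using Spring's standard parent/child bean inheritance) for whatever AccessPoint definitions you already have, so name the beans the same way your current AccessPoints are named. The IP-restriction property is left as a comment because its exact name is not given in this thread.

```xml
<!-- Static exclusion list: Ignacio's definition plus the required init-method -->
<bean name="2004-exclusion-list"
      class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory"
      init-method="init">
  <property name="file" value="/vol/webcapture/wayback_indexes/el2004/exclude.txt" />
  <property name="checkInterval" value="10" />
</bean>

<!-- Hypothetical abstract parent: copy the class attribute and the shared
     properties from your current, working AccessPoint definition here. -->
<bean id="accessPointTemplate" abstract="true">
</bean>

<!-- AccessPoint A (hypothetical name): full collection, no exclusions,
     restricted to your IP ranges. -->
<bean name="collectionFoo-internal" parent="accessPointTemplate">
  <!--
    Your existing IP-range restriction property goes here. In Spring XML,
    several ranges are given as a list with one value per range, e.g.
    <list>
      <value>RANGE-1</value>
      <value>RANGE-2</value>
    </list>
  -->
</bean>

<!-- AccessPoint B (hypothetical name): same collection, administrative
     exclusions applied, no authentication configuration. -->
<bean name="collectionFoo-public" parent="accessPointTemplate">
  <property name="exclusionFactory" ref="2004-exclusion-list" />
</bean>
```

Because only B references the exclusion bean, users inside the allowed IP ranges can see the complete collection through A, while everyone else gets the filtered view through B.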
|
From: Ignacio G. <igc...@gm...> - 2007-11-06 14:45:45
|
Hello Brad,

Thanks for the response.

I just installed Wayback 1.0.1 again in a non-root context, and this time it seems to have worked. I do not know what the problem was before, but it is now working with no problems. The only thing I can think of that might have been a problem is the name of the context: I was using "wb-webapp" before, and I installed it as "wayback" this time. The "-" (dash) in the middle of the context name might have been making the application fail.

The addition of 'init-method="init"' to the exclusion bean also worked, so we are good there too. My only question regarding this is: what naming convention must we use for the URLs? From my testing, it seems that "http://" always needs to be present, and then it does not matter whether "www." is there or not. I tried without "http://" and nothing got blocked, so I am assuming the template would be:

  http://(www.)?DOMAIN.TO.BLOCK

One last thing that you did not address in my first email is the issue of accessing the AccessPoints without the trailing slash. After getting version 1.0.1 to work, it seems the problem is still there:

  http://xyz.com/wayback/collectionA/  -> works
  http://xyz.com/wayback/collectionA   -> does not work

Do you have any ideas on this one?

Thank you.
|
From: Brad T. <br...@ar...> - 2007-11-06 19:53:18
|
I think I did the deploy at the "wb-webapp" ServletContext and had success, but I'll double check that this did work on our setup.

Re: the trailing slash, this is a bug that has come up twice in the last week. There'll be a fix checked into SVN and available on our build box in the next day or two; I'll send a note when it's available.

The static map file should not require a leading "http://"; let me know if you find this is not the case. The wayback currently does not have much flexibility in URL canonicalization: a leading "www." or "www[0-9]*\." is stripped from hostnames. We intend to make this functionality configurable going forward, but have no schedule yet. Let me know what priority it is for your installations.

We are looking at doing a maintenance 1.2.0 release in the next few weeks, which will also have this fix present.

Brad