|
From: Ignacio G. <igc...@gm...> - 2007-10-31 14:03:27
|
Hello Brad, everyone,

I have been playing around with Wayback 1.0 for a couple of weeks, since it got released, and here is a list of my comments, questions and issues. I will start by saying that I really like the changes that have been made, especially in the configuration aspect of the tool. It is now much easier to configure, to understand what each section does, and to set up the environment. I have been able to set up several AccessPoints (3) that access different collections (3), and they all seem to work as expected. They are set up on port 8088, so changing the port is not an issue and can be done easily using the AccessPoint configuration. All three collections use CDX indexes, so this also works perfectly.

However, I was only able to make Wayback work using version 1.0.0 under the ROOT context. I downloaded and tried version 1.0.1, but it did not start due to errors in the configuration (even using the default setup). I do not think that using the ROOT context is a big issue, since the AccessPoints provide path control and differentiation, but it would be good if we could deploy Wayback under different contexts. Also, I have found that if you try to access an AccessPoint location without the trailing slash '/' it will not work; a Not-Found (404) error is displayed instead. This means that typing http://xyz.com/myCollection/ displays the Wayback interface successfully, but using http://xyz.com/myCollection will not. I do not know whether this is something that should be corrected in the server configuration rather than a Wayback issue, but I thought I should let you know.

My next comments are regarding the exclusion and restriction mechanisms. Keep in mind that I am using version 1.0.0, so I do not know if a working 1.0.1 has these issues resolved. I was able to successfully implement an IP-based restriction on one of my collections, and it did block content for all IPs outside of the specified range. However, I had some problems when trying to specify more than one <value> element in the IP <list>. I wanted to use two IP ranges, and there were some issues. I will have to test this more extensively, because it might be a problem of Wayback not updating properly after a simple restart.

I also tried to implement a static exclusion using a plain text file, and I have to say that I was not able to make this work at all. I added this code section to my wayback.xml file, by itself, outside any AccessPoint or Collection:

<bean name="2004-exclusion-list" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
  <property name="file" value="/vol/webcapture/wayback_indexes/el2004/exclude.txt" />
  <property name="checkInterval" value="10" />
</bean>

Then, inside the desired AccessPoint, I added the following:

<property name="exclusionFactory" ref="2004-exclusion-list" />

The Catalina log does not show any information regarding Wayback accessing the file, so I believe the configuration file parsed correctly but the exclusion was ignored, and that is why it is not being applied.

My last question has to do with the integration of these two exclusion/restriction mechanisms. In some of my AccessPoints, I would like to be able to block some URLs, but only for those users that are outside of the IP range provided. Will I have to create two AccessPoints, one with the IP restriction that will allow users to view the complete collection, and a different one that will block the contents for everyone, or can I put the two together in a single AccessPoint? Since I could not implement the static exclusion, I was not able to test whether these properties could be nested one inside the other, but I think that this would be a very important option. Otherwise, we would have to implement server-side redirection based on IP addresses to point users to the correct AccessPoint, and that would eliminate most of the benefit of integrating IP recognition inside Wayback.

This is what I have experienced up to this point. I will keep testing other aspects that we might use and report back with my findings. Thank you. |
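A note on the static exclusion described above: the contents of the exclude.txt file are not shown in the thread, so the sketch below is only a guess at the expected format (one excluded URL or URL prefix per line) and should be checked against the Wayback documentation; if the file is in a format the StaticMapExclusionFilterFactory does not expect, that could be consistent with the configuration parsing cleanly while the exclusion is silently ignored. The file path is the one from the message; the URLs inside are hypothetical placeholders.

```text
# /vol/webcapture/wayback_indexes/el2004/exclude.txt
# Assumed format (unverified): one URL or URL prefix to exclude per line.
http://www.example.com/private/
http://www.example.org/reports/draft-2004.html
example.net/embargoed/
```

Whether an IP restriction and an exclusionFactory can be combined inside a single AccessPoint is exactly the open question raised in the message, so no wiring for that combination is sketched here.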
|
From: Martin B. <xb...@fi...> - 2007-10-27 20:53:16
|
Hi,

I have managed to index documents using NutchWax in distributed mode several times before, but now there is a problem I cannot cope with. This time not all computers are under the same domain: the machine that hosts the namenode and jobtracker is under the domain webarchiv.cz, while all datanodes are under fi.muni.cz (by the way, all computers are in the same building). When the 5th job starts (dedup 1: urls by time), 'info' messages are combined with 'warn' ones in the logs, like these:

jobtracker:

INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0005_m_000003_3: java.lang.ArrayIndexOutOfBoundsException: -1
  at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:109)
  at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:177)
  at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:203)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1388)

datanodes:

WARN org.apache.hadoop.dfs.DataNode: DataXCeiver java.io.IOException: Block blk_-7402203219236206647 has already been started (though not completed), and thus cannot be created.
  at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:437)
  at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:721)
  at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:550)
  at java.lang.Thread.run(Thread.java:619)

WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-7402203219236206647 to nymfe01/147.251.53.11:50010 java.net.SocketException: Broken pipe
  at java.net.SocketOutputStream.socketWrite0(Native Method)
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
  at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
  at java.io.DataOutputStream.write(DataOutputStream.java:90)
  at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:974)
  at java.lang.Thread.run(Thread.java:619)

WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-5576786832054029538 to nymfe05/147.251.53.15:50010 java.net.SocketException: Connection reset
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
  at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
  at java.io.DataOutputStream.write(DataOutputStream.java:90)
  at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:974)
  at java.lang.Thread.run(Thread.java:619)

Then it crashes and my terminal says:

07/10/27 22:20:40 INFO indexer.DeleteDuplicates: Dedup: adding indexes in: output/indexes
07/10/27 22:20:43 INFO mapred.JobClient: Running job: job_0005
07/10/27 22:20:44 INFO mapred.JobClient: map 0% reduce 0%
07/10/27 22:21:05 INFO mapred.JobClient: map 100% reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
  at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:433)
  at org.archive.access.nutch.Nutchwax.doDedup(Nutchwax.java:257)
  at org.archive.access.nutch.Nutchwax.doAll(Nutchwax.java:156)
  at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:389)
  at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:585)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

Can anybody help? Thanks, Martin Bella |
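Since the failures above involve block transfers between datanodes that sit in a different DNS domain from the namenode/jobtracker, one cheap first step is to rule out basic reachability and name-resolution problems before digging into the Lucene ArrayIndexOutOfBoundsException itself. A minimal diagnostic sketch, assuming standard Unix tools and the DataNode data-transfer port 50010 that appears in the log; the hostnames are taken from the log and may need adjusting:

```sh
# Run from the namenode/jobtracker host (webarchiv.cz side) and again from each datanode:
# check that every DataNode's data-transfer port is reachable.
for host in nymfe01.fi.muni.cz nymfe05.fi.muni.cz; do
  nc -zv "$host" 50010 || echo "cannot reach $host:50010"
done

# Also check that the short names Hadoop reports (nymfe01, nymfe05, ...) resolve
# to the same addresses on every machine:
getent hosts nymfe01 nymfe05
```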
|
From: Ignacio G. <igc...@gm...> - 2007-10-26 19:36:28
|
Hello all, I was able to run the indexing job to completion thanks to the config help and pointers you provided. As Andrea pointed out, the problem was based on the PermGem, and after increasing the limits using the HADOOP_OPTS I had no problems to execute and run the process. However, now that everything is done, I am alarmed by the result. I was indexing a collection of aprox. 107 Gb, with over 100 million documents, and the size of the index, including segments and everything is larger than the collection itself (122Gb). Also, while indexing, the size of the temporary data used by the process reached the 200Gb limit, which is almost twice the size of the entire collection. Do anyone of you have any data that shows the relation between the collection size and the final size of the index? Is there any way to reduce the final size? I used the ALL option when invoking the nutchwax process, so every step of the process was executed. I did not leave out any of the steps (reduce, map, dedup...), so I am confused. My next test would involve an 800Gb collection, but I don't even want to try it if the resulting index is not going to be at least 5 to 10 times smaller than the source. Thanks. On 10/17/07, Michael Stack <st...@du...> wrote: > > Ignacio Garcia wrote: > > Hello Michael, > > > > Where can I find the tasktracker log?? Is it under hadoop? nutchwax? > > or in a temp location? > Its at $HADOOP_LOGS_DIR. Default location is $HADOOP_HOME/logs. > > > Also, I tried using JConsole to track the memory management on the > > process, but unfortunately the hadoop process does not have the > > "management agent" activated, so it cannot be tracked by JConsole. > > Is there any way to activate it using java options? > > > There is. Add these system properties to HADOOP_OPTS and to the child > jvm args: > > com.sun.management.jmxremote.authenticate=false > com.sun.management.jmxremote.ssl=false > com.sun.management.jmxremote.port=/portNum/ > > See http://java.sun.com/j2se/1.5.0/docs/guide/management/agent.html for > general overview. You are probably running multiple JVMs on the one > machine -- for instance, a tasktracker + its children at a minimum -- so > be careful setting ports so they do not clash (and ensure only one child > per tasktracker otherwise when second starts, it will complain port is > already in use). > > You could also enable verbose garbage collection if you want to watch > JVM flailing. Run 'java -X' and look for the loggc command. > > But before you try any of the above, check the tasktracker and child > logs. That it happens close to startup would seem to point at some > basic config. issue (hopefully). Otherwise, I'd suspect a massive or > corrupted record in segments. > > Good luck Ignacio, > St.Ack > > > I will use the environment variables Michael pointed me to, to try to > > increase the Perm Gem size that way. > > > > Thank you. > > > > On 10/15/07, *Michael Stack* <st...@du... > > <mailto:st...@du...>> wrote: > > > > Ignacio Garcia wrote: > > > Hello Andrea, > > > > > > I tried increasing the PermGem size, but it still failed with > > the same > > > error... > > > > > > I modified the following settings on "hadoop-default.xml ": > > > > > > <name>mapred.child.java.opts</name> > > > <value>-Xmx2048m -Xms1024m -XX:PermSize=256m > > > -XX:MaxPermSize=512m</value> > > > > > > That is the only place I could find where I could include Java > > Opts... > > > Should I increase it even more or is this property ignored when > > doing > > > the indexing? 
> > > > The OOME looks to be in the startup of the update task. The > error.txt > > log you pasted was from the command-line. Have you tried looking > > in the > > remote tasktracker log? It might have more info on where the OOME > is > > happening. > > > > The above setting is for each child process run by each of the > > tasktrackers of your cluster. The child process does the > > heavy-lifting > > so I'm guessing its where you are seeing the OOME'ing. > > > > Regards how to set the memory for tasktrackers, etc., the notes here > > still apply I believe > > http://archive-access.sourceforge.net/projects/nutch/faq.html#env > > (Do a > > search for the referred-to environment variables). > > > > St.Ack > > > > > > > > > > Any help would be greatly appreciated. Thank you. > > > > > > On 10/5/07, *Ignacio Garcia* <igc...@gm... > > <mailto:igc...@gm...> > > > <mailto:igc...@gm... <mailto:igc...@gm...>> > > wrote: > > > > > > I will try increasing the PermGem space as shown in the > > reference > > > you provided. > > > However, in my case the process is not acting as a webapp, so > it > > > does not related completely to the information displayed in > the > > > article. > > > > > > Do you think that shutting down every java application and > just > > > running the nutchwax job would have any benefits in this case? > > > Since I cannot control the number of class loaders created > (I'm > > > just running the code, I did not modify it in any way), I do > > not > > > have any control over this problem. > > > > > > Thank you for the pointers. > > > > > > > > > On 10/5/07, *Andrea Goethals* < an...@hu... > > <mailto:an...@hu...> > > > <mailto:an...@hu... > > <mailto:an...@hu...>>> wrote: > > > > > > On Fri, 5 Oct 2007 13:11:28 -0400, Ignacio Garcia wrote > > > > That might work, but it is not the way that I would > > like to > > > use Nutchwax. > > > > > > > > If I am forced to divide up one of my small collections > > > (~100Gb), I don't > > > > want to even think how many partitions the big > > collections > > > are going > > > > to require. Which means, time wasted partitioning, > > starting > > > several > > > > jobs, merging the created indexes and more... > > > > > > > > I even tried increasing the heap size to 4Gb, the max > > size of > > > RAM in > > > > my system, and that did not work. > > > > > > > > I have attached the last lines of the output provided by > > > Nutchwax, > > > > to see if you can point me to a possible solution to > this > > > problem. > > > > > > Your output shows that the error is > > > java.lang.OutOfMemoryError : PermGen space > > > > > > Is that always the case? If so I don't think that > increasing > > > the heap size is > > > going to help. This page explains the PermGen space well: > > > > > > http://blogs.sun.com/fkieviet/entry/classloader_leaks_the_dreaded_java > > > > > > Andrea > > > > > > > > > > > Also... is there any way to know if it crashed on a > > particular > > > > record / arc file or action to try and avoid it?? and is > > > there a way > > > > to resume the job from the moment it crashed? > > > > > > > > Thank you. > > > > > > > > On 10/2/07, John H. Lee < jl...@ar... > > <mailto:jl...@ar...> > > > <mailto:jl...@ar... <mailto:jl...@ar...>>> > wrote: > > > > > > > > > > The idea is that for each of the N sets of ~500 ARCs, > > > you'll have one > > > > > index and one segment. That way, you can distribute > the > > > index-segment pairs > > > > > across multiple disks or hosts. 
> > > > > /search/indexes/indexA/ > > > > > /search/indexes/indexB/ > > > > > ... > > > > > /search/segments/segmentA/ > > > > > /search/segments/segmentB/ > > > > > ... > > > > > > > > > > and point searcher.dir at /search. The webapp will > then > > > search all indexes > > > > > under /search/indexes. Alternatively, you can merge > > all of > > > the indexes as > > > > > Stack pointed out. > > > > > > > > > > Hope this helps. > > > > > > > > > > -J > > > > > > > > > > > > > > > > > > > > On Oct 2, 2007, at 5:09 AM, Ignacio Garcia wrote: > > > > > > > > > > Hello, > > > > > > > > > > I tried separating the list of ARCs on smaller sets > > of ~500 > > > ARCs. > > > > > > > > > > The first batch run to completion without problems, > > > however, the second > > > > > batch failed because I was using the same output > > directory > > > as I used for the > > > > > first one. > > > > > > > > > > Why can't I use the same output directory??? Wouldn't > it > > > make sense to > > > > > have all the info the same place, so I can access > > > everything at a time? > > > > > > > > > > How do I divide the collection in smaller portions > > and then > > > combine > > > > > everything on a single index? If I just keep > everything > > > separated I would > > > > > loose a lot of time looking in different indexes and > > > configuring the web-app > > > > > to be able to look everywhere. > > > > > > > > > > On 9/28/07, Ignacio Garcia <igc...@gm... > > <mailto:igc...@gm...> > > > <mailto:igc...@gm... > > <mailto:igc...@gm...>>> wrote: > > > > > > > > > > > > Michael, I do not know if it failed on the same > > record... > > > > > > > > > > > > the first time it failed I assumed that increasing > > the > > > -Xmx parameters > > > > > > would solve it, since the OOME has happened before > > when > > > indexing with > > > > > > Wayback. > > > > > > > > > > > > I will try to narrow it as much as I can if it > > fails again. > > > > > > > > > > > > > > > > > > On 9/27/07, Michael Stack < st...@du... > > <mailto:st...@du...> > > > <mailto:st...@du... <mailto:st...@du...>>> > wrote: > > > > > > > > > > > > > > What John says and then > > > > > > > > > > > > > > + The OOME exception stack trace might tell us > > something. > > > > > > > + Is the OOME always in same place processing same > > > record? If so, > > > > > > > take > > > > > > > a look at it in the ARC. > > > > > > > > > > > > > > St.Ack > > > > > > > > > > > > > > John H. Lee wrote: > > > > > > > > Hi Ignacio. > > > > > > > > > > > > > > > > It would be helpful if you posted the following > > > information: > > > > > > > > - Are you using standalone or mapreduce? > > > > > > > > - If mapreduce, what are your mapred.map.tasksand > > > > > > > > mapred.reduce.tasks properties set to? > > > > > > > > - If mapreduce, how many slaves do you have > > and how > > > much memory do > > > > > > > > they have? > > > > > > > > - How many ARCs are you trying to index? > > > > > > > > - Did the map reach 100% completion before the > > > failure occurred? 
> > > > > > > > > > > > > > > > Some things you may want to try: > > > > > > > > - Set both -Xmx and -Xmx to the maximum > > available on > > > your systems > > > > > > > > - Increase one or both of mapred.map.tasks and > > > mapred.reduce.tasks, > > > > > > > > depending where the failure occurred > > > > > > > > - Break your job up into smaller chunks of > > say, 1000 > > > or 5000 ARCs > > > > > > > > > > > > > > > > -J > > > > > > > > > > > > > > > > On Sep 27, 2007, at 10:47 AM, Ignacio Garcia > > wrote: > > > > > > > > > > > > > > > > > > > > > > > >> Hello, > > > > > > > >> > > > > > > > >> I've been doing some testing with nutchwax and > I > > > have never had any > > > > > > > >> major problems. > > > > > > > >> However, right now I am trying to index a > > collection > > > that is over > > > > > > > >> 100 Gb big, and for some reason the indexing is > > > crashing while it > > > > > > > >> tries to populate 'crawldb' > > > > > > > >> > > > > > > > >> The job will run fine at the beginning > > importing the > > > information > > > > > > > >> from the ARCs and creating the "segments" > > section. > > > > > > > >> > > > > > > > >> The error I get is an outOfMemory error when > the > > > system is > > > > > > > >> processing each of the part.xx in the segments > > > previously created. > > > > > > > >> > > > > > > > >> I tried increasing the following setting on the > > > hadoop-default.xml > > > > > > > >> config file: mapred.child.java.opts to 1GB, > > but it > > > still failed in > > > > > > > >> the same part. > > > > > > > >> > > > > > > > >> Is there any way to reduce the amount of > > memory used > > > by nutchwax/ > > > > > > > >> hadoop to make the process more efficient and > be > > > able to index such > > > > > > > >> a collection? > > > > > > > >> > > > > > > > >> Thank you. > > > > > > > >> > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > >> --- > > > > > > > >> This SF.net email is sponsored by: Microsoft > > > > > > > >> Defy all challenges. Microsoft(R) Visual > > Studio 2005. > > > > > > > >> > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > > > > >> _______________________________________________ > > > > > > > >> Archive-access-discuss mailing list > > > > > > > >> Arc...@li... > > <mailto:Arc...@li...> > > > <mailto: Arc...@li... > > <mailto:Arc...@li...>> > > > > > > > >> > > > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > > > > > > > > This SF.net email is sponsored by: Microsoft > > > > > > > > Defy all challenges. Microsoft(R) Visual > > Studio 2005. > > > > > > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > > > > > _______________________________________________ > > > > > > > > Archive-access-discuss mailing list > > > > > > > > Arc...@li... > > <mailto:Arc...@li...> > > > <mailto:Arc...@li... 
> > <mailto:Arc...@li...>> > > > > > > > > > > > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > > > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Harvard University Library > > > Powered by Open WebMail > > > > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > ------------------------------------------------------------------------- > > > > > This SF.net email is sponsored by: Splunk Inc. > > > Still grepping through log files to find problems? Stop. > > > Now Search log events and configuration files using AJAX and a > > browser. > > > Download your FREE copy of Splunk now >> http://get.splunk.com/ > > > > > > ------------------------------------------------------------------------ > > > > > > _______________________________________________ > > > Archive-access-discuss mailing list > > > Arc...@li... > > <mailto:Arc...@li...> > > > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > > > > > > > > > |
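One thing worth checking before comparing raw sizes: the 122 GB figure above includes the segments as well as the Lucene indexes, and in a NutchWax output directory those live in separate subdirectories, so measuring which part dominates shows whether it is really the index that is large or the stored segment content. A minimal sketch, assuming the Hadoop 0.x-era shell and the output layout mentioned in this thread (output/indexes, output/segments); exact paths may differ:

```sh
# Per-directory sizes of the NutchWax output in DFS
bin/hadoop dfs -du output

# Drill into the largest contributor, e.g. the segments
bin/hadoop dfs -du output/segments
```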
|
From: Erik H. <eri...@uc...> - 2007-10-26 18:16:25
|
Hello all. I’ve been informed that my previous messages did get through. My copies seem to have been caught in some sort of black hole, since I only received the last & I don’t see any on the archive. In any case, apologies for the spam. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 |
|
From: Erik H. <eri...@uc...> - 2007-10-26 17:48:48
|
[Trying a 3rd time with 0 attachments. ] [Sent this yesterday, but I think the attachments blocked it.] Hi all. I have been experimenting a bit using the WebRunner program from Mozilla [1] to work as a ‘player’ for archived web content available through a wayback proxy server. This allows a user to view archived web content, & _only_ archived web content, through a special stand alone browser. I think that this might prove interesting, as it does not require configuration of a proxy but provides a user with all the benefits of proxy browsing. Additionally, the act of starting a separate browser may give the user a sense of being in a different place than the live web. The webapp bundle attached connects to the Eprint Network archive-it collection; it could of course be attached to any collection served through a proxy server. To use this, you’ll need to download WebRunner from [2]. I have made the webapp file available (for a limited time) at <http://gales.cdlib.org/~egh/wayback.webapp> Run the wayback.webapp file with Webrunner, either as described in [3], or, on Linux: > webrunner -webapp path/to/wayback.webapp Unfortunately, due to a bug in the current version of Webrunner, it will not properly the first time. Quit the program. Next, you need to run the webapp from the cache. On linux (from the webrunner dir): > webrunner -webapp ep...@ga... On the Mac (from the command line, in the directory where you save WebRunner.app): > ./WebRunner.app/Contents/MacOS/xulrunner -webapp ep...@ga... I am not sure how this can be done on Windows. Presumably in a similar way? Once you have run the program a second time, you will be presented with a standard wayback interface. A good starting point is: http://www.osti.gov/eprints/urls/eprints-index.html You’ll notice that you are running in proxy mode; all the urls look normal. You’ll also notice a sidebar, showing the IA FAQ. This is useless in this case, but could be used to view a timeline or metadata about a url without changing the HTML of the archived page. There are obviously many improvements which would be nice; I haven’t figured out how to make the location bar editable, for instance. Hoping somebody finds this useful. best, Erik Hetzner 1. <http://wiki.mozilla.org/WebRunner> 2. <http://wiki.mozilla.org/WebRunner#Latest_version> 3. <http://wiki.mozilla.org/WebRunner#Installer> |
|
From: Erik H. <eri...@uc...> - 2007-10-25 18:37:51
|
[Sent this yesterday, but I think the attachments blocked it.]

Hi all. I have been experimenting a bit with using the WebRunner program from Mozilla [1] as a ‘player’ for archived web content available through a wayback proxy server. This allows a user to view archived web content, & _only_ archived web content, through a special stand-alone browser. I think that this might prove interesting, as it does not require configuration of a proxy but provides a user with all the benefits of proxy browsing. Additionally, the act of starting a separate browser may give the user a sense of being in a different place than the live web. The webapp bundle attached connects to the Eprint Network archive-it collection; it could of course be attached to any collection served through a proxy server.

To use this, you’ll need to download WebRunner from [2]. I have attached the webapp.ini & webapp.js files necessary to build this webapp (instructions available at [1]) & have made the webapp itself available (for a limited time) at <http://gales.cdlib.org/~egh/wayback.webapp>

Run the wayback.webapp file with WebRunner, either as described in [3], or, on Linux:

> webrunner -webapp path/to/wayback.webapp

Unfortunately, due to a bug in the current version of WebRunner, it will not work properly the first time. Quit the program. Next, you need to run the webapp from the cache. On Linux (from the webrunner dir):

> webrunner -webapp ep...@ga...

On the Mac (from the command line, in the directory where you saved WebRunner.app):

> ./WebRunner.app/Contents/MacOS/xulrunner -webapp ep...@ga...

I am not sure how this can be done on Windows. Presumably in a similar way? Once you have run the program a second time, you will be presented with a standard wayback interface. A good starting point is: http://www.osti.gov/eprints/urls/eprints-index.html

You’ll notice that you are running in proxy mode; all the urls look normal. You’ll also notice a sidebar, showing the IA FAQ. This is useless in this case, but could be used to view a timeline or metadata about a url without changing the HTML of the archived page. There are obviously many improvements which would be nice; I haven’t figured out how to make the location bar editable, for instance. Hoping somebody finds this useful.

best, Erik Hetzner

1. <http://wiki.mozilla.org/WebRunner>
2. <http://wiki.mozilla.org/WebRunner#Latest_version>
3. <http://wiki.mozilla.org/WebRunner#Installer> |
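For anyone who would rather rebuild a bundle like this than download the pre-built wayback.webapp, a hypothetical sketch of a webapp.ini follows. The section and key names are assumptions based on the WebRunner documentation referenced as [1] above, and the values are placeholders rather than a copy of Erik's actual file, so treat it only as a starting point:

```ini
; webapp.ini (hypothetical sketch; key names assumed from the WebRunner docs, values are placeholders)
[Parameters]
id=wayback@example.org
name=Wayback Proxy Player
uri=http://wayback-proxy.example.org/
status=yes
location=no
sidebar=yes
navigation=no
```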
|
From: alexis a. <alx...@ya...> - 2007-10-25 03:38:58
|
Hi,

Using the distributed mode of NutchWax, we were able to cut our indexing time by as much as 80%. However, we noticed that the index/arcfiles size ratio was bigger than that of the old NutchWax (<20%). Is this normal? Furthermore, in the old NutchWax we were able to reduce the disk usage by deleting some of the temp folders. Are there folders that we can delete in the new NutchWax output folder? Below are our index stats:

Total Arcfiles: 12005
Total Arcfiles Size: 664 GB
Index Size: 172 GB (does not include merged index)
Index/Arcfiles: 26%

Best Regards, Alexis |
|
From: Erik H. <eri...@uc...> - 2007-10-25 01:01:21
|
Hi all. I have been experimenting a bit using the WebRunner program from Mozilla [1] to work as a ‘player’ for archived web content available through a wayback proxy server. This allows a user to view archived web content, & _only_ archived web content, through a special stand alone browser. I think that this might prove interesting, as it does not require configuration of a proxy but provides a user with all the benefits of proxy browsing. Additionally, the act of starting a separate browser may give the user a sense of being in a different place than the live web. The webapp bundle attached connects to the Eprint Network archive-it collection; it could of course be attached to any collection served through a proxy server. To use this, you’ll need to download WebRunner from [2]. Run the attached wayback.webapp file with Webrunner, either as described in [3], or, on Linux: > webrunner -webapp path/to/wayback.webapp Unfortunately, due to a bug in the current version of Webrunner, it will not properly the first time. Quit the program. Next, you need to run the webapp from the cache. On linux (from the webrunner dir): > webrunner -webapp ep...@ga... On the Mac (from the command line, in the directory where you save WebRunner.app): > ./WebRunner.app/Contents/MacOS/xulrunner -webapp ep...@ga... I am not sure how this can be done on Windows. Presumably in a similar way? Once you have run the program a second time, you will be presented with a standard wayback interface. A good starting point is: http://www.osti.gov/eprints/urls/eprints-index.html You’ll notice that you are running in proxy mode; all the urls look normal. You’ll also notice a sidebar, showing the IA FAQ. This is useless in this case, but could be used to view a timeline or metadata about a url without changing the HTML of the archived page. There are obviously many improvements which would be nice; I haven’t figured out how to make the location bar editable, for instance. Hoping somebody finds this useful. best, Erik Hetzner 1. <http://wiki.mozilla.org/WebRunner> 2. <http://wiki.mozilla.org/WebRunner#Latest_version> 3. <http://wiki.mozilla.org/WebRunner#Installer> |
|
From: Chris V. <cv...@gm...> - 2007-10-23 22:06:40
|
Hi, I'd like to add extra metadata to indexes produced by NutchWax. The goal is to perform searches against this metadata and full text at the same time. My initial idea is to update documents similarly to suggested practices for updating documents in Lucene indexes: retrieve documents based on search term(s), delete documents from index, add new fields to documents, and then add documents back to index. I am able to follow this strategy using the Lucene 2.0 classes IndexSearcher, IndexReader and IndexWriter (or IndexModifier). After the index documents have been updated, I can query against the new metadata using the IndexSearcher class without any problem. I can also use Luke to view the contents of the index and verify that the metadata has been added to the documents. The problem is that once the Index* classes are done updating the index documents, the NutchWax webapp is unable to locate those documents (even after a restart). My question is what is the best way to add fields to NutchWax index documents? Are there any Nutch or NutchWax classes I should use instead of the Lucene Index* classes (I didn't see any likely candidates in either project)? Is it possible I am leaving out some important steps when using the Lucene Index* classes? Any help is appreciated, Chris |
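For concreteness, here is a minimal sketch of the delete-and-re-add cycle described above, using the Lucene 2.0-era API the message mentions (IndexSearcher/IndexReader/IndexWriter). The field names (`url`, `collection_note`) and the index path are hypothetical placeholders rather than NutchWax's actual schema, and this only illustrates the Lucene-level mechanics; it does not answer whether the NutchWax webapp will accept an index modified this way.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class AddMetadataSketch {
  public static void main(String[] args) throws Exception {
    String indexDir = "/search/indexes/indexA";   // hypothetical index location
    Term key = new Term("url", "http://www.example.com/page.html");

    // 1. Fetch the stored fields of the matching document(s).
    IndexSearcher searcher = new IndexSearcher(indexDir);
    Hits hits = searcher.search(new TermQuery(key));
    Document[] updated = new Document[hits.length()];
    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      // Add the extra metadata field to be searched alongside full text.
      doc.add(new Field("collection_note", "election-2004",
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
      updated[i] = doc;
    }
    searcher.close();

    // 2. Delete the old copies by the same key term.
    IndexReader reader = IndexReader.open(indexDir);
    reader.deleteDocuments(key);
    reader.close();

    // 3. Re-add the updated documents and optimize.
    IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
    for (int i = 0; i < updated.length; i++) {
      writer.addDocument(updated[i]);
    }
    writer.optimize();
    writer.close();
  }
}
```

One caveat worth noting: hits.doc(i) returns only the stored fields of a document, so any field that was indexed but not stored (for example, tokenized full text) cannot be recovered this way and would be missing from the re-added document, which by itself can make previously matching queries fail.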
|
From: Brad T. <br...@ar...> - 2007-10-18 23:41:16
|
This maintenance release fixes a bug which prevented AccessPoints from working properly when the webapp was deployed to a non-ROOT context. |
|
From: Michael S. <st...@du...> - 2007-10-17 16:32:48
|
Ignacio Garcia wrote: > Hello Michael, > > Where can I find the tasktracker log?? Is it under hadoop? nutchwax? > or in a temp location? Its at $HADOOP_LOGS_DIR. Default location is $HADOOP_HOME/logs. > Also, I tried using JConsole to track the memory management on the > process, but unfortunately the hadoop process does not have the > "management agent" activated, so it cannot be tracked by JConsole. > Is there any way to activate it using java options? > There is. Add these system properties to HADOOP_OPTS and to the child jvm args: com.sun.management.jmxremote.authenticate=false com.sun.management.jmxremote.ssl=false com.sun.management.jmxremote.port=/portNum/ See http://java.sun.com/j2se/1.5.0/docs/guide/management/agent.html for general overview. You are probably running multiple JVMs on the one machine -- for instance, a tasktracker + its children at a minimum -- so be careful setting ports so they do not clash (and ensure only one child per tasktracker otherwise when second starts, it will complain port is already in use). You could also enable verbose garbage collection if you want to watch JVM flailing. Run 'java -X' and look for the loggc command. But before you try any of the above, check the tasktracker and child logs. That it happens close to startup would seem to point at some basic config. issue (hopefully). Otherwise, I'd suspect a massive or corrupted record in segments. Good luck Ignacio, St.Ack > I will use the environment variables Michael pointed me to, to try to > increase the Perm Gem size that way. > > Thank you. > > On 10/15/07, *Michael Stack* <st...@du... > <mailto:st...@du...>> wrote: > > Ignacio Garcia wrote: > > Hello Andrea, > > > > I tried increasing the PermGem size, but it still failed with > the same > > error... > > > > I modified the following settings on "hadoop-default.xml ": > > > > <name>mapred.child.java.opts</name> > > <value>-Xmx2048m -Xms1024m -XX:PermSize=256m > > -XX:MaxPermSize=512m</value> > > > > That is the only place I could find where I could include Java > Opts... > > Should I increase it even more or is this property ignored when > doing > > the indexing? > > The OOME looks to be in the startup of the update task. The error.txt > log you pasted was from the command-line. Have you tried looking > in the > remote tasktracker log? It might have more info on where the OOME is > happening. > > The above setting is for each child process run by each of the > tasktrackers of your cluster. The child process does the > heavy-lifting > so I'm guessing its where you are seeing the OOME'ing. > > Regards how to set the memory for tasktrackers, etc., the notes here > still apply I believe > http://archive-access.sourceforge.net/projects/nutch/faq.html#env > (Do a > search for the referred-to environment variables). > > St.Ack > > > > > > Any help would be greatly appreciated. Thank you. > > > > On 10/5/07, *Ignacio Garcia* <igc...@gm... > <mailto:igc...@gm...> > > <mailto:igc...@gm... <mailto:igc...@gm...>> > wrote: > > > > I will try increasing the PermGem space as shown in the > reference > > you provided. > > However, in my case the process is not acting as a webapp, so it > > does not related completely to the information displayed in the > > article. > > > > Do you think that shutting down every java application and just > > running the nutchwax job would have any benefits in this case? 
> > Since I cannot control the number of class loaders created (I'm > > just running the code, I did not modify it in any way), I do > not > > have any control over this problem. > > > > Thank you for the pointers. > > > > > > On 10/5/07, *Andrea Goethals* < an...@hu... > <mailto:an...@hu...> > > <mailto:an...@hu... > <mailto:an...@hu...>>> wrote: > > > > On Fri, 5 Oct 2007 13:11:28 -0400, Ignacio Garcia wrote > > > That might work, but it is not the way that I would > like to > > use Nutchwax. > > > > > > If I am forced to divide up one of my small collections > > (~100Gb), I don't > > > want to even think how many partitions the big > collections > > are going > > > to require. Which means, time wasted partitioning, > starting > > several > > > jobs, merging the created indexes and more... > > > > > > I even tried increasing the heap size to 4Gb, the max > size of > > RAM in > > > my system, and that did not work. > > > > > > I have attached the last lines of the output provided by > > Nutchwax, > > > to see if you can point me to a possible solution to this > > problem. > > > > Your output shows that the error is > > java.lang.OutOfMemoryError : PermGen space > > > > Is that always the case? If so I don't think that increasing > > the heap size is > > going to help. This page explains the PermGen space well: > > > http://blogs.sun.com/fkieviet/entry/classloader_leaks_the_dreaded_java > > > > Andrea > > > > > > > > Also... is there any way to know if it crashed on a > particular > > > record / arc file or action to try and avoid it?? and is > > there a way > > > to resume the job from the moment it crashed? > > > > > > Thank you. > > > > > > On 10/2/07, John H. Lee < jl...@ar... > <mailto:jl...@ar...> > > <mailto:jl...@ar... <mailto:jl...@ar...>>> wrote: > > > > > > > > The idea is that for each of the N sets of ~500 ARCs, > > you'll have one > > > > index and one segment. That way, you can distribute the > > index-segment pairs > > > > across multiple disks or hosts. > > > > /search/indexes/indexA/ > > > > /search/indexes/indexB/ > > > > ... > > > > /search/segments/segmentA/ > > > > /search/segments/segmentB/ > > > > ... > > > > > > > > and point searcher.dir at /search. The webapp will then > > search all indexes > > > > under /search/indexes. Alternatively, you can merge > all of > > the indexes as > > > > Stack pointed out. > > > > > > > > Hope this helps. > > > > > > > > -J > > > > > > > > > > > > > > > > On Oct 2, 2007, at 5:09 AM, Ignacio Garcia wrote: > > > > > > > > Hello, > > > > > > > > I tried separating the list of ARCs on smaller sets > of ~500 > > ARCs. > > > > > > > > The first batch run to completion without problems, > > however, the second > > > > batch failed because I was using the same output > directory > > as I used for the > > > > first one. > > > > > > > > Why can't I use the same output directory??? Wouldn't it > > make sense to > > > > have all the info the same place, so I can access > > everything at a time? > > > > > > > > How do I divide the collection in smaller portions > and then > > combine > > > > everything on a single index? If I just keep everything > > separated I would > > > > loose a lot of time looking in different indexes and > > configuring the web-app > > > > to be able to look everywhere. > > > > > > > > On 9/28/07, Ignacio Garcia <igc...@gm... > <mailto:igc...@gm...> > > <mailto:igc...@gm... > <mailto:igc...@gm...>>> wrote: > > > > > > > > > > Michael, I do not know if it failed on the same > record... 
> > > > > > > > > > the first time it failed I assumed that increasing > the > > -Xmx parameters > > > > > would solve it, since the OOME has happened before > when > > indexing with > > > > > Wayback. > > > > > > > > > > I will try to narrow it as much as I can if it > fails again. > > > > > > > > > > > > > > > On 9/27/07, Michael Stack < st...@du... > <mailto:st...@du...> > > <mailto:st...@du... <mailto:st...@du...>>> wrote: > > > > > > > > > > > > What John says and then > > > > > > > > > > > > + The OOME exception stack trace might tell us > something. > > > > > > + Is the OOME always in same place processing same > > record? If so, > > > > > > take > > > > > > a look at it in the ARC. > > > > > > > > > > > > St.Ack > > > > > > > > > > > > John H. Lee wrote: > > > > > > > Hi Ignacio. > > > > > > > > > > > > > > It would be helpful if you posted the following > > information: > > > > > > > - Are you using standalone or mapreduce? > > > > > > > - If mapreduce, what are your mapred.map.tasks and > > > > > > > mapred.reduce.tasks properties set to? > > > > > > > - If mapreduce, how many slaves do you have > and how > > much memory do > > > > > > > they have? > > > > > > > - How many ARCs are you trying to index? > > > > > > > - Did the map reach 100% completion before the > > failure occurred? > > > > > > > > > > > > > > Some things you may want to try: > > > > > > > - Set both -Xmx and -Xmx to the maximum > available on > > your systems > > > > > > > - Increase one or both of mapred.map.tasks and > > mapred.reduce.tasks, > > > > > > > depending where the failure occurred > > > > > > > - Break your job up into smaller chunks of > say, 1000 > > or 5000 ARCs > > > > > > > > > > > > > > -J > > > > > > > > > > > > > > On Sep 27, 2007, at 10:47 AM, Ignacio Garcia > wrote: > > > > > > > > > > > > > > > > > > > > >> Hello, > > > > > > >> > > > > > > >> I've been doing some testing with nutchwax and I > > have never had any > > > > > > >> major problems. > > > > > > >> However, right now I am trying to index a > collection > > that is over > > > > > > >> 100 Gb big, and for some reason the indexing is > > crashing while it > > > > > > >> tries to populate 'crawldb' > > > > > > >> > > > > > > >> The job will run fine at the beginning > importing the > > information > > > > > > >> from the ARCs and creating the "segments" > section. > > > > > > >> > > > > > > >> The error I get is an outOfMemory error when the > > system is > > > > > > >> processing each of the part.xx in the segments > > previously created. > > > > > > >> > > > > > > >> I tried increasing the following setting on the > > hadoop-default.xml > > > > > > >> config file: mapred.child.java.opts to 1GB, > but it > > still failed in > > > > > > >> the same part. > > > > > > >> > > > > > > >> Is there any way to reduce the amount of > memory used > > by nutchwax/ > > > > > > >> hadoop to make the process more efficient and be > > able to index such > > > > > > >> a collection? > > > > > > >> > > > > > > >> Thank you. > > > > > > >> > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > >> --- > > > > > > >> This SF.net email is sponsored by: Microsoft > > > > > > >> Defy all challenges. Microsoft(R) Visual > Studio 2005. > > > > > > >> > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > > > >> _______________________________________________ > > > > > > >> Archive-access-discuss mailing list > > > > > > >> Arc...@li... > <mailto:Arc...@li...> > > <mailto: Arc...@li... 
> <mailto:Arc...@li...>> > > > > > > >> > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss> > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > > > > > > This SF.net email is sponsored by: Microsoft > > > > > > > Defy all challenges. Microsoft(R) Visual > Studio 2005. > > > > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > > > > _______________________________________________ > > > > > > > Archive-access-discuss mailing list > > > > > > > Arc...@li... > <mailto:Arc...@li...> > > <mailto:Arc...@li... > <mailto:Arc...@li...>> > > > > > > > > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Harvard University Library > > Powered by Open WebMail > > > > > > > > > ------------------------------------------------------------------------ > > > > > ------------------------------------------------------------------------- > > > This SF.net email is sponsored by: Splunk Inc. > > Still grepping through log files to find problems? Stop. > > Now Search log events and configuration files using AJAX and a > browser. > > Download your FREE copy of Splunk now >> http://get.splunk.com/ > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Archive-access-discuss mailing list > > Arc...@li... > <mailto:Arc...@li...> > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > <https://lists.sourceforge.net/lists/listinfo/archive-access-discuss> > > > > |
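For reference, the JMX system properties listed above can be applied through the same HADOOP_OPTS and mapred.child.java.opts settings discussed elsewhere in this thread. A minimal sketch, assuming a hadoop-env.sh / hadoop-site.xml setup of that era; the port numbers are arbitrary placeholders, and, as Michael notes, the daemon and child ports must not clash (a fixed child port also only works with one child per tasktracker):

```sh
# conf/hadoop-env.sh -- enable JMX on the Hadoop daemons (tasktracker, jobtracker, ...)
export HADOOP_OPTS="$HADOOP_OPTS \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.port=10101"
```

```xml
<!-- conf/hadoop-site.xml -- enable JMX on the child task JVMs, on a different port -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m
         -Dcom.sun.management.jmxremote.authenticate=false
         -Dcom.sun.management.jmxremote.ssl=false
         -Dcom.sun.management.jmxremote.port=10102</value>
</property>
```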
|
From: Ignacio G. <igc...@gm...> - 2007-10-17 14:09:48
|
Hello Michael, Where can I find the tasktracker log?? Is it under hadoop? nutchwax? or in a temp location? Also, I tried using JConsole to track the memory management on the process, but unfortunately the hadoop process does not have the "management agent" activated, so it cannot be tracked by JConsole. Is there any way to activate it using java options? I will use the environment variables Michael pointed me to, to try to increase the Perm Gem size that way. Thank you. On 10/15/07, Michael Stack <st...@du...> wrote: > > Ignacio Garcia wrote: > > Hello Andrea, > > > > I tried increasing the PermGem size, but it still failed with the same > > error... > > > > I modified the following settings on "hadoop-default.xml": > > > > <name>mapred.child.java.opts</name> > > <value>-Xmx2048m -Xms1024m -XX:PermSize=256m > > -XX:MaxPermSize=512m</value> > > > > That is the only place I could find where I could include Java Opts... > > Should I increase it even more or is this property ignored when doing > > the indexing? > > The OOME looks to be in the startup of the update task. The error.txt > log you pasted was from the command-line. Have you tried looking in the > remote tasktracker log? It might have more info on where the OOME is > happening. > > The above setting is for each child process run by each of the > tasktrackers of your cluster. The child process does the heavy-lifting > so I'm guessing its where you are seeing the OOME'ing. > > Regards how to set the memory for tasktrackers, etc., the notes here > still apply I believe > http://archive-access.sourceforge.net/projects/nutch/faq.html#env (Do a > search for the referred-to environment variables). > > St.Ack > > > > > > Any help would be greatly appreciated. Thank you. > > > > On 10/5/07, *Ignacio Garcia* <igc...@gm... > > <mailto:igc...@gm...> > wrote: > > > > I will try increasing the PermGem space as shown in the reference > > you provided. > > However, in my case the process is not acting as a webapp, so it > > does not related completely to the information displayed in the > > article. > > > > Do you think that shutting down every java application and just > > running the nutchwax job would have any benefits in this case? > > Since I cannot control the number of class loaders created (I'm > > just running the code, I did not modify it in any way), I do not > > have any control over this problem. > > > > Thank you for the pointers. > > > > > > On 10/5/07, *Andrea Goethals* < an...@hu... > > <mailto:an...@hu...>> wrote: > > > > On Fri, 5 Oct 2007 13:11:28 -0400, Ignacio Garcia wrote > > > That might work, but it is not the way that I would like to > > use Nutchwax. > > > > > > If I am forced to divide up one of my small collections > > (~100Gb), I don't > > > want to even think how many partitions the big collections > > are going > > > to require. Which means, time wasted partitioning, starting > > several > > > jobs, merging the created indexes and more... > > > > > > I even tried increasing the heap size to 4Gb, the max size of > > RAM in > > > my system, and that did not work. > > > > > > I have attached the last lines of the output provided by > > Nutchwax, > > > to see if you can point me to a possible solution to this > > problem. > > > > Your output shows that the error is > > java.lang.OutOfMemoryError: PermGen space > > > > Is that always the case? If so I don't think that increasing > > the heap size is > > going to help. 
|
|
From: Brad T. <br...@ar...> - 2007-10-15 23:06:32
|
Wayback is an open-source Java implementation of the Internet Archive's Wayback Machine service. The 1.0.0 release includes Spring IoC configuration, access control and authorization, and simplified extension of the Replay User Interface, and the project now builds with Maven 2. For detailed features and changes, please see the Wayback project site at http://archive-access.sourceforge.net/projects/wayback/. Yours, Internet Archive Webteam |
|
From: Michael S. <st...@du...> - 2007-10-15 18:38:13
|
Ignacio Garcia wrote: > Hello Andrea, > > I tried increasing the PermGen size, but it still failed with the same > error... > > I modified the following settings in "hadoop-default.xml": > > <name>mapred.child.java.opts</name> > <value>-Xmx2048m -Xms1024m -XX:PermSize=256m > -XX:MaxPermSize=512m</value> > > That is the only place I could find where I could include Java opts... > Should I increase it even more, or is this property ignored when doing > the indexing? The OOME looks to be in the startup of the update task. The error.txt log you pasted was from the command line. Have you tried looking in the remote tasktracker log? It might have more information on where the OOME is happening. The above setting applies to each child process run by each of the tasktrackers in your cluster. The child process does the heavy lifting, so I'm guessing that is where you are seeing the OOME. Regarding how to set the memory for the tasktrackers, etc., I believe the notes here still apply: http://archive-access.sourceforge.net/projects/nutch/faq.html#env (search for the referred-to environment variables). St.Ack |
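For readers following the thread, here is a minimal sketch of the distinction Michael draws, assuming a Hadoop setup of this era in which site-specific overrides live in conf/hadoop-site.xml; the property name and the -XX:MaxPermSize flag come from the thread, while the exact values are illustrative rather than a recommendation:

<!-- conf/hadoop-site.xml (sketch): raises heap and PermGen for the child
     task JVMs only. The parent JVM that launches the job (the LocalJobRunner
     or a tasktracker) is not affected by this property and would need its
     own -XX:MaxPermSize flag, e.g. via the environment variables referred to
     in the FAQ entry Michael links above. -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m</value>
  </property>
</configuration>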
|
From: Andrea G. <an...@hu...> - 2007-10-15 14:21:45
|
Ignacio, I don't have much indexing experience, so I'm afraid I can't be of much help with this. I was only pointing out that increasing the heap size wasn't going to help the PermGen problem. You could try watching the memory use with jconsole while the job is running, just to verify that the PermGen pool really is the only memory pool causing problems - jconsole shows how close each pool is to its maximum. It will also tell you what the maximum PermGen size is, so you can verify that you did increase it with the Java command-line switch. Sorry I can't be of more help with this, Andrea On 10/15/07, Ignacio Garcia <igc...@gm...> wrote: > Hello Andrea, > > I tried increasing the PermGen size, but it still failed with the same > error... > > I modified the following settings in "hadoop-default.xml": > > <name>mapred.child.java.opts</name> > <value>-Xmx2048m -Xms1024m -XX:PermSize=256m -XX:MaxPermSize=512m</value> > > That is the only place I could find where I could include Java opts... > Should I increase it even more, or is this property ignored when doing the > indexing? > > Any help would be greatly appreciated. Thank you. |
|
From: Ignacio G. <igc...@gm...> - 2007-10-15 12:14:17
|
Hello Andrea, I tried increasing the PermGen size, but it still failed with the same error... I modified the following settings in "hadoop-default.xml": <name>mapred.child.java.opts</name> <value>-Xmx2048m -Xms1024m -XX:PermSize=256m -XX:MaxPermSize=512m</value> That is the only place I could find where I could include Java opts... Should I increase it even more, or is this property ignored when doing the indexing? Any help would be greatly appreciated. Thank you. |
|
From: Chris V. <cv...@gm...> - 2007-10-09 19:47:44
|
Hi Brad, Thanks for the response. Unfortunately, you won't be able to access our servers due to IP restriction. I tried your suggestion of pointing to nutch/opensearch (and nutchwax/opensearch) without success. There were no errors produced, but it didn't provide the functionality I am looking for either. I am currently implementing full text search and retrieval using a combination of NutchWax (and its index) for search and Wayback (with a separate CDX index) for retrieval. This works fine. I was hoping for a single index solution, but it sounds like you are using the same technique. If you learn anything new from the NutchWax team, please pass it on. Thanks, Chris On 9/26/07, Brad Tofel <br...@ar...> wrote: > > Hi Chris, > > I can't access your nutch service, so am unable to provide very detailed > assistance. One quick thing to test is changing: > > http://chaz.hul.harvard.edu:10622/xmlquery > > to > > http://chaz.hul.harvard.edu:10622/nutch/opensearch > > > As far as which components should be doing what -- NutchWax and Wayback > have drifted a little bit from the point when they were integrated so > that Wayback could utilize a NutchWax index as the it's ResourceIndex. > Performance issues with the NutchWax index motivated us to: > > 1) build a Wayback installation with it's own index, either CDX or BDB > 2) modify seach.jsp as you've done already so links generated by > NutchWax search result pages point into the wayback installation. > > I'm working with John Lee, who is currently running the NutchWax > project, to get a better answer on how this will work going forward. > > Brad > > Chris Vicary wrote: > > Hi, > > > > I am attempting to render nutchwax full text search results using the > > open-source wayback machine. I have installed hadoop, nutchwax (0.10.0) > and > > wayback (0.8.0) - wayback and nutchwax are deployed in the same tomcat. > > Creating and searching full-text indexes of arc files using nutchwax > works > > fine. Unfortunately, I have been unsuccessful in rendering the result > > resources. I attempted to follow the instructions for Wayback-NutchWAX > at > > http://archive-access.sourceforge.net/projects/nutch/wayback.html, but > the > > instructions seem to be based on an older version of wayback, and the > some > > changes specified for the wayback's web.xml do not apply to the newest > > wayback version. > > > > The errors encountered depend on the configuration values I use, so > here's a > > rundown of the properties: > > > > hadoop-site.xml: > > > > searcher.dir points to a local nutchwax "outputs" directory > (/tmp/outputs) > > wax.host points to the host and port of the tomcat installation, it does > not > > include wayback context information (just host:port, > > chaz.hul.harvard.edu:10622) > > > > search.jsp: > > > > made the change: > > > > < String archiveCollection = > > detail.getValue("collection"); > > --- > > > >> String archiveCollection = "wayback"; // detail.getValue > ("collection"); > >> > > > > > > > > wayback/WEB-INF/web.xml: > > > > The changes required for web.xml are to "[disable] wayback indexing of > > ARCS, [comment] out the PipeLineFilter, and [enable] the Remove-Nutch > > ResourceIndex option". > > > > The Local-ARC ResourceStore option is enabled, and all others are > disabled. > > resourcestore.autoindex is set to 0, and all physical paths have been > > checked for accuracy. > > > > I was unable to find any reference to PipeLineFilter, so there was no > need > > to comment it out. 
> > > > I enabled the Remote-Nutch ResourceIndex option, and disabled all other > > ResourceIndex options. The Remote-Nutch option values are: > > > > <context-param> > > <param-name>resourceindex.classname</param-name> > > <param-value> > org.archive.wayback.resourceindex.NutchResourceIndex > > </param-value> > > <description>Class that implements ResourceIndex for this > > Wayback</description> > > </context-param> > > > > <context-param> > > <param-name>resourceindex.baseurl</param-name> > > <param-value>>http://chaz.hul.harvard.edu:10622/nutchwax > > </param-value> > > <description>absolute URL to Nutch server</description> > > </context-param> > > > > <context-param> > > <param-name>maxresults</param-name> > > <param-value>1000</param-value> > > <description> > > Maximum number of results to return from the > ResourceIndex. > > </description> > > </context-param> > > > > > > With the current setup, I can perform a full-text query using nutchwax > and > > the result links seem to be of the correct form: > > http://[host]:[port]/wayback/[date]/[uri]. But when I click on a link, I > get > > the error: > > Index not available > > > > *Unexpected SAX: White spaces are required between publicId and > systemId.* > > > > * > > *in catalina.out, the stack trace is: > > > > [Fatal Error] > > > ?query=date%3A19960101000000-20070919221459+exacturl%3Ahttp%3A%2F%2Fwww.aandw.net%2Fandrea%2Fhowtos%2Fsde_n > > > otes.txt&sort=date&reverse=true&hitsPerPage=10&start=0&dedupField=site&hitsPerDup=10&hitsPerSite=10:1:63:White > > spaces ar > > e required between publicId and systemId. > > org.xml.sax.SAXParseException: White spaces are required between > publicId > > and systemId. > > at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse( > > DOMParser.java:264) > > 2007-09-19 18:14:59,244 INFO WaxDateQueryFilter - Found range date: > > 19960101000000, 20070919221459 > > at > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse > > (DocumentBuilderImpl.java:292) > > at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java > :146) > > at > > org.archive.wayback.resourceindex.NutchResourceIndex.getHttpDocument( > > NutchResourceIndex.java:348) > > at org.archive.wayback.resourceindex.NutchResourceIndex.query( > > NutchResourceIndex.java:140) > > at org.archive.wayback.replay.ReplayServlet.doGet( > ReplayServlet.java > > :122) > > ... > > > > if I set the resourceindex.baseurl property closer to the original value > > like this: > > > > <context-param> > > <param-name>resourceindex.baseurl</param-name> > > <param-value>http://chaz.hul.harvard.edu:10622/xmlquery > > </param-value> > > <description>absolute URL to Nutch server</description> > > </context-param> > > > > when I click on a result link, I get this error: > > Index not available * > > > http://chaz.hul.harvard.edu:10622/xmlquery?query=date%3A19960101000000-20070919222516+exacturl%3Ahttp. > .. 
> > * > > > > and the stack trace looks like this: > > > > INFO: initialized > > org.archive.wayback.archivalurl.ArchivalUrlResultURIConverter > > java.io.FileNotFoundException: > > > http://chaz.hul.harvard.edu:10622/xmlquery?query=date%3A19960101000000-20070919222516+exac > > > turl%3Ahttp%3A%2F%2Fwww.aandw.net%2Fandrea%2Fhowtos%2Fsde_notes.txt&sort=date&reverse=true&hitsPerPage=10&start=0&dedupFi > > eld=site&hitsPerDup=10&hitsPerSite=10 > > at sun.net.www.protocol.http.HttpURLConnection.getInputStream( > > HttpURLConnection.java:1147) > > at > > > com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity > ( > > XMLEntityManager.java:973) > > at > > > com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion > > (XMLVersionDetector.java:184) > > at > > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse( > > XML11Configuration.java:798) > > at > > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse( > > XML11Configuration.java:764) > > at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse( > > XMLParser.java:148) > > at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse( > > DOMParser.java:250) > > at > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse > > (DocumentBuilderImpl.java:292) > > at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java > :146) > > at > > org.archive.wayback.resourceindex.NutchResourceIndex.getHttpDocument( > > NutchResourceIndex.java:348) > > at org.archive.wayback.resourceindex.NutchResourceIndex.query( > > NutchResourceIndex.java:140) > > ... > > > > It seems like I have not configured the Remote-Nutch ResourceIndex > > properties correctly, but I don't have much to go on to try to correct > it. > > Or perhaps I am not using nutchwax and wayback in the correct roles? > > > > Any help with this is greatly appreciated. > > > > Thanks, > > > > Chris > > > > > > ------------------------------------------------------------------------ > > > > > ------------------------------------------------------------------------- > > This SF.net email is sponsored by: Microsoft > > Defy all challenges. Microsoft(R) Visual Studio 2005. > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Archive-access-discuss mailing list > > Arc...@li... > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > > |
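The following is a sketch of the web.xml wiring that Brad's suggestion implies, shown here only for comparison with the configurations quoted above; the param names, class name, host and port all come from the quoted message, while the /nutchwax/opensearch path is an assumption about where the NutchWax OpenSearch servlet is mounted in this deployment, and Chris reports above that this combination still did not provide the functionality he was looking for:

<!-- Remote-Nutch ResourceIndex (sketch): same resourceindex.classname as in
     the quoted config, but with resourceindex.baseurl pointed at the NutchWax
     OpenSearch servlet instead of /xmlquery. -->
<context-param>
  <param-name>resourceindex.classname</param-name>
  <param-value>org.archive.wayback.resourceindex.NutchResourceIndex</param-value>
</context-param>
<context-param>
  <param-name>resourceindex.baseurl</param-name>
  <param-value>http://chaz.hul.harvard.edu:10622/nutchwax/opensearch</param-value>
</context-param>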
|
From: Ignacio G. <igc...@gm...> - 2007-10-05 18:01:10
|
I will try increasing the PermGen space as shown in the reference you provided. However, in my case the process is not running as a webapp, so the information in the article does not apply completely. Do you think that shutting down every other Java application and just running the NutchWax job would have any benefit in this case? Since I cannot control the number of class loaders created (I'm just running the code, I did not modify it in any way), I do not have any control over this problem. Thank you for the pointers. |
|
From: Andrea G. <an...@hu...> - 2007-10-05 17:31:25
|
On Fri, 5 Oct 2007 13:11:28 -0400, Ignacio Garcia wrote > That might work, but it is not the way that I would like to use Nutchwax. > > If I am forced to divide up one of my small collections (~100Gb), I don't > want to even think how many partitions the big collections are going > to require. Which means, time wasted partitioning, starting several > jobs, merging the created indexes and more... > > I even tried increasing the heap size to 4Gb, the max size of RAM in > my system, and that did not work. > > I have attached the last lines of the output provided by Nutchwax, > to see if you can point me to a possible solution to this problem. Your output shows that the error is java.lang.OutOfMemoryError: PermGen space Is that always the case? If so I don't think that increasing the heap size is going to help. This page explains the PermGen space well: http://blogs.sun.com/fkieviet/entry/classloader_leaks_the_dreaded_java Andrea > > Also... is there any way to know if it crashed on a particular > record / arc file or action to try and avoid it?? and is there a way > to resume the job from the moment it crashed? > > Thank you. > > On 10/2/07, John H. Lee <jl...@ar...> wrote: > > > > The idea is that for each of the N sets of ~500 ARCs, you'll have one > > index and one segment. That way, you can distribute the index-segment pairs > > across multiple disks or hosts. > > /search/indexes/indexA/ > > /search/indexes/indexB/ > > ... > > /search/segments/segmentA/ > > /search/segments/segmentB/ > > ... > > > > and point searcher.dir at /search. The webapp will then search all indexes > > under /search/indexes. Alternatively, you can merge all of the indexes as > > Stack pointed out. > > > > Hope this helps. > > > > -J > > > > > > > > On Oct 2, 2007, at 5:09 AM, Ignacio Garcia wrote: > > > > Hello, > > > > I tried separating the list of ARCs on smaller sets of ~500 ARCs. > > > > The first batch run to completion without problems, however, the second > > batch failed because I was using the same output directory as I used for the > > first one. > > > > Why can't I use the same output directory??? Wouldn't it make sense to > > have all the info the same place, so I can access everything at a time? > > > > How do I divide the collection in smaller portions and then combine > > everything on a single index? If I just keep everything separated I would > > loose a lot of time looking in different indexes and configuring the web-app > > to be able to look everywhere. > > > > On 9/28/07, Ignacio Garcia <igc...@gm...> wrote: > > > > > > Michael, I do not know if it failed on the same record... > > > > > > the first time it failed I assumed that increasing the -Xmx parameters > > > would solve it, since the OOME has happened before when indexing with > > > Wayback. > > > > > > I will try to narrow it as much as I can if it fails again. > > > > > > > > > On 9/27/07, Michael Stack < st...@du...> wrote: > > > > > > > > What John says and then > > > > > > > > + The OOME exception stack trace might tell us something. > > > > + Is the OOME always in same place processing same record? If so, > > > > take > > > > a look at it in the ARC. > > > > > > > > St.Ack > > > > > > > > John H. Lee wrote: > > > > > Hi Ignacio. > > > > > > > > > > It would be helpful if you posted the following information: > > > > > - Are you using standalone or mapreduce? > > > > > - If mapreduce, what are your mapred.map.tasks and > > > > > mapred.reduce.tasks properties set to? 
> > > > > - If mapreduce, how many slaves do you have and how much memory do > > > > > they have? > > > > > - How many ARCs are you trying to index? > > > > > - Did the map reach 100% completion before the failure occurred? > > > > > > > > > > Some things you may want to try: > > > > > - Set both -Xmx and -Xmx to the maximum available on your systems > > > > > - Increase one or both of mapred.map.tasks and mapred.reduce.tasks, > > > > > depending where the failure occurred > > > > > - Break your job up into smaller chunks of say, 1000 or 5000 ARCs > > > > > > > > > > -J > > > > > > > > > > On Sep 27, 2007, at 10:47 AM, Ignacio Garcia wrote: > > > > > > > > > > > > > > >> Hello, > > > > >> > > > > >> I've been doing some testing with nutchwax and I have never had any > > > > >> major problems. > > > > >> However, right now I am trying to index a collection that is over > > > > >> 100 Gb big, and for some reason the indexing is crashing while it > > > > >> tries to populate 'crawldb' > > > > >> > > > > >> The job will run fine at the beginning importing the information > > > > >> from the ARCs and creating the "segments" section. > > > > >> > > > > >> The error I get is an outOfMemory error when the system is > > > > >> processing each of the part.xx in the segments previously created. > > > > >> > > > > >> I tried increasing the following setting on the hadoop-default.xml > > > > >> config file: mapred.child.java.opts to 1GB, but it still failed in > > > > >> the same part. > > > > >> > > > > >> Is there any way to reduce the amount of memory used by nutchwax/ > > > > >> hadoop to make the process more efficient and be able to index such > > > > >> a collection? > > > > >> > > > > >> Thank you. > > > > >> > > > > ---------------------------------------------------------------------- > > > > >> --- > > > > >> This SF.net email is sponsored by: Microsoft > > > > >> Defy all challenges. Microsoft(R) Visual Studio 2005. > > > > >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > >> _______________________________________________ > > > > >> Archive-access-discuss mailing list > > > > >> Arc...@li... > > > > >> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > > > > > > >> > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > > This SF.net email is sponsored by: Microsoft > > > > > Defy all challenges. Microsoft(R) Visual Studio 2005. > > > > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > > _______________________________________________ > > > > > Archive-access-discuss mailing list > > > > > Arc...@li... > > > > > https://lists.sourceforge.net/lists/listinfo/archive-access-discuss > > > > > > > > > > > > > > > > > > > > -- Harvard University Library Powered by Open WebMail |
|
From: Ignacio G. <igc...@gm...> - 2007-10-05 17:11:30
|
[Attached Hadoop job log, decoded from base64. Repeated one-per-second "INFO mapred.LocalJobRunner: /nutchwax_indexes/indexes/segments/20071002090329-el2000/crawl_parse/part-00000:59793997824+33554432" progress lines and the Nutch plugin-registration listing are omitted; the failure at the end of the log reads:]

07/10/05 13:20:11 INFO mapred.JobClient:  map 100% reduce 0%
07/10/05 13:20:19 WARN mapred.LocalJobRunner: job_k0vw3i
java.lang.OutOfMemoryError: PermGen space
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.archive.access.nutch.NutchwaxCrawlDb.update(NutchwaxCrawlDb.java:104)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:62)
        at org.archive.access.nutch.Nutchwax.doUpdate(Nutchwax.java:201)
        at org.archive.access.nutch.Nutchwax.doUpdate(Nutchwax.java:174)
        at org.archive.access.nutch.Nutchwax.doAll(Nutchwax.java:153)
        at org.archive.access.nutch.Nutchwax.doJob(Nutchwax.java:389)
        at org.archive.access.nutch.Nutchwax.main(Nutchwax.java:674)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149) |
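A note on the error itself: the decoded log above ends in java.lang.OutOfMemoryError: PermGen space, which a larger -Xmx heap alone does not address, because the permanent generation is sized separately on Sun JVMs. The sketch below shows the two places such a flag is usually applied; the file names and values are illustrative assumptions for a Hadoop 0.x layout, and since the trace shows LocalJobRunner (local mode, tasks run inside the client JVM), the HADOOP_OPTS setting is the one that would matter for this run.

# conf/hadoop-env.sh -- options for the client JVM; local-runner jobs execute
# their map/reduce tasks in this JVM, so this is where the PermGen flag counts:
export HADOOP_OPTS="-Xmx1024m -XX:MaxPermSize=256m"

<!-- conf/hadoop-site.xml -- only affects separate child JVMs on a real
     (non-local) cluster; values are examples, not recommendations -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:MaxPermSize=256m</value>
</property>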
|
From: John H. L. <jl...@ar...> - 2007-10-02 17:47:51
|
The idea is that for each of the N sets of ~500 ARCs, you'll have one index and one segment. That way, you can distribute the index-segment pairs across multiple disks or hosts:

/search/indexes/indexA/
/search/indexes/indexB/
...
/search/segments/segmentA/
/search/segments/segmentB/
...

Point searcher.dir at /search and the webapp will then search all indexes under /search/indexes. Alternatively, you can merge all of the indexes, as Stack pointed out.

Hope this helps.

-J

On Oct 2, 2007, at 5:09 AM, Ignacio Garcia wrote:
> Why can't I use the same output directory? Wouldn't it make sense
> to have all the info in the same place, so I can access everything at once?
>
> How do I divide the collection into smaller portions and then combine
> everything in a single index? If I just keep everything separated I
> would lose a lot of time looking in different indexes and configuring
> the web-app to be able to look everywhere. |
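A concrete rendering of the layout John describes above. The searcher.dir property he names is the standard Nutch search property; the /search paths and the exact config file (nutch-site.xml, often under WEB-INF/classes of the deployed search webapp) are assumptions for illustration, so check nutch-default.xml in the NutchWAX build in use.

/search/
  indexes/
    indexA/
    indexB/
  segments/
    segmentA/
    segmentB/

<!-- nutch-site.xml for the search webapp: point it at the common root;
     it will pick up every per-batch index under /search/indexes -->
<property>
  <name>searcher.dir</name>
  <value>/search</value>
</property>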
|
From: Michael S. <st...@du...> - 2007-10-02 15:46:56
|
IIRC, there are explicit checks to prevent overwriting any extant content in the specified output directory.

Use the merge command to aggregate multiple indices (pass a '--help' parameter for usage). You could also put off merging and just configure the webapp to look in multiple indices.

St.Ack

Ignacio Garcia wrote:
> Why can't I use the same output directory? Wouldn't it make sense
> to have all the info in the same place, so I can access everything at once?
>
> How do I divide the collection into smaller portions and then combine
> everything in a single index? |
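For reference, the merge job Stack mentions is driven through the NutchWAX jar on Hadoop. The jar name and argument shape below are assumptions from a typical install, so rely on the tool's own usage output rather than this sketch.

# Print usage for the merge job (per Stack's suggestion to pass '--help'):
$HADOOP_HOME/bin/hadoop jar nutchwax.jar merge --help

# Hypothetical shape of the call once the arguments are confirmed:
# $HADOOP_HOME/bin/hadoop jar nutchwax.jar merge <output-dir> <index-dir> [<index-dir> ...]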
|
From: Michael S. <st...@du...> - 2007-10-02 15:34:27
|
See here for discussion of ARCReader and other such tools: http://crawler.archive.org/articles/developer_manual/arcs.html

St.Ack

Loren Gordon wrote:
> Are there any tools for extracting the content of an ARC file and
> writing it to disk so applications that can't read ARC files can mine
> some of the data? |
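For Loren's use case below (dumping ARC record content to plain files), here is a rough Java sketch against the Heritrix ARC-reading classes the linked manual covers. The class and method names (ARCReaderFactory.get, the record-as-InputStream read loop, iterator generics) are from memory and may differ between versions, so treat this as a starting point and check the API docs for the jar you compile against.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Iterator;
import org.archive.io.ArchiveRecord;
import org.archive.io.arc.ARCReader;
import org.archive.io.arc.ARCReaderFactory;

// Sketch: walk an ARC file and write each record's bytes to a numbered file.
public class ArcDump {
  public static void main(String[] args) throws Exception {
    ARCReader reader = ARCReaderFactory.get(new File(args[0]));
    int i = 0;
    for (Iterator<ArchiveRecord> it = reader.iterator(); it.hasNext();) {
      ArchiveRecord record = it.next();
      OutputStream out = new FileOutputStream("record-" + (i++) + ".dat");
      byte[] buf = new byte[4096];
      int n;
      // An ArchiveRecord reads as an InputStream positioned at the record body.
      while ((n = record.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      out.close();
      record.close();
    }
    reader.close();
  }
}

Compile and run it against whichever archive/heritrix jar in the Wayback or NutchWAX distribution provides the org.archive.io packages.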
|
From: Loren G. <ma...@lo...> - 2007-10-02 14:07:48
|
Hello,

Are there any tools for extracting the content of an ARC file and writing it to disk so applications that can't read ARC files can mine some of the data? Alexa's av_tools seemed to have some potential, but I couldn't find any kind of download link for them.

Thanks,
Loren |
|
From: Ignacio G. <igc...@gm...> - 2007-10-02 12:10:08
|
Hello,

I tried separating the list of ARCs into smaller sets of ~500 ARCs.

The first batch ran to completion without problems; however, the second batch failed because I was using the same output directory as I used for the first one.

Why can't I use the same output directory? Wouldn't it make sense to have all the info in the same place, so I can access everything at once?

How do I divide the collection into smaller portions and then combine everything in a single index? If I just keep everything separated I would lose a lot of time looking in different indexes and configuring the web-app to be able to look everywhere.

On 9/28/07, Ignacio Garcia <igc...@gm...> wrote:
>
> Michael, I do not know if it failed on the same record...
>
> The first time it failed I assumed that increasing the -Xmx parameters
> would solve it, since the OOME has happened before when indexing with
> Wayback.
>
> I will try to narrow it down as much as I can if it fails again.
>
> On 9/27/07, Michael Stack <st...@du...> wrote:
> >
> > What John says and then
> >
> > + The OOME exception stack trace might tell us something.
> > + Is the OOME always in the same place processing the same record?
> > If so, take a look at it in the ARC.
> >
> > St.Ack
> >
> > John H. Lee wrote:
> > > Hi Ignacio.
> > >
> > > It would be helpful if you posted the following information:
> > > - Are you using standalone or mapreduce?
> > > - If mapreduce, what are your mapred.map.tasks and
> > > mapred.reduce.tasks properties set to?
> > > - If mapreduce, how many slaves do you have and how much memory do
> > > they have?
> > > - How many ARCs are you trying to index?
> > > - Did the map reach 100% completion before the failure occurred?
> > >
> > > Some things you may want to try:
> > > - Set both -Xms and -Xmx to the maximum available on your systems
> > > - Increase one or both of mapred.map.tasks and mapred.reduce.tasks,
> > > depending where the failure occurred
> > > - Break your job up into smaller chunks of, say, 1000 or 5000 ARCs
> > >
> > > -J
> > >
> > > On Sep 27, 2007, at 10:47 AM, Ignacio Garcia wrote:
> > >
> > >> Hello,
> > >>
> > >> I've been doing some testing with nutchwax and I have never had any
> > >> major problems. However, right now I am trying to index a collection
> > >> that is over 100 GB, and for some reason the indexing is crashing
> > >> while it tries to populate 'crawldb'.
> > >>
> > >> The job will run fine at the beginning, importing the information
> > >> from the ARCs and creating the "segments" section.
> > >>
> > >> The error I get is an outOfMemory error when the system is
> > >> processing each of the part.xx in the segments previously created.
> > >>
> > >> I tried increasing the following setting in the hadoop-default.xml
> > >> config file: mapred.child.java.opts to 1 GB, but it still failed in
> > >> the same part.
> > >>
> > >> Is there any way to reduce the amount of memory used by nutchwax/
> > >> hadoop to make the process more efficient and be able to index such
> > >> a collection?
> > >>
> > >> Thank you. |
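Since John's checklist quoted above turns on a couple of Hadoop job properties, here is a hedged hadoop-site.xml fragment naming them. The property names are standard Hadoop 0.x settings that override hadoop-default.xml; the values are purely illustrative placeholders, not recommendations, and they only influence jobs run on an actual map/reduce cluster rather than the local runner.

<!-- hadoop-site.xml: split the work across more tasks; tune to the cluster -->
<property>
  <name>mapred.map.tasks</name>
  <value>8</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>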