Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

7 [uuri] When 'generous' mode, don't encode curly-brackets - ID: 1329725
Last Update: Comment added ( karl-ia )

Below is conversation from the list with Bjarne about
curly-brackets not getting encoded by IE. We should
probably do the same. Giving it priority 7 since issue
came in off the list.

Bjarne Andersen wrote:

> Thanks for the pointers - the URL I gave was not complete - this one is:
>
> http://www.bs.dk/content.aspx?itemguid={31637766-92B4-4ACA-9A0D-5CFF042B151E}
>
> URLs like this get encoded in the arc-files -> they are encoded in my
> cdx-files -> my proxy-server can't find them.
>
> :-)
> Bjarne
>
> stack wrote:
> > Bjarne Andersen wrote:
> >
> > > Which class does the URLencoding in heritrix ?
> >
> > org.archive.net.UURI. Study its superclasses LaxURI and
> > commons-httpclient URI. Also see the UURIFactory#fixup code.
> >
> > >
> > > it looks like URLs like:
> > > http://www.bs.dk/showfile.aspx?IdGuid={B0A}
> > > <http://www.bs.dk/showfile.aspx?IdGuid=%7BB0A%7D>
> > >
> > > get encoded to:
> > > http://www.bs.dk/showfile.aspx?IdGuid=%7BB0A%7D
> > >
> > > my browser (IE) does not encode '{}' automatically, so when my browser
> > > wants to access such URIs they are not found in my archive unless I
> > > encode the braces myself. Because of this I want to use the heritrix
> > > URLencoding class to ensure consistency between what heritrix has
> > > encoded and what my proxy-server encodes before looking URIs up in my
> > > archive.
> >
> > Sounds like something we should just accommodate in LaxURI (Doesn't
> > matter whether I use the encoded or non-encoded version on live net in
> > Firefox, I get 'Server Error in '/' Application.' Is that your
> > experience?).
> >
> > St.Ack
> >
> > >
> > > best
> > >
> > > ---
> > > Bjarne Andersen
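
The lookup mismatch described in the thread can be illustrated with a minimal sketch. It assumes a plain in-memory map stands in for the cdx index, and the class and method names (CurlyLookupSketch, encodeCurlies) are hypothetical; this is not Heritrix or proxy code, only the "encode the braces myself" workaround written out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Heritrix.
class CurlyLookupSketch {

    // Percent-encode only '{' and '}', leaving every other character untouched.
    static String encodeCurlies(String uri) {
        return uri.replace("{", "%7B").replace("}", "%7D");
    }

    public static void main(String[] args) {
        // Stand-in for an index keyed by the URI as the crawler stored it (encoded).
        Map<String, String> index = new HashMap<>();
        index.put("http://www.bs.dk/showfile.aspx?IdGuid=%7BB0A%7D", "record-1");

        // The browser sends the braces unencoded, so a direct lookup misses.
        String requested = "http://www.bs.dk/showfile.aspx?IdGuid={B0A}";
        System.out.println(index.get(requested));

        // Encoding the braces the same way the crawler did makes the lookup hit.
        System.out.println(index.get(encodeCurlies(requested)));
    }
}
```

The alternative requested in this issue is the reverse: have the crawler leave the braces unencoded so that both sides agree without extra work in the proxy.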


Submitted: Michael Stack ( stack-sf ) - 2005-10-18 17:45
Priority: 7
Status: Closed
Resolution: None
Assigned: Michael Stack
Category: None
Group: 1.6.0
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 01:44
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-977 -- please add further
comments at that location.


Date: 2005-11-04 02:43
Sender: stack-sf (Project Admin)

So, Firefox only allows curlies in the query string, not in the path.

Closing because I added a unit test to ensure we do as Firefox does.

Fix for '[ 1329725 ] [uuri] When 'generous' mode, don't encode
curly-brackets'
* src/java/org/archive/net/LaxURI.java
    Fixed line lengths and changed a param name so it doesn't mask a
    data member.
* src/java/org/archive/net/LaxURLCodec.java
    Added a QUERY_SAFE set to use when escaping query strings in fixup.
    Javadoc.
* src/java/org/archive/net/UURIFactory.java
    Added an override of ensureMinimalEscaping, one that takes the
    BitSet to use when escaping.
* src/java/org/archive/net/UURIFactoryTest.java
    Added curlies test. Makes sure curlies in the query are OK but
    curlies anywhere else are not.
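
As a rough illustration of the approach in the change list above (a separate safe-character BitSet for query strings, passed to the escaping routine), here is a minimal sketch. It is not the Heritrix code: the class name, the contents of both sets, and the escape method are assumptions made for the example; only the idea of a QUERY_SAFE BitSet that admits curlies comes from the comment above.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch; the real sets live in LaxURLCodec and may differ.
class CurlyEscapeSketch {

    // Assumed set of characters left alone in a path.
    static final BitSet PATH_SAFE = new BitSet(128);
    // Assumed set for query strings: the path set plus '?', '{' and '}'.
    static final BitSet QUERY_SAFE;

    static {
        for (char c = 'a'; c <= 'z'; c++) PATH_SAFE.set(c);
        for (char c = 'A'; c <= 'Z'; c++) PATH_SAFE.set(c);
        for (char c = '0'; c <= '9'; c++) PATH_SAFE.set(c);
        // '%' is treated as safe so existing escapes are not double-encoded.
        for (char c : "-._~!$&'()*+,;=:@/%".toCharArray()) PATH_SAFE.set(c);

        QUERY_SAFE = (BitSet) PATH_SAFE.clone();
        QUERY_SAFE.set('?');
        QUERY_SAFE.set('{');
        QUERY_SAFE.set('}');
    }

    // Percent-encode every UTF-8 byte that is not in the given safe set.
    static String escape(String s, BitSet safe) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            if (safe.get(v)) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Curlies survive in a query string but are escaped in a path.
        System.out.println(escape("IdGuid={B0A}", QUERY_SAFE));
        System.out.println(escape("/dir{1}/file", PATH_SAFE));
    }
}
```

Passing the set as a parameter keeps one escaping routine for both URI components, so the path/query difference is expressed only in the data.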




Date: 2005-11-02 18:25
Sender: stack-sf (Project Admin)

Add to 1.6.0. Looks easy to do.


Changes ( 4 )

Field              Old Value  Date              By
status_id          Open       2005-11-04 02:43  stack-sf
close_date         -          2005-11-04 02:43  stack-sf
assigned_to        nobody     2005-11-02 20:24  gojomo
artifact_group_id  None       2005-11-02 18:25  stack-sf