From: Gilles D. <gr...@sc...> - 2002-03-14 21:38:58
|
According to Jessica Biola:
> Is there a way to match spaces in a regex inside a
> url_rewrite_rules parameter, so that you could just
> do:
>
> url_rewrite_rules: (.*)[:space:](.*) \1%20\2
>
> (of course, you'd have to repeat this same rule
> multiple times to handle multiple spaces) I tried the
> above rule and it didn't seem to work. Characters
> inside the [brackets] were taken literally, and thus,
> the first s, p, a, c, or e were replaced with %20.
>
> This may seem like a wimpy work-around, but it could
> be done without the need to modify any code
> internally, keeping htdig RFC2396 compliant at the
> same time.
>
> So if you could help me with the regex I would
> appreciate it.
Interesting idea, but there are a few reasons it won't work:
1) As you discovered, the [:space:] character class isn't implemented.
This may actually be a function of which regex code ends up being used.
Some C libraries may implement this, but clearly that's not the case on
your system. Even if your regex code does implement this, see point 3.
2) You can't use just a space in the regular expression, either with
or without the brackets, because url_rewrite_rules is parsed as a
string list, not a quoted string list, so there's no way to embed a
literal space in your regular expression.
3) Even if you could get around the two problems above, it still wouldn't
work because the URL class doesn't do the rewriting until AFTER it's
parsed the URL, and so the spaces are already stripped out in accordance
with RFC2396.
By the way, any trick you'd use to make htdig handle spaces within URLs
would be a violation of RFC2396, regardless of whether it required code
changes or just config file changes. The standard says spaces should
be stripped out. The way most web browsers handle spaces within URLs is
also a violation of RFC2396. The question is whether/how we get htdig
to do likewise.
The change I had suggested previously, which Joe Jah wrote into a patch,
mostly does things correctly. Only one bit is missing: all whitespace
characters other than the space itself are stripped out anywhere, and
the chop() call strips off trailing spaces, but there's nothing in that
patch to strip off leading spaces, which is what caused grief in Joe's
test of his patch.
What you could do is, in addition to Joe's patch, add the following
at the very start of URL::URL(char *ref, URL &parent)...

    while (*ref == ' ')
        ref++;

and this at the very start of URL::parse(char *u)...

    while (*u == ' ')
        u++;

before ref or u is assigned to the String "temp".
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Jessica B. <jes...@ya...> - 2002-03-14 15:42:07
|
Is there a way to match spaces in a regex inside a url_rewrite_rules parameter, so that you could just do:

url_rewrite_rules: (.*)[:space:](.*) \1%20\2

(of course, you'd have to repeat this same rule multiple times to handle multiple spaces) I tried the above rule and it didn't seem to work. Characters inside the [brackets] were taken literally, and thus, the first s, p, a, c, or e were replaced with %20.

This may seem like a wimpy work-around, but it could be done without the need to modify any code internally, keeping htdig RFC2396 compliant at the same time.

So if you could help me with the regex I would appreciate it.
|
|
From: Geoff H. <ghu...@ws...> - 2002-03-14 04:28:06
|
The switch from GDBM to the Berkeley DB was quite some time ago. I don't think there's much to say except that you'll see performance improvements and get a stable, solid database. The Berkeley DB documentation is really quite helpful.

-Geoff

On Monday, March 11, 2002, at 03:10 AM, Kang-Jin Lee wrote:
> Hi,
>
> I see that you have switched from gdbm to db in Htdig.
> I am currently looking for a replacement for gdbm and would like to know
> if you want to share some experiences?
>
> Thank you
>
> kj
> --
> Kang-Jin Lee <lee @ arco.de>
|
|
From: Joe R. J. <jj...@cl...> - 2002-03-13 00:44:32
|
On Tue, 12 Mar 2002, Gilles Detillieux wrote:

> Date: Tue, 12 Mar 2002 17:32:41 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: jj...@cl...
> Cc: htd...@li...
> Subject: Re: [htdig-dev] "file name.html" -> "filename.html";(
>
> According to Joe R. Jah:
> > On Mon, 11 Mar 2002, Gilles Detillieux wrote:
> ...
> > > My recommendation, if you have a choice, is to avoid spaces in filenames
> > > altogether, because they cause all sorts of grief. Some caching proxy
> > > servers mess up URLs with spaces, even if the space is properly encoded
> > > as %20.
> >
> > You are absolutely right. I made a patch from your tips in the above
> > thread:
> ...
> > Applied it and randig, and waited for the dig to finish, and waited, and
> > waited, ...;( Finally I killed the process. I humbly switch my previous
> > +1 vote to -1.
>
> That's a bit surprising. (Not the change in vote, but the fact that
> it hung.) I'm curious as to why that is. Were you indexing through
> a proxy server, and if so, which one? Did it lock up solid without
> doing anything, or did it seem to be doing something when you killed it?
> Can you provide any verbose output and/or a stack backtrace at the time
> you killed it?

The dig was entirely on the local server. When it got to this link:

<a href=" http://domain.com/path/to/page.htm" target="_blank">

in a file.shtml in a folder without any index files, it added the server URL to it, as if it were a relative URL, and went into an endless wild goose chase for made-up URLs like:

http://mydomain.com/ http:/domain.com/some/path/somefile.htm

until I killed the process. It somehow removed one "/" from the second http://;-/

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Gilles D. <gr...@sc...> - 2002-03-13 00:15:14
|
Hi again, John. I believe I have resolved this problem to Sean Downey's satisfaction.

According to js...@in...:
> Hi everyone,
>
> I have an issue with htdig that hopefully should be fairly
> simple to resolve, however, I have managed to get $500 out
> of my company to offer as a bounty to whoever manages to sort it
> out. It's a bug with date parsing on FreeBSD that means all
> dates are incorrectly stored.
>
> Please email me for more details.
>
> Cheers,
>
> John Senior.
>
> --
> js...@in...

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-03-12 23:32:53
|
According to Joe R. Jah:
> On Mon, 11 Mar 2002, Gilles Detillieux wrote:
...
> > What most browsers do with unencoded spaces within URLs is a violation of
> > RFC 1738 and RFC 2396. htdig does the correct thing, if not what some
> > users would prefer it did. You can of course patch the URL class to leave
> > the spaces in there, in violation of the standard, to conform with the
> > incorrect behaviour of most browsers and, apparently, some really bad
> > HTML code generators. That would save you from having to fix all the bad
> > HTML code you're indexing. Spaces within URLs should always always be
> > encoded as %20.
> >
> > See http://www.geocrawler.com/archives/3/8822/2002/1/300/7455555/
> > and http://www.geocrawler.com/archives/3/8822/2002/1/250/7495651/
> >
> > My recommendation, if you have a choice, is to avoid spaces in filenames
> > altogether, because they cause all sorts of grief. Some caching proxy
> > servers mess up URLs with spaces, even if the space is properly encoded
> > as %20.
>
> You are absolutely right. I made a patch from your tips in the above
> thread:
...
> Applied it and randig, and waited for the dig to finish, and waited, and
> waited, ...;( Finally I killed the process. I humbly switch my previous
> +1 vote to -1.

That's a bit surprising. (Not the change in vote, but the fact that
it hung.) I'm curious as to why that is. Were you indexing through
a proxy server, and if so, which one? Did it lock up solid without
doing anything, or did it seem to be doing something when you killed it?
Can you provide any verbose output and/or a stack backtrace at the time
you killed it?

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Joe R. J. <jj...@cl...> - 2002-03-12 23:09:52
|
On Mon, 11 Mar 2002, Gilles Detillieux wrote:

> Date: Mon, 11 Mar 2002 17:22:15 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: jj...@cl...
> Cc: Geoff Hutchison <ghu...@ws...>,
>     htd...@li...
> Subject: Re: [htdig] "file name.html" -> "filename.html";(
>
> According to Joe R. Jah:
> > On Sat, 9 Mar 2002, Geoff Hutchison wrote:
> > > On Friday, March 8, 2002, at 01:51 PM, Joe R. Jah wrote:
> > > > Unfortunately htdig removes the space. and looks for "filename.html" and
> > > > reports:
> > > >
> > > > Not found: http://domain.com/some/path/filename.html Ref:
> > > > http://domain.com/some/path/file.html
> > >
> > > Joe, I think you should understand that this isn't much help as a bug
> > > report. Do you see this in 3.1.x, 3.2.0bX, both, etc.? When does the
> > > space seem to "disappear?" Is it when it first encounters the link
> > > (parser error), as it normalizes and accepts/rejects the URL (retriever
> > > or URL parser error) or as it tries to fetch it?
> > >
> > > A bit more feedback would go a long way towards debugging this.
> >
> > Ok, I run 3.1.6, rundig -vvvvv results the following for one link in one
> > file:
> > ----------------------------------8<-------------------------------
> > 0:0:0:http://domain.com/Path/To/: Trying local files
> > tried local file /domain.com/Path/To/index.html
> > tried local file /domain.com/Path/To/index.shtml
> > found existing file /domain.com/Path/To/index.htm
> > Read 5785 from document
> > Read a total of 5785 bytes
> > Tag: <html>, matched -1
> > Tag: <head>, matched -1
> > Tag: <title>, matched 0
> > word: Handouts@7
> > Tag: </title>, matched 1
> > title: Handouts
> > Tag: <a href="fa01HP2-Basic Unix Commands.htm">, matched 2
> > word: Basic@696
> > word: UNIX@698
> > word: Commands@700
> > Tag: </a>, matched 3
> > href: http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm (Basic UNIX
> > Commands)
> > resolving 'http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm'
> > pushing http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm
> > ----------------------------------8<-------------------------------
> > ...
> > ----------------------------------8<-------------------------------
> > 14:14:1:http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm: Trying local files
> > tried local file /domain.com/Path/To/fa01HP2-BasicUnixCommands.htm
> > Local retrieval failed, trying HTTP
> > Retrieval command for http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm: GET /Path/To/fa01HP2-BasicUnixCommands.htm HTTP/1.0
> > User-Agent: htdig/3.1.6 (Se...@do...)
> > Referer: http://domain.com/Path/To/
> > Host: domain.com
> >
> > Header line: HTTP/1.1 404 Not Found
> > Header line: Date: Sun, 10 Mar 2002 08:03:36 GMT
> > ----------------------------------8<-------------------------------
> >
> > And it reports:
> > ----------------------------------8<-------------------------------
> > Not found: http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm Ref: http://domain.com/Path/To/
> > ----------------------------------8<-------------------------------
>
> What most browsers do with unencoded spaces within URLs is a violation of
> RFC 1738 and RFC 2396. htdig does the correct thing, if not what some
> users would prefer it did. You can of course patch the URL class to leave
> the spaces in there, in violation of the standard, to conform with the
> incorrect behaviour of most browsers and, apparently, some really bad
> HTML code generators. That would save you from having to fix all the bad
> HTML code you're indexing. Spaces within URLs should always always be
> encoded as %20.
>
> See http://www.geocrawler.com/archives/3/8822/2002/1/300/7455555/
> and http://www.geocrawler.com/archives/3/8822/2002/1/250/7495651/
>
> My recommendation, if you have a choice, is to avoid spaces in filenames
> altogether, because they cause all sorts of grief. Some caching proxy
> servers mess up URLs with spaces, even if the space is properly encoded
> as %20.

You are absolutely right. I made a patch from your tips in the above thread:

-----------------------8<-----------------------
*** htlib/URL.cc.orig	Thu Feb 7 17:15:38 2002
--- htlib/URL.cc	Tue Mar 12 12:54:45 2002
***************
*** 75,81 ****
  URL::URL(char *ref, URL &parent)
  {
      String temp(ref);
-     temp.remove(" \r\n\t");
      ref = temp;
      _host = parent._host;
--- 75,82 ----
  URL::URL(char *ref, URL &parent)
  {
      String temp(ref);
+     temp.remove("\r\n\t");
+     temp.chop(' ');
      ref = temp;
      _host = parent._host;
***************
*** 249,255 ****
  void URL::parse(char *u)
  {
      String temp(u);
-     temp.remove(" \t\r\n");
      char *nurl = temp;

      //
--- 250,257 ----
  void URL::parse(char *u)
  {
      String temp(u);
+     temp.remove("\t\r\n");
+     temp.chop(' ');
      char *nurl = temp;

      //
-----------------------8<-----------------------

Applied it and randig, and waited for the dig to finish, and waited, and
waited, ...;( Finally I killed the process. I humbly switch my previous
+1 vote to -1.

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Gilles D. <gr...@sc...> - 2002-03-12 03:39:49
|
According to Emma Jane Hogbin:
> >I think your assessment of the problem, and proposed solution, are
> >both bang-on. The stuff between the <script> and </script> tag should
> >be stripped out entirely and not parsed for HTML tags.
>
> I just found this:
>
> noindex_start, noindex_end
>     type: string
>     used by: htdig
>     default: <!--htdig_noindex--> <!--/htdig_noindex-->
>     description: ...
>
> noindex_start: <SCRIPT
> noindex_end: </SCRIPT>
>
> Maybe the default could also have the example "<script" tags?
> Can you have multiple values for this?

No, right now there is support for only one value in each attribute. We've talked many times of extending it to support multiple values, but so far no one has taken the time to implement it. Ironically, I felt when getting 3.1.6 out that this was less of a priority now that the HTML parser had built-in support for ignoring stuff between <script> and </script> tags.

As for the default value, the idea was to have a custom set of tags that only htdig would recognize, and it seems from the e-mails we've seen on the list that these defaults are fairly widely used, so changing the defaults isn't likely to be popular.

> >Of course, you can avoid this problem in your HTML if you properly put
> >inline JavaScript code inside an HTML comment. E.g.:
>
> I didn't think I had my JS set up this way but when I went in to check I
> actually have the proper comments in there. I thought that the parser read
> HTML comments but didn't index them.... and this bug is to do with the
> parser getting stuck with stuff it's reading even if it's not indexing it.

htdig's HTML parser completely strips out HTML comments, if these are properly formed (i.e. they have an even number of dashes), so if you have valid HTML comment delimiters around your JavaScript, the parser shouldn't be thrown off by any "<" signs in the code. Can you show us an example of the comment delimiters you're using around the JavaScript?

... and in a later message ...

> This little bit (from Geoff's old email) is the same as setting
> noindex_start and end attributes, right?
>
>     case 29: // "script"
>         noindex |= TAGscript;
>         nofollow |= TAGscript;
>         break;

No, these are two very different things. The noindex_start and noindex_end handling is done in the first pass through the HTML, at the same time the comments are stripped out, and the text between these tags is completely removed from the in-memory copy of the document. Setting the noindex and nofollow flags is done in the second pass, and during that pass the parser still looks for tags in the code even when noindex is set, because it may be the matching closing tag that it finds at that point. Geoff's suggestion was to handle <script> and <style> tags in the first pass, rather than in the second.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Joe R. J. <jj...@cl...> - 2002-03-11 23:59:41
|
On Mon, 11 Mar 2002, Gilles Detillieux wrote:

> Date: Mon, 11 Mar 2002 17:22:15 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: jj...@cl...
> Cc: Geoff Hutchison <ghu...@ws...>,
>     htd...@li...
> Subject: Re: [htdig] "file name.html" -> "filename.html";(
>
> According to Joe R. Jah:
> > On Sat, 9 Mar 2002, Geoff Hutchison wrote:
> > > On Friday, March 8, 2002, at 01:51 PM, Joe R. Jah wrote:
> > > > Unfortunately htdig removes the space. and looks for "filename.html" and
> > > > reports:
> > > >
> > > > Not found: http://domain.com/some/path/filename.html Ref:
> > > > http://domain.com/some/path/file.html
> > >
> > > Joe, I think you should understand that this isn't much help as a bug
> > > report. Do you see this in 3.1.x, 3.2.0bX, both, etc.? When does the
> > > space seem to "disappear?" Is it when it first encounters the link
> > > (parser error), as it normalizes and accepts/rejects the URL (retriever
> > > or URL parser error) or as it tries to fetch it?
> > >
> > > A bit more feedback would go a long way towards debugging this.
> >
> > Ok, I run 3.1.6, rundig -vvvvv results the following for one link in one
> > file:
> > ----------------------------------8<-------------------------------
> > 0:0:0:http://domain.com/Path/To/: Trying local files
> > tried local file /domain.com/Path/To/index.html
> > tried local file /domain.com/Path/To/index.shtml
> > found existing file /domain.com/Path/To/index.htm
> > Read 5785 from document
> > Read a total of 5785 bytes
> > Tag: <html>, matched -1
> > Tag: <head>, matched -1
> > Tag: <title>, matched 0
> > word: Handouts@7
> > Tag: </title>, matched 1
> > title: Handouts
> > Tag: <a href="fa01HP2-Basic Unix Commands.htm">, matched 2
> > word: Basic@696
> > word: UNIX@698
> > word: Commands@700
> > Tag: </a>, matched 3
> > href: http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm (Basic UNIX
> > Commands)
> > resolving 'http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm'
> > pushing http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm
> > ----------------------------------8<-------------------------------
> > ...
> > ----------------------------------8<-------------------------------
> > 14:14:1:http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm: Trying local files
> > tried local file /domain.com/Path/To/fa01HP2-BasicUnixCommands.htm
> > Local retrieval failed, trying HTTP
> > Retrieval command for http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm: GET /Path/To/fa01HP2-BasicUnixCommands.htm HTTP/1.0
> > User-Agent: htdig/3.1.6 (Se...@do...)
> > Referer: http://domain.com/Path/To/
> > Host: domain.com
> >
> > Header line: HTTP/1.1 404 Not Found
> > Header line: Date: Sun, 10 Mar 2002 08:03:36 GMT
> > ----------------------------------8<-------------------------------
> >
> > And it reports:
> > ----------------------------------8<-------------------------------
> > Not found: http://domain.com/Path/To/fa01HP2-BasicUnixCommands.htm Ref: http://domain.com/Path/To/
> > ----------------------------------8<-------------------------------
>
> What most browsers do with unencoded spaces within URLs is a violation of
> RFC 1738 and RFC 2396. htdig does the correct thing, if not what some
> users would prefer it did. You can of course patch the URL class to leave
> the spaces in there, in violation of the standard, to conform with the
> incorrect behaviour of most browsers and, apparently, some really bad
> HTML code generators. That would save you from having to fix all the bad
> HTML code you're indexing. Spaces within URLs should always always be
> encoded as %20.
>
> See http://www.geocrawler.com/archives/3/8822/2002/1/300/7455555/
> and http://www.geocrawler.com/archives/3/8822/2002/1/250/7495651/
>
> My recommendation, if you have a choice, is to avoid spaces in filenames
> altogether, because they cause all sorts of grief. Some caching proxy
> servers mess up URLs with spaces, even if the space is properly encoded
> as %20.

I am sorry I missed that thread. I believe the above situation is
certainly becoming more and more pervasive. I vote +1 to tweak the HTML
parser to handle space in filenames.

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Gilles D. <gr...@sc...> - 2002-03-11 23:41:32
|
According to Jim Cole:
> results in the parser missing all remaining links on the page. If
> the '<' is removed or replaced (e.g. with a '>'), the page is
> properly indexed. This occurs with 3.1.6; I haven't tried it with
> a 3.2.0b4 snapshot.

Yes, the <script> tag handling in 3.1.6 is a backport from 3.2.0b4 and possibly earlier betas, so they will likely do the same thing.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-03-11 23:38:34
|
According to Geoff Hutchison:
> On Friday, March 8, 2002, at 05:20 PM, Jim Cole wrote:
> > It does look like there is a problem with the parser. If a '<'
> > occurs in a script element, it appears that the parser becomes
> > somewhat confused with regard to the remaining document content.
> > For example
>
> Yes, this sounds like a bug to me. Actually, the <script> sections and
> probably other sections as well should be simply skipped by the parser.
> Right now the code does this:
>
>     case 29: // "script"
>         noindex |= TAGscript;
>         nofollow |= TAGscript;
>         break;
>
> In short, the parser doesn't *index* the bits inside <script></script>
> tags, but it does *look* at them. So it hit that "<" character and
> figured it was a new tag.
>
> I would think that we want to treat <script> and probably <style>
> sections like comments--find the ending tag and completely ignore
> everything inside.

I think your assessment of the problem, and proposed solution, are both bang-on. The stuff between the <script> and </script> tags should be stripped out entirely and not parsed for HTML tags.

Of course, you can avoid this problem in your HTML if you properly put inline JavaScript code inside an HTML comment. E.g.:

<script>
<!--
JavaScript code here
// -->
</script>

I'm amazed at how frequently people/programs fail to do this. It's what you're supposed to do to avoid problems with non-JavaScript-aware web clients.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-03-11 23:29:42
|
According to Michael Clarke:
> When we dig our intranet (4 gigs of data) apche web logs become chocca
> with requests. So much that we have to purge thelogs each dig. Is
> there a way of minimising requests / digging so apache doesn't record
> soo much data?

See http://www.htdig.org/attrs.html#local_urls

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Williams, D. A. - D. <DAW...@aa...> - 2002-03-11 22:59:52
|
Michael,

Apache documents an optional module which allows you to tweak the environment at: http://httpd.apache.org/docs/mod/mod_setenvif.html

I use that and the CustomLog conf setting to not log internal requests:

SetEnvIf Remote_Addr "10\.10\.." dont_log_local
CustomLog /var/log/httpd/access_log combined env=!dont_log_local

The Apache docs mention you can test for user agent as well, so I would expect testing for htdig should be pretty straightforward. See also: http://httpd.apache.org/docs/logs.html#conditional

Hope that helps a bit,
-David

==========================
The views, opinions, and judgments expressed are solely my own. Message contents have not been reviewed or approved by AARP.

-----Original Message-----
From: Michael Clarke [mailto:Mic...@ir...]
Sent: Monday, March 11, 2002 5:36 PM
To: htd...@li...
Subject: [htdig-dev] Diging and logging by aache

Hi all.

When we dig our intranet (4 gigs of data) apche web logs become chocca with requests. So much that we have to purge thelogs each dig. Is there a way of minimising requests / digging so apache doesn't record soo much data?

Michael Clarke
IRD Open Systems Team
Level 4, Telecom House
13-27 Manners Street
Wellington
Phone: +64 (04) 8031423
Mobile: +64 021 455 218
email: mic...@ir...
email: ma...@ir...

_______________________________________________
htdig-dev mailing list
htd...@li...
https://lists.sourceforge.net/lists/listinfo/htdig-dev
|
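[Editor's sketch] Following David's pointer, a user-agent variant might look like the following. The dont_log_htdig variable name and the "htdig" pattern are illustrative, not from the original message; the verbose output earlier in this thread shows htdig sending a User-Agent of the form "htdig/3.1.6".

```apache
# Sketch: mark requests whose User-Agent contains "htdig", then
# exclude those requests from the access log.
SetEnvIf User-Agent "htdig" dont_log_htdig
CustomLog /var/log/httpd/access_log combined env=!dont_log_htdig
```

Matching on User-Agent rather than Remote_Addr keeps other traffic from the indexing host in the log, at the cost of trusting the crawler to identify itself.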
|
From: Michael C. <Mic...@ir...> - 2002-03-11 22:36:26
|
Hi all.

When we dig our intranet (4 gigs of data) apche web logs become chocca with requests. So much that we have to purge thelogs each dig. Is there a way of minimising requests / digging so apache doesn't record soo much data?

Michael Clarke
IRD Open Systems Team
Level 4, Telecom House
13-27 Manners Street
Wellington
Phone: +64 (04) 8031423
Mobile: +64 021 455 218
email: mic...@ir...
email: ma...@ir...
|
|
From: Neal R. <ne...@ri...> - 2002-03-11 07:20:38
|
> > With the branching of mifluz and libhtdig an LGPL license seems
> > more appropriate. Using the LGPL license would encourage the use of
> > libhtdig & the mifluz library by the widest possible set of developers,
> > hopefully enhancing the project's quality and feature set.
>
> I'm not sure that mifluz will be relicensed under the LGPL. Keep in mind
> that it's now "GNU mifluz."

I noticed that.. but what I did also notice is that I didn't see that the mifluz code officially had copyright assigned to the FSF. If not, Loic is free to relicense the source however he chooses as the copyright holder... and since mifluz is a derivative work of HtDig, this group would also have some say. Loic, comments or feedback?

> So in order to get a new license for ht://Dig as a whole, I think you'd
> need to see a few things happen first:
> a) mifluz dual licensed.
> b) Andrew agree to allow an LGPL for his code in ht://Dig.
>
> This is, of course, ignoring the question of other contributors, which
> are many.

Yes.. forming a steering committee gives that committee the 'authority' to make decisions on behalf of the "ht://Dig Group" which consists of people who have contributed code.

> I'm not a lawyer, nor do I want to be one. I would suggest that a scheme
> more in line with previous practice would be at a minimum that the
> htdig-dev list should vote on such a steering committee. This may, in
> fact, be little more than a "rubber stamp," but I would also think major
> decisions such as relicensing should require more general votes as well.
> (I look towards Debian, for example.)

A nomination and voting process (a la Debian) for the committee would certainly be a very fair way to do things!

> I don't think most of us would know the benefits/drawbacks of being an
> official non-profit. I am aware that this would probably entail some
> paperwork overhead in terms of tax purposes in the U.S. Also keep in
> mind that many contributors are not in the U.S. and as such may not wish
> to be bound to various U.S. regulations.

Yes.. it was just an idea. I'm not sure exactly what kind of organization Debian is, but it works for them, with all its worldwide contributors. OpenBSD looks to be a formal Canadian non-profit.

> I understand some of the motivations for such moves, but personally
> before there's much talk about licenses, I'd want to know that a license
> change would even be possible. If mifluz is completely GPL'ed, period,
> then there's not much a steering committee for ht://Dig could do.

Again, that depends on whether Loic has assigned the mifluz copyrights to the FSF. The FSF certainly encourages that, but there are examples of 'GNU' projects where that is not the case.

My main motivation is that a library-style release of htdig could have an LGPL and not compromise the spirit of the project. There are of course workarounds... making libhtdig/htdig a server... that are possible, but lots of work.

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
|
From: Joe R. J. <jj...@cl...> - 2002-03-10 18:28:04
|
FYI,
Sun Mar 10 10:19:35 PST 2002
ssl.9 41
timet_enddate.1 28
Makefile.0 23
documentation.1 21
metadate.0 17
NUL.0 16
documentation.2 13
redirect.0 10
AdjustableLoggingPatch.tar.gz 7
titleSpace.0 2
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Geoff H. <ghu...@us...> - 2002-03-10 08:14:05
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
(Should really only attempt to use SQL for doc_db and related, not word_db)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Geoff H. <ghu...@ws...> - 2002-03-09 19:53:20
|
On Friday, March 8, 2002, at 02:30 PM, Neal Richter wrote:

> This would mean that calling libhtdig would be problematic from PHP
> code. This is further complicated by the interpretive nature of PHP... a
> 'function call' in a PHP page is not a function call per se. It becomes a
> function call during the interpretation of the page, after the page (or
> the php.ini) loads the needed module (xxx.so).

I hate legal issues. I was tempted to be flippant and write that it's up to the user to decide how to use ht://Dig and PHP. But of course if you're writing PHP wrappers for libhtdig, there's a rather implicit use for them.

> With the branching of mifluz and libhtdig an LGPL license seems
> more appropriate. Using the LGPL license would encourage the use of
> libhtdig & the mifluz library by the widest possible set of developers,
> hopefully enhancing the project's quality and feature set.

I'm not sure that mifluz will be relicensed under the LGPL. Keep in mind that it's now "GNU mifluz."

For the moment, I'm going to ignore the question of contacting all copyright holders in ht://Dig and asking them about a dual license. I'm quite certain, in terms of number of lines of code outside of the Berkeley DB, that Andrew and Loic are probably the largest contributors. So in order to get a new license for ht://Dig as a whole, I think you'd need to see a few things happen first:

a) mifluz dual licensed.
b) Andrew agreeing to allow an LGPL for his code in ht://Dig.

This is, of course, ignoring the question of other contributors, which are many.

> Idea: Form an official steering committee consisting of the
> developers that have CVS commit access. This committee would become the
> representatives of the contributing developers as a whole. The committee
> would then have the power to make re-licensing decisions for
> "The ht://Dig Group" as listed in each source code file.

So far, the "steering" has been via the open htdig-dev mailing list. Such decisions usually involve pushing towards various releases, whether certain changes should be implemented in a "frozen" tree before a release, and the merits of certain approaches.

I'm not a lawyer, nor do I want to be one. I would suggest that a scheme more in line with previous practice would be, at a minimum, that the htdig-dev list should vote on such a steering committee. This may, in fact, be little more than a "rubber stamp," but I would also think major decisions such as relicensing should require more general votes as well. (I look towards Debian, for example.)

> any. If HtDig wanted to become an official entity (non-profit etc.) there
> would be costs, if not it may be as simple as posting the notice on the
> web-site.

I don't think most of us would know the benefits/drawbacks of being an official non-profit. I am aware that this would probably entail some paperwork overhead for tax purposes in the U.S. Also keep in mind that many contributors are not in the U.S. and as such may not wish to be bound to various U.S. regulations.

I understand some of the motivations for such moves, but personally, before there's much talk about licenses, I'd want to know that a license change would even be possible. If mifluz is completely GPL'ed, period, then there's not much a steering committee for ht://Dig could do.

-Geoff |
|
From: Geoff H. <ghu...@ws...> - 2002-03-09 19:36:02
|
On Friday, March 8, 2002, at 05:20 PM, Jim Cole wrote:

> It does look like there is a problem with the parser. If a '<'
> occurs in a script element, it appears that the parser becomes
> somewhat confused with regard to the remaining document content.
> For example

Yes, this sounds like a bug to me. Actually, the <script> sections, and probably other sections as well, should simply be skipped by the parser. Right now the code does this:

> case 29: // "script"
>   noindex |= TAGscript;
>   nofollow |= TAGscript;
>   break;

In short, the parser doesn't *index* the bits inside <script></script> tags, but it does *look* at them. So it hit that "<" character and figured it was a new tag. I would think that we want to treat <script> and probably <style> sections like comments: find the ending tag and completely ignore everything inside.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/ |
|
From: Jim C. <gre...@yg...> - 2002-03-08 23:20:31
|
Emma Jane Hogbin's bits of Thu, 7 Mar 2002 translated to:
>>OK, but what exactly is on the page? It certainly didn't find anything
>>significant to index or links other than the images you pointed out.
>
>The page has four bread crumb items, a bunch of image navigation buttons,
>eight left nav text links, and over 20 text links (in a list). None of the
>words on the page are getting put into the word db. i.e. the page has a
>list of Colleges and none of the names of the colleges show up when I do a
>search.
>
>>Either the HTML parser is missing a lot, or there isn't much on the page
>>to index.
>
>I think it's the first option, which scares me. :(
It does look like there is a problem with the parser. If a '<'
occurs in a script element, it appears that the parser becomes
somewhat confused with regard to the remaining document content.
For example
<head>
<title>Title</title>
<script language="javascript">
var i;
for ( i = 0; i < 5; i++ ) {}
</script>
</head>
results in the parser missing all remaining links on the page. If
the '<' is removed or replaced (e.g. with a '>'), the page is
properly indexed. This occurs with 3.1.6; I haven't tried it with
a 3.2.0b4 snapshot.
Assuming that this is in fact a bug rather than a misunderstanding
of expected functionality, and the cause of problem is not obvious,
I would be willing to do a bit of debugging.
Jim
|
|
From: Neal R. <ne...@ri...> - 2002-03-08 20:30:47
|
Hey,

So here's a topic for discussion. The libhtdig project I've been working on is basically repackaging htdig into a shared library. I will be finishing a set of PHP wrappers for libhtdig shortly. This raises a couple of questions:

1. The FSF states that the newest PHP 4.x license is incompatible with the GPL. It's basically a BSD-style license with an advertising clause, which the FSF doesn't like (the non-advertising BSD license is kosher with the GPL). This would mean that calling libhtdig would be problematic from PHP code. This is further complicated by the interpretive nature of PHP... a 'function call' in a PHP page is not a function call per se. It becomes a function call during the interpretation of the page, after the page (or the php.ini) loads the needed module (xxx.so).

2. The original intent of the HtDig project was to write a stand-alone package to do web-site searching... and the GPL is appropriate for this. With the branching of mifluz and libhtdig, an LGPL license seems more appropriate. Using the LGPL license would encourage the use of libhtdig and the mifluz library by the widest possible set of developers, hopefully enhancing the project's quality and feature set.

PHP 3.x, OpenOffice, Mozilla, and other projects have chosen a dual-license strategy. This could be an option here: either dual license the entire project, or relicense the inner portions of the project under the LGPL (leaving the major components of the individual binaries under the GPL). This approach would necessitate some care in bringing other GPLed code into HtDig in the future. If possible, the original copyright holder would need to be contacted to relicense the specific code under the LGPL.

3. Many Open Source projects have a kind of steering committee that is empowered to make these kinds of decisions... glibc, gcc, FreeBSD, X11, etc.

Idea: Form an official steering committee consisting of the developers that have CVS commit access. This committee would become the representatives of the contributing developers as a whole. The committee would then have the power to make re-licensing decisions for "The ht://Dig Group" as listed in each source code file. Note that the committee could have policies in place that respect the thoughts of the developers at large and make decisions accordingly. Having a steering committee doesn't imply draconian powers.

Forming the steering committee may involve a few steps; I will ask our Legal Department how this could be done. The initial feeling is that it's pretty easy to do... post notice on the website, etc. Please forward any other legal questions to me and I'll try to get answers. We do have a relationship with an IP lawyer familiar with the GPL if it comes to that. RightNow Tech would also consider covering any reasonable costs associated with any legal work required to form the steering committee, if any. If HtDig wanted to become an official entity (non-profit etc.) there would be costs; if not, it may be as simple as posting the notice on the web-site.

Thoughts? I'll be away from e-mail this weekend, so I'll respond to any questions directed to me on Monday.

Thanks, and give libhtdig_api.h a look!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site |
|
From: Neal R. <ne...@ri...> - 2002-03-08 19:45:53
|
Hey all,

Soon there will be a new snapshot of the libhtdig project at http://www.htdig.org/files/contrib/other/

Status: The indexing (htdig), merging (htmerge), and searching (htsearch) API calls are functional. The htfuzzy API is not yet working.

TODO:
- Make APIs of the other utils binaries.
- Implement 'indicators'. An indicator is returned by an xxxx_open() call, used by any follow-up calls, and freed by calling xxxx_close(). This is similar to functionality in SQL libraries. This kind of feature is a first step in making libhtdig usable in a server or apache-module type of setting.

I'm posting a follow-up e-mail shortly about licensing questions.

Thanks.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site |
|
From: Neal R. <ne...@ri...> - 2002-03-08 19:34:40
|
Hello,

Available very soon at http://www.htdig.org/files/contrib/wordlists/ will be a file called miltilang_stopwords.tgz

This file contains separate stopword files for these languages:

cs_CZ de_DE el_GR en_GB en_US es_ES fi_FI fr_CA fr_FR it_IT nl_NL pt_BR sv_SE

The majority of the foreign-language files contain the English translation in a comment above the word. When appropriate, the words retain the original accenting. These lists are quite aggressive, so I would recommend editing them to your needs before using them with htdig.

Happy stop-wording!

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site |
|
From: Steve F. <ht...@ht...> - 2002-03-08 02:18:20
|
Hi all,

Just joined the list, and I have a question about custom headers. I have several virtual hosts on one machine and have successfully limited searches for each host. However, I would like to have a different header for the results from different hosts, mainly so I can add a logo with a link back to the home page for the particular site. I have looked through the docs and can't see a way to do this. Is it possible?

Regards,
Steve

--
-------------------------------------------------
"Minds are like parachutes, they work best when open"
Support free speech; visit http://www.efa.org.au/
Heads Together Systems Pty Ltd http://www.hts.com.au
Email: ht...@ht... Tel: 612 9982 6767 Fax: 612 9981 3081 |
|
From: Geoff H. <ghu...@ws...> - 2002-03-06 19:55:20
|
On Wed, 6 Mar 2002, Willy Calderon wrote:

> OK, I setenv LDFLAGS -L/usr/lib32 and then ./configure and it ran. Again,
> during the make process, I get another error, except this time with previous
> warnings in /usr/lib32/
> [snip]
> ld32: WARNING 84 : /usr/lib32/libsocket.so is not used for resolving any
> symbol.

These warnings are fine; the libraries are included even though they may not be needed. The IRIX linker is much more verbose about this than any others.

> ld32: ERROR 33 : Unresolved text symbol "alloca" -- 1st referenced by
> ../htlib/libht.a(regex.o).

Ugh. Strange that the configure script managed to get things to work, but it doesn't here. What you want to do is remove regex.* and take the regex target out of the htlib Makefile. This should use the built-in IRIX regex routines rather than the GNU regex that's included.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/ |