#220 Empty 'virtroot' not recognized

Milestone: v1.0
Status: closed-fixed
Owner: Andre-Littoz
Labels: genxref (49)
Priority: 7
Updated: 2012-10-31
Created: 2012-10-29
Creator: Olivier Duclos
Private: No

When launching genxref with the command './genxref --url=http://lxr.domain.com --allversions' I get the following error:

No matching configuration at ./genxref line 312, <FILETYPE> line 1.

I tried to track down the error and found this a few lines above:

if ($option{'url'}) { # Single 'url'
@config = (1); # Fake list to prevent looping
}

This may prevent looping, but the rest of the script needs the data in @config. I tried commenting out this passage, but then the script exits without any error and without generating the database files!

This was on Debian Squeeze with Perl 5.10.1.

Discussion

  • Andre-Littoz
    2012-10-29

    Hi,

    Your analysis is wrong: inside the loop, access to the data in @config (i.e. through variable $treedescr) is protected by another "if ($option{'url'})" to discriminate between single and multiple URL indexing. It should work as is.

    Check that your --url argument is correct. As I read it, it only contains the target hostname, i.e. without 'virtroot'. The genxref line should read something like:

    ./genxref --url=http://lxr.domain.com/virtroot --allversions

    unless, of course, your 'virtroot' is only '/', meaning the host only contains a single LXR tree (without any other page).

    If you forgot nothing, send your lxr.conf.

    Regards,
    ajl

     
  • Olivier Duclos
    2012-10-29

    Thanks for your quick reply.

    I should have started from the beginning. My lxr.conf contains this:

    'host_names' => [ 'lxr.sleepycat.fr' ]
    'virtroot' => ''

    When I launch ./genxref --url=//lxr.sleepycat.fr --allversions I get this:
    Can't find config for http://lxr.sleepycat.fr/: ...

    I tried adding 'http://' to the hostname and putting '/' in virtroot, but it doesn't change anything. Then I added this line to lxr.conf:

    , 'baseurl' => 'http://lxr.sleepycat.fr'

    and it got me to the problem mentioned above. I'm stuck!

     
  • Andre-Littoz
    2012-10-29

    I must admit this is the only case I haven't tested in depth.

    Though I'm rather busy professionally speaking, I may have found something in Config.pm sub _initialize. The target URL is first split into two components, host and script path. Unfortunately, this process assumes there is always a script path after the hostname. If there is none, the regexp captures nothing and the internal hostname is EMPTY. Consequently, this internal hostname never matches any entry in 'host_names'.

    If you have more time than me, try the following:
    in Config.pm, change line 248 from

    $hits += $virtroot =~ s!^/+!/!; # and a single starting /

    to

    $hits += $virtroot =~ s!^/*!/!; # and a single starting /
    (replace + by * inside regexp)

    Explanation: your 'virtroot' contains an empty string and at least a path separator is needed to build the URL path up to the script name.
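
The effect of the quantifier change can be checked outside LXR. The following is a minimal sketch, not code from Config.pm: only the two regexps come from this thread, while the helper names and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; only the two regexps come from the thread.
sub fix_with_plus {
    my ($virtroot) = @_;
    my $hits = ($virtroot =~ s!^/+!/!) ? 1 : 0;    # needs at least one leading /
    return ($virtroot, $hits);
}

sub fix_with_star {
    my ($virtroot) = @_;
    my $hits = ($virtroot =~ s!^/*!/!) ? 1 : 0;    # also matches zero leading /
    return ($virtroot, $hits);
}

# Sample values are assumptions chosen to cover empty, single, doubled and missing /.
for my $sample ('', '/', '//lxr', 'lxr') {
    my ($plus, $plus_hits) = fix_with_plus($sample);
    my ($star, $star_hits) = fix_with_star($sample);
    print "'$sample': + gives '$plus' (hits $plus_hits), * gives '$star' (hits $star_hits)\n";
}
```

With an empty string, the + form leaves the value empty and reports no hit, whereas the * form yields / and reports one hit.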

    Then try the following command:
    ./genxref --url=//lxr.domain.com/ --allversions

    i.e. add a trailing / to your URL.

    If it works, it will confirm the diagnosis and I'll design something clean to cover this case.

    Please report back.

     
  • Olivier Duclos
    2012-10-29

    Thanks! Your diagnosis was right but incomplete.

    After modifying the regex at line 248, the script continued ($hits == 1 before the test), but I still got the error "Can't find the config for...".

    So I continued debugging. A few lines down there is this test to match URLs:

    if ( $host eq $rt
    && $script_path eq $virtroot
    )

    Here are the values of the variables just before the test:

    HOST=http://lxr.sleepycat.fr
    RT=lxr.sleepycat.fr
    SCRIPT_PATH=/
    VIRTROOT=

    So the test failed. To fix the value of $rt, I just prefixed my host_names with '//' in lxr.conf. I guess this is my fault because the doc says you need to put // or http:// when entering a hostname. (But then it's not a hostname anymore!)

    I wasn't sure how to fix the value of $script_path as there are several regexes involved. I ended up adding this line after line 232:

    $script_path = '' if ($script_path eq '/');

    Now all the values match! My sources are being indexed as I write...
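
As a rough illustration of why the test failed, the sketch below evaluates each half of the quoted condition separately, using the values printed above. The helper name compare_parts and the hash layout are hypothetical; this is not the Config.pm code, and it only shows the effect of the one-line script path workaround (the host half was addressed in lxr.conf).

```perl
use strict;
use warnings;

# Values copied from the debug output in this comment.
my %seen = (
    host        => 'http://lxr.sleepycat.fr',
    rt          => 'lxr.sleepycat.fr',
    script_path => '/',
    virtroot    => '',
);

# Hypothetical helper: evaluate each half of the quoted test on its own.
sub compare_parts {
    my (%v) = @_;
    return (
        host_ok => ($v{host} eq $v{rt})              ? 1 : 0,
        path_ok => ($v{script_path} eq $v{virtroot}) ? 1 : 0,
    );
}

my %before = compare_parts(%seen);
print "before: host_ok=$before{host_ok} path_ok=$before{path_ok}\n";

# Same check with the one-line workaround applied to the script path.
my %patched = %seen;
$patched{script_path} = '' if ($patched{script_path} eq '/');
my %after = compare_parts(%patched);
print "after:  host_ok=$after{host_ok} path_ok=$after{path_ok}\n";
```

Both halves fail with the reported values; the workaround makes only the path half succeed.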

     
  • Andre-Littoz
    2012-10-30

    • summary: genxref crash while reading configuration --> Empty 'virtroot' not recognized
    • assigned_to: nobody --> ajlittoz
    • priority: 5 --> 7
     
  • Andre-Littoz
    2012-10-30

    Changing the name of the bug from "genxref crash while reading configuration" to "Empty 'virtroot' not recognized", with higher severity.

    Fix in Config.pm, sub _initialize: make sure the script path is never empty (it should contain at least /), and check that 'virtroot' is never empty either (it must contain at least /; it is changed internally if not).
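
The idea of never letting either path be empty can be sketched as follows. This is an illustration under stated assumptions, not the code committed to CVS: normalize_path is a hypothetical helper, and it handles only the leading / (trailing separators are left alone).

```perl
use strict;
use warnings;

# Hypothetical helper, not the committed fix: force a path to start with
# exactly one /, so that an empty string and '/' both normalize to '/'.
sub normalize_path {
    my ($path) = @_;
    $path = '' unless defined $path;
    $path =~ s!^/*!/!;    # zero or more leading / become a single /
    return $path;
}

# Sample pairs (assumed values): script path from the URL vs. configured 'virtroot'.
for my $pair (['/', ''], ['/lxr', 'lxr'], ['/lxr', '//lxr'], ['/lxr', '/other']) {
    my ($script_path, $virtroot) = @$pair;
    my $same = normalize_path($script_path) eq normalize_path($virtroot);
    printf "script path '%s' vs virtroot '%s': %s\n",
        $script_path, $virtroot, $same ? 'match' : 'no match';
}
```

Normalizing both sides before the comparison means an empty 'virtroot' and a script path of / compare as equal, which is the case reported in this ticket.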

    Fix doc about 'virtroot' so that it always starts with /.

    @olivier_: the word "hostname" in the doc may be confusing. What is needed is a URL prefix containing the scheme (optional), the // separator and the hostname proper. What would be an unambiguous word for that?

     
  • Andre-Littoz
    2012-10-31

    • status: open --> closed-fixed
     
  • Andre-Littoz
    2012-10-31

    Fixed in CVS.
    The doc already stated that 'virtroot' should contain at least '/'. This is now enforced in Config.pm.