From: Gilles D. <gr...@sc...> - 2002-09-12 18:19:21
According to Stefan Seiz:
> I'd appreciate some assistance in solving a problem I have in conjunction
> with local_urls in htdig 3.1.6:
>
> I need to index a site which uses AKAMAI(*) for content storage etc. That
> site has tons of URLs like this:
>
> http://a12345.g.akamai.net/7/1234/5678/20020121102215/DB/DXF/DIN/053595.pdf
>
> All files of such URLs are available on the AKAMAI servers AND on the local
> filesystem. The tricky part here is that the URL contains a dynamic part:
> in the above example, the string "20020121102215" is a timestamp and
> can change.
>
> Now I'd really like htdig to index these files on the local filesystem to
> save some bandwidth. Reading the documentation, the only way to map HTTP
> requests to local files is "local_urls". My problem is that it seems I can
> NOT use regexes or wildcards in "local_urls".
> I tried:
> http://a12345.g.akamai.net/(.*)/DB/=/path/to/local/DB/
> which obviously doesn't work due to the lack of pattern matching.
>
> My next try was to use url_rewrite_rules in conjunction with local_urls,
> which is NOT a solution in my case, because then the REWRITTEN URLs are
> stored in the database, but I NEED the correct AKAMAI URLs in the database
> so users get these after searching the site.
>
> I hope I explained my problem clearly enough.
>
> I'd be really happy if someone has any idea how to work around the lack of
> PATTERNS in local_urls and achieve what I need, maybe somehow differently.

I can only think of two workarounds, but I fear the first one would be
impractical in your case.

1) If you only have to deal with a limited number of dynamic numbers in
URLs, you can make a symbolic link for each of them that points back to
the directory it sits in. E.g.:

    cd /path/to/local
    ln -s . 7
    ln -s . 1234
    ln -s . 5678
    ln -s . 20020121102215

Then you could use "local_urls: http://a12345.g.akamai.net/=/path/to/local/"
and the dynamic part of the URL would essentially get swallowed up by all
the symbolic links you created: /path/to/local/7/1234/5678/20020121102215/DB/
resolves to /path/to/local/DB/.
Of course, if you're dealing with lots of numbers, this would quickly get
out of hand.

2) You can modify Retriever::GetLocal(), in htdig/Retriever.cc, to detect
and strip out the dynamic components of the URLs when looking for a match
with the stored "prefixes" from local_urls. E.g., instead of using
mystrncasecmp(), or after using mystrncasecmp() if it failed, use a function
that you write which will do simple wildcard matching for you. Then use
"local_urls: http://a12345.g.akamai.net/*/DB/=/path/to/local/DB/", where the
"*" would be intercepted by your matching function to swallow as many
characters as needed to get a match.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
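For workaround 2, here is a minimal standalone sketch of the kind of matching function described above. It is not a patch against htdig: the names wildcard_prefix_match() and map_to_local() are hypothetical and do not exist in Retriever.cc. The assumption is that GetLocal() would call something like them for each stored local_urls prefix instead of (or after) mystrncasecmp(). Each "*" swallows the fewest characters that still let the rest of the prefix match, and the comparison ignores case, as mystrncasecmp() does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not htdig code): case-insensitive character compare,
// in the spirit of mystrncasecmp().
static bool chars_equal_nocase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Hypothetical: match `pattern` (which may contain '*') against the
// *beginning* of `url`. Each '*' swallows as few characters as possible
// while still allowing the rest of the pattern to match.
// On success, returns how many characters of `url` were consumed.
static std::optional<std::size_t> wildcard_prefix_match(const std::string& pattern,
                                                        const std::string& url,
                                                        std::size_t p = 0,
                                                        std::size_t u = 0) {
    assert(p <= pattern.size() && u <= url.size());  // invariant
    while (p < pattern.size()) {
        if (pattern[p] == '*') {
            // Let '*' swallow 0, 1, 2, ... characters; take the first fit.
            for (std::size_t skip = u; skip <= url.size(); ++skip) {
                if (auto n = wildcard_prefix_match(pattern, url, p + 1, skip))
                    return n;
            }
            return std::nullopt;
        }
        if (u >= url.size() || !chars_equal_nocase(pattern[p], url[u]))
            return std::nullopt;
        ++p;
        ++u;
    }
    return u;  // whole pattern matched; u is the length of the URL prefix
}

// Hypothetical: apply one "pattern=local_dir" pair from local_urls.
// The part of the URL after the matched prefix is appended to local_dir.
static std::optional<std::string> map_to_local(const std::string& url,
                                               const std::string& pattern,
                                               const std::string& local_dir) {
    auto consumed = wildcard_prefix_match(pattern, url);
    if (!consumed) return std::nullopt;
    return local_dir + url.substr(*consumed);
}
```

With the URL from the original question,
map_to_local(url, "http://a12345.g.akamai.net/*/DB/", "/path/to/local/DB/")
gives /path/to/local/DB/DXF/DIN/053595.pdf, while the URL stored in the
database stays the original AKAMAI one. Shortest-match means that if "/DB/"
ever appeared inside the dynamic part, the earliest occurrence would win.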