From: Keith C. <kei...@ya...> - 2002-10-02 23:22:12
Hi Gilles,

Jake's application looks like it's BroadVision, and that is how the system maintains sessions. Most BroadVision systems will dump you to an error page or give you "session expired" messages without the BV_SessionID/BV_EngineID strings.

The only way I know to do this is to grab a session/engine ID from the home page of a BroadVision site (and these IDs could be embedded in a form and not in the URL), then have this "fresh" session info inserted into the links that ht://Dig presents on the search results page.

I don't know of any other way to do it. If it could be done, it would be great to have an alternative search facility that works with a BroadVision site. I'm in the thick of debugging a Verity search issue on a BV site right now. )-:

=====
Keith

--- Gilles Detillieux <gr...@sc...> wrote:
> According to Jake Baillie:
> > I have an evil application that's inserting a session ID (yadda, yadda,
> > we've heard it all before).
> >
> > So, I put together a rewrite rule:
> >
> > url_rewrite_rules: (.*)\\?BV_SessionID=(.*)\\&(.*) \\1?\\3
> >
> > Now, I'm actually using htdig not as a search engine here, but merely as a
> > spider. I'm using the -t option to output a text list of URLs, and I'm
> > going to take that list and do something else with it.
> >
> > What I want to happen is:
> >
> > http://www.domain.com/something.jsp?BV_SessionID=24324234234&other=yadda&parameter=stupid
> >
> > to rewrite to:
> >
> > http://www.domain.com/something.jsp?other=yadda&parameter=stupid
> >
> > when it enters the database (and writes that db.log text file). This is
> > happening, as it stands, with my rule above. When I do htdig -vvv, I can
> > see the normalization being done. Good.
> >
> > The problem: it seems to be taking the links off of the page it retrieves
> > (reading into the anchor tags) and normalizing them too, instead of just
> > following them verbatim from the page and translating them later.
> > This is a problem, because the site cannot be traversed without the
> > session id in the URL (I know, I didn't design it), but I need it to go
> > away when the page is included in the database, because I might have to
> > stop and restart htdig before the site is fully traversed, and the
> > session ids expire after 60 minutes. And htdig doesn't know a page is
> > a duplicate if the session id is different.
> >
> > See the problem? :) If not, I can clarify. If so, suggestions are
> > appreciated. :)
> >
> > Please hit reply all, as I'm not subscribed to the list.
>
> OK, this application is a bit more evil than the other session-ID-inserting
> applications we've heard all about before. With most of these, the session
> ID can be safely omitted before the URL is fetched. Unfortunately, htdig
> processes url_rewrite_rules before fetching the URL; it really almost has
> to, as it needs to know if this is a new URL or not before fetching it.
>
> What you're asking for is for htdig to process url_rewrite_rules only
> for the purpose of determining if the URL has been visited or not, but
> to keep the session ID for when it fetches the URL. Even that won't be
> good enough, though.
>
> If I understand correctly, the session ID MUST be there in the URL or you
> can't access the document, plus, if the session ID has expired you also
> can no longer access the document until you get a fresh session ID. So,
> how can you possibly get htsearch to return URLs with a usable session
> ID so that the search results actually lead to something you can fetch?
>
> In your position, I think I'd find the programmer of this evil application
> and slap him about the head until he agrees to write something that's more
> search-engine-friendly.
>
> --
> Gilles R. Detillieux E-mail: <gr...@sc...>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
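Jake's rewrite rule can be tried outside htdig with Python's `re` module. One thing worth checking is that the `(.*)` capturing the session value is greedy. On a URL with more than one parameter after the session ID, it can swallow everything up to the last `&`, so `other=yadda` disappears along with the ID. The sketch below compares his pattern (with the htdig.conf double backslashes collapsed) against a variant that stops the session value at the first `&`. The function names are made up, and this shows Python's regex behaviour only. htdig uses POSIX-style regular expressions, which are also greedy, but confirm with `htdig -vvv` before relying on this.

```python
import re

# Jake's rule, with the htdig.conf double backslashes collapsed to single ones.
GREEDY = re.compile(r"(.*)\?BV_SessionID=(.*)&(.*)")
# Hypothetical variant: the session value may not contain '&'.
BOUNDED = re.compile(r"(.*)\?BV_SessionID=[^&]*&(.*)")


def rewrite_greedy(url: str) -> str:
    """Apply Jake's pattern with his replacement \\1?\\3."""
    return GREEDY.sub(r"\1?\3", url)


def rewrite_bounded(url: str) -> str:
    """Drop only the BV_SessionID=... parameter and keep everything after it."""
    return BOUNDED.sub(r"\1?\2", url)


url = ("http://www.domain.com/something.jsp"
       "?BV_SessionID=24324234234&other=yadda&parameter=stupid")
print(rewrite_greedy(url))   # http://www.domain.com/something.jsp?parameter=stupid
print(rewrite_bounded(url))  # http://www.domain.com/something.jsp?other=yadda&parameter=stupid
```

The bounded variant still assumes BV_SessionID is the first parameter and is followed by at least one more. A URL shaped differently is left unchanged.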
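Gilles sums up Jake's request as using the rewritten URL only to decide whether a page has been visited, while still fetching it with the session ID intact. htdig doesn't work that way, but a custom spider could. Here is a minimal sketch of such a crawl queue. `SESSION_PARAMS`, `canonical_key` and `Frontier` are hypothetical names, and treating BV_EngineID as a session parameter alongside BV_SessionID is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: query parameters ignored when deciding "seen before?".
SESSION_PARAMS = {"BV_SessionID", "BV_EngineID"}


def canonical_key(url: str) -> str:
    """Return url with session parameters removed; used only for de-duplication."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


class Frontier:
    """Crawl queue that de-duplicates on the canonical key but fetches the raw URL."""

    def __init__(self):
        self.seen = set()   # canonical keys already queued
        self.queue = []     # raw URLs, session ID intact, waiting to be fetched

    def add(self, raw_url: str) -> bool:
        """Queue raw_url unless the same page was already queued under another session."""
        key = canonical_key(raw_url)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.queue.append(raw_url)
        return True
```

This only solves the de-duplication half. As Gilles points out, the queued raw URLs still carry session IDs that expire, and anything stored for htsearch still needs a live ID before someone can follow it.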
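Keith's approach is to grab a fresh session/engine ID from the home page and splice it into the links shown on the results page. It could be prototyped as a post-processing step along these lines. Everything here is hypothetical: `BV_KEYS`, `SessionScanner`, `fresh_ids` and `with_session` are made-up names. The sketch assumes, following Keith's description, that the IDs appear either as hidden form fields or as query parameters in links on the home page. It does not check whether a BroadVision site actually accepts an ID obtained this way for other pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

BV_KEYS = ("BV_SessionID", "BV_EngineID")


class SessionScanner(HTMLParser):
    """Collect BV_* values from hidden form fields or from link query strings."""

    def __init__(self):
        super().__init__()
        self.ids = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        if tag == "input" and a.get("name") in BV_KEYS:
            self.ids.setdefault(a["name"], a.get("value") or "")
        elif tag == "a" and a.get("href"):
            for k, v in parse_qsl(urlsplit(a["href"]).query):
                if k in BV_KEYS:
                    self.ids.setdefault(k, v)


def fresh_ids(home_page_html: str) -> dict:
    """Return the first BV_SessionID/BV_EngineID values found in the page."""
    scanner = SessionScanner()
    scanner.feed(home_page_html)
    return scanner.ids


def with_session(url: str, ids: dict) -> str:
    """Replace any BV_* parameters in url with the fresh values in ids."""
    parts = urlsplit(url)
    rest = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in BV_KEYS]
    query = urlencode([(k, ids[k]) for k in BV_KEYS if k in ids] + rest)
    return urlunsplit(parts._replace(query=query))
```

The fetched home page would go through `fresh_ids` just before the results page is rendered, and each result URL would pass through `with_session`. That keeps the indexed URLs session-free, which is what Jake wants stored.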