Re: [VuFind-Tech] SOLR Search Refactored, pt. 2


> Somewhat related: Maybe someone can supply WorldCat/Summon response data for
> predefined searches/operations? Could come in handy for unit testing. This
> also applies to ILS drivers.

I've been thinking about this myself.  We should probably fictionalize the data within the responses to avoid any copyright concerns.  Just let me know when you're ready for data and exactly what you need and I'll put something together.

> VuFind\Service makes sense. Quick, off the top of my head: hierarchically we have:
> 
> - HTTP-based services
>   - SOLR
>   - WorldCat
>   - Summon

For consistency with Zend naming, I would leave the HTTP client outside of VuFind\Service; I think the idea is that Zend\Http is a generic HTTP library, while Zend\Service\* contains code that connects to specific services.  But other than that, I think we're on the same page.

> Why not a service that creates a HTTP client for accessing a particular URL
> and uses a set of rules to set up timeouts and proxy?
> 
> VuFind\Service\Http\HttpManager::createClient(URL) =>
> VuFind\Service\Http\HttpClient
> 
> This centralized service could also take care to set the connect and response
> timeout to, say, 1 second less than the configured maximum execution time if
> not configured otherwise. For the rules we might start with something like an
> associative array with a regexp as key and proxy/timeout settings as value. If
> the regexp matches, then the client is created with the respective
> proxy/timeout settings.

I definitely like the idea of setting the timeout to be less than maximum execution time by default...  but we don't necessarily need a factory service to accomplish that, since we could also just default it in the \VuFind\Http\Client constructor in a similar way to the current proxy handling.  Of course, we could do it both ways -- address the timeout setting in the Client constructor but create a Manager that allows objects to be created with a more compact/convenient syntax.  It might also be worth investigating integration with the ZF2 ServiceManager -- it's on my to-do list to learn more about how the ServiceManager works and whether we can use it to make the code more dependency-injection-friendly (i.e. to eliminate some of the static stuff that's currently going on).
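
To make the "do it both ways" option concrete, here is a minimal sketch, written in Python rather than VuFind's PHP purely for illustration. The class names, the `margin` and `fallback` parameters, and the 30-second fallback are all hypothetical; none of this is existing VuFind or Zend code.

```python
from typing import Optional


class HttpClient:
    """Hypothetical stand-in for the VuFind HTTP client (illustration only)."""

    def __init__(self, max_execution_time: int, timeout: Optional[int] = None,
                 url: str = "", margin: int = 1, fallback: int = 30) -> None:
        if timeout is None:
            if max_execution_time > 0:
                # Stay just under the execution limit, but never below 1 second.
                timeout = max(1, max_execution_time - margin)
            else:
                # Assumption: a limit of 0 means "unlimited", so there is
                # nothing to derive from; use a fixed default instead.
                timeout = fallback
        self.url = url
        self.timeout = timeout


class HttpManager:
    """Hypothetical factory that only adds a more compact creation syntax."""

    def __init__(self, max_execution_time: int) -> None:
        self.max_execution_time = max_execution_time

    def create_client(self, url: str, timeout: Optional[int] = None) -> HttpClient:
        # The default-timeout logic stays in the client constructor.
        return HttpClient(self.max_execution_time, timeout, url=url)
```

With this split, code that builds a client directly and code that goes through the manager get the same default, and an explicit timeout always wins.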

I have mixed feelings about the idea of determining timeouts based on regexes.  Although that makes for a simple, centralized design, I don't think it maps well to the user-friendly configuration we currently have.  If a user wants to adjust a timeout configuration for Solr, it makes more sense for them to edit something in the [Index] section of config.ini rather than to configure the Solr URL in a [Timeouts] section (for example).  If a Solr URL changes, having timeouts explicitly linked to that URL could cause problems (easy to forget to update both places).  Obviously, you could keep the current configuration and map it to the new underlying system, but then you get Solr-specific code in what should be generic HTTP-related code, and things get ugly from there.  I'd be inclined to keep things in such a way that each module has its own timeout configuration and simply passes it to the constructor or factory.  Of course, if you want to create a mechanism for configuring timeouts based on URL patterns, that could be useful as a way of establishing defaults if no explicit value is passed in.
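
A rough sketch of that compromise, again in Python for illustration only: each module passes its own configured timeout, and URL patterns are consulted only when nothing explicit is given. The function name, the rule format, and the example patterns and values are assumptions, not an existing VuFind mechanism.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical rule format: (regular expression, timeout in seconds).
TimeoutRule = Tuple[str, int]


def resolve_timeout(url: str, explicit: Optional[int] = None,
                    rules: Sequence[TimeoutRule] = (),
                    fallback: int = 30) -> int:
    """Pick a timeout: explicit value first, then URL rules, then a fallback."""
    if explicit is not None:
        # A module-specific setting (e.g. read from its own config section)
        # always takes precedence over pattern-based defaults.
        return explicit
    for pattern, timeout in rules:
        # The first matching rule wins, so more specific rules go first.
        if re.search(pattern, url):
            return timeout
    return fallback


# Example rules; the patterns and numbers are made up for this sketch.
DEFAULT_RULES = [(r"/solr/", 10), (r"worldcat", 20)]
```

The generic HTTP code never needs to know which service it is talking to; a Solr module would simply pass whatever it read from its own configuration as `explicit`.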

> Wrt the timeout settings there's another idea lingering: A centralized way to
> handle HTTP timeouts. The least you should do is to log (`connection timed
> out', `took too long to respond'); if you use external web services you could
> also simply continue (instead of showing the error page) if the requested
> external service is not essential. Or implement a core-switching SOLR
> connection (something requested during Leipzig Conference by someone from the
> FINC team, IIRC): If connection to the main SOLR service failed with a
> timeout, switch to a backup server.

Couldn't this be addressed by ensuring that the HTTP client throws a specific exception for timeouts?  This way, the calling code could catch the exception and act accordingly (ignore it, try a different Solr core, etc.)...  or it could be allowed to pass through and result in a standard error page if there is no better action to take.
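
Something along these lines is what I have in mind, sketched in Python for illustration. The exception class, the `fetch` callable, and the failover helper are all hypothetical names; the only point is that a dedicated timeout exception lets the caller decide what to do.

```python
from typing import Callable, Sequence


class HttpTimeoutException(Exception):
    """Hypothetical exception raised by the HTTP client on timeouts only."""


def query_with_failover(urls: Sequence[str], fetch: Callable[[str], str]) -> str:
    """Try each server in order, moving on only when a timeout occurs."""
    if not urls:
        raise ValueError("at least one URL is required")
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except HttpTimeoutException as error:
            # A timeout is recoverable here: remember it and try the next server.
            last_error = error
    # Every server timed out; let the exception pass through so the caller
    # (or the standard error page) can deal with it.
    raise last_error
```

Any other exception from `fetch` is deliberately not caught, so non-timeout failures still surface immediately. A caller for whom the service is not essential could wrap the call in its own `try` block and simply carry on.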

> No, there is no reason /for/ the BC break. I wonder if there's a reason to
> /keep/ this work-around for a change in VuFind's configuration file format
> that happened > ~2.5 years ago on a Friday, 13th (r1825)? Instead of working
> around the possible confusion it looks easier to (a) detect a possible
> misconfiguration and (b) tell the user to fix the configuration in the update
> script.

The reason I'm such a stickler for backward compatibility is that links and bookmarks tend to linger forever.  The librarians here like to link to canned VuFind searches in their course and topic guides, and key elements of these guides don't change much over time, so there's a good chance that there are still saved VuFind searches in old formats.  I'd rather make sure that those links continue to work than have to worry about hunting down dozens or hundreds of bad links and/or explaining to the users why their links are broken.

If fixing the configuration were an option, that would obviously be preferable...  but you can't fix links that are out in the wild.  Sometimes you just have to let go of something and take the consequences, but in this particular case, I don't think allowing case-insensitive search types is such a bad thing to maintain in the interest of convenience.

Another issue is that some users have requested that all VuFind URLs be entirely case-insensitive.  I'm not exactly sure why this is desirable (the main use case I can think of is that it makes it easier to type URLs from memory since you don't have to remember exactly which words are capitalized), but given that there is interest in it, having case-insensitive search types makes it easier to achieve.
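
For what it's worth, the case-insensitive lookup itself is cheap to keep around. A minimal sketch in Python (illustration only; the function name and the example type names are assumptions, not the actual VuFind code):

```python
from typing import Iterable, Optional


def normalize_search_type(requested: str, configured: Iterable[str],
                          default: Optional[str] = None) -> Optional[str]:
    """Map a search type from a URL to its configured spelling, ignoring case."""
    # Build a lookup from lower-cased names to the canonical configured names.
    lookup = {name.lower(): name for name in configured}
    return lookup.get(requested.lower(), default)


# Example values only; real installations define their own search types.
CONFIGURED_TYPES = ["AllFields", "Title", "Author"]
```

An old bookmark that says `type=allfields` and a new one that says `type=AllFields` would both resolve to the same configured type, and unknown types fall back to whatever default the caller chooses.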

> DismaxQuery and MungeQuery both provide a __toString() and a `stripField()'
> method.
> 
> Not sure if it is worth trying this, need to stare at the code for some time
> longer.

Hmmm... an interesting thought, but I agree: I can't come up with an opinion on the best approach without staring at the code.

thanks,
Demian