From: Chanel W. <Cha...@ya...> - 2013-04-03 17:16:55
This is excellent information. Thanks, Demian. I'm going to have to ponder it for a while longer to figure out the best long-term solution.

Current thoughts:

The driver method just isn't going to work because Symphony's web services don't support date range searching.

I'm heartened to see that VuFind's change tracking is more intelligent than I realized. We'll still run into an issue if/when we switch ILSes (a transition we're trying to make seamless for our patrons). Currently we're using Symphony's catalog key as the match point for VuFind because it's the only thing in Symphony that can be relied on. When we switch ILSes, we'll have to match on a new key, which means dumping all the accumulated change tracking data. It's not necessarily a game ender, but I'm going to have to compare pros/cons against separately indexing Symphony's internal record creation date.

Thanks!
chanel

From: Demian Katz [mailto:dem...@vi...]
Sent: Wednesday, April 03, 2013 7:26 AM
To: Chanel Wheeler; vufind-tech (vuf...@li...)
Subject: RE: pondering on new items

A couple of notes:

1.) As you say, the driver method for New Items has a paging mechanism built in, which limits the number of results returned. However, it's not quite as limiting as it looks - the code was originally designed to retrieve just a page at a time, but because this broke faceting, it was later changed to retrieve a larger number. It's still not comprehensive if you have thousands and thousands of new items within a particular range, but it's often good enough. Speed issues are a legitimate concern. I don't think you have to worry about a threshold on the VuFind end, though - there is a maximum number of IDs that can be retrieved from Solr simultaneously (the dreaded maxBooleanClauses limit), but VuFind includes a safety to prevent this from becoming a fatal error, and you can raise the limit if you need to. All of this is not to say that doing the driver-based new item method is great...
but it may not be quite as bad as it looks.

2.) The way VuFind's change tracking functionality works (when enabled) is that it uses a MySQL table independent of the Solr index to keep track of when each record was first encountered and when it was subsequently changed. It detects changes by looking at various elements within the MARC record. Reindexing every night won't interfere with this - even when you rebuild the whole Solr index, the MySQL data persists. The purpose of this is to detect records that are new to your catalog, regardless of the age of their cataloging. As you say, a small edit in a 5 year old record may cause it to appear to have changed... but that doesn't matter if you only look at the "created" date.

Hopefully that clarifies things rather than muddying the waters... but please let me know if you still have questions.

Assuming that the change tracking could suit your needs, I'd recommend starting to track changes as soon as possible so that when you go into production, you actually have a long history to work with (since obviously, the first time you index after enabling change tracking, everything is going to appear to be brand new).

There is a ticket in JIRA to use the change tracking fields instead of the ILS driver for new items, but it has a couple of issues that need to be resolved and has not been ported over to 2.0 yet. If you want to move forward with this, perhaps you could take a look at it... or else I can make it a higher priority to resolve on this end (it's something that I want to see finished; I just haven't found time for it yet).

- Demian

From: Chanel Wheeler [mailto:Cha...@ya...]
Sent: Tuesday, April 02, 2013 4:33 PM
To: vufind-tech (vuf...@li...)
Subject: [VuFind-Tech] pondering on new items

I started working on the driver functions for New Items and realized that it's not going to achieve what I want.
It's really, really important to me that New Items allow the user to take advantage of the facets in the same way they would with a search. So, for example, if on the front page of our VuFind installation you just hit the Find button without a search term, you get facets for 600,000+ records but only the first 20 of those display. I want the same type of functionality for New Items. (I should add that I define "new items" as those which cause a bib record to be created. It's not perfect, since AACR2 rules cause a multitude of records to get created for the same title, but without FRBR it's the best I can do.)

Looking at New Items, it's designed to get a page-worth of listings from the driver without facet awareness of all the new items. (I'm also severely limited on the driver side because Symphony's web services don't do everything yet.) The only alternative I see using this method is to return all items instead of a page-worth. I know that 30 days-worth of new books alone approaches 3,000 items. The search is just too slow, and I expect I'll hit some sort of threshold on the VuFind end.

There's been talk (in 1.4, I think) of doing the search for new items in the Solr database instead of using the driver. From what I could dig up in the email archives, that approach tracks the date when a record is created/updated in Symphony. That would seem to have a number of problems, if I understand it correctly. For one, we load the entire catalog every night (at least for now), so every record would be the same age. Even if we only updated records which have changed, a MARC edit in a 5 year old record would make it seem new.

Then I started thinking about the 005 field. Sampling 005s in our ILS, it appears that it maintains the date that was in the original OCLC record (as opposed to setting the date upon entry into our database). I found a record with a 2002 date in the 005 that had been imported into our ILS for the first time just a month ago. Stymied again!
This leaves me grasping at the crazy straws. I'm starting to think about injecting the record creation date from the ILS into some random MARC field and then indexing that when I do a VuFind import. Then New Items could effectively do a regular search that uses a date range instead of a title, etc. That's a lot of work.

Am I making any wrong assumptions as I toss aside potential solutions? Is there an easier way to accomplish the same end?

Thanks!
chanel

Chanel Wheeler
Library Network Programmer/Analyst
Yavapai Library Network
1120 Commerce Dr.
Prescott, AZ 86305
Phone: (928) 442-5741
cha...@ya...
Open a help desk ticket <mailto:he...@yl...>
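
For anyone following the thread, the change tracking idea Demian describes (a table outside the Solr index that remembers when each record was first encountered and when it last changed) can be sketched roughly as below. This is only an illustration, not VuFind's actual code or schema: the table name, column names, and the `open_tracker`, `track`, and `new_items` functions are hypothetical, SQLite stands in for MySQL, and a hash of the whole record stands in for VuFind's comparison of selected MARC elements.

```python
import hashlib
import sqlite3
from datetime import date


def open_tracker():
    """Create an in-memory tracking table (hypothetical schema; SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracker ("
        "record_id TEXT PRIMARY KEY, fingerprint TEXT, "
        "first_seen TEXT, last_changed TEXT)"
    )
    return conn


def track(conn, record_id, record_text, today):
    """Note one indexed record; first_seen is set only on the first encounter."""
    # Assumption: hashing the whole record approximates "has this record changed?"
    fingerprint = hashlib.sha256(record_text.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT fingerprint FROM tracker WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        # Never seen before: this is the date New Items would key on.
        conn.execute(
            "INSERT INTO tracker VALUES (?, ?, ?, ?)",
            (record_id, fingerprint, today.isoformat(), today.isoformat()),
        )
    elif row[0] != fingerprint:
        # Known record with an edit: only last_changed moves.
        conn.execute(
            "UPDATE tracker SET fingerprint = ?, last_changed = ? WHERE record_id = ?",
            (fingerprint, today.isoformat(), record_id),
        )


def new_items(conn, since):
    """Return IDs of records first seen on or after the given date."""
    rows = conn.execute(
        "SELECT record_id FROM tracker WHERE first_seen >= ? ORDER BY record_id",
        (since.isoformat(),),
    )
    return [r[0] for r in rows]


conn = open_tracker()
track(conn, "a1", "Old title", date(2013, 1, 1))
track(conn, "a1", "Old title, edited", date(2013, 4, 1))  # edit to an old record
track(conn, "b2", "New title", date(2013, 4, 2))          # genuinely new record
print(new_items(conn, date(2013, 3, 15)))  # prints ['b2']
```

The edited record "a1" keeps its original first-seen date, so it does not show up as new, and re-running `track` on every record each night leaves that date untouched. The sketch also shows the catch raised above: everything hangs on `record_id`, so changing the match key would start the history over.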