From: Tod O. <to...@uc...> - 2013-03-25 20:50:00
|
PS: I really like the business of vacuuming the SQLlite db to reduce the size. On Mar 25, 2013, at 3:47 PM, Tod Olson <to...@uc...> wrote: > I see the tarball attached to the JIRA just has the jar files, but I don't see the source code changes. Are you able to share those? > > I'm particularly interested in the possibility of returning additional data to the templates at query time. For example, in the call number browse, we will want to return title information. So I'm interested in seeing how you deal with that, ideally make it configurable at runtime. And see if it can be rolled back into the nla-browse-handler base code. > > -Tod > > On Mar 25, 2013, at 3:03 PM, Alan Rykhus <ala...@mn...> wrote: > >> Hello, >> >> I'm back from my meeting and took a look at the code. >> >> What I do in the new code attached to VUFIND-726 >> >> 1. Load the headings into all_headings >> 2. Create an index for the all_headings table - this I added >> 3. Build the headings table from the all_headings table >> I think it worked better once I created the index >> 3-a. Create an index for the headings table >> 4. Drop the all_headings table >> 4-a. Vacuum the database to shrink the file size down. >> >> As I stated earlier, I have in excess of 10 million entries. The file >> size of the title.db is 3.4 GB. I'm not 100% sure, but I think creating >> the index in all_headings solved my issue where things blew up on >> creating the headings table. I did this work in December, its not real >> fresh anymore. >> >> al >> >> On Mon, 2013-03-25 at 19:14 +0000, Filipe MS Bento (UA) wrote: >>> Many thansk, Al! >>> >>> Filipe >>> >>> >>> On Mon, Mar 25, 2013 at 6:24 PM, Alan Rykhus <ala...@mn...> >>> wrote: >>> Hello, >>> >>> With the modifications I made to enable consortial browsing my >>> title.db >>> has in excess of 10 million entries. I did do some >>> modifications in the >>> building of the SQLite database. I was seeing some errors when >>> I was >>> working on it. I need to refresh my memory. >>> >>> I have a meeting right now, I'll look at my notes afterwards. >>> >>> al >>> >>> On Mon, 2013-03-25 at 18:08 +0000, Tod Olson wrote: >>>> The fixed SQLlite was supposed to have been released on >>> 12/12/2012, >>>> and I think it was. IIRC, to fix locally we had to: >>>> >>>> >>>> 1. compile the fixed SQLlite >>>> 2. compile sqlitejdbc against the fixed SQLite >>>> 3. plug the resulting sqlitejdbc JAR file into >>> nla-browse-handler/libs >>>> and rebuild browse-handler.jar and browse-indexer.jar >>>> >>>> >>>> Below is the last email I find from Mark on the matter >>>> >>>> >>>> On Jan 24, 2013, at 6:51 PM, Mark Triggs >>> <ma...@di...> wrote: >>>> >>>>> Just a note that I haven't forgotten about this, but I >>> need to find >>>>> some >>>>> time to have a think about it. I tried the official >>> SQLiteJDBC >>>>> build, >>>>> but its bundled Linux build of SQLite requires a glibc >>> version >>>>> that's >>>>> newer than what ships with my (and, I suspect, other >>> people's) >>>>> distro. >>>>> >>>>> I'm not sure whether I'll need to do a custom build, or >>> whether we >>>>> can >>>>> just tell people to use the latest SQLiteJDBC as a drop-in >>>>> replacement >>>>> if they hit problems on Solaris. More investigation >>> needed :) >>>> >>>> >>>> -Tod >>>> >>>> >>>> On Mar 25, 2013, at 12:42 PM, "Filipe MS Bento (UA)" >>> <fs...@ua...> >>>> wrote: >>>> >>>>> Yup, Demian, that is the idea I’ve retained (about that >>> bug in >>>>> SQLite). >>>>> >>>>> I have the latest, at least for CentOS 6.x: >>>>> >>>>> sqlite-3.6.20-1.el6.x86_64 >>>>> >>>>> All the best and thanks, >>>>> >>>>> Filipe >>>>> >>>>> From: Demian Katz [mailto:dem...@vi...] >>>>> Sent: segunda-feira, 25 de Março de 2013 17:36 >>>>> To: Filipe MS Bento (UA); Tod Olson; vufind-tech Tech; >>> Jochen >>>>> Lienhard >>>>> Cc: ma...@di...; ala...@mn... >>>>> Subject: RE: Large browse indexes: SQLite limits?) RE: >>> [VuFind-Tech] >>>>> AlphaBrowse >>>>> >>>>> If memory serves, some of the problems people were >>> encountering had >>>>> to do with a bug in SQLite which was fixed in a newer >>> version… but >>>>> that newer version was difficult to compile on certain >>> platforms. >>>>> What version are you currently running? Have you tried >>> upgrading to >>>>> see if that changes anything? >>>>> >>>>> - Demian >>>>> >>>>> From: Filipe MS Bento (UA) [mailto:fs...@ua...] >>>>> Sent: Monday, March 25, 2013 1:20 PM >>>>> To: Tod Olson; Demian Katz; vufind-tech Tech; Jochen >>> Lienhard >>>>> Cc: ma...@di...; ala...@mn... >>>>> Subject: (Re: Large browse indexes: SQLite limits?) RE: >>>>> [VuFind-Tech] AlphaBrowse >>>>> >>>>> Hi all! >>>>> >>>>> In a somewhat related matter, Tod, Mark, Demian, Al et >>> al., do you >>>>> have any news about this: >>>>> >>>>> >>> http://vufind.2307425.n4.nabble.com/Large-browse-indexes-SQLite-limits-td4654998.html >>>>> >>>>> I've been suffering from the same problem, as I have 7.5 >>> million >>>>> records, all of them with topics, author, and also a >>> generous >>>>> authority DB (SOLR core). >>>>> >>>>> It's been a while since I tried to find a solution, but I >>> remember >>>>> to pull the latest version of index-alphabetic-browse.sh >>> from vufind >>>>> svn and remember that the original error was: >>>>> >>>>> + sort -T /var/tmp -u $'-t\001' -k1 >>> title.tmp -o >>>>> sorted-title.tmp >>>>> sort: write failed: /var/tmp/sortSueUsl: >>> No space >>>>> left on device >>>>> >>>>> and the solution I've tried was to map /var/temp to a >>> volume with >>>>> lots of space (300GB, instead of the 8GB from /): >>>>> >>>>> mkdir /DATA/var_tmp/ >>>>> >>>>> >>>>> >>>>> chmod 1777 /DATA/var_tmp/ >>>>> >>>>> >>>>> vim ./index-alphabetic-browse.sh >>>>> >>>>> >>>>> >>>>> (...) >>>>> >>>>> >>>>> >>>>> set -e >>>>> >>>>> set -x >>>>> >>>>> >>>>> >>>>> export TMPDIR=/DATA/var_tmp/ >>>>> >>>>> >>>>> >>>>> cd "`dirname $0`/import" >>>>> >>>>> >>>>> >>>>> (...) >>>>> >>>>> >>>>> >>>>> sort -T /DATA/var_tmp/ -u -t >>>>> $'\1' -k1 "${browse}.tmp" -o "sorted-${browse}.tmp" >>>>> >>>>> >>>>> >>>>> (instead of /var/tmp) >>>>> >>>>> >>>>> The “No space left on device” error is history, but had a >>> new: >>>>> >>>>> java.sql.SQLException: disk I/O error sort error >>>>> >>>>> >>>>> >>>>> which seems to be related to the 2GB limit, even in a >>> 64bits OS: >>>>> >>>>> >>>>> I’m altready excluding hierarchy, title and lcc (which we >>> don’t >>>>> use): >>>>> >>>>> #build_browse "hierarchy" "hierarchy_browse" >>>>> #build_browse "title" "title_fullStr" 1 >>> "-Dbibleech=StoredFieldLeech >>>>> -Dsortfield=title_sort -Dvaluefield=title_fullStr" >>>>> build_browse "topic" "topic_browse" >>>>> build_browse "author" "author_browse" >>>>> # build_browse "lcc" "callnumber-a" 1 >>>>> build_browse "dewey" "dewey-raw" 1 >>> "-Dbibleech=StoredFieldLeech >>>>> -Dsortfield=dewey-sort -Dvaluefield=dewey-raw" >>>>> >>>>> but no luck, CreateBrowseSQLite.java doesn’t cope with it: >>>>> >>>>> + sort -T /DATA/var_tmp/ -u $'-t\001' -k1 topic.tmp -o >>>>> sorted-topic.tmp >>>>> + java -Dfile.encoding=UTF-8 -cp >>> 'browse-indexing.jar:../solr/lib/*' >>>>> CreateBrowseSQLite sorted-topic.tmp topic_browse.db >>>>> Exception in thread "main" java.sql.SQLException: disk I/O >>> error >>>>> at org.sqlite.DB.execute(DB.java:275) >>>>> at org.sqlite.DB.executeUpdate(DB.java:281) >>>>> at org.sqlite.Stmt.executeUpdate(Stmt.java:103) >>>>> at >>>>> >>> CreateBrowseSQLite.buildOrderedTables(CreateBrowseSQLite.java:115) >>>>> at >>> CreateBrowseSQLite.create(CreateBrowseSQLite.java:139) >>>>> at >>> CreateBrowseSQLite.main(CreateBrowseSQLite.java:154) >>>>> >>>>> so this strange >>>>> behavior: >>> http://iia.web.ua.pt/AlphaBrowse/Results?source=topic&from=nanotubes (no matter the term entered, it always shows the first page [with lots of garbage, but that’s because they are at records’ topic) >>>>> >>>>> Thank you in advance for any news you might share about >>> this, >>>>> >>>>> Filipe >>>>> >>>>> -------------------- >>>>> Filipe Manuel S. Bento | http://about.filipebento.pt/ >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Tod Olson [mailto:to...@uc...] >>>>> Sent: segunda-feira, 25 de Março de 2013 14:00 >>>>> To: Demian Katz; vufind-tech Tech; Jochen Lienhard >>>>> Subject: Re: [VuFind-Tech] AlphaBrowse >>>>> >>>>> My TODO list includes some more alpha browse work, so I'll >>> add a few >>>>> things that I have discovered: >>>>> >>>>> Relational databases are really good for ordered indexes, >>> in a way >>>>> that text indexing systems like Solr/Lucene. Completely >>> different >>>>> models for different tasks. >>>>> >>>>> If you try to switch to Postgres, you may find that you >>> need to >>>>> re-write some of the SQL. But you would also need to >>> rework how a >>>>> new index is created and swapped in. With SQLlite, there >>> is a >>>>> separate database file for each index. Do an "ls >>>>> solr/alphabetical_browse/" to see what I mean. Every time >>> you run >>>>> index-alphabetic-browse.sh, a new copy of, for example, >>>>> title_browse.db-updated is created is a different space. >>> If there >>>>> are no errors it is moved into solr/alphabetical_browse/ >>> and a file >>>>> called title_browse.db-ready is created. The next time >>> someone >>>>> queries the browse, the browse-handler notices that there >>> is a new >>>>> title_browse.db, and swaps it in. >>>>> >>>>> The downsides to this are that updating the browse indexes >>> is >>>>> currently tightly coupled to SQLlite, and the first read >>> of the new >>>>> index will be slow as the file must be read. The upside is >>> that >>>>> there is absolutely no DB administrative overhead. No new >>> DB to set >>>>> up or tables to administer or database to optimize, ever. >>>>> >>>>> I do know what you mean about slowness, at least on the >>> initial read >>>>> of a new browse index. Our full-scale test has 10GB of >>> alphabetic >>>>> browse indexes, and title_browse.db is 4.9GB. So the first >>> query >>>>> always times out because it takes awhile for the initial >>> read. But >>>>> we've found that queries after that are fast. So the >>> solution would >>>>> be to re-create the indexes at a low-use time and >>> automatically send >>>>> a warming query to the browse-handler. Which we will also >>> want to do >>>>> for Solr indexes. >>>>> >>>>> I'm sort of interested in adding MySQL support for >>> AlphaBrowse, but >>>>> it's not a high priority yet. It's not clear when or if >>> the benefits >>>>> would be worthwhile. >>>>> >>>>> Best, >>>>> >>>>> -Tod >>>>> >>>>> Tod Olson <to...@uc...> >>>>> Systems Librarian >>>>> University of Chicago Library >>>>> >>>>> >>>>> >>>>> On Mar 25, 2013, at 7:55 AM, Demian Katz >>> <dem...@vi...> >>>>> wrote: >>>>> >>>>>> The AlphaBrowse feature consists of two parts: an >>> indexer that >>>>> builds a database, and a Solr request handler that looks >>> up results >>>>> within the database. The database was chosen as an easy >>> way of >>>>> accessing an arbitrary point within a pageable list, and >>> SQLite was >>>>> specifically selected as a lightweight, stand-alone >>> option. >>>>>> >>>>>> I suspect that the code could be adjusted to use a >>> different >>>>> database by changing the JDBC calls and recompiling; you >>> can find >>>>> the source and documentation here: >>>>>> >>>>>> https://github.com/marktriggs/nla-browse-handler >>>>>> >>>>>> I'm also copying Mark Triggs (the author of the handler) >>> on this >>>>> email in case he has any additional comments... though >>> he's busy >>>>> working on the ArchivesSpace project right now and >>> probably hasn't >>>>> thought about this thing in at least a few months! >>>>>> >>>>>> - Demian >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Jochen Lienhard >>> [mailto:lie...@ub...] >>>>>>> Sent: Monday, March 25, 2013 3:38 AM >>>>>>> To: vufind-tech Tech >>>>>>> Subject: [VuFind-Tech] AlphaBrowse >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> who of you is using the AlphaBrowse feature of vufind. >>>>>>> >>>>>>> I have some questions about how it works. >>>>>>> >>>>>>> Can I replace the database sqlite by postgres? >>>>>>> Why do I need a database? >>>>>>> >>>>>>> I saw it at villanova, but it seems to work very slow. >>>>>>> >>>>>>> Greetings from Germany >>>>>>> >>>>>>> Jochen >>>>>>> >>>>>>> -- >>>>>>> Dr. rer. nat. Jochen Lienhard >>>>>>> Dezernat EDV >>>>>>> >>>>>>> Albert-Ludwigs-Universität Freiburg >>>>>>> Universitätsbibliothek >>>>>>> Rempartstr. 10-16 | Postfach 1629 >>>>>>> 79098 Freiburg | 79016 Freiburg >>>>>>> >>>>>>> Telefon: +49 761 203-3908 >>>>>>> E-Mail: lie...@ub... >>>>>>> Internet: www.ub.uni-freiburg.de >>>>>>> >>>>>>> >>>>>>> >>>>> >>> --------------------------------------------------------------------- >>>>>>> --------- Everyone hates slow websites. So do we. >>>>>>> Make your web apps faster with AppDynamics Download >>> AppDynamics >>>>> Lite >>>>>>> for free today: >>>>>>> http://p.sf.net/sfu/appdyn_d2d_mar >>>>>>> _______________________________________________ >>>>>>> Vufind-tech mailing list >>>>>>> Vuf...@li... >>>>>>> >>> https://lists.sourceforge.net/lists/listinfo/vufind-tech >>>>>> >>>>>> >>>>> >>> ---------------------------------------------------------------------- >>>>>> -------- Everyone hates slow websites. So do we. >>>>>> Make your web apps faster with AppDynamics Download >>> AppDynamics >>>>> Lite >>>>>> for free today: >>>>>> http://p.sf.net/sfu/appdyn_d2d_mar >>>>>> _______________________________________________ >>>>>> Vufind-tech mailing list >>>>>> Vuf...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/vufind-tech >>>>> >>>>> >>>>> >>> ------------------------------------------------------------------------------ >>>>> Everyone hates slow websites. So do we. >>>>> Make your web apps faster with AppDynamics Download >>> AppDynamics Lite >>>>> for free today: >>>>> http://p.sf.net/sfu/appdyn_d2d_mar >>>>> _______________________________________________ >>>>> Vufind-tech mailing list >>>>> Vuf...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/vufind-tech >>>> >>>> >>> >>> >>> -- >>> Alan Rykhus >>> PALS, A Program of the Minnesota State Colleges and >>> Universities >>> (507)389-1975 >>> ala...@mn... >>> "Be pleasant until ten o'clock in the morning and the rest of >>> the day >>> will take care of itself." ~ Elbert Hubbard >>> >>> >>> ------------------------------------------------------------------------------ >>> Everyone hates slow websites. So do we. >>> Make your web apps faster with AppDynamics >>> Download AppDynamics Lite for free today: >>> http://p.sf.net/sfu/appdyn_d2d_mar >>> _______________________________________________ >>> Vufind-tech mailing list >>> Vuf...@li... >>> https://lists.sourceforge.net/lists/listinfo/vufind-tech >>> >>> >>> >> >> -- >> Alan Rykhus >> PALS, A Program of the Minnesota State Colleges and Universities >> (507)389-1975 >> ala...@mn... >> "Be pleasant until ten o'clock in the morning and the rest of the day >> will take care of itself." ~ Elbert Hubbard >> >> >> ------------------------------------------------------------------------------ >> Everyone hates slow websites. So do we. >> Make your web apps faster with AppDynamics >> Download AppDynamics Lite for free today: >> http://p.sf.net/sfu/appdyn_d2d_mar >> _______________________________________________ >> Vufind-tech mailing list >> Vuf...@li... >> https://lists.sourceforge.net/lists/listinfo/vufind-tech > |