From: Tod O. <to...@uc...> - 2013-03-25 18:08:48
The fixed SQLite was supposed to have been released on 12/12/2012, and I think it was. IIRC, to fix locally we had to:

1. compile the fixed SQLite
2. compile sqlitejdbc against the fixed SQLite
3. plug the resulting sqlitejdbc JAR file into nla-browse-handler/libs and rebuild browse-handler.jar and browse-indexer.jar

Below is the last email I find from Mark on the matter.

On Jan 24, 2013, at 6:51 PM, Mark Triggs <ma...@di...> wrote:

  Just a note that I haven't forgotten about this, but I need to find some
  time to have a think about it. I tried the official SQLiteJDBC build, but
  its bundled Linux build of SQLite requires a glibc version that's newer
  than what ships with my (and, I suspect, other people's) distro.

  I'm not sure whether I'll need to do a custom build, or whether we can
  just tell people to use the latest SQLiteJDBC as a drop-in replacement
  if they hit problems on Solaris. More investigation needed :)

-Tod

On Mar 25, 2013, at 12:42 PM, "Filipe MS Bento (UA)" <fs...@ua...> wrote:

Yup, Demian, that is the idea I've retained (about that bug in SQLite). I have the latest, at least for CentOS 6.x: sqlite-3.6.20-1.el6.x86_64

All the best and thanks,
Filipe

From: Demian Katz [mailto:dem...@vi...]
Sent: Monday, 25 March 2013 17:36
To: Filipe MS Bento (UA); Tod Olson; vufind-tech Tech; Jochen Lienhard
Cc: ma...@di...; ala...@mn...
Subject: RE: (Large browse indexes: SQLite limits?) RE: [VuFind-Tech] AlphaBrowse

If memory serves, some of the problems people were encountering had to do with a bug in SQLite which was fixed in a newer version... but that newer version was difficult to compile on certain platforms. What version are you currently running? Have you tried upgrading to see if that changes anything?

- Demian

From: Filipe MS Bento (UA) [mailto:fs...@ua...]
Sent: Monday, March 25, 2013 1:20 PM
To: Tod Olson; Demian Katz; vufind-tech Tech; Jochen Lienhard
Cc: ma...@di...; ala...@mn...
Subject: (Re: Large browse indexes: SQLite limits?) RE: [VuFind-Tech] AlphaBrowse

Hi all! In a somewhat related matter, Tod, Mark, Demian, Al et al., do you have any news about this:

http://vufind.2307425.n4.nabble.com/Large-browse-indexes-SQLite-limits-td4654998.html

I've been suffering from the same problem, as I have 7.5 million records, all of them with topics and authors, plus a generous authority DB (Solr core). It's been a while since I tried to find a solution, but I remember pulling the latest version of index-alphabetic-browse.sh from the VuFind SVN, and I remember that the original error was:

  + sort -T /var/tmp -u $'-t\001' -k1 title.tmp -o sorted-title.tmp
  sort: write failed: /var/tmp/sortSueUsl: No space left on device

The solution I tried was to map the temp directory to a volume with lots of space (300 GB, instead of the 8 GB available under /):

  mkdir /DATA/var_tmp/
  chmod 1777 /DATA/var_tmp/
  vim ./index-alphabetic-browse.sh

  (...)
  set -e
  set -x

  export TMPDIR=/DATA/var_tmp/

  cd "`dirname $0`/import"
  (...)
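Filipe's remap boils down to a small, general pattern. The sketch below reproduces it in self-contained form, using mktemp as a stand-in for the site-specific /DATA/var_tmp and a two-line sample file in place of the real topic.tmp:

```shell
# Stand-in for /DATA/var_tmp: any directory on a volume with enough room.
BIG_TMP=$(mktemp -d)
chmod 1777 "$BIG_TMP"          # same sticky-bit permissions as /var/tmp
export TMPDIR="$BIG_TMP"       # picked up by tools that fall back to TMPDIR

# GNU sort prefers an explicit -T over TMPDIR, which is why the edited
# script sets both. \001 is the field separator the indexer uses.
SEP=$(printf '\001')
printf 'beta\001two\nalpha\001one\n' > "$BIG_TMP/topic.tmp"
sort -T "$BIG_TMP" -u -t "$SEP" -k1 "$BIG_TMP/topic.tmp" -o "$BIG_TMP/sorted-topic.tmp"
head -1 "$BIG_TMP/sorted-topic.tmp"   # the "alpha" line sorts first
```

The same idea scales to the real script: any existing, writable directory with enough room works, as long as it is passed via both -T and TMPDIR.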
so that the sort line becomes (instead of /var/tmp):

  sort -T /DATA/var_tmp/ -u -t$'\1' -k1 "${browse}.tmp" -o "sorted-${browse}.tmp"

The "No space left on device" error is history, but I hit a new one:

  java.sql.SQLException: disk I/O error
  sort error

which seems to be related to the 2 GB limit, even on a 64-bit OS. I'm already excluding hierarchy, title and lcc (which we don't use):

  #build_browse "hierarchy" "hierarchy_browse"
  #build_browse "title" "title_fullStr" 1 "-Dbibleech=StoredFieldLeech -Dsortfield=title_sort -Dvaluefield=title_fullStr"
  build_browse "topic" "topic_browse"
  build_browse "author" "author_browse"
  # build_browse "lcc" "callnumber-a" 1
  build_browse "dewey" "dewey-raw" 1 "-Dbibleech=StoredFieldLeech -Dsortfield=dewey-sort -Dvaluefield=dewey-raw"

but no luck; CreateBrowseSQLite.java doesn't cope with it:

  + sort -T /DATA/var_tmp/ -u $'-t\001' -k1 topic.tmp -o sorted-topic.tmp
  + java -Dfile.encoding=UTF-8 -cp 'browse-indexing.jar:../solr/lib/*' CreateBrowseSQLite sorted-topic.tmp topic_browse.db
  Exception in thread "main" java.sql.SQLException: disk I/O error
          at org.sqlite.DB.execute(DB.java:275)
          at org.sqlite.DB.executeUpdate(DB.java:281)
          at org.sqlite.Stmt.executeUpdate(Stmt.java:103)
          at CreateBrowseSQLite.buildOrderedTables(CreateBrowseSQLite.java:115)
          at CreateBrowseSQLite.create(CreateBrowseSQLite.java:139)
          at CreateBrowseSQLite.main(CreateBrowseSQLite.java:154)

Hence this strange behavior:

http://iia.web.ua.pt/AlphaBrowse/Results?source=topic&from=nanotubes

(no matter the term entered, it always shows the first page, with lots of garbage, but that is because it comes from the records' topic fields)

Thank you in advance for any news you might share about this,

Filipe

--------------------
Filipe Manuel S. Bento | http://about.filipebento.pt/

-----Original Message-----
From: Tod Olson [mailto:to...@uc...]
Sent: Monday, 25 March 2013 14:00
To: Demian Katz; vufind-tech Tech; Jochen Lienhard
Subject: Re: [VuFind-Tech] AlphaBrowse

My TODO list includes some more alpha browse work, so I'll add a few things that I have discovered:

Relational databases are really good at ordered indexes, in a way that text indexing systems like Solr/Lucene are not. Completely different models for different tasks.

If you try to switch to Postgres, you may find that you need to re-write some of the SQL. But you would also need to rework how a new index is created and swapped in. With SQLite, there is a separate database file for each index. Do an "ls solr/alphabetical_browse/" to see what I mean. Every time you run index-alphabetic-browse.sh, a new copy of, for example, title_browse.db-updated is created in a different space. If there are no errors, it is moved into solr/alphabetical_browse/ and a file called title_browse.db-ready is created. The next time someone queries the browse, the browse-handler notices that there is a new title_browse.db and swaps it in.

The downsides to this are that updating the browse indexes is currently tightly coupled to SQLite, and the first read of a new index will be slow, since the whole file must be read. The upside is that there is absolutely no DB administrative overhead: no new DB to set up, no tables to administer, no database to optimize, ever.

I do know what you mean about slowness, at least on the initial read of a new browse index. Our full-scale test has 10 GB of alphabetic browse indexes, and title_browse.db is 4.9 GB. So the first query always times out because the initial read takes a while, but we've found that queries after that are fast. The solution would be to re-create the indexes at a low-use time and automatically send a warming query to the browse-handler, which we will also want to do for Solr indexes.

I'm sort of interested in adding MySQL support for AlphaBrowse, but it's not a high priority yet.
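Tod's description of the index swap can be paraphrased as a small file dance. The sketch below is inferred from his wording, not read from the browse handler's source, so treat the exact moves as an assumption; a scratch directory stands in for the VuFind tree:

```shell
WORK=$(mktemp -d)                      # scratch stand-in for the VuFind tree
mkdir -p "$WORK/solr/alphabetical_browse"

# The indexer builds the new DB off to the side...
echo "freshly built index" > "$WORK/title_browse.db-updated"

# ...and only if there were no errors moves it in and drops a flag file:
mv "$WORK/title_browse.db-updated" "$WORK/solr/alphabetical_browse/"
touch "$WORK/solr/alphabetical_browse/title_browse.db-ready"

# On its next query, the browse-handler sees the flag and swaps the DB in:
mv "$WORK/solr/alphabetical_browse/title_browse.db-updated" \
   "$WORK/solr/alphabetical_browse/title_browse.db"
rm "$WORK/solr/alphabetical_browse/title_browse.db-ready"

ls "$WORK/solr/alphabetical_browse"    # only title_browse.db remains
```

After the swap, a warming query could be as simple as one curl against the browse handler, e.g. curl 'http://localhost:8080/solr/biblio/browse?source=title&from=a' (the URL, port, and handler path are guesses and vary per install).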
It's not clear when or if the benefits would be worthwhile.

Best,

-Tod

Tod Olson <to...@uc...>
Systems Librarian
University of Chicago Library

On Mar 25, 2013, at 7:55 AM, Demian Katz <dem...@vi...> wrote:

> The AlphaBrowse feature consists of two parts: an indexer that builds a database, and a Solr request handler that looks up results within the database. The database was chosen as an easy way of accessing an arbitrary point within a pageable list, and SQLite was specifically selected as a lightweight, stand-alone option.
>
> I suspect that the code could be adjusted to use a different database by changing the JDBC calls and recompiling; you can find the source and documentation here:
>
> https://github.com/marktriggs/nla-browse-handler
>
> I'm also copying Mark Triggs (the author of the handler) on this email in case he has any additional comments... though he's busy working on the ArchivesSpace project right now and probably hasn't thought about this thing in at least a few months!
>
> - Demian
>
>> -----Original Message-----
>> From: Jochen Lienhard [mailto:lie...@ub...]
>> Sent: Monday, March 25, 2013 3:38 AM
>> To: vufind-tech Tech
>> Subject: [VuFind-Tech] AlphaBrowse
>>
>> Hi,
>>
>> who of you is using the AlphaBrowse feature of VuFind?
>>
>> I have some questions about how it works.
>>
>> Can I replace the SQLite database with Postgres?
>> Why do I need a database at all?
>>
>> I saw it at Villanova, but it seems to work very slowly.
>>
>> Greetings from Germany
>>
>> Jochen
>>
>> --
>> Dr. rer. nat. Jochen Lienhard
>> Dezernat EDV
>>
>> Albert-Ludwigs-Universität Freiburg
>> Universitätsbibliothek
>> Rempartstr.
>> 10-16 | Postfach 1629
>> 79098 Freiburg | 79016 Freiburg
>>
>> Telefon: +49 761 203-3908
>> E-Mail: lie...@ub...
>> Internet: www.ub.uni-freiburg.de
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics. Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_mar
>> _______________________________________________
>> Vufind-tech mailing list
>> Vuf...@li...
>> https://lists.sourceforge.net/lists/listinfo/vufind-tech