Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo

Close

#32 Speeding up database search

closed-rejected
nobody
None
7
2010-02-16
2006-05-20
Quasimodo
No

Problem:

Database searches are horrible slow on larger Databases.

We are converting a Lidos-DB to OpenBiblio with an
sceduled running PHP-Skript. The DB contains approx.
35K Books.

An opac search need about 1:30 to 2 min. to complete,
which is inacceptable.

Cause:

Using of LEFT JOIN in Database queries is not
recommended as they are slow as hell. Use standard
JOIN instead to speedup searches least 10 times.

Solution:

In classes/BiblioSeachQuery.php find lines

# setting sql join clause
$join = "from biblio left join biblio_copy on
biblio.bibid=biblio_copy.bibid ";

and replace with

# setting sql join clause
$join = "from biblio join biblio_copy on
biblio.bibid=biblio_copy.bibid ";

Result:

Doing this speeded up the search to below 1 second,
which is quite fair for a large db. No noticealbe
effects in listing, so left join seems to be needless.

There are more left join, which should be

Discussion

  • Quasimodo
    Quasimodo
    2006-05-20

    Logged In: YES
    user_id=1319962

    con't:
    tested, if they are compatible with simple joins.

     
  • Quasimodo
    Quasimodo
    2006-05-20

    • priority: 5 --> 7
     
  • Micah Stetson
    Micah Stetson
    2006-05-20

    Logged In: YES
    user_id=911235

    Does anything else seem slow? You're the first that I know
    of to load that many items into OpenBiblio.

    The one problem with making it a straight join is that you
    can't find items that don't have copies. For finding items
    to check out, it doesn't matter. But if you catalog an item
    and get interrupted before you add copies, how do you find
    the item again to finish the job?

    I'm surprised the left join is slower, though. The only
    difference between a left join and a straight one, when it's
    done on an indexed field, is that rows are included for the
    index misses. Those rows should actually be faster to
    retrieve than the others, because MySQL only needs to
    consult the index and not the data. Are you sure bibid is
    indexed in biblio_copy? The install SQL should have made it
    that way.

    If the indexes do exist, and it's still slow, and you want
    to be able to find items without copies, you could do one of
    two things: Either lose the join completely and don't
    display copy info in the search results page, or do the join
    yourself (essentially) by loading the copy info in a
    separate query for each item in the results page. Either
    way should work fine. In my libraries, we just don't show
    copy info on the search results page.

    <rant ignorable="yes">
    MySQL bugs me.

    There are three reasons to use an SQL DBMS: 1. It's
    supposedly faster than just using text files. 2. It does
    query optimization for you. 3. Users familiar with SQL can
    do arbitrary queries.

    The first one is false, except for a few cases. This isn't
    the only time I've run into senseless slowness. In times
    past, I've dumped the data from the relevant tables into
    text files, used the Unix tools to do what I wanted with the
    data, and loaded the result back into MySQL. The whole
    operation was far quicker than any SQL query I could come up
    with. It stinks.

    The second one is dubious. Several times, I've had to
    rewrite queries to work around optimizer bugs. You write
    the query the straightforward way, and it hangs forever,
    because it can't figure out what order to do several joins
    in. You split it into two or three queries and create
    explicit temporary tables for intermediate results, and it
    goes fast. In all cases, if I'd just written code to access
    files in a sane way in the first place, it wouldn't have
    been a problem.

    The third one is partially true. Often it's a pain to say
    what you want to say in MySQL's ... um ... interesting SQL
    dialect. But more users can say what they want using MySQL
    than some homebrew query system, no matter how much faster
    it is. This is one of the few reasons I don't dump MySQL
    and homebrew something. (The other big reason is that I
    don't have time.)
    </rant>

    Micah

     
  • Quasimodo
    Quasimodo
    2006-05-21

    Logged In: YES
    user_id=1319962

    Hi mstetson,

    first, i still testing on version 0.5.0 and MySQL 4.1.10a.

    Now to answer your questions:

    - ad 1: Does anything else seem slow?
    No, except for left joined queries, every thing works fine
    for me.

    - ad 2: The one problem with making it a straight join is
    that you can't find items that don't have copies.
    Well, this is not a problem for me, because i don't use the
    admin-pages of openbiblio. I have to run my skript for
    updating the db on a regular basis.
    But, for all the others: Every item should have at least one
    copy. So it should be ok to use the faster solution in opac,
    which won't find item without copies - you can't check them
    out anyway. For the admin pages you could use a seperate
    query to find childless items.

    ad 3: I'm surprised the left join is slower, though.
    Yeah, me too, but i've had this malfunction in another
    selfwritten project. The db was much smaller there, but
    changing this the query speeded up from 45 sec. to less than
    1 sec.
    Seems to be a problem of left join implementation in mysql.
    (I'm running mysql with standard my-medium.cnf)

    ad 4: Are you sure bibid is indexed in biblio_copy?
    yes, musr be, because it's part of the primary key. Btw i've
    used the openbiblio installer to setup, so everything should
    be correct.

    s_bernstein

     
  • Micah Stetson
    Micah Stetson
    2006-05-22

    Logged In: YES
    user_id=911235

    In handling bookings in my custom OpenBiblio, I regularly do
    searches through the booking history with queries like this:

    30,000-row table
    join 130,000-row table
    join 160,000-row table
    join 130,000-row table
    join 160,000-row table
    join 20,000-row table
    left join 125,000-row table
    left join 8,000-row table
    left join 130,000-row table
    left join 160,000-row table

    All the joins are on indexed fields, and the results
    (sometimes in the thousands of rows) are almost instant.
    I've had a number of really annoying problems with MySQL,
    but nothing that's pointed at left joins as the root cause.
    A "biblio left join biblio_copy" for 30,000 items isn't
    anywhere near this complicated. I'm almost certain there's
    something else going on here.

    For the original select query with the left join, what does
    EXPLAIN SELECT ... tell you? What does it say with a
    straight join?

    Micah

     
  • Logged In: NO

    Why would you assume every biblio has a biblio_copy. SOme
    of us use the system not only for admin purposes but as a
    catalog of online resources of which we would not have a
    copy ... just a link in the 856 MARC records. These are
    searchable and retreivable via OPAC. Thanks ... but no
    thanks.

     
  • Logged In: NO

    still work in 0.6.0 version?

     
  • Micah Stetson
    Micah Stetson
    2007-03-21

    Logged In: YES
    user_id=911235
    Originator: NO

    I'm not sure what you're asking. Search works in 0.6.0. We are still using left joins because I'm not convinced they are a problem. You can still change them to straight joins with the same effects.

    Micah

     
  • Micah Stetson
    Micah Stetson
    2010-02-16

    • status: open --> closed-rejected