Menu

Batch Operations

2007-02-11
2013-03-08
  • Kaloyan K. Tsvetkov

    On the `code examples` page you have one example in which you delete a set of records using iteration. This seems like a hack to me -- imagine you have a set of 200 records, and if you try to delete each one of them individually, this means you will spawn 200 queries. It is much better to use `batch` operations, where with one query you delete all the records that you want. Are you planning support for such batch opperations - delete and update ?

     
    • singleobject

      singleobject - 2007-02-12

      Hello again, and thanks.

      Singleobject is still in a rudimentary form, and it does for sure generate a lot of overhead on the database: at the beginning it maps the whole database, and when it runs it uses SQL queries in an "atomical" way (meaning non-structured at all). Even the fetch() method cannot fetch more than object at a time.

      So anyways, currently the code really lacks support for "structured" queries, which - you might say - turns the RDBMS into a semi-dumb data storage (but - hey - even mysql put an (r)dbms on top of flat-files! :D ).

      While speculating about this, though, I thought that in web applications multiple-selections are usually achieved with check-boxes (where you select the ID of an item), and there are really a few situations where you need to do: DELETE FROM table WHERE field = 'word'.

      Additionally I'm not really sure whether 200, or maybe 20.000, DELETE operations carried out by ID (primary key) would perform worse than having to hit the same amount of records using the where clause on a non-indexed field.

      However, this was surely one of the limits I encountered when doing things alone, with simplicity in mind.

      I'd like to hear your critics and opinions about the whole thing.

      Thank you.

       
    • singleobject

      singleobject - 2007-02-12

      Hi, to continue a bit...

      The question brings up the problematic aspect of OOP that I tried to defeat: writing many classes and many methods.

      I've seen software projects with 10, 50, 100+ classes declared, some using inheritance, some not, and all with many methods, interfaces and whatnot.

      It's surely just my humble opinion, but there's nothing very "abstract" about this kind of code.

      My goal was to make something as general as possible. Perhaps the choice of the language was unfortunate, or I was just trying to do something too "abstract".

      It's just an idea.

       
    • Kaloyan K. Tsvetkov

      | While speculating about this, though, I thought that in web
      | applications multiple-selections are usually achieved with
      | check-boxes (where you select the ID of an item), and there
      | are really a few situations where you need to do: DELETE
      | FROM table WHERE field = 'word'.

      I was thinking more of a `DELETE FROM table WHERE field IN (...)`, where `field` is the primary key, and the set is the list of values for it: so in this way with one query you delete the whole set of records (no matter how they are gathered - from a SELECT query, or from a checkbox list).

      | Additionally I'm not really sure whether 200, or
      | maybe 20.000, DELETE operations carried out by ID
      | (primary key) would perform worse than having to
      | hit the same amount of records using the where
      | clause on a non-indexed field.

      As I said, I thought something in terms of `DELETE FROM table WHERE id IN (1,2,3,4,5)`; so we are do deleting records based on values from an indexed column(the primary key).

      Each query will open a socket to perform the "query", so in general I consider this to be one of the bottlenecks (if not using persistent connections). Anyway, the best way to check this is to benchmark it. I am not sure, but I think I did such a test awhile ago, and the batch operations were indeed faster.

       

Log in to post a comment.

MongoDB Logo MongoDB