
bibkeeper fails to open "large" bib file

  • James Haefner

    James Haefner - 2003-07-08

    Hi,
    New to bibkeeper, but not bibtex.  I can open small bib files fine, but it fails on files >2 MB with 4000 refs.  Is it me?

    Here's my java info:

    java version "1.4.0"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
    Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

     
    • Morten Omholt Alver

      Hi,

      do you get an error message? If the program just appears to freeze, I wonder if it is just taking a very long time. The parser is slow when working on files with many short lines. We'll try to correct this problem.

      If the file eventually loads, try to save it to another file; this file should be very quick to load compared to the original one.

      If you get some kind of error message, it might be helpful if you could direct me to one of the bases that won't load.

      -- Morten

       
      • James Haefner

        James Haefner - 2003-07-08

        No error messages; I'll let it run as long as it wants, to see if it is just time.  I did observe that the bibkeeper window blanks out and gets overwritten by other windows, which suggested to me that it wasn't just taking a long time.  I'll also try creating some medium-sized files to get a better sense of speed.

        jim

         
    • James Haefner

      James Haefner - 2003-07-08

      I ran a couple of tests for loading time:
      No. refs    Load time (s)
      258         2
      458         75

      I'm afraid to wait for my real database of 4580 refs to load...

      System:  RedHat 8.0 Linux, Gnome, Sawfish WM, Java 1.4
      1 GHz Athlon with 1 GB RAM

      Hope you can see how to speed it up,
      jim

       
      • Morten Omholt Alver

        The problem is certainly with the parsing algorithm, though I can't see why it should climb exponentially like that.

        Luckily, a complete replacement for the parser will be ready soon (I can't promise _how_ soon), and I'm fairly sure it'll solve the problem.

        In the meantime, a file produced by Bibkeeper will be loaded much quicker, so you could split a large database into several smaller ones, open one of them in Bibkeeper, merge with the others one by one (File -> Merge with database), and then save.

        I just did a test with the well-known xampl.bib file, merged it with itself until I had a base with over 9200 entries (2.2 MB), saved it, and it loaded in about 5 seconds.
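        (Editor's note: Jim's two timings above can be used for a rough back-of-envelope estimate of how the parser's load time grows. Two data points can't distinguish polynomial from exponential growth, so the exponent below is purely illustrative:)

        ```java
        public class ParserScaling {
            public static void main(String[] args) {
                // Jim's measurements: 258 refs -> 2 s, 458 refs -> 75 s
                double n1 = 258, t1 = 2;
                double n2 = 458, t2 = 75;
                // If load time scales as t ~ n^k, then
                // k = log(t2/t1) / log(n2/n1).
                double k = Math.log(t2 / t1) / Math.log(n2 / n1);
                System.out.printf("implied exponent k = %.1f%n", k);
                // prints: implied exponent k = 6.3
            }
        }
        ```

        An exponent around 6 would indeed be far worse than the roughly linear behaviour one would expect from a single-pass parser.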

         
    • James Haefner

      James Haefner - 2003-07-09

      Here is something to consider. I currently use the sixpack bib DB manager.  It has one field you don't support, called "file", which I use to store the number of the physical copy I keep in my files.  Also, sixpack does not have a field called "search".  (I didn't see that referenced in the help; what is its intended purpose?)  I see that when I save a sixpack bib in bibkeeper, you strip the "file" field. That probably shouldn't be done silently...

      Neither seems a likely candidate for the speed problem.

      I also saw the speed-up after saving within bibkeeper.  Since both the input and the saved versions were ASCII bibtex files that were pretty similar (more white space in the sixpack bibtex version), why the speed-up (or slow-down)?

      Jim H.

       
      • Morten Omholt Alver

        The search field is a search score which is set when you make a search. The database is then sorted according to this field. If you make the field visible, you can see the number of hits within each entry. The search field is not saved.

        The 'file' field is something we'll consider adding. And, I agree that we need to consider the treatment of unknown field types. The unknown field will actually be read along with the others, but not saved. So just a small change in the program could ensure it gets written again.

        The speed-up is caused by the removal of line breaks. The parser reads line by line, and a file saved from Bibkeeper will more or less have each field in one line, making it faster to read. I still can't see why the difference is so big, though. The new parser will operate differently, not line-by-line.

        BTW, thanks for pointing out these things!
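        (Editor's note: the thread doesn't show the actual parser, but one common way a line-oriented reader becomes disproportionately slow on entries split over many short lines is accumulating each entry with plain `String` concatenation, which copies the whole buffer on every line. This is a hypothetical illustration, not the real Bibkeeper code:)

        ```java
        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.StringReader;

        public class LineJoin {
            // O(m^2) for an entry spread over m lines: each += copies
            // everything accumulated so far into a new String.
            static String joinSlow(BufferedReader in) throws IOException {
                String entry = "";
                String line;
                while ((line = in.readLine()) != null) {
                    entry += line + "\n";
                }
                return entry;
            }

            // Amortized O(1) per line with a growable buffer, so total
            // work stays linear in the size of the entry.
            static String joinFast(BufferedReader in) throws IOException {
                StringBuilder entry = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    entry.append(line).append('\n');
                }
                return entry.toString();
            }
        }
        ```

        A file saved with each field on one line keeps m small, which would mask this kind of cost; a new parser that doesn't work line-by-line would avoid it entirely.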

         

