
#249 Improved repository update

open
nobody
None
5
2012-10-09
2011-08-14
No

The repository update in its current form isn't practical.
At the moment, scanning a repository with ~30k files takes (time output):
real 5:18.22
user 0.54
sys 4.38
And actually scanning the repository takes much longer (in my case >12h).

Discussion

  • fleax

    fleax - 2011-08-19

    Please provide more information about the computer used: processor, RAM, OS, where the repository is located...

    For example, my repository with almost 40K files, located on a NAS hard drive, takes about 30 minutes to be read. The same number of files on a local hard drive takes about 10 minutes.

    These times are almost the same on two machines, one Linux and one Mac, both with an i5 processor.

     
  • Misanthropist.Just.Visiting

    Specs:
    Intel Pentium P6100 Dual Core 2GHz/core
    4GB RAM
    NAS with RAID1 connected to WLAN 802.11g router with 100% connectivity, file access over SMB and NFS (my access is NFS).

    The scanning time given above was from the "find /mnt/music/ -type f" command. So approximately 5 minutes for that command.
    I tried refreshing the repository after 18k files had been read into it, and the scan restarted. Now I'm back down to 7k, and the refresh keeps rereading all the files the repository already contains. I let it refresh for 12 hours straight and it never finished.

     
  • Misanthropist.Just.Visiting

    Oh, I forgot: the OS is Ubuntu Lucid

    uname -a
    Linux lapator 2.6.32-33-generic #72-Ubuntu SMP Fri Jul 29 21:07:13 UTC 2011 x86_64 GNU/Linux

     
  • Jan-Martin Ziem

    Jan-Martin Ziem - 2011-09-04

    I'm also interested in this issue. On my side (i7, 500GB SSD + 500GB USB drive, where most of the music is located), a real full scan of ~18k music files (mp3/ogg) doesn't take more than 5 minutes, and refreshing takes 15-20 seconds. But IMHO it depends very heavily on your operating system and file system caches.
    I think the find command is completely the wrong way to profile scanning time, because find doesn't look into the files; it just reads the FS tree.

    The other possible bottleneck could be your NFS server: depending on security restrictions, an NFS server (which on the device side may be run by just a low-end ARM processor) can be slow. Try to monitor your external disk/share and profile the issue against local file hosting.

    To get realistic results, run your find command and pipe the output through an MP3 tag reader.
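The difference between walking the tree and actually opening the files can be measured separately. The following is a minimal Java sketch, assuming that reading the first few KiB of each file is a rough stand-in for tag extraction. It is not the aTunes scanner; `ScanProfile`, `listFiles` and `readHeaders` are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical profiling helper; not part of aTunes.
class ScanProfile {

    // Walks the tree and collects regular files, comparable to `find <root> -type f`.
    static List<Path> listFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens every file and reads at most headerBytes from it; returns the total bytes read.
    // Assumption: this only approximates the I/O cost of a real tag reader.
    static long readHeaders(List<Path> files, int headerBytes) {
        byte[] buffer = new byte[headerBytes];
        long total = 0;
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int n = in.read(buffer);
                if (n > 0) {
                    total += n;
                }
            } catch (IOException e) {
                // Unreadable files are skipped in this sketch.
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long t0 = System.nanoTime();
        List<Path> files = listFiles(root);
        long t1 = System.nanoTime();
        long bytes = readHeaders(files, 4096);
        long t2 = System.nanoTime();
        System.out.printf("walk: %d files in %.2f s%n", files.size(), (t1 - t0) / 1e9);
        System.out.printf("read: %d bytes in %.2f s%n", bytes, (t2 - t1) / 1e9);
    }
}
```

Running it once against the network share and once against a local copy separates the cost of listing the tree from the cost of touching each file. Note that file system caches can make a second run look much faster than the first.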

     
  • Misanthropist.Just.Visiting

    As you suggested, bionix77, I used a modified find command:
    find /mnt/music -type f -exec extract {} > /dev/null \;
    and timed it.
    16270 Files, 8:41:33 real, 210.14 user, 264.76 sys
    That's only about half of all files (including pics and stuff), so a full scan would have taken ~18-19 hours. I concede my expectations were a bit high, but since I first posted, aTunes hasn't been able to fully scan the library.

    I cannot make heads or tails of the code and can't seem to find the problem. I think I'll just go ahead and write some JUnit test cases to try and locate the problem.

     
  • Jan-Martin Ziem

    Jan-Martin Ziem - 2011-09-07

    OK, it seems that your NAS/NFS/SAN is just a bit slower than expected ;)

    But I have to agree that there is a problem: the system should detect the real (estimated) time to scan, or at least stop/pause all timed rescan jobs while the current scan job is running. I'll try to modify that code to solve this issue.

    Just a little question: Why the heck is the load so huge?

     
  • Jan-Martin Ziem

    Jan-Martin Ziem - 2011-09-07

    This issue was solved previously. It doesn't appear in the current SVN revision 4702. I tried to reproduce the behavior with an automatic update each minute, but this was blocked by handler.isLoaderWorking() in the repo update runnable RepositoryAutoRefresher.
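The kind of guard described here, where a timed refresh is skipped while a scan is still running, can be sketched as follows. This is not the aTunes code: `GuardedRefresher` and its methods are hypothetical names, and an `AtomicBoolean` stands in for whatever `handler.isLoaderWorking()` checks internally.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a "skip if already scanning" guard; not part of aTunes.
class GuardedRefresher {

    // True while a scan is in progress.
    private final AtomicBoolean scanRunning = new AtomicBoolean(false);

    // Runs the scan only if no other scan is in progress.
    // Returns true if the scan ran, false if it was skipped.
    boolean refreshIfIdle(Runnable scan) {
        // compareAndSet makes "check and mark as running" a single atomic step,
        // so two timer ticks cannot both start a scan.
        if (!scanRunning.compareAndSet(false, true)) {
            return false;
        }
        try {
            scan.run();
            return true;
        } finally {
            // Always clear the flag, even if the scan throws.
            scanRunning.set(false);
        }
    }

    boolean isScanRunning() {
        return scanRunning.get();
    }
}
```

With a guard like this, a refresh scheduled every minute returns immediately while a long scan is running, instead of restarting the scan from the beginning.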

     
  • Jan-Martin Ziem

    Jan-Martin Ziem - 2011-09-08

    One last note from my side: I scanned my LaCie Ethernet Disk completely and noticed that the NAS itself is also extremely slow (1 - 10 Mbit while accessing).
    But the process finished within ~1 hour:
    Read repository process DONE (26040 files, 3673.002 seconds, 0,1411 seconds / file)
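The per-file figures reported in this thread can be compared with a little arithmetic. The sketch below only recomputes numbers quoted above (26040 files in 3673.002 seconds, and 16270 files in 8:41:33 for the find + extract run); `ScanRate` and its methods are hypothetical helper names, and the 30000-file extrapolation assumes the per-file cost stays constant.

```java
// Hypothetical helper for comparing scan measurements; not part of aTunes.
class ScanRate {

    // Converts an h:mm:ss wall-clock duration to seconds.
    static long toSeconds(long hours, long minutes, long seconds) {
        return hours * 3600 + minutes * 60 + seconds;
    }

    // Average time spent per file.
    static double secondsPerFile(double totalSeconds, long files) {
        if (files <= 0) {
            throw new IllegalArgumentException("files must be positive");
        }
        return totalSeconds / files;
    }

    // Extrapolated duration in hours, assuming a constant per-file cost.
    static double estimateHours(double secondsPerFile, long files) {
        return secondsPerFile * files / 3600.0;
    }

    public static void main(String[] args) {
        double nasScan = secondsPerFile(3673.002, 26040);
        double nfsExtract = secondsPerFile(toSeconds(8, 41, 33), 16270);
        System.out.printf("aTunes scan of the LaCie disk: %.4f s/file%n", nasScan);
        System.out.printf("find + extract over NFS:       %.4f s/file%n", nfsExtract);
        System.out.printf("ratio:                         %.1f%n", nfsExtract / nasScan);
        System.out.printf("30000 files at the NFS rate:   %.1f h%n", estimateHours(nfsExtract, 30000));
    }
}
```

The two setups differ by roughly an order of magnitude per file (about 0.14 s versus about 1.9 s), which is consistent with the storage path, rather than the number of files, dominating the scan time.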

     
  • fleax

    fleax - 2011-09-12

    I made a change in the AudioFile class to store and work with file paths instead of File objects. It seems to improve repository reads a little, although I made the change to reduce memory use.
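The idea of keeping only the path and creating a `File` on demand can be sketched as below. This is an illustration under assumptions, not the actual `AudioFile` class: `PathBackedTrack` and its methods are hypothetical names.

```java
import java.io.File;

// Hypothetical illustration of storing a path string instead of a File; not the aTunes AudioFile.
class PathBackedTrack {

    // The only per-track state kept in memory for locating the file.
    private final String path;

    PathBackedTrack(String path) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
        this.path = path;
    }

    String getPath() {
        return path;
    }

    // Creates a File only when a caller actually needs file-system access.
    File toFile() {
        return new File(path);
    }

    boolean exists() {
        return toFile().exists();
    }
}
```

The short-lived `File` objects created by `toFile()` can be garbage-collected right after use, so a large repository only has to hold one string per track.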

     
  • Misanthropist.Just.Visiting

    I'll test as soon as I find the time to :) Hopefully the changes will solve the problem.

     
  • Misanthropist.Just.Visiting

    Alright I tested again and I'm not sure if I'm supposed to post a bug report.

    aTunes still doesn't find all music files. Looking at the dependencies, I assumed it uses jaudiotagger to read the files. So I wrote a small program that scans my library recursively, extracts the tags and puts them into a MySQL table. The table has ~26k entries, while aTunes only has ~20k.
    In addition, the small program I wrote takes about 17 minutes to refresh (ignoring paths already in the DB). aTunes, on the other hand, is still scanning, and had already been scanning for a while before I started the program.
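A refresh that ignores paths already stored comes down to a set-membership check before any tag is read. The sketch below shows that step only, with an in-memory `Set` standing in for the database lookup; it is neither the poster's program nor the way aTunes refreshes, and `IncrementalRefresh` and `pathsToRead` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of an incremental refresh; not part of aTunes.
class IncrementalRefresh {

    // Returns the paths from `found` that are not yet in `known`, preserving their order.
    // Only these paths would need to be opened and have their tags read.
    static List<String> pathsToRead(Collection<String> found, Set<String> known) {
        List<String> result = new ArrayList<>();
        for (String path : found) {
            if (!known.contains(path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

With a hash-based set the check costs close to nothing per path, so the refresh time is driven by the files that are actually new. Files that changed in place would need an extra check, for example on modification time, which this sketch leaves out.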

     
