The repository update in its current form isn't practicable.
At the moment, scanning a repository with ~30k files takes (time output):
real 5:18.22
user 0.54
sys 4.38
And actually scanning the repository takes much longer (in my case >12h).
Please provide more information about the computer used: processor, RAM, OS, where the repository is located...
For example, my repository with almost 40K files, located on a NAS hard drive, takes about 30 minutes to be read. The same number of files on a local hard drive takes about 10 minutes.
These times are almost the same on two machines, one Linux and one Mac, both with an i5 processor.
Specs:
Intel Pentium P6100 Dual Core 2GHz/core
4GB RAM
NAS with RAID1 connected to WLAN 802.11g router with 100% connectivity, file access over SMB and NFS (my access is NFS).
The scanning time given above was from the "find /mnt/music/ -type f" command, so approximately 5 minutes for that command.
I tried refreshing the repository after 18k files had been read into it, and it restarted. Now I'm back down to 7k, and the refresh keeps rereading all the files the repository already contains. I tried letting it refresh for 12 hours straight and it never finished.
Oh, I forgot: the OS is Ubuntu Lucid.
I'm also interested in this issue. On my side (i7, 500GB SSD + 500GB USB drive, where most of the music is located) a real full scan of ~18k music files (mp3/ogg) didn't take more than 5 minutes, and refreshing took 15-20 secs. But IMHO it depends heavily on your operating system and file system caches.
I think the find command is completely the wrong method to profile scanning time, because find doesn't look into the files; it just reads the FS tree.
The other stupid bottleneck could be your NFS server: depending on its security restrictions, an NFS server (on the device side maybe driven by just a low-end ARM processor) can be slow. Try to monitor your external disk/share and profile the issue against locally hosted files.
To get "real" results, run your find command and pipe the output through an MP3 tag reader.
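The difference between walking the tree and actually opening the files can be measured with a rough Java sketch like the one below. It is not aTunes code: the class and method names are made up for illustration, and reading the first 64 KB of each file is only a stand-in for what a real tag reader does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of aTunes.
class ScanTiming {

    // Lists all regular files under root. Like "find -type f", this only walks the tree.
    static List<Path> listFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // Opens every file and reads up to headerBytes from its start.
    // Assumption: this approximates the I/O a tag reader causes; it parses nothing.
    // Returns the total number of bytes actually read.
    static long readHeaders(List<Path> files, int headerBytes) throws IOException {
        byte[] buffer = new byte[headerBytes];
        long total = 0;
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                total += in.readNBytes(buffer, 0, headerBytes);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long t0 = System.nanoTime();
        List<Path> files = listFiles(root);
        long t1 = System.nanoTime();
        long bytes = readHeaders(files, 64 * 1024);
        long t2 = System.nanoTime();
        System.out.printf("listed %d files in %.3f s%n", files.size(), (t1 - t0) / 1e9);
        System.out.printf("read %d header bytes in %.3f s%n", bytes, (t2 - t1) / 1e9);
    }
}
```

Running it once against the NFS mount and once against a local copy should show whether the time goes into the tree walk or into opening the files. File system caches will skew a second run over the same directory.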
As you suggested, bionix77, I used a modified find command:
find /mnt/music -type f -exec extract {} > /dev/null \;
and timed it.
16270 Files, 8:41:33 real, 210.14 user, 264.76 sys
That's only about half of all files (including pics and stuff), so a full scan would have taken ~18-19 hours. I concede my expectations were a bit high, but since I first posted, aTunes hasn't been able to fully scan the library.
I cannot make heads or tails of the code and can't seem to find the problem. I think I'll just go ahead and write some JUnit test cases to try and locate the problem.
OK, it seems that your NAS/NFS/SAN is just a bit slower than expected ;)
But I have to agree with you about the problem: the system should detect the real (estimated) time to scan, or at least stop/pause all timed rescan jobs while the current scan job is running. I'll try to modify that code to solve this issue.
Just a little question: Why the heck is the load so huge?
This issue was solved previously. It doesn't appear in current SVN 4702. I tried to reproduce the behavior with an automatic update each minute, but this was blocked by handler.isLoaderWorking() in the repo update runnable RepositoryAutoRefresher.
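For anyone following along, the kind of guard described here can be sketched roughly as below. This is not the actual aTunes code: GuardedRefresher and everything in it are hypothetical, and only the name isLoaderWorking is borrowed from the comment above. The idea is just that a timed refresh is skipped while a scan is still in progress.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the real RepositoryAutoRefresher.
class GuardedRefresher {
    private final AtomicBoolean loaderWorking = new AtomicBoolean(false);
    private final AtomicInteger skipped = new AtomicInteger();
    private final Runnable scan;

    GuardedRefresher(Runnable scan) {
        this.scan = scan;
    }

    boolean isLoaderWorking() {
        return loaderWorking.get();
    }

    int skippedCount() {
        return skipped.get();
    }

    // Meant to be called by a timer. Returns true if the scan ran,
    // false if it was skipped because another scan was still running.
    boolean refreshIfIdle() {
        // Atomically claim the "working" flag; fails if a scan is in progress.
        if (!loaderWorking.compareAndSet(false, true)) {
            skipped.incrementAndGet();
            return false;
        }
        try {
            scan.run();
            return true;
        } finally {
            loaderWorking.set(false);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // The "scan" blocks until released, simulating a slow NAS.
        GuardedRefresher refresher = new GuardedRefresher(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread longScan = new Thread(refresher::refreshIfIdle);
        longScan.start();
        started.await();
        // A timed refresh firing during the long scan is skipped.
        System.out.println("refresh during scan ran: " + refresher.refreshIfIdle());
        release.countDown();
        longScan.join();
        System.out.println("refresh after scan ran: " + refresher.refreshIfIdle());
    }
}
```

With a guard like this, a scan that takes longer than the refresh interval is never restarted from scratch by the timer, which is the symptom reported at the top of the thread.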
One last note from my side: I scanned my LaCie Ethernet Disk completely and noticed that the NAS itself is also extremely slow (1 - 10 Mbit while accessing).
But the process finished within ~1 hour:
Read repository process DONE (26040 files, 3673.002 seconds, 0,1411 seconds / file)
I added a change in the AudioFile class to store and work with file paths instead of File objects. It seems to improve repository reading a little, although I added it to reduce memory use.
I'll test as soon as I find the time to :) Hopefully the changes will solve the problem.
Alright, I tested again and I'm not sure if I'm supposed to post a bug report.
aTunes still doesn't find all music files. Looking at the dependencies, I assumed it uses jaudiotagger to read the files. So I wrote a small program that scans my library recursively, extracts the tags and puts them into a MySQL table. The table has ~26k entries while aTunes only has ~20k.
In addition, the small program I wrote takes about 17 minutes to refresh (ignoring paths already in the DB). aTunes, on the other hand, is still scanning and had already been scanning for a while before I started the program.
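The refresh logic of that small program is roughly the following. This is a simplified sketch and not the program itself: an in-memory Set stands in for the MySQL table, the tag extraction is left out, and the class and method names are made up for this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch of an incremental refresh; not aTunes code.
class IncrementalScan {

    // Returns the files under root whose absolute path is not yet in knownPaths,
    // and records them in knownPaths so the next refresh skips them.
    static List<Path> findNewFiles(Path root, Set<String> knownPaths) throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(Files::isRegularFile).forEach(file -> {
                // Set.add returns false when the path was already known.
                if (knownPaths.add(file.toAbsolutePath().toString())) {
                    fresh.add(file); // only these would need their tags read
                }
            });
        }
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Set<String> known = new HashSet<>(); // stands in for the database table
        System.out.println("first pass: " + findNewFiles(root, known).size() + " new files");
        System.out.println("second pass: " + findNewFiles(root, known).size() + " new files");
    }
}
```

The second pass still walks the whole tree, but it opens no files, so on a slow share a refresh costs about as much as the plain find command rather than a full rescan.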