getdata-devel Mailing List for GetData
Scientific Database Format
From: Ross W. <ros...@gm...> - 2011-03-22 22:04:02
Addendum: The reverse loop is actually: for (j = D->n_fragment-1; j>0; j--)

R

On Tue, Mar 22, 2011 at 4:34 PM, Ross Williamson <ros...@gm...> wrote:
> Ok so I took it upon myself to have a go at hacking close.c. I think
> the issue might be the counting of fragments. I have a simple situation
> where there are 4 fragments inside a top level dirfile. I initially
> looked at the value of D->n_fragment which returned a value of 5.
> This seemed odd as there are only 4.
>
> I also thought we might need to deallocate the fragments in reverse
> order - I changed the loop code to (making the mistake that I should
> have done j>=0)
>
> for (j = D->n_fragment; j>0; j--)
>
> Whooo - This worked. Changing it to j>=0 crapped out at index
> fragment[0]. Just to make sure I changed the code to:
>
> for(j=1; j<D->n_fragment; ++j)
>
> and that also worked - so it seems that there is one too many fragments
> in n_fragment and fragment[0] does not exist.
>
> Am I playing with fire here?
>
> Cheers
>
> Ross
>
> On Tue, Mar 22, 2011 at 3:16 PM, D. V. Wiebe <ge...@ke...> wrote:
>> On Tue, Mar 22, 2011 at 03:07:46PM -0500, Ross Williamson wrote:
>>> I was originally using getdata from the repository that kst2 is
>>> distributed with (sorry can't remember off hand). I'm now running off
>>> my own compiled version to get the debugging symbols. Both show the
>>> same error.
>>>
>>> Yeah valgrinding gcp is not something I'm looking forward too :) I'm
>>> going to see if I get the same are with single level nested fragments
>>> rather than double level fragments (i.e. dirfile inside a dirfile
>>> inside a dirfile)
>>>
>>> Cheers
>>>
>>> Ross
>>
>> If you're compiling from source, you could try enabling debugging
>> messages (--enable-debug) which will result in the library printing all
>> sorts of debugging messages on stderr (mostly function traces). If you
>> can endeavour to capture those messages, you could send them to me and
>> I could take a look. It's admittedly an outside chance, but it might
>> give an indication on what's going on.
>>
>> -dvw
>> --
>> D. V. Wiebe
>> ge...@ke...
>> http://getdata.sourceforge.net/
>
> --
> Ross Williamson
> University of Chicago
> Department of Astronomy & Astrophysics
> 773-834-9785 (office)
> 312-504-3051 (Cell)

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
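For reference, a minimal sketch of the loop bounds being discussed. The struct below is a hypothetical stand-in for the fragment list, not GetData's real DIRFILE internals from close.c:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state being discussed; the real DIRFILE
 * internals in close.c are more involved than this. */
struct toy_dirfile {
  int n_fragment;    /* number of entries in fragment[] */
  void **fragment;   /* valid indices: 0 .. n_fragment - 1 */
};

static void free_fragments(struct toy_dirfile *D)
{
  int j;

  /* Correct reverse-order teardown: start at the last valid index and
   * stop after index 0 has been freed. */
  for (j = D->n_fragment - 1; j >= 0; j--) {
    printf("freeing fragment[%d]\n", j);
    free(D->fragment[j]);
  }

  /* "for (j = D->n_fragment; j > 0; j--)" instead reads one element past
   * the end and never frees fragment[0]; it can only appear to work if
   * n_fragment over-counts by one, which is what the thread suspects. */

  free(D->fragment);
  D->fragment = NULL;
  D->n_fragment = 0;
}

int main(void)
{
  struct toy_dirfile D;
  int j;

  D.n_fragment = 4;
  D.fragment = malloc(D.n_fragment * sizeof(*D.fragment));
  for (j = 0; j < D.n_fragment; j++)
    D.fragment[j] = malloc(16);

  free_fragments(&D);
  return 0;
}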
From: Ross W. <ros...@gm...> - 2011-03-22 21:34:58
Ok so I took it upon myself to have a go at hacking close.c. I think the
issue might be the counting of fragments. I have a simple situation where
there are 4 fragments inside a top level dirfile. I initially looked at
the value of D->n_fragment which returned a value of 5. This seemed odd
as there are only 4.

I also thought we might need to deallocate the fragments in reverse
order - I changed the loop code to (making the mistake that I should have
done j>=0)

for (j = D->n_fragment; j>0; j--)

Whooo - This worked. Changing it to j>=0 crapped out at index fragment[0].
Just to make sure I changed the code to:

for(j=1; j<D->n_fragment; ++j)

and that also worked - so it seems that there is one too many fragments in
n_fragment and fragment[0] does not exist.

Am I playing with fire here?

Cheers

Ross

On Tue, Mar 22, 2011 at 3:16 PM, D. V. Wiebe <ge...@ke...> wrote:
> On Tue, Mar 22, 2011 at 03:07:46PM -0500, Ross Williamson wrote:
>> I was originally using getdata from the repository that kst2 is
>> distributed with (sorry can't remember off hand). I'm now running off
>> my own compiled version to get the debugging symbols. Both show the
>> same error.
>>
>> Yeah valgrinding gcp is not something I'm looking forward too :) I'm
>> going to see if I get the same are with single level nested fragments
>> rather than double level fragments (i.e. dirfile inside a dirfile
>> inside a dirfile)
>>
>> Cheers
>>
>> Ross
>
> If you're compiling from source, you could try enabling debugging
> messages (--enable-debug) which will result in the library printing all
> sorts of debugging messages on stderr (mostly function traces). If you
> can endeavour to capture those messages, you could send them to me and
> I could take a look. It's admittedly an outside chance, but it might
> give an indication on what's going on.
>
> -dvw
> --
> D. V. Wiebe
> ge...@ke...
> http://getdata.sourceforge.net/

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
From: Ross W. <ros...@gm...> - 2011-03-22 20:08:28
I was originally using getdata from the repository that kst2 is distributed with (sorry can't remember off hand). I'm now running off my own compiled version to get the debugging symbols. Both show the same error. Yeah valgrinding gcp is not something I'm looking forward too :) I'm going to see if I get the same are with single level nested fragments rather than double level fragments (i.e. dirfile inside a dirfile inside a dirfile) Cheers Ross On Tue, Mar 22, 2011 at 2:57 PM, D. V. Wiebe <ge...@ke...> wrote: > On Tue, Mar 22, 2011 at 02:01:58PM -0500, Ross Williamson wrote: >> OK here is the trace with debugging in getdata - I haven't yet (I >> think) got the glibc debugging symbols sorted but it does look like >> the error is close.c in the dirfile code. I'm going to look at the >> getdata code a bit more but any help is appreciated. >> >> Again - note this only happens when I have fragments in the top-level >> (and second-level) dirfiles. >> >> #0 0x00007ffff31baba5 in raise (sig=<value optimized out>) at >> ../nptl/sysdeps/unix/sysv/linux/raise.c:64 >> #1 0x00007ffff31be6b0 in abort () at abort.c:92 >> #2 0x00007ffff31f443b in __libc_message (do_abort=<value optimized >> out>, fmt=<value optimized out>) >> at ../sysdeps/unix/sysv/linux/libc_fatal.c:189 >> #3 0x00007ffff31fe4b6 in malloc_printerr (action=3, >> str=0x7ffff32ceca2 "corrupted double-linked list", ptr=<value >> optimized out>) >> at malloc.c:6283 >> #4 0x00007ffff31fe961 in malloc_consolidate (av=<value optimized >> out>) at malloc.c:5169 >> #5 0x00007ffff3201350 in _int_free (av=0x7ffff3505e40, p=0x1c66110) >> at malloc.c:5034 >> #6 0x00007ffff3204c83 in __libc_free (mem=<value optimized out>) at >> malloc.c:3738 >> #7 0x00007ffff43b841b in _GD_FreeD (D=0xc069e0, flush_meta=<value >> optimized out>) at close.c:46 >> #8 _GD_ShutdownDirfile (D=0xc069e0, flush_meta=<value optimized out>) >> at close.c:93 >> #9 0x00007ffff7b688e1 in >> gcp::control::ArchiverWriterDirfile::closeArcfile (this=0x6878e0) at >> ArchiverWriterDirfile.c:296 >> #10 0x00007ffff7b67d6b in >> gcp::control::ArchiverWriterDirfile::openArcfile (this=0x6878e0, >> dir=0xc06b40 "/home/rw247/arc") >> at ArchiverWriterDirfile.c:190 >> #11 0x00007ffff7b64d4b in arc_save_integration (arc=0x684e60) at archiver.c:1212 >> #12 0x00007ffff7b63e6e in archiver_thread (arg=0x684e60) at archiver.c:899 >> #13 0x00007ffff5330971 in start_thread (arg=<value optimized out>) at >> pthread_create.c:304 >> #14 0x00007ffff326d92d in clone () at >> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 >> #15 0x0000000000000000 in ?? () > > Well, the symptom is certainly in gd_close(), when GetData tries to > free the list of fields while destroying the DIRFILE object (resulting > in that double-free message). But the corruption is presumably happening > earlier and going undetected. > > As Ted points out, running gcp under valgrind would likely catch the > culprit, regardless of whether it's happening in GetData or gcp. > (Although I realise valgrinding gcp can be daunting.) I periodically > run valgrind on the GetData test-suite, but I'm sure the suite does not > cover the whole library code. > > Which version of GetData are you using? Did you compile from source, or > get it from a distribution? > > -dvw > -- > D. V. Wiebe > ge...@ke... > http://getdata.sourceforge.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
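A standalone open/close loop along these lines can help isolate whether the crash needs gcp at all. This is only a sketch against the standard GetData C API; the directory names, field name, and iteration count are made up, and the include and field are only added on the first pass:

#include <stdio.h>
#include <sys/stat.h>
#include <getdata.h>

int main(void)
{
  int i;

  /* Layout assumed here: testdir/ is the dirfile, testdir/sub/format is
   * a fragment INCLUDEd from the top-level format file. */
  mkdir("testdir", 0777);
  mkdir("testdir/sub", 0777);

  for (i = 0; i < 100; i++) {
    DIRFILE *D = gd_open("testdir", GD_RDWR | GD_CREAT | GD_VERBOSE);

    if (gd_nfragments(D) < 2)    /* add the include only on the first pass */
      gd_include(D, "sub/format", 0, GD_CREAT);
    if (gd_nfields(D) < 2)       /* the INDEX field is always present */
      gd_add_raw(D, "utc", GD_FLOAT64, 1, 1);

    if (gd_close(D)) {
      fprintf(stderr, "gd_close failed on iteration %d\n", i);
      return 1;
    }
  }

  puts("100 open/close cycles without a crash");
  return 0;
}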
From: D. V. W. <ge...@ke...> - 2011-03-22 19:57:43
On Tue, Mar 22, 2011 at 02:01:58PM -0500, Ross Williamson wrote: > OK here is the trace with debugging in getdata - I haven't yet (I > think) got the glibc debugging symbols sorted but it does look like > the error is close.c in the dirfile code. I'm going to look at the > getdata code a bit more but any help is appreciated. > > Again - note this only happens when I have fragments in the top-level > (and second-level) dirfiles. > > #0 0x00007ffff31baba5 in raise (sig=<value optimized out>) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:64 > #1 0x00007ffff31be6b0 in abort () at abort.c:92 > #2 0x00007ffff31f443b in __libc_message (do_abort=<value optimized > out>, fmt=<value optimized out>) > at ../sysdeps/unix/sysv/linux/libc_fatal.c:189 > #3 0x00007ffff31fe4b6 in malloc_printerr (action=3, > str=0x7ffff32ceca2 "corrupted double-linked list", ptr=<value > optimized out>) > at malloc.c:6283 > #4 0x00007ffff31fe961 in malloc_consolidate (av=<value optimized > out>) at malloc.c:5169 > #5 0x00007ffff3201350 in _int_free (av=0x7ffff3505e40, p=0x1c66110) > at malloc.c:5034 > #6 0x00007ffff3204c83 in __libc_free (mem=<value optimized out>) at > malloc.c:3738 > #7 0x00007ffff43b841b in _GD_FreeD (D=0xc069e0, flush_meta=<value > optimized out>) at close.c:46 > #8 _GD_ShutdownDirfile (D=0xc069e0, flush_meta=<value optimized out>) > at close.c:93 > #9 0x00007ffff7b688e1 in > gcp::control::ArchiverWriterDirfile::closeArcfile (this=0x6878e0) at > ArchiverWriterDirfile.c:296 > #10 0x00007ffff7b67d6b in > gcp::control::ArchiverWriterDirfile::openArcfile (this=0x6878e0, > dir=0xc06b40 "/home/rw247/arc") > at ArchiverWriterDirfile.c:190 > #11 0x00007ffff7b64d4b in arc_save_integration (arc=0x684e60) at archiver.c:1212 > #12 0x00007ffff7b63e6e in archiver_thread (arg=0x684e60) at archiver.c:899 > #13 0x00007ffff5330971 in start_thread (arg=<value optimized out>) at > pthread_create.c:304 > #14 0x00007ffff326d92d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 > #15 0x0000000000000000 in ?? () Well, the symptom is certainly in gd_close(), when GetData tries to free the list of fields while destroying the DIRFILE object (resulting in that double-free message). But the corruption is presumably happening earlier and going undetected. As Ted points out, running gcp under valgrind would likely catch the culprit, regardless of whether it's happening in GetData or gcp. (Although I realise valgrinding gcp can be daunting.) I periodically run valgrind on the GetData test-suite, but I'm sure the suite does not cover the whole library code. Which version of GetData are you using? Did you compile from source, or get it from a distribution? -dvw -- D. V. Wiebe ge...@ke... http://getdata.sourceforge.net/ |
From: Ross W. <ros...@gm...> - 2011-03-22 19:02:29
OK here is the trace with debugging in getdata - I haven't yet (I think) got the glibc debugging symbols sorted but it does look like the error is close.c in the dirfile code. I'm going to look at the getdata code a bit more but any help is appreciated. Again - note this only happens when I have fragments in the top-level (and second-level) dirfiles. #0 0x00007ffff31baba5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007ffff31be6b0 in abort () at abort.c:92 #2 0x00007ffff31f443b in __libc_message (do_abort=<value optimized out>, fmt=<value optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189 #3 0x00007ffff31fe4b6 in malloc_printerr (action=3, str=0x7ffff32ceca2 "corrupted double-linked list", ptr=<value optimized out>) at malloc.c:6283 #4 0x00007ffff31fe961 in malloc_consolidate (av=<value optimized out>) at malloc.c:5169 #5 0x00007ffff3201350 in _int_free (av=0x7ffff3505e40, p=0x1c66110) at malloc.c:5034 #6 0x00007ffff3204c83 in __libc_free (mem=<value optimized out>) at malloc.c:3738 #7 0x00007ffff43b841b in _GD_FreeD (D=0xc069e0, flush_meta=<value optimized out>) at close.c:46 #8 _GD_ShutdownDirfile (D=0xc069e0, flush_meta=<value optimized out>) at close.c:93 #9 0x00007ffff7b688e1 in gcp::control::ArchiverWriterDirfile::closeArcfile (this=0x6878e0) at ArchiverWriterDirfile.c:296 #10 0x00007ffff7b67d6b in gcp::control::ArchiverWriterDirfile::openArcfile (this=0x6878e0, dir=0xc06b40 "/home/rw247/arc") at ArchiverWriterDirfile.c:190 #11 0x00007ffff7b64d4b in arc_save_integration (arc=0x684e60) at archiver.c:1212 #12 0x00007ffff7b63e6e in archiver_thread (arg=0x684e60) at archiver.c:899 #13 0x00007ffff5330971 in start_thread (arg=<value optimized out>) at pthread_create.c:304 #14 0x00007ffff326d92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #15 0x0000000000000000 in ?? () On Tue, Mar 22, 2011 at 1:07 PM, Matthew D Truch <ma...@tr...> wrote: > On Tue, Mar 22, 2011 at 11:00:27AM -0700, Ted Kisner wrote: >> Can you recompile everything with "-O0 -g" and run in gdb? Also >> useful is to run this "debug copy" of the full software stack inside >> valgrind to look for out-of-bounds memory access. > > You could also look into installing the "debuginfo" packages for glibc > and getdata on your system (if your distribution supports such things). > > -- > "Duct Tape is like the Force. It has a dark side, a light side, and holds the universe together." > -------------------------- > Matthew Truch > Department of Physics and Astronomy > University of Pennsylvania > ma...@tr... > http://matt.truch.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Matthew D T. <ma...@tr...> - 2011-03-22 18:07:43
On Tue, Mar 22, 2011 at 11:00:27AM -0700, Ted Kisner wrote:
> Can you recompile everything with "-O0 -g" and run in gdb? Also
> useful is to run this "debug copy" of the full software stack inside
> valgrind to look for out-of-bounds memory access.

You could also look into installing the "debuginfo" packages for glibc
and getdata on your system (if your distribution supports such things).

--
"Duct Tape is like the Force. It has a dark side, a light side, and holds the universe together."
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Ted K. <tsk...@gm...> - 2011-03-22 18:00:45
Can you recompile everything with "-O0 -g" and run in gdb? Also useful is to run this "debug copy" of the full software stack inside valgrind to look for out-of-bounds memory access. valgrind is very useful for finding double-free's... -Ted On Mar 22, 2011, at 10:56 AM, Ross Williamson wrote: > Yeah - Here is the backtrace and dump. So it sounds like it might > actually be the raid5 and not LVM. hmm that' annoying. > > START METAFLUSH > END METAFLUSH > START CLOSE > *** glibc detected *** ./bin/sptControl: double free or corruption > (fasttop): 0x0000000002320c70 *** > ======= Backtrace: ========= > /lib/libc.so.6(+0x774b6)[0x7fdbebd3f4b6] > /lib/libc.so.6(cfree+0x73)[0x7fdbebd45c83] > /usr/lib/libgetdata.so.4(+0x8617)[0x7fdbecef9617] > /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile12closeArcfileEv+0x1ca)[0x7fdbf04e651a] > /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile11openArcfileEPc+0x45)[0x7fdbf04e7905] > /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_Z15archiver_threadPv+0x6b3)[0x7fdbf04e44b3] > /lib/libpthread.so.0(+0x7971)[0x7fdbede71971] > /lib/libc.so.6(clone+0x6d)[0x7fdbebdae92d] > ======= Memory map: ======== > 00400000-00415000 r-xp 00000000 fc:00 60424210 > /home/rw247/gcpSptpolDevel/gcp/bin/sptControl > 00615000-00616000 r--p 00015000 fc:00 60424210 > /home/rw247/gcpSptpolDevel/gcp/bin/sptControl > 00616000-00617000 rw-p 00016000 fc:00 60424210 > /home/rw247/gcpSptpolDevel/gcp/bin/sptControl > 01bd2000-031d2000 rw-p 00000000 00:00 0 [heap] > 7fdb5a8c8000-7fdbb0000000 rw-p 00000000 00:00 0 > 7fdbb0000000-7fdbb0080000 rw-p 00000000 00:00 0 > 7fdbb0080000-7fdbb4000000 ---p 00000000 00:00 0 > 7fdbb4319000-7fdbb431a000 rw-p 00000000 00:00 0 > 7fdbb431a000-7fdbb431b000 ---p 00000000 00:00 0 > 7fdbb431b000-7fdbb4b1b000 rw-p 00000000 00:00 0 > 7fdbb4b1b000-7fdbb4b1c000 ---p 00000000 00:00 0 > 7fdbb4b1c000-7fdbb531c000 rw-p 00000000 00:00 0 > 7fdbb531c000-7fdbb531d000 ---p 00000000 00:00 0 > 7fdbb531d000-7fdbb5b1d000 rw-p 00000000 00:00 0 > 7fdbb5b1d000-7fdbb5b1e000 ---p 00000000 00:00 0 > 7fdbb5b1e000-7fdbb631e000 rw-p 00000000 00:00 0 > 7fdbb631e000-7fdbb631f000 ---p 00000000 00:00 0 > 7fdbb631f000-7fdbb6b1f000 rw-p 00000000 00:00 0 > 7fdbb6b1f000-7fdbb6b20000 ---p 00000000 00:00 0 > 7fdbb6b20000-7fdbbda4a000 rw-p 00000000 00:00 0 > 7fdbbda4a000-7fdbbda4b000 ---p 00000000 00:00 0 > 7fdbbda4b000-7fdbbe700000 rw-p 00000000 00:00 0 > 7fdbbe779000-7fdbe9793000 rw-p 00000000 00:00 0 > 7fdbe9793000-7fdbe9796000 r-xp 00000000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9796000-7fdbe9995000 ---p 00003000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9995000-7fdbe9996000 r--p 00002000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9996000-7fdbe9997000 rw-p 00003000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9997000-7fdbe99a7000 r-xp 00000000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe99a7000-7fdbe9ba6000 ---p 00010000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe9ba6000-7fdbe9ba7000 r--p 0000f000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe9ba7000-7fdbe9ba8000 rw-p 00010000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe9ba8000-7fdbe9baa000 r-xp 00000000 08:01 11796895 > /lib/libkeyutils.so.1.3 > 7fdbe9baa000-7fdbe9da9000 ---p 00002000 08:01 11796895 > /lib/libkeyutils.so.1.3 > 7fdbe9da9000-7fdbe9daa000 r--p 00001000 08:01 11796895 > /lib/libkeyutils.so.1.3 > 7fdbe9daa000-7fdbe9dab000 rw-p 00002000 08:01 11796895 > /lib/libkeyutils.so.1.3 
> 7fdbe9dab000-7fdbe9db2000 r-xp 00000000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9db2000-7fdbe9fb1000 ---p 00007000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9fb1000-7fdbe9fb2000 r--p 00006000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9fb2000-7fdbe9fb3000 rw-p 00007000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9fb3000-7fdbe9fb6000 r-xp 00000000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbe9fb6000-7fdbea1b5000 ---p 00003000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbea1b5000-7fdbea1b6000 r--p 00002000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbea1b6000-7fdbea1b7000 rw-p 00003000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbea1b7000-7fdbea1db000 r-xp 00000000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea1db000-7fdbea3db000 ---p 00024000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea3db000-7fdbea3dc000 r--p 00024000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea3dc000-7fdbea3dd000 rw-p 00025000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea3dd000-7fdbea496000 r-xp 00000000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea496000-7fdbea695000 ---p 000b9000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea695000-7fdbea69e000 r--p 000b8000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea69e000-7fdbea69f000 rw-p 000c1000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea69f000-7fdbea6b8000 r-xp 00000000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea6b8000-7fdbea8b7000 ---p 00019000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea8b7000-7fdbea8b8000 r--p 00018000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea8b8000-7fdbea8b9000 rw-p 00019000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea8b9000-7fdbea8cf000 r-xp 00000000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbea8cf000-7fdbeaace000 ---p 00016000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbeaace000-7fdbeaacf000 r--p 00015000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbeaacf000-7fdbeaad0000 rw-p 00016000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbeaad0000-7fdbeaad2000 rw-p 00000000 00:00 0 > 7fdbeaad2000-7fdbeab46000 r-xp 00000000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbeab46000-7fdbead46000 ---p 00074000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbead46000-7fdbead47000 r--p 00074000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbead47000-7fdbead4a000 rw-p 00075000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbead4a000-7fdbeade5000 r-xp 00000000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeade5000-7fdbeafe5000 ---p 0009b000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeafe5000-7fdbeafeb000 r--p 0009b000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeafeb000-7fdbeafec000 rw-p 000a1000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeafec000-7fdbeb01e000 r-xp 00000000 08:01 2885333 > /usr/lib/libgssapi_krb5.so.2.2Aborted > > > On Tue, Mar 22, 2011 at 12:44 PM, Matthew D Truch <ma...@tr...> wrote: >>> I think the long flush times are related to using LVM on a data drive >>> (and possibly raid5). I've switched to a standalone drive and flush >>> times are now maybe a couple of seconds, The issue now is that I still >>> get the crash when closing down the dirfile with fragments as opposed >>> to having them all in a single directory. I'll keep investigation but >>> any help is appreciated. >> >> This doesn't surprize me. 
I have an array of disks here coupled with >> RAID and LVM, and flushes of large files are generally slow (the RAID >> system seems to queue up as much of a write as possible to avoid >> repeatedly re-calculating parity). >> >> However, the crash is what worries me. Why not post the backtrace? >> >> -- >> "Party on Wayne; Party on Garth. -- Wayne's World" >> -------------------------- >> Matthew Truch >> Department of Physics and Astronomy >> University of Pennsylvania >> ma...@tr... >> http://matt.truch.net/ >> > > > > -- > Ross Williamson > University of Chicago > Department of Astronomy & Astrophysics > 773-834-9785 (office) > 312-504-3051 (Cell) > > ------------------------------------------------------------------------------ > Enable your software for Intel(R) Active Management Technology to meet the > growing manageability and security demands of your customers. Businesses > are taking advantage of Intel(R) vPro (TM) technology - will your software > be a part of the solution? Download the Intel(R) Manageability Checker > today! http://p.sf.net/sfu/intel-dev2devmar > _______________________________________________ > getdata-devel mailing list > get...@li... > https://lists.sourceforge.net/lists/listinfo/getdata-devel |
From: Ross W. <ros...@gm...> - 2011-03-22 17:56:39
Yeah - Here is the backtrace and dump. So it sounds like it might actually be the raid5 and not LVM. hmm that' annoying. START METAFLUSH END METAFLUSH START CLOSE *** glibc detected *** ./bin/sptControl: double free or corruption (fasttop): 0x0000000002320c70 *** ======= Backtrace: ========= /lib/libc.so.6(+0x774b6)[0x7fdbebd3f4b6] /lib/libc.so.6(cfree+0x73)[0x7fdbebd45c83] /usr/lib/libgetdata.so.4(+0x8617)[0x7fdbecef9617] /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile12closeArcfileEv+0x1ca)[0x7fdbf04e651a] /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile11openArcfileEPc+0x45)[0x7fdbf04e7905] /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_Z15archiver_threadPv+0x6b3)[0x7fdbf04e44b3] /lib/libpthread.so.0(+0x7971)[0x7fdbede71971] /lib/libc.so.6(clone+0x6d)[0x7fdbebdae92d] ======= Memory map: ======== 00400000-00415000 r-xp 00000000 fc:00 60424210 /home/rw247/gcpSptpolDevel/gcp/bin/sptControl 00615000-00616000 r--p 00015000 fc:00 60424210 /home/rw247/gcpSptpolDevel/gcp/bin/sptControl 00616000-00617000 rw-p 00016000 fc:00 60424210 /home/rw247/gcpSptpolDevel/gcp/bin/sptControl 01bd2000-031d2000 rw-p 00000000 00:00 0 [heap] 7fdb5a8c8000-7fdbb0000000 rw-p 00000000 00:00 0 7fdbb0000000-7fdbb0080000 rw-p 00000000 00:00 0 7fdbb0080000-7fdbb4000000 ---p 00000000 00:00 0 7fdbb4319000-7fdbb431a000 rw-p 00000000 00:00 0 7fdbb431a000-7fdbb431b000 ---p 00000000 00:00 0 7fdbb431b000-7fdbb4b1b000 rw-p 00000000 00:00 0 7fdbb4b1b000-7fdbb4b1c000 ---p 00000000 00:00 0 7fdbb4b1c000-7fdbb531c000 rw-p 00000000 00:00 0 7fdbb531c000-7fdbb531d000 ---p 00000000 00:00 0 7fdbb531d000-7fdbb5b1d000 rw-p 00000000 00:00 0 7fdbb5b1d000-7fdbb5b1e000 ---p 00000000 00:00 0 7fdbb5b1e000-7fdbb631e000 rw-p 00000000 00:00 0 7fdbb631e000-7fdbb631f000 ---p 00000000 00:00 0 7fdbb631f000-7fdbb6b1f000 rw-p 00000000 00:00 0 7fdbb6b1f000-7fdbb6b20000 ---p 00000000 00:00 0 7fdbb6b20000-7fdbbda4a000 rw-p 00000000 00:00 0 7fdbbda4a000-7fdbbda4b000 ---p 00000000 00:00 0 7fdbbda4b000-7fdbbe700000 rw-p 00000000 00:00 0 7fdbbe779000-7fdbe9793000 rw-p 00000000 00:00 0 7fdbe9793000-7fdbe9796000 r-xp 00000000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9796000-7fdbe9995000 ---p 00003000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9995000-7fdbe9996000 r--p 00002000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9996000-7fdbe9997000 rw-p 00003000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9997000-7fdbe99a7000 r-xp 00000000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe99a7000-7fdbe9ba6000 ---p 00010000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe9ba6000-7fdbe9ba7000 r--p 0000f000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe9ba7000-7fdbe9ba8000 rw-p 00010000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe9ba8000-7fdbe9baa000 r-xp 00000000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9baa000-7fdbe9da9000 ---p 00002000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9da9000-7fdbe9daa000 r--p 00001000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9daa000-7fdbe9dab000 rw-p 00002000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9dab000-7fdbe9db2000 r-xp 00000000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9db2000-7fdbe9fb1000 ---p 00007000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9fb1000-7fdbe9fb2000 r--p 00006000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9fb2000-7fdbe9fb3000 rw-p 00007000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9fb3000-7fdbe9fb6000 r-xp 00000000 08:01 11796560 /lib/libcom_err.so.2.1 
7fdbe9fb6000-7fdbea1b5000 ---p 00003000 08:01 11796560 /lib/libcom_err.so.2.1 7fdbea1b5000-7fdbea1b6000 r--p 00002000 08:01 11796560 /lib/libcom_err.so.2.1 7fdbea1b6000-7fdbea1b7000 rw-p 00003000 08:01 11796560 /lib/libcom_err.so.2.1 7fdbea1b7000-7fdbea1db000 r-xp 00000000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea1db000-7fdbea3db000 ---p 00024000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea3db000-7fdbea3dc000 r--p 00024000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea3dc000-7fdbea3dd000 rw-p 00025000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea3dd000-7fdbea496000 r-xp 00000000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea496000-7fdbea695000 ---p 000b9000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea695000-7fdbea69e000 r--p 000b8000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea69e000-7fdbea69f000 rw-p 000c1000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea69f000-7fdbea6b8000 r-xp 00000000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea6b8000-7fdbea8b7000 ---p 00019000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea8b7000-7fdbea8b8000 r--p 00018000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea8b8000-7fdbea8b9000 rw-p 00019000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea8b9000-7fdbea8cf000 r-xp 00000000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbea8cf000-7fdbeaace000 ---p 00016000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbeaace000-7fdbeaacf000 r--p 00015000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbeaacf000-7fdbeaad0000 rw-p 00016000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbeaad0000-7fdbeaad2000 rw-p 00000000 00:00 0 7fdbeaad2000-7fdbeab46000 r-xp 00000000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbeab46000-7fdbead46000 ---p 00074000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbead46000-7fdbead47000 r--p 00074000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbead47000-7fdbead4a000 rw-p 00075000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbead4a000-7fdbeade5000 r-xp 00000000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeade5000-7fdbeafe5000 ---p 0009b000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeafe5000-7fdbeafeb000 r--p 0009b000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeafeb000-7fdbeafec000 rw-p 000a1000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeafec000-7fdbeb01e000 r-xp 00000000 08:01 2885333 /usr/lib/libgssapi_krb5.so.2.2Aborted On Tue, Mar 22, 2011 at 12:44 PM, Matthew D Truch <ma...@tr...> wrote: >> I think the long flush times are related to using LVM on a data drive >> (and possibly raid5). I've switched to a standalone drive and flush >> times are now maybe a couple of seconds, The issue now is that I still >> get the crash when closing down the dirfile with fragments as opposed >> to having them all in a single directory. I'll keep investigation but >> any help is appreciated. > > This doesn't surprize me. I have an array of disks here coupled with > RAID and LVM, and flushes of large files are generally slow (the RAID > system seems to queue up as much of a write as possible to avoid > repeatedly re-calculating parity). > > However, the crash is what worries me. Why not post the backtrace? > > -- > "Party on Wayne; Party on Garth. -- Wayne's World" > -------------------------- > Matthew Truch > Department of Physics and Astronomy > University of Pennsylvania > ma...@tr... > http://matt.truch.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Matthew D T. <ma...@tr...> - 2011-03-22 17:44:29
> I think the long flush times are related to using LVM on a data drive
> (and possibly raid5). I've switched to a standalone drive and flush
> times are now maybe a couple of seconds, The issue now is that I still
> get the crash when closing down the dirfile with fragments as opposed
> to having them all in a single directory. I'll keep investigation but
> any help is appreciated.

This doesn't surprise me. I have an array of disks here coupled with
RAID and LVM, and flushes of large files are generally slow (the RAID
system seems to queue up as much of a write as possible to avoid
repeatedly re-calculating parity).

However, the crash is what worries me. Why not post the backtrace?

--
"Party on Wayne; Party on Garth. -- Wayne's World"
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Ross W. <ros...@gm...> - 2011-03-22 17:10:59
Ok so quick update, I think the long flush times are related to using LVM on a data drive (and possibly raid5). I've switched to a standalone drive and flush times are now maybe a couple of seconds, The issue now is that I still get the crash when closing down the dirfile with fragments as opposed to having them all in a single directory. I'll keep investigation but any help is appreciated. Cheers Ross On Tue, Mar 22, 2011 at 11:59 AM, Ross Williamson <ros...@gm...> wrote: > Hi Don, > > So the gd_metaflush call is relatively quick. If I change that to a > gd_flush(D_,NULL) then it hangs there. It actually takes 45 seconds to > flush the data but it only crashes when it starts to call the > gd_close(). The screen dumps the following error: > > *** glibc detected *** ./bin/sptControl: double free or corruption > (fasttop): 0x....... *** > > Plus a much longer backtrace if it helps but I'm assuming that's from my code. > > Interestingly, if I don't have fragments it takes a longer (55 > seconds) but does not cause the crash - The close is very quick once > flushed. > > Any thoughts? > > Cheers > > Ross > > On Mon, Mar 21, 2011 at 11:49 PM, D. V. Wiebe <ge...@ke...> wrote: >> On Mon, Mar 21, 2011 at 10:16:32PM -0500, Ross Williamson wrote: >>> Hi Everyone, >>> >>> Sorry a few more questions: >>> >>> 1) I think the general philosophy of how I'm implementing dirfiles is >>> wrong. I currently open all my fields (about 2000 of them), write to >>> them every few seconds with gd_putdata and about every 30 minutes >>> attempt to close down the current dirfile (see 2) and then open a new >>> one. The reason for closing down the dirfile and opening another is >>> purely a hangover from a previous system but I think it's something >>> that would be useful. Should I be thinking more along the lines of >>> opening a single dirfile and just keep on writing to it indefinitely. >>> i.e. do not try and split my data into separate dirfiles - just keep >>> appending the one that is there. >> >> There's nothing inherently bad about chunking up your dirfile into >> different time slices, but so long as you don't run into filesystem >> limits on the raw data files, you shouldn't have problems with just >> making a bigger dirfile. (If I got the largefile support right in >> the library, on an ext3 partition the limit should be 2**32 blocks >> (ie. 2 Terabytes for a 4kB blocksize) per raw data file.) It's really >> up to you, and how you want to structure your analysis software. >> >> All the data acquired during the BLAST06 flight ended up in a single >> dirfile, which was, ~12 days long @ 100 Hz, resulting in about 100 >> megasamples per bolometer channel. We found it convenient. >> >>> 2) When I do try and close my dirfile using gd_close() the flush takes >>> a long time (especially when using fragments) - about 10ish seconds. >>> This is actually too long for my code and it crashes (something I can >>> possibly fix in my code). Is this normal or am I screwing up >>> somewhere? >> >> That sounds surprisingly long, although it may just have to do with >> flushing 2000 large files to disk. I'd be curious to know if the delay >> occurs when writing the metadata to disk (almost certainly inefficient >> code in the library) or the data itself (which might just be a >> filesystem imposed data transfer limit). >> >> Can you time a call to: >> >> gd_metaflush(dirfile); >> >> immediately before: >> >> gd_close(dirfile); >> >> (time that too) and let me know the results? 
The sum of these two calls >> should equal the time for gd_close() alone (since gd_close() calls >> gd_metaflush() internally, and this second call to gd_metaflush() will do >> nothing if you call it explicitly first). >> >>> 3) How did you BLAST guys deal with your bololometer data? I currently >>> just have about 1500 separate fields each of which has 200 samples per >>> frame. Sound right? >> >> Basically. BLAST had ~288 bolometers, sampled at 100Hz and assembled >> into 5Hz frames (the rate of the housekeeping data) with 20 samples per >> frame. Each bolometer was written to a separate field as 24-bit >> integers, extended to 32-bits. >> >> Cheers, >> -dvw >> -- >> D. V. Wiebe >> ge...@ke... >> http://getdata.sourceforge.net/ >> > > > > -- > Ross Williamson > University of Chicago > Department of Astronomy & Astrophysics > 773-834-9785 (office) > 312-504-3051 (Cell) > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Ross W. <ros...@gm...> - 2011-03-22 17:00:31
Hi Don, So the gd_metaflush call is relatively quick. If I change that to a gd_flush(D_,NULL) then it hangs there. It actually takes 45 seconds to flush the data but it only crashes when it starts to call the gd_close(). The screen dumps the following error: *** glibc detected *** ./bin/sptControl: double free or corruption (fasttop): 0x....... *** Plus a much longer backtrace if it helps but I'm assuming that's from my code. Interestingly, if I don't have fragments it takes a longer (55 seconds) but does not cause the crash - The close is very quick once flushed. Any thoughts? Cheers Ross On Mon, Mar 21, 2011 at 11:49 PM, D. V. Wiebe <ge...@ke...> wrote: > On Mon, Mar 21, 2011 at 10:16:32PM -0500, Ross Williamson wrote: >> Hi Everyone, >> >> Sorry a few more questions: >> >> 1) I think the general philosophy of how I'm implementing dirfiles is >> wrong. I currently open all my fields (about 2000 of them), write to >> them every few seconds with gd_putdata and about every 30 minutes >> attempt to close down the current dirfile (see 2) and then open a new >> one. The reason for closing down the dirfile and opening another is >> purely a hangover from a previous system but I think it's something >> that would be useful. Should I be thinking more along the lines of >> opening a single dirfile and just keep on writing to it indefinitely. >> i.e. do not try and split my data into separate dirfiles - just keep >> appending the one that is there. > > There's nothing inherently bad about chunking up your dirfile into > different time slices, but so long as you don't run into filesystem > limits on the raw data files, you shouldn't have problems with just > making a bigger dirfile. (If I got the largefile support right in > the library, on an ext3 partition the limit should be 2**32 blocks > (ie. 2 Terabytes for a 4kB blocksize) per raw data file.) It's really > up to you, and how you want to structure your analysis software. > > All the data acquired during the BLAST06 flight ended up in a single > dirfile, which was, ~12 days long @ 100 Hz, resulting in about 100 > megasamples per bolometer channel. We found it convenient. > >> 2) When I do try and close my dirfile using gd_close() the flush takes >> a long time (especially when using fragments) - about 10ish seconds. >> This is actually too long for my code and it crashes (something I can >> possibly fix in my code). Is this normal or am I screwing up >> somewhere? > > That sounds surprisingly long, although it may just have to do with > flushing 2000 large files to disk. I'd be curious to know if the delay > occurs when writing the metadata to disk (almost certainly inefficient > code in the library) or the data itself (which might just be a > filesystem imposed data transfer limit). > > Can you time a call to: > > gd_metaflush(dirfile); > > immediately before: > > gd_close(dirfile); > > (time that too) and let me know the results? The sum of these two calls > should equal the time for gd_close() alone (since gd_close() calls > gd_metaflush() internally, and this second call to gd_metaflush() will do > nothing if you call it explicitly first). > >> 3) How did you BLAST guys deal with your bololometer data? I currently >> just have about 1500 separate fields each of which has 200 samples per >> frame. Sound right? > > Basically. BLAST had ~288 bolometers, sampled at 100Hz and assembled > into 5Hz frames (the rate of the housekeeping data) with 20 samples per > frame. 
Each bolometer was written to a separate field as 24-bit > integers, extended to 32-bits. > > Cheers, > -dvw > -- > D. V. Wiebe > ge...@ke... > http://getdata.sourceforge.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: D. V. W. <ge...@ke...> - 2011-03-22 04:49:33
On Mon, Mar 21, 2011 at 10:16:32PM -0500, Ross Williamson wrote:
> Hi Everyone,
>
> Sorry a few more questions:
>
> 1) I think the general philosophy of how I'm implementing dirfiles is
> wrong. I currently open all my fields (about 2000 of them), write to
> them every few seconds with gd_putdata and about every 30 minutes
> attempt to close down the current dirfile (see 2) and then open a new
> one. The reason for closing down the dirfile and opening another is
> purely a hangover from a previous system but I think it's something
> that would be useful. Should I be thinking more along the lines of
> opening a single dirfile and just keep on writing to it indefinitely?
> i.e. do not try and split my data into separate dirfiles - just keep
> appending to the one that is there.

There's nothing inherently bad about chunking up your dirfile into
different time slices, but so long as you don't run into filesystem
limits on the raw data files, you shouldn't have problems with just
making a bigger dirfile. (If I got the largefile support right in
the library, on an ext3 partition the limit should be 2**32 blocks
(ie. 2 Terabytes for a 4kB blocksize) per raw data file.) It's really
up to you, and how you want to structure your analysis software.

All the data acquired during the BLAST06 flight ended up in a single
dirfile, which was ~12 days long @ 100 Hz, resulting in about 100
megasamples per bolometer channel. We found it convenient.

> 2) When I do try and close my dirfile using gd_close() the flush takes
> a long time (especially when using fragments) - about 10ish seconds.
> This is actually too long for my code and it crashes (something I can
> possibly fix in my code). Is this normal or am I screwing up
> somewhere?

That sounds surprisingly long, although it may just have to do with
flushing 2000 large files to disk. I'd be curious to know if the delay
occurs when writing the metadata to disk (almost certainly inefficient
code in the library) or the data itself (which might just be a
filesystem imposed data transfer limit).

Can you time a call to:

  gd_metaflush(dirfile);

immediately before:

  gd_close(dirfile);

(time that too) and let me know the results? The sum of these two calls
should equal the time for gd_close() alone (since gd_close() calls
gd_metaflush() internally, and this second call to gd_metaflush() will do
nothing if you call it explicitly first).

> 3) How did you BLAST guys deal with your bolometer data? I currently
> just have about 1500 separate fields each of which has 200 samples per
> frame. Sound right?

Basically. BLAST had ~288 bolometers, sampled at 100Hz and assembled
into 5Hz frames (the rate of the housekeeping data) with 20 samples per
frame. Each bolometer was written to a separate field as 24-bit
integers, extended to 32-bits.

Cheers,
-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
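A sketch of the timing being requested above, assuming the standard GetData C API; the dirfile path is a placeholder, and in the archiver the handle would already exist:

#include <stdio.h>
#include <time.h>
#include <getdata.h>

static double now(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
  DIRFILE *dirfile = gd_open("/data/current_arc", GD_RDWR | GD_VERBOSE);
  double t0, t1, t2;

  /* ... gd_putdata() calls would normally happen here ... */

  t0 = now();
  gd_metaflush(dirfile);   /* writes the format metadata only */
  t1 = now();
  gd_close(dirfile);       /* flushes the data and frees the DIRFILE */
  t2 = now();

  printf("gd_metaflush: %.3f s, remaining gd_close: %.3f s\n",
         t1 - t0, t2 - t1);
  return 0;
}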
From: Ross W. <ros...@gm...> - 2011-03-22 03:18:06
Hi Everyone,

Sorry a few more questions:

1) I think the general philosophy of how I'm implementing dirfiles is
wrong. I currently open all my fields (about 2000 of them), write to
them every few seconds with gd_putdata and about every 30 minutes
attempt to close down the current dirfile (see 2) and then open a new
one. The reason for closing down the dirfile and opening another is
purely a hangover from a previous system but I think it's something
that would be useful. Should I be thinking more along the lines of
opening a single dirfile and just keep on writing to it indefinitely?
i.e. do not try and split my data into separate dirfiles - just keep
appending to the one that is there.

2) When I do try and close my dirfile using gd_close() the flush takes
a long time (especially when using fragments) - about 10ish seconds.
This is actually too long for my code and it crashes (something I can
possibly fix in my code). Is this normal or am I screwing up somewhere?

3) How did you BLAST guys deal with your bolometer data? I currently
just have about 1500 separate fields each of which has 200 samples per
frame. Sound right?

Cheers for all your help.

Ross

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
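For concreteness, a sketch of the 30-minute rotation pattern described in question 1, using the standard GetData C API. The path scheme, field names, rates, and field count are illustrative, and only one detector field is written per pass for brevity:

#include <stdio.h>
#include <time.h>
#include <getdata.h>

#define SPF  200     /* samples per frame for the detector fields */
#define NDET 1500    /* number of detector fields                 */

/* Create one dirfile "chunk" and define its fields. */
static DIRFILE *open_chunk(time_t t)
{
  char path[256], field[32];
  int i;
  DIRFILE *D;

  snprintf(path, sizeof(path), "/data/arc_%ld", (long)t);
  D = gd_open(path, GD_RDWR | GD_CREAT | GD_EXCL | GD_VERBOSE);

  for (i = 0; i < NDET; i++) {
    snprintf(field, sizeof(field), "det%04d", i);
    gd_add_raw(D, field, GD_FLOAT32, SPF, 0);
  }
  gd_add_raw(D, "utc", GD_FLOAT64, 1, 0);
  return D;
}

int main(void)
{
  time_t start = time(NULL);
  DIRFILE *D = open_chunk(start);
  off_t frame = 0;
  float det_buf[SPF] = {0};
  double utc;

  for (;;) {               /* runs until killed, like the archiver thread */
    /* ...fill det_buf (and the other detector buffers) from the DAQ... */
    utc = (double)time(NULL);

    /* the real code would loop over all NDET fields here */
    gd_putdata(D, "det0000", frame, 0, 1, 0, GD_FLOAT32, det_buf);
    gd_putdata(D, "utc", frame, 0, 1, 0, GD_FLOAT64, &utc);
    frame++;

    if (time(NULL) - start >= 30 * 60) {  /* rotate every 30 minutes */
      gd_close(D);                        /* flushes data and metadata */
      start = time(NULL);
      D = open_chunk(start);
      frame = 0;
    }
    /* sleep until the next frame of data is ready */
  }
}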
From: Michael M. <mmi...@as...> - 2011-03-21 18:17:49
You might actually just have too many open files. What does
"ulimit -n" report? On many systems the default maximum is 1024.
Not sure why it would delay complaining, though.

In the (common) case that you are on a *nix platform that uses PAM,
you can configure this limit via /etc/security/limits.conf.

...Milligan

On Mon, Mar 21, 2011 at 12:17:13PM -0500, Ross Williamson wrote:
> Hi Everyone
>
> So I'm now stuck with an issue where I'm receiving an error regarding
> "too many open files" and I'm convinced I'm not using the API
> correctly.
>
> I create a Dirfile and then add about 2000 different field_codes for
> various things (located in different fragments). About every 10
> seconds I dump the data into the fields using gd_putdata. It works for
> the first x (not sure exactly how many) and then get_data returns the
> "too many open files error". What is the correct way to put data into
> so many fields?
>
> Thanks
>
> Ross
>
> --
> Ross Williamson
> University of Chicago
> Department of Astronomy & Astrophysics
> 773-834-9785 (office)
> 312-504-3051 (Cell)
>
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> getdata-devel mailing list
> get...@li...
> https://lists.sourceforge.net/lists/listinfo/getdata-devel

--
Key fingerprint = 9F6B E8F5 206F 35E9 FABB 9EAD 398D CD42 D1CE 8C87
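The same limit can also be inspected (and, up to the hard limit, raised) from inside the program with the standard POSIX resource-limit calls; the 4096 target below is just an example value:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl)) {
    perror("getrlimit");
    return 1;
  }
  printf("open-file limit: soft=%lu hard=%lu\n",
         (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

  /* A dirfile with ~2000 RAW fields needs at least that many descriptors
   * once every field has been written to, plus whatever else the process
   * has open.  4096 is just an example target. */
  if (rl.rlim_cur < 4096 &&
      (rl.rlim_max == RLIM_INFINITY || rl.rlim_max >= 4096)) {
    rl.rlim_cur = 4096;
    if (setrlimit(RLIMIT_NOFILE, &rl))
      perror("setrlimit");
    else
      puts("soft limit raised to 4096");
  }
  return 0;
}

This only moves the soft limit up to the hard limit; raising the hard limit itself still requires limits.conf or root.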
From: Ross W. <ros...@gm...> - 2011-03-21 18:11:57
Sweet - Thanks Changed entry in /etc/security/limits.conf required a log out/log in to take effect Ross On Mon, Mar 21, 2011 at 12:31 PM, Michael Milligan <mmi...@as...> wrote: > You might actually just have too many open files. What does > "ulimit -n" report? On many systems the default maximum is 1024. > Not sure why it would delay complaining, though. > > In the (common) case that you are on a *nix platform that uses PAM, > you can configure this limit via /etc/security/limits.conf. > > ...Milligan > > On Mon, Mar 21, 2011 at 12:17:13PM -0500, Ross Williamson wrote: >> Hi Everyone >> >> So I'm now stuck with an issue where I'm receiving an error regarding >> "too many open files" and I'm convinced I'm not using the API >> correctly. >> >> I create a Dirfile and then add about 2000 different field_codes for >> various things (located in different fragments). About every 10 >> seconds I dump the data into the fields using gd_putdata. It works for >> the first x (not sure exactly how many) and then get_data returns the >> "too many open files error". What is the correct way to put data into >> so many fields? >> >> Thanks >> >> Ross >> >> -- >> Ross Williamson >> University of Chicago >> Department of Astronomy & Astrophysics >> 773-834-9785 (office) >> 312-504-3051 (Cell) >> >> ------------------------------------------------------------------------------ >> Colocation vs. Managed Hosting >> A question and answer guide to determining the best fit >> for your organization - today and in the future. >> http://p.sf.net/sfu/internap-sfd2d >> _______________________________________________ >> getdata-devel mailing list >> get...@li... >> https://lists.sourceforge.net/lists/listinfo/getdata-devel > > -- > Key fingerprint = 9F6B E8F5 206F 35E9 FABB 9EAD 398D CD42 D1CE 8C87 > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAk2Hi4gACgkQOY3NQtHOjIdYowCfVzJsX3Oe2Di+Hk4XOr4OmC14 > /EwAnjiwl5iwhO62GcFks5xm71s6qRww > =ApZe > -----END PGP SIGNATURE----- > > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Matthew D T. <ma...@tr...> - 2011-03-21 17:29:24
> So I'm now stuck with an issue where I'm receiving an error regarding
> "too many open files" and I'm convinced I'm not using the API
> correctly.
>
> I create a Dirfile and then add about 2000 different field_codes for
> various things (located in different fragments). About every 10
> seconds I dump the data into the fields using gd_putdata. It works for
> the first x (not sure exactly how many) and then get_data returns the
> "too many open files error". What is the correct way to put data into
> so many fields?

We run into this on BLAST as well, although we don't have quite that
many open files. You've hit the Linux open file descriptor limit. My
guess is that the output of `ulimit -n` lists about the number that get
written.

The way to increase this limit varies slightly between distributions,
but you should be able to google how to do it relatively easily. If you
use Fedora (and derivatives), or (a quick google makes me think) if you
use Ubuntu (and derivatives), you can change the maximum via the file
/etc/security/limits.conf (which may require a reboot).

--
"One in every seven days is a Thursday."
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Ross W. <ros...@gm...> - 2011-03-21 17:17:40
Hi Everyone

So I'm now stuck with an issue where I'm receiving an error regarding
"too many open files" and I'm convinced I'm not using the API correctly.

I create a Dirfile and then add about 2000 different field_codes for
various things (located in different fragments). About every 10 seconds
I dump the data into the fields using gd_putdata. It works for the first
x (not sure exactly how many) and then GetData returns the "too many open
files" error. What is the correct way to put data into so many fields?

Thanks

Ross

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
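When a call does fail this way, GetData reports it through its error interface rather than by aborting; a sketch of checking that state after each call (the dirfile path and field name are placeholders):

#include <stdio.h>
#include <getdata.h>

/* Print GetData's error state after a call; 0 means GD_E_OK. */
static int report(DIRFILE *D, const char *what)
{
  if (gd_error(D)) {
    char buf[1024];
    fprintf(stderr, "%s: %s\n", what, gd_error_string(D, buf, sizeof(buf)));
    return -1;
  }
  return 0;
}

int main(void)
{
  double sample = 42.0;
  DIRFILE *D = gd_open("/data/testdir", GD_RDWR | GD_CREAT);
  report(D, "gd_open");

  gd_add_raw(D, "status", GD_FLOAT64, 1, 0);
  report(D, "gd_add_raw");

  gd_putdata(D, "status", 0, 0, 0, 1, GD_FLOAT64, &sample);
  if (report(D, "gd_putdata"))  /* e.g. "too many open files" shows up here */
    return 1;

  gd_close(D);
  return 0;
}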
From: Matthew T. <ma...@tr...> - 2011-03-18 21:46:25
"D. V. Wiebe" <ge...@ke...> wrote: >On Fri, Mar 18, 2011 at 03:42:15PM -0500, Ross Williamson wrote: >> Yes I can - I was just using that as an example name but I do have >> other fields (status for example) that share the same names. >> >> I can fudge it though, > >Okay, I got it. It's not a crazy idea. > >Way back when we first introduced includes we considered this problem. >Someone (Matt?) suggested allowing a optional prefix (or suffix?) to be >attached to fields defined in a fragment. So you could include, say, >"foo/format" and tell GetData to prepend "foo_" to all the field names. >And then you could add "bar/format" and tell GetData to prepend "bar_", >&c. allowing you to have the same field names in foo/ and bar/. > >It never got implemented for a variety of reasons. Not the smallest >being that this was before GetData had blebbed off from kst and >changing >the metadata parser always ran the risk of losing the ability to read >old dirfiles. (The modern parser is significantly more robust.) But >also, we never ran into a situation where it was the easiest solution >around the problem. > >However, if you think it would be useful, I could look into >implementing >it. Since we were thinking about it way back then, I think some of the >framework is there, if it hasn't atrophied. > >Let me know, >-dvw >-- >D. V. Wiebe >ge...@ke... >http://getdata.sourceforge.net/ >------------------------------------------------------------------------------ >Colocation vs. Managed Hosting >A question and answer guide to determining the best fit >for your organization - today and in the future. >http://p.sf.net/sfu/internap-sfd2d_______________________________________________ >getdata-devel mailing list >get...@li... >https://lists.sourceforge.net/lists/listinfo/getdata-devel Actually, with current BLAST analysis, this could be useful. Although it might be too late for us this time. But I'll vote for it. -- Mathew Truch ma...@tr... |
From: D. V. W. <ge...@ke...> - 2011-03-18 21:34:58
On Fri, Mar 18, 2011 at 03:42:15PM -0500, Ross Williamson wrote:
> Yes I can - I was just using that as an example name but I do have
> other fields (status for example) that share the same names.
>
> I can fudge it though,

Okay, I got it. It's not a crazy idea.

Way back when we first introduced includes we considered this problem.
Someone (Matt?) suggested allowing an optional prefix (or suffix?) to be
attached to fields defined in a fragment. So you could include, say,
"foo/format" and tell GetData to prepend "foo_" to all the field names.
And then you could add "bar/format" and tell GetData to prepend "bar_",
&c. allowing you to have the same field names in foo/ and bar/.

It never got implemented for a variety of reasons. Not the smallest
being that this was before GetData had blebbed off from kst and changing
the metadata parser always ran the risk of losing the ability to read
old dirfiles. (The modern parser is significantly more robust.) But
also, we never ran into a situation where it was the easiest solution
around the problem.

However, if you think it would be useful, I could look into implementing
it. Since we were thinking about it way back then, I think some of the
framework is there, if it hasn't atrophied.

Let me know,
-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
From: Ross W. <ros...@gm...> - 2011-03-18 20:42:42
Yes I can - I was just using that as an example name but I do have
other fields (status for example) that share the same names.

I can fudge it though,

Ross

On Fri, Mar 18, 2011 at 3:26 PM, D. V. Wiebe <ge...@ke...> wrote:
> On Fri, Mar 18, 2011 at 03:16:49PM -0500, Ross Williamson wrote:
>> Ah ok great - that makes sense.
>>
>> So I've run into a little problem though with fragments. It looks
>> like you can't have the same name of a field (i.e. utc) even if they
>> are part of a different fragment. I have lot's of directories that
>> have utc as their timestamp and it won't let me create those using
>> gd_add_raw where the dirfile is the top level and I'm referencing the
>> index to the fragment.
>>
>> Am I doing something wrong? I'd rather not have individual top level
>> dirfile instances for each subdirectory
>>
>> Ross
>
> As it is, all dirfile fields share the same namespace, regardless of
> where they're defined.
>
> I don't understand what you're trying to do. Don't all those utc fields
> have the same data in them? So can't you get by with just one?
>
> -dvw
> --
> D. V. Wiebe
> ge...@ke...
> http://getdata.sourceforge.net/

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
From: D. V. W. <ge...@ke...> - 2011-03-18 20:27:06
On Fri, Mar 18, 2011 at 03:16:49PM -0500, Ross Williamson wrote:
> Ah ok great - that makes sense.
>
> So I've run into a little problem though with fragments. It looks
> like you can't have the same name of a field (i.e. utc) even if they
> are part of a different fragment. I have lot's of directories that
> have utc as their timestamp and it won't let me create those using
> gd_add_raw where the dirfile is the top level and I'm referencing the
> index to the fragment.
>
> Am I doing something wrong? I'd rather not have individual top level
> dirfile instances for each subdirectory
>
> Ross

As it is, all dirfile fields share the same namespace, regardless of
where they're defined.

I don't understand what you're trying to do. Don't all those utc fields
have the same data in them? So can't you get by with just one?

-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
From: Ross W. <ros...@gm...> - 2011-03-18 20:17:16
Ah ok great - that makes sense. So I've run into a little problem though with fragments. It looks like you can't have the same name of a field (i.e. utc) even if they are part of a different fragment. I have lot's of directories that have utc as their timestamp and it won't let me create those using gd_add_raw where the dirfile is the top level and I'm referencing the index to the fragment. Am I doing something wrong? I'd rather not have individual top level dirfile instances for each subdirectory Ross On Mon, Mar 14, 2011 at 7:36 PM, D. V. Wiebe <ge...@ke...> wrote: > On Mon, Mar 14, 2011 at 06:15:50PM -0500, Ross Williamson wrote: >> Awesome thanks for that. >> >> I have one more question. I'm missing the reason why fragments are >> useful - Is it just to split up a potentially large format file? >> >> Cheers >> >> Ross > > In practice, large format files aren't an issue. I've never encountered > one too big. (It'd have to be >2Gb, which would be a lot of "format"). > > Basically, fragments make the database modular. We invented fragments > when we started to analyse BLAST data. In this situation, different > people are working on different parts of the data reduction. Someone > (say, me) who wanted to participate in the analysis would need to collect > deconvolved detectors from Person D, calibration timestreams from Person > M, pointing solution timestreams from Person G, &c. Fragments meant I > could organise an analysis dirfile like this: > > - blast_data > +- format > +- deconvoled_bolos > | +- decon_ch1_rev7 > | +- decon_ch2_rev7 > | +- decon_ch3_rev7 > | `- format > +- pointing > | +- ra_rev8 > | +- dec_rev8 > | +- roll_rev8 > | `- format > `- calibration > +- calib_ch1_rev3 > +- calib_ch2_rev3 > +- calib_ch3_rev3 > `- format > > and have the top level format file be just three INCLUDE directives to > the subdirfiles. The benefit of doing this is that later, when Person > M makes a new calibration ("rev4"), he tars up a new "calibration" > directory, including a format file with all the necessary metadata, > and all I have to to is delete my curent "calibration" directory > and untar the new one from Person M in its place, and I'm ready > to go. > > Before inventing this, a new calibration or whatever entailed adding > all the new data files to the dirfile directory and then editing the > format file (by hand! -- this was before the GetData library could > deal with modifying metadata) to replace all the definitions of > "calib_*_rev3" with "calib_*_rev4". It quickly got tiring. > > That is, really inventing fragments were a way of pulling in > subdirectories into a parent dirfile directory. Being able to > include another format file fragment in the *same* directory was > just syntactic sugar. > > Make sense? > -dvw > -- > D. V. Wiebe > ge...@ke... > http://getdata.sourceforge.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
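A sketch of the name collision raised here, and of working around it by prefixing field names by hand (the "fudge" mentioned in the later replies above). The GetData calls are the standard C API; the directory layout and names are invented for illustration:

#include <stdio.h>
#include <sys/stat.h>
#include <getdata.h>

int main(void)
{
  mkdir("arc", 0777);
  mkdir("arc/receiver", 0777);
  mkdir("arc/pointing", 0777);

  DIRFILE *D = gd_open("arc", GD_RDWR | GD_CREAT | GD_VERBOSE);
  int frag_rx  = gd_include(D, "receiver/format", 0, GD_CREAT);
  int frag_ptg = gd_include(D, "pointing/format", 0, GD_CREAT);

  /* All fields share one namespace, so two fragments cannot both define
   * "utc" -- the second of these would fail with a duplicate-field error:
   *   gd_add_raw(D, "utc", GD_FLOAT64, 1, frag_rx);
   *   gd_add_raw(D, "utc", GD_FLOAT64, 1, frag_ptg);
   * Prefixing the names per subsystem keeps them distinct: */
  gd_add_raw(D, "receiver_utc", GD_FLOAT64, 1, frag_rx);
  gd_add_raw(D, "pointing_utc", GD_FLOAT64, 1, frag_ptg);

  gd_close(D);
  return 0;
}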
From: Matthew D T. <ma...@tr...> - 2011-03-18 03:13:59
> -----------
> fix date of changlog entry
>
> Modified Paths:
> --------------
>   trunk/getdata/ChangeLog
>
> Modified: trunk/getdata/ChangeLog
> ===================================================================
> --- trunk/getdata/ChangeLog 2011-03-17 22:49:29 UTC (rev 520)
> +++ trunk/getdata/ChangeLog 2011-03-17 22:49:43 UTC (rev 521)
> @@ -1,4 +1,4 @@
> -2010-12-13 Peter Kümmel <syn...@gm...>
> +2010-03-17 Peter Kümmel <syn...@gm...>
>   * use _stat64 and struct _stat64 with msvc
>   * fix tests by removing the content of dirfile
>   * guard definitions of macros in C++ binding

Since you are correcting the date, you might make note that it's 2011. ;-)

--
"If you have only seen it once, then you haven't seen it twice."
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Peter K. <syn...@gm...> - 2011-03-15 23:28:29
On 26.02.2011 07:00, D. V. Wiebe wrote:
> On Sun, Feb 13, 2011 at 02:24:40PM +0100, Peter Kümmel wrote:
>> When I use --prefix /opt/local under OSX
>> libgetdata++.dylib links against a libgetata
>> in /usr/local which does not exists:
>>
>> otool -L /opt/local/libgetata++.dylib
>>
>> I don't know where the user local comes from but in
>> binding/cxx/.lib/libgetdata++.lai I found two
>> /usr/local entries which lock responsible for the
>> wrong path.
>>
>> Peter
>
> I suspect you changed --prefix, but didn't do a "make clean" before
> running "make" again. It's a very lame feature of libtool: it
> hardcodes --prefix into .la files, but doesn't change them when you
> modifiy the prefix unless you blow them away to force make to re-create
> them.
>
> I don't understand why that is, possibly my inability to use libtool...
>
> If it persists after a "make clean", that's very weird. Let me know.

Yes, a clean rebuild solved it.

Peter
From: D. V. W. <ge...@ke...> - 2011-03-15 00:37:10
On Mon, Mar 14, 2011 at 06:15:50PM -0500, Ross Williamson wrote:
> Awesome thanks for that.
>
> I have one more question. I'm missing the reason why fragments are
> useful - Is it just to split up a potentially large format file?
>
> Cheers
>
> Ross

In practice, large format files aren't an issue. I've never encountered
one too big. (It'd have to be >2Gb, which would be a lot of "format").

Basically, fragments make the database modular. We invented fragments
when we started to analyse BLAST data. In this situation, different
people are working on different parts of the data reduction. Someone
(say, me) who wanted to participate in the analysis would need to collect
deconvolved detectors from Person D, calibration timestreams from Person
M, pointing solution timestreams from Person G, &c. Fragments meant I
could organise an analysis dirfile like this:

  - blast_data
     +- format
     +- deconvolved_bolos
     |   +- decon_ch1_rev7
     |   +- decon_ch2_rev7
     |   +- decon_ch3_rev7
     |   `- format
     +- pointing
     |   +- ra_rev8
     |   +- dec_rev8
     |   +- roll_rev8
     |   `- format
     `- calibration
         +- calib_ch1_rev3
         +- calib_ch2_rev3
         +- calib_ch3_rev3
         `- format

and have the top level format file be just three INCLUDE directives to
the subdirfiles. The benefit of doing this is that later, when Person
M makes a new calibration ("rev4"), he tars up a new "calibration"
directory, including a format file with all the necessary metadata,
and all I have to do is delete my current "calibration" directory
and untar the new one from Person M in its place, and I'm ready to go.

Before inventing this, a new calibration or whatever entailed adding
all the new data files to the dirfile directory and then editing the
format file (by hand! -- this was before the GetData library could
deal with modifying metadata) to replace all the definitions of
"calib_*_rev3" with "calib_*_rev4". It quickly got tiring.

That is, inventing fragments was really a way of pulling subdirectories
into a parent dirfile directory. Being able to include another format
file fragment in the *same* directory was just syntactic sugar.

Make sense?
-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
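From the analysis side, the payoff of this layout is that readers never reference fragments directly. A sketch using the standard GetData C API, with field names taken from the example above and an arbitrary sample count:

#include <stdio.h>
#include <getdata.h>

int main(void)
{
  double calib[100], ra[100];
  DIRFILE *D = gd_open("blast_data", GD_RDONLY | GD_VERBOSE);

  /* Read 100 samples from fields defined in two different fragments;
   * the reader neither knows nor cares which format file defines them. */
  size_t n1 = gd_getdata(D, "calib_ch1_rev3", 0, 0, 0, 100, GD_FLOAT64, calib);
  size_t n2 = gd_getdata(D, "ra_rev8", 0, 0, 0, 100, GD_FLOAT64, ra);

  printf("read %zu calibration and %zu pointing samples\n", n1, n2);
  gd_close(D);
  return 0;
}

Swapping in a new "calibration" tarball, as described above, changes nothing in this reader except the field names it asks for.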