getdata-devel Mailing List for GetData
Scientific Database Format
From: Ross W. <ros...@gm...> - 2011-03-22 22:04:02
Addendum: The reverse loop is actually: for (j = D->n_fragment-1; j>0; j--)

R

On Tue, Mar 22, 2011 at 4:34 PM, Ross Williamson <ros...@gm...> wrote:
> Ok so I took it upon myself to have a go at hacking close.c. I think
> the issue might be the counting of fragments. I have a simple situation
> where there are 4 fragments inside a top level dirfile. I initially
> looked at the value of D->n_fragment which returned a value of 5.
> This seemed odd as there are only 4.
>
> I also thought we might need to deallocate the fragments in reverse
> order - I changed the loop code to (making the mistake that I should
> have done j>=0)
>
> for (j = D->n_fragment; j>0; j--)
>
> Whooo - This worked. Changing it to j>=0 crapped out at index
> fragment[0]. Just to make sure I changed the code to:
>
> for(j=1; j<D->n_fragment; ++j)
>
> and that also worked - so it seems that there is one too many fragments
> in n_fragment and fragment[0] does not exist.
>
> Am I playing with fire here?
>
> Cheers
>
> Ross
>
> On Tue, Mar 22, 2011 at 3:16 PM, D. V. Wiebe <ge...@ke...> wrote:
>> On Tue, Mar 22, 2011 at 03:07:46PM -0500, Ross Williamson wrote:
>>> I was originally using getdata from the repository that kst2 is
>>> distributed with (sorry can't remember off hand). I'm now running off
>>> my own compiled version to get the debugging symbols. Both show the
>>> same error.
>>>
>>> Yeah valgrinding gcp is not something I'm looking forward too :) I'm
>>> going to see if I get the same are with single level nested fragments
>>> rather than double level fragments (i.e. dirfile inside a dirfile
>>> inside a dirfile)
>>>
>>> Cheers
>>>
>>> Ross
>>
>> If you're compiling from source, you could try enabling debugging
>> messages (--enable-debug) which will result in the library printing all
>> sorts of debugging messages on stderr (mostly function traces). If you
>> can endeavour to capture those messages, you could send them to me and
>> I could take a look. It's admittedly an outside chance, but it might
>> give an indication on what's going on.
>>
>> -dvw
>> --
>> D. V. Wiebe
>> ge...@ke...
>> http://getdata.sourceforge.net/
>
> --
> Ross Williamson
> University of Chicago
> Department of Astronomy & Astrophysics
> 773-834-9785 (office)
> 312-504-3051 (Cell)

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
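For reference, a minimal sketch of the loop bounds being discussed. The struct below is a hypothetical stand-in for the fragment list, not GetData's real DIRFILE internals from close.c:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state being discussed; the real DIRFILE
 * internals in close.c are more involved than this. */
struct toy_dirfile {
  int n_fragment;    /* number of entries in fragment[] */
  void **fragment;   /* valid indices: 0 .. n_fragment - 1 */
};

static void free_fragments(struct toy_dirfile *D)
{
  int j;

  /* Correct reverse-order teardown: start at the last valid index and
   * stop after index 0 has been freed. */
  for (j = D->n_fragment - 1; j >= 0; j--) {
    printf("freeing fragment[%d]\n", j);
    free(D->fragment[j]);
  }

  /* "for (j = D->n_fragment; j > 0; j--)" instead reads one element past
   * the end and never frees fragment[0]; it can only appear to work if
   * n_fragment over-counts by one, which is what the thread suspects. */

  free(D->fragment);
  D->fragment = NULL;
  D->n_fragment = 0;
}

int main(void)
{
  struct toy_dirfile D;
  int j;

  D.n_fragment = 4;
  D.fragment = malloc(D.n_fragment * sizeof(*D.fragment));
  for (j = 0; j < D.n_fragment; j++)
    D.fragment[j] = malloc(16);

  free_fragments(&D);
  return 0;
}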
From: Ross W. <ros...@gm...> - 2011-03-22 21:34:58
Ok so I took it upon myself to have a go at hacking close.c. I think the
issue might be the counting of fragments. I have a simple situation where
there are 4 fragments inside a top level dirfile. I initially looked at
the value of D->n_fragment which returned a value of 5. This seemed odd
as there are only 4.

I also thought we might need to deallocate the fragments in reverse
order - I changed the loop code to (making the mistake that I should have
done j>=0)

for (j = D->n_fragment; j>0; j--)

Whooo - This worked. Changing it to j>=0 crapped out at index fragment[0].
Just to make sure I changed the code to:

for(j=1; j<D->n_fragment; ++j)

and that also worked - so it seems that there is one too many fragments in
n_fragment and fragment[0] does not exist.

Am I playing with fire here?

Cheers

Ross

On Tue, Mar 22, 2011 at 3:16 PM, D. V. Wiebe <ge...@ke...> wrote:
> On Tue, Mar 22, 2011 at 03:07:46PM -0500, Ross Williamson wrote:
>> I was originally using getdata from the repository that kst2 is
>> distributed with (sorry can't remember off hand). I'm now running off
>> my own compiled version to get the debugging symbols. Both show the
>> same error.
>>
>> Yeah valgrinding gcp is not something I'm looking forward too :) I'm
>> going to see if I get the same are with single level nested fragments
>> rather than double level fragments (i.e. dirfile inside a dirfile
>> inside a dirfile)
>>
>> Cheers
>>
>> Ross
>
> If you're compiling from source, you could try enabling debugging
> messages (--enable-debug) which will result in the library printing all
> sorts of debugging messages on stderr (mostly function traces). If you
> can endeavour to capture those messages, you could send them to me and
> I could take a look. It's admittedly an outside chance, but it might
> give an indication on what's going on.
>
> -dvw
> --
> D. V. Wiebe
> ge...@ke...
> http://getdata.sourceforge.net/

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
From: Ross W. <ros...@gm...> - 2011-03-22 20:08:28
I was originally using getdata from the repository that kst2 is distributed with (sorry can't remember off hand). I'm now running off my own compiled version to get the debugging symbols. Both show the same error. Yeah valgrinding gcp is not something I'm looking forward too :) I'm going to see if I get the same are with single level nested fragments rather than double level fragments (i.e. dirfile inside a dirfile inside a dirfile) Cheers Ross On Tue, Mar 22, 2011 at 2:57 PM, D. V. Wiebe <ge...@ke...> wrote: > On Tue, Mar 22, 2011 at 02:01:58PM -0500, Ross Williamson wrote: >> OK here is the trace with debugging in getdata - I haven't yet (I >> think) got the glibc debugging symbols sorted but it does look like >> the error is close.c in the dirfile code. I'm going to look at the >> getdata code a bit more but any help is appreciated. >> >> Again - note this only happens when I have fragments in the top-level >> (and second-level) dirfiles. >> >> #0 0x00007ffff31baba5 in raise (sig=<value optimized out>) at >> ../nptl/sysdeps/unix/sysv/linux/raise.c:64 >> #1 0x00007ffff31be6b0 in abort () at abort.c:92 >> #2 0x00007ffff31f443b in __libc_message (do_abort=<value optimized >> out>, fmt=<value optimized out>) >> at ../sysdeps/unix/sysv/linux/libc_fatal.c:189 >> #3 0x00007ffff31fe4b6 in malloc_printerr (action=3, >> str=0x7ffff32ceca2 "corrupted double-linked list", ptr=<value >> optimized out>) >> at malloc.c:6283 >> #4 0x00007ffff31fe961 in malloc_consolidate (av=<value optimized >> out>) at malloc.c:5169 >> #5 0x00007ffff3201350 in _int_free (av=0x7ffff3505e40, p=0x1c66110) >> at malloc.c:5034 >> #6 0x00007ffff3204c83 in __libc_free (mem=<value optimized out>) at >> malloc.c:3738 >> #7 0x00007ffff43b841b in _GD_FreeD (D=0xc069e0, flush_meta=<value >> optimized out>) at close.c:46 >> #8 _GD_ShutdownDirfile (D=0xc069e0, flush_meta=<value optimized out>) >> at close.c:93 >> #9 0x00007ffff7b688e1 in >> gcp::control::ArchiverWriterDirfile::closeArcfile (this=0x6878e0) at >> ArchiverWriterDirfile.c:296 >> #10 0x00007ffff7b67d6b in >> gcp::control::ArchiverWriterDirfile::openArcfile (this=0x6878e0, >> dir=0xc06b40 "/home/rw247/arc") >> at ArchiverWriterDirfile.c:190 >> #11 0x00007ffff7b64d4b in arc_save_integration (arc=0x684e60) at archiver.c:1212 >> #12 0x00007ffff7b63e6e in archiver_thread (arg=0x684e60) at archiver.c:899 >> #13 0x00007ffff5330971 in start_thread (arg=<value optimized out>) at >> pthread_create.c:304 >> #14 0x00007ffff326d92d in clone () at >> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 >> #15 0x0000000000000000 in ?? () > > Well, the symptom is certainly in gd_close(), when GetData tries to > free the list of fields while destroying the DIRFILE object (resulting > in that double-free message). But the corruption is presumably happening > earlier and going undetected. > > As Ted points out, running gcp under valgrind would likely catch the > culprit, regardless of whether it's happening in GetData or gcp. > (Although I realise valgrinding gcp can be daunting.) I periodically > run valgrind on the GetData test-suite, but I'm sure the suite does not > cover the whole library code. > > Which version of GetData are you using? Did you compile from source, or > get it from a distribution? > > -dvw > -- > D. V. Wiebe > ge...@ke... > http://getdata.sourceforge.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
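A standalone open/close loop along these lines can help isolate whether the crash needs gcp at all. This is only a sketch against the standard GetData C API; the directory names, field name, and iteration count are made up, and the include and field are only added on the first pass:

#include <stdio.h>
#include <sys/stat.h>
#include <getdata.h>

int main(void)
{
  int i;

  /* Layout assumed here: testdir/ is the dirfile, testdir/sub/format is
   * a fragment INCLUDEd from the top-level format file. */
  mkdir("testdir", 0777);
  mkdir("testdir/sub", 0777);

  for (i = 0; i < 100; i++) {
    DIRFILE *D = gd_open("testdir", GD_RDWR | GD_CREAT | GD_VERBOSE);

    if (gd_nfragments(D) < 2)    /* add the include only on the first pass */
      gd_include(D, "sub/format", 0, GD_CREAT);
    if (gd_nfields(D) < 2)       /* the INDEX field is always present */
      gd_add_raw(D, "utc", GD_FLOAT64, 1, 1);

    if (gd_close(D)) {
      fprintf(stderr, "gd_close failed on iteration %d\n", i);
      return 1;
    }
  }

  puts("100 open/close cycles without a crash");
  return 0;
}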
From: D. V. W. <ge...@ke...> - 2011-03-22 19:57:43
On Tue, Mar 22, 2011 at 02:01:58PM -0500, Ross Williamson wrote: > OK here is the trace with debugging in getdata - I haven't yet (I > think) got the glibc debugging symbols sorted but it does look like > the error is close.c in the dirfile code. I'm going to look at the > getdata code a bit more but any help is appreciated. > > Again - note this only happens when I have fragments in the top-level > (and second-level) dirfiles. > > #0 0x00007ffff31baba5 in raise (sig=<value optimized out>) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:64 > #1 0x00007ffff31be6b0 in abort () at abort.c:92 > #2 0x00007ffff31f443b in __libc_message (do_abort=<value optimized > out>, fmt=<value optimized out>) > at ../sysdeps/unix/sysv/linux/libc_fatal.c:189 > #3 0x00007ffff31fe4b6 in malloc_printerr (action=3, > str=0x7ffff32ceca2 "corrupted double-linked list", ptr=<value > optimized out>) > at malloc.c:6283 > #4 0x00007ffff31fe961 in malloc_consolidate (av=<value optimized > out>) at malloc.c:5169 > #5 0x00007ffff3201350 in _int_free (av=0x7ffff3505e40, p=0x1c66110) > at malloc.c:5034 > #6 0x00007ffff3204c83 in __libc_free (mem=<value optimized out>) at > malloc.c:3738 > #7 0x00007ffff43b841b in _GD_FreeD (D=0xc069e0, flush_meta=<value > optimized out>) at close.c:46 > #8 _GD_ShutdownDirfile (D=0xc069e0, flush_meta=<value optimized out>) > at close.c:93 > #9 0x00007ffff7b688e1 in > gcp::control::ArchiverWriterDirfile::closeArcfile (this=0x6878e0) at > ArchiverWriterDirfile.c:296 > #10 0x00007ffff7b67d6b in > gcp::control::ArchiverWriterDirfile::openArcfile (this=0x6878e0, > dir=0xc06b40 "/home/rw247/arc") > at ArchiverWriterDirfile.c:190 > #11 0x00007ffff7b64d4b in arc_save_integration (arc=0x684e60) at archiver.c:1212 > #12 0x00007ffff7b63e6e in archiver_thread (arg=0x684e60) at archiver.c:899 > #13 0x00007ffff5330971 in start_thread (arg=<value optimized out>) at > pthread_create.c:304 > #14 0x00007ffff326d92d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 > #15 0x0000000000000000 in ?? () Well, the symptom is certainly in gd_close(), when GetData tries to free the list of fields while destroying the DIRFILE object (resulting in that double-free message). But the corruption is presumably happening earlier and going undetected. As Ted points out, running gcp under valgrind would likely catch the culprit, regardless of whether it's happening in GetData or gcp. (Although I realise valgrinding gcp can be daunting.) I periodically run valgrind on the GetData test-suite, but I'm sure the suite does not cover the whole library code. Which version of GetData are you using? Did you compile from source, or get it from a distribution? -dvw -- D. V. Wiebe ge...@ke... http://getdata.sourceforge.net/ |
From: Ross W. <ros...@gm...> - 2011-03-22 19:02:29
OK here is the trace with debugging in getdata - I haven't yet (I think) got the glibc debugging symbols sorted but it does look like the error is close.c in the dirfile code. I'm going to look at the getdata code a bit more but any help is appreciated. Again - note this only happens when I have fragments in the top-level (and second-level) dirfiles. #0 0x00007ffff31baba5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007ffff31be6b0 in abort () at abort.c:92 #2 0x00007ffff31f443b in __libc_message (do_abort=<value optimized out>, fmt=<value optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189 #3 0x00007ffff31fe4b6 in malloc_printerr (action=3, str=0x7ffff32ceca2 "corrupted double-linked list", ptr=<value optimized out>) at malloc.c:6283 #4 0x00007ffff31fe961 in malloc_consolidate (av=<value optimized out>) at malloc.c:5169 #5 0x00007ffff3201350 in _int_free (av=0x7ffff3505e40, p=0x1c66110) at malloc.c:5034 #6 0x00007ffff3204c83 in __libc_free (mem=<value optimized out>) at malloc.c:3738 #7 0x00007ffff43b841b in _GD_FreeD (D=0xc069e0, flush_meta=<value optimized out>) at close.c:46 #8 _GD_ShutdownDirfile (D=0xc069e0, flush_meta=<value optimized out>) at close.c:93 #9 0x00007ffff7b688e1 in gcp::control::ArchiverWriterDirfile::closeArcfile (this=0x6878e0) at ArchiverWriterDirfile.c:296 #10 0x00007ffff7b67d6b in gcp::control::ArchiverWriterDirfile::openArcfile (this=0x6878e0, dir=0xc06b40 "/home/rw247/arc") at ArchiverWriterDirfile.c:190 #11 0x00007ffff7b64d4b in arc_save_integration (arc=0x684e60) at archiver.c:1212 #12 0x00007ffff7b63e6e in archiver_thread (arg=0x684e60) at archiver.c:899 #13 0x00007ffff5330971 in start_thread (arg=<value optimized out>) at pthread_create.c:304 #14 0x00007ffff326d92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #15 0x0000000000000000 in ?? () On Tue, Mar 22, 2011 at 1:07 PM, Matthew D Truch <ma...@tr...> wrote: > On Tue, Mar 22, 2011 at 11:00:27AM -0700, Ted Kisner wrote: >> Can you recompile everything with "-O0 -g" and run in gdb? Also >> useful is to run this "debug copy" of the full software stack inside >> valgrind to look for out-of-bounds memory access. > > You could also look into installing the "debuginfo" packages for glibc > and getdata on your system (if your distribution supports such things). > > -- > "Duct Tape is like the Force. It has a dark side, a light side, and holds the universe together." > -------------------------- > Matthew Truch > Department of Physics and Astronomy > University of Pennsylvania > ma...@tr... > http://matt.truch.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Matthew D T. <ma...@tr...> - 2011-03-22 18:07:43
On Tue, Mar 22, 2011 at 11:00:27AM -0700, Ted Kisner wrote:
> Can you recompile everything with "-O0 -g" and run in gdb? Also
> useful is to run this "debug copy" of the full software stack inside
> valgrind to look for out-of-bounds memory access.

You could also look into installing the "debuginfo" packages for glibc
and getdata on your system (if your distribution supports such things).

--
"Duct Tape is like the Force. It has a dark side, a light side, and holds the universe together."
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Ted K. <tsk...@gm...> - 2011-03-22 18:00:45
Can you recompile everything with "-O0 -g" and run in gdb? Also useful is to run this "debug copy" of the full software stack inside valgrind to look for out-of-bounds memory access. valgrind is very useful for finding double-free's... -Ted On Mar 22, 2011, at 10:56 AM, Ross Williamson wrote: > Yeah - Here is the backtrace and dump. So it sounds like it might > actually be the raid5 and not LVM. hmm that' annoying. > > START METAFLUSH > END METAFLUSH > START CLOSE > *** glibc detected *** ./bin/sptControl: double free or corruption > (fasttop): 0x0000000002320c70 *** > ======= Backtrace: ========= > /lib/libc.so.6(+0x774b6)[0x7fdbebd3f4b6] > /lib/libc.so.6(cfree+0x73)[0x7fdbebd45c83] > /usr/lib/libgetdata.so.4(+0x8617)[0x7fdbecef9617] > /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile12closeArcfileEv+0x1ca)[0x7fdbf04e651a] > /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile11openArcfileEPc+0x45)[0x7fdbf04e7905] > /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_Z15archiver_threadPv+0x6b3)[0x7fdbf04e44b3] > /lib/libpthread.so.0(+0x7971)[0x7fdbede71971] > /lib/libc.so.6(clone+0x6d)[0x7fdbebdae92d] > ======= Memory map: ======== > 00400000-00415000 r-xp 00000000 fc:00 60424210 > /home/rw247/gcpSptpolDevel/gcp/bin/sptControl > 00615000-00616000 r--p 00015000 fc:00 60424210 > /home/rw247/gcpSptpolDevel/gcp/bin/sptControl > 00616000-00617000 rw-p 00016000 fc:00 60424210 > /home/rw247/gcpSptpolDevel/gcp/bin/sptControl > 01bd2000-031d2000 rw-p 00000000 00:00 0 [heap] > 7fdb5a8c8000-7fdbb0000000 rw-p 00000000 00:00 0 > 7fdbb0000000-7fdbb0080000 rw-p 00000000 00:00 0 > 7fdbb0080000-7fdbb4000000 ---p 00000000 00:00 0 > 7fdbb4319000-7fdbb431a000 rw-p 00000000 00:00 0 > 7fdbb431a000-7fdbb431b000 ---p 00000000 00:00 0 > 7fdbb431b000-7fdbb4b1b000 rw-p 00000000 00:00 0 > 7fdbb4b1b000-7fdbb4b1c000 ---p 00000000 00:00 0 > 7fdbb4b1c000-7fdbb531c000 rw-p 00000000 00:00 0 > 7fdbb531c000-7fdbb531d000 ---p 00000000 00:00 0 > 7fdbb531d000-7fdbb5b1d000 rw-p 00000000 00:00 0 > 7fdbb5b1d000-7fdbb5b1e000 ---p 00000000 00:00 0 > 7fdbb5b1e000-7fdbb631e000 rw-p 00000000 00:00 0 > 7fdbb631e000-7fdbb631f000 ---p 00000000 00:00 0 > 7fdbb631f000-7fdbb6b1f000 rw-p 00000000 00:00 0 > 7fdbb6b1f000-7fdbb6b20000 ---p 00000000 00:00 0 > 7fdbb6b20000-7fdbbda4a000 rw-p 00000000 00:00 0 > 7fdbbda4a000-7fdbbda4b000 ---p 00000000 00:00 0 > 7fdbbda4b000-7fdbbe700000 rw-p 00000000 00:00 0 > 7fdbbe779000-7fdbe9793000 rw-p 00000000 00:00 0 > 7fdbe9793000-7fdbe9796000 r-xp 00000000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9796000-7fdbe9995000 ---p 00003000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9995000-7fdbe9996000 r--p 00002000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9996000-7fdbe9997000 rw-p 00003000 08:01 11796565 > /lib/libgpg-error.so.0.4.0 > 7fdbe9997000-7fdbe99a7000 r-xp 00000000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe99a7000-7fdbe9ba6000 ---p 00010000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe9ba6000-7fdbe9ba7000 r--p 0000f000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe9ba7000-7fdbe9ba8000 rw-p 00010000 08:01 2885347 > /usr/lib/libtasn1.so.3.1.9 > 7fdbe9ba8000-7fdbe9baa000 r-xp 00000000 08:01 11796895 > /lib/libkeyutils.so.1.3 > 7fdbe9baa000-7fdbe9da9000 ---p 00002000 08:01 11796895 > /lib/libkeyutils.so.1.3 > 7fdbe9da9000-7fdbe9daa000 r--p 00001000 08:01 11796895 > /lib/libkeyutils.so.1.3 > 7fdbe9daa000-7fdbe9dab000 rw-p 00002000 08:01 11796895 > /lib/libkeyutils.so.1.3 
> 7fdbe9dab000-7fdbe9db2000 r-xp 00000000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9db2000-7fdbe9fb1000 ---p 00007000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9fb1000-7fdbe9fb2000 r--p 00006000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9fb2000-7fdbe9fb3000 rw-p 00007000 08:01 2885337 > /usr/lib/libkrb5support.so.0.1 > 7fdbe9fb3000-7fdbe9fb6000 r-xp 00000000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbe9fb6000-7fdbea1b5000 ---p 00003000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbea1b5000-7fdbea1b6000 r--p 00002000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbea1b6000-7fdbea1b7000 rw-p 00003000 08:01 11796560 > /lib/libcom_err.so.2.1 > 7fdbea1b7000-7fdbea1db000 r-xp 00000000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea1db000-7fdbea3db000 ---p 00024000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea3db000-7fdbea3dc000 r--p 00024000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea3dc000-7fdbea3dd000 rw-p 00025000 08:01 2885386 > /usr/lib/libk5crypto.so.3.1 > 7fdbea3dd000-7fdbea496000 r-xp 00000000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea496000-7fdbea695000 ---p 000b9000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea695000-7fdbea69e000 r--p 000b8000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea69e000-7fdbea69f000 rw-p 000c1000 08:01 2886419 > /usr/lib/libkrb5.so.3.3 > 7fdbea69f000-7fdbea6b8000 r-xp 00000000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea6b8000-7fdbea8b7000 ---p 00019000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea8b7000-7fdbea8b8000 r--p 00018000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea8b8000-7fdbea8b9000 rw-p 00019000 08:01 2888461 > /usr/lib/libsasl2.so.2.0.23 > 7fdbea8b9000-7fdbea8cf000 r-xp 00000000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbea8cf000-7fdbeaace000 ---p 00016000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbeaace000-7fdbeaacf000 r--p 00015000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbeaacf000-7fdbeaad0000 rw-p 00016000 08:01 11797729 > /lib/libresolv-2.12.1.so > 7fdbeaad0000-7fdbeaad2000 rw-p 00000000 00:00 0 > 7fdbeaad2000-7fdbeab46000 r-xp 00000000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbeab46000-7fdbead46000 ---p 00074000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbead46000-7fdbead47000 r--p 00074000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbead47000-7fdbead4a000 rw-p 00075000 08:01 11796957 > /lib/libgcrypt.so.11.5.3 > 7fdbead4a000-7fdbeade5000 r-xp 00000000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeade5000-7fdbeafe5000 ---p 0009b000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeafe5000-7fdbeafeb000 r--p 0009b000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeafeb000-7fdbeafec000 rw-p 000a1000 08:01 2885353 > /usr/lib/libgnutls.so.26.14.12 > 7fdbeafec000-7fdbeb01e000 r-xp 00000000 08:01 2885333 > /usr/lib/libgssapi_krb5.so.2.2Aborted > > > On Tue, Mar 22, 2011 at 12:44 PM, Matthew D Truch <ma...@tr...> wrote: >>> I think the long flush times are related to using LVM on a data drive >>> (and possibly raid5). I've switched to a standalone drive and flush >>> times are now maybe a couple of seconds, The issue now is that I still >>> get the crash when closing down the dirfile with fragments as opposed >>> to having them all in a single directory. I'll keep investigation but >>> any help is appreciated. >> >> This doesn't surprize me. 
I have an array of disks here coupled with >> RAID and LVM, and flushes of large files are generally slow (the RAID >> system seems to queue up as much of a write as possible to avoid >> repeatedly re-calculating parity). >> >> However, the crash is what worries me. Why not post the backtrace? >> >> -- >> "Party on Wayne; Party on Garth. -- Wayne's World" >> -------------------------- >> Matthew Truch >> Department of Physics and Astronomy >> University of Pennsylvania >> ma...@tr... >> http://matt.truch.net/ >> > > > > -- > Ross Williamson > University of Chicago > Department of Astronomy & Astrophysics > 773-834-9785 (office) > 312-504-3051 (Cell) > > ------------------------------------------------------------------------------ > Enable your software for Intel(R) Active Management Technology to meet the > growing manageability and security demands of your customers. Businesses > are taking advantage of Intel(R) vPro (TM) technology - will your software > be a part of the solution? Download the Intel(R) Manageability Checker > today! http://p.sf.net/sfu/intel-dev2devmar > _______________________________________________ > getdata-devel mailing list > get...@li... > https://lists.sourceforge.net/lists/listinfo/getdata-devel |
From: Ross W. <ros...@gm...> - 2011-03-22 17:56:39
Yeah - Here is the backtrace and dump. So it sounds like it might actually be the raid5 and not LVM. hmm that' annoying. START METAFLUSH END METAFLUSH START CLOSE *** glibc detected *** ./bin/sptControl: double free or corruption (fasttop): 0x0000000002320c70 *** ======= Backtrace: ========= /lib/libc.so.6(+0x774b6)[0x7fdbebd3f4b6] /lib/libc.so.6(cfree+0x73)[0x7fdbebd45c83] /usr/lib/libgetdata.so.4(+0x8617)[0x7fdbecef9617] /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile12closeArcfileEv+0x1ca)[0x7fdbf04e651a] /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_ZN3gcp7control21ArchiverWriterDirfile11openArcfileEPc+0x45)[0x7fdbf04e7905] /home/rw247/gcpSptpolDevel/gcp/lib/libGcpControlCommon.so(_Z15archiver_threadPv+0x6b3)[0x7fdbf04e44b3] /lib/libpthread.so.0(+0x7971)[0x7fdbede71971] /lib/libc.so.6(clone+0x6d)[0x7fdbebdae92d] ======= Memory map: ======== 00400000-00415000 r-xp 00000000 fc:00 60424210 /home/rw247/gcpSptpolDevel/gcp/bin/sptControl 00615000-00616000 r--p 00015000 fc:00 60424210 /home/rw247/gcpSptpolDevel/gcp/bin/sptControl 00616000-00617000 rw-p 00016000 fc:00 60424210 /home/rw247/gcpSptpolDevel/gcp/bin/sptControl 01bd2000-031d2000 rw-p 00000000 00:00 0 [heap] 7fdb5a8c8000-7fdbb0000000 rw-p 00000000 00:00 0 7fdbb0000000-7fdbb0080000 rw-p 00000000 00:00 0 7fdbb0080000-7fdbb4000000 ---p 00000000 00:00 0 7fdbb4319000-7fdbb431a000 rw-p 00000000 00:00 0 7fdbb431a000-7fdbb431b000 ---p 00000000 00:00 0 7fdbb431b000-7fdbb4b1b000 rw-p 00000000 00:00 0 7fdbb4b1b000-7fdbb4b1c000 ---p 00000000 00:00 0 7fdbb4b1c000-7fdbb531c000 rw-p 00000000 00:00 0 7fdbb531c000-7fdbb531d000 ---p 00000000 00:00 0 7fdbb531d000-7fdbb5b1d000 rw-p 00000000 00:00 0 7fdbb5b1d000-7fdbb5b1e000 ---p 00000000 00:00 0 7fdbb5b1e000-7fdbb631e000 rw-p 00000000 00:00 0 7fdbb631e000-7fdbb631f000 ---p 00000000 00:00 0 7fdbb631f000-7fdbb6b1f000 rw-p 00000000 00:00 0 7fdbb6b1f000-7fdbb6b20000 ---p 00000000 00:00 0 7fdbb6b20000-7fdbbda4a000 rw-p 00000000 00:00 0 7fdbbda4a000-7fdbbda4b000 ---p 00000000 00:00 0 7fdbbda4b000-7fdbbe700000 rw-p 00000000 00:00 0 7fdbbe779000-7fdbe9793000 rw-p 00000000 00:00 0 7fdbe9793000-7fdbe9796000 r-xp 00000000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9796000-7fdbe9995000 ---p 00003000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9995000-7fdbe9996000 r--p 00002000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9996000-7fdbe9997000 rw-p 00003000 08:01 11796565 /lib/libgpg-error.so.0.4.0 7fdbe9997000-7fdbe99a7000 r-xp 00000000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe99a7000-7fdbe9ba6000 ---p 00010000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe9ba6000-7fdbe9ba7000 r--p 0000f000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe9ba7000-7fdbe9ba8000 rw-p 00010000 08:01 2885347 /usr/lib/libtasn1.so.3.1.9 7fdbe9ba8000-7fdbe9baa000 r-xp 00000000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9baa000-7fdbe9da9000 ---p 00002000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9da9000-7fdbe9daa000 r--p 00001000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9daa000-7fdbe9dab000 rw-p 00002000 08:01 11796895 /lib/libkeyutils.so.1.3 7fdbe9dab000-7fdbe9db2000 r-xp 00000000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9db2000-7fdbe9fb1000 ---p 00007000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9fb1000-7fdbe9fb2000 r--p 00006000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9fb2000-7fdbe9fb3000 rw-p 00007000 08:01 2885337 /usr/lib/libkrb5support.so.0.1 7fdbe9fb3000-7fdbe9fb6000 r-xp 00000000 08:01 11796560 /lib/libcom_err.so.2.1 
7fdbe9fb6000-7fdbea1b5000 ---p 00003000 08:01 11796560 /lib/libcom_err.so.2.1 7fdbea1b5000-7fdbea1b6000 r--p 00002000 08:01 11796560 /lib/libcom_err.so.2.1 7fdbea1b6000-7fdbea1b7000 rw-p 00003000 08:01 11796560 /lib/libcom_err.so.2.1 7fdbea1b7000-7fdbea1db000 r-xp 00000000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea1db000-7fdbea3db000 ---p 00024000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea3db000-7fdbea3dc000 r--p 00024000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea3dc000-7fdbea3dd000 rw-p 00025000 08:01 2885386 /usr/lib/libk5crypto.so.3.1 7fdbea3dd000-7fdbea496000 r-xp 00000000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea496000-7fdbea695000 ---p 000b9000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea695000-7fdbea69e000 r--p 000b8000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea69e000-7fdbea69f000 rw-p 000c1000 08:01 2886419 /usr/lib/libkrb5.so.3.3 7fdbea69f000-7fdbea6b8000 r-xp 00000000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea6b8000-7fdbea8b7000 ---p 00019000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea8b7000-7fdbea8b8000 r--p 00018000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea8b8000-7fdbea8b9000 rw-p 00019000 08:01 2888461 /usr/lib/libsasl2.so.2.0.23 7fdbea8b9000-7fdbea8cf000 r-xp 00000000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbea8cf000-7fdbeaace000 ---p 00016000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbeaace000-7fdbeaacf000 r--p 00015000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbeaacf000-7fdbeaad0000 rw-p 00016000 08:01 11797729 /lib/libresolv-2.12.1.so 7fdbeaad0000-7fdbeaad2000 rw-p 00000000 00:00 0 7fdbeaad2000-7fdbeab46000 r-xp 00000000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbeab46000-7fdbead46000 ---p 00074000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbead46000-7fdbead47000 r--p 00074000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbead47000-7fdbead4a000 rw-p 00075000 08:01 11796957 /lib/libgcrypt.so.11.5.3 7fdbead4a000-7fdbeade5000 r-xp 00000000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeade5000-7fdbeafe5000 ---p 0009b000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeafe5000-7fdbeafeb000 r--p 0009b000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeafeb000-7fdbeafec000 rw-p 000a1000 08:01 2885353 /usr/lib/libgnutls.so.26.14.12 7fdbeafec000-7fdbeb01e000 r-xp 00000000 08:01 2885333 /usr/lib/libgssapi_krb5.so.2.2Aborted On Tue, Mar 22, 2011 at 12:44 PM, Matthew D Truch <ma...@tr...> wrote: >> I think the long flush times are related to using LVM on a data drive >> (and possibly raid5). I've switched to a standalone drive and flush >> times are now maybe a couple of seconds, The issue now is that I still >> get the crash when closing down the dirfile with fragments as opposed >> to having them all in a single directory. I'll keep investigation but >> any help is appreciated. > > This doesn't surprize me. I have an array of disks here coupled with > RAID and LVM, and flushes of large files are generally slow (the RAID > system seems to queue up as much of a write as possible to avoid > repeatedly re-calculating parity). > > However, the crash is what worries me. Why not post the backtrace? > > -- > "Party on Wayne; Party on Garth. -- Wayne's World" > -------------------------- > Matthew Truch > Department of Physics and Astronomy > University of Pennsylvania > ma...@tr... > http://matt.truch.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Matthew D T. <ma...@tr...> - 2011-03-22 17:44:29
> I think the long flush times are related to using LVM on a data drive
> (and possibly raid5). I've switched to a standalone drive and flush
> times are now maybe a couple of seconds, The issue now is that I still
> get the crash when closing down the dirfile with fragments as opposed
> to having them all in a single directory. I'll keep investigation but
> any help is appreciated.

This doesn't surprise me. I have an array of disks here coupled with
RAID and LVM, and flushes of large files are generally slow (the RAID
system seems to queue up as much of a write as possible to avoid
repeatedly re-calculating parity).

However, the crash is what worries me. Why not post the backtrace?

--
"Party on Wayne; Party on Garth. -- Wayne's World"
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Ross W. <ros...@gm...> - 2011-03-22 17:10:59
Ok so quick update, I think the long flush times are related to using LVM on a data drive (and possibly raid5). I've switched to a standalone drive and flush times are now maybe a couple of seconds, The issue now is that I still get the crash when closing down the dirfile with fragments as opposed to having them all in a single directory. I'll keep investigation but any help is appreciated. Cheers Ross On Tue, Mar 22, 2011 at 11:59 AM, Ross Williamson <ros...@gm...> wrote: > Hi Don, > > So the gd_metaflush call is relatively quick. If I change that to a > gd_flush(D_,NULL) then it hangs there. It actually takes 45 seconds to > flush the data but it only crashes when it starts to call the > gd_close(). The screen dumps the following error: > > *** glibc detected *** ./bin/sptControl: double free or corruption > (fasttop): 0x....... *** > > Plus a much longer backtrace if it helps but I'm assuming that's from my code. > > Interestingly, if I don't have fragments it takes a longer (55 > seconds) but does not cause the crash - The close is very quick once > flushed. > > Any thoughts? > > Cheers > > Ross > > On Mon, Mar 21, 2011 at 11:49 PM, D. V. Wiebe <ge...@ke...> wrote: >> On Mon, Mar 21, 2011 at 10:16:32PM -0500, Ross Williamson wrote: >>> Hi Everyone, >>> >>> Sorry a few more questions: >>> >>> 1) I think the general philosophy of how I'm implementing dirfiles is >>> wrong. I currently open all my fields (about 2000 of them), write to >>> them every few seconds with gd_putdata and about every 30 minutes >>> attempt to close down the current dirfile (see 2) and then open a new >>> one. The reason for closing down the dirfile and opening another is >>> purely a hangover from a previous system but I think it's something >>> that would be useful. Should I be thinking more along the lines of >>> opening a single dirfile and just keep on writing to it indefinitely. >>> i.e. do not try and split my data into separate dirfiles - just keep >>> appending the one that is there. >> >> There's nothing inherently bad about chunking up your dirfile into >> different time slices, but so long as you don't run into filesystem >> limits on the raw data files, you shouldn't have problems with just >> making a bigger dirfile. (If I got the largefile support right in >> the library, on an ext3 partition the limit should be 2**32 blocks >> (ie. 2 Terabytes for a 4kB blocksize) per raw data file.) It's really >> up to you, and how you want to structure your analysis software. >> >> All the data acquired during the BLAST06 flight ended up in a single >> dirfile, which was, ~12 days long @ 100 Hz, resulting in about 100 >> megasamples per bolometer channel. We found it convenient. >> >>> 2) When I do try and close my dirfile using gd_close() the flush takes >>> a long time (especially when using fragments) - about 10ish seconds. >>> This is actually too long for my code and it crashes (something I can >>> possibly fix in my code). Is this normal or am I screwing up >>> somewhere? >> >> That sounds surprisingly long, although it may just have to do with >> flushing 2000 large files to disk. I'd be curious to know if the delay >> occurs when writing the metadata to disk (almost certainly inefficient >> code in the library) or the data itself (which might just be a >> filesystem imposed data transfer limit). >> >> Can you time a call to: >> >> gd_metaflush(dirfile); >> >> immediately before: >> >> gd_close(dirfile); >> >> (time that too) and let me know the results? 
The sum of these two calls >> should equal the time for gd_close() alone (since gd_close() calls >> gd_metaflush() internally, and this second call to gd_metaflush() will do >> nothing if you call it explicitly first). >> >>> 3) How did you BLAST guys deal with your bololometer data? I currently >>> just have about 1500 separate fields each of which has 200 samples per >>> frame. Sound right? >> >> Basically. BLAST had ~288 bolometers, sampled at 100Hz and assembled >> into 5Hz frames (the rate of the housekeeping data) with 20 samples per >> frame. Each bolometer was written to a separate field as 24-bit >> integers, extended to 32-bits. >> >> Cheers, >> -dvw >> -- >> D. V. Wiebe >> ge...@ke... >> http://getdata.sourceforge.net/ >> > > > > -- > Ross Williamson > University of Chicago > Department of Astronomy & Astrophysics > 773-834-9785 (office) > 312-504-3051 (Cell) > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Ross W. <ros...@gm...> - 2011-03-22 17:00:31
Hi Don, So the gd_metaflush call is relatively quick. If I change that to a gd_flush(D_,NULL) then it hangs there. It actually takes 45 seconds to flush the data but it only crashes when it starts to call the gd_close(). The screen dumps the following error: *** glibc detected *** ./bin/sptControl: double free or corruption (fasttop): 0x....... *** Plus a much longer backtrace if it helps but I'm assuming that's from my code. Interestingly, if I don't have fragments it takes a longer (55 seconds) but does not cause the crash - The close is very quick once flushed. Any thoughts? Cheers Ross On Mon, Mar 21, 2011 at 11:49 PM, D. V. Wiebe <ge...@ke...> wrote: > On Mon, Mar 21, 2011 at 10:16:32PM -0500, Ross Williamson wrote: >> Hi Everyone, >> >> Sorry a few more questions: >> >> 1) I think the general philosophy of how I'm implementing dirfiles is >> wrong. I currently open all my fields (about 2000 of them), write to >> them every few seconds with gd_putdata and about every 30 minutes >> attempt to close down the current dirfile (see 2) and then open a new >> one. The reason for closing down the dirfile and opening another is >> purely a hangover from a previous system but I think it's something >> that would be useful. Should I be thinking more along the lines of >> opening a single dirfile and just keep on writing to it indefinitely. >> i.e. do not try and split my data into separate dirfiles - just keep >> appending the one that is there. > > There's nothing inherently bad about chunking up your dirfile into > different time slices, but so long as you don't run into filesystem > limits on the raw data files, you shouldn't have problems with just > making a bigger dirfile. (If I got the largefile support right in > the library, on an ext3 partition the limit should be 2**32 blocks > (ie. 2 Terabytes for a 4kB blocksize) per raw data file.) It's really > up to you, and how you want to structure your analysis software. > > All the data acquired during the BLAST06 flight ended up in a single > dirfile, which was, ~12 days long @ 100 Hz, resulting in about 100 > megasamples per bolometer channel. We found it convenient. > >> 2) When I do try and close my dirfile using gd_close() the flush takes >> a long time (especially when using fragments) - about 10ish seconds. >> This is actually too long for my code and it crashes (something I can >> possibly fix in my code). Is this normal or am I screwing up >> somewhere? > > That sounds surprisingly long, although it may just have to do with > flushing 2000 large files to disk. I'd be curious to know if the delay > occurs when writing the metadata to disk (almost certainly inefficient > code in the library) or the data itself (which might just be a > filesystem imposed data transfer limit). > > Can you time a call to: > > gd_metaflush(dirfile); > > immediately before: > > gd_close(dirfile); > > (time that too) and let me know the results? The sum of these two calls > should equal the time for gd_close() alone (since gd_close() calls > gd_metaflush() internally, and this second call to gd_metaflush() will do > nothing if you call it explicitly first). > >> 3) How did you BLAST guys deal with your bololometer data? I currently >> just have about 1500 separate fields each of which has 200 samples per >> frame. Sound right? > > Basically. BLAST had ~288 bolometers, sampled at 100Hz and assembled > into 5Hz frames (the rate of the housekeeping data) with 20 samples per > frame. 
Each bolometer was written to a separate field as 24-bit > integers, extended to 32-bits. > > Cheers, > -dvw > -- > D. V. Wiebe > ge...@ke... > http://getdata.sourceforge.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: D. V. W. <ge...@ke...> - 2011-03-22 04:49:33
On Mon, Mar 21, 2011 at 10:16:32PM -0500, Ross Williamson wrote:
> Hi Everyone,
>
> Sorry a few more questions:
>
> 1) I think the general philosophy of how I'm implementing dirfiles is
> wrong. I currently open all my fields (about 2000 of them), write to
> them every few seconds with gd_putdata and about every 30 minutes
> attempt to close down the current dirfile (see 2) and then open a new
> one. The reason for closing down the dirfile and opening another is
> purely a hangover from a previous system but I think it's something
> that would be useful. Should I be thinking more along the lines of
> opening a single dirfile and just keep on writing to it indefinitely?
> i.e. do not try and split my data into separate dirfiles - just keep
> appending to the one that is there.

There's nothing inherently bad about chunking up your dirfile into
different time slices, but so long as you don't run into filesystem
limits on the raw data files, you shouldn't have problems with just
making a bigger dirfile. (If I got the largefile support right in
the library, on an ext3 partition the limit should be 2**32 blocks
(ie. 2 Terabytes for a 4kB blocksize) per raw data file.) It's really
up to you, and how you want to structure your analysis software.

All the data acquired during the BLAST06 flight ended up in a single
dirfile, which was ~12 days long @ 100 Hz, resulting in about 100
megasamples per bolometer channel. We found it convenient.

> 2) When I do try and close my dirfile using gd_close() the flush takes
> a long time (especially when using fragments) - about 10ish seconds.
> This is actually too long for my code and it crashes (something I can
> possibly fix in my code). Is this normal or am I screwing up
> somewhere?

That sounds surprisingly long, although it may just have to do with
flushing 2000 large files to disk. I'd be curious to know if the delay
occurs when writing the metadata to disk (almost certainly inefficient
code in the library) or the data itself (which might just be a
filesystem imposed data transfer limit).

Can you time a call to:

  gd_metaflush(dirfile);

immediately before:

  gd_close(dirfile);

(time that too) and let me know the results? The sum of these two calls
should equal the time for gd_close() alone (since gd_close() calls
gd_metaflush() internally, and this second call to gd_metaflush() will do
nothing if you call it explicitly first).

> 3) How did you BLAST guys deal with your bolometer data? I currently
> just have about 1500 separate fields each of which has 200 samples per
> frame. Sound right?

Basically. BLAST had ~288 bolometers, sampled at 100Hz and assembled
into 5Hz frames (the rate of the housekeeping data) with 20 samples per
frame. Each bolometer was written to a separate field as 24-bit
integers, extended to 32-bits.

Cheers,
-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
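A sketch of the timing being requested above, assuming the standard GetData C API; the dirfile path is a placeholder, and in the archiver the handle would already exist:

#include <stdio.h>
#include <time.h>
#include <getdata.h>

static double now(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
  DIRFILE *dirfile = gd_open("/data/current_arc", GD_RDWR | GD_VERBOSE);
  double t0, t1, t2;

  /* ... gd_putdata() calls would normally happen here ... */

  t0 = now();
  gd_metaflush(dirfile);   /* writes the format metadata only */
  t1 = now();
  gd_close(dirfile);       /* flushes the data and frees the DIRFILE */
  t2 = now();

  printf("gd_metaflush: %.3f s, remaining gd_close: %.3f s\n",
         t1 - t0, t2 - t1);
  return 0;
}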
From: Ross W. <ros...@gm...> - 2011-03-22 03:18:06
Hi Everyone,

Sorry a few more questions:

1) I think the general philosophy of how I'm implementing dirfiles is
wrong. I currently open all my fields (about 2000 of them), write to
them every few seconds with gd_putdata and about every 30 minutes
attempt to close down the current dirfile (see 2) and then open a new
one. The reason for closing down the dirfile and opening another is
purely a hangover from a previous system but I think it's something
that would be useful. Should I be thinking more along the lines of
opening a single dirfile and just keep on writing to it indefinitely?
i.e. do not try and split my data into separate dirfiles - just keep
appending to the one that is there.

2) When I do try and close my dirfile using gd_close() the flush takes
a long time (especially when using fragments) - about 10ish seconds.
This is actually too long for my code and it crashes (something I can
possibly fix in my code). Is this normal or am I screwing up somewhere?

3) How did you BLAST guys deal with your bolometer data? I currently
just have about 1500 separate fields each of which has 200 samples per
frame. Sound right?

Cheers for all your help.

Ross

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
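For concreteness, a sketch of the 30-minute rotation pattern described in question 1, using the standard GetData C API. The path scheme, field names, rates, and field count are illustrative, and only one detector field is written per pass for brevity:

#include <stdio.h>
#include <time.h>
#include <getdata.h>

#define SPF  200     /* samples per frame for the detector fields */
#define NDET 1500    /* number of detector fields                 */

/* Create one dirfile "chunk" and define its fields. */
static DIRFILE *open_chunk(time_t t)
{
  char path[256], field[32];
  int i;
  DIRFILE *D;

  snprintf(path, sizeof(path), "/data/arc_%ld", (long)t);
  D = gd_open(path, GD_RDWR | GD_CREAT | GD_EXCL | GD_VERBOSE);

  for (i = 0; i < NDET; i++) {
    snprintf(field, sizeof(field), "det%04d", i);
    gd_add_raw(D, field, GD_FLOAT32, SPF, 0);
  }
  gd_add_raw(D, "utc", GD_FLOAT64, 1, 0);
  return D;
}

int main(void)
{
  time_t start = time(NULL);
  DIRFILE *D = open_chunk(start);
  off_t frame = 0;
  float det_buf[SPF] = {0};
  double utc;

  for (;;) {               /* runs until killed, like the archiver thread */
    /* ...fill det_buf (and the other detector buffers) from the DAQ... */
    utc = (double)time(NULL);

    /* the real code would loop over all NDET fields here */
    gd_putdata(D, "det0000", frame, 0, 1, 0, GD_FLOAT32, det_buf);
    gd_putdata(D, "utc", frame, 0, 1, 0, GD_FLOAT64, &utc);
    frame++;

    if (time(NULL) - start >= 30 * 60) {  /* rotate every 30 minutes */
      gd_close(D);                        /* flushes data and metadata */
      start = time(NULL);
      D = open_chunk(start);
      frame = 0;
    }
    /* sleep until the next frame of data is ready */
  }
}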
From: Michael M. <mmi...@as...> - 2011-03-21 18:17:49
You might actually just have too many open files. What does
"ulimit -n" report? On many systems the default maximum is 1024.
Not sure why it would delay complaining, though.

In the (common) case that you are on a *nix platform that uses PAM,
you can configure this limit via /etc/security/limits.conf.

...Milligan

On Mon, Mar 21, 2011 at 12:17:13PM -0500, Ross Williamson wrote:
> Hi Everyone
>
> So I'm now stuck with an issue where I'm receiving an error regarding
> "too many open files" and I'm convinced I'm not using the API
> correctly.
>
> I create a Dirfile and then add about 2000 different field_codes for
> various things (located in different fragments). About every 10
> seconds I dump the data into the fields using gd_putdata. It works for
> the first x (not sure exactly how many) and then get_data returns the
> "too many open files error". What is the correct way to put data into
> so many fields?
>
> Thanks
>
> Ross
>
> --
> Ross Williamson
> University of Chicago
> Department of Astronomy & Astrophysics
> 773-834-9785 (office)
> 312-504-3051 (Cell)
>
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> getdata-devel mailing list
> get...@li...
> https://lists.sourceforge.net/lists/listinfo/getdata-devel

--
Key fingerprint = 9F6B E8F5 206F 35E9 FABB 9EAD 398D CD42 D1CE 8C87
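The same limit can also be inspected (and, up to the hard limit, raised) from inside the program with the standard POSIX resource-limit calls; the 4096 target below is just an example value:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl)) {
    perror("getrlimit");
    return 1;
  }
  printf("open-file limit: soft=%lu hard=%lu\n",
         (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

  /* A dirfile with ~2000 RAW fields needs at least that many descriptors
   * once every field has been written to, plus whatever else the process
   * has open.  4096 is just an example target. */
  if (rl.rlim_cur < 4096 &&
      (rl.rlim_max == RLIM_INFINITY || rl.rlim_max >= 4096)) {
    rl.rlim_cur = 4096;
    if (setrlimit(RLIMIT_NOFILE, &rl))
      perror("setrlimit");
    else
      puts("soft limit raised to 4096");
  }
  return 0;
}

This only moves the soft limit up to the hard limit; raising the hard limit itself still requires limits.conf or root.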
From: Ross W. <ros...@gm...> - 2011-03-21 18:11:57
Sweet - Thanks Changed entry in /etc/security/limits.conf required a log out/log in to take effect Ross On Mon, Mar 21, 2011 at 12:31 PM, Michael Milligan <mmi...@as...> wrote: > You might actually just have too many open files. What does > "ulimit -n" report? On many systems the default maximum is 1024. > Not sure why it would delay complaining, though. > > In the (common) case that you are on a *nix platform that uses PAM, > you can configure this limit via /etc/security/limits.conf. > > ...Milligan > > On Mon, Mar 21, 2011 at 12:17:13PM -0500, Ross Williamson wrote: >> Hi Everyone >> >> So I'm now stuck with an issue where I'm receiving an error regarding >> "too many open files" and I'm convinced I'm not using the API >> correctly. >> >> I create a Dirfile and then add about 2000 different field_codes for >> various things (located in different fragments). About every 10 >> seconds I dump the data into the fields using gd_putdata. It works for >> the first x (not sure exactly how many) and then get_data returns the >> "too many open files error". What is the correct way to put data into >> so many fields? >> >> Thanks >> >> Ross >> >> -- >> Ross Williamson >> University of Chicago >> Department of Astronomy & Astrophysics >> 773-834-9785 (office) >> 312-504-3051 (Cell) >> >> ------------------------------------------------------------------------------ >> Colocation vs. Managed Hosting >> A question and answer guide to determining the best fit >> for your organization - today and in the future. >> http://p.sf.net/sfu/internap-sfd2d >> _______________________________________________ >> getdata-devel mailing list >> get...@li... >> https://lists.sourceforge.net/lists/listinfo/getdata-devel > > -- > Key fingerprint = 9F6B E8F5 206F 35E9 FABB 9EAD 398D CD42 D1CE 8C87 > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAk2Hi4gACgkQOY3NQtHOjIdYowCfVzJsX3Oe2Di+Hk4XOr4OmC14 > /EwAnjiwl5iwhO62GcFks5xm71s6qRww > =ApZe > -----END PGP SIGNATURE----- > > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
From: Matthew D T. <ma...@tr...> - 2011-03-21 17:29:24
> So I'm now stuck with an issue where I'm receiving an error regarding
> "too many open files" and I'm convinced I'm not using the API
> correctly.
>
> I create a Dirfile and then add about 2000 different field_codes for
> various things (located in different fragments). About every 10
> seconds I dump the data into the fields using gd_putdata. It works for
> the first x (not sure exactly how many) and then get_data returns the
> "too many open files error". What is the correct way to put data into
> so many fields?

We run into this on BLAST as well, although we don't have quite that
many open files. You've hit the Linux open file descriptor limit. My
guess is that the output of `ulimit -n` lists about the number that get
written.

The way to increase this limit varies slightly between distributions,
but you should be able to google how to do it relatively easily. If you
use Fedora (and derivatives), or (a quick google makes me think) if you
use Ubuntu (and derivatives), you can change the maximum via the file
/etc/security/limits.conf (which may require a reboot).

--
"One in every seven days is a Thursday."
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Ross W. <ros...@gm...> - 2011-03-21 17:17:40
Hi Everyone

So I'm now stuck with an issue where I'm receiving an error regarding
"too many open files" and I'm convinced I'm not using the API correctly.

I create a Dirfile and then add about 2000 different field_codes for
various things (located in different fragments). About every 10 seconds
I dump the data into the fields using gd_putdata. It works for the first
x (not sure exactly how many) and then GetData returns the "too many open
files" error. What is the correct way to put data into so many fields?

Thanks

Ross

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
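When a call does fail this way, GetData reports it through its error interface rather than by aborting; a sketch of checking that state after each call (the dirfile path and field name are placeholders):

#include <stdio.h>
#include <getdata.h>

/* Print GetData's error state after a call; 0 means GD_E_OK. */
static int report(DIRFILE *D, const char *what)
{
  if (gd_error(D)) {
    char buf[1024];
    fprintf(stderr, "%s: %s\n", what, gd_error_string(D, buf, sizeof(buf)));
    return -1;
  }
  return 0;
}

int main(void)
{
  double sample = 42.0;
  DIRFILE *D = gd_open("/data/testdir", GD_RDWR | GD_CREAT);
  report(D, "gd_open");

  gd_add_raw(D, "status", GD_FLOAT64, 1, 0);
  report(D, "gd_add_raw");

  gd_putdata(D, "status", 0, 0, 0, 1, GD_FLOAT64, &sample);
  if (report(D, "gd_putdata"))  /* e.g. "too many open files" shows up here */
    return 1;

  gd_close(D);
  return 0;
}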
From: Matthew T. <ma...@tr...> - 2011-03-18 21:46:25
"D. V. Wiebe" <ge...@ke...> wrote: >On Fri, Mar 18, 2011 at 03:42:15PM -0500, Ross Williamson wrote: >> Yes I can - I was just using that as an example name but I do have >> other fields (status for example) that share the same names. >> >> I can fudge it though, > >Okay, I got it. It's not a crazy idea. > >Way back when we first introduced includes we considered this problem. >Someone (Matt?) suggested allowing a optional prefix (or suffix?) to be >attached to fields defined in a fragment. So you could include, say, >"foo/format" and tell GetData to prepend "foo_" to all the field names. >And then you could add "bar/format" and tell GetData to prepend "bar_", >&c. allowing you to have the same field names in foo/ and bar/. > >It never got implemented for a variety of reasons. Not the smallest >being that this was before GetData had blebbed off from kst and >changing >the metadata parser always ran the risk of losing the ability to read >old dirfiles. (The modern parser is significantly more robust.) But >also, we never ran into a situation where it was the easiest solution >around the problem. > >However, if you think it would be useful, I could look into >implementing >it. Since we were thinking about it way back then, I think some of the >framework is there, if it hasn't atrophied. > >Let me know, >-dvw >-- >D. V. Wiebe >ge...@ke... >http://getdata.sourceforge.net/ >------------------------------------------------------------------------------ >Colocation vs. Managed Hosting >A question and answer guide to determining the best fit >for your organization - today and in the future. >http://p.sf.net/sfu/internap-sfd2d_______________________________________________ >getdata-devel mailing list >get...@li... >https://lists.sourceforge.net/lists/listinfo/getdata-devel Actually, with current BLAST analysis, this could be useful. Although it might be too late for us this time. But I'll vote for it. -- Mathew Truch ma...@tr... |
From: D. V. W. <ge...@ke...> - 2011-03-18 21:34:58
On Fri, Mar 18, 2011 at 03:42:15PM -0500, Ross Williamson wrote:
> Yes I can - I was just using that as an example name but I do have
> other fields (status for example) that share the same names.
>
> I can fudge it though,

Okay, I got it. It's not a crazy idea.

Way back when we first introduced includes we considered this problem.
Someone (Matt?) suggested allowing an optional prefix (or suffix?) to be
attached to fields defined in a fragment. So you could include, say,
"foo/format" and tell GetData to prepend "foo_" to all the field names.
And then you could add "bar/format" and tell GetData to prepend "bar_",
&c. allowing you to have the same field names in foo/ and bar/.

It never got implemented for a variety of reasons. Not the smallest
being that this was before GetData had blebbed off from kst and changing
the metadata parser always ran the risk of losing the ability to read
old dirfiles. (The modern parser is significantly more robust.) But
also, we never ran into a situation where it was the easiest solution
around the problem.

However, if you think it would be useful, I could look into implementing
it. Since we were thinking about it way back then, I think some of the
framework is there, if it hasn't atrophied.

Let me know,
-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
From: Ross W. <ros...@gm...> - 2011-03-18 20:42:42
Yes I can - I was just using that as an example name but I do have
other fields (status for example) that share the same names.

I can fudge it though,

Ross

On Fri, Mar 18, 2011 at 3:26 PM, D. V. Wiebe <ge...@ke...> wrote:
> On Fri, Mar 18, 2011 at 03:16:49PM -0500, Ross Williamson wrote:
>> Ah ok great - that makes sense.
>>
>> So I've run into a little problem though with fragments. It looks
>> like you can't have the same name of a field (i.e. utc) even if they
>> are part of a different fragment. I have lot's of directories that
>> have utc as their timestamp and it won't let me create those using
>> gd_add_raw where the dirfile is the top level and I'm referencing the
>> index to the fragment.
>>
>> Am I doing something wrong? I'd rather not have individual top level
>> dirfile instances for each subdirectory
>>
>> Ross
>
> As it is, all dirfile fields share the same namespace, regardless of
> where they're defined.
>
> I don't understand what you're trying to do. Don't all those utc fields
> have the same data in them? So can't you get by with just one?
>
> -dvw
> --
> D. V. Wiebe
> ge...@ke...
> http://getdata.sourceforge.net/

--
Ross Williamson
University of Chicago
Department of Astronomy & Astrophysics
773-834-9785 (office)
312-504-3051 (Cell)
From: D. V. W. <ge...@ke...> - 2011-03-18 20:27:06
On Fri, Mar 18, 2011 at 03:16:49PM -0500, Ross Williamson wrote:
> Ah ok great - that makes sense.
>
> So I've run into a little problem though with fragments. It looks
> like you can't have the same name of a field (i.e. utc) even if they
> are part of a different fragment. I have lot's of directories that
> have utc as their timestamp and it won't let me create those using
> gd_add_raw where the dirfile is the top level and I'm referencing the
> index to the fragment.
>
> Am I doing something wrong? I'd rather not have individual top level
> dirfile instances for each subdirectory
>
> Ross

As it is, all dirfile fields share the same namespace, regardless of
where they're defined.

I don't understand what you're trying to do. Don't all those utc fields
have the same data in them? So can't you get by with just one?

-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
From: Ross W. <ros...@gm...> - 2011-03-18 20:17:16
Ah ok great - that makes sense. So I've run into a little problem though with fragments. It looks like you can't have the same name of a field (i.e. utc) even if they are part of a different fragment. I have lot's of directories that have utc as their timestamp and it won't let me create those using gd_add_raw where the dirfile is the top level and I'm referencing the index to the fragment. Am I doing something wrong? I'd rather not have individual top level dirfile instances for each subdirectory Ross On Mon, Mar 14, 2011 at 7:36 PM, D. V. Wiebe <ge...@ke...> wrote: > On Mon, Mar 14, 2011 at 06:15:50PM -0500, Ross Williamson wrote: >> Awesome thanks for that. >> >> I have one more question. I'm missing the reason why fragments are >> useful - Is it just to split up a potentially large format file? >> >> Cheers >> >> Ross > > In practice, large format files aren't an issue. I've never encountered > one too big. (It'd have to be >2Gb, which would be a lot of "format"). > > Basically, fragments make the database modular. We invented fragments > when we started to analyse BLAST data. In this situation, different > people are working on different parts of the data reduction. Someone > (say, me) who wanted to participate in the analysis would need to collect > deconvolved detectors from Person D, calibration timestreams from Person > M, pointing solution timestreams from Person G, &c. Fragments meant I > could organise an analysis dirfile like this: > > - blast_data > +- format > +- deconvoled_bolos > | +- decon_ch1_rev7 > | +- decon_ch2_rev7 > | +- decon_ch3_rev7 > | `- format > +- pointing > | +- ra_rev8 > | +- dec_rev8 > | +- roll_rev8 > | `- format > `- calibration > +- calib_ch1_rev3 > +- calib_ch2_rev3 > +- calib_ch3_rev3 > `- format > > and have the top level format file be just three INCLUDE directives to > the subdirfiles. The benefit of doing this is that later, when Person > M makes a new calibration ("rev4"), he tars up a new "calibration" > directory, including a format file with all the necessary metadata, > and all I have to to is delete my curent "calibration" directory > and untar the new one from Person M in its place, and I'm ready > to go. > > Before inventing this, a new calibration or whatever entailed adding > all the new data files to the dirfile directory and then editing the > format file (by hand! -- this was before the GetData library could > deal with modifying metadata) to replace all the definitions of > "calib_*_rev3" with "calib_*_rev4". It quickly got tiring. > > That is, really inventing fragments were a way of pulling in > subdirectories into a parent dirfile directory. Being able to > include another format file fragment in the *same* directory was > just syntactic sugar. > > Make sense? > -dvw > -- > D. V. Wiebe > ge...@ke... > http://getdata.sourceforge.net/ > -- Ross Williamson University of Chicago Department of Astronomy & Astrophysics 773-834-9785 (office) 312-504-3051 (Cell) |
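A sketch of the name collision raised here, and of working around it by prefixing field names by hand (the "fudge" mentioned in the later replies above). The GetData calls are the standard C API; the directory layout and names are invented for illustration:

#include <stdio.h>
#include <sys/stat.h>
#include <getdata.h>

int main(void)
{
  mkdir("arc", 0777);
  mkdir("arc/receiver", 0777);
  mkdir("arc/pointing", 0777);

  DIRFILE *D = gd_open("arc", GD_RDWR | GD_CREAT | GD_VERBOSE);
  int frag_rx  = gd_include(D, "receiver/format", 0, GD_CREAT);
  int frag_ptg = gd_include(D, "pointing/format", 0, GD_CREAT);

  /* All fields share one namespace, so two fragments cannot both define
   * "utc" -- the second of these would fail with a duplicate-field error:
   *   gd_add_raw(D, "utc", GD_FLOAT64, 1, frag_rx);
   *   gd_add_raw(D, "utc", GD_FLOAT64, 1, frag_ptg);
   * Prefixing the names per subsystem keeps them distinct: */
  gd_add_raw(D, "receiver_utc", GD_FLOAT64, 1, frag_rx);
  gd_add_raw(D, "pointing_utc", GD_FLOAT64, 1, frag_ptg);

  gd_close(D);
  return 0;
}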
From: Matthew D T. <ma...@tr...> - 2011-03-18 03:13:59
> -----------
> fix date of changlog entry
>
> Modified Paths:
> --------------
>   trunk/getdata/ChangeLog
>
> Modified: trunk/getdata/ChangeLog
> ===================================================================
> --- trunk/getdata/ChangeLog 2011-03-17 22:49:29 UTC (rev 520)
> +++ trunk/getdata/ChangeLog 2011-03-17 22:49:43 UTC (rev 521)
> @@ -1,4 +1,4 @@
> -2010-12-13 Peter Kümmel <syn...@gm...>
> +2010-03-17 Peter Kümmel <syn...@gm...>
>   * use _stat64 and struct _stat64 with msvc
>   * fix tests by removing the content of dirfile
>   * guard definitions of macros in C++ binding

Since you are correcting the date, you might make note that it's 2011. ;-)

--
"If you have only seen it once, then you haven't seen it twice."
--------------------------
Matthew Truch
Department of Physics and Astronomy
University of Pennsylvania
ma...@tr...
http://matt.truch.net/
From: Peter K. <syn...@gm...> - 2011-03-15 23:28:29
On 26.02.2011 07:00, D. V. Wiebe wrote:
> On Sun, Feb 13, 2011 at 02:24:40PM +0100, Peter Kümmel wrote:
>> When I use --prefix /opt/local under OSX
>> libgetdata++.dylib links against a libgetata
>> in /usr/local which does not exists:
>>
>> otool -L /opt/local/libgetata++.dylib
>>
>> I don't know where the user local comes from but in
>> binding/cxx/.lib/libgetdata++.lai I found two
>> /usr/local entries which lock responsible for the
>> wrong path.
>>
>> Peter
>
> I suspect you changed --prefix, but didn't do a "make clean" before
> running "make" again. It's a very lame feature of libtool: it
> hardcodes --prefix into .la files, but doesn't change them when you
> modifiy the prefix unless you blow them away to force make to re-create
> them.
>
> I don't understand why that is, possibly my inability to use libtool...
>
> If it persists after a "make clean", that's very weird. Let me know.

Yes, a clean rebuild solved it.

Peter
From: D. V. W. <ge...@ke...> - 2011-03-15 00:37:10
On Mon, Mar 14, 2011 at 06:15:50PM -0500, Ross Williamson wrote:
> Awesome thanks for that.
>
> I have one more question. I'm missing the reason why fragments are
> useful - Is it just to split up a potentially large format file?
>
> Cheers
>
> Ross

In practice, large format files aren't an issue. I've never encountered
one too big. (It'd have to be >2Gb, which would be a lot of "format").

Basically, fragments make the database modular. We invented fragments
when we started to analyse BLAST data. In this situation, different
people are working on different parts of the data reduction. Someone
(say, me) who wanted to participate in the analysis would need to collect
deconvolved detectors from Person D, calibration timestreams from Person
M, pointing solution timestreams from Person G, &c. Fragments meant I
could organise an analysis dirfile like this:

  - blast_data
     +- format
     +- deconvolved_bolos
     |   +- decon_ch1_rev7
     |   +- decon_ch2_rev7
     |   +- decon_ch3_rev7
     |   `- format
     +- pointing
     |   +- ra_rev8
     |   +- dec_rev8
     |   +- roll_rev8
     |   `- format
     `- calibration
         +- calib_ch1_rev3
         +- calib_ch2_rev3
         +- calib_ch3_rev3
         `- format

and have the top level format file be just three INCLUDE directives to
the subdirfiles. The benefit of doing this is that later, when Person
M makes a new calibration ("rev4"), he tars up a new "calibration"
directory, including a format file with all the necessary metadata,
and all I have to do is delete my current "calibration" directory
and untar the new one from Person M in its place, and I'm ready to go.

Before inventing this, a new calibration or whatever entailed adding
all the new data files to the dirfile directory and then editing the
format file (by hand! -- this was before the GetData library could
deal with modifying metadata) to replace all the definitions of
"calib_*_rev3" with "calib_*_rev4". It quickly got tiring.

That is, inventing fragments was really a way of pulling subdirectories
into a parent dirfile directory. Being able to include another format
file fragment in the *same* directory was just syntactic sugar.

Make sense?
-dvw
--
D. V. Wiebe
ge...@ke...
http://getdata.sourceforge.net/
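From the analysis side, the payoff of this layout is that readers never reference fragments directly. A sketch using the standard GetData C API, with field names taken from the example above and an arbitrary sample count:

#include <stdio.h>
#include <getdata.h>

int main(void)
{
  double calib[100], ra[100];
  DIRFILE *D = gd_open("blast_data", GD_RDONLY | GD_VERBOSE);

  /* Read 100 samples from fields defined in two different fragments;
   * the reader neither knows nor cares which format file defines them. */
  size_t n1 = gd_getdata(D, "calib_ch1_rev3", 0, 0, 0, 100, GD_FLOAT64, calib);
  size_t n2 = gd_getdata(D, "ra_rev8", 0, 0, 0, 100, GD_FLOAT64, ra);

  printf("read %zu calibration and %zu pointing samples\n", n1, n2);
  gd_close(D);
  return 0;
}

Swapping in a new "calibration" tarball, as described above, changes nothing in this reader except the field names it asks for.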