There seems to be a deep bug in PDL::IO::FlexRaw that prevents mapflex from working.
The problem is not with mmapping per se: the mapfraw function in FastRaw does not run into trouble. As far as I can tell, the difference is that FastRaw does not use set_data_by_offset, whereas FlexRaw does, because FlexRaw lets you work with multiple piddles in the same file. The set_data_by_offset function sets the data member of the piddle struct, but it never attaches any magic for undoing that action. As far as I can tell, piddles that are mmapped using FlexRaw do not perform any refcounting on the underlying mmapped data, so nothing ever knows when that data goes out of scope (and therefore it is never unmapped?).
AFAIK, only FlexRaw uses set_data_by_offset, and FlexRaw's use was never tested, even with the Fortran-based code. set_data_by_offset was not covered by our test suite until my latest additions for FlexRaw. This broken function could have been hanging around for a long time without anyone knowing about it.
As of the time of writing (commit 25ffbb5edd), tests 5 (A piddle and it's mapflex representation should be about equal) and 6 (Modifications to mapfraw should be saved to disk no later than when the piddle ceases to exist) fail, and the test script completely croaks before getting to the last two tests.
Here's my output from perldl -V:
perlDL shell v1.352
PDL comes with ABSOLUTELY NO WARRANTY. For details, see the file
'COPYING' in the PDL distribution. This is free software and you
are welcome to redistribute it under certain conditions, see
the same file for details.
Summary of my PDL configuration
VERSION: PDL v2.4.6_015 (supports bad values)
$%PDL::Config = {
'BADVAL_PER_PDL' => '0',
'WITH_PROJ' => undef,
'FFTW_TYPE' => 'double',
'FFTW_LIBS' => [
'/lib',
'/usr/lib',
'/usr/local/lib'
],
'WITH_FFTW' => undef,
'GSL_LIBS' => undef,
'GL_BUILD' => '1',
'WITH_IO_BROWSER' => '0',
'PROJ_INC' => undef,
'WHERE_PLPLOT_INCLUDE' => undef,
'WITH_KARMA' => '0',
'WHERE_KARMA' => undef,
'HTML_DOCS' => '1',
'WHERE_PLPLOT_LIBS' => undef,
'WITH_3D' => '1',
'FFTW_INC' => [
'/usr/include/',
'/usr/local/include'
],
'WITH_POSIX_THREADS' => '1',
'POGL_VERSION' => '0.63',
'HIDE_TRYLINK' => '1',
'WITH_HDF' => undef,
'HDF_INC' => undef,
'POGL_WINDOW_TYPE' => 'glut',
'OPENGL_LIBS' => '-L/usr/lib/mesa -L/usr/lib/ -L/usr/lib/mesa -lGLU -lGL -lXext -lX11 -lm',
'WITH_BADVAL' => '1',
'WITH_GD' => undef,
'FITS_LEGACY' => '1',
'WITH_SLATEC' => undef,
'BADVAL_USENAN' => '0',
'WITH_DEVEL_REPL' => '1',
'TEMPDIR' => '/tmp',
'PROJ_LIBS' => undef,
'USE_POGL' => '0',
'GD_LIBS' => undef,
'GSL_INC' => undef,
'GD_INC' => undef,
'WITH_GSL' => undef,
'OPTIMIZE' => undef,
'HDF_LIBS' => undef,
'MALLOCDBG' => {},
'WITH_MINUIT' => undef,
'WITH_PLPLOT' => '1',
'MINUIT_LIB' => undef
};
Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
Platform:
osname=linux, osvers=2.6.24-27-server, archname=i486-linux-gnu-thread-multi
uname='linux vernadsky 2.6.24-27-server #1 smp fri mar 12 01:45:06 utc 2010 i686 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.4.3', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib /usr/lib64
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=/lib/libc-2.11.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1
gnulibc_version='2.11.1'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'
Does mapfraw work from IO::FastRaw?
If so, does mapflex work with a single piddle?
From looking at Core.xs and the args to set_data_by_mmap() it appears that
the mmap is set up with shared equal 0 by default which means that any
changes are private and would not flow back to the global memory or file
object. I think the call needs to use MAP_SHARED in order for changes to
be visible.
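To make the shared-versus-private distinction concrete, here is a minimal sketch. It uses Python's standard mmap module purely as a stand-in and is not PDL's Core.xs code; the helper names (poke_first_byte, shared_vs_private_demo) are made up for the illustration. ACCESS_COPY gives a private, copy-on-write mapping and ACCESS_WRITE gives a shared, write-through one.

```python
import mmap
import os
import tempfile


def poke_first_byte(path, access):
    """Map the whole file with the given access mode and overwrite byte 0."""
    with open(path, "r+b") as fh, mmap.mmap(fh.fileno(), 0, access=access) as m:
        m[0:1] = b"X"
        if access == mmap.ACCESS_WRITE:
            m.flush()  # push the shared pages back to the file


def read_file(path):
    with open(path, "rb") as fh:
        return fh.read()


def shared_vs_private_demo():
    """Return the file contents after a private write and after a shared write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fh:
            fh.write(b"abcd")
        # Private (copy-on-write) mapping: the change stays in memory only.
        poke_first_byte(path, mmap.ACCESS_COPY)
        after_private = read_file(path)
        # Shared mapping: the change reaches the file.
        poke_first_byte(path, mmap.ACCESS_WRITE)
        after_shared = read_file(path)
    finally:
        os.remove(path)
    return after_private, after_shared
```

Calling shared_vs_private_demo() returns (b"abcd", b"Xbcd"): the private write never touches the file, while the shared write does. Whether PDL's mapping is actually private here is exactly the open question above.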
> Does mapfraw work from IO::FastRaw?
Yes
> If so, does mapflex work with a single piddle?
No
> From looking at Core.xs and the args to set_data_by_mmap() it appears that
> the mmap is set up with shared equal 0 by default which means that any
> changes are private and would not flow back to the global memory or file
> object. I think the call needs to use MAP_SHARED in order for changes to
> be visible.
Maybe, but FastRaw works. The tests that fail for FlexRaw were just copied
over from FastRaw and modified so they used the correct calling conventions.
Due to the lack of time before the PDL-2.4.7 release, I'm marking this
bug as Postponed for revisiting after the coming release.
One direction that might be worth investigating is
to use the File::Map module for mmapping to perl
scalars instead of hand-rolled code. I don't know
if that would work for the needs of PDL but if so,
it could make mmap work cross-platform since the
File::Map module supports win32 and other OSes.
Looking at some recent CPAN Testers failures for *BSD
systems with t/flexraw.t and PDL-2.4.7_004 I tracked
down some issues here:
(1) There does appear to be a problem with the
scope of the pdl that is the whole mmap'd file. It
should probably have a refcount for each piddle
that is mapped via the offset. When I keep a ref
to the per-file pdl from mapflex then access to the
mmapped data works.
(2) By default mapflex() maps the files in ReadOnly
mode. If you add the { ReadOnly=>0 } option to the
mapflex calls then the updates to the disk data work
as expected.
(3) I don't know what the message
Warning: special data without datasv is not freed currently!!
means but it seems to be related to freeing of pdls.
--Chris
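The refcounting idea in point (1) can be sketched as follows. This is only an illustration in Python with the standard mmap module, not how PDL manages piddle data; the ParentMapping and OffsetView classes are hypothetical names invented for the example. The whole-file mapping counts its offset views and unmaps only when the last one is released.

```python
import mmap
import os
import tempfile


class ParentMapping:
    """One mmap of a whole file, kept alive until every offset view is released."""

    def __init__(self, path):
        with open(path, "r+b") as fh:
            # The mapping stays valid after the file handle is closed.
            self._map = mmap.mmap(fh.fileno(), 0)
        self._refs = 0

    def view(self, offset, length):
        self._refs += 1
        return OffsetView(self, offset, length)

    def _release(self):
        self._refs -= 1
        if self._refs == 0:
            self._map.close()  # last view gone: safe to unmap

    @property
    def closed(self):
        return self._map.closed


class OffsetView:
    """A slice of the parent mapping, analogous to one array stored at an offset."""

    def __init__(self, parent, offset, length):
        self._parent = parent
        self._offset = offset
        self._length = length
        self._released = False

    def read(self):
        start = self._offset
        return self._parent._map[start:start + self._length]

    def release(self):
        if not self._released:
            self._released = True
            self._parent._release()


def refcount_demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fh:
            fh.write(b"AAAABBBB")
        parent = ParentMapping(path)
        first = parent.view(0, 4)
        second = parent.view(4, 4)
        first.release()
        still_mapped = not parent.closed  # second view still holds a reference
        data = second.read()
        second.release()
        return still_mapped, data, parent.closed
    finally:
        os.remove(path)
```

refcount_demo() returns (True, b"BBBB", True): releasing the first view leaves the mapping usable for the second, and the file is unmapped only after both are gone.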
It turns out that ReadOnly was set by default to 1 for mapflex, but
the docs say it should default to false. Fixing that makes the
t/flexraw.t tests pass. It still doesn't address the problem of
more than one piddle being mmapped from a file via offset
from a parent piddle for the whole file.
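The effect of the ReadOnly default can be illustrated with a small sketch. Again this is Python's standard mmap module standing in for the real thing, and map_file, try_update and read_only_demo are hypothetical helpers, not part of FlexRaw. With the flag defaulting to true, every update attempt is rejected; with it defaulting to false, the update lands on disk.

```python
import mmap
import os
import tempfile


def map_file(path, read_only=False):
    """Map a whole file; read_only defaults to False, as the docs describe."""
    access = mmap.ACCESS_READ if read_only else mmap.ACCESS_WRITE
    mode = "rb" if read_only else "r+b"
    with open(path, mode) as fh:
        return mmap.mmap(fh.fileno(), 0, access=access)


def try_update(path, read_only):
    """Attempt to overwrite byte 0; return True if the write was accepted."""
    m = map_file(path, read_only=read_only)
    try:
        m[0:1] = b"Z"
        m.flush()
        return True
    except TypeError:
        # Python's mmap raises TypeError on writes to a read-only mapping.
        return False
    finally:
        m.close()


def read_only_demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fh:
            fh.write(b"data")
        wrote_ro = try_update(path, read_only=True)
        with open(path, "rb") as fh:
            after_ro = fh.read()
        wrote_rw = try_update(path, read_only=False)
        with open(path, "rb") as fh:
            after_rw = fh.read()
        return wrote_ro, after_ro, wrote_rw, after_rw
    finally:
        os.remove(path)
```

read_only_demo() returns (False, b"data", True, b"Zata"), which mirrors what the tests saw: nothing was saved while the mapping was read-only, and the modification reached the file once it was writable.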
Fixed in git and available on CPAN for PDL versions
2.4.7_005 and higher. On the way to the fix, I noted
that it might be useful to revisit the API for
writeflex and mapflex as far as creating/writing data
files. As is, it is fairly clunky and not clear how things
work with multiple pdls in the same file...
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).