|
From: Ian H. <iw...@go...> - 2008-07-10 10:03:57
|
Hi all,

My colleagues and I use, and have used, matplotlib and its TeX capabilities quite extensively to create plots to assist in the gravitational wave searches we perform (and it has been a great tool for us :-) ). However, recently we have been running into problems since we started automating our plot generation by running multiple plotting jobs concurrently using the condor scheduler (and dagmans). Many of our plotting jobs fail with messages such as the one below:

---snip---
Traceback (most recent call last):
  File "/home/romain/Projects/ligovirgo/s5_2yr_lv_lowcbc_20080625/868815014-868901414/868815014-868901414/inj001_summary_plots/../executables/plotinjnum", line 298, in ?
    'eff_dist_h')
  File "/home/romain/Projects/ligovirgo/s5_2yr_lv_lowcbc_20080625/868815014-868901414/868815014-868901414/inj001_summary_plots/../executables/plotinjnum", line 119, in plot_found_missed
    fname_thumb = InspiralUtils.savefig_pylal(filename=fname, doThumb=True, dpi_thumb=opts.figure_resolution)
  File "/home/romain/codes/s5_2yr_lv_lowcbc_20080625/pylal/lib64/python2.4/site-packages/pylal/InspiralUtils.py", line 58, in savefig_pylal
    fig.savefig(filename_thumb, dpi=dpi_thumb)
  ....
  File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line 259, in make_png
    os.remove(outfile)
OSError: [Errno 2] No such file or directory: '/home/romain/.matplotlib/tex.cache/ae479c90ff242327b54af004a0846188.output'
---snip---

My feeling is that when the code invokes the TeX 'bit' it creates a temp file in ~/.matplotlib/tex.cache and then deletes it, and all other temp TeX files, when it finishes the TeX 'bit'. This would cause problems if another job is in the middle of running TeX when the first job deletes its temp files!

We are running a slightly old version of matplotlib (0.87.7). As we run on multiple clusters, our sys admins tend to update software only when there is a need to, and we have had no other problems with matplotlib. I apologize if this has been fixed in the meantime (I did do a quick search of the mailing list archive but found nothing). All our clusters currently run Fedora Core 4 (we're going to move to CentOS 5).

Currently we are getting around this by forcing condor to retry the failed jobs 2 or 3 times; this catches most of these errors. Another solution would be to limit the number of jobs running to 1, BUT as we run dagmen from within one 'super' dagman it would prove difficult to limit jobs from multiple dagmen.

Anyway, if anyone has any ideas of how to solve this I would appreciate it. Also, if there are any options where we can set the location of these temp TeX files and use a different directory for each job (or stop matplotlib deleting other temp files), that would help us.

Thanks in advance for any help

Ian Harry

--
---------------------------------------------------------------------------
Ian Harry
School of Physics & Astronomy
Queens Buildings, The Parade
Cardiff, CF24 3AA
Email: Ian...@as...
Phone: (+44) 29 208 75120
Mobile: (+44) 7890 479090
---------------------------------------------------------------------------
|
|
From: Darren D. <dsd...@gm...> - 2008-07-10 12:21:51
|
Hi Ian,

On Thursday 10 July 2008 06:03:54 am Ian Harry wrote:
> [...]
> Anyway if anyone has any ideas of how to solve this I would appreciate
> this. Also if there are any options where we can set the location of these
> temp tex files and use a different directory for each job (or stop
> matplotlib deleting other temp files) that would help us.

I'm really hesitant to mess around with the location of the temp files. It was a bit painful trying to get usetex to work across platforms.

Instead, would you try replacing:

    os.remove(outfile)

with:

    try: os.remove(outfile)
    except OSError: pass

Let me know if that fixes it, and if you need to wrap any other file deletions.

Thanks,
Darren
|
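A slightly narrower variant of the suggested try/except can be sketched as below. It ignores only the "No such file or directory" case from the traceback and re-raises everything else. The helper name `remove_if_present` is made up for illustration and is not part of matplotlib, and the `except ... as` syntax assumes a current Python; the Python 2.4 used in this thread would need the older `except OSError, exc:` spelling.

```python
import errno
import os
import tempfile


def remove_if_present(path):
    """Delete path; return False instead of raising if it is already gone."""
    try:
        os.remove(path)
    except OSError as exc:
        # Swallow only "No such file or directory" (another job got there
        # first); any other failure is re-raised so it stays visible.
        if exc.errno != errno.ENOENT:
            raise
        return False
    return True


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    print(remove_if_present(tmp_path))  # the file existed and was removed
    print(remove_if_present(tmp_path))  # already gone, no exception raised
```

Checking `errno` keeps the workaround from hiding unrelated problems such as permission errors.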
|
From: Ian H. <iw...@go...> - 2008-07-10 14:48:05
|
Hi Darren,

I have tried rerunning our code with the change you suggested in the
make_dvi and make_png functions. I am still noticing failures, however; I
have put them at the bottom of this message. Strangely enough, these errors
don't seem to occur when there are a lot of files in my tex.cache directory.
For example, I ran the code (~40 codes, each making ~10-20 plots)
successfully 3 times (the OSError wasn't raised at all; I used a print
statement to check). I realised after this that a lot of temp files were in
my tex.cache directory, so I emptied it, and then I noticed that a lot of
failures occurred when I ran the code the next time (the OSError I showed
previously was raised, as well as the error messages shown below). It seems
weird that it should run fine when a lot of files are left in my temp
directory and not when it is empty?

Here are the error messages that are occurring now:
Traceback (most recent call last):
File
"/home/spxiwh/ihope/852450000-852700000/nsbhinj_summary_plots/../executables/plotinspmissed",
line 625, in ?
savePlot( opts, filename, titleText)
File
"/home/spxiwh/ihope/852450000-852700000/nsbhinj_summary_plots/../executables/plotinspmissed",
line 108, in savePlot
dpi_thumb=opts.figure_resolution)
File
"/home/spxiwh/lscsoft/executables/cbc_s5_1yr_20070129/pylal//lib64/python2.4/site-packages/pylal/InspiralUtils.py",
line 54, in savefig_pylal
fig.savefig(filename, dpi=dpi)
File "/home/spxiwh/test/matplotlib/figure.py", line 682, in savefig
self.canvas.print_figure(*args, **kwargs)
File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 456, in
print_figure
self.draw()
File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 392, in
draw
self.figure.draw(renderer)
File "/home/spxiwh/test/matplotlib/figure.py", line 544, in draw
for a in self.axes: a.draw(renderer)
File "/home/spxiwh/test/matplotlib/axes.py", line 1063, in draw
a.draw(renderer)
File "/home/spxiwh/test/matplotlib/axis.py", line 595, in draw
self.label.draw(renderer)
File "/home/spxiwh/test/matplotlib/text.py", line 340, in draw
bbox, info = self._get_layout(renderer)
File "/home/spxiwh/test/matplotlib/text.py", line 187, in _get_layout
w,h = renderer.get_text_width_height(
File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 240, in
get_text_width_height
Z = texmanager.get_rgba(s, size, self.dpi.get(), rgb)
File "/home/spxiwh/test/matplotlib/texmanager.py", line 334, in get_rgba
pngfile = self.make_png(tex, fontsize, dpi, force=False)
File "/home/spxiwh/test/matplotlib/texmanager.py", line 255, in make_png
fh = file(outfile)
IOError: [Errno 2] No such file or directory:
'/home/spxiwh/.matplotlib/tex.cache/fb2014e54961855bd04020b61190867c.output'
Traceback (most recent call last):
File
"/home/spxiwh/ihope/852450000-852700000/bnsinj_summary_plots/../executables/plotinspinj",
line 569, in ?
'end_time', 'days', opts.time_axis, plot_type = 'linear' )
File
"/home/spxiwh/ihope/852450000-852700000/bnsinj_summary_plots/../executables/plotinspinj",
line 94, in plot_parameter_accuracy
dpi_thumb=opts.figure_resolution)
File
"/home/spxiwh/lscsoft/executables/cbc_s5_1yr_20070129/pylal//lib64/python2.4/site-packages/pylal/InspiralUtils.py",
line 54, in savefig_pylal
fig.savefig(filename, dpi=dpi)
File "/home/spxiwh/test/matplotlib/figure.py", line 682, in savefig
self.canvas.print_figure(*args, **kwargs)
File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 456, in
print_figure
self.draw()
File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 392, in
draw
self.figure.draw(renderer)
File "/home/spxiwh/test/matplotlib/figure.py", line 544, in draw
for a in self.axes: a.draw(renderer)
File "/home/spxiwh/test/matplotlib/axes.py", line 1063, in draw
a.draw(renderer)
File "/home/spxiwh/test/matplotlib/axis.py", line 561, in draw
tick.draw(renderer)
File "/home/spxiwh/test/matplotlib/axis.py", line 161, in draw
if self.label1On: self.label1.draw(renderer)
File "/home/spxiwh/test/matplotlib/text.py", line 838, in draw
Text.draw(self, renderer)
File "/home/spxiwh/test/matplotlib/text.py", line 340, in draw
bbox, info = self._get_layout(renderer)
File "/home/spxiwh/test/matplotlib/text.py", line 187, in _get_layout
w,h = renderer.get_text_width_height(
File "/home/spxiwh/test/matplotlib/backends/backend_agg.py", line 240, in
get_text_width_height
Z = texmanager.get_rgba(s, size, self.dpi.get(), rgb)
File "/home/spxiwh/test/matplotlib/texmanager.py", line 334, in get_rgba
pngfile = self.make_png(tex, fontsize, dpi, force=False)
File "/home/spxiwh/test/matplotlib/texmanager.py", line 247, in make_png
dvifile = self.make_dvi(tex, fontsize)
File "/home/spxiwh/test/matplotlib/texmanager.py", line 223, in make_dvi
fh = file(outfile)
IOError: [Errno 2] No such file or directory:
'/home/spxiwh/.matplotlib/tex.cache/7e534aafdc12681d1ef0d36df4963de8.output'
And once I noticed:
Traceback (most recent call last):
File
"/home/spxiwh/ihope/852450000-852700000/allinj_summary_plots/../executables/plotinspmissed",
line 661, in ?
dpi_thumb=opts.figure_resolution)
File
"/home/spxiwh/lscsoft/executables/cbc_s5_1yr_20070129/pylal//lib64/python2.4/site-packages/pylal/InspiralUtils.py",
line 54, in savefig_pylal
fig.savefig(filename, dpi=dpi)
File "/usr/lib64/python2.4/site-packages/matplotlib/figure.py", line 682,
in savefig
self.canvas.print_figure(*args, **kwargs)
File
"/usr/lib64/python2.4/site-packages/matplotlib/backends/backend_agg.py",
line 456, in print_figure
self.draw()
File
"/usr/lib64/python2.4/site-packages/matplotlib/backends/backend_agg.py",
line 392, in draw
self.figure.draw(renderer)
File "/usr/lib64/python2.4/site-packages/matplotlib/figure.py", line 544,
in draw
for a in self.axes: a.draw(renderer)
File "/usr/lib64/python2.4/site-packages/matplotlib/axes.py", line 1063,
in draw
a.draw(renderer)
File "/usr/lib64/python2.4/site-packages/matplotlib/text.py", line 340, in
draw
bbox, info = self._get_layout(renderer)
File "/usr/lib64/python2.4/site-packages/matplotlib/text.py", line 187, in
_get_layout
w,h = renderer.get_text_width_height(
File
"/usr/lib64/python2.4/site-packages/matplotlib/backends/backend_agg.py",
line 240, in get_text_width_height
Z = texmanager.get_rgba(s, size, self.dpi.get(), rgb)
File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line
330, in get_rgba
X = readpng(os.path.join(self.texcache, pngfile))
RuntimeError: _image_module::readpng: file not recognized as a PNG file
Cheers
Ian
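One possible explanation for the "file not recognized as a PNG file" error, which is an assumption and is not confirmed anywhere in this thread, is that one job reads a cache entry while another job is still writing it. A minimal sketch of a common way to avoid that is to write to a uniquely named temporary file in the same directory and rename it into place. The helper `write_atomically` is hypothetical and is not how texmanager.py works; `os.replace` needs Python 3.3 or later, and the rename is atomic on local POSIX filesystems, with weaker guarantees possible on NFS.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Unique temporary name in the same directory, so the final rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    target = os.path.join(cache_dir, "example.png")  # hypothetical cache entry
    write_atomically(target, b"\x89PNG\r\n\x1a\n")
    print(os.listdir(cache_dir))
```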
|
|
From: Darren D. <dsd...@gm...> - 2008-07-10 15:43:07
|
On Thursday 10 July 2008 10:48:01 am you wrote:
> Hi Darren,
>
> I have tried rerunning our code with the change you suggested in the
> make_dvi and make_png functions. I am still noticing failures however.
> [...] I realised after this that a lot of temp files were in my tex.cache
> directory, so I emptied it and then I noticed that a lot of failures
> occurred when I ran the code the next time (the OSError I showed previously
> was raised as well as the error messages shown below). It seems weird that
> it should run fine when a lot of files are left in my temp directory and
> not when it is empty?

Most of those files are not temporary files, but cached files. The error you reported only occurs when a required file does not already exist in the cache, and like you said, it appears to be the case that two jobs are trying to add the same file to the cache at the same time, and one job is failing because the other deletes a temporary file that is being used by both. I guess.

> Here are the error messages that are occurring now:
>
> Traceback (most recent call last):
> [...]
>   File "/home/spxiwh/test/matplotlib/texmanager.py", line 255, in make_png
>     fh = file(outfile)
> IOError: [Errno 2] No such file or directory:
> '/home/spxiwh/.matplotlib/tex.cache/fb2014e54961855bd04020b61190867c.output'

That doesn't make any sense to me. file defaults to open a file in append mode, it doesn't matter if a file exists or not. Maybe you could try to figure out why that fails and report back.

> And once I noticed:
>
> Traceback (most recent call last):
> [...]
>   File "/usr/lib64/python2.4/site-packages/matplotlib/texmanager.py", line 330, in get_rgba
>     X = readpng(os.path.join(self.texcache, pngfile))
> RuntimeError: _image_module::readpng: file not recognized as a PNG file

No idea, sorry.

Darren
|
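The IOError on `fh = file(outfile)` is consistent with Python's documented behaviour: with no mode argument a file is opened for reading ('r'), not appending, so a missing file raises an error. A small check, using the built-in `open()` (the current spelling of `file()`) and a made-up helper name and file name, might look like this:

```python
import os
import tempfile


def probe_open_modes(directory):
    """Return (default_mode_opened, append_mode_created) for a missing file."""
    path = os.path.join(directory, "missing.output")  # hypothetical file name

    # No mode argument means read mode, which requires the file to exist.
    try:
        open(path).close()
        default_mode_opened = True
    except IOError:  # an alias of OSError on Python 3
        default_mode_opened = False

    # Append mode creates the file when it is absent.
    open(path, "a").close()
    append_mode_created = os.path.exists(path)
    return default_mode_opened, append_mode_created


if __name__ == "__main__":
    print(probe_open_modes(tempfile.mkdtemp()))
```

So if another job removes the `.output` file between TeX finishing and this open call, the default read-mode open fails in exactly the way the traceback shows.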
|
From: Ian H. <ian...@as...> - 2008-07-15 14:13:07
|
Hi Darren,

Thanks for helping with this problem.

I have investigated this issue further and here is what I have found out:

I have traced the errors themselves back to two functions in texmanager.py (matplotlib.texmanager), make_dvi and make_png. Most of the errors seem to mention 'Stale NFS file handles' and crop up at a variety of different places throughout these functions. I guess this is because on our clusters /home/[username] is not a local directory; we have seen issues before with other code if a lot of nodes try to access the same directory on the NFS file system simultaneously. I tried altering the __init__.py to force the code to put the .matplotlib directory on filesystems local to each node. Moving the .matplotlib directory to a local drive solves almost all of these errors.

One error that remained was the one about file opening:

    fh = file(outfile)

I added a 'w' to this and this seemed to solve the problem. I also commented out some of the verbose-generating commands within these functions (specifically, fh.read() was causing a problem, which is probably expected with 'w') and the errors go away. I guess 'a' would be better, but the commands only seem to be called if the file doesn't exist?

As we have a lot of users running this code, a solution like this is unworkable (a lot of our users are unfamiliar with python/Linux and want to run a simple command). Do you have any ideas of how we could solve this issue?

Thanks again for your help

Ian Harry
|
|
From: Darren D. <dsd...@gm...> - 2008-07-15 15:18:11
|
Hi Ian,

On Tuesday 15 July 2008 10:13:02 am Ian Harry wrote:
> Thanks for helping with this problem.
>
> I have investigated further this issue and here is what I have found out:
>
> I have traced the errors themselves back to two functions in texmanager.py
> (matplotlib.texmanager), make_dvi and make_png. Most of the errors seem to
> mention 'Stale NFS file handles' and crop up at a variety of different
> places throughout these functions. I guess this is because on our clusters
> /home/[username] is not a local directory, we have seen issues before with
> other code if a lot of nodes try to access the same directory on the NFS
> file system simultaneously. I tried altering the __init__.py to force the
> code to put the .matplotlib directory on filesystems local to each node.
> Moving the .matplotlib directory to a local drive solves almost all of
> these errors.

I suggest you try backing out those changes you just described, and instead try setting a MPLCONFIGDIR environment variable to point somewhere on the local filesystem. If MPLCONFIGDIR is not defined, we use ~/.matplotlib.

> One error that remained was the one about file opening
> fh = file(outfile)
> I added a 'w' to this and this seemed to solve this problem, I also
> commented out some of the verbose generating commands (specifically
> fh.read() was causing a problem (probably expected with 'w')) within these
> functions and the errors go away. I guess 'a' would be better but the
> commands only seem to be called if the file doesn't exist?

Out of curiosity, if you added 'a' instead of 'w', does the error go away? Either way, please let me know exactly what changes need to be made and I will commit the changes to svn.

> As we have a lot of users running this code a solution like this is
> unworkable (as a lot of our users are unfamiliar with python/Linux and want
> to run a simple command). Do you have any ideas of how we could solve this
> issue?

Please try the environment variable I mentioned and let me know what happens.

Darren
|
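For jobs launched from a Python script, the MPLCONFIGDIR suggestion could be applied along these lines. This is a sketch under assumptions: the helper name is made up, and it assumes the system temporary directory is on a disk local to the node, which depends on how the cluster is set up. The variable has to be set before matplotlib is imported.

```python
import os
import tempfile


def use_private_mpl_config_dir(base_dir=None):
    """Create a fresh directory and point MPLCONFIGDIR at it.

    base_dir=None means the system temporary directory; whether that is
    node-local is an assumption about the cluster.
    """
    directory = tempfile.mkdtemp(prefix="mplconfig-", dir=base_dir)
    os.environ["MPLCONFIGDIR"] = directory
    return directory


if __name__ == "__main__":
    config_dir = use_private_mpl_config_dir()
    # Only import matplotlib after the variable is set, for example:
    # import matplotlib
    print(config_dir)
```

A fresh directory per job starts with an empty tex.cache, so each job re-runs TeX for its own labels. Pointing every job on a node at one fixed local directory would keep the cache warm, at the cost of sharing it between concurrent jobs on that node.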
|
From: Ian H. <ian...@as...> - 2008-07-16 11:21:05
|
2008/7/15 Darren Dale <dsd...@gm...>:
> I suggest you try backing out those changes you just described, and instead
> try setting a MPLCONFIGDIR environment variable to point somewhere on the
> local filesystem. If MPLCONFIGDIR is not defined, we use ~/.matplotlib.

Brilliant! This works perfectly and should be easy to implement on different systems!

> Out of curiosity, if you added 'a' instead of 'w', does the error go away?
> Either way, please let me know exactly what changes need to be made and I
> will commit the changes to svn.

No, using 'a' gives the same errors as using 'w' (again in fh.read()).

Here are the changes I made to stop the errors that didn't seem to be due to 'stale NFS file handle':

--snip--
[spxiwh@sugar 07:14 AM matplotlib]$ diff texmanager.py /usr/lib64/python2.4/site-packages/matplotlib/texmanager.py
248c248
< fh = file(outfile,'a')
---
> fh = file(outfile)
252,254c252
< else:
< try: verbose.report(fh.read(), 'debug')
< except: pass
---
> else: verbose.report(fh.read(), 'debug')
259,261c257,258
< else:
< try: os.remove(fname)
< except: pass
---
> else: os.remove(fname)
>
280c277
< fh = file(outfile,'a')
---
> fh = file(outfile)
285,287c282
< else:
< try: verbose.report(fh.read(), 'debug')
< except: pass
---
> else: verbose.report(fh.read(), 'debug')
289,290c284
< try: os.remove(outfile)
< except: pass
---
> os.remove(outfile)
314c308
< # else: verbose.report(fh.read(), 'debug')
---
> else: verbose.report(fh.read(), 'debug')
--snip--

Once again, thanks for the help.

Ian
|
|
From: Darren D. <dsd...@gm...> - 2008-07-17 12:52:53
|
On Wednesday 16 July 2008 07:20:59 am Ian Harry wrote:
> [spxiwh@sugar 07:14 AM matplotlib]$ diff texmanager.py
> /usr/lib64/python2.4/site-packages/matplotlib/texmanager.py
> 248c248
> < fh = file(outfile,'a')
> ---
>
> > fh = file(outfile)
>
> 252,254c252
> < else:
> < try: verbose.report(fh.read(), 'debug')
> < except: pass
> ---
>
> > else: verbose.report(fh.read(), 'debug')
>
> 259,261c257,258
> < else:
> < try: os.remove(fname)
> < except: pass
> ---
>
> > else: os.remove(fname)
>
> 280c277
> < fh = file(outfile,'a')
> ---
>
> > fh = file(outfile)
>
> 285,287c282
> < else:
> < try: verbose.report(fh.read(), 'debug')
> < except: pass
> ---
>
> > else: verbose.report(fh.read(), 'debug')
>
> 289,290c284
> < try: os.remove(outfile)
> < except: pass
> ---
>
> > os.remove(outfile)
>
> 314c308
> < # else: verbose.report(fh.read(), 'debug')
> ---
>
> > else: verbose.report(fh.read(), 'debug')
>
> --snip--
I took a different approach:
Index: lib/matplotlib/texmanager.py
===================================================================
--- lib/matplotlib/texmanager.py (revision 5771)
+++ lib/matplotlib/texmanager.py (working copy)
@@ -273,16 +273,22 @@
%(os.path.split(texfile)[-1], outfile))
mpl.verbose.report(command, 'debug')
exit_status = os.system(command)
- fh = file(outfile)
+ try:
+ fh = file(outfile)
+ report = fh.read()
+ fh.close()
+ except IOError:
+ report = 'No latex error report available.'
if exit_status:
raise RuntimeError(('LaTeX was not able to process the
following \
-string:\n%s\nHere is the full report generated by LaTeX: \n\n'% repr(tex)) +
fh.read())
- else: mpl.verbose.report(fh.read(), 'debug')
- fh.close()
+string:\n%s\nHere is the full report generated by LaTeX: \n\n'% repr(tex)) +
report)
+ else: mpl.verbose.report(report, 'debug')
for fname in glob.glob(basefile+'*'):
if fname.endswith('dvi'): pass
elif fname.endswith('tex'): pass
- else: os.remove(fname)
+ else:
+ try: os.remove(fname)
+ except OSError: pass
return dvifile
@@ -305,14 +311,19 @@
os.path.split(dvifile)[-1], outfile))
mpl.verbose.report(command, 'debug')
exit_status = os.system(command)
- fh = file(outfile)
+ try:
+ fh = file(outfile)
+ report = fh.read()
+ fh.close()
+ except IOError:
+ report = 'No dvipng error report available.'
if exit_status:
raise RuntimeError('dvipng was not able to \
process the flowing file:\n%s\nHere is the full report generated by dvipng: \
-\n\n'% dvifile + fh.read())
- else: mpl.verbose.report(fh.read(), 'debug')
- fh.close()
- os.remove(outfile)
+\n\n'% dvifile + report)
+ else: mpl.verbose.report(report, 'debug')
+ try: os.remove(outfile)
+ except OSError: pass
return pngfile
Would you update from svn and see if it works for you?
Thanks,
Darren
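For readers on a current Python, the idea behind the patch above can be sketched roughly as follows. This is only an illustration and not the code committed to svn: read_report and remove_quietly are hypothetical helper names, and open() stands in for the Python 2 file() builtin. The point is that a log file which another job has already removed should produce a fallback message instead of an exception, and that deleting a file which is already gone should be harmless.

```python
import os
import tempfile


def read_report(path, fallback="No error report available."):
    """Return the contents of a log file, or `fallback` if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except IOError:
        # The file may already have been removed by a concurrent job.
        return fallback


def remove_quietly(path):
    """Delete `path`, ignoring the error raised if it is already gone."""
    try:
        os.remove(path)
    except OSError:
        pass


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    outfile = os.path.join(workdir, "example.output")

    # Nothing has written the file yet, so the fallback text is returned.
    print(read_report(outfile))

    with open(outfile, "w") as fh:
        fh.write("log text")
    print(read_report(outfile))

    # Removing twice is safe: the second call finds nothing and does nothing.
    remove_quietly(outfile)
    remove_quietly(outfile)
```

Reading the report once into a string, as the patch does, also avoids calling read() on a handle that was never opened when the error message is built.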
|
|
From: Ian H. <ian...@as...> - 2008-07-18 11:12:10
|
Hi Darren,
I have updated from svn and tried to run the code. It is not working, but
the failures have nothing to do with texmanager.py. Some of our codes are
failing from within one of our __init__.py files (my guess is a naming
conflict), and some more are failing with:
File
"/home/spxiwh/matplotlibinstall/lib64/python2.4/site-packages/matplotlib/axes.py",
line 263, in _xy_from_xy
assert nrx == nry, 'Dimensions of x and y are incompatible'
AssertionError: Dimensions of x and y are incompatible
I also get:
/home/spxiwh/matplotlibinstall/lib64/python2.4/site-packages/matplotlib/__init__.py:801:
UserWarning: This call to matplotlib.use() has no effect
because the the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
at the top of all of our plotting routine outputs now.
This sounds like we have bugs in our code, which we need to deal with before
we can upgrade our numpy and matplotlib versions. Because of time
constraints, it is likely that upgrading these modules on our systems will
not happen for a few months. Using MPLCONFIGDIR should stop most of our
failures anyway; I guess we can solve the rest by automatically retrying
failed jobs.
Thanks for the help
Ian
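As a rough sketch of the MPLCONFIGDIR workaround discussed in this thread, a job could give itself a private configuration directory before matplotlib is imported. The helper name use_local_config_dir and the "mplconfig-" prefix are made up for illustration, and the default location (the system temporary directory) is only an assumption about where node-local scratch space lives on a given cluster; pass a different base path if that does not hold.

```python
import os
import tempfile


def use_local_config_dir(base=None):
    """Point MPLCONFIGDIR at a fresh directory private to this job.

    `base` is an assumed node-local scratch path; if it is None, the
    system temporary directory is used instead.
    """
    config_dir = tempfile.mkdtemp(prefix="mplconfig-", dir=base)
    os.environ["MPLCONFIGDIR"] = config_dir
    return config_dir


config_dir = use_local_config_dir()
# Import matplotlib only after this point so that it sees the variable:
# import matplotlib
```

Because each job gets its own directory, concurrent jobs no longer share one tex.cache on NFS. The directory is not cleaned up here; a wrapper script would need to remove it when the job ends.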
--
---------------------------------------------------------------------------
Ian Harry
School of Physics & Astronomy
Queens Buildings, The Parade
Cardiff, CF24 3AA
Email: Ian...@as...
Phone: (+44) 29 208 75120
Mobile: (+44) 7890 479090
---------------------------------------------------------------------------
|
|
From: John H. <jd...@gm...> - 2008-07-18 14:32:01
|
On Fri, Jul 18, 2008 at 6:12 AM, Ian Harry <ian...@as...> wrote:
> Hi Darren,
> /home/spxiwh/matplotlibinstall/lib64/python2.4/site-packages/matplotlib/__init__.py:801:
> UserWarning: This call to matplotlib.use() has no effect
> because the the backend has already been chosen;
> matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
> or matplotlib.backends is imported for the first time.
>
> at the top of all of our plotting routine outputs now.
>
> This sounds like we have bugs in our code, which we need to deal with before
> we can upgrade our numpy and matplotlib versions. Because of time
> restraints, it is likely that upgrading of these modules on our systems will
> not happen for a few months. Using MPLCONFIGDIR should stop most of our
> failures anyway, I guess we can solve the rest by automatically retrying
> failed jobs.
>
For your own sake, this use bug should be fixed because it means mpl
is not doing what you think. The backend needs to be set before pylab
is imported. The two main ways to set the backend are in the rc file
and with the use directive. If you do the latter, make sure you put
import matplotlib
matplotlib.use('YourBackend')
near the top of your main driver code, before you import pylab or any
other modules which import it. You should also do this only in one
place in your code. If you try and do it after you import pylab, and
your backend is already set to something else from your rc files, your
code will break.
JDH
|