|
From: Friedrich R. <fri...@gm...> - 2011-11-10 03:44:46
|
Hi,

as announced on the devel list, here is my report on "my" Bus error.

I first noticed the Bus error with a freshly compiled version from
today's git. An ``import matplotlib.figure`` was sufficient to produce
the Bus error. ``python2.6 -v`` showed that it appears when
matplotlib.ft2font is imported, dynamically loaded from
matplotlib/ft2font.so.

I don't know much about this C++ stuff (ft2font.cpp), but I did a
bisect. Unfortunately, it seems (to me) that bisect acts on the
timeline, not respecting the branch structure, hence it gets it a bit
wrong, at least not right enough to enable me to find the offending
commit.

``git bisect`` finds "some" first bad commit, but due to the commits
in other branches after the first real bad commit it gets it a bit
wrong. The binary search then skips too far.

Nevertheless, I found that e05c2fa32f0fc31 fails, and its parent cb609d5
fails too. The very first ancestor of this tree in 2011 I can find is
05631088 (2011-02-20): That one succeeds. But it has some nonstandard
setupext.py, so my test script for ``git bisect run`` cannot be
applied. Its only child is df25e31309b, with a standard setupext.py:
It succeeds.

git bisect seems to work on the full timeline, so it's useless here.
Manual bisecting (using gitk on cb609d5415e): 9a93a5c4 (2011-02-24)
fails; 2ab8582f (2011-02-21) fails; df25e3130 (2011-02-20, the merge
of 0.98 into 0.99) succeeds (see above). 13894992 (2011-02-20, the
merge of 0.99 into 1.0) fails.

So I conclude the failure is introduced somewhere in the 0.99 branch.

Compiling randomly while searching the history: e38440f2 (2010-08-18) fails.

A git blame _src/ft2font.cpp shows that most lines are due to Michael
Droettboom in 6b643862. Unfortunately this is just "Standardizing
formatting of C/C++ code."

The 1.0.0 release 668a769fb fails.

Finally testing 6b643862 ("Standardizing formatting [...]",
2010-06-24): fails.

The next one with modifications to ft2font.cpp is b5ce84214f2
(2010-06-10): fails; its predecessor 97b98e33c: fails.

Next with modifications to ft2font.cpp is 7c228264e (2010-06-04):
fails; its predecessor e8f143c78: fails.

Next one is 857adaee2 (2010-04-16): fails; its predecessor 5a9d580b81: fails.

There is apparently no other modification to ft2font.cpp in 2010 on this branch.

Btw, python2.6-32 signals "Bus error", while python-64 exits with
"Abort trap". The Python is a self-compiled Python 2.6:

Python 2.6.5 (r265:79063, Jul 18 2010, 12:14:53)
[GCC 4.2.1 (Apple Inc. build 5659)] on darwin

I'm building by patching setupext.py to include /usr/local/.

Can anyone give me a pointer on what I should try to sort this
out, aside from updating my freetype2 (which shouldn't count as a
solution; it should also just work with a not fully recent freetype2)?
I'm not sure whether I'm doing something stupid, but since it
succeeds before the 0.99 branch is merged in, I suspect something
non-trivial.

I wonder why I did not notice this before on my machine. Admittedly,
I did not compile in 2011 at all, I think. But in autumn 2010 I did,
and with success. So I wonder how this error made its way
around my machine? I remember that the git mirror of the svn was no
longer maintained at the time I last worked on matplotlib.

Friedrich
|
|
From: Benjamin R. <ben...@ou...> - 2011-11-10 04:32:42
|
Friedrich, just curious. Is your Git mpl repo a clean clone from
github.com/matplotlib and *not* from astraw's experimental repo, right? I
haven't had issues with bisect before and so I wonder if somehow you might
have rebased astraw's repo with mpl's repo, which could have introduced
issues?
Just speculating out loud.
Ben Root
|
|
From: Michael D. <md...@st...> - 2011-11-10 16:01:13
|
Can you get a traceback from gdb? The following should do it:

    gdb python2.6

At the gdb prompt, type "run", then at the Python prompt, reproduce the
error using "import matplotlib.figure". It should crash; then type "bt"
to get a traceback. That may illustrate the source of the error.

Also of note, when using bisect -- the distutils build doesn't always
rebuild enough if only header files change. I recommend clearing out
the build directory before each compile when using bisect to track down
a C++-related change.
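
A minimal sketch of a test script for ``git bisect run`` that clears the build directory before every compile. It is not the script used in this thread: the file name, the plain ``setup.py install`` step and the helper ``bisect_exit_code`` are assumptions for illustration. The exit codes follow the git-bisect convention (0 = good, 125 = skip this commit, other values from 1 to 127 = bad).

```python
"""Hypothetical bisect_test.py for ``git bisect run``: clean, rebuild, import."""
import os
import shutil
import subprocess
import sys

SKIP = 125  # tells git bisect that this commit cannot be tested


def bisect_exit_code(build_rc, test_rc):
    """Map build/test return codes to what ``git bisect run`` expects."""
    if build_rc != 0:
        return SKIP  # did not compile: neither good nor bad
    # subprocess reports death by a signal (e.g. SIGBUS) as a negative
    # return code, so anything non-zero counts as "bad"
    return 0 if test_rc == 0 else 1


def main():
    # ``setup.py clean`` is not enough; remove the whole build tree
    shutil.rmtree("build", ignore_errors=True)
    build_rc = subprocess.call([sys.executable, "setup.py", "install"])
    test_rc = 1
    if build_rc == 0:
        # run from another directory so that the installed copy is imported
        test_rc = subprocess.call(
            [sys.executable, "-c", "import matplotlib.figure"],
            cwd=os.path.expanduser("~"))
    return bisect_exit_code(build_rc, test_rc)


# only act when started inside a source checkout
if __name__ == "__main__" and os.path.exists("setup.py"):
    sys.exit(main())
```

It would be started with something like ``git bisect run python2.6 bisect_test.py`` after marking one good and one bad commit.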
Mike
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-10 23:12:32
|
2011/11/10 Michael Droettboom <md...@st...>:
> Can you get a traceback from gdb? The following should do it:
>
> gdb python2.6

For some reason I cannot load python2.6 from gdb:

This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done
(gdb) run
Starting program: /Library/Frameworks/Python.framework/Versions/2.6/bin/python2.6
Reading symbols for shared libraries +. done
Program received signal SIGTRAP, Trace/breakpoint trap.
0x8fe01030 in __dyld__dyld_start ()
(gdb) The program is running. Exit anyway? (y or n) y

The same happens as superuser.

I instead fell "back" to dtrace (``dapptrace -b 100m -U -p <pid>``,
which requires superuser privilege, hence my note above). dtrace is a
pretty decent tool to trace function calls etc. on the kernel level,
and ships with OS X 10.6.

I am sending the gzip'ed output attached to an off-list mail; the full
log is 4.8 MB even gzip'ed. On first inspection I found it useful to
grep for ft2font.so and libfreetype. Draw your own conclusions.

From what my naked eye can see, freetype itself does not seem to be the
problem. The last thing freetype appears to do is to return from
FT_Get_Sfnt_Name.

Friedrich
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-10 22:16:17
|
2011/11/10 Benjamin Root <ben...@ou...>:
> On Wednesday, November 9, 2011, Friedrich Romstedt
> <fri...@gm...> wrote:
>> ``git bisect`` finds "some" first bad commit, but due to the commits
>> in other branches after the first real bad commit it gets it a bit
>> wrong. The binary search then skips too far.
>
> Friedrich, just curious. Is your Git mpl repo a clean clone from
> github.com/matplotlib and *not* from astraw's experimental repo, right? I
> haven't had issues with bisect before and so I wonder if somehow you might
> have rebased astraw's repo with mpl's repo, which could have introduced
> issues?

No issues like that, clean clone (although I forked it and then cloned
that).

For the bisect, without further reading it'll be speculation. I guess
bisecting on the basis of branches is difficult; just imagine you have
merged in some branch. Since you can specify only one "good" commit as
the starting point, if the merge occurred later, the whole other branch
would have to be considered for bisecting. I guess that's not what
bisect does.

The mechanism that makes bisect not work, as I imagine it, is like
this: The good commit is on branch A, bad commits are on branch B, and
they are intermingled in the timeline. So bisect might, up to some
point, always hit the good commits, concluding that everything between
them is good too, which is wrong (because only those from A are good,
the B ones are not).

Furthermore, Michael is right: while bisecting I didn't ``rm build/``
properly; I just did ``python2.6 setup.py clean``. Later on I did that
properly, after I noticed that the offending commit reported by bisect
actually runs cleanly. I then wrote a test script for ``git bisect
run`` that applies all those steps, so I couldn't keep forgetting it
any longer :-)

Friedrich
|
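
The speculation about branches can be checked against a toy model. As far as the git-bisect documentation describes it, bisect works on the commit graph and not on commit dates: the suspects are the commits reachable from the bad commit but not from any good commit (more than one good commit may be given). The sketch below only models that candidate set, not git's choice of the next commit to test; the commit names and the helpers ``ancestors`` and ``bisect_candidates`` are made up for illustration.

```python
def ancestors(parents, commit):
    """Return commit plus everything reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def bisect_candidates(parents, bad, goods):
    """Commits that may have introduced the bug, given one bad and some good commits."""
    suspects = ancestors(parents, bad)
    for good in goods:
        # every ancestor of a good commit is assumed to be good as well
        suspects -= ancestors(parents, good)
    return suspects


# Toy history: branch A is a1..a3, branch B is b1..b2, both fork from
# "root", and "m" merges B into A.
parents = {
    "a1": ["root"], "a2": ["a1"], "a3": ["a2"],
    "b1": ["root"], "b2": ["b1"],
    "m": ["a3", "b2"],
}

# With a3 marked good and m marked bad, the whole of branch B stays suspect.
print(sorted(bisect_candidates(parents, "m", ["a3"])))
```

In this model a good commit on branch A never clears commits on branch B, whatever their dates are, so a misleading bisect result more likely comes from a stale build than from the branch structure.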
|
From: Michael D. <md...@st...> - 2011-11-11 01:38:47
|
On 11/10/2011 05:16 PM, Friedrich Romstedt wrote:
> Furthermore, Michael is right, while bisecting I didn't ``rm build/``
> properly; I just did ``python2.6 setup.py clean``. Later on I did that
> properly, after I noticed that the offending commit reported by bisect
> actually runs cleanly. I then wrote a test script for ``git bisect
> run`` that applies all those steps, so I couldn't keep forgetting it
> any longer :-) Friedrich

Running bisect in this way, did you arrive at a more conclusive
determination about which commit may have introduced the problem?

Mike
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-11 03:42:31
|
2011/11/11 Michael Droettboom <md...@st...>:
> On 11/10/2011 05:16 PM, Friedrich Romstedt wrote:
>> Furthermore, Michael is right, while bisecting I didn't ``rm build/``
>> properly; I just did ``python2.6 setup.py clean``. Later on I did that
>> properly, after I noticed that the offending commit reported by bisect
>> actually runs cleanly. I then wrote a test script for ``git bisect
>> run`` that applies all those steps, so I couldn't keep forgetting it
>> any longer :-) Friedrich
> Running bisect in this way, did you arrive at a more conclusive
> determination about which commit may have introduced the problem?

No, I didn't, but I found it manually (kind of), while trying to find
anchor points for git bisect:

If you use gitk on cb609d5415e, and scroll down to "Merge branch
'v0.99.x' into v1.0.x" (13894992d8), you'll see a couple of merges.
Here, up to the merge into v1.0.x, things work. In the v1.0.x branch,
everything I tested, down to the beginning of 2010 [sic], failed,
including the 1.0.0 release 668a769fb.

I was wrong in the conclusion of my first mail that it's the v0.99.x
branch which introduces the bug; it's apparently the v1.0.x branch.

I was planning to check some early commit after some merges in 2009 on
the v1.0.x branch, after 1982fba643, and the first commit in 2010 on
the v1.0.x branch, bbcb85a663bbb. If one is good and one is bad, I'd
have let bisect run overnight.

1982fba643 (the first unmerged, see above) is not properly updated for
the new libpng. The first from 10/2009 does not work either, for the
same reason. The first from 11/2009 does not work either. The first
from 12/2009 also not. The first in 01/2010 fails to compile too. First
of 02/2010: fails compiling. First of 03/2010: compiles, and fails on
the import level with Bus error.

So I'm screwed for today. I have to dig out my patch for that libpng
issue and incorporate it into the test script.

So far: the bug arose before 03/2010. sic. sigh.

Friedrich
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-11 22:34:42
|
2011/11/11 Michael Droettboom <md...@st...>:
> Running bisect in this way, did you arrive at a more conclusive
> determination about which commit may have introduced the problem?

Yes, do you know Final Fantasy? "You gonna loose it ... Tracking ...
Tracking ... Found it." af9954d46e.

I don't know which part of that commit breaks it, maybe you can have a
look? It's a commit by you. Maybe it's just the evil font cache. :-)

It's not the ft2font, notably; this was apparently imported properly.
It's just some initialisation code of matplotlib that seems to fail
while importing matplotlib.figure.

I verified clearly: the commit mentioned fails, and its predecessor succeeds.

I did patch the _png.cpp to make it work; it didn't comply with
libpng-1.4 at that time.

I can upload the branches for testing the two commits to my repo.

So far,
Friedrich
|
|
From: Michael D. <md...@st...> - 2011-11-11 22:41:45
|
Very odd. Given there's no C++ changes here, I'm very surprised. Shooting
in the dark here: does deleting ~/.matplotlib/fontList.cache help at all?

Mike

On 11/11/2011 05:34 PM, Friedrich Romstedt wrote:
> 2011/11/11 Michael Droettboom<md...@st...>:
>> Running bisect in this way, did you arrive at a more conclusive
>> determination about which commit may have introduced the problem?
> Yes, do you know Final Fantasy? "You gonna loose it ... Tracking ...
> Tracking ... Found it." af9954d46e.
>
> I don't know which part of that commit breaks it, maybe you can have a
> look? It's a commit by you. Maybe it's just the evil font cache. :-)
>
> It's not the ft2font, notably, this was apparently imported properly;
> it's just some initialisation code of matplotlib that seems to fail
> while importing matplotlib.figure.
>
> I verified clearly; the commit mentioned fails, and its predecessor succeeds.
>
> I did patch the _png.cpp to make it work; it didn't comply with
> libpng-1.4 that time.
>
> I can upload the branches for testing the two commits to my repo.
>
> So far,
> Friedrich
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-12 00:06:14
|
2011/11/11 Michael Droettboom <md...@st...>:
> Very odd. Given there's no C++ changes here, I'm very surprised. Shooting
> in the dark here: does deleting ~/.matplotlib/fontList.cache help at all?

Nope :-( I'm pretty much surprised too. I wonder why no one else has
this issue?

I replaced the font_manager.py with that of the good commit and it
still fails. I then reverted back to the bad font_manager.py, and
replaced the mathtext.py with that of the good one. And it fails ....
I then replaced both font_manager.py and mathtext.py with the good
ones, and it .... still fails!

I could not believe this and checked out the good commit once more,
and this one .... fails now too ...

I have no idea what's going on here. To me this looks like black magic.

I didn't confuse the commits; I have logs where the good commit
succeeds. Are there any other caches involved in matplotlib? I don't
know of any. I cleaned the build/ directory properly. I also nuked the
site-packages/matplotlib in between.

The build+run logs of the "good" commit before and after the "magic
trigger" diff as exactly the same, with the exception that in one
python2.6 -v continues with lines.py and in the other not.

:-(

Friedrich
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-12 14:23:48
|
To give the valuable information in the beginning: It appears it
cannot handle /Library/Fonts/NISC18030.ttf. It tries to load it via
ft2font.FT2Font() but that gives the Bus error. The ttf file dates to
28 Jan 2010. It is 7108232 bytes in size. I don't know why it cannot
be loaded.

Until it had to recreate the fontcache, it never tried to load that.
It appears to me I have used matplotlib since before that file
appeared, or at least matplotlib never tried to load it, or succeeded
before in loading it. The "first bad" commit mentioned in the last
email(s) was the one introducing a mechanism to throw the fontcache
away if the matplotlib version number does not match the version
number stored in the fontcache.
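
The version check described above can be sketched roughly as follows. This is not matplotlib's actual code: the function name ``load_font_index``, the dict layout and the ``rebuild`` callback are assumptions for illustration. The point is that a cache with a matching version is loaded without touching any font file, while a mismatch forces a rebuild, and a crash during that rebuild means the new cache is never written.

```python
import pickle


def load_font_index(path, rebuild, version):
    """Load a pickled font index, rebuilding it if missing or outdated."""
    try:
        with open(path, "rb") as fh:
            index = pickle.load(fh)
        if index.get("version") == version:
            return index  # cache is current: no font file is opened
    except (IOError, EOFError, pickle.UnpicklingError, AttributeError):
        pass  # missing or unreadable cache: fall through to the rebuild
    # If rebuild() crashes the interpreter (e.g. on one TTF file), the
    # lines below never run and the cache file is never (re)written.
    index = {"version": version, "ttffiles": rebuild()}
    with open(path, "wb") as fh:
        pickle.dump(index, fh)
    return index
```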
2011/11/12 Friedrich Romstedt <fri...@gm...>:
> 2011/11/11 Michael Droettboom <md...@st...>:
>> Very odd. Given there's no C++ changes here, I'm very surprised. Shooting
>> in the dark here: does deleting ~/.matplotlib/fontList.cache help at all?

I guess it might have to do with it: Removing the font cache might
have made the "good" commit 8c200dab4680efd5201 fail. Or rather,
keeping the old font cache might have made the "good" commit not fail
in the beginning. Whatever the causal relation is, I will try to
investigate by playing with the existence of the font cache.

I want to verify that the existence of the fontcache file influences
the test result.

-== Trying to verify the influence of the fontcache file ==-

> I could not believe this and checked out the good commit once more,
> and this one .... fails now too ...

Verifying that there's no further magic, after a clean reboot (you
never know, and I went to sleep), I'm trying both commits again, without
the font cache in action:

"good" 8c200dab4680efd5201: Bus error.
"bad" af9954d46e5d: Bus error.

So everything is like yesterday evening, without the font cache.

Putting the font cache back into action now (from the moved file).
Keeping the moved file for reference (i.e., copying it).

"good" 8c200dab4680efd5201: Succeeding.
"bad" af9954d46e5d: Bus error.

So the existence of the font cache file makes the apparently "good"
commit succeed, although it probably shouldn't succeed. It is a
pity that it's not vice versa: that the existence of the font cache
file would make the "bad" commit fail, such that it (and the current
matplotlib) would succeed without it.
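
Moving the font cache out of the way and putting it back can be wrapped in a small helper, so that a test run cannot leave the wrong cache behind. A minimal sketch; the name ``cache_moved_aside`` and the ``.aside`` suffix are made up, and the cache path (``~/.matplotlib/fontList.cache`` in this thread) is passed in by the caller.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def cache_moved_aside(cache_path):
    """Temporarily hide the cache file; restore the original afterwards."""
    backup = cache_path + ".aside"
    present = os.path.exists(cache_path)
    if present:
        shutil.move(cache_path, backup)
    try:
        yield
    finally:
        if present:
            if os.path.exists(cache_path):
                os.remove(cache_path)  # drop whatever the run regenerated
            shutil.move(backup, cache_path)
```

The import test would then be run inside ``with cache_moved_aside(path):`` for the "without cache" case and outside of it for the "with cache" case.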

-== Bisecting again, this time without font cache file ==-

Removing the font cache file again (keeping the copy).

-= Trying to find some good commit in the past =-

Trying 1982fba643 (one from 2009): Bus error. This commit's test run
differs from the previous Bus errors by the following additional lines
from python2.6 -v:

# /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/commands.pyc matches /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/commands.py
import commands # precompiled from /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/commands.pyc

This happens directly after importing ft2font.so.

Nevertheless it fails, so going further back into the past ...

Trying the v0.99.0rc1 ac387d18b: Bus error.

This cannot be! It is a commit from back in 2009, and I used matplotlib
at that time!

Still trying to find a commit not exhibiting the Bus error ...

-== The first commit from 2009 ==-

Trying the first from 2009, 1dcaee87fc: Bus error.

I grep'ed for "import .*commands" in the lib/ folder, and found that
it is only used by font_manager.py, and in that file on non-win32
platforms only the body of get_fontconfig_fonts imports it. I
will augment that function body with prints to track it down. It
appears it is the font manager that crashes.

Probably it issues some commands that make Python crash, and when the
fontcache existed and was not version-checked (before the former "bad"
commit), it was simply loaded. Now it tries to rebuild it and fails
in that. It was running at the time I used it maybe because I also had
a working font cache at that time.

For some reason the log output did not appear in the test. Running
the test manually shows the log output. What? Apparently it is
missing because of some buffering issue. If I pass the output through
a pipe, like when logging, Python apparently switches buffering.

It might well be that the buffering truncates the whole -v output.
Apparently the -v output goes to sys.stderr, and sys.stdout is
buffered when piping.

Patching sys.stderr:

python2.6 -v -c "import sys; sys.stderr = open('x.txt', 'w'); import matplotlib.figure"

The -v output up to the Python 2.6.5 statement goes to stdout; after
that it apparently goes to stderr. The output to x.txt is truncated in
the middle of a line:

# /Library/Frameworks/Python.framework/Versions/2.6/lib/python

and does not contain the ``commands`` import log. Turning buffering off:

python2.6 -v -c "import sys; sys.stderr = open('x.txt', 'w', 0); import matplotlib.figure"

it ends again at the ``commands`` import.
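
The buffering effect can be reproduced without matplotlib. The sketch below is an illustration under assumptions: the child process uses ``os._exit`` as a stand-in for the Bus error (both end the process without flushing Python's file buffers), and the helper name ``surviving_log`` is made up. It is written for a current Python, where an unbuffered file has to be opened in binary mode.

```python
import os
import subprocess
import sys
import tempfile

# Child script: write one log line, then die without any cleanup.
CHILD = """
import os, sys
path, unbuffered = sys.argv[1], sys.argv[2] == "1"
log = open(path, "wb", 0) if unbuffered else open(path, "wb")
log.write(b"about to load font\\n")
os._exit(1)  # stands in for the crash: buffers are not flushed
"""


def surviving_log(unbuffered):
    """Return what is left in the log file after the child has died."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "x.txt")
        subprocess.call(
            [sys.executable, "-c", CHILD, path, "1" if unbuffered else "0"])
        with open(path, "rb") as fh:
            return fh.read()


print("buffered:  ", surviving_log(False))
print("unbuffered:", surviving_log(True))
```

With the buffered file the last message is lost, so the log appears to end earlier than the real crash; with buffering turned off the message is on disk before the process dies.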

It is noteworthy that the function augmented by the logging statements
exits cleanly. The function is called only once in the whole of
matplotlib, in font_manager.py, in findSystemFonts(). There
findSystemFonts() loops over the return value, which is {}, so there is
no looping at all.

The crash appears in FontManager.__init__() somewhere between loading
the ttffiles and loading the afmfiles.

The crash appears in createFontList().

Setting matplotlib's ``matplotlib.verbose`` to level 'debug-annoying'
via the cmdline script shows that the Bus error occurs after the
following last log message:

createFontDict: /Library/Fonts/NISC18030.ttf

Augmenting the createFontList() function with print statements yields this:

createFontDict: /Library/Fonts/Arial Narrow.ttf
Friedrich: ft2font.FT2Font(/Library/Fonts/Arial Narrow.ttf) ...
Friedrich: ft2font.FT2Font(/Library/Fonts/Arial Narrow.ttf) succeeded.
createFontDict: /Library/Fonts/NISC18030.ttf
Friedrich: ft2font.FT2Font(/Library/Fonts/NISC18030.ttf) ...
./runtest.sh: line 1: 7150 Bus error python2.6 -v -u -c "import matplotlib; matplotlib.verbose.set_level('debug-annoying'); import matplotlib.figure" 2>&1

So it appears it cannot handle /Library/Fonts/NISC18030.ttf.
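
Instead of adding print statements, each font file can also be loaded in its own child process, so that a crash on one file does not end the scan. A minimal sketch: ``failing_files`` and ``FT2FONT_LOADER`` are made-up names, and the default loader assumes that ``ft2font.FT2Font(path)`` is the call that crashes, as the log above suggests.

```python
import subprocess
import sys

# One-liner executed in a child process; the font path arrives as argv[1].
FT2FONT_LOADER = (
    "import sys; from matplotlib import ft2font; "
    "ft2font.FT2Font(sys.argv[1])"
)


def failing_files(paths, loader=FT2FONT_LOADER):
    """Return {path: return code} for every file the loader cannot handle."""
    failures = {}
    for path in paths:
        rc = subprocess.call([sys.executable, "-c", loader, path])
        if rc != 0:
            # a negative code means the child was killed by a signal,
            # which is what a Bus error looks like from the parent
            failures[path] = rc
    return failures
```

Calling ``failing_files(glob.glob("/Library/Fonts/*.ttf"))`` would then list the offending fonts together with the return code of each crash.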
Any ideas?
Friedrich
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-12 18:47:27
|
2011/11/12 Friedrich Romstedt <fri...@gm...>:
> To give the valuable information in the beginning: It appears it
> cannot handle /Library/Fonts/NISC18030.ttf. It tries to load it via
> ft2font.FT2Font() but that gives the Bus error. The ttf file dates to
> 28 Jan 2010. It is 7108232 bytes large. I don't know why it cannot
> be loaded.

A quick googling of "NISC18030.ttf matplotlib" yields this interesting
result: http://groups.google.com/group/sage-devel/browse_thread/thread/2c538915abc99946

From there: "[...] and hope for John Hunter to be able to replicate
the problem and come up with something better in the next few weeks
(or I'll come back to it later)." So I think we have at least
replicated it.

What troubles me is that I have been a 10.6 user from the beginning,
since, say, mid 2010. So my initial working fontcache was built at that
time. On 10.6. How do I analyse whether the respective TTF is in the
font cache (file)?

OK, I loaded the font manager from the pickle just via matplotlib from
2009, and /Library/Fonts/NISC18030.ttf is not amongst
``matplotlib.font_manager.fontManager.ttffiles``.

I didn't know until today that the CXX appearing in the build process
actually refers to a package and is not just an alteration of C++ to
make it more shell-friendly, as I believed until now. Maybe someone
with some insight into CXX can help? I could do that too, but it will
probably take much longer than if you, dear recipient, do it.

From the post referenced above it *seems* that it might have something
to do with creating a Python Int from NULL? But since my knowledge of
CXX is low, as mentioned, I would not give my word for this.

The explanation why it didn't try to index that file follows:

$ stat -f "...." /Library/Fonts/NISC18030.ttf
Last accessed or modified: 1321107464 = 12 Nov 2011
Last changed: 1264652963 = 28 Jan 2010
Time of Birth: 1292365840 = 14 Dec 2010

There you go. I guess some Mac OS X 10.6 update (probably a combo
update) installed it.

I will not go into the details of checking the pax files or something;
I just think we see that it was born on my Mac later than I started
using matplotlib. I never deleted the fontcache while using matplotlib,
but I vaguely remember that I had a problem with another user than
"me".

I also remember some post on a failing matplotlib on OS X 10.6, which
we were not able to solve, but I'll look into that now.

Friedrich
|
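
The epoch values from the ``stat`` output can be cross-checked in Python. A minimal sketch: ``utc_date`` and ``file_times`` are made-up helper names, and ``st_birthtime`` is only filled in on platforms that provide it (such as OS X), so the code falls back to ``None`` elsewhere.

```python
import os
from datetime import datetime, timezone


def utc_date(epoch_seconds):
    """Format seconds since the epoch as a UTC calendar date."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date().isoformat()


def file_times(path):
    """Collect the timestamps of a file as UTC dates."""
    st = os.stat(path)
    return {
        "accessed": utc_date(st.st_atime),
        "modified": utc_date(st.st_mtime),
        "changed": utc_date(st.st_ctime),
        # the birth time is not available on every platform
        "born": utc_date(st.st_birthtime) if hasattr(st, "st_birthtime") else None,
    }


# The three values reported for /Library/Fonts/NISC18030.ttf above
for label, seconds in [("last accessed or modified", 1321107464),
                       ("last changed", 1264652963),
                       ("time of birth", 1292365840)]:
    print(label, utc_date(seconds))
```

``file_times("/Library/Fonts/NISC18030.ttf")`` would give the same information directly on the machine in question.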
|
From: Friedrich R. <fri...@gm...> - 2011-11-12 20:26:13
|
2011/11/12 Friedrich Romstedt <fri...@gm...>:
> 2011/11/12 Friedrich Romstedt <fri...@gm...>:
>> To give the valuable information in the beginning: It appears it
>> cannot handle /Library/Fonts/NISC18030.ttf. It tries to load it via
>> ft2font.FT2Font() but that gives the Bus error. The ttf file dates to
>> 28 Jan 2010. It is 7108232 bytes large. I don't know why it cannot
>> be loaded.
>
> A quick googling of "NISC18030.ttf matplotlib" yields this interesting
> result: http://groups.google.com/group/sage-devel/browse_thread/thread/2c538915abc99946

And this: http://trac.sagemath.org/sage_trac/ticket/7022. Actually I
got the above from that.

From there (username is "was", probably William Stein): "All it does
is take the plane vanilla matplotlib-0.99.1.spkg spkg and add a little
script that simply rebuilds f2font.so again using *exactly* the same
command lines used by distutils to build that extension. That's it.
For some reason -- probably involving environment variables (?) --
this fixes the problem. I consider this a temporary 1-sage release
solution until the matplotlib developers (or me) come up with a real
fix."

I downloaded that script, and reproduced the functionality with my
framework Python 2.6. The resulting ft2font.so differs at the binary
level from the original ft2font.so, and ... indeed it runs smoothly
with that ft2font.so.

What the hell is happening here? Is it really CXX related? Which
environment variables are set by the mighty distutils?

Feel free to start a new thread on -devel as soon as you have some
solution or idea :-)

Friedrich
|
|
From: Friedrich R. <fri...@gm...> - 2011-11-12 21:51:19
|
2011/11/12 Friedrich Romstedt <fri...@gm...>:
> 2011/11/12 Friedrich Romstedt <fri...@gm...>:
>> A quick googling of "NISC18030.ttf matplotlib" yields this interesting
>> result: http://groups.google.com/group/sage-devel/browse_thread/thread/2c538915abc99946
>
> And this: http://trac.sagemath.org/sage_trac/ticket/7022. Actually I
> got the above from that.
>
> From there (username is "was", probably William Stein): "All it does
> is take the plane vanilla matplotlib-0.99.1.spkg spkg and add a little
> script that simply rebuilds f2font.so again using *exactly* the same
> command lines used by distutils to build that extension. That's it.
> For some reason -- probably involving environment variables (?) --
> this fixes the problem. I consider this a temporary 1-sage release
> solution until the matplotlib developers (or me) come up with a real
> fix."
>
> I downloaded that script, and reproduced the functionality with my
> framework Python 2.6. The resulting ft2font.so differs binarily from
> the original ft2font, and ... indeed it runs smoothly with that
> ft2font.so.
>
> What the hell is happening here? Is it really CXX related? What
> environmental variables are set by the mighty distutils?

I wrote a bash wrapper script around gcc-4.2 that writes the ``env``
output, together with the command to be run, into timestamped logfiles.
The result is that the only differences in environment variables
between the distutils build and the manual rebuild are:

1) PLAT=macosx-10.5-intel
2) MACOSX_DEPLOYMENT_TARGET=10.5

I don't know anything about (1). (2) was set at compile time and is
correct, but I will check whether it affects the result.

I will try to reproduce the original ft2font.so generated by distutils
with manual commands, to see which ingredient makes it fail in the end.

PLAT apparently has no effect, at least not on the file size in bytes.
MACOSX_DEPLOYMENT_TARGET makes the file size increase to about the size
of the original file. The files do not match byte for byte; I guess
there is a time stamp somewhere and some compression involved. Even
when the file sizes matched to the byte, the contents still differed.

I will focus on whether MACOSX_DEPLOYMENT_TARGET breaks it or not.

Unsetting MACOSX_DEPLOYMENT_TARGET, and hence using the default
``10.6``, makes it work. Recompiling manually with
MACOSX_DEPLOYMENT_TARGET=10.5 and removing the fontcache generated by
the last run makes it fail.

So to me this looks pretty much like a gcc-4.2 bug.

MACOSX_DEPLOYMENT_TARGET has nothing to do with the source code. It
*should* just add a legacy layer. What it apparently does is to
compile for 10.5 instead, and maybe add a legacy layer for 10.6? Just
speculating.

So I think we found it, but apparently we cannot solve it.

The only option is to build libraries for 10.6 with the python.org OS X
10.6-only version, so that we can set the deployment target to 10.6
when building the library (matplotlib).

I'm cc'ing the sage people manually since I'm not on sage-devel and
don't need it at all. William, Ondrej, FYI.

So far,
Friedrich |
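The comparison of two logged environments described in this message can be sketched in Python. This is only a minimal sketch under stated assumptions, not the bash wrapper that was actually used: the function names `parse_env` and `env_diff` are hypothetical, and the `PATH` and `HOME` values are made-up sample data. Only `PLAT` and `MACOSX_DEPLOYMENT_TARGET` come from the message.

```python
def parse_env(dump):
    """Parse the text printed by ``env`` into a dict of NAME -> value."""
    result = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no "=" at all
            result[name] = value
    return result


def env_diff(env_a, env_b):
    """Return {name: (value_in_a, value_in_b)} for every differing variable.

    A variable that is missing on one side is reported as None there.
    """
    names = sorted(set(env_a) | set(env_b))
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in names
        if env_a.get(name) != env_b.get(name)
    }


# Illustrative dumps only: PATH and HOME are made-up sample values.
distutils_dump = """PATH=/usr/bin:/bin
HOME=/Users/example
PLAT=macosx-10.5-intel
MACOSX_DEPLOYMENT_TARGET=10.5"""

manual_dump = """PATH=/usr/bin:/bin
HOME=/Users/example"""

differences = env_diff(parse_env(distutils_dump), parse_env(manual_dump))
for name, (in_distutils, in_manual) in differences.items():
    print(name, in_distutils, in_manual)
```

Reporting a missing variable as `None` keeps "unset" distinguishable from "set to an empty string", which matters here because the working build simply had the variable unset.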
|
From: Friedrich R. <fri...@gm...> - 2011-11-12 22:49:18
|
This is my summary of what I found out.

2011/11/12 Friedrich Romstedt <fri...@gm...>:
> So to me this looks pretty much like a gcc-4.2 bug.
>
> MACOSX_DEPLOYMENT_TARGET has nothing to do with the source code. It
> *should* just add a legacy layer. What it apparently does is to
> compile for 10.5 instead, and maybe add a legacy layer for 10.6? Just
> speculating.
>
> So I think we found it, but apparently we cannot solve it.
>
> The only option is to build libraries for 10.6 with the python.org
> OS X 10.6-only version, so that we can set the deployment target to
> 10.6 when building the library (matplotlib).

Hi Mike,

I think it might be that now, in 2011, with OS X 10.6, there is no
"good" commit anymore.

The mechanism was as follows for those commits that were apparently
"good": the fontcache was loaded without any change. For the "bad"
commits, matplotlib attempted to recreate it, but this led to a Bus
error, and hence it was not written.

When the fontcache is missing, all commits that contain the source
code that reads that ttf file fail. They didn't fail until the
deployment target bug was introduced into gcc-4.2. They also didn't
fail until a ttf file was present that probably triggers a special
code path. It might even work with gcc-4.0?

I believe that the source code that triggers the bug has been in
matplotlib from the beginning, as it apparently isn't a programming
mistake. But it might still be that you find something that triggers
it and that can be fixed.

If it does not appear with gcc-4.0, that would explain why we have so
few reports of this issue. It seems that using the python.org Python,
which is compiled with gcc-4.0 (probably with the exception of the
10.6-only Python), suffices to circumvent the bug.

I'm not interested in using gcc-4.0, since I compiled libpng, libjpeg,
libtiff, libfreetype etc. using gcc-4.2. For my own purposes, I will
probably recompile only Python without the 10.5 target. This will sort
it out. But I don't know whether that is always a solution for
packagers.

I think the offending binary instruction is either in ft2font.so or in
libfreetype.dylib. In the former case, it might result from
ft2font.cpp or from the CXX stuff I didn't understand. In the latter
case, upgrading libfreetype might help, but that is not likely: for
the error to propagate to there, it would have to depend on the
deployment target variable used to compile ft2font.so (since the whole
Bus error depends on that). So it is not probable that the offending
instruction is in libfreetype.dylib.

Friedrich |
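The masking effect described in this summary (an existing cache means the font files are never opened, so the crash stays hidden) can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not matplotlib's font_manager code: `load_or_rebuild` and `fake_rebuild` are hypothetical names, JSON is used purely for illustration as the cache format, and `Example.ttf` is a made-up font name.

```python
import json
import os
import tempfile


def load_or_rebuild(cache_path, rebuild):
    """Return the cached font list, rebuilding it only when no cache exists."""
    if os.path.exists(cache_path):
        with open(cache_path) as handle:
            return json.load(handle)  # "good" case: no font file is opened
    fonts = rebuild()  # "bad" case: a crash in here leaves no cache behind
    with open(cache_path, "w") as handle:
        json.dump(fonts, handle)
    return fonts


calls = []


def fake_rebuild():
    """Stand-in for the font scan; records that it was invoked."""
    calls.append("scan")
    return ["Example.ttf"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fontList.cache")
    first = load_or_rebuild(path, fake_rebuild)   # no cache yet: scans
    second = load_or_rebuild(path, fake_rebuild)  # cache present: no scan

print(calls)
```

Because the cache is written only after the scan returns, a process that dies during the scan leaves no cache, so every later start repeats the scan and the crash.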
|
From: Friedrich R. <fri...@gm...> - 2011-11-13 12:51:14
|
2011/11/12 Friedrich Romstedt <fri...@gm...>:
> This is my summary of what I found out.

A small follow-up regarding what might trigger the bug:

http://comments.gmane.org/gmane.comp.python.matplotlib.general/1115 is
a report by Chris Barker which shows, as a side effect, that
NISC18030.ttf was present even in 2005. It "could not be loaded" at
that time, i.e. it didn't cause a Bus error. The fact that loading was
attempted indicates that the fontcache was being rebuilt at that time,
so the file must have been present.

http://code.google.com/p/anki/issues/detail?id=560 indicates that in
2008, on OS X 10.5, the file could "not be loaded" either. Again, the
mere attempt implies that the fontcache was rebuilt. So the file must
have been present, unless the font_manager.py logic of early 2009 is
the result of a dramatic change since then.

It appears very probable that the Bus error is not triggered on 10.5,
but only on 10.6 when building with MACOSX_DEPLOYMENT_TARGET=10.5. It
remains unclear starting from which patch version of 10.6 it appears,
and also whether it is a gcc-4.2-only issue.

If it is gcc-4.2 related, that would explain the rarity: gcc-4.2 was
introduced in 10.6, so who would build with a 10.5 deployment target?
If 10.5 is targeted, you mostly need to use gcc-4.0 anyway. (This is
something I overlooked in my own decision until now.)

Friedrich |
|
From: Friedrich R. <fri...@gm...> - 2011-11-13 23:05:36
|
2011/11/12 Friedrich Romstedt <fri...@gm...>:
> $ stat -f "...." /Library/Fonts/NISC18030.ttf
> Last accessed or modified: 1321107464 = 12 Nov 2011
> Last changed: 1264652963 = 28 Jan 2010
> Time of Birth: 1292365840 = 14 Dec 2010

The file might have been created earlier; 14 Dec 2010 is the day when
I reinstalled my Mac from backup after a HDD crash.

I have checked whether I have backups older than that on one of the
Time Machine disks, but I don't. But since Time Machine uses hardlinks
to link the files between different backups, the file in the oldest
backup, from 27 Dec 2010, might still have the date of birth we're
looking for, assuming it didn't create a completely new backup after
restoring from the old one.

I'm interested in this because I wonder how I ever got a working
fontcache.

It might be that I first compiled matplotlib differently, with
python.org Python, hence gcc-4.0, and if we assume that it works under
gcc-4.0, I would have ended up with a proper fontcache, and was free
to compile with gcc-4.2 + 10.5 deployment target. Then the fontcache
lived on untouched all those years since mid 2009. Until now, when
matplotlib, built with gcc-4.2 and the 10.5 target, attempted to
recreate it and failed.

I guess that the NISC18030.ttf in the backup has the date of birth of
the first backup ever, meaning that it was probably present from the
very beginning. This is suggested by the posts going back to 2005,
where the file existed on that ``bsd`` machine of William Stein, iirc.
I strongly believe I just got a working intermediate matplotlib, which
created the everlasting (or not) fontcache.

Friedrich |
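The epoch values in the quoted ``stat`` output can be converted to calendar dates with a few lines of Python. This is a minimal sketch: the helper name `epoch_to_utc_date` is hypothetical, and the conversion is done in UTC, which is an assumption (the dates in the message may have been read in local time).

```python
from datetime import datetime, timezone


def epoch_to_utc_date(seconds):
    """Convert seconds since the Unix epoch to an ISO date string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()


# The three timestamps reported for /Library/Fonts/NISC18030.ttf.
stat_times = {
    "last accessed or modified": 1321107464,
    "last changed": 1264652963,
    "time of birth": 1292365840,
}

for label, seconds in stat_times.items():
    print(label, epoch_to_utc_date(seconds))
```

To read the values directly rather than through ``stat``, `os.stat(path)` exposes `st_atime`, `st_mtime` and `st_ctime`, and on OS X also `st_birthtime`.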
|
From: Michael D. <md...@st...> - 2011-11-14 13:13:59
|
Thanks for all the time you've devoted to this. It does look like it
may be some kind of compiler bug. The font loads and renders fine on
Linux, for what it's worth (just as a data point).

To confirm this theory: if you move NISC18030.ttf somewhere temporary,
delete ~/.matplotlib/fontList.cache and then import matplotlib, do you
get the crash? That at least confirms that loading this font file
triggers the bug (wherever the bug may be). Test with matplotlib 1.1.0
or git master so we have a sense of the current behavior.

Mike

On 11/13/2011 06:05 PM, Friedrich Romstedt wrote:
> 2011/11/12 Friedrich Romstedt <fri...@gm...>:
>> $ stat -f "...." /Library/Fonts/NISC18030.ttf
>> Last accessed or modified: 1321107464 = 12 Nov 2011
>> Last changed: 1264652963 = 28 Jan 2010
>> Time of Birth: 1292365840 = 14 Dec 2010
> The file might have been created earlier; 14 Dec 2010 is the day when
> I reinstalled my Mac from backup after a HDD crash.
>
> I have checked whether I have backups older than that on one of the
> Time Machine disks, but I don't. But since Time Machine uses hardlinks
> to link the files between different backups, the file in the oldest
> backup, from 27 Dec 2010, might still have the date of birth we're
> looking for, assuming it didn't create a completely new backup after
> restoring from the old one.
>
> I'm interested in this because I wonder how I ever got a working
> fontcache.
>
> It might be that I first compiled matplotlib differently, with
> python.org Python, hence gcc-4.0, and if we assume that it works under
> gcc-4.0, I would have ended up with a proper fontcache, and was free
> to compile with gcc-4.2 + 10.5 deployment target. Then the fontcache
> lived on untouched all those years since mid 2009. Until now, when
> matplotlib, built with gcc-4.2 and the 10.5 target, attempted to
> recreate it and failed.
>
> I guess that the NISC18030.ttf in the backup has the date of birth of
> the first backup ever, meaning that it was probably present from the
> very beginning. This is suggested by the posts going back to 2005,
> where the file existed on that ``bsd`` machine of William Stein, iirc.
> I strongly believe I just got a working intermediate matplotlib, which
> created the everlasting (or not) fontcache. |
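A check like the one requested in this message is easier to repeat when the import runs in a child process, so that a Bus error kills only the child. The following Python sketch shows just the detection step and is an assumption, not something from the thread: `dies_with_bus_error` is a hypothetical helper, and the crash is simulated by a child that sends itself SIGBUS. For the real check one would pass ``import matplotlib.figure`` as the code string, after moving the font and deleting the cache by hand.

```python
import signal
import subprocess
import sys


def dies_with_bus_error(python_code):
    """Run python_code in a fresh interpreter; True if it died from SIGBUS."""
    completed = subprocess.run([sys.executable, "-c", python_code])
    # On POSIX, a negative return code -N means "killed by signal N".
    return completed.returncode == -signal.SIGBUS


# Stand-in for the real crash: the child sends SIGBUS to itself.
simulated_crash = "import os, signal; os.kill(os.getpid(), signal.SIGBUS)"

print(dies_with_bus_error(simulated_crash))
print(dies_with_bus_error("pass"))
```

Comparing against `-signal.SIGBUS` instead of a literal number keeps the check portable, since the numeric value of SIGBUS differs between OS X and Linux.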
|
From: Friedrich R. <fri...@gm...> - 2011-11-14 14:04:34
|
2011/11/14 Michael Droettboom <md...@st...>:
> Thanks for all the time you've devoted to this. It does look like it
> may be some kind of compiler bug. The font loads and renders fine on
> Linux, for what it's worth (just as a data point).
>
> To confirm this theory: if you move NISC18030.ttf somewhere temporary,
> delete ~/.matplotlib/fontList.cache and then import matplotlib, do you
> get the crash? That at least confirms that loading this font file
> triggers the bug (wherever the bug may be). Test with matplotlib 1.1.0
> or git master so we have a sense of the current behavior.

Hi Mike,

the following fonts on my system are offending:

/Library/Fonts/NISC18030.ttf
/Library/Fonts/AppleMyungjo.ttf
/Library/Fonts/Gungseouche.ttf

With these fonts renamed so that matplotlib cannot find them
(:file:`*.ttf_`), it exits cleanly.

I will provide a patch to matplotlib for an rc setting
"fonts.bus-error : ...", e.g. ``fonts.bus-error : NISC18030.ttf,
AppleMyungjo.ttf, Gungseouche.ttf``, in the next few days.

It was clear from the beginning (well, from the point I got a handle
on it) that loading the font makes the 2009 matplotlib crash. The
only unanswered question is where the code path is that triggers this
compiler bug (I think the compiler bug hypothesis is not disproven and
works well atm). If the code path is in ft2font.cpp, we could (you
could) reformulate ft2font.cpp in an equivalent way, except that it no
longer crashes. You might want to add printf() calls or similar to
ft2font.cpp to see whether the crash occurs inside a call to
libfreetype or whether all those calls return cleanly.

To my understanding, since recompiling ft2font.so with
MACOSX_DEPLOYMENT_TARGET left at its 10.6 default helps, ft2font.cpp
should be the culprit or rather the victim. The only alternative I see
would be that it has to do with the load mechanism of the dylib, but I
deem this rather unlikely. Well, unlikely is not the best word in this
context, since all these things here were pretty unlikely.

If the code path is in libfreetype, this would be an issue for their
list. ...

Friedrich |
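The exclusion mechanism proposed in this message could look roughly like the following Python sketch. Everything in it is hypothetical: "fonts.bus-error" is only the setting name suggested above, nothing in this thread shows that such a setting exists in matplotlib, and `parse_excluded`, `filter_fonts` and the `Example.ttf` path are made up for illustration.

```python
import os


def parse_excluded(setting):
    """Split a comma-separated rc value into a set of font file names."""
    return {name.strip() for name in setting.split(",") if name.strip()}


def filter_fonts(font_paths, excluded_names):
    """Drop every path whose base name is listed in excluded_names."""
    return [
        path for path in font_paths
        if os.path.basename(path) not in excluded_names
    ]


# Value of the proposed (hypothetical) "fonts.bus-error" setting.
setting = "NISC18030.ttf, AppleMyungjo.ttf, Gungseouche.ttf"

found = [
    "/Library/Fonts/NISC18030.ttf",
    "/Library/Fonts/AppleMyungjo.ttf",
    "/Library/Fonts/Gungseouche.ttf",
    "/Library/Fonts/Example.ttf",  # made-up path for illustration
]

usable = filter_fonts(found, parse_excluded(setting))
print(usable)
```

Matching on the base name rather than the full path mirrors the example value given in the message, which lists bare file names.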
|
From: Friedrich R. <fri...@gm...> - 2012-03-15 11:31:12
|
Hi,

On 14 November 2011 at 15:04, Friedrich Romstedt <fri...@gm...> wrote:
> 2011/11/14 Michael Droettboom <md...@st...>:
>> Thanks for all the time you've devoted to this. It does look like it
>> may be some kind of compiler bug. The font loads and renders fine on
>> Linux, for what it's worth (just as a data point).
>>
>> To confirm this theory: if you move NISC18030.ttf somewhere temporary,
>> delete ~/.matplotlib/fontList.cache and then import matplotlib, do you
>> get the crash? That at least confirms that loading this font file
>> triggers the bug (wherever the bug may be). Test with matplotlib 1.1.0
>> or git master so we have a sense of the current behavior.
>
> Hi Mike,
>
> the following fonts on my system are offending:
>
> /Library/Fonts/NISC18030.ttf
> /Library/Fonts/AppleMyungjo.ttf
> /Library/Fonts/Gungseouche.ttf
>
> With these fonts renamed so that matplotlib cannot find them
> (:file:`*.ttf_`), it exits cleanly.
>
> I will provide a patch to matplotlib for an rc setting
> "fonts.bus-error : ...", e.g. ``fonts.bus-error : NISC18030.ttf,
> AppleMyungjo.ttf, Gungseouche.ttf``, in the next few days.

I just took the time to recompile the whole thing, including
supporting libraries. I used:

– libfreetype-2.4.9
– matplotlib-1.1.0
– MACOSX_DEPLOYMENT_TARGET=10.5
– The files noted in the quotation above are in place (i.e.,
  accessible as .ttf files)

My theory was that a compiler bug triggers the crash with the font
files in question. Because recompiling ft2font.so with a different
MACOSX_DEPLOYMENT_TARGET made the crash disappear, I supposed that
ft2font would trigger that compiler bug. It had to be a compiler bug
because that environment variable was the only change that made the
crash disappear.

The question now is whether the error still persists with more recent
software. I have found that it does not. I recompiled with the
libraries noted above (all compiled from source), and I can
successfully import matplotlib.figure. This import previously provoked
the crash.

So I believe that either I was wrong in some respect, or the more
recent software toolchain no longer provokes the crash because its
code changed.

Since it works flawlessly on my system now, I see little need to
implement the mechanism for excluding font files from being loaded; if
it is not needed, I will not code it.

Friedrich

P.S.: Of course I moved the font cache away beforehand, so that it is
recreated when importing matplotlib.figure for the first time.

P.P.S.: One more difference is that the current Python is not a
framework Python anymore, but a regular Python.

> It was clear from the beginning (well, from the point I got a handle
> on it) that loading the font makes the 2009 matplotlib crash. The
> only unanswered question is where the code path is that triggers this
> compiler bug (I think the compiler bug hypothesis is not disproven and
> works well atm). If the code path is in ft2font.cpp, we could (you
> could) reformulate ft2font.cpp in an equivalent way, except that it no
> longer crashes. You might want to add printf() calls or similar to
> ft2font.cpp to see whether the crash occurs inside a call to
> libfreetype or whether all those calls return cleanly.
>
> To my understanding, since recompiling ft2font.so with
> MACOSX_DEPLOYMENT_TARGET left at its 10.6 default helps, ft2font.cpp
> should be the culprit or rather the victim. The only alternative I see
> would be that it has to do with the load mechanism of the dylib, but I
> deem this rather unlikely. Well, unlikely is not the best word in this
> context, since all these things here were pretty unlikely.
>
> If the code path is in libfreetype, this would be an issue for their
> list. ...
>
> Friedrich |