From: Geoffrey F. <fu...@ga...> - 2002-09-02 14:45:59
Okay, from my inbox, looks like nobody replied to Alan on this topic
but himself.

Alan W. Irwin writes:
> On Sun, 14 Jul 2002, Alan W. Irwin wrote:
>
> > ==17017== discard syms in /home/software/plplot_cvs/HEAD/plplot_working3/tmp/drivers/psd_drv.so due to munmap()
> > ==17017== discard 0 (0 -> 0) translations in range 0x43ADB000 .. 0x43AE1FFF
> > Unable to load driver: psd_drv.so.
> > Reason: /usr/local/plplot/lib/plplot5.1.0/data/../drivers/psd_drv.so: cannot open shared object file: No such file or directory
> >
> Update: The discard syms and discard 0 messages are legit, but the reason is
> wrong. What's going on here is ./drivers/psd_drv.so fails in plplot/tmp so
> the plcore code tries a second time in the install location which I had
> cleaned out so that fails as well with that resulting error message. If I
> simply copy ./drivers/psd_drv.so to the install location, I get the correct
> error message:
>
> Unable to load driver: psd_drv.so.
> Reason: /usr/local/plplot/lib/plplot5.1.0/data/../drivers/psd_drv.so: undefined symbol: plRotPhy
>
> What's going on is that psd_drv.so has some undefined symbols (presumably
> plRotPhy is the first of these to be encountered)

And I did a search once, which I reported to the list some months
back, which showed there were some others; I forget exactly which
ones. It's a small set, but one thing I concluded was that really
there could be more, and a family of "driver support utility
functions" in libplplot could be useful, and could be expanded to
cover some more cases. Like, for example, all the coord mapping stuff
needed for viewport selection and transformation stuff inside
interactive drivers, etc.

Well, something like that is probably what is happening here with
plRotPhy, but anyway, I'm saying, one could easily imagine a
formalized and expanded collection of such things, for the purpose of
bringing more useful, and similarly-behaving, functionality to all the
interactive drivers.
There might even be useful stuff that could be factored out from the
non-interactive drivers, and stored in a driver-support collection
inside libplplot.

> from libplplot that cannot
> be resolved by the dynamic loader despite the fact that libplplot has
> already been dynamically loaded. My best guess for the cause of this is
> that python (and java) use dlopen to load libplplot in the first
> place,

Well, and it's not just things that explicitly use dlopen. There is
also ld.so which sets up the initial process image. Similar effects
can come into play even in more ordinary settings, which is why all
the RTLD_* flags are there, to help you control the way the loader
reacts to linked libraries.

> and
> they do not use the RTLD_GLOBAL flag which can be ORed with the RTLD_NOW or
> RTLD_LAZY options for dlopen. The point of RTLD_GLOBAL is "the external
> symbols defined in the library will be made available to subsequently loaded
> libraries" according to the excellent documentation I found at
> http://www.tldp.org/HOWTO/Program-Library-HOWTO/dl-libraries.html.
> Presumably, the dynamic loader invoked when libplplot is simply used as a
> shared library (i.e., when x??c is invoked) effectively sets this flag so
> that libplplot symbols such as plRotPhy can be resolved in psd_drv.so.
>
> So my tentative conclusion is that the dlopen calls in the python extension
> interface (and the java equivalent) forgot to OR in RTLD_GLOBAL with their
> dlopen flag (since dlopening a library which in turn dlopens another library
> depending on the symbols of the first library is a scenario they probably
> didn't think about).

I pondered deeply about this before. I'm guessing "forgot" isn't what
happened. My conclusion was they probably don't allow this for some
annoying excuse relating to security.
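
For readers who have not used these flags, here is a minimal sketch of
the two dlopen modes under discussion. The helper names open_globally
and open_locally are made up for illustration; they are not part of
PLplot, Python, or Java. Only dlopen and the RTLD_* constants come from
the standard <dlfcn.h> interface.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): open a shared object so that
   its exported symbols are offered to objects loaded afterwards. */
void *open_globally(const char *path)
{
    /* RTLD_NOW resolves all undefined symbols before dlopen returns;
       RTLD_GLOBAL adds this object's symbols to the global scope. */
    return dlopen(path, RTLD_NOW | RTLD_GLOBAL);
}

/* Hypothetical helper (illustration only): same call with RTLD_LOCAL,
   so this object's symbols are not offered to subsequently loaded
   objects. */
void *open_locally(const char *path)
{
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}
```

If a host program opened libplplot in the open_locally style, a driver
dlopened later would presumably be unable to resolve a symbol such as
plRotPhy against it, which matches the failure Alan describes. On older
glibc versions this needs -ldl at link time.
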
I remember following some of the linux kernel traffic back when ELF
was coming on line, and ld.so was getting its revamp, and David Engel
and the EYC were going back and forth on all the issues with
LD_LIBRARY_PATH and ld.so.cache and all that. Turns out there are some
really amazingly subtle ways to trojan horse a system that uses shared
objects and dynamic linking.

So, when looking closely at how Sun expects you to get along with Java
and external code, I think the answer to this conundrum is exemplified
in the way that JNI clients interact with the JVM. Note that we do not
have to link Java C interface code (JNI code) to "libjava" in order to
do our magic. This is really an impressive achievement, and I
personally believe (and this is admittedly speculation, since I don't
have any contacts inside Sun to check this with) that it was done
specifically in order to overcome this issue of RTLD_* interplay with
dynloading.

What they do is pass a struct to your JNI code. This struct is called
the "env" struct, which is how your JNI code learns everything about
the environment of the JVM process context in which it is executing.
This struct is populated (initialized) with a collection of function
pointers, which your JNI code uses in order to invoke functions in the
JVM. This effectively eliminates the need to link JNI client code
against any hypothetical "libjava", and yet it allows you to call JNI
support code that is in fact buried in the JVM. Way cool. I personally
believe this solution was the result of a serious software engineering
effort, and not just a dullard's hack, or the blundering bumblefussing
concoction of people lacking in creativity.

So, coming back to PLplot. What I think we should do here, to finally
and fully resolve this and all such related issues, is to employ this
same approach inside PLplot.
In fact, we are already sort of doing something like this, since the
dispatch table is sort of like the env struct, albeit the dispatch
table is only used inside plcore. But the point is, we load up some
structs with function pointers, and deref them during the course of
action, to get work done. I'm saying we can embellish this approach to
do a little more.

What I envision is a PLenv struct, which has a bunch of function
pointer members that are initialized to point to a collection of
PLplot provided "driver support functions". plRotPhy would be one
example of these, but there should be a few more entries in the PLenv
table to cover other reasonable things drivers might want to do, which
could be implemented in a driver-neutral manner and provided in
libplplot proper.

So, we init this struct, and then pass it along to the drivers when we
call them. One way this could be accomplished without too much
disruption in the APIs of all these things would be to add the PLenv
member to the PLStream, which is already being passed to all the
drivers when we call the driver entry point functions. The alternative
of adding a PLenv argument to the calling interface of all the public
driver entry point functions would mean a lot of gratuitous changes to
the code, and something would probably be destabilized.

You could also imagine just adding the function pointer members
directly to PLStream. If anyone feels strongly about this, I'd be
willing to go either way, with some arm twisting. But I propose having
a separate struct PLenv with one struct PLenv * member in PLStream,
just to reduce the memory footprint of the total solution, and also to
evoke some mental continuity with the JNI env thing upon which it is
modelled.

So, this would allow us to stop linking any drivers directly to
libplplot. I also do not believe that libplplot should be linked to
any drivers.
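
To make the shape of the proposal concrete, here is a minimal sketch
under stated assumptions. Only the name PLenv and the idea of a single
PLenv pointer member in the stream come from the message. DemoStream is
a toy stand-in for PLStream, and rot_phy, core_rot_phy,
demo_stream_init and demo_driver_point are hypothetical names with
made-up signatures, not PLplot's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of driver-support functions. A real table would
   carry more entries; this one holds a single illustrative member. */
typedef struct PLenv {
    /* Stand-in for a helper such as plRotPhy (signature invented):
       toy transform mapping (x, y) to (xmax - y, x). */
    void (*rot_phy)(int xmax, int *px, int *py);
} PLenv;

/* Toy stand-in for PLStream, carrying the one proposed PLenv pointer. */
typedef struct DemoStream {
    int xmax;
    const PLenv *env;
} DemoStream;

/* Implementation that would live in the core library. */
static void core_rot_phy(int xmax, int *px, int *py)
{
    int x = *px;
    *px = xmax - *py;
    *py = x;
}

/* The core fills in the table once. */
static const PLenv core_env = { core_rot_phy };

/* Core-side stream setup: attach the table to the stream. */
void demo_stream_init(DemoStream *pls, int xmax)
{
    pls->xmax = xmax;
    pls->env = &core_env;
}

/* Driver-side entry point: it reaches the helper only through pls->env,
   so it contains no direct reference to core_rot_phy. */
void demo_driver_point(const DemoStream *pls, int *px, int *py)
{
    assert(pls != NULL && pls->env != NULL && pls->env->rot_phy != NULL);
    pls->env->rot_phy(pls->xmax, px, py);
}
```

In this sketch the driver function would compile into a shared object
with no undefined symbol for the helper, since the only thing it needs
at run time is the pointer table handed to it through the stream.
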
I haven't been able to tell with complete certainty whether this goal
has been achieved yet in the current situation.

> Bottom line: gotta work around this problem myself. The solution which I
> have just committed is to have libplplot link to the "special" drivers tk,
> xwin, and tkwin and all other "ordinary" drivers link the other way to
> plplot. My spot checks indicated everything works with this solution.

Let me know what you think of the above. We've discussed my JNI env
thing before, but I think now you'll be able to understand my proposal
better than when we discussed it before. What do you think now?

> Status: there are still some things to clean up. For example, I
> just committed a bugfix for plplot.h which solves one bug and also
> now allows (at least from my preliminary tests) tk to be moved from
> the special to the ordinary list. Also, the install has to be
> fixed up to deal with the special dynamic drivers. But
> fundamentally, I think I have the dynamic driver linking problems
> whipped.

Great progress, thanks.

--
Geoffrey Furnish fu...@ga...