From: Chuck E. <ec...@mi...> - 2001-02-22 18:14:13
You can find a tarball with this description and accompanying source code at:

    ftp://webware.sourceforge.net/pub/webware/ModulesProb01.tar.gz

I have a problem where Python creates duplicate modules in memory rather than reusing the same one. For example, a module in Pkg/Mod.py ends up in sys.modules under the keys "Pkg.Mod" and "Mod", **pointing to two distinct modules**.

This creates further problems. Suppose a module Foo.py contains a class named Foo. If the module is loaded twice as two separate instances (of ModuleType), then there are 2 separate Foo classes. This causes confusion, including the failure of an assertion such as:

    assert isinstance(foo, Foo)

The Foo in that code may be pointing to the first class, while the instance foo may have been created from the second class. The assertion then fails!

The problem stems from the fact that Python tracks modules by a relative, rather than absolute, path. A simple os.chdir() or a subtlety in packages can cause this problem.

This problem is easiest to see in an os.chdir() situation:

    C:\>mkdir foo
    C:\>cd foo
    C:\foo>mkdir bar
    C:\foo>cd bar
    C:\foo\bar>echo class baz: pass > baz.py
    C:\foo\bar>echo ### > __init__.py
    C:\foo\bar>cd ..
    C:\foo>python
    ActivePython 2.0, build 202 (ActiveState Tool Corp.)
    based on Python 2.0 (#8, Oct 19 2000, 11:30:05) [MSC 32 bit (Intel)] on win32
    >>> from bar.baz import baz as baz1
    >>> import os
    >>> os.chdir('bar')
    >>> from baz import baz as baz2
    >>> baz1
    <class bar.baz.baz at 007908BC>
    >>> baz2
    <class baz.baz at 0079075C>
    >>> baz1 == baz2
    0
    >>> import sys
    >>> [mod for mod in sys.modules.items() if mod[0].count('baz')]
    [('baz', <module 'baz' from 'baz.pyc'>), ('bar.baz', <module 'bar.baz' from 'bar\baz.pyc'>)]
    >>> mod1 = sys.modules['bar.baz']
    >>> mod2 = sys.modules['baz']
    >>> id(mod1)
    136147728
    >>> id(mod2)
    136147744

This problem can also be seen without any use of os.chdir(). See ManufactureWare/, which contains the "assert isinstance(foo, Foo)" problem described above.
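The effect can also be sketched on a current Python 3 interpreter (an assumption; the session above is Python 2.0). Instead of calling os.chdir(), this sketch puts both a package's parent directory and the package directory itself on sys.path, which is one way to reach the same file under two names. The names dup_pkg, dup_mod, and load_same_file_twice are made up for illustration and are not taken from the tarball.

```python
import importlib
import os
import sys
import tempfile


def load_same_file_twice():
    """Import one source file under two module names (hypothetical demo)."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "dup_pkg")
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w"):
            pass  # an empty file is enough to mark the package
        with open(os.path.join(pkg_dir, "dup_mod.py"), "w") as f:
            f.write("class Baz:\n    pass\n")

        # Two roots that can both reach dup_mod.py: the package's parent
        # directory and the package directory itself.
        sys.path[:0] = [root, pkg_dir]
        importlib.invalidate_caches()
        try:
            as_submodule = importlib.import_module("dup_pkg.dup_mod")
            as_top_level = importlib.import_module("dup_mod")
        finally:
            sys.path.remove(root)
            sys.path.remove(pkg_dir)
        return as_submodule, as_top_level


if __name__ == "__main__":
    mod1, mod2 = load_same_file_twice()
    print(mod1 is mod2)                      # are the module objects shared?
    print(mod1.Baz is mod2.Baz)              # are the class objects shared?
    print(isinstance(mod2.Baz(), mod1.Baz))  # the kind of check that breaks
```

All three checks are expected to come out false, because the file is executed once per module name and each execution creates its own Baz class.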
This all applies to Python 2.0 on Windows & UNIX. I haven't tried 2.1 yet.

I believe the solution is for Python to track modules by their absolute path. I don't know of any other resolution to the situation other than modifying Python in this manner. I also don't know of any disadvantage for Python to track modules by absolute path.

- Does anyone know of any workarounds?
- Does anyone know why it would be bad for Python to track modules by absolute path?
- Is there any chance Python will fix this in the future?
- If so, by 2.1?

-Chuck

ftp://webware.sourceforge.net/pub/webware/ModulesProb01.tar.gz
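Short of changing the interpreter, a program can at least detect the situation by comparing absolute paths itself. The following is a rough diagnostic sketch for Python 3 (an assumption; the thread concerns Python 2.0). The helper name find_duplicate_modules is hypothetical. It reports a file only when distinct module objects share it, so ordinary aliases of a single module are not flagged.

```python
import os
import sys
from collections import defaultdict


def find_duplicate_modules(modules=None):
    """Map each source file loaded as more than one module to the names used."""
    if modules is None:
        modules = sys.modules
    by_file = defaultdict(dict)
    for name, module in list(modules.items()):
        filename = getattr(module, "__file__", None)
        if not filename:
            continue  # built-ins, namespace packages, placeholder entries
        # Resolve relative paths, "..", and symlinks so that two spellings
        # of the same file compare equal.
        key = os.path.normcase(os.path.realpath(filename))
        by_file[key][name] = module
    return {
        path: sorted(entries)
        for path, entries in by_file.items()
        # Only distinct module objects count as duplicates.
        if len({id(module) for module in entries.values()}) > 1
    }


if __name__ == "__main__":
    for path, names in find_duplicate_modules().items():
        print(path, "is loaded as", ", ".join(names))
```

Calling it late in a test run, or just before an isinstance check that fails mysteriously, gives a quick answer to whether a double import is involved.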
From: Terrel S. <tsh...@tr...> - 2001-02-23 03:00:39
> I believe the solution is for Python to track modules by their absolute
> path. I don't know of any other resolution to the situation other than
> modifying Python in this manner. I also don't know of any disadvantage for
> Python to track modules by absolute path.
>
> - Does anyone know of any workarounds?

The workaround is surprisingly easy:

1) Replace all non-absolute paths in sys.path (especially "") with absolute ones.
2) Make sure that no module is accessible from more than one of these roots (sufficient condition: no name in sys.path is a prefix of any other).

This will make some things slightly less convenient (you will have to think more carefully about your directory structure), but it will prevent you from shooting yourself in the foot by importing a module from two different places.

    Python 2.0 (#5, Dec 5 2000, 17:59:47)
    [GCC 2.96 20000731 (Red Hat Linux 7.0)] on linux2
    Type "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> from pprint import pprint as pp
    >>> import os
    >>> pp(sys.path)
    ['',
     '/usr/local/lib/python2.0',
     '/usr/local/lib/python2.0/plat-linux2',
     '/usr/local/lib/python2.0/lib-tk',
     '/usr/local/lib/python2.0/lib-dynload',
     '/usr/local/lib/python2.0/site-packages']
    >>> os.getcwd()
    '/home/tshumway/projects/current/ModulesProb/foo'
    >>> sys.path[0]=os.getcwd()
    >>> from bar.baz import baz as baz1
    >>> os.chdir("bar")
    >>> from baz import baz as baz2
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ImportError: No module named baz
    >>> from bar.baz import baz as baz2
    >>> baz1==baz2
    1
    >>> pp(sys.path)
    ['/home/tshumway/projects/current/ModulesProb/foo',
     '/usr/local/lib/python2.0',
     '/usr/local/lib/python2.0/plat-linux2',
     '/usr/local/lib/python2.0/lib-tk',
     '/usr/local/lib/python2.0/lib-dynload',
     '/usr/local/lib/python2.0/site-packages']
    >>> pp(sys.modules)
    {'UserDict': <module 'UserDict' from '/usr/local/lib/python2.0/UserDict.pyc'>,
     '__builtin__': <module '__builtin__' (built-in)>,
     '__main__': <module '__main__' (built-in)>,
     'bar': <module 'bar' from '/home/tshumway/projects/current/ModulesProb/foo/bar/__init__.pyc'>,
     'bar.baz': <module 'bar.baz' from '/home/tshumway/projects/current/ModulesProb/foo/bar/baz.pyc'>,
     'cStringIO': <module 'cStringIO' from '/usr/local/lib/python2.0/lib-dynload/cStringIO.so'>,
     ....

Notice that although "/usr/local/lib/python2.0" is a prefix of four other names, those other names are carefully chosen to be illegal as package names (e.g. "import plat-linux.foo" is a syntax error). I think the default sys.path under Windows is not quite as clean.

> - Does anyone know why it would be bad for Python to track modules by
> absolute path?

It would make it more difficult to move modules around. (Java demonstrates this.) Tool support would help: it is easy to search for import statements.

> - Is there any chance Python will fix this in the future?

Hmmm. Seems like it does not really need a fix from On High: only a widely-published HOWTO documenting the problem and the solution.

HTH

-- Terrel

FactoryKit/main.py:
-------------------
    # main.py

    # Make sure FactoryKit is reachable, since widgets
    # will want to import FactoryKit.Widget.
    try:
        import FactoryKit
    except ImportError:
        import os, sys
        # XXX dangerous: should make sure it is really replacing ""
        sys.path[0] = os.path.abspath(os.pardir)
        import FactoryKit

    from FactoryKit import Factory

    # A simple "import Factory" raises an exception, as it should.
    # Since __main__ does not belong to the FactoryKit package,
    # it cannot use the package-relative import shortcuts.

    Factory.LoadAndMakeWidget('Examples/ExampleWidget.py')
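The two-step workaround described in this message can be sketched as a pair of helpers. This is only an illustration for Python 3 (an assumption; the sessions above are Python 2.0), and the names absolute_entries and importable_nestings are hypothetical. Following the remark about plat-linux2, a nested entry is flagged only when every path component between the two roots is a legal identifier, since only then could the inner directory be imported as a package of the outer one.

```python
import os
import sys


def absolute_entries(paths):
    """Step 1: return the path list with every entry made absolute."""
    # An empty entry stands for the current directory.
    return [os.path.abspath(entry or os.curdir) for entry in paths]


def importable_nestings(paths):
    """Step 2: list (parent, child) roots where child lies inside parent
    and the path between them could be spelled as a package name."""
    entries = [os.path.normcase(entry) for entry in absolute_entries(paths)]
    found = []
    for parent in entries:
        for child in entries:
            if child == parent:
                continue
            try:
                relative = os.path.relpath(child, parent)
            except ValueError:
                continue  # for example, different drives on Windows
            parts = relative.split(os.sep)
            if os.pardir in parts:
                continue  # child does not live under parent
            if all(part.isidentifier() for part in parts):
                found.append((parent, child))
    return found


if __name__ == "__main__":
    # Freeze sys.path early, before anything calls os.chdir().
    sys.path[:] = absolute_entries(sys.path)
    for parent, child in importable_nestings(sys.path):
        print("warning:", child, "is also reachable through", parent)
```

The check is deliberately conservative: it does not look for __init__.py files, so it may warn about directories that are never actually imported as packages.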
From: Chuck E. <ec...@mi...> - 2001-02-23 03:14:21
At 10:05 PM 2/22/2001 -0500, Terrel Shumway wrote:
> > - Does anyone know why it would be bad for Python to track modules by
> > absolute path?
>
>It would make it more difficult to move modules around. (Java demonstrates
>this.) Tool support would help: it is easy to search for import statements.

I don't follow that at all. This is strictly a run time phenomenon. Who wants to move modules once the program starts?

> > - Is there any chance Python will fix this in the future?
>
>Hmmm. Seems like it does not really need a fix from On High: only a
>widely-published HOWTO documenting the problem and the solution.

Well, unless your previous comment pans out, there doesn't seem to be a reason to track them by relative path. And since doing so leads to problems, it seems that an On High fix would be more appropriate than considering the situation to be OK simply because it's documented.

Also, I'm not really sure if I want to replace '' in sys.path. I thought that was there as a relative path to the current module (not the current directory) and therefore helped a given file Foo.py say "from Bar import Bar" where Bar.py resided in the same package. Particularly if you didn't even import Foo directly. Am I wrong about that?

As Geoff pointed out, we most likely encountered this problem because we're running a program out of a package, which is uncommon. I tried Geoff's fix on the example code I wrote and it worked like a charm. I'll try it on Webware next.

If the fix pans out for Webware, that will be great, but unless there is a concrete advantage to tracking packages by relative path, I still recommend a change in Python to track modules by absolute path. That would eliminate accidentally getting 2 distinct copies of the same module for any Python program.

Thanks for the input,

-Chuck
From: Terrel S. <tsh...@tr...> - 2001-02-23 16:24:09
Chuck Esterbrook wrote:
> At 10:05 PM 2/22/2001 -0500, Terrel Shumway wrote:
> > > - Does anyone know why it would be bad for Python to track modules by
> > > absolute path?
> >
> >It would make it more difficult to move modules around. (Java demonstrates
> >this.) Tool support would help: it is easy to search for import statements.
>
> I don't follow that at all. This is strictly a run time phenomenon. Who
> wants to move modules once the program starts?

No, this is not a runtime problem, it is a refactoring-time problem. In Java, just try renaming a package and see how long it takes you to get a clean compile without using a tool like WoodenChair.

The people who designed Python's hierarchical package system wanted people to be able to gather modules and packages into packages without breaking them. The package-relative import is a fairly clean solution to this fairly nasty problem. Tool support would be helpful for external code that uses the moved modules.

Consider this very, very simple scenario:

    ham.py
    ---
    class Ham:...

    eggs.py
    ---
    class Eggs:...
    class EggMixin:...
    class Dozen:...

    spam.py
    ---
    from ham import Ham
    from eggs import EggMixin
    class Spam(Ham,EggMixin):...
    class Can:...

    spam_eater.py
    ---------
    from spam import Spam
    import eggs

    factory = Spam()
    breakfast = factory.getCan()
    breakfast.eat()
    lunch = eggs.Dozen()
    lunch.eat()
    supper = eggs.Eggs.leftovers((breakfast,lunch))
    supper.eat("Yum,yum!")

spam_eater is an application that uses ham, eggs, and spam as library modules.

Suppose now that we wanted to serve Spam and Eggs over HTTP. We could just dump all of these modules into the same directory with Request, Response, Servlet, et al. (or, equivalently, add the directory to sys.path), but that would not be clean. Let's put them together into a package: FoodKit.

    FoodKit
      +- __init__.py
      +- spam.py
      +- ham.py
      +- eggs.py
    bin
      +- spam_eater.py

Now in our web server we can say:

    import FoodKit.spam
    serveit = FoodKit.spam.Can()

spam, eggs, and ham will work without modification.
spam_eater would too, if we had put it in the package, but we decided that we shouldn't mix applications and libraries. Since the FoodKit directory itself is not in sys.path (if it were, we would get the nasty duplicate module problems that started this thread), spam_eater gets an ImportError because it cannot find spam or eggs. The solution is to add the package name, just as the web server does:

    from FoodKit.spam import Spam
    from FoodKit import eggs

Finding and fixing all of these broken import statements is what tools can do. Note that the problem would be much larger without the package-relative imports (cf. Java). Also note that this example is extremely simple. A more realistic scenario would probably involve multiple packages and dozens of modules. Without tool support, this type of refactoring tends to get put off until it is really nasty.

> Also, I'm not really sure if I want to replace '' in sys.path. I thought
> that was there as a relative path to the current module (not the current
> directory) and therefore helped a given file Foo.py say "from Bar import
> Bar" where Bar.py resided in the same package. Particularly if you didn't
> even import Foo directly. Am I wrong about that?

No, the package-relative import works without "" in sys.path.

> As Geoff pointed out, we most likely encountered this problem because we're
> running a program out of a package, which is uncommon. I tried Geoff's fix
> on the example code I wrote and it worked like a charm. I'll try it on
> Webware next.

A program (script) running within a package is broken if it tries to use the package-relative imports. __main__ is in the default package, so using relative imports from __main__ breaks the package encapsulation.

> If the fix pans out for Webware, that will be great, but unless there is a
> concrete advantage to tracking packages by relative path,

There is a concrete advantage: a library module does not need to know where it is in the package hierarchy.
Renaming the package does not require any code changes inside it, as it does for example in Java, which uses absolute names. Only external clients need code changes.

> I still recommend
> a change in Python to track modules by absolute path. That would eliminate
> accidentally getting 2 distinct copies of the same module for any Python program.

Python does track modules by *filename*. If all of the names in sys.path are absolute, then you can never get a filename that is not absolute.

-- Terrel
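The renaming argument can be tried out with a small sketch. It targets Python 3 (an assumption), where the implicit package-relative form used in this thread ("from ham import Ham" inside a package) was removed and the explicit spelling "from .ham import Ham" plays the same role. The module contents are reduced stand-ins for the FoodKit example, and the helper names build_package and spam_bases, as well as the second package name LunchKit, are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Reduced stand-ins for the FoodKit modules; spam.py never names its package.
SOURCES = {
    "__init__.py": "",
    "ham.py": "class Ham:\n    pass\n",
    "eggs.py": "class EggMixin:\n    pass\n",
    "spam.py": (
        "from .ham import Ham\n"
        "from .eggs import EggMixin\n"
        "\n"
        "class Spam(Ham, EggMixin):\n"
        "    pass\n"
    ),
}


def build_package(root, name):
    """Write the same sources into a package directory called `name`."""
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    for filename, text in SOURCES.items():
        with open(os.path.join(pkg_dir, filename), "w") as f:
            f.write(text)


def spam_bases(package_name):
    """Import <package_name>.spam and report where Spam's base classes live."""
    with tempfile.TemporaryDirectory() as root:
        build_package(root, package_name)
        sys.path.insert(0, root)  # tempfile hands back an absolute path
        importlib.invalidate_caches()
        try:
            spam = importlib.import_module(package_name + ".spam")
            return [base.__module__ for base in spam.Spam.__bases__]
        finally:
            sys.path.remove(root)
            # Forget the throwaway package so repeated calls start clean.
            for loaded in list(sys.modules):
                if loaded == package_name or loaded.startswith(package_name + "."):
                    del sys.modules[loaded]


if __name__ == "__main__":
    print(spam_bases("FoodKit"))
    print(spam_bases("LunchKit"))  # same files, different package name
```

Only the caller changes the name it imports; the files inside the package are byte-for-byte identical in both runs, which is the point made above about external clients being the only code that needs editing.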