[Sqlalchemy-tickets] Issue #3710: Delayed configuration of polymorphic subclasses leads to duplicat
Brought to you by:
zzzeek
From: Chris W. <iss...@bi...> - 2016-05-10 11:29:08
|
New issue 3710: Delayed configuration of polymorphic subclasses leads to duplicate event handler registration https://bitbucket.org/zzzeek/sqlalchemy/issues/3710/delayed-configuration-of-polymorphic Chris Wilson: We have a complex system with mapped classes defined in many different files, used by different applications. Therefore we use derived-map polymorphic loading to load only those classes which are needed, sometimes when processing query results. We have been running into problems with SQLAlchemy generating SQL that tries to violate database constraints, for example trying to insert the same row twice in a many-to-many association secondary table: ``` IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "cons_m2m_primary_key" DETAIL: Key (id_foo, id_bar)=(182145, 29586) already exists. [SQL: 'INSERT INTO m2m (id_foo, id_bar) VALUES (%(id_foo)s, %(id_bar)s)'] [parameters: ({'id_foo': 182145, 'id_bar': 29586}, {'id_foo': 182145, 'id_bar': 29586})] ``` We tracked this down to delayed initialisation of polymorphic classes and their subclasses resulting in duplicate event listeners being added for the backref relationship. This results in A.bs == [b], but B.as == [a, a]. Depending on the order in which the unitofwork flushes these out to the database, 50% of the time it will try to flush B.as first and generate SQL which violates the database constraints, as above. Please consider the following abstract example, where the base class Animal is registered first, along with its relationship to Home, and then the polymorphic subclasses Mammal and Dog are loaded and configured in a separate step: ``` #!python from sqlalchemy import Column, ForeignKey, Integer, Table, Text, create_engine, event from sqlalchemy.ext.declarative import declarative_base, declared_attr from sqlalchemy.orm import Mapper, configure_mappers, relationship, sessionmaker from sqlalchemy.orm.mapper import _mapper_registry # This event listener may be useful for debugging: if False: @event.listens_for(Mapper, 'mapper_configured') def receive_mapper_configured(mapper, class_): # assets_allowed = mapper._props.get('assets_allowed') homes = getattr(class_, 'homes', None) if homes is not None: dispatch = list(homes.dispatch.append) if len(dispatch) > 2: print "{} failed: {} ({})".format(class_, mapper, dispatch) assert len(dispatch) <= 2, "{}: {}".format(class_, dispatch) print "{} configured: {} ({})".format(class_, mapper, repr(homes)) _mapper_registry.clear() Base = declarative_base() # Register these classes first, including the concrete polymorphic base class with the M2M association: class Home(Base): __tablename__ = 'home' id = Column(Integer, primary_key=True) name = Column(Text) class Animal(Base): __tablename__ = 'mammal' id = Column(Integer, primary_key=True) name = Column(Text) homes_table = Table("mammal_home", Base.metadata, Column('id_mammal', Integer, ForeignKey(id), primary_key=True), Column('id_home', Integer, ForeignKey(Home.id), primary_key=True) ) homes = relationship(Home, secondary=homes_table, backref='animals') species = Column(Text) @declared_attr def __mapper_args__(cls): return { 'polymorphic_on': cls.species, 'polymorphic_identity': cls.__name__, } # Register the first set of classes and create their Mappers configure_mappers() # Simulate dynamic loading of additional mapped classes class Mammal(Animal): pass class Dog(Mammal): pass # These new classes should not be configured at this point: unconfigured = [m for m in _mapper_registry if not m.configured] assert len(unconfigured) == 2, str(unconfigured) # Now register them in a separate pass. # # If we take the first branch, this test will fail randomly 50% of the time, depending on the order of # Mammal and Dog in _mapper_registry. The second branch manipulates the registry and calls configure_mappers() # twice to force the Mappers to register in the wrong order, causing the test to fail every time. if True: configure_mappers() else: old_mapper_registry = dict(_mapper_registry) del _mapper_registry[Mammal.__mapper__] Mapper._new_mappers = True configure_mappers() _mapper_registry[Mammal.__mapper__] = old_mapper_registry[Mammal.__mapper__] Mapper._new_mappers = True configure_mappers() # In this case, we know that we have hit the error: the event listeners are registered twice: assert len(Dog.homes.dispatch.append) == 4, list(Dog.homes.dispatch.append) assert len(Home.animals.dispatch.append) == 2, list(Home.animals.dispatch.append) assert len(Mammal.homes.dispatch.append) == 2, list(Mammal.homes.dispatch.append) # This assertion will fail 50% of the time if the first branch is taken, so I've commented it out, # and copied it under the second branch above where it should always "succeed" (indicating that the # event listeners are wrongly double-configured): # assert len(Dog.homes.dispatch.append) == 4, list(Dog.homes.dispatch.append) engine = create_engine('sqlite:///sqlalchemy_example.db') Base.metadata.create_all(engine) DBSession = sessionmaker(bind=engine) session = DBSession(autocommit=True) session.begin() home = Home(name="localhost") dog = Dog(name="fido", homes=[home]) assert len(dog.homes) == 1 assert len(home.animals) == 1, "This fails: Dog is listed twice in home.animals: {}".format(home.animals) print "Test passed: this will happen 50% of the time if the first branch is taken" ``` When `configure_mappers()` is called to configure the subclasses, it loops over the `_mapper_registry` in random order. Thus, it may configure `Dog` first and then `Mammal`, or vice versa. When it configures `Dog` and `Mammal`, it will add two event listeners to their `homes.dispatch.append` list: `append` and `emit_backref_from_collection_append_event`. The latter adds the Dog/Mammal to the backref relationship, unconditionally (even if already present), so it must only be called once. However, if it configures `Dog` first and `Mammal` second, then when it configures `Mammal` it loops over `mapper.self_and_descendants` (in `_register_attribute`) and thus registers the same event listeners on `Dog.homes.dispatch.append` again. So whenever a `Dog` is given some `homes`, it will be added **twice** to the `animals` collection on each of those `Homes`. This does not happen if `Mammal` is registered first and `Dog` second, because when post-configuring `Dog.homes`, `mapper.class_manager._attr_has_impl(self.key)` returns True and it is not initialised again. I think the problem is related to the comment on `_post_configure_properties` which says "This is a deferred configuration step which is intended to execute once all mappers have been constructed." However this function is called whenever `configure_mappers()` finds unconfigured mappers, not only once. Possibly this changed at some point and violated some assumptions? One fix would be to check `mapper.class_manager._attr_has_impl(key)` on each mapper when looping over `mapper.self_and_descendants`, to avoid registering it twice. However that might be a layering violation. Another fix might be, when looping over mappers in `configure_mappers()` looking for unconfigured ones, to walk up the ancestor chain of each unconfigured mapper until we find the most ancient unconfigured one, and configure that first, followed by its children, and their children, and so on. |