eiffel-edom-users Mailing List for EDOM
Brought to you by: colin-adams
Archive: 2003 | Jun (4)
From: Colin P. A. <co...@co...> - 2003-06-17 08:36:51
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:

    Colin> In release 1.2 (and indeed all previous releases),
    Colin> XERCESC_DOM_NODELIST has not been tested, and is broken, so
    Colin> cannot be used.

This is now fixed in CVS.

Like the majority of the DOM APIs, I have not yet written test cases for this. That is why I continue to find bugs in EDOM.

If you would like to help with writing test cases (Eiffel programming skills only are needed, not C++), then please leave a message on the developers list.

--
Colin Paul Adams
Preston Lancashire
From: Colin P. A. <co...@co...> - 2003-06-16 18:26:14
In release 1.2 (and indeed all previous releases), XERCESC_DOM_NODELIST has not been tested, and is broken, so cannot be used.

I shall have a fix in CVS real soon.

--
Colin Paul Adams
Preston Lancashire
From: Colin P. A. <co...@co...> - 2003-06-04 11:12:39
I am starting to form a solution in my head to the memory-model problem (I'll sketch it on the developers list when I feel a bit more sure about it - unfortunately, it involves additional overhead, but this cannot be helped).

I have tagged the current CVS with a tag of rel-1-1-a, so you can check out this level if you need stable post-1.1 code, as it is quite likely that the HEAD of CVS will be uncompilable for a little while to come (the changes will not be small).

--
Colin Paul Adams
Preston Lancashire
From: Colin P. A. <co...@co...> - 2003-06-03 14:45:02
After a day-and-a-half of trying to debug a memory problem, I have finally worked out what is going on. There is a fundamental mismatch between the memory models used by Xerces-c and Eiffel.

A "workaround" (???) is to get your root class to inherit from MEMORY, and then call collection_off. Yes, I know - highly unsatisfactory - I'm going to have to think about how to design a better solution.

The problem is this:

In C++, memory management is a nightmare - all memory has to be explicitly allocated (usually with the new operator), and disposed of when it's no longer needed (with the delete operator, if it was allocated with the new operator). When you allocate memory yourself, it is relatively straightforward to know when to de-allocate it. The real problems occur when a library allocates the memory for you. Who is responsible for de-allocating it?

With the DOM, you have the additional complication that a single DOMDocument is built up from a tree of DOMNodes (of differing sub-classes). Xerces-c takes the simplifying approach that all the memory is owned by the DOMImplementation. So if you create a document with the DOMImplementation::createDocument() call, then create further elements, attributes, etc. with the various DOMDocument::createXXX calls, then you can free all the memory at once with a call to the (Xerces-c specific - not DOM) method release(). Or if the DOM is created for you by parsing with a DOMBuilder, then calling release() on the DOMBuilder disposes of the entire DOM.

Fine - C++ users are happy. But now consider Eiffel - its approach to memory is automatic - a garbage collector inspects all references (when memory is short) to find out which objects are still live, and which are dead (and can therefore have their memory reclaimed).
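As a rough illustration of the single-owner model described above for Xerces-c, here is a minimal sketch in standard C++. ToyDocument, ToyNode, create_element and live_nodes are hypothetical names invented for this example; they are not Xerces-c classes or calls. The only point is that the owner hands out non-owning pointers and one release() call frees everything.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; not a Xerces-c class.
struct ToyNode {
    std::string name;
};

// Hypothetical stand-in for an owner that keeps every node it creates.
class ToyDocument {
public:
    // The owner allocates the node and keeps ownership of it;
    // the caller only receives a non-owning pointer.
    ToyNode* create_element(const std::string& name) {
        nodes_.push_back(std::unique_ptr<ToyNode>(new ToyNode{name}));
        return nodes_.back().get();
    }

    // Frees every node at once. All pointers handed out earlier
    // become invalid and must not be used again.
    void release() { nodes_.clear(); }

    // Number of nodes the owner currently holds.
    std::size_t live_nodes() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<ToyNode>> nodes_;
};
```

Anything that still holds a ToyNode* after release() is left with a dangling pointer, which is the position an Eiffel wrapper object is in when the native side is released behind its back.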
A procedure dispose (from class MEMORY) is available for interacting with the garbage collector, and a standard use for it is to call C++ destructors (the special routines used to clean up memory for a C++ object when it is destroyed).

OK - so when I (belatedly) implemented dispose for EDOM, I call the C++ destructor for the class with which I wrap a DOMDocument, and it in turn calls release(). That's where the initial problems started, as not all the Eiffel XERCESC_DOM_DOCUMENT objects pointed to independent DOMDocuments - some were copies, such as the return value from DOMElement::ownerDocument(). Well, eventually I sorted out how to distinguish "original" documents from copied ones, and with EDOM 1.1 all the memory problems appeared to be solved (with some exceptions - notably Pathan, the XPath implementation, as its memory requirements have not yet been fully explained to me by the authors - I'm working on that).

Unfortunately, there is a flaw in this neat little scheme. Suppose I were to create a DOMBuilder, parse a document with it, and then free the DOMBuilder (say by calling the free procedure from class MEMORY)? Well, XERCESC_LS_DOMBUILDER has a dispose procedure that calls the destructor for xercesc_eiffel_ls_DOMBuilder (my wrapper class). That destructor calls DOMBuilder->release(), and all the memory for the built DOMDocument is freed. However, Eiffel knows nothing about the relationship within Xerces-c between a DOMBuilder and its returned DOMDocument. So you can happily continue to access instances of XERCESC_DOM_NODE, and its derivatives, but the moment you try to do anything for real with these objects, they will attempt to reference the Xerces-c DOMNodes upon which they are built - BUT THESE HAVE ALREADY BEEN RELEASED. Segmentation violation!

OK - if you call free, then you probably deserve what you get (though not really - it ought to be safe to do this). But consider - omit the call to free. Now if you don't attempt to use the XERCESC_LS_DOMBUILDER instance again, Eiffel, when it next collects garbage, will reason that it is safe to collect this object, and bang!

That I think is what is happening in the current program I am developing. It uses about 300MB if I turn off garbage collection, and runs happily.

Now a simple bodge would be to access the XERCESC_LS_DOMBUILDER at the end of the program with some harmless call (such as getFeature), to stop the garbage collector from reclaiming it, but that may not always be convenient, and in any case should not be necessary. I tried this approach, and it did not work just by adding a reference to the parser. There are also DOM_DOCUMENTs created with createDocument to consider - I failed to solve my particular problem with this approach, and in general, the problem may be intractable.

What is needed is a reference from any XERCESC_DOM_NODE to its owning XERCESC_DOM_DOCUMENT, and likewise a reference from a XERCESC_DOM_DOCUMENT to its owning XERCESC_LS_DOMBUILDER (if any). But this won't be easy to implement - indeed, it may not be possible to do at all. Consider a XERCESC_DOM_DOCUMENT obtained by calling ownerDocument on a DOM_ELEMENT - this will not be the same Eiffel object as the one which truly "owns" the DOMDocument, and will not therefore cause the DOMDocument to be released - BUT NEITHER therefore will it prevent it from being released. So if you then in turn obtain a XERCESC_DOM_ELEMENT from it - by calling documentElement for instance - then although you may be careful to keep a live reference to that XERCESC_DOM_DOCUMENT, its memory may be pulled from under you!

I need to consider it more carefully - but I think there is no solution.

--
Colin Paul Adams
Preston Lancashire
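The failure described in the message above, and the reference chain it proposes (node to owning document, document to owning builder), can be modelled in a few lines of standard C++. This is only a sketch under stated assumptions: std::shared_ptr stands in for the references an Eiffel garbage collector would trace, a shared boolean flag stands in for "the native DOM memory is still allocated", and BuilderWrapper, DocumentWrapper, NodeWrapper and parse_and_forget_builder are hypothetical names, not EDOM or Xerces-c code. It does not address the ownerDocument copy case, which the message identifies as the real obstacle.

```cpp
#include <memory>
#include <utility>

// Shared flag standing in for "the native DOM memory is still allocated".
// Real wrappers hold raw pointers and cannot ask this question.
using NativeAlive = std::shared_ptr<bool>;

// Models the wrapper around the parser. Its destructor plays the role of
// dispose: it releases the native DOM regardless of who still points at it.
class BuilderWrapper {
public:
    BuilderWrapper() : alive_(std::make_shared<bool>(true)) {}
    ~BuilderWrapper() { *alive_ = false; }
    NativeAlive native() const { return alive_; }
private:
    NativeAlive alive_;
};

// Models a document wrapper that keeps its builder wrapper reachable.
class DocumentWrapper {
public:
    explicit DocumentWrapper(std::shared_ptr<BuilderWrapper> owner)
        : owner_(std::move(owner)) {}
    NativeAlive native() const { return owner_->native(); }
private:
    std::shared_ptr<BuilderWrapper> owner_;
};

// Models a node wrapper. A null owner reproduces the situation where
// nothing ties the node wrapper to its document wrapper.
class NodeWrapper {
public:
    NodeWrapper(NativeAlive native, std::shared_ptr<DocumentWrapper> owner)
        : native_(std::move(native)), owner_(std::move(owner)) {}
    // True while the native memory behind this wrapper is still allocated.
    bool usable() const { return *native_; }
private:
    NativeAlive native_;
    std::shared_ptr<DocumentWrapper> owner_;
};

// Builds a node wrapper, then drops every local reference to the builder,
// as happens once the program stops mentioning the parser.
NodeWrapper parse_and_forget_builder(bool keep_owner_chain) {
    auto builder = std::make_shared<BuilderWrapper>();
    auto document = std::make_shared<DocumentWrapper>(builder);
    std::shared_ptr<DocumentWrapper> owner;  // null unless the chain is kept
    if (keep_owner_chain) {
        owner = document;
    }
    return NodeWrapper(document->native(), owner);
}
```

Calling parse_and_forget_builder(false) yields a node wrapper whose native memory has already been released, while parse_and_forget_builder(true) keeps the builder alive for as long as the node wrapper exists. The extra pointer per node is also a plausible instance of the "additional overhead" mentioned in the 2003-06-04 message, though that message does not say what form the overhead takes.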