From: Jonathan W. <jw...@ph...> - 2008-01-25 11:29:33
|
Hi guys I think I have found the source of the segfaults during closedown. In PortManager::~PortManager a vector iterator is used to loop through all registered ports so they can be deleted. It seems that following the delete operation, *it points not to the deleted port but to the one which followed it in the list. When "it" is incremented within the for() statement it effectively skips a port. The result is that only every second port ends up being deleted. However, the real problem occured if you had an odd number of ports; in this case "it" points to invalid memory during the final iteration, the result being a predictable segfault. Svn rev 875 contains a fix for this - just don't increment "it" in the for() statement. I've included comments in the source explaining this. However, at the back of my mind I'm wondering whether all this is just relying on undocumented iterator behaviour. Is it actually valid to delete vector members while an iterator is active over the vector and is the observed behaviour something we can rely on across c++ compiler versions? This is a question we ought to answer because a quick search for "delete *it" showed that this iterator/delete construct is used in a fair number of places. I would say that like in the PortManager case the delete loop is probably only deleting every second vector element. While this may not cause any major visible issues given that the code is generally only associated with closedown, it is something we should fix. I would be interested to read your thoughts on all this. For reference, the above comments apply to gcc 4.1.2. With the above fix in place ffado is closing down without a segfault for me once more. A consequence of this is that the "silent" packets finally have a chance to do the right thing. Although more extensive testing is needed it seems that at least on my systems we do have clean shutdown again now. Startup is also clean provided the previous shutdown was ok. It would be interesting to see what the situation is with other MOTUs such as the 828. There is still obviously a remaining issue on startup with the stability of timestamps - audio is sent to the device before the DLL has stabilised sufficiently, causing discontinuities in the audio. It may turn out that the only way to fix this is to delay the sending of audio a bit more, but I'd like to confirm that before going ahead with that rather hacky solution. Finally, note that I will be at linux.conf.au from 26 Jan to 2 Feb (ie: next week). During that time I hope to have a chance to look into the SSE/optimisation details for the MOTU. However, I won't have the MOTU with me so I won't have anything profound to check in until later. I may also look into some trivial issues. Regards jonathan |
From: Pieter P. <pi...@jo...> - 2008-01-25 11:59:44
|
Jonathan Woithe wrote: > Hi guys > > I think I have found the source of the segfaults during closedown. In > PortManager::~PortManager a vector iterator is used to loop through all > registered ports so they can be deleted. It seems that following the delete > operation, *it points not to the deleted port but to the one which followed > it in the list. When "it" is incremented within the for() statement it > effectively skips a port. The result is that only every second port ends up > being deleted. However, the real problem occured if you had an odd number > of ports; in this case "it" points to invalid memory during the final > iteration, the result being a predictable segfault. > > Svn rev 875 contains a fix for this - just don't increment "it" in the for() > statement. I've included comments in the source explaining this. However, > at the back of my mind I'm wondering whether all this is just relying on > undocumented iterator behaviour. Is it actually valid to delete vector > members while an iterator is active over the vector and is the observed > behaviour something we can rely on across c++ compiler versions? > > This is a question we ought to answer because a quick search for "delete > *it" showed that this iterator/delete construct is used in a fair number of > places. I would say that like in the PortManager case the delete loop is > probably only deleting every second vector element. While this may not > cause any major visible issues given that the code is generally only > associated with closedown, it is something we should fix. > > I would be interested to read your thoughts on all this. Maybe we have to replace it with a while(vector.size()) { ptr=vector.pop(0); delete prt; } or something similar? > > For reference, the above comments apply to gcc 4.1.2. > > With the above fix in place ffado is closing down without a segfault for me > once more. A consequence of this is that the "silent" packets finally have > a chance to do the right thing. Although more extensive testing is needed > it seems that at least on my systems we do have clean shutdown again now. > Startup is also clean provided the previous shutdown was ok. It would be > interesting to see what the situation is with other MOTUs such as the 828. > > There is still obviously a remaining issue on startup with the stability of > timestamps - audio is sent to the device before the DLL has stabilised > sufficiently, causing discontinuities in the audio. It may turn out that > the only way to fix this is to delay the sending of audio a bit more, but > I'd like to confirm that before going ahead with that rather hacky solution. > > Finally, note that I will be at linux.conf.au from 26 Jan to 2 Feb (ie: next > week). During that time I hope to have a chance to look into the > SSE/optimisation details for the MOTU. However, I won't have the MOTU with > me so I won't have anything profound to check in until later. I may also > look into some trivial issues. > > Regards > jonathan > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > FFADO-devel mailing list > FFA...@li... > https://lists.sourceforge.net/lists/listinfo/ffado-devel > |
From: Arnold K. <ar...@ar...> - 2008-01-25 12:08:06
|
Am Freitag, 25. Januar 2008 schrieb Jonathan Woithe: > I think I have found the source of the segfaults during closedown. In > PortManager::~PortManager a vector iterator is used to loop through all > registered ports so they can be deleted. It seems that following the > delete operation, *it points not to the deleted port but to the one which > followed it in the list. When "it" is incremented within the for() > statement it effectively skips a port. The result is that only every > second port ends up being deleted. However, the real problem occured if > you had an odd number of ports; in this case "it" points to invalid memory > during the final iteration, the result being a predictable segfault. Oha, deleting members of an iterable(?) while having an iterator on them is= =20 very bad as far as I remember from the c++-book. And that is quite obvious,= =20 what if someone not knowing the code and the language that much wants to do= =20 something with the iterator afterwards? Bad... Better in my experience (I use it daily here) is to take out the first (or= =20 last) member, thus removing it from the iterable, and delete it until the=20 iterable is empty. So: std::list<bla*> blalist; while( blalist.size() > 0 ) { bla* tmp =3D blalist.takeFront(); delete tmp; } (might have mixed std-c++ and Qt's specialties but hopefully you get the=20 point.) Have fun, Arnold =2D-=20 visit http://www.arnoldarts.de/ =2D-- Hi, I am a .signature virus. Please copy me into your ~/.signature and send= me=20 to all your contacts. After a month or so log in as root and do a "rm -rf /". Or ask your=20 administrator to do so... |
From: Francois.ernoult <fra...@pa...> - 2008-01-25 12:40:30
|
Hi Jonathan, > With the above fix in place ffado is closing down without a segfault for > me > once more. A consequence of this is that the "silent" packets finally > have > a chance to do the right thing. Although more extensive testing is needed > it seems that at least on my systems we do have clean shutdown again now. > Startup is also clean provided the previous shutdown was ok. It would be > interesting to see what the situation is with other MOTUs such as the 828. Shutdown is clean now with my 828mkII. Nice work! Regards, Francois |
From: Daniel W. <wa...@mo...> - 2008-01-27 16:42:43
|
> I think I have found the source of the segfaults during closedown. In > PortManager::~PortManager a vector iterator is used to loop through all > registered ports so they can be deleted. It seems that following the delete > operation, *it points not to the deleted port but to the one which followed > it in the list. That is not normal. The iterator should not be changed at all. Maybe we have to check what 'delete *it' does on the memory. Maybe some sort of memory corruption? > However, > at the back of my mind I'm wondering whether all this is just relying on > undocumented iterator behaviour. Is it actually valid to delete vector > members while an iterator is active over the vector and is the observed > behaviour something we can rely on across c++ compiler versions? The cleanup strategy we are using is ok. The only thing you have to be careful is not to destroy or change the iterators in the loop. When dereferencing the iterator we get our pointer back we stored in the vector. We can do what we want with this pointer. The iterator doesn't care. > This is a question we ought to answer because a quick search for "delete > *it" showed that this iterator/delete construct is used in a fair number of > places. I would say that like in the PortManager case the delete loop is > probably only deleting every second vector element. While this may not > cause any major visible issues given that the code is generally only > associated with closedown, it is something we should fix. Could you test following program? #include <cstdio> #include <vector> using namespace std; class Foo { public: Foo() {} virtual ~Foo() {} }; typedef vector<Foo *> FooVector; typedef vector<Foo *>::iterator FooVectorIterator; int main (int argc, char *argv[]) { const int nr = 3; FooVector foos; for (int i = 0; i < nr; i++) { Foo *foo = new Foo; if (foo) { printf("foo %d: %p\n", i, foo); foos.push_back(foo); } } for (FooVectorIterator it = foos.begin(); it != foos.end(); ++it) { printf("before delete: *it %p\n", *it); delete *it; printf("after delete: *it %p\n", *it); } return 0; } I get following output on my machine: foo 0: 0x804b008 foo 1: 0x804b028 foo 2: 0x804b018 before delete: *it 0x804b008 after delete: *it 0x804b008 before delete: *it 0x804b028 after delete: *it 0x804b028 before delete: *it 0x804b018 after delete: *it 0x804b018 I try to figure out what ~Port() is doing. It seems that this part of the code is not used for my bebob devices. cheers, daniel |
From: Daniel W. <wa...@mo...> - 2008-01-27 16:52:23
|
Daniel Wagner wrote: >> I think I have found the source of the segfaults during closedown. In >> PortManager::~PortManager a vector iterator is used to loop through all >> registered ports so they can be deleted. It seems that following the delete >> operation, *it points not to the deleted port but to the one which followed >> it in the list. > > That is not normal. The iterator should not be changed at all. Maybe we have to > check what 'delete *it' does on the memory. Maybe some sort of memory corruption? Following is happening: ~PortManager() is called - an iterator is created over m_Ports - delete *it ~Port() is called - call m_manager.unregisterPort(this) PortManager::unregister is called - create a interator over m_Ports - call erase(it) So the problem is that m_Ports is changed in the 'inner' function while a iterator is hold in ~PortManager. That is clearly invalid code sequence. daniel |
From: Jonathan W. <jw...@ph...> - 2008-01-28 04:20:02
|
Hi Daniel > Daniel Wagner wrote: > >> I think I have found the source of the segfaults during closedown. In > >> PortManager::~PortManager a vector iterator is used to loop through all > >> registered ports so they can be deleted. It seems that following the delete > >> operation, *it points not to the deleted port but to the one which followed > >> it in the list. > > > > That is not normal. The iterator should not be changed at all. Maybe we have to > > check what 'delete *it' does on the memory. Maybe some sort of memory corruption? > > Following is happening: > > ~PortManager() is called > - an iterator is created over m_Ports > - delete *it > > ~Port() is called > - call m_manager.unregisterPort(this) > > PortManager::unregister is called > - create a interator over m_Ports > - call erase(it) > > So the problem is that m_Ports is changed in the 'inner' function while a > iterator is hold in ~PortManager. That is clearly invalid code sequence. Got it - well deduced. I'll see if I can improve things this evening. I guess the main problem is that both the port manager and the port itself are taking the view that it's their responsibility unregister the ports. Regards jonathan |
From: Pieter P. <pi...@jo...> - 2008-01-27 17:11:05
|
Daniel Wagner wrote: > Daniel Wagner wrote: >>> I think I have found the source of the segfaults during closedown. In >>> PortManager::~PortManager a vector iterator is used to loop through all >>> registered ports so they can be deleted. It seems that following the delete >>> operation, *it points not to the deleted port but to the one which followed >>> it in the list. >> That is not normal. The iterator should not be changed at all. Maybe we have to >> check what 'delete *it' does on the memory. Maybe some sort of memory corruption? > > Following is happening: > > ~PortManager() is called > - an iterator is created over m_Ports > - delete *it > > ~Port() is called > - call m_manager.unregisterPort(this) > > PortManager::unregister is called > - create a interator over m_Ports > - call erase(it) > > So the problem is that m_Ports is changed in the 'inner' function while a > iterator is hold in ~PortManager. That is clearly invalid code sequence. I plead guilty... I somewhere introduced that ports register/unregister themselves to the manager on creation/destruction. It seemed to be a cleaner method, since I could use a reference instead of a pointer, meaning that I could eliminate all the null pointer checks. But since it used to be pointer based (and a mess) I introduced this. Should be fixed when we use a while(items) {delete pop()} loop. Greets, Pieter |
From: Daniel W. <wa...@mo...> - 2008-01-28 08:43:44
|
> Should be fixed when we use a while(items) {delete pop()} loop. I have not really looked into to the code but generally you should try to keep the resource allocation and resource deallocation symmetric. So if you use register from a Port object then you should also use unregister from the Port object. Though if you start deallocate the resource from the PortManager then you break the symmetry. Of course this is just showing myself off :) cheers, daniel |
From: Pieter P. <pi...@jo...> - 2008-01-28 08:53:36
|
Daniel Wagner wrote: >> Should be fixed when we use a while(items) {delete pop()} loop. > > I have not really looked into to the code but generally you should try > to keep the resource allocation and resource deallocation symmetric. So > if you use register from a Port object then you should also use > unregister from the Port object. Though if you start deallocate the > resource from the PortManager then you break the symmetry. Of course > this is just showing myself off :) I think at least the registration is symmetric: * a port registers itself to the manager in it's constructor, and unregisters itself in its destructor. There is an asymmetry in the creation of ports though: they are created by the AvDevice, then added to the StreamProcessor (= PortManager). The SP destroys them. So maybe it's better to move the port creation function into the SP (SP.addPort(params)). Greets, Pieter |
From: Jonathan W. <jw...@ph...> - 2008-01-28 21:37:06
|
Hi guys > Daniel Wagner wrote: > >> Should be fixed when we use a while(items) {delete pop()} loop. > > > > I have not really looked into to the code but generally you should try > > to keep the resource allocation and resource deallocation symmetric. So > > if you use register from a Port object then you should also use > > unregister from the Port object. Though if you start deallocate the > > resource from the PortManager then you break the symmetry. Of course > > this is just showing myself off :) > > I think at least the registration is symmetric: > * a port registers itself to the manager in it's constructor, and > unregisters itself in its destructor. > > There is an asymmetry in the creation of ports though: they are created > by the AvDevice, then added to the StreamProcessor (= PortManager). The > SP destroys them. So maybe it's better to move the port creation > function into the SP (SP.addPort(params)). That's an interesting idea. Rev 876 contains a fix for this based on what Pieter wrote yesterday. Basically the PortManager destructor now just deletes the port - removal from the port list is now done by the port via the PortManager::unregister() from ~Port(). Note that I don't pop the port from the list in ~PortManager() because otherwise the PortManager::unregister() call from ~Port() will trigger an unnecessary "not registered" error in verbose mode. The fix is compile-tested only - I don't have my interface with me at linux.conf.au. Thinking about the two alternatives (ie: what we do now, and moving port creation into the SP) I think I could argue both ways. Ports are really a property of the actual device so it makes sense that they are set up by the device object. On the other hand, the asymmetry of the present situation may be seen as slightly messy. Personally I don't have a problem with the way things are at present, especially since there are other arguably more important issues I still need to sort out. :-) Regards jonathan |
From: Jonathan W. <jw...@ph...> - 2008-02-06 06:21:31
|
Hi guys This is to let you know that since returning from LCA2008 I have tested svn revision 876 with the MOTU and it still seems to be behaving itself in the same way that rev 875 did. In other words, the revised fix for the segfault on close seems to be working. Regards jonathan |