Rather than simply commenting out the dict = 0; statement in the Fuzzy constructor in htfuzzy/Fuzzy.cc, it seems the proper fix would be to replace it with dict = new Dictionary; instead. That way, when execution reaches the writeDB() method, which seems to assume that dict is already set, dict will be set even if there were no words in the database. I haven't actually tried this, but it seems that it should fix the problem properly. The fix should be applied to both the 3.1.x and 3.2.x code bases of ht://Dig.
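A minimal sketch of the eager-allocation idea, assuming a heavily simplified model. EagerFuzzy and the Dictionary stub below are hypothetical stand-ins, not the real ht://Dig classes, and writeDB() here only counts entries instead of writing a database. The point is that if the constructor allocates the Dictionary up front, writeDB() can dereference dict safely even when no words were ever added.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for ht://Dig's Dictionary class (the real interface differs).
struct Dictionary {
    std::map<std::string, std::string> entries;
};

// Simplified, hypothetical model of the proposed fix: the constructor does
// "dict = new Dictionary" instead of "dict = 0", so dict is never null.
class EagerFuzzy {
public:
    EagerFuzzy() : dict(new Dictionary) {}
    ~EagerFuzzy() { delete dict; }  // the single owner releases the single allocation

    // One owning raw pointer: forbid copies so it is never deleted twice.
    EagerFuzzy(const EagerFuzzy&) = delete;
    EagerFuzzy& operator=(const EagerFuzzy&) = delete;

    void addWord(const std::string& word, const std::string& key) {
        dict->entries[word] = key;
    }

    // Returns how many entries would be written; safe even with zero words.
    int writeDB() const { return static_cast<int>(dict->entries.size()); }

private:
    Dictionary* dict;
};
```

With this model, a freshly constructed EagerFuzzy with no words returns 0 from writeDB() instead of crashing. The trade-off is one small allocation that may never be used.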
I think dict is allocated correctly before Fuzzy's constructor runs, so if you replace "dict = 0" with "dict = new Dictionary" instead of "//dict = 0", it could cause a memory leak. But I could be wrong. Before applying your patch, please check htfuzzy for potential memory leaks.
Regards, Adam
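One way to check the leak concern is a small counting model, sketched here under the assumption that Fuzzy owns dict through a plain raw pointer. CountedDictionary and OwningFuzzy are hypothetical names used only for illustration. Until the constructor runs, dict is just an uninitialized pointer member with no Dictionary behind it, so assigning a new object in the constructor cannot orphan anything; a leak would need dict to be overwritten later without deleting the old object.

```cpp
#include <cassert>

// Hypothetical Dictionary stand-in that counts live instances, so a leak
// shows up as a nonzero count after every owner has been destroyed.
struct CountedDictionary {
    static int live;
    CountedDictionary() { ++live; }
    ~CountedDictionary() { --live; }
};
int CountedDictionary::live = 0;

// Hypothetical owner modeling the eager-allocation fix:
// one new in the constructor, one delete in the destructor.
class OwningFuzzy {
public:
    OwningFuzzy() : dict(new CountedDictionary) {}
    ~OwningFuzzy() { delete dict; }
    OwningFuzzy(const OwningFuzzy&) = delete;
    OwningFuzzy& operator=(const OwningFuzzy&) = delete;

    // Replacing the dictionary without leaking: allocate the new object
    // first, then release the old one, then store the new pointer.
    void reset() {
        CountedDictionary* fresh = new CountedDictionary;
        delete dict;
        dict = fresh;
    }

private:
    CountedDictionary* dict;
};
```

In this model the live count is 1 while an OwningFuzzy exists (including after reset()) and drops back to 0 when it is destroyed, i.e. no leak.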
I've dug into this problem and found that it occurs only with a corrupted database. For example, if you start rundig on Fedora 6 and send SIGINT (Ctrl+C), you can end up with a corrupted database. It's not a standard situation, but htdig (especially htfuzzy) must be resilient to it. A good approach would be to print an error message saying the database is corrupted and should be rebuilt, rather than dumping core. The corrupted database is attached (tarball).
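The "report corruption instead of dumping core" behavior can be sketched as follows. This is only an illustration under loud assumptions: the real word database is not a text file, and readWords(), generateFuzzyDB() and the "word<TAB>count" record format are all hypothetical. The idea is that the reader validates each record and returns an empty optional on bad input, and the caller turns that into an error message and a nonzero exit status rather than dereferencing garbage.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader for a text stand-in of the word database, where each
// record is "word<TAB>count". Returns std::nullopt on any malformed record.
std::optional<std::vector<std::string>> readWords(std::istream& in) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream rec(line);
        std::string word;
        long count = 0;
        if (!std::getline(rec, word, '\t') || word.empty() || !(rec >> count) || count < 0)
            return std::nullopt;  // corrupted record: report it, don't crash
        words.push_back(word);
    }
    return words;  // may be empty: an empty but valid database is not an error
}

// Hypothetical driver: returns 0 on success, 1 if the input looks corrupted.
int generateFuzzyDB(std::istream& in, std::ostream& err) {
    std::optional<std::vector<std::string>> words = readWords(in);
    if (!words) {
        err << "htfuzzy: word database appears to be corrupted; please rebuild it\n";
        return 1;
    }
    // ... building the fuzzy index from *words is omitted in this sketch ...
    return 0;
}
```

Note that an empty input is treated as valid here, so the same code path must also cope with having no words at all.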
File Added: htdig.tar
My testing shows it's not just corrupted databases: an empty (but otherwise valid) word database can also trigger the segfaults. This happens when htdig can contact the web server given in start_url but never indexes any text (e.g. the initial URL returns a 404 error).
As for when dict is allocated: only the pointer itself exists before Fuzzy's constructor runs. The Dictionary object it will eventually point to isn't allocated until words are added to the list. There are similar sections of code in htfuzzy that don't properly handle null list pointers; the patch I just uploaded fixes every instance of these that I could find in htfuzzy. I haven't found any memory leaks in htfuzzy. (We've had reports of leaks in htdig, but I don't believe that problem exists in htfuzzy.) I'm quite certain my patch does not introduce any leaks, as it only avoids dereferencing null pointers. When the Dictionary object is actually allocated, the destructor correctly deletes it. When the constructor is called, the Dictionary object hasn't been allocated yet, so zeroing the pointer is the correct course of action.
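The approach described above (keep dict null until the first word arrives, and guard every use) can be sketched like this. It is a hypothetical model, not the contents of the uploaded patch: LazyFuzzy and WordDictionary are illustrative names, and writeDB() only counts distinct words. It relies on the standard C++ guarantee that delete on a null pointer does nothing, which is why the destructor needs no special case.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical Dictionary stand-in: maps each word to how often it was added.
struct WordDictionary {
    std::map<std::string, int> entries;
};

// Hypothetical model of lazy allocation with null checks.
class LazyFuzzy {
public:
    LazyFuzzy() : dict(nullptr) {}   // nothing allocated yet, so zero the pointer
    ~LazyFuzzy() { delete dict; }    // delete on a null pointer is a no-op
    LazyFuzzy(const LazyFuzzy&) = delete;
    LazyFuzzy& operator=(const LazyFuzzy&) = delete;

    void addWord(const std::string& word) {
        if (!dict) dict = new WordDictionary;  // allocate on the first word only
        ++dict->entries[word];
    }

    // Returns the number of distinct words written; an empty database writes nothing.
    int writeDB() const {
        if (!dict) return 0;  // guard: no words were ever added (e.g. start_url gave a 404)
        return static_cast<int>(dict->entries.size());
    }

private:
    WordDictionary* dict;
};
```

Compared with allocating in the constructor, this avoids an unused allocation but requires a null check at every place dict is used, which matches the "similar sections of code" that had to be fixed.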
proposed patch
corrupted database
patch to fix segfaults in htfuzzy
File Added: htdig-3.2.0b6-segfault.patch