See the following message for a full description:
https://sourceforge.net/forum/message.php?msg_id=5363412
By: raywilhe
In tracking down program crashes, I think I might have found a bug in the libserial
read code.
From what I can tell, serial port reads happen through a signal handler function
which fills a stl queue with data. The SerialPort::Read() functions then pull
data from that stl queue.
The stl queue that passes information between the SignalHandler and the
SerialPort::Read() functions has no protections (e.g. mutex) to prevent simultaneous
access to the stl queue.
Since the SIGIO signal can interrupt the user's thread at anytime, this can
cause problems if you are lucky enough to have the folllowing happen:
1. User Calls SerialPort::ReadByte()
2. In the middle of executing the mInputBuffer.pop() function call
the SIGIO signal is raised and the SerialPort::HandlePosixSignal()
gets immediately called
3. In the SerialPort::HandlePosixSignal() function, the mInputBuffer.push()
function is called
4. Execution returns back to SerialPort::ReadByte(), and the mInputBuffer.pop()
finishes
5. Now the stl queue is corrupted and the random crashes start.....
Please let me know if anybody can confirm/deny that this is a problem. This
is my first time looking at the internals of libSerial, but thus far I am fairly
sure the above described problem is the smoking gun in my case.
I just wanted to post my solution. All it does is replace the std::queue<> with a ThreadSafeQueue<>. The ThreadSafeQueue just wraps a mutex around all queue operations. Obviously, since I am posting this in a forum, any and all are free to use this code (tidbit):
template<typename TYPE>
class ThreadSafeQueue : protected std::queue<TYPE>
{
public:
ThreadSafeQueue()
{
pthread_mutex_init(&_m, NULL);
}
~ThreadSafeQueue()
{
pthread_mutex_destroy(&_m);
}
void push(const TYPE& val)
{
pthread_mutex_lock(&_m);
std::queue<TYPE>::push(val);
pthread_mutex_unlock(&_m);
}
void pop()
{
pthread_mutex_lock(&_m);
std::queue<TYPE>::pop();
pthread_mutex_unlock(&_m);
}
bool empty() const
{
pthread_mutex_lock(&_m);
bool retVal = std::queue<TYPE>::empty();
pthread_mutex_unlock(&_m);
return (retVal);
}
unsigned int size() const
{
pthread_mutex_lock(const_cast<pthread_mutex_t*>(&_m));
unsigned int retVal = std::queue<TYPE>::size();
pthread_mutex_unlock(const_cast<pthread_mutex_t*>(&_m));
return (retVal);
}
const TYPE& front() const
{
pthread_mutex_lock(const_cast<pthread_mutex_t*>(&_m));
const TYPE& retVal = std::queue<TYPE>::front();
pthread_mutex_unlock(const_cast<pthread_mutex_t*>(&_m));
return (retVal);
}
const TYPE& back() const
{
pthread_mutex_lock(const_cast<pthread_mutex_t*>(&_m));
const TYPE& retVal = std::queue<TYPE>::back();
pthread_mutex_unlock(const_cast<pthread_mutex_t*>(&_m));
return (retVal);
}
TYPE& front()
{
pthread_mutex_lock(&_m);
TYPE& retVal = std::queue<TYPE>::front();
pthread_mutex_unlock(&_m);
return (retVal);
}
TYPE& back()
{
pthread_mutex_lock(&_m);
TYPE& retVal = std::queue<TYPE>::back();
pthread_mutex_unlock(&_m);
return (retVal);
}
protected:
pthread_mutex_t _m;
};
raywilhe:
Thank you very much for posting your solution.
--
Crayzee Wulf
@raywilhe: thank you too, for your posting, it brought me on the right way to fix some "random crashes".
But as I discorverd, using mutex_lock inside an Signal handler is not such a good thing. It causes some awfull deadlocks. (not often, but it does)
This is also mentioned in the man page.
"ASYNC-SIGNAL SAFETY
The mutex functions are not async-signal safe. What this means is that
they should not be called from a signal handler. In particular, calling
pthread_mutex_lock or pthread_mutex_unlock from a signal handler
may deadlock the calling thread."
I suggest the use of a extra thread to handle the work of the SIGINT, and notify it in the SignalHandler only. In the thread it is ovious to use the locks.
Any better idea?
Tommy:
Thanks for the report. I think your analysis makes sense. I am going to try and reproduce this at my end. Placing a mutex around the access to the STL queue.
Thanks,
--
CrayzeeWulf
@CrayzeeWulf: Did you test that so far? The project is stuck on a five year old release candidate with known bugs, which is a pity for such a useful lib. As I am currently working with libSerial, I will test it anyways and propose a patch if it's stable.
I added a mutex to SerialPort.cpp which controls the access to mInputBuffer.
To prevent deadlocks, "SerialPort::SerialPortImpl::HandlePosixSignal" only tries to lock the mutex. If the mutex is already locked, the signal handler will use a second queue named "shadowInputBuffer". The entire content of shadowInputBuffer is transfered into mInputBuffer if the mutex can be locked successfully.
@CrayzeeWulf: I attached to whole project with all my changes to this post so you can test it yourself. I would be glad if you would migrate these changes for the next release.
Fixed in revision 113 via patch from Oliver Schwaneberg.
The solution in 0.6.0rc2 is buggy. When there is no continuous/periodic data flow, the data in shadow buffer can not be pushed into actual input buffer and if the reader waits that data before sending new data, isDataAvailable returns false and the reader is blocked. I have been just patched the code. I want to test it for a while.
Last edit: Serdar Eke 2016-04-18
Serdar:
Thanks for checking into this. Please let me know what you find in your
tests so LibSerial may be improved.
Meanwhile, please note that LibSerial has now moved to GitHub:
https://github.com/crayzeewulf/libserial
Cheers,
CrayzeeWulf
On Apr 17, 2016 6:49 PM, "Serdar Eke" serdareke@users.sf.net wrote:
Related
Bugs:
#7Dear CrayzeeWulf, the problem in 0.6.0rc2 is that shadow buffer can't be pushed into actual buffer if no another data received from the serial port (i.e. if no SIGIO is triggered) and if the reader waits the data in the shadow buffer, the reader can't read it. Also, I found another bug today. There is no mutex protection in PosixSignalDispatcherImpl singleton class.For example, if 2 instances try to attach their handler in parallel, only one may insert its handler to the signal handler list if the conflict occurs due to lack of mutex protection. Thus, sigaction handler only calls that handler. The problem occurs when libSerial is used for more than 1 serial interface. I added a mutex protection like the solution for the bug mentioned in this bug report.
Last edit: Serdar Eke 2016-04-19