Re: [RTnet-developers] Problem in SOCK_RAW
From: Jorge A. <j-a...@cr...> - 2006-09-19 17:30:21
On Tuesday, 19 September 2006 18:14, Jan Kiszka wrote:
> Jorge Almeida wrote:
> > On Tuesday, 19 September 2006 16:50, wrote:
> >> Jorge Almeida wrote:
> >>> Hello to all,
> >>>
> >>> I'm testing the SOCK_RAW functionality in the RTnet framework for long periods of time and a problem is happening.
> >>> I will try to describe it:
> >>>
> >>> I'm running tests that send 1.000.000 (one million) messages, with an interval of 5 ms each.
> >>> After some time, more than 100.000 messages, the host where the test is running behaves strangely: the program does not return but the bash dies. I must log in again to get back into the host.
> >>> The messages stop being sent (I'm monitoring the network with ethereal).
> >>>
> >>> In /proc I find some data about the file descriptor used by the socket (/proc/rtai/rtdm/open_fildes)
> >>> Index Locked Device
> >>> 0 0 PACKET_RAW
> >>>
> >>> I think this is OK because the socket was never closed.
> >> Because the sender somehow died I think. But why does the console also
> >> die? That's not a typical program error. Anything on the kernel console?
> >
> > Attached is the messages file for one session where the problem happens.
>
> Ok, your job (+ its shell) got OOM-killed (terminated due to lacking
> memory). Is the test program allocating some memory in a loop? Unless it
> is a kernel leak (RTnet or even lower), the OOM-killer typically (not
> always) picks The Right process...

I'm not allocating memory in the loop.
Only local variables, inside the function in the loop.

>
> >
> >>> But the behaviour is strange.
> >>> My guess is that this problem is due to some kind of semaphore or other synchronization mechanism.
> >> The guess is based on which information?
> > Because it only happens with a very high number of messages and not with a small number. Maybe a variable that overflows or anything like that.
>
> dmesg tells some other story so far.

I can't get the right dmesg because it is cleared at every reboot. But the present results are attached.

>
> >>>
> >>> Any clues for what is happening?
> >> Nope.
> >>
> >> If there are no signs anywhere, I would first try to run your scenario
> >> over a similar time using some vanilla RTnet version with normal packet
> >> sockets. Have you tried this before? Just to exclude that there are
> >> major stability issues.
> >
> > I've tested with SOCK_DGRAM two times, one OK the other the same problem. I'm doing some more tests with SOCK_DGRAM.
>
> I think this is not related to latest changes. FWIW: that SOCK_DGRAM
> test took place over RTnet, say, 0.9.5 vanilla?

Nope. I'm using RTnet subversion.

>
> >
> >> BTW, you are on RTAI? What version, patch, gcc?
> > I'm using RTAI 3.4 test1, with gcc-4.1.0, with patch HAL IPIPE-NOTHREADS 1.3-08
>
> I'm definitely no RTAI expert anymore, but the last time I tried it with
> gcc-4.1 (a few months ago) it also jumped out of the windows by just
> running the latency test for half an hour or so. I think I read on the
> RTAI list that gcc4.1 is not producing reliable RTAI code.
>
> >
> > I'm gonna try with rtai 3.4 now.

It also happens in RTAI 3.4.
I'm gonna send the messages file to the RTAI list also.

>
> I got flamed for such suggestions before, but to reduce the number of
> unknowns I would really recommend to run a similar setup over Xenomai
> (2.2.2 recommended for now due to pending FPU issues in 2.2.3). This can
> help to find out where we have to dig deeper for the problem.
>
> Jan
>
>

--
Jorge Almeida
j-a...@cr...

DISCLAIMER: This message may contain confidential information or privileged material and is intended only for the individual(s) named. If you are not a named addressee and mistakenly received this message you should not copy or otherwise disseminate it: please delete this e-mail from your system and notify the sender immediately. E-mail transmissions are not guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete or contain viruses. Therefore, the sender does not accept liability for any errors or omissions in the contents of this message that arise as a result of e-mail transmissions. Please request a hard copy version if verification is required. Critical Software, SA.