Thread: RE: [Linux-hls-devel] Processes freezing up
|
From: Paul K. <pko...@au...> - 2003-09-04 01:29:48
|
Hi Luca/John (and anyone else who's listening),

To answer one question it looks like 5 seconds pass before the timer bug
message appears.

Sep 3 18:50:17 wally kernel: hls_ctl: Moving to res1
Sep 3 18:50:17 wally kernel: ...and setting the parameters!
Sep 3 18:50:22 wally kernel: bug: kernel timer added twice at e24c1689

I don't know if it's relevant but we use the RTC to generate 1 ms (1
kHz) interrupts to our process. I assume that the 1 kHz kernel timer is
generated by a different timer (the PIC/APIC?) I'm a little confused by
the different timing mechanisms and timer-related patches (e.g. hi-res).

We tried to specify a CPU reservation of 5ms out of 6.625ms. Before
handballing to Tony I previously tested a "simulation" using hourglass
(v0.6) to check on interrupt latency and deadline misses.

Typing:

hourglass -n 1 -t 0 -rh 5ms 6.625ms -w LAT
while simultaneously running top with update period 0.01 gave latencies
of around 1.3ms

hourglass -n 1 -t 0 -rh 5ms 6.625ms -w LAT -i RTC
while simultaneously running top with update period 0.01 gave latencies
of around 0.3ms

hourglass -n 5 -t 0 -rh 5ms 6.625ms -w PERIODIC 4ms 6.625ms -t 1 -p RTHIGH -w PERIODIC 1ms 10ms
showed that task 0 met all its deadlines but 1 (missed 1 at the
beginning, a synchronisation/startup problem still?)

[Aside: from this I take it that CPU reservations are stronger than
SCHED_RR/SCHED_FIFO? With the latest talk about linux tasks vs HLS
tasks, what is the relationship between HLS round-robin and linux's
SCHED_OTHER, SCHED_FIFO and SCHED_RR? Do SCHED_OTHER tasks -> HLS
round-robin? What about SCHED_RR/SCHED_FIFO tasks? Is that what you're
grappling with at the moment, thinking about a HLS rt scheduler? Where
does it fit in the default HLS hierarchy? Do any of these questions make
sense?!]

Based on the third simulation, I would've thought our application would
be okay. However it continues to show missed deadline problems.
It could be we need to revisit reservation requirements, but I'm also
keen to understand interrupt latency issues and scheduling issues.

Regards,
Paul Koufalas
Senior Communications Engineer
AUSPACE Limited
Level 1 Innovation House
Technology Park
MAWSON LAKES SA 5095
AUSTRALIA
T +61 8 8260 8236
F +61 8 8260 8226
M +61 404 837 122
www.auspace.com.au

This email is for the intended addressee only. If you have received
this e-mail in error, you are requested to contact the sender and
delete the e-mail. Nothing in this email shall bind Auspace Limited in
any contract or obligation.

-----Original Message-----
From: Tony Lupoi
Sent: Wednesday, September 03, 2003 7:00 PM
To: lin...@li...
Subject: [Linux-hls-devel] Processes freezing up

Hi Luca/John,

I'm a work colleague of Paul Koufalas and have been looking with him at
running our application with the HLS scheduler.

I've encountered an issue where processes seem to freeze up after
running for any amount of time. Sometimes it's my application and I've
also noticed it happen with top.

I've included the dmesg output below.

HLS MP initializing (HLS_DBG_PRINT_LEVEL = 1).
[-1961706252], 833 : sched 'ROOT' registered in slot 0
[-1885129302], 833 : sched 'JOIN' registered in slot 1
[-1808565821], 833 : sched 'TH' registered in slot 2
[-1734729187], 833 : sched 'RR' registered in slot 3
[-1660892643], 833 : sched 'PS' registered in slot 4
[-1587056079], 833 : sched 'RES' registered in slot 5
already PRIVATE_DATA != NULL???
already PRIVATE_DATA != NULL???
hls_ctl: Moving to res1
...and setting the parameters!
bug: kernel timer added twice at e24c1689.
HLSUnblockThreadHook --- WAI = 2 [512]
HLSUnblockThreadHook --- WAI = 2 [512]
HLS ERROR: Task 687 has rt_priority = 100 and state = 1
HLSUnblockThreadHook --- WAI = 5 [512]
HLSUnblockThreadHook --- WAI = 5 [512]
HLS ERROR: Task 687 has rt_priority = 100 and state = 1
HLS ERROR: Task 687 has rt_priority = 100 and state = 1
HLSUnblockThreadHook --- WAI = 5 [512]
HLSUnblockThreadHook --- WAI = 5 [512]
HLS ERROR: Task 687 has rt_priority = 100 and state = 1
HLS ERROR!!! Task state changed during the hook??? 0 != 1!!! Correcting...
HLS ERROR: Task 512 has rt_priority = 100 and state = 1
HLS ERROR: Task 512 has rt_priority = 100 and state = 0
HLS ERROR: Task 687 has rt_priority = 100 and state = 1
HLSUnblockThreadHook --- WAI = 5 [512]
HLSUnblockThreadHook --- WAI = 5 [512]
HLS ERROR: Task 687 has rt_priority = 100 and state = 1
HLS ERROR: Task 507 has rt_priority = 100 and state = 1
HLSUnblockThreadHook --- WAI = 5 [512]
HLSUnblockThreadHook --- WAI = 5 [512]
HLS ERROR: Task 507 has rt_priority = 100 and state = 1

Here are the processes that are in error:

root 687 685 0 18:37 ? 00:00:00 in.telnetd: dm3
root 507 1 0 18:37 ? 00:00:00 syslogd -m 0
root 512 1 0 18:37 ? 00:00:00 klogd -x

On a previous run, I was seeing error messages with the init process, ie ..

Sep 3 17:58:16 wally kernel: HLS ERROR: Task 1 has rt_priority = 100 and state = 1
Sep 3 17:58:16 wally kernel: HLS ERROR: Task 1 has rt_priority = 100 and state = 1
Sep 3 17:58:16 wally kernel: HLSUnblockThreadHook --- WAI = 5 [511]
Sep 3 17:58:16 wally kernel: HLSUnblockThreadHook --- WAI = 5 [511]
Sep 3 17:58:16 wally kernel: HLS ERROR: Task 1 has rt_priority = 100 and state = 1
Sep 3 17:58:16 wally kernel: HLS ERROR!!! Task state changed during the hook??? 0 != 1!!! Correcting...

And also, when my process freezes, I'm unable to kill it, even with
kill -9 <pid>!

I don't know if it's related, but I've also seen the following messages
in the syslog:

Sep 3 18:50:17 wally kernel: hls_ctl: Moving to res1
Sep 3 18:50:17 wally kernel: ...and setting the parameters!
Sep 3 18:50:22 wally kernel: bug: kernel timer added twice at e24c1689.

Any help would be appreciated, and let me know if I can enable any more
debugging to give more information.

Thanks and regards,
Tony
|
|
From: Paul K. <pko...@au...> - 2003-09-04 07:11:23
|
Hi all,
> > [Aside: from this I take it that CPU reservations are stronger than
> > SCHED_RR/SCHED_FIFO?
> In the default hierarchy that is built when the HLS module is
> loaded, yes. There is a RR scheduler (rr1), and the res1
> scheduler is scheduled on rr1 with priority 20. All the
> "regular tasks" are scheduled on another RR scheduler (rr2),
> which is scheduled on rr1 with priority 10. Hence, res tasks
> are scheduled in foreground respect to all the other tasks.
> Of course, you can change a task to rr1 (with priority > 20)
> to schedule 1 in foreground respect to res tasks...
I noted in one of John's HLS papers a scheduler hierarchy that looked
like this:
ROOT---|
 |     |
RES--JOIN
       |
       PS
Does this hierarchy give stronger guarantees to a RES task than the
standard hierarchy in HLS? I assume the answer is "no" iff there are no
rr1 tasks with priority >= 20?
I recall you said the join scheduler is currently broken. Is the
following hierarchy valid and similar (only hard reservations allowed,
no soft ones)?
ROOT---|
 |     |
RES    PS
I'm trying to understand whether or not the standard hierarchy will
suffice for our application, or whether we need to compose a (more)
appropriate hierarchy. We want to guarantee our application meets its
real time constraints (no missed deadlines). More below...
>
> > With the latest talk about linux tasks vs HLS
> > tasks, what is the relationship between HLS round-robin and linux's
> > SCHED_OTHER, SCHED_FIFO and SCHED_RR?
> Currently, all linux tasks are scheduled in background
> respect to HLS tasks (a non-HLS task is scheduled only when
> no HLS tasks are ready). Hence, doing
I thought that all tasks were converted to HLS tasks (under rr2) when
the scheduler is first loaded into the kernel? This seems to be what
/proc/HLS/tasks shows. When any new tasks appear, I assumed they end up
on the default HLS scheduler (rr2) unless otherwise directed. When you
say "background", do you mean rr2? I assumed that the HLS rr2 scheduler
is basically playing the role SCHED_OTHER did before HLS was loaded? The
rr1 scheduler in the hierarchy is confusing me a little bit
(apologies!).
> sched_setscheduler(SCHED_RR) can be dangerous, because it
> does the opposite of what the user expects... This is what I
> am trying to fix.
>
> > Do SCHED_OTHER tasks -> HLS
> > round-robin? What about SCHED_RR/SCHED_FIFO tasks?
> This is exactly what we have to decide right now... The
> implemented solution (SCHED_OTHER, SCHED_RR, SCHED_FIFO -->
> background respect to
> HLS) is not good.
>
> > Is that what you're
> > grappling with at the moment, thinking about a HLS rt scheduler?
> Yes... You can for example set "rt scheduler = rr1" (for the
> standard hierarchy), so that changing a task to SCHED_FIFO or
> SCHED_RR will really increase its priority.
What happens to res1 tasks in this case? Will the sched_setscheduler()
call for a res1 task then fail if there are "demanding" SCHED_FIFO/RR
tasks or vice versa?
>
> > Based on the third simulation, I would've thought our application
> > would be okay. However it continues to show missed deadline
> > problems. It could be we need to revisit reservation requirements,
> > but I'm also keen to understand interrupt latency issues and
> > scheduling issues.
> Does your application call sched_setscheduler() with SCHED_FIFO (or
> SCHED_RR)? Does it create a high I/O load?
>
Our application only schedules itself under res1 (5ms/6.625ms).
It does a fair bit of I/O: in each 6.625ms period, it does some number
crunching, sets up 3 DMA transfers to a cPCI card (1 read, 2 writes,
tens of KB) and does read/writes to 2 Ethernet cards (low throughput, <
1MB/s); once all that is done, it blocks. It gets woken up by the RTC
interrupts every 1ms and polls the cPCI card time counters to check if
they've ticked over into a new 6.625ms period; if so, it does all its
processing again,etc; if not, it immediately blocks again. Shortly we'll
forget the RTC 1ms interrupt and polling mechanism, and get the cPCI to
interrupt every 6.625ms. I hope that made sense!
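
The wake-and-poll pattern described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the RTC wakeups are taken to be exactly 1 ms apart and aligned with the start of the first cPCI period, and the names `FRAME_MS`, `RTC_MS` and `poll` are hypothetical, not taken from the application.

```python
from fractions import Fraction

RTC_MS = Fraction(1)            # RTC wakeup every 1 ms (from the thread)
FRAME_MS = Fraction(6625, 1000) # cPCI period of 6.625 ms (from the thread)
RESERVED_MS = Fraction(5)       # CPU reservation per period (from the thread)

def poll(n_wakeups):
    """Count wakeups that find a new period versus wakeups that block again.

    Assumes wakeup number t happens at exactly t ms and that the cPCI
    time counter simply counts completed 6.625 ms periods.
    """
    last_frame = 0
    work = idle = 0
    for t in range(1, n_wakeups + 1):
        frame = (t * RTC_MS) // FRAME_MS  # value the counter would show
        if frame != last_frame:
            last_frame = frame
            work += 1   # new period: do the processing
        else:
            idle += 1   # same period: block again immediately
    return work, idle

work, idle = poll(53)                  # 53 ms = 8 periods of 6.625 ms
utilisation = RESERVED_MS / FRAME_MS   # share of the CPU the reservation asks for
print(work, idle)                      # 8 45
print(utilisation, float(utilisation))
```

Under these assumptions most wakeups do nothing, which is consistent with the plan to let the cPCI card interrupt once per period instead, and the reservation itself asks for 40/53 (about 75%) of the CPU.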
Regards,
Paul.
Linux-hls-devel mailing list Lin...@li...
https://lists.sourceforge.net/lists/listinfo/linux-hls-devel
|
|
From: Luca A. <luc...@em...> - 2003-09-04 10:09:20
|
Hi Paul,
just a quick mail (I am in a bit of a hurry right now); I'll send a
longer reply this evening.
[...]
> I noted in one of John's HLS papers a scheduler hierarchy that looked
> like this:
>
> ROOT---|
>  |     |
> RES--JOIN
>        |
>        PS
>
> Does this hierarchy give stronger guarantees to a RES task than the
> standard hierarchy in HLS? I assume the answer is "no" iff there are no
> rr1 tasks with priority >= 20?
Don't know about this (I'll need to re-read that paper)...
[...]
> I'm trying to understand whether or not the standard hierarchy will
> suffice for our application, or whether we need to compose a (more)
> appropriate hierarchy. We want to guarantee our application meets its
> real time constraints (no missed deadlines). More below...
I think that the standard hierarchy should be enough... You eventually
will have to change some tasks' priorities.
If I remember well, the standard hierarchy is:
            root
              |
              |
        -----rr1-----
        |     |     |
        |     |     |
      res1   rr2   ps1
      P=20   P=10  P=9    (rr2 is the default scheduler)
I assume you are not using the proportional share scheduler ps1. Anyway,
you have a lot of freedom in scheduling your tasks:
1) you can leave a task under the rr2 scheduler, changing its priority
to make it more or less important
2) you can move a task to the rr1 scheduler, giving it priority < 10, so
that it is scheduled in the background with respect to all the rr2 tasks
3) you can move a task to the rr1 scheduler, giving it priority 20 > P >
10, so that it is scheduled in the foreground with respect to all the rr2
tasks, but in the background with respect to all the reserved tasks (res1)
4) you can move a task to the rr1 scheduler, giving it priority > 20, so
that it is scheduled in the foreground with respect to all the rr2 and
res1 tasks
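
The options above can be sketched as a lookup over the standard hierarchy. This is a toy model, not HLS code: the priorities 20, 10 and 9 are the ones given in this message, while the function name `runs_ahead_of` and the choice to treat equal priorities as "not ahead" are assumptions.

```python
# Children of rr1 in the standard hierarchy, with the priorities quoted above.
STANDARD_CHILDREN = {"res1": 20, "rr2": 10, "ps1": 9}

def runs_ahead_of(priority, children=STANDARD_CHILDREN):
    """Return the rr1 children that a task placed directly on rr1 with the
    given priority would be scheduled ahead of (strictly higher priority)."""
    return sorted(name for name, p in children.items() if priority > p)

for prio in (5, 15, 25):
    print(prio, runs_ahead_of(prio))
# 5 []
# 15 ['ps1', 'rr2']
# 25 ['ps1', 'res1', 'rr2']
```

A task at priority 15 therefore sits between rr2 and res1, and only a priority above 20 puts a task ahead of the reserved tasks.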
> I thought that all tasks were converted to HLS tasks (under rr2) when
> the scheduler is first loaded into the kernel?
Yes, this is correct: when the module is inserted all the tasks are
moved to the default scheduler (rr2), and when a task is created, it is
scheduled by the default scheduler.
But if you explicitly call sched_setscheduler() choosing the SCHED_RR,
SCHED_FIFO, or SCHED_OTHER policy, then the task returns to be a
"regular linux task", and is scheduled in background respect to all the
HLS tasks.
> When you say "background", do you mean rr2?
No, I was meaning that when a task decides to return to be a linux task
(by selecting the SCHED_RR, SCHED_FIFO, or SCHED_OTHER policy), it is
not scheduled anymore unless all the HLS tasks are idle.
> I assumed that the HLS rr2 scheduler is basically playing the role
> SCHED_OTHER did before HLS was loaded?
Yes, this is almost correct. The only problem is that if a task
explicitly selects SCHED_OTHER, it currently returns to be a non HLS
task. Setting SCHED_OTHER ---> default scheduler is a good idea, and I
will do it.
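
The difference between the current behaviour and the direction being discussed can be written down as two small mappings. This is only a sketch: the function names are hypothetical, and the "proposed" mapping combines the SCHED_OTHER idea above with the "rt scheduler = rr1" suggestion quoted elsewhere in this thread, which had not been implemented at the time.

```python
# Policies a task can request through sched_setscheduler().
LINUX_POLICIES = ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR")

def current_mapping(policy):
    """Behaviour described in this thread: explicitly choosing any Linux
    policy turns the task back into a regular Linux task, which runs only
    when all HLS tasks are idle."""
    if policy not in LINUX_POLICIES:
        raise ValueError(policy)
    return "linux (background)"

def proposed_mapping(policy, default_scheduler="rr2", rt_scheduler="rr1"):
    """Hypothetical mapping under discussion: SCHED_OTHER goes to the
    default scheduler, SCHED_FIFO/SCHED_RR go to a configured rt scheduler."""
    if policy not in LINUX_POLICIES:
        raise ValueError(policy)
    return default_scheduler if policy == "SCHED_OTHER" else rt_scheduler

for policy in LINUX_POLICIES:
    print(policy, "|", current_mapping(policy), "|", proposed_mapping(policy))
```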
> The
> rr1 scheduler in the hierarchy is confusing me a little bit
> (apologies!).
Well, let's see if I remember well (John, please correct me...). If I am
not wrong, a root scheduler can have only a single scheduler as a child.
rr1 is the child of the root scheduler, and it is used to "schedule the
other schedulers" in a prioritized way. I hope this clarifies things a
little bit...
> > Yes... You can for example set "rt scheduler = rr1" (for the
> > standard hierarchy), so that changing a task to SCHED_FIFO or
> > SCHED_RR will really increase its priority.
>
> What happens to res1 tasks in this case?
If you set the task priorities < 20, the res1 tasks are not affected.
> Will the sched_setscheduler()
> call for a res1 task then fail if there are "demanding" SCHED_FIFO/RR
> tasks or vice versa?
No, we do not implement hierarchical guarantees. If some time-demanding
task is scheduled on rr1 with priority > 20, then the res1 tasks can
fail to get their reserved time even if sched_setscheduler() did not
fail.
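
The lack of hierarchical guarantees can be illustrated with a toy fixed-priority simulation. It is a sketch under stated assumptions, not a model of the HLS code: time is cut into 0.125 ms slices, the competing task asks for a fixed amount of CPU at the start of every 6.625 ms period, and all names are hypothetical.

```python
SLICE_US = 125
PERIOD_SLICES = 6625 // SLICE_US   # 53 slices per 6.625 ms period
BUDGET_SLICES = 5000 // SLICE_US   # 40 slices = the 5 ms reservation

def reserved_time_received(hog_slices, hog_priority, res_priority=20, periods=4):
    """Return, per period, how many slices the res1 task actually gets when a
    competing rr1 task wants `hog_slices` slices per period."""
    received = []
    for _ in range(periods):
        hog_left = hog_slices
        got = 0
        for _ in range(PERIOD_SLICES):
            hog_ready = hog_left > 0
            res_ready = got < BUDGET_SLICES
            if hog_ready and (hog_priority > res_priority or not res_ready):
                hog_left -= 1      # competing task runs in this slice
            elif res_ready:
                got += 1           # reserved task runs in this slice
            # otherwise the slice is idle
        received.append(got)
    return received

print(reserved_time_received(hog_slices=24, hog_priority=25))  # [29, 29, 29, 29]
print(reserved_time_received(hog_slices=24, hog_priority=15))  # [40, 40, 40, 40]
```

With a 3 ms demand at priority 25 the reserved task gets only 29 of its 40 slices per period, although nothing rejected the reservation; the same demand at priority 15 leaves the reservation intact.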
Luca
--
_____________________________________________________________________________
Copy this in your signature, if you think it is important:
N O W A R ! ! !
|
|
From: Tony L. <ton...@ya...> - 2003-09-06 03:14:28
|
Hi Luca,

Thanks for the suggestions!

Yes, the kernel we are using does have the pre-emptive kernel patch in
it. I removed the patch yesterday but had problems building the driver
for the custom PCI card we are using(?) - I am looking at rebuilding it
and then running the test without the pre-emptive patch.

>Looks like someone is unblocking ...

I ran our application a few more times and observed the machine to
freeze almost immediately after starting it. Given your comment above,
the only thing that is running from the start of the program is the real
time clock interrupt read() inside an infinite while() loop. It may be
that this is causing the unblocking behaviour you interpreted.

On Monday I will :
- apply the debug patch from your last email
- uncomment the __DO_CLI__ macros
- disable the rtc interrupts in my application
- run my application

If the problem doesn't occur then the rtc may be the cause. Otherwise I
will re-enable the rtc and send you the output of dmesg.

Following that test I will go back to getting the kernel running without
the pre-emptive kernel patch and then let you know if the machine still
freezes (and send dmesg output if needed).

Thanks again for your help.
Tony
|
|
From: Luca A. <luc...@em...> - 2003-09-07 09:47:43
|
Hi Tony,
from your description, I think that the RTC is part of the problem... It
probably generates a high priority interrupt that interferes with HLS
(there should be some bug in the HLS locking code)... I'll double-check
that.
> On Monday I will :
> - apply the debug patch from your last email
> - uncomment the __DO_CLI__ macros
> - disable the rtc interrupts in my application
> - run my application
Ok, thanks!!! These results will be very useful.
Thanks again for your testing,
Luca
|
|
From: Tony L. <tl...@au...> - 2003-09-09 02:35:54
|
Hi Luca,

We've had problems with our mail server, so this email is a re-post of
the one I sent yesterday with some added comments....

I applied the patch you sent and also uncommented the __DO_CLI__ macros
from hls_timers.c (Note, I had to add "unsigned long flags;" to
hls_timers.c to get the code to build with the macro enabled).

I then ran my program a few times (~10) with and without the rtc. The
machine froze independently of running my program with the rtc. On one
occasion, the machine froze when I ran "init 6" to reboot it (when I
couldn't kill off my program).

I suspected that the amount of logging to /var/log/messages was the
cause of the freeze, so I rebuilt the HLS scheduler with DEBUG=0, but
then the module wouldn't install and locked up my machine!

I then rebuilt the HLS scheduler with CLI=1 CREATE=1 INT_SCHED=1 DEBUG=1
MULDIV=1 (ie normal build) and stopped syslogd.

I then tested my program with no syslogd and with the rtc - which froze
my program and also the machine, so there does seem to be a problem with
the rtc and hls. During all of the rtc tests, I only saw the debug
statement from your last patch once, eg:

Sep 9 09:38:42 wally kernel: HLS ERROR: Task 635 has rt_priority = 100 and state = 1
Sep 9 09:38:42 wally last message repeated 34 times
Sep 9 09:39:38 wally kernel: hls_ctl: Moving to res1
Sep 9 09:39:38 wally kernel: ...and setting the parameters!
Sep 9 09:39:47 wally kernel: HLS ERROR: Scheduler res1 posted a timer twice!!! WAI = 0
Sep 9 09:40:04 wally last message repeated 2 times
Sep 9 09:40:05 wally kernel: HLS ERROR: Task 1130 has rt_priority = 100 and state = 1
Sep 9 09:40:05 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]

I then tested with no syslogd and no rtc. My program froze after running
for a few minutes, but the machine did not lock up, so I was able to
stop my program by removing the HLS module.
This produces the following errors, but I'm able to access the machine
OK after removing the module: (task 861 is my program)

Removing task 861 from HLS
../hls/hls_hooks.c:329: failed HLS assertion: "current_task() == Thread".
../hls/hls_hooks.c:329: failed HLS assertion: "current_task() == Thread".
HLS ERROR: not panicing, but continuing...
HLS ERROR: not panicing, but continuing...

During both rtc/non-rtc tests, either nothing was logged before the
program froze, or messages similar to the following appeared in the
messages file while the program was frozen (a lot of them!):

Sep 9 10:26:46 wally kernel: HLSUnblockThreadHook --- WAI = 5 [503]
Sep 9 10:26:46 wally kernel: HLS ERROR: Task 754 has rt_priority = 100 and state = 1
Sep 9 10:26:46 wally kernel: HLS ERROR!!! Task state changed during the hook??? 0 != 1!!! Correcting...

Hope this was helpful.

Thanks,
Tony
|
|
From: Luca A. <luc...@em...> - 2003-09-09 07:48:46
|
Hi Tony,
thanks for all the information... I am going to parse it, and I'll let
you know as soon as I discover something.
Just a few questions, to be sure that I understood everything correctly:
- Does your machine crash even without using the RTC (in your first
mail, I read "The good news is that the problem seems to be caused
by the rtc as you suspected", but in the second one I seem to understand
the opposite)? If yes, I would say that we are in big trouble... I
cannot understand why HLS is having problems in your machine and I am
not able to reproduce them...
- Was the test performed with or without the kernel preemption patch?
- If I understand well, the "HLS ERROR: Scheduler xxx posted a timer
twice" is not the first error in your log... Is my understanding
correct? If yes, can you post the first error that you see in your log,
and the lines coming immediately before it in the log?
- Is the HLS module failing when compiled with CLI=1 CREATE=1
INT_SCHED=1 DEBUG=0 MULDIV=1?
If no, can you try it and send a log (when you have time)?
Thanks,
Luca
|
|
From: Tony L. <ton...@ya...> - 2003-09-09 03:39:21
|
Hi Luca,

(You may get this email twice due to problems with my mail server -
sorry).

I applied the patch you sent and also uncommented the __DO_CLI__ macros
from hls_timers.c (Note, I had to add "unsigned long flags;" to
hls_timers.c to get the code to build with the macro enabled).

I then ran my application with and without the rtc. The good news is
that the problem seems to be caused by the rtc as you suspected;
unfortunately I didn't see any of the debug messages from the patch that
you sent. I only ran the test a couple of times (I have to reboot in
between tests when it fails), and I will run it a few more times
tomorrow and let you know if I get any output.

Thanks,
Tony
|
|
From: Tony L. <tl...@au...> - 2003-09-12 03:46:03
|
Hi Luca,

Sorry for the late reply - our mail server is finally working correctly
(I hope!).

>Does your machine crash even without using the RTC
Yes. I sent the first message at the end of the day after only running a
couple of tests, but I then ran more tests the next day which showed the
problem to occur even without the RTC. Sorry about the confusion - I
thought the first email was lost but our mail server sent it out a day
late!

>Was the test performed with or without the kernel preemption patch?
Yes, we have been using the kernel preemption patch. I did plan to
re-test without this patch but I've been busy with other work.

>If I understand well, the "HLS ERROR: Scheduler xxx posted a timer
>twice" is not the first error in your log... Is my understanding
>correct?
Yes, here is the log from when the hls_module was inserted:

Sep 9 09:36:31 wally kernel: HLS MP initializing (HLS_DBG_PRINT_LEVEL = 1).
Sep 9 09:36:31 wally kernel: [1643284912], 803 : sched 'ROOT' registered in slot 0
Sep 9 09:36:31 wally kernel: [1718493520], 803 : sched 'JOIN' registered in slot 1
Sep 9 09:36:31 wally kernel: [1793690865], 803 : sched 'TH' registered in slot 2
Sep 9 09:36:31 wally kernel: [1866161544], 803 : sched 'RR' registered in slot 3
Sep 9 09:36:31 wally kernel: [1938629664], 803 : sched 'PS' registered in slot 4
Sep 9 09:36:31 wally kernel: [2011100543], 803 : sched 'RES' registered in slot 5
Sep 9 09:36:31 wally kernel: already PRIVATE_DATA != NULL???
Sep 9 09:36:31 wally kernel: already PRIVATE_DATA != NULL???
Sep 9 09:36:36 wally kernel: hls_ctl: Moving to res1
Sep 9 09:36:36 wally kernel: ...and setting the parameters!
Sep 9 09:36:52 wally kernel: HLS ERROR: Task 756 has rt_priority = 100 and state = 1
Sep 9 09:36:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
Sep 9 09:36:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
Sep 9 09:36:52 wally kernel: HLS ERROR: Task 756 has rt_priority = 100 and state = 1
Sep 9 09:36:52 wally kernel: HLS ERROR: Task 756 has rt_priority = 100 and state = 1
Sep 9 09:36:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
Sep 9 09:36:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
Sep 9 09:36:52 wally kernel: HLS ERROR: Task 756 has rt_priority = 100 and state = 1
Sep 9 09:36:52 wally kernel: HLS ERROR!!! Task state changed during the hook??? 0 != 1!!! Correcting...

Lots of the above messages, then ...

Sep 9 09:38:42 wally kernel: HLS ERROR: Task 635 has rt_priority = 100 and state = 1
Sep 9 09:38:42 wally kernel: HLS ERROR: Task 635 has rt_priority = 100 and state = 1
Sep 9 09:38:42 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
Sep 9 09:38:42 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
Sep 9 09:38:42 wally kernel: HLS ERROR: Task 635 has rt_priority = 100 and state = 1
Sep 9 09:38:42 wally last message repeated 34 times
Sep 9 09:39:38 wally kernel: hls_ctl: Moving to res1
Sep 9 09:39:38 wally kernel: ...and setting the parameters!
Sep 9 09:39:47 wally kernel: HLS ERROR: Scheduler res1 posted a timer twice!!! WAI = 0
Sep 9 09:40:04 wally last message repeated 2 times
Sep 9 09:40:05 wally kernel: HLS ERROR: Task 1130 has rt_priority = 100 and state = 1

>Is the HLS module failing when compiled with CLI=1 CREATE=1
>INT_SCHED=1 DEBUG=0 MULDIV=1?
I've just tried this and it builds OK. My machine still froze - here's
the output ... (I don't think that anything new was logged)

Sep 12 13:03:02 wally kernel: HLS MP initializing (no debugging).
Sep 12 13:03:02 wally kernel: already PRIVATE_DATA != NULL???
Sep 12 13:03:02 wally kernel: already PRIVATE_DATA != NULL???
Sep 12 13:04:26 wally kernel: hls_ctl: Moving to res1
Sep 12 13:04:26 wally kernel: ...and setting the parameters!
Sep 12 13:04:49 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:49 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:49 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:49 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:49 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:49 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:49 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:49 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:49 wally kernel: HLS ERROR!!! Task state changed during the hook??? 0 != 1!!! Correcting...
Sep 12 13:04:49 wally kernel: HLS ERROR: Task 498 has rt_priority = 100 and state = 1
Sep 12 13:04:49 wally kernel: HLS ERROR: Task 498 has rt_priority = 100 and state = 0
Sep 12 13:04:52 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:52 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:52 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [498]
Sep 12 13:04:52 wally kernel: HLS ERROR: Task 1105 has rt_priority = 100 and state = 1
Sep 12 13:04:52 wally kernel: HLS ERROR!!! Task state changed during the hook??? 0 != 1!!! Correcting...
... etc ...

Thanks,
Tony
|
|
From: Luca A. <luc...@em...> - 2003-09-12 15:18:12
|
Hi Tony,
[...]
> >Does your machine crash even without using the RTC
> Yes. I sent the first message at the end of the day after only running a
> couple of tests, but I then ran more tests the next day which showed the
> problem to occur even without the RTC. Sorry about the confusion - I
> thought the first email was lost but our mail server sent it out a day
> late!
Ok, so RTC is not the cause of the problem... Hence, since I am not able
to reproduce the crash I suspect that the problem is in a bad
interaction with the preemption patch.
> >Was the test performed with or without the kernel preemption patch?
> Yes, we have been using the kernel preemption patch.
I think this is the only difference between your configuration and
mine... I hope to be able to get an x86 test machine in the next week, so
that I can install the preemption patch and try to reproduce the
problem.
> I did plan to
> re-test without this patch but I've been busy with other work.
This would be very useful. Anyway, there is no hurry (I still have to
set up a test environment with a preemptive kernel).
> >If I understand well, the "HLS ERROR: Scheduler xxx posted a timer
> twice" is not the first error in your log... Is my understanding
> correct?
>
> Yes, here is the log from when the hls_module was inserted:
Ok. So the timer was not the cause of the problem...
[...]
> Sep 9 09:36:31 wally kernel: [1938629664], 803 : sched 'PS' registered
> in slot 4
> Sep 9 09:36:31 wally kernel: [2011100543], 803 : sched 'RES' registered
> in slot 5
> Sep 9 09:36:31 wally kernel: already PRIVATE_DATA != NULL???
> Sep 9 09:36:31 wally kernel: already PRIVATE_DATA != NULL???
> Sep 9 09:36:36 wally kernel: hls_ctl: Moving to res1
> Sep 9 09:36:36 wally kernel: ...and setting the parameters!
> Sep 9 09:36:52 wally kernel: HLS ERROR: Task 756 has rt_priority = 100
> and state = 1
> Sep 9 09:36:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
> Sep 9 09:36:52 wally kernel: HLSUnblockThreadHook --- WAI = 5 [505]
It seems that about 15 seconds after moving a task to the res1
scheduler, the internal HLS status gets corrupted... I think the last
two messages are just the log daemon waking up due to the "HLS
ERROR:" printk. Hence, the important message is "HLS ERROR: Task 756 has
rt_priority = 100 and state = 1". I'll have a look...
> >Is the HLS module failing when compiled with CLI=1 CREATE=1
> INT_SCHED=1 DEBUG=0 MULDIV=1?
> I've just tried this and it builds OK. My machine still froze - here's
> the output ... (I don't think that anything new was logged)
Looks like a similar output...
BTW, what does your program do? (how many reservations does it create,
how big are those reservations, and so on...). I think Paul tested both
the schedtest example and hourglass, and it worked...
Can you provide a simple program that causes this freeze?
Thanks,
Luca
|
|
From: Luca A. <luc...@em...> - 2003-09-04 05:40:28
|
Hi Paul,
> To answer one question it looks like 5 seconds pass before the timer bug
> message appears.
>
> Sep 3 18:50:17 wally kernel: hls_ctl: Moving to res1
> Sep 3 18:50:17 wally kernel: ...and setting the parameters!
> Sep 3 18:50:22 wally kernel: bug: kernel timer added twice at e24c1689
Ok, thanks! Unfortunately, I did not understand what is going on yet,
but I am working on this...
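
The 5-second gap can be read straight from the syslog timestamps. A minimal sketch, assuming the three lines quoted above and the year 2003 (syslog lines carry no year, so it is taken from the message dates); the helper names are hypothetical.

```python
from datetime import datetime

LOG = """\
Sep 3 18:50:17 wally kernel: hls_ctl: Moving to res1
Sep 3 18:50:17 wally kernel: ...and setting the parameters!
Sep 3 18:50:22 wally kernel: bug: kernel timer added twice at e24c1689
"""

def timestamp(line, year=2003):
    """Parse the leading 'Mon D HH:MM:SS' of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

def gap_seconds(log, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the next
    line containing end_marker, or None if either is missing."""
    start = None
    for line in log.splitlines():
        if start is None and start_marker in line:
            start = timestamp(line)
        elif start is not None and end_marker in line:
            return (timestamp(line) - start).total_seconds()
    return None

print(gap_seconds(LOG, "Moving to res1", "timer added twice"))  # 5.0
```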
> I don't know if it's relevant but we use the RTC to generate 1 ms (1
> kHz) interrupts to our process.
This should not be a problem.
> I assume that the 1 kHz kernel timer is
> generated by a different timer (the PIC/APIC?)
Yes, it is generated by the PIT.
> We tried to specify a CPU reservation of 5ms out of 6.625ms. Before
> handballing to Tony I previously tested a "simulation" using hourglass
> (v0.6) to check on interrupt latency and deadline misses.
>
> Typing:
>
> hourglass -n 1 -t 0 -rh 5ms 6.625ms -w LAT
> while simultaneously running top with update period 0.01 gave latencies
> of around 1.3ms
>
> hourglass -n 1 -t 0 -rh 5ms 6.625ms -w LAT -i RTC
> while simultaneously running top with update period 0.01 gave latencies
> of around 0.3ms
>
> hourglass -n 5 -t 0 -rh 5ms 6.625ms -w PERIODIC 4ms 6.625ms -t 1 -p
> RTHIGH -w PERIODIC 1ms 10ms
> showed that task 0 met all its deadlines but 1 (missed 1 at the
> beginning, a synchronisation/startup problem still?)
I'll try to have a look and to investigate these latencies.
One thing that I can say for sure is that a period of 6.625 ms can give
some problems, since it is not a multiple of the system tick (I expect
some additional latencies due to this, but I don't think it is the
reason for the problem you are seeing).
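
To give a rough idea of the size of this effect, here is a small worked example. It assumes a 1 ms system tick (the 1 kHz kernel timer mentioned in this thread) and that a period boundary is only acted on at the next tick; how HLS actually rounds its timers is not stated here, so treat this as an illustration only.

```python
from fractions import Fraction
import math

TICK_MS = Fraction(1)             # assumed 1 ms system tick (1 kHz)
PERIOD_MS = Fraction(6625, 1000)  # reservation period of 6.625 ms

def tick_lateness(n_periods, period=PERIOD_MS, tick=TICK_MS):
    """For each period boundary k * period, return how late the first tick
    at or after that boundary is."""
    late = []
    for k in range(1, n_periods + 1):
        ideal = k * period
        fired = math.ceil(ideal / tick) * tick
        late.append(fired - ideal)
    return late

lateness = tick_lateness(8)        # the pattern repeats every 8 periods (53 ms)
print([float(x) for x in lateness])
# [0.375, 0.75, 0.125, 0.5, 0.875, 0.25, 0.625, 0.0]
print(float(max(lateness)))        # 0.875
```

Under these assumptions a boundary can be noticed up to 0.875 ms late, and the lateness changes from period to period rather than staying constant.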
> [Aside: from this I take it that CPU reservations are stronger than
> SCHED_RR/SCHED_FIFO?
In the default hierarchy that is built when the HLS module is loaded,
yes. There is a RR scheduler (rr1), and the res1 scheduler is scheduled
on rr1 with priority 20. All the "regular tasks" are scheduled on
another RR scheduler (rr2), which is scheduled on rr1 with priority 10.
Hence, res tasks are scheduled in foreground respect to all the other
tasks.
Of course, you can change a task to rr1 (with priority > 20) to schedule
1 in foreground respect to res tasks...
> With the latest talk about linux tasks vs HLS
> tasks, what is the relationship between HLS round-robin and linux's
> SCHED_OTHER, SCHED_FIFO and SCHED_RR?
Currently, all linux tasks are scheduled in background respect to HLS
tasks (a non-HLS task is scheduled only when no HLS tasks are ready).
Hence, calling sched_setscheduler() with SCHED_RR can be dangerous,
because it does the opposite of what the user expects... This is what I
am trying to fix.
> Do SCHED_OTHER tasks -> HLS
> round-robin? What about SCHED_RR/SCHED_FIFO tasks?
This is exactly what we have to decide right now... The implemented
solution (SCHED_OTHER, SCHED_RR, SCHED_FIFO --> background respect to
HLS) is not good.
> Is that what you're
> grappling with at the moment, thinking about a HLS rt scheduler?
Yes... You can for example set "rt scheduler = rr1" (for the standard
hierarchy), so that changing a task to SCHED_FIFO or SCHED_RR will
really increase its priority.
> Based on the third simulation, I would've thought our application would
> be okay. However it continues to show missed deadline problems. It could
> be we need to revisit reservation requirements, but I'm also keen to
> understand interrupt latency issues and scheduling issues.
Does your application call sched_setscheduler() with SCHED_FIFO (or
SCHED_RR)? Does it create a high I/O load?
I am continuing to study the problem reported yesterday (unfortunately,
I cannot reproduce it), and I'll let you know as soon as I discover
something...
Luca
|