sched_clock() is employed in 2.6.0 to obtain a timestamp for two purposes:
1) as part of the recalc_task_prio() algorithm (among others), and
2) in can_migrate_task() to determine if another CPU's task is cache-hot.
For purpose #2, the timestamp needs only "jiffies" accuracy, although it
does require that the timestamp be synchronized across all CPUs in the system.
For purpose #1, the timestamp should preferably have finer-than-"jiffies"
resolution, although it need not be synchronized across the CPUs.
The 2.6.0 sched_clock() is broken for "drifty timebase" ia64 platforms like
the SGI NUMA because it is implemented by reading the local CPU's ITC (the
processor's cycle counter), which isn't synchronized among the CPUs. This
breaks can_migrate_task().
i386-NUMA deals with an unsynchronized TSC by implementing sched_clock() using
"jiffies" as the synchronized timebase. This solves purpose #2 above, but
loses the accuracy that purpose #1 would like. The Alpha implementation also
uses "jiffies."
I proposed a patch to implement a platform-specific ia64 sched_clock() for use
by SGI's SN platform, and David Mosberger countered with a desire for a more
general solution. He suggests something along the lines of separating the
current use of sched_clock() into potentially two different timestamps for
"drifty timebase" platforms:
1) Continuing to use a high-resolution timestamp, using a local timebase, for
purpose #1.
2) For drifty platforms, separately using a low-resolution "jiffies"-based
timestamp, for purpose #2. Architectures and platforms without drift would
continue to use the current sched_clock() for both #1 and #2. Drifty
platforms would (by use of macros) implement separate timestamps.
This new scheme would allow i386-NUMA to use a high-resolution timebase for
purpose #1. Do the i386-NUMA folks have any interest in this?
John Hawkes