From: Glauber de O. C. <gc...@re...> - 2007-10-29 22:10:13
|
From: Glauber de Oliveira Costa <glauber@t60.localdomain> tsc is very good time source (when it does not have drifts, does not change it's frequency, i.e. when it works), so it should have its rating raised to a value greater than, or equal 400. Since it's being a tendency among paravirt clocksources to use values around 400, we should declare tsc as even better: So we use 500. This patch also touches the comments on clocksource.h, which suggests that 499 would be a limit on the rating values. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- arch/x86/kernel/tsc_32.c | 2 +- arch/x86/kernel/tsc_64.c | 2 +- include/linux/clocksource.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c index 9ebc0da..4d91e59 100644 --- a/arch/x86/kernel/tsc_32.c +++ b/arch/x86/kernel/tsc_32.c @@ -280,7 +280,7 @@ static cycle_t read_tsc(void) static struct clocksource clocksource_tsc = { .name = "tsc", - .rating = 300, + .rating = 500, .read = read_tsc, .mask = CLOCKSOURCE_MASK(64), .mult = 0, /* to be set */ diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c index 9c70af4..4fd5b1b 100644 --- a/arch/x86/kernel/tsc_64.c +++ b/arch/x86/kernel/tsc_64.c @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void) static struct clocksource clocksource_tsc = { .name = "tsc", - .rating = 300, + .rating = 500, .read = read_tsc, .mask = CLOCKSOURCE_MASK(64), .shift = 22, diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 107787a..5b0aadd 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -39,7 +39,7 @@ struct clocksource; * A correct and usable clocksource. * 300-399: Desired. * A reasonably fast and accurate clocksource. - * 400-499: Perfect + * >= 400 : Perfect * The ideal clocksource. A must-use where * available. * @read: returns a cycle value -- 1.5.0.6 |
From: Thomas G. <tg...@li...> - 2007-10-29 22:18:05
|
On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote: CC'ed John and removed glauber@t60.localdomain :) > From: Glauber de Oliveira Costa <glauber@t60.localdomain> > > tsc is very good time source (when it does not have drifts, does not > change it's frequency, i.e. when it works), so it should have its rating > raised to a value greater than, or equal 400. > > Since it's being a tendency among paravirt clocksources to use values > around 400, we should declare tsc as even better: So we use 500. > > This patch also touches the comments on clocksource.h, which suggests > that 499 would be a limit on the rating values. > > Signed-off-by: Glauber de Oliveira Costa <gc...@re...> Acked-by: Thomas Gleixner <tg...@li...> > --- > arch/x86/kernel/tsc_32.c | 2 +- > arch/x86/kernel/tsc_64.c | 2 +- > include/linux/clocksource.h | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c > index 9ebc0da..4d91e59 100644 > --- a/arch/x86/kernel/tsc_32.c > +++ b/arch/x86/kernel/tsc_32.c > @@ -280,7 +280,7 @@ static cycle_t read_tsc(void) > > static struct clocksource clocksource_tsc = { > .name = "tsc", > - .rating = 300, > + .rating = 500, > .read = read_tsc, > .mask = CLOCKSOURCE_MASK(64), > .mult = 0, /* to be set */ > diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c > index 9c70af4..4fd5b1b 100644 > --- a/arch/x86/kernel/tsc_64.c > +++ b/arch/x86/kernel/tsc_64.c > @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void) > > static struct clocksource clocksource_tsc = { > .name = "tsc", > - .rating = 300, > + .rating = 500, > .read = read_tsc, > .mask = CLOCKSOURCE_MASK(64), > .shift = 22, > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 107787a..5b0aadd 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -39,7 +39,7 @@ struct clocksource; > * A correct and usable clocksource. > * 300-399: Desired. > * A reasonably fast and accurate clocksource. > - * 400-499: Perfect > + * >= 400 : Perfect > * The ideal clocksource. A must-use where > * available. > * @read: returns a cycle value > -- > 1.5.0.6 > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to maj...@vg... > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > |
From: Ingo M. <mi...@el...> - 2007-10-30 01:04:46
|
* Thomas Gleixner <tg...@li...> wrote: > > Since it's being a tendency among paravirt clocksources to use > > values around 400, we should declare tsc as even better: So we use > > 500. > > > > This patch also touches the comments on clocksource.h, which > > suggests that 499 would be a limit on the rating values. > > > > Signed-off-by: Glauber de Oliveira Costa <gc...@re...> > > Acked-by: Thomas Gleixner <tg...@li...> Acked-by: Ingo Molnar <mi...@el...> Ingo |
From: john s. <joh...@us...> - 2007-10-30 01:25:37
|
On Mon, 2007-10-29 at 23:17 +0100, Thomas Gleixner wrote: > On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote: > > CC'ed John and removed glauber@t60.localdomain :) > > > From: Glauber de Oliveira Costa <glauber@t60.localdomain> > > > > tsc is very good time source (when it does not have drifts, does not > > change it's frequency, i.e. when it works), so it should have its rating > > raised to a value greater than, or equal 400. But all of those qualities (and more) make it less then ideal. What issue exactly does this patch resolve? > > Since it's being a tendency among paravirt clocksources to use values > > around 400, we should declare tsc as even better: So we use 500. This is the rating inflation I was initially worried about and created the advisory scale to avoid. If paravirt clocksources are rated too high, they should be dropped. I strongly disagree that the TSC is a "perfect" clocksource, and I think its rating is appropriate. > > This patch also touches the comments on clocksource.h, which suggests > > that 499 would be a limit on the rating values. > > > > Signed-off-by: Glauber de Oliveira Costa <gc...@re...> > > Acked-by: Thomas Gleixner <tg...@li...> If a better justification can be provided, I might reconsider, but right now I can't ack this. thanks -john > > --- > > arch/x86/kernel/tsc_32.c | 2 +- > > arch/x86/kernel/tsc_64.c | 2 +- > > include/linux/clocksource.h | 2 +- > > 3 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c > > index 9ebc0da..4d91e59 100644 > > --- a/arch/x86/kernel/tsc_32.c > > +++ b/arch/x86/kernel/tsc_32.c > > @@ -280,7 +280,7 @@ static cycle_t read_tsc(void) > > > > static struct clocksource clocksource_tsc = { > > .name = "tsc", > > - .rating = 300, > > + .rating = 500, > > .read = read_tsc, > > .mask = CLOCKSOURCE_MASK(64), > > .mult = 0, /* to be set */ > > diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c > > index 9c70af4..4fd5b1b 100644 > > --- a/arch/x86/kernel/tsc_64.c > > +++ b/arch/x86/kernel/tsc_64.c > > @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void) > > > > static struct clocksource clocksource_tsc = { > > .name = "tsc", > > - .rating = 300, > > + .rating = 500, > > .read = read_tsc, > > .mask = CLOCKSOURCE_MASK(64), > > .shift = 22, > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > > index 107787a..5b0aadd 100644 > > --- a/include/linux/clocksource.h > > +++ b/include/linux/clocksource.h > > @@ -39,7 +39,7 @@ struct clocksource; > > * A correct and usable clocksource. > > * 300-399: Desired. > > * A reasonably fast and accurate clocksource. > > - * 400-499: Perfect > > + * >= 400 : Perfect > > * The ideal clocksource. A must-use where > > * available. > > * @read: returns a cycle value > > -- > > 1.5.0.6 > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to maj...@vg... > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > |
From: Zachary A. <za...@vm...> - 2007-10-29 22:41:22
|
On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote: > From: Glauber de Oliveira Costa <glauber@t60.localdomain> > > tsc is very good time source (when it does not have drifts, does not > change it's frequency, i.e. when it works), so it should have its rating > raised to a value greater than, or equal 400. > > Since it's being a tendency among paravirt clocksources to use values > around 400, we should declare tsc as even better: So we use 500. Why is the TSC better than a paravirt clocksource? In our case this is definitely inaccurate. Paravirt clocksources should be preferred to TSC, and both must be made available in hardware for platforms which do not support paravirt. Also, please cc all the paravirt developers on things related to paravirt, especially things with such broad effect. I think 400 is a good value for a perfect native clocksource. >400 should be reserved for super-real (i.e. paravirt) sources that should always be chosen over a hardware realistic implementation in a virtual environment. Zach |
From: Jeremy F. <je...@go...> - 2007-10-29 22:45:10
|
Zachary Amsden wrote: > On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote: > >> From: Glauber de Oliveira Costa <glauber@t60.localdomain> >> >> tsc is very good time source (when it does not have drifts, does not >> change it's frequency, i.e. when it works), so it should have its rating >> raised to a value greater than, or equal 400. >> >> Since it's being a tendency among paravirt clocksources to use values >> around 400, we should declare tsc as even better: So we use 500. >> > > Why is the TSC better than a paravirt clocksource? In our case this is > definitely inaccurate. Paravirt clocksources should be preferred to > TSC, and both must be made available in hardware for platforms which do > not support paravirt. > > Also, please cc all the paravirt developers on things related to > paravirt, especially things with such broad effect. I think 400 is a > good value for a perfect native clocksource. >400 should be reserved > for super-real (i.e. paravirt) sources that should always be chosen over > a hardware realistic implementation in a virtual environment. > Yes, agreed. The tsc is never the right thing to use if there's a paravirt clocksource available. What's wrong with rating it 300? What inferior clocksource does it lose out to? Shouldn't that clocksource be lowered? (Why don't we just use 1 to 10?) J |
From: Ingo M. <mi...@el...> - 2007-10-30 00:43:30
|
* Zachary Amsden <za...@vm...> wrote: > On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote: > > From: Glauber de Oliveira Costa <glauber@t60.localdomain> > > > > tsc is very good time source (when it does not have drifts, does not > > change it's frequency, i.e. when it works), so it should have its rating > > raised to a value greater than, or equal 400. > > > > Since it's being a tendency among paravirt clocksources to use values > > around 400, we should declare tsc as even better: So we use 500. > > Why is the TSC better than a paravirt clocksource? In our case this > is definitely inaccurate. Paravirt clocksources should be preferred > to TSC, and both must be made available in hardware for platforms > which do not support paravirt. if it's inaccurate why are you exposing it to the guest then? Native only uses the TSC if it's safe and accurate to do so. Ingo |
From: Jeremy F. <je...@go...> - 2007-10-29 22:52:33
|
Ingo Molnar wrote: > * Zachary Amsden <za...@vm...> wrote: > > >> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote: >> >>> From: Glauber de Oliveira Costa <glauber@t60.localdomain> >>> >>> tsc is very good time source (when it does not have drifts, does not >>> change it's frequency, i.e. when it works), so it should have its rating >>> raised to a value greater than, or equal 400. >>> >>> Since it's being a tendency among paravirt clocksources to use values >>> around 400, we should declare tsc as even better: So we use 500. >>> >> Why is the TSC better than a paravirt clocksource? In our case this >> is definitely inaccurate. Paravirt clocksources should be preferred >> to TSC, and both must be made available in hardware for platforms >> which do not support paravirt. >> > > if it's inaccurate why are you exposing it to the guest then? Native > only uses the TSC if it's safe and accurate to do so. > It is used as part of the Xen clocksource as a short term extrapolator, with correction parameters supplied by the hypervisor. It should never be used directly. J |
From: Ingo M. <mi...@el...> - 2007-10-29 23:45:01
|
* Jeremy Fitzhardinge <je...@go...> wrote: > > if it's inaccurate why are you exposing it to the guest then? Native > > only uses the TSC if it's safe and accurate to do so. > > It is used as part of the Xen clocksource as a short term > extrapolator, with correction parameters supplied by the hypervisor. > It should never be used directly. that's totally broken then. You cannot create an SMP-safe monotonic clocksource via interpolation - native does not do it either. Good thing this problem got exposed, it needs to be fixed. Ingo |
From: Zachary A. <za...@vm...> - 2007-10-29 22:54:10
|
On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote: > * Zachary Amsden <za...@vm...> wrote: > if it's inaccurate why are you exposing it to the guest then? Native > only uses the TSC if it's safe and accurate to do so. Not every guest support paravirt, but for correctness, all guests require TSC, which must be exposed all the way up to userspace, no matter what the efficiency or accuracy may be. Zach |
From: Ingo M. <mi...@el...> - 2007-10-30 00:29:23
|
* Zachary Amsden <za...@vm...> wrote: > On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote: > > * Zachary Amsden <za...@vm...> wrote: > > > if it's inaccurate why are you exposing it to the guest then? Native > > only uses the TSC if it's safe and accurate to do so. > > Not every guest support paravirt, but for correctness, all guests > require TSC, which must be exposed all the way up to userspace, no > matter what the efficiency or accuracy may be. but if there's a perfect TSC available (there is such hardware) then the TSC _is_ the best clocksource. Paravirt now turns it off unconditionally in essence. anyway, that's at most an optimization issue. No strong feelings here, and we can certainly delay this patch until this gets all sorted out. Ingo |
From: Glauber de O. C. <gc...@re...> - 2007-10-30 11:58:45
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Zachary Amsden escreveu: > On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote: >> * Zachary Amsden <za...@vm...> wrote: > >> if it's inaccurate why are you exposing it to the guest then? Native >> only uses the TSC if it's safe and accurate to do so. > > Not every guest support paravirt, but for correctness, all guests > require TSC, which must be exposed all the way up to userspace, no > matter what the efficiency or accuracy may be. > > Zach > However, with such accuracy, it will fail the tsc clocksources tests. If it passes, it'a good clocksource to use. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Remi - http://enigmail.mozdev.org iD8DBQFHJxyujYI8LaFUWXMRAhI5AKCDl/O7iA3TdtpvElVg8NoSDVfKpgCeNvz3 ZEZDMxg8T7FA2R+5cgH/uKI= =8qSH -----END PGP SIGNATURE----- |
From: Zachary A. <za...@vm...> - 2007-10-29 23:13:08
|
On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote: > * Zachary Amsden <za...@vm...> wrote: > > Not every guest support paravirt, but for correctness, all guests > > require TSC, which must be exposed all the way up to userspace, no > > matter what the efficiency or accuracy may be. > > but if there's a perfect TSC available (there is such hardware) then the > TSC _is_ the best clocksource. Paravirt now turns it off unconditionally > in essence. No, if no paravirt clocksource is detected, nothing can override the perfect TSC hardware clocksource rating of 400. And if a paravirt clocksource is detected, it is always better than TSC. > anyway, that's at most an optimization issue. No strong feelings here, > and we can certainly delay this patch until this gets all sorted out. This patch should be nacked, since it is just wrong. This is not an optimization issue. It is an accuracy issue for all virtualization environments that affects long term kernel clock stability, which is important to fix, and the best way to do that is to use a paravirt clocksource. Zach |
From: Ingo M. <mi...@el...> - 2007-10-29 23:17:12
|
* Zachary Amsden <za...@vm...> wrote: > On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote: > > * Zachary Amsden <za...@vm...> wrote: > > > > Not every guest support paravirt, but for correctness, all guests > > > require TSC, which must be exposed all the way up to userspace, no > > > matter what the efficiency or accuracy may be. > > > > but if there's a perfect TSC available (there is such hardware) then the > > TSC _is_ the best clocksource. Paravirt now turns it off unconditionally > > in essence. > > No, if no paravirt clocksource is detected, nothing can override the > perfect TSC hardware clocksource rating of 400. And if a paravirt > clocksource is detected, it is always better than TSC. > > > anyway, that's at most an optimization issue. No strong feelings here, > > and we can certainly delay this patch until this gets all sorted out. > > This patch should be nacked, since it is just wrong. This is not an > optimization issue. It is an accuracy issue for all virtualization > environments that affects long term kernel clock stability, which is > important to fix, and the best way to do that is to use a paravirt > clocksource. i know it's not an optimization issue. Your current pessimisation of even perfect TSCs _is_ an optimization issue. (and note that if the TSC is unstable the guest will/should notice it and will fall back to the hyper clocksource) Ingo |
From: Jeremy F. <je...@go...> - 2007-10-29 23:17:32
|
Ingo Molnar wrote: > that's totally broken then. You cannot create an SMP-safe monotonic > clocksource via interpolation - native does not do it either. Good thing > this problem got exposed, it needs to be fixed. > Sigh, I don't really want to have this fight again. I don't really see what point there is in raising the tsc's rating (why is 300 insufficient again?), but regardless of the value, the Xen clocksource rating needs to be higher. J |
From: Ingo M. <mi...@el...> - 2007-10-29 23:22:04
|
* Jeremy Fitzhardinge <je...@go...> wrote: > Ingo Molnar wrote: > > that's totally broken then. You cannot create an SMP-safe monotonic > > clocksource via interpolation - native does not do it either. Good thing > > this problem got exposed, it needs to be fixed. > > > > Sigh, I don't really want to have this fight again. i dont remember us having discussed this before, ever. If there's any "fight" about monotonicity and SMP then it would be a pretty onesided affair, with you being beaten up seriously ;-) > I don't really see what point there is in raising the tsc's rating > (why is 300 insufficient again?), but regardless of the value, the Xen > clocksource rating needs to be higher. anyway, i agree that this patch cannot go in in its current form. Ingo |
From: Jeremy F. <je...@go...> - 2007-10-29 23:33:38
|
Ingo Molnar wrote: > i dont remember us having discussed this before, ever. If there's any > "fight" about monotonicity and SMP then it would be a pretty onesided > affair, with you being beaten up seriously ;-) > This is part of Xen's ABI, so it isn't easily changed. You're right that getting monotonicity is a pretty ugly affair of doing something like "if (now < previous_return) return ++previous_return;", but its better than using the tsc. (Last time around, it was "Xen can't use the tsc because it can't ever be used as a stable timebase" - though I don't think that was you specifically.) J |
From: Dan H. <dh...@vm...> - 2007-10-29 23:22:24
|
On 10/29/2007 04:02 PM, Ingo Molnar wrote: > * Zachary Amsden <za...@vm...> wrote: > >> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote: >>> * Zachary Amsden <za...@vm...> wrote: >>> if it's inaccurate why are you exposing it to the guest then? Native >>> only uses the TSC if it's safe and accurate to do so. >> Not every guest support paravirt, but for correctness, all guests >> require TSC, which must be exposed all the way up to userspace, no >> matter what the efficiency or accuracy may be. > > but if there's a perfect TSC available (there is such hardware) then the > TSC _is_ the best clocksource. Paravirt now turns it off unconditionally > in essence. > Not really. In the case hardware TSC is perfect, the paravirt time counter can be implemented directly in terms of hardware TSC; there is no loss in optimization. This is done transparently. And virtual TSC can be implemented this way too. The real improvement that a paravirt clocksource offers over the TSC clocksource is that the guest does not need to measure the TSC frequency itself against some other constant frequency source (which is problematic on a virtual machine). Instead, the paravirt clocksource queries the hypervisor for the frequency of the counter. As you know, with clocksource style kernels, it's important to get this frequency correct, or else the guest will have long-term time drift. |
From: Avi K. <av...@qu...> - 2007-10-30 04:26:24
|
Dan Hecht wrote: > Not really. In the case hardware TSC is perfect, the paravirt time > counter can be implemented directly in terms of hardware TSC; there is > no loss in optimization. This is done transparently. And virtual TSC > can be implemented this way too. > > The real improvement that a paravirt clocksource offers over the TSC > clocksource is that the guest does not need to measure the TSC frequency > itself against some other constant frequency source (which is > problematic on a virtual machine). Instead, the paravirt clocksource > queries the hypervisor for the frequency of the counter. As you know, > with clocksource style kernels, it's important to get this frequency > correct, or else the guest will have long-term time drift. > > In addition, a paravirt clocksource can compensate for events like vcpu migration to another host cpu. So I agree: a paravirt clocksource is always better than or equal to the tsc. -- Any sufficiently difficult bug is indistinguishable from a feature. |
From: Ingo M. <mi...@el...> - 2007-10-30 07:14:53
|
* Dan Hecht <dh...@vm...> wrote: >> but if there's a perfect TSC available (there is such hardware) then >> the TSC _is_ the best clocksource. Paravirt now turns it off >> unconditionally in essence. > > Not really. In the case hardware TSC is perfect, the paravirt time > counter can be implemented directly in terms of hardware TSC; there is > no loss in optimization. This is done transparently. And virtual TSC > can be implemented this way too. Of course if you duplicate all (or part) of the TSC clocksource driver in the paravirt guest code then the "paravirt clocksource" is at least as good as the TSC. But that argument is playing word-games, _of course_ if you use the same (or similar) code it's at least as good. The real question are clocksources that communicate out to the hypervisor, and hence have higher overhead than a native, TSC based clocksource - and clocksources that use the TSC in a broken way. > The real improvement that a paravirt clocksource offers over the TSC > clocksource is that the guest does not need to measure the TSC > frequency itself against some other constant frequency source (which > is problematic on a virtual machine). [...] hey, you need not tell me, i've implemented a hyper-clocksource driver myself. But calibration is a boot only issue and there's no reason why calibration _has_ to be fragile. For example we could easily extend the TSC clocksource driver to not calibrate in the guest but take calibration information from the host. It's in essence a trivial and obvious extension to calibration. That way we get the highest possible performance _and_ we share much of the clocksource driver with the host. also, the way the TSC is used by guests like Xen is fundamentally fragile on SMP. So i have a good reason to distrust the approach of hypervisors to timekeeping. The maintenance problem to me is that everyone in the paravirt space is busy coding away in their own (often broken) direction, replicating the essence of the TSC clocksource driver 4 times over again and again, with subtle bugs in each variant, even in cases where the TSC readout can be trusted perfectly well. "Consolidation" and "sharing code" is not a particularly strong point of the paravirt projects ;-) (ok, KVM is a notable exception there.) anyway, i do agree that this patch is wrong currently, mainly due to TSC calibration not being reliable in guest-space at the moment - but the whole concept of putting a separate clocksource driver into each paravirt guest, even in the case where the TSC is perfect, is madness. That code, once the hardware gets sane (and there are good signs for that), and once calibration can be passed from host to guest reliably, _will_ be consolidated, because it makes perfect technical sense. Ingo |
From: H. P. A. <hp...@zy...> - 2007-10-30 00:18:20
|
Glauber de Oliveira Costa wrote: > From: Glauber de Oliveira Costa <glauber@t60.localdomain> > > tsc is very good time source (when it does not have drifts, does not > change it's frequency, i.e. when it works), so it should have its rating > raised to a value greater than, or equal 400. > > Since it's being a tendency among paravirt clocksources to use values > around 400, we should declare tsc as even better: So we use 500. > > This patch also touches the comments on clocksource.h, which suggests > that 499 would be a limit on the rating values. > > Signed-off-by: Glauber de Oliveira Costa <gc...@re...> Are you sure these paravirt sources don't intentionally trump the TSC? -hpa |
From: Ian P. <Ian...@cl...> - 2007-10-30 00:58:36
|
> > Sigh, I don't really want to have this fight again. >=20 > i dont remember us having discussed this before, ever. If there's any > "fight" about monotonicity and SMP then it would be a pretty onesided > affair, with you being beaten up seriously ;-) Actually, it is possible, even for NUMA systems with CPUs running off completely different oscillators, and in the presence of CPU frequency changes, power management, and even in the presence of thermal throttling (though the latter introduces temporary inaccuracies it doesn't affect monotonicity or rate). Take a look at the Xen code to see how each physical CPU is independently calibrated on an ongoing basis, how movement of VCPUs between physical CPUs is tracked, and how shared variables are used to ensure montonicity if a guest requires it.=20 The fixed-rate TSCs on newer CPUs make some of this stuff easier, but you still need to cope with different source oscillators and some power management states. =20 Ian > > I don't really see what point there is in raising the tsc's rating > > (why is 300 insufficient again?), but regardless of the value, the Xen > > clocksource rating needs to be higher. >=20 > anyway, i agree that this patch cannot go in in its current form. >=20 > Ingo > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" > in > the body of a message to maj...@vg... > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ |
From: Ingo M. <mi...@el...> - 2007-10-30 07:19:39
|
* Ian Pratt <Ian...@cl...> wrote: > > > Sigh, I don't really want to have this fight again. > > > > i dont remember us having discussed this before, ever. If there's any > > "fight" about monotonicity and SMP then it would be a pretty onesided > > affair, with you being beaten up seriously ;-) > > Actually, it is possible, even for NUMA systems with CPUs running off > completely different oscillators, and in the presence of CPU frequency > changes, power management, and even in the presence of thermal > throttling (though the latter introduces temporary inaccuracies it > doesn't affect monotonicity or rate). > > Take a look at the Xen code to see how each physical CPU is > independently calibrated on an ongoing basis, how movement of VCPUs > between physical CPUs is tracked, and how shared variables are used to > ensure montonicity if a guest requires it. I think that's wishful thinking. Check out: http://people.redhat.com/mingo/time-warp-test/time-warp-test.c change TEST_TSC to 0, run it on an SMP guest (on a reasonably fast machine) and let me know whether you can make SMP guests not come up with monotonicity violations in the CLOCK tests. Ingo |
From: Rusty R. <ru...@ru...> - 2007-10-30 02:39:12
|
On Tuesday 30 October 2007 09:17:38 Thomas Gleixner wrote: > On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote: > > CC'ed John and removed glauber@t60.localdomain :) > > > From: Glauber de Oliveira Costa <glauber@t60.localdomain> > > > > tsc is very good time source (when it does not have drifts, does not > > change it's frequency, i.e. when it works), so it should have its rating > > raised to a value greater than, or equal 400. > > > > Since it's being a tendency among paravirt clocksources to use values > > around 400, we should declare tsc as even better: So we use 500. > > > > This patch also touches the comments on clocksource.h, which suggests > > that 499 would be a limit on the rating values. > > > > Signed-off-by: Glauber de Oliveira Costa <gc...@re...> > > Acked-by: Thomas Gleixner <tg...@li...> No. tsc is very good, it's not perfect. If a paravirt clock registers 400 it really means "pick me over the tsc". That's *why* they use > 400: it's in the documentation. Rusty. |
From: Ingo M. <mi...@el...> - 2007-10-30 07:37:48
|
* Rusty Russell <ru...@ru...> wrote: > No. tsc is very good, it's not perfect. If a paravirt clock > registers 400 it really means "pick me over the tsc". often the TSC is not perfect, but _IF_ it's perfect, using the paravirt driver is a pessimisation in performance. the main problem at the moment is that there's no mechanism at the moment to convey to the guest the information that the TSC is "perfect", and to convey the calibration values. > That's *why* they use > 400: it's in the documentation. static values do not capture conditional quality like that of the TSC. and just in case it's not obvious: i am not arguing for the inclusion of the patch, i'm just pointing out the plain fact that in the case where the TSC _is_ reliable, 5 different clocksource drivers for has obvious disadvantages. Anyone arguing against that simple point needs his head examined :) Once we can pass around calibration information from the host to the guest (which we dont do at the moment) there will be reason not to use the native clocksource driver in the guest. in the long run, the paravirt clocksource drivers must become _fallback_ drivers, for the case when the TSC is not perfect. Instead of the current "lets try to make it reliable but also nearly as fast as the TSC", which leads to inevitable breakage on SMP. Ingo |