From: Jan B. <JBe...@no...> - 2009-10-13 07:34:48
|
>>> Thomas Schlichter <tho...@we...> 12.10.09 20:32 >>> >@@ -268,11 +269,14 @@ EXPORT_SYMBOL(ioremap_nocache); > */ > void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size) > { >- if (pat_enabled) >- return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC, >- __builtin_return_address(0)); >- else >- return ioremap_nocache(phys_addr, size); >+ if (!pat_enabled) { >+ void __iomem *ret = ioremap_nocache(phys_addr, size); >+ if (ret) >+ mtrr_add(phys_addr, size, MTRR_TYPE_WRCOMB, false); This won't work if phys_addr or size aren't page aligned, or if size isn't a power of two. >+ return ret; >+ } >+ return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC, >+ __builtin_return_address(0)); > } > EXPORT_SYMBOL(ioremap_wc); > >@@ -1010,8 +1011,13 @@ int set_memory_wc(unsigned long addr, int numpages) > { > int ret; > >- if (!pat_enabled) >- return set_memory_uc(addr, numpages); >+ if (!pat_enabled) { >+ ret = set_memory_uc(addr, numpages); >+ if (!ret) >+ mtrr_add(__pa(addr), numpages * PAGE_SIZE, >+ MTRR_TYPE_WRCOMB, false); Similarly, this requires numpages to be a power of two. >+ return ret; >+ } > > ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE, > _PAGE_CACHE_WC, NULL); I think user mode code handles this by splitting the request and trying to establish as many entries as possible (and those as big as possible). Also, in all cases you pass 'false' for the 'increment' parameter, in order to avoid having to tear down the established entries. While this may be reasonable for kernel initiated mappings, I don't think that's acceptable for such originating from user mode. Jan |
From: Thomas S. <tho...@we...> - 2009-10-19 15:00:14
|
Suresh Siddha wrote: > If we are going to make ioremap() and set_memory_wc() add mtrr's in > non-pat case, then we need to delete the added mtrr(s) in the > corresponding iounmap() and set_memory_wb() aswell. > > hmm, this is becoming too complex. The way i915 and other graphics > drivers are using set_memory_wc(), it is def a bad idea to start adding > mtrr's behind the back for non-pat case. Yes, maybe it's better to drop it for ioremap() and set_memory_wc(). But I'd like to keep it for mmapping the PCI region. It should help all the people with PAT-incapable CPUs and graphics chips without DRM support (for them there simply is no driver that should set up the MTRR entries...). > Can't we just force PAT option always and we probably don't care about > ioremap_wc() on processors were PAT doesn't get enabled because of known > errata. I don't think this is a good idea, Robert Hancock wrote there may be millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum Y31) : http://marc.info/?l=linux-kernel&m=125537136105246&w=2 > or Perhaps just try to add mtrr only for the pci mmap case like the 4th > patch in this series.. I'd prefer this! ;-) Kind regards, Thomas |
From: Ingo M. <mi...@el...> - 2009-10-19 15:32:30
|
* Thomas Schlichter <tho...@we...> wrote: > I don't think this is a good idea, Robert Hancock wrote there may be > millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum > Y31) : > > http://marc.info/?l=linux-kernel&m=125537136105246&w=2 > > > or Perhaps just try to add mtrr only for the pci mmap case like the > > 4th patch in this series.. > > I'd prefer this! ;-) Hm, we could perhaps do that - but i think we should only do that on systems that have PAT disabled. Which brings up the question of how to properly QA such a narrow segment of the market. Maybe disabling CONFIG_X86_PAT should enable that logic too. Ingo |
From: Suresh S. <sur...@in...> - 2009-10-19 21:50:55
|
On Mon, 2009-10-19 at 08:31 -0700, Ingo Molnar wrote: > * Thomas Schlichter <tho...@we...> wrote: > > > I don't think this is a good idea, Robert Hancock wrote there may be > > millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum > > Y31) : > > > > http://marc.info/?l=linux-kernel&m=125537136105246&w=2 > > > > > or Perhaps just try to add mtrr only for the pci mmap case like the > > > 4th patch in this series.. > > > > I'd prefer this! ;-) > > Hm, we could perhaps do that - but i think we should only do that on > systems that have PAT disabled. > > Which brings up the question of how to properly QA such a narrow segment > of the market. Maybe disabling CONFIG_X86_PAT should enable that logic > too. Also, I think we can simplify and avoid mtrr_add_unaligned() and worry only about the pci mmap aligned cases etc. Those are the only interesting ones. thanks, suresh |
From: Thomas S. <tho...@we...> - 2009-10-20 20:35:35
|
Ingo Molnar wrote: > * Thomas Schlichter <tho...@we...> wrote: > > > or Perhaps just try to add mtrr only for the pci mmap case like the > > > 4th patch in this series.. > > > > I'd prefer this! ;-) > > Hm, we could perhaps do that - but i think we should only do that on > systems that have PAT disabled. The patch I sent should do exactly that. > Which brings up the question of how to properly QA such a narrow segment > of the market. Maybe disabling CONFIG_X86_PAT should enable that logic > too. When CONFIG_X86_PAT is disabled, pat_enabled is 0, and thus my new code should be active. Or do I miss something? What do you think about the latest version of my patch series I just sent? Kind regards, Thomas |
From: Suresh S. <sur...@in...> - 2009-10-20 22:00:19
|
On Tue, 2009-10-20 at 13:35 -0700, Thomas Schlichter wrote: > What do you think about the latest version of my patch series I just sent? I think we can simplify this by just using mtrr_add_page() and avoid all the complexity that comes with mtrr_add_unaligned(). pci_mmap_range() should call one mtrr_add_page() if the base and size are nicely aligned. Almost all the cases, this is the usage sequence here anyway. thanks, suresh |
From: Ingo M. <mi...@el...> - 2009-10-21 11:53:36
|
* Suresh Siddha <sur...@in...> wrote: > On Tue, 2009-10-20 at 13:35 -0700, Thomas Schlichter wrote: > > What do you think about the latest version of my patch series I just sent? > > I think we can simplify this by just using mtrr_add_page() and avoid > all the complexity that comes with mtrr_add_unaligned(). > > pci_mmap_range() should call one mtrr_add_page() if the base and size > are nicely aligned. Almost all the cases, this is the usage sequence > here anyway. yep, keeping it simple is a must. Ingo |
From: Thomas S. <tho...@we...> - 2009-10-19 15:07:52
|
Konrad Rzeszutek Wilk wrote: > I am not familiar with libpciaccess, but I was wondering why that library > cannot realize it failed in its endeavour and use other means to accomplish > its goals? The thing is that the mmap() will succeed! Only the memory region will not be set up as write combining when PAT is disabled. So userspace would have to find out it PAT is enabled in kernel and accordingly set up MTRR entries. And currently I don't know of a kernel interface that would tell userspace if PAT is enabled. Kind regards, Thomas |
From: Thomas S. <tho...@we...> - 2009-10-19 15:10:34
|
Ingo Molnar wrote: > * Suresh Siddha <sur...@in...> wrote: > > Can't we just force PAT option always and we probably don't care about > > ioremap_wc() on processors were PAT doesn't get enabled because of > > known errata. > > We can make PAT configurability dependent on EMBEDDED-y - mind sending a > patch for that? I think there already is such a patch in -tip: http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=c03cb3149daed3e411657e3212d05ae27cf1a874 Kind regards, Thomas |
From: Ingo M. <mi...@el...> - 2009-10-19 15:29:16
|
* Thomas Schlichter <tho...@we...> wrote: > > We can make PAT configurability dependent on EMBEDDED-y - mind > > sending a patch for that? > > I think there already is such a patch in -tip: > http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=c03cb3149daed3e411657e3212d05ae27cf1a874 Yes :-) Ingo |
From: Thomas S. <tho...@we...> - 2009-10-21 13:45:24
|
Suresh Siddha wrote: > I think we can simplify this by just using mtrr_add_page() and avoid > all the complexity that comes with mtrr_add_unaligned(). > > pci_mmap_range() should call one mtrr_add_page() if the base and size > are nicely aligned. Almost all the cases, this is the usage sequence > here anyway. Ingo Molnar wrote: > Can we eliminate mtrr_add_unaligned() as Suresh suggested, and still > make it work on your testbox? Yes, I had that in the first place, but Jan suggested to extend it to also handle non-aligned, non-power-of-two cases: http://marc.info/?l=linux-kernel&m=125541951529918&w=2 So if it's OK for Jan, I'll reduce the functionality again and use mtrr_add() instead. Btw. this simply means to drop mtrr_add_unaligned(), all the other parts are still required for reference counting and a proper mtrr_del() on file close. Kind regards, Thomas |
From: Jan B. <JBe...@no...> - 2009-10-21 14:10:35
|
>>> Thomas Schlichter <tho...@we...> 21.10.09 15:45 >>> >Ingo Molnar wrote: >> Can we eliminate mtrr_add_unaligned() as Suresh suggested, and still >> make it work on your testbox? > >Yes, I had that in the first place, but Jan suggested to extend it to also >handle non-aligned, non-power-of-two cases: > http://marc.info/?l=linux-kernel&m=125541951529918&w=2 I merely pointed out it wouldn't work for unaligned addresses or sizes passed in. >So if it's OK for Jan, I'll reduce the functionality again and use mtrr_add() >instead. Btw. this simply means to drop mtrr_add_unaligned(), all the other >parts are still required for reference counting and a proper mtrr_del() on file >close. I'm perfectly fine with just handling the aligned case, as long as the code checks that the alignment constraints are met. Jan |
From: Ingo M. <mi...@el...> - 2009-10-21 17:36:02
|
* Jan Beulich <JBe...@no...> wrote: > > So if it's OK for Jan, I'll reduce the functionality again and use > > mtrr_add() instead. Btw. this simply means to drop > > mtrr_add_unaligned(), all the other parts are still required for > > reference counting and a proper mtrr_del() on file close. > > I'm perfectly fine with just handling the aligned case, as long as the > code checks that the alignment constraints are met. It could even emit a debug warning if they are not met - that way we'll see how important the unaligned case is. Ingo |
From: Thomas S. <tho...@we...> - 2009-10-21 20:01:50
|
Ingo Molnar wrote: > * Jan Beulich <JBe...@no...> wrote: > > I'm perfectly fine with just handling the aligned case, as long as the > > code checks that the alignment constraints are met. > > It could even emit a debug warning if they are not met - that way we'll > see how important the unaligned case is. OK, so I think the attached patches should do it. Hopefully I have addressed all your comments. This series works for my test machine, without it, or without X running, I have these MTRR entries: reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back reg01: base=0x06ff00000 ( 1791MB), size= 1MB, count=1: uncachable reg02: base=0x070000000 ( 1792MB), size= 256MB, count=1: uncachable With the patches applied and X running I get these: reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back reg01: base=0x06ff00000 ( 1791MB), size= 1MB, count=1: uncachable reg02: base=0x070000000 ( 1792MB), size= 256MB, count=1: uncachable reg03: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: write-combining Best regards, Thomas |
From: Suresh S. <sur...@in...> - 2009-10-22 10:32:33
|
On Wed, 2009-10-21 at 13:01 -0700, Thomas Schlichter wrote: > OK, so I think the attached patches should do it. Hopefully I have addressed > all your comments. Thomas, I have couple of issues with this patchset still. pci_mmap_page_range() doesn't get called for each fork(). So, we won't be ref counting the mtrr usage properly. I need to think a bit more carefully on this. Can I get back to you early next week on this, as I am traveling and need to think through this? We already keep track of some of the PAT ref counting using track_pfn_vma_copy(). And we need to extend/use something similar here. Even if we need to extend sysfs or pci vma ops, we need to increment and decrement the ref count of the mtrr register that gets used. There is no need to go through num_var_ranges etc. thanks, suresh |
From: Jesse B. <jb...@vi...> - 2009-10-22 23:38:02
|
On Thu, 22 Oct 2009 14:47:30 -0700 Suresh Siddha <sur...@in...> wrote: > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote: > > Can we just not create the _wc sysfs entry if we don't have PAT? I > > don't think there's userland relying on its presence as opposed to > > the non-_wc entry. > > Yes indeed. Jesse do you see an issue with this? This is simple and > clean. Thanks Eric. Yeah, I think that will be fine. In fact, older versions of libpciaccess will behave better if we do it that way (iirc it only allocates an MTRR if the resource_wc file doesn't exist or fails to get mapped). Jesse |
From: Suresh S. <sur...@in...> - 2009-10-23 00:29:01
|
On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote: > On Thu, 22 Oct 2009 14:47:30 -0700 > Suresh Siddha <sur...@in...> wrote: > > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote: > > > Can we just not create the _wc sysfs entry if we don't have PAT? I > > > don't think there's userland relying on its presence as opposed to > > > the non-_wc entry. > > > > Yes indeed. Jesse do you see an issue with this? This is simple and > > clean. Thanks Eric. > > Yeah, I think that will be fine. In fact, older versions of > libpciaccess will behave better if we do it that way (iirc it only > allocates an MTRR if the resource_wc file doesn't exist or fails to get > mapped). Eric, care to send the patch? |
From: Eric A. <er...@an...> - 2009-10-23 02:04:07
|
On Thu, 2009-10-22 at 17:11 -0700, Suresh Siddha wrote: > On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote: > > On Thu, 22 Oct 2009 14:47:30 -0700 > > Suresh Siddha <sur...@in...> wrote: > > > > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote: > > > > Can we just not create the _wc sysfs entry if we don't have PAT? I > > > > don't think there's userland relying on its presence as opposed to > > > > the non-_wc entry. > > > > > > Yes indeed. Jesse do you see an issue with this? This is simple and > > > clean. Thanks Eric. > > > > Yeah, I think that will be fine. In fact, older versions of > > libpciaccess will behave better if we do it that way (iirc it only > > allocates an MTRR if the resource_wc file doesn't exist or fails to get > > mapped). > > Eric, care to send the patch? I don't have a patch, I was just suggesting a way to handle the submitter's problem that won't involve complicated changes that nobody else will be testing since everyone *should* have a graphics driver for their graphics hardware. -- Eric Anholt er...@an... eri...@in... |
From: Suresh S. <sur...@in...> - 2009-10-23 04:34:12
|
On Thu, 2009-10-22 at 18:53 -0700, Eric Anholt wrote: > On Thu, 2009-10-22 at 17:11 -0700, Suresh Siddha wrote: > > On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote: > > > On Thu, 22 Oct 2009 14:47:30 -0700 > > > Suresh Siddha <sur...@in...> wrote: > > > > > > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote: > > > > > Can we just not create the _wc sysfs entry if we don't have PAT? I > > > > > don't think there's userland relying on its presence as opposed to > > > > > the non-_wc entry. > > > > > > > > Yes indeed. Jesse do you see an issue with this? This is simple and > > > > clean. Thanks Eric. > > > > > > Yeah, I think that will be fine. In fact, older versions of > > > libpciaccess will behave better if we do it that way (iirc it only > > > allocates an MTRR if the resource_wc file doesn't exist or fails to get > > > mapped). > > > > Eric, care to send the patch? > > I don't have a patch, I was just suggesting a way to handle the > submitter's problem that won't involve complicated changes that nobody > else will be testing since everyone *should* have a graphics driver for > their graphics hardware. I can send the patch early next week unless Thomas or someone else beats me. |
From: Suresh S. <sur...@in...> - 2009-10-22 21:48:12
|
On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote: > Can we just not create the _wc sysfs entry if we don't have PAT? I > don't think there's userland relying on its presence as opposed to the > non-_wc entry. Yes indeed. Jesse do you see an issue with this? This is simple and clean. Thanks Eric. |
From: Jesse B. <jb...@vi...> - 2009-10-23 04:58:51
|
On Thu, 22 Oct 2009 18:53:19 -0700 Eric Anholt <er...@an...> wrote: > On Thu, 2009-10-22 at 17:11 -0700, Suresh Siddha wrote: > > On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote: > > > On Thu, 22 Oct 2009 14:47:30 -0700 > > > Suresh Siddha <sur...@in...> wrote: > > > > > > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote: > > > > > Can we just not create the _wc sysfs entry if we don't have > > > > > PAT? I don't think there's userland relying on its presence > > > > > as opposed to the non-_wc entry. > > > > > > > > Yes indeed. Jesse do you see an issue with this? This is simple > > > > and clean. Thanks Eric. > > > > > > Yeah, I think that will be fine. In fact, older versions of > > > libpciaccess will behave better if we do it that way (iirc it only > > > allocates an MTRR if the resource_wc file doesn't exist or fails > > > to get mapped). > > > > Eric, care to send the patch? > > I don't have a patch, I was just suggesting a way to handle the > submitter's problem that won't involve complicated changes that nobody > else will be testing since everyone *should* have a graphics driver > for their graphics hardware. Here's a quick & dirty version, totally untested. A cleaner approach would be to separate the WC mapping routines and hide the return -EINVAL in arch specific code... Jesse diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 0f6382f..41010bb 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -23,6 +23,9 @@ #include <linux/mm.h> #include <linux/capability.h> #include <linux/pci-aspm.h> +#ifdef CONFIG_X86 +#include <asm/pat.h> +#endif #include "pci.h" static int sysfs_initialized; /* = 0 */ @@ -730,6 +733,10 @@ static int pci_mmap_resource_wc(struct kobject *kobj, struct bin_attribute *attr, struct vm_area_struct *vma) { +#ifdef CONFIG_X86 + if (!pat_enabled) + return -EINVAL; +#endif return pci_mmap_resource(kobj, attr, vma, 1); } |
From: Suresh S. <sur...@in...> - 2009-10-23 04:59:03
|
On Thu, 2009-10-22 at 21:31 -0700, Jesse Barnes wrote: > Here's a quick & dirty version, totally untested. A cleaner approach > would be to separate the WC mapping routines and hide the return > -EINVAL in arch specific code... Jesse How about this patch? Doing this in x86 is cleaner. I would like Acks/sign-offs-by Thomas, Eric and Jesse, if it is ok with this patch and works. thanks, suresh --- From: Suresh Siddha <sur...@in...> Subject: x86, pat: return EINVAL for pci mmap WC request for !pat_enabled Thomas Schlichter reported: > X.org uses libpciaccess which tries to mmap with write combining enabled via > /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, the > kernel does fall back to uncached mmap. Then libpciaccess thinks it succeeded > mapping with write combining enabled and does not set up suited MTRR entries. > ;-( Instead of silently mapping pci mmap region as UC minus in the case of !pat_enabled and wc request, we can return error. Eric Anholt mentioned that caller (like X) typically follows up with UC minus pci mmap request and if there is a free mtrr slot, caller will manage adding WC mtrr. Jesse Barnes says: > Older versions of libpciaccess will behave better if we do it that way > (iirc it only allocates an MTRR if the resource_wc file doesn't exist or > fails to get mapped). Reported-by: Thomas Schlichter <tho...@we...> Signed-off-by: Suresh Siddha <sur...@in...> --- diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c index b22d13b..a672f12 100644 --- a/arch/x86/pci/i386.c +++ b/arch/x86/pci/i386.c @@ -282,6 +282,15 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, return -EINVAL; prot = pgprot_val(vma->vm_page_prot); + + /* + * Return error if pat is not enabled and write_combine is requested. + * Caller can followup with UC MINUS request and add a WC mtrr if there + * is a free mtrr slot. + */ + if (!pat_enabled && write_combine) + return -EINVAL; + if (pat_enabled && write_combine) prot |= _PAGE_CACHE_WC; else if (pat_enabled || boot_cpu_data.x86 > 3) |
From: Thomas S. <tho...@we...> - 2009-10-23 07:25:09
|
Suresh Siddha wrote: > On Thu, 2009-10-22 at 21:31 -0700, Jesse Barnes wrote: > > Here's a quick & dirty version, totally untested. A cleaner approach > > would be to separate the WC mapping routines and hide the return > > -EINVAL in arch specific code... > > Jesse How about this patch? Doing this in x86 is cleaner. > > I would like Acks/sign-offs-by Thomas, Eric and Jesse, if it is ok with > this patch and works. Hmm, at this point I already was more than a week ago: http://marc.info/?l=linux-kernel&m=125537770514713&w=2 OK, I also modified ioremap() and set_memory_wc() but your patch is just part of what I did there... And Eric Anholt answered: > Seems like we should install an MTRR instead. Requiring userland to set > up the MTRR on the kernel's behalf is backwards. Where I totally agree with him. Regards, Thomas |
From: Suresh S. <sur...@in...> - 2009-10-23 14:25:42
|
On Fri, 2009-10-23 at 00:24 -0700, Thomas Schlichter wrote: > Hmm, at this point I already was more than a week ago: > http://marc.info/?l=linux-kernel&m=125537770514713&w=2 > > OK, I also modified ioremap() and set_memory_wc() but your patch is just part > of what I did there... oops! Sorry I read the later emails in this thread carefully but not the conversation with Thomas Hellstrom. Anyhow, glad that you were on top of this. Thanks. Ingo, can you please queue this patch with a sign-off from Thomas Schlichter too? thanks, suresh |
From: Ingo M. <mi...@el...> - 2009-10-23 14:37:53
|
* Suresh Siddha <sur...@in...> wrote: > On Fri, 2009-10-23 at 00:24 -0700, Thomas Schlichter wrote: > > Hmm, at this point I already was more than a week ago: > > http://marc.info/?l=linux-kernel&m=125537770514713&w=2 > > > > OK, I also modified ioremap() and set_memory_wc() but your patch is just part > > of what I did there... > > oops! Sorry I read the later emails in this thread carefully but not > the conversation with Thomas Hellstrom. Anyhow, glad that you were on > top of this. Thanks. > > Ingo, can you please queue this patch with a sign-off from Thomas > Schlichter too? Please resend the latest with all the acks/signoffs added, with the subject line changed to '[PATCH] ...' and all people Cc:-ed. Thanks, Ingo |