From: Daniel P. <phi...@bo...> - 2002-04-06 06:00:01
Here's the config_nonlinear patch for uml. I'm not going to go deeply
into the theory here, because for one thing we've all discussed it at
some length, and I'm preparing a more detailed [rfc] for lkml. However,
I wanted to post this on lse earlier rather than later, so we can take a
look at the concrete code as opposed to theorizing about it.

The basic ideas are:

- A new, conceptual address space 'logical' is inserted between
  'virtual' and 'physical'.

- Both bootmem and buddy-system allocations are carried out in the
  logical space rather than the physical space.

- The logical address space is contiguous. Besides allocation, all the
  usual Linux assumptions based on a linear memory map continue to hold.

- The logical <==> physical translations are carried out with the aid of
  a pair of tables, indexed by a few high bits of the physical or
  logical address, respectively. These tables are small. In the
  current patch, each table is 32 longs in size; byte tables could be
  used just as well.

- The unit of physical contiguity is called a 'section'. The size of a
  section is defined by SECTION_SHIFT. Logical and physical address
  spaces are divided up into sections of the same size. Logical
  sections are mapped (via the table) onto physical sections in any
  order. Typically the logical map will have no holes in it (though
  there is no real requirement for this) and the physical map will have
  holes (after all, this is why this patch was developed).

Compared to the incumbent config_discontigmem model, the config_nonlinear
approach offers a number of advantages:

- Avoids table lookup for virtual_to_page
- Does not fragment the logical allocation space
- Needs no _alloc_pages layer underneath alloc_pages
- VALID_PAGE needs no table translation

In addition, the config_nonlinear approach is nearly completely generic
across architectures, whereas config_discontigmem shows a disturbing
amount of variation across architectures (for no real reason other than
code drift, I believe) and is also bound together with the config_numa
option in a rather unsatisfying way.

The config_nonlinear model introduces a number of new address translation
functions. The functions formerly known as virt_to_phys and phys_to_virt
are renamed:

	static inline unsigned long virtual_to_logical(void *v)
	static inline void *logical_to_virtual(unsigned long p)

The following four functions are the only ones that involve translation
through tables:

	static inline unsigned long virtual_to_physical(void *v)
	static inline void *physical_to_virtual(unsigned long p)
	static inline unsigned long physical_to_pagenum(unsigned long p)
	static inline unsigned long pagenum_to_physical(unsigned long n)

The reason that there are four instead of two is that the table lookup
can be optimized differently depending on the usage. Incidentally, the
tables as implemented in the current patch translate section numbers to
page numbers. Arguably, translating section numbers to section numbers
would be a superior approach in some cases, and this itself could be a
config option.
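To make the table mechanics concrete before reading the patch, here is a
minimal stand-alone sketch of the translation, using the patch's names
(psection, vsection, SECTION_SHIFT) but an invented mapping -- a simple
even/odd section swap in the spirit of the demo below -- and showing only
the logical-to-physical direction:

#include <stdio.h>

#define PAGE_SHIFT    12
#define SECTION_SHIFT 20			/* 1 meg sections */
#define SECTION_MASK  (~(-1UL << SECTION_SHIFT))
#define MAX_SECTIONS  4

/* logical section number -> page number of that section's physical base */
static unsigned long psection[MAX_SECTIONS];
/* physical section number -> page number of that section's logical base */
static unsigned long vsection[MAX_SECTIONS];

static unsigned long logical_to_physical(unsigned long l)
{
	return (psection[l >> SECTION_SHIFT] << PAGE_SHIFT) + (l & SECTION_MASK);
}

int main(void)
{
	unsigned long i;

	/* invented mapping: swap each even/odd pair of sections */
	for (i = 0; i < MAX_SECTIONS; i++) {
		psection[i] = (i ^ 1) << (SECTION_SHIFT - PAGE_SHIFT);
		vsection[i ^ 1] = i << (SECTION_SHIFT - PAGE_SHIFT);
	}
	/* logical 0x100123 lies in logical section 1 -> physical section 0 */
	printf("0x%lx\n", logical_to_physical(0x100123));	/* prints 0x123 */
	return 0;
}

The table lookup costs one shift, one index and one add per translation,
which is why the hot paths (virtual_to_page, VALID_PAGE) are arranged to
avoid it entirely.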
If config_nonlinear is 'n', the following functions have simple
definitions:

#define virtual_to_physical virtual_to_logical
#define physical_to_virtual logical_to_virtual
#define pagenum_to_physical(n) ((n) << PAGE_SHIFT)
#define physical_to_pagenum(p) ((p) >> PAGE_SHIFT)

Besides that, the only difference between config_nonlinear on and off is
the definition of the following:

- init_nonlinear
- show_nonlinear (debug output)

Much of the patch is devoted to partitioning the usage of __pa (and the
synonymous virt_to_phys) into virtual_to_logical and virtual_to_physical
according to usage. (These are the same when config_nonlinear is 'n'.)
An analogous partition is done for __va (aka phys_to_virt).

As well as making necessary changes, the patch changes some names that
don't really need to be changed, i.e., phys_to_virt becomes
physical_to_virtual, which is not necessarily an improvement. I'll be
playing with this a little; the final names aren't settled at all. I'm
open to flames and other opinions.

The attached patch demonstrates the config_nonlinear principle by
mapping each even megabyte of virtual memory to the corresponding odd
megabyte of 'physical' (within uml's emulation context) memory; in other
words, it swaps every second megabyte.

This patch is not guaranteed to leave you with a fully functional uml
system; in fact, it probably doesn't, now that I look at it a little
more closely. However, it does boot, and the remaining debugging of the
address translations can be carried out with the help of test programs
running under uml. So it's not far from being fully functional.

--
Daniel

--- ../2.4.17.uml.clean/arch/um/config.in	Mon Mar 25 17:27:25 2002
+++ ./arch/um/config.in	Fri Apr  5 10:30:08 2002
@@ -36,6 +36,7 @@
 bool '2G/2G host address space split' CONFIG_HOST_2G_2G
 bool 'Symmetric multi-processing support' CONFIG_UML_SMP
 define_bool CONFIG_SMP $CONFIG_UML_SMP
+bool 'Support for nonlinear physical memory' CONFIG_NONLINEAR
 string 'Default main console channel initialization' CONFIG_CON_ZERO_CHAN \
 	"fd:0,fd:1"
 string 'Default console channel initialization' CONFIG_CON_CHAN "xterm"
--- ../2.4.17.uml.clean/arch/um/kernel/mem.c	Mon Mar 25 17:27:26 2002
+++ ./arch/um/kernel/mem.c	Fri Apr  5 16:25:12 2002
@@ -122,8 +122,8 @@
 	printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
 	for (; start < end; start += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(start));
-		set_page_count(virt_to_page(start), 1);
+		ClearPageReserved(virtual_to_page(start));
+		set_page_count(virtual_to_page(start), 1);
 		free_page(start);
 		totalram_pages++;
 	}
--- ../2.4.17.uml.clean/arch/um/kernel/process_kern.c	Mon Mar 25 17:27:26 2002
+++ ./arch/um/kernel/process_kern.c	Fri Apr  5 10:30:08 2002
@@ -501,12 +501,8 @@
 #ifdef CONFIG_SMP
 	return("(Unknown)");
 #else
-	unsigned long addr;
-
-	if((addr = um_virt_to_phys(current,
-				   current->mm->arg_start)) == 0xffffffff)
-		return("(Unknown)");
-	else return((char *) addr);
+	unsigned long addr = um_virt_to_phys(current, current->mm->arg_start);
+	return addr == 0xffffffff? "(Unknown)": physical_to_virtual(addr);
 #endif
 }
--- ../2.4.17.uml.clean/arch/um/kernel/um_arch.c	Mon Mar 25 17:27:27 2002
+++ ./arch/um/kernel/um_arch.c	Fri Apr  5 21:06:05 2002
@@ -270,6 +270,46 @@
 extern int jail;
 void *brk_start;
 
+#ifdef CONFIG_NONLINEAR
+unsigned long psection[MAX_SECTIONS];
+unsigned long vsection[MAX_SECTIONS];
+
+static int init_nonlinear(void)
+{
+	unsigned i, sect2pfn = SECTION_SHIFT - PAGE_SHIFT;
+	unsigned base_section = (PAGE_OFFSET - NONLINEAR_BASE) >> SECTION_SHIFT;
+
+	printk(">>> sections = %x\n", MAX_SECTIONS - base_section);
+	memset(psection, -1, sizeof(psection));
+	memset(vsection, -1, sizeof(vsection));
+	for (i = 0; i < MAX_SECTIONS - base_section; i++)
+		psection[base_section + i] = (i ^ (i >= 2)) << sect2pfn;
+
+	for (i = 0; i < MAX_SECTIONS; i++)
+		if (~psection[i] && psection[i] >> sect2pfn < MAX_SECTIONS)
+			vsection[psection[i] >> sect2pfn] = i << sect2pfn;
+
+	return 0;
+}
+
+static void show_nonlinear(void)
+{
+	int i;
+	printk(">>> Logical section to Physical num: ");
+	for (i = 0; i < MAX_SECTIONS; i++) printk("%lx ", psection[i]);
+	printk("\n");
+	printk(">>> Physical section to Logical num: ");
+	for (i = 0; i < MAX_SECTIONS; i++) printk("%lx ", vsection[i]);
+	printk("\n");
+}
+
+#else
+# ifndef nil
+#  define nil do { } while (0)
+# endif
+
+#define init_nonlinear() nil
+#define show_nonlinear() nil
+#endif
+
 int linux_main(int argc, char **argv)
 {
 	unsigned long start_pfn, end_pfn, bootmap_size;
@@ -294,6 +334,9 @@
 	/* Start physical memory at least 4M after the current brk */
 	uml_physmem = ROUND_4M(brk_start) + (1 << 22);
 
+	init_nonlinear();
+	show_nonlinear();
+
 	setup_machinename(system_utsname.machine);
 
 	argv1_begin = argv[1];
@@ -322,10 +365,10 @@
 	setup_memory();
 	high_physmem = uml_physmem + physmem_size;
 
-	start_pfn = PFN_UP(__pa(uml_physmem));
-	end_pfn = PFN_DOWN(__pa(high_physmem));
+	start_pfn = PFN_UP(virtual_to_logical(uml_physmem));
+	end_pfn = PFN_DOWN(virtual_to_logical(high_physmem));
 	bootmap_size = init_bootmem(start_pfn, end_pfn - start_pfn);
-	free_bootmem(__pa(uml_physmem) + bootmap_size,
+	free_bootmem(virtual_to_logical(uml_physmem) + bootmap_size,
 		     high_physmem - uml_physmem - bootmap_size);
 	uml_postsetup();
--- ../2.4.17.uml.clean/drivers/char/mem.c	Fri Dec 21 18:41:54 2001
+++ ./drivers/char/mem.c	Fri Apr  5 10:30:08 2002
@@ -79,7 +79,7 @@
 	unsigned long end_mem;
 	ssize_t read;
 
-	end_mem = __pa(high_memory);
+	end_mem = virtual_to_logical(high_memory);
 	if (p >= end_mem)
 		return 0;
 	if (count > end_mem - p)
@@ -101,7 +101,7 @@
 		}
 	}
 #endif
-	if (copy_to_user(buf, __va(p), count))
+	if (copy_to_user(buf, logical_to_virtual(p), count))
 		return -EFAULT;
 	read += count;
 	*ppos += read;
@@ -114,12 +114,12 @@
 	unsigned long p = *ppos;
 	unsigned long end_mem;
 
-	end_mem = __pa(high_memory);
+	end_mem = virtual_to_logical(high_memory);
 	if (p >= end_mem)
 		return 0;
 	if (count > end_mem - p)
 		count = end_mem - p;
-	return do_write_mem(file, __va(p), p, buf, count, ppos);
+	return do_write_mem(file, logical_to_virtual(p), p, buf, count, ppos);
 }
 
 #ifndef pgprot_noncached
@@ -178,7 +178,7 @@
 		  test_bit(X86_FEATURE_CENTAUR_MCR, &boot_cpu_data.x86_capability) )
 	  && addr >= __pa(high_memory);
 #else
-	return addr >= __pa(high_memory);
+	return addr >= virtual_to_physical(high_memory); // bogosity alert!!
 #endif
 }
 
@@ -200,7 +200,7 @@
 	/*
 	 * Don't dump addresses that are not real memory to a core file.
 	 */
-	if (offset >= __pa(high_memory) || (file->f_flags & O_SYNC))
+	if (offset >= virtual_to_logical(high_memory) || (file->f_flags & O_SYNC))
 		vma->vm_flags |= VM_IO;
 
 	if (remap_page_range(vma->vm_start, offset, vma->vm_end-vma->vm_start,
--- ../2.4.17.uml.clean/fs/proc/kcore.c	Fri Sep 14 01:04:43 2001
+++ ./fs/proc/kcore.c	Fri Apr  5 10:30:08 2002
@@ -239,7 +239,7 @@
 	phdr->p_flags	= PF_R|PF_W|PF_X;
 	phdr->p_offset	= dataoff;
 	phdr->p_vaddr	= PAGE_OFFSET;
-	phdr->p_paddr	= __pa(PAGE_OFFSET);
+	phdr->p_paddr	= virtual_to_physical(PAGE_OFFSET);
 	phdr->p_filesz	= phdr->p_memsz = ((unsigned long)high_memory - PAGE_OFFSET);
 	phdr->p_align	= PAGE_SIZE;
 
@@ -256,7 +256,7 @@
 		phdr->p_flags	= PF_R|PF_W|PF_X;
 		phdr->p_offset	= (size_t)m->addr - PAGE_OFFSET + dataoff;
 		phdr->p_vaddr	= (size_t)m->addr;
-		phdr->p_paddr	= __pa(m->addr);
+		phdr->p_paddr	= virtual_to_physical(m->addr);
 		phdr->p_filesz	= phdr->p_memsz	= m->size;
 		phdr->p_align	= PAGE_SIZE;
 	}
@@ -382,7 +382,7 @@
 	}
 #endif
 	/* fill the remainder of the buffer from kernel VM space */
-	start = (unsigned long)__va(*fpos - elf_buflen);
+	start = (unsigned long) logical_to_virtual(*fpos - elf_buflen);
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
--- ../2.4.17.uml.clean/include/asm-i386/io.h	Wed Mar 27 23:31:33 2002
+++ ./include/asm-i386/io.h	Fri Apr  5 20:49:51 2002
@@ -60,20 +60,6 @@
 #endif
 
 /*
- * Change virtual addresses to physical addresses and vv.
- * These are pretty trivial
- */
-static inline unsigned long virt_to_phys(void *address)
-{
-	return __pa(address);
-}
-
-static inline void * phys_to_virt(unsigned long address)
-{
-	return __va(address);
-}
-
-/*
  * Change "struct page" to physical address.
  */
 #define page_to_phys(page)	((page - mem_map) << PAGE_SHIFT)
--- ../2.4.17.uml.clean/include/asm-um/page.h	Wed Mar 27 23:31:33 2002
+++ ./include/asm-um/page.h	Fri Apr  5 20:49:42 2002
@@ -29,30 +29,88 @@
 
 #endif /* __ASSEMBLY__ */
 
+#define __va_space (8*1024*1024)
+
 extern unsigned long uml_physmem;
+extern unsigned long max_mapnr;
 
-#define __va_space (8*1024*1024)
+static inline int VALID_PAGE(struct page *page)
+{
+	return page - mem_map < max_mapnr;
+}
 
-static inline unsigned long __pa(void *virt)
+/* Logical/Virtual */
+
+static inline void *logical_to_virtual(unsigned long p)
 {
-	return (unsigned long) (virt) - PAGE_OFFSET;
+	return (void *) ((unsigned long) p + PAGE_OFFSET);
 }
 
-static inline void *__va(unsigned long phys)
+static inline unsigned long virtual_to_logical(void *v)
 {
-	return (void *) ((unsigned long) (phys) + PAGE_OFFSET);
+//	assert(it's a kernel virtual);
+	return (unsigned long) v - PAGE_OFFSET;
 }
 
-static inline struct page *virt_to_page(void *kaddr)
+#ifdef CONFIG_NONLINEAR
+#define MAX_SECTIONS (32)
+#define SECTION_SHIFT 20 /* 1 meg sections */
+#define SECTION_MASK (~(-1 << SECTION_SHIFT))
+#define NONLINEAR_BASE PAGE_OFFSET
+
+extern unsigned long psection[MAX_SECTIONS];
+extern unsigned long vsection[MAX_SECTIONS];
+
+#include <stdarg.h>
+#include <linux/linkage.h>
+
+asmlinkage int printk(const char *fmt, ...)
+	__attribute__ ((format (printf, 1, 2)));
+
+static inline unsigned long virtual_to_physical(void *v)
 {
-	return mem_map + (__pa(kaddr) >> PAGE_SHIFT);
+	unsigned long p = (unsigned long) v - NONLINEAR_BASE;
+	return (psection[p >> SECTION_SHIFT] << PAGE_SHIFT) + (p & SECTION_MASK);
 }
 
-extern unsigned long max_mapnr;
+static inline void *physical_to_virtual(unsigned long p)
+{
+	return (void *) ((vsection[p >> SECTION_SHIFT] << PAGE_SHIFT) + (p & SECTION_MASK))
+		+ NONLINEAR_BASE;
+}
 
-static inline int VALID_PAGE(struct page *page)
+static inline unsigned long pagenum_to_physical(unsigned long n)
 {
-	return page - mem_map < max_mapnr;
+	return (psection[n >> (SECTION_SHIFT - PAGE_SHIFT)] +
+		(n & (SECTION_MASK >> PAGE_SHIFT)))
+		<< PAGE_SHIFT;
+}
+
+static inline unsigned long physical_to_pagenum(unsigned long p)
+{
+	return vsection[p >> SECTION_SHIFT] + ((p & SECTION_MASK) >> PAGE_SHIFT);
+}
+
+#else
+#define virtual_to_physical virtual_to_logical
+#define physical_to_virtual logical_to_virtual
+#define pagenum_to_physical(n) ((n) << PAGE_SHIFT)
+#define physical_to_pagenum(p) ((p) >> PAGE_SHIFT)
+#endif /* CONFIG_NONLINEAR */
+
+static inline struct page *physical_to_page(unsigned long p)
+{
+	return mem_map + physical_to_pagenum(p);
+}
+
+static inline unsigned long virtual_to_pagenum(void *v)
+{
+	return virtual_to_logical(v) >> PAGE_SHIFT;
+}
+
+static inline struct page *virtual_to_page(void *v)
+{
+	return mem_map + virtual_to_pagenum(v);
 }
 
 #endif
--- ../2.4.17.uml.clean/include/asm-um/pgtable.h	Mon Mar 25 17:27:28 2002
+++ ./include/asm-um/pgtable.h	Fri Apr  5 20:49:51 2002
@@ -150,7 +150,7 @@
 
 #define BAD_PAGETABLE __bad_pagetable()
 #define BAD_PAGE __bad_page()
-#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
+#define ZERO_PAGE(vaddr) (virtual_to_page(empty_zero_page))
 
 /* number of bits that fit into a memory pointer */
 #define BITS_PER_PTR			(8*sizeof(unsigned long))
@@ -197,9 +197,8 @@
 #define page_address(page) ({ if (!(page)->virtual) BUG(); (page)->virtual; })
 #define __page_address(page) ({ PAGE_OFFSET + (((page) - mem_map) << PAGE_SHIFT); })
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))
-#define pte_page(x) \
-	(mem_map+((unsigned long)((__pa(pte_val(x)) >> PAGE_SHIFT))))
-#define pte_address(x) ((void *) ((unsigned long) pte_val(x) & PAGE_MASK))
+#define pte_page(x) (mem_map + physical_to_pagenum(pte_val(x)))
+#define pte_address(x) (physical_to_virtual(pte_val(x) & PAGE_MASK))
 
 static inline pte_t pte_mknewprot(pte_t pte)
 {
@@ -316,15 +315,14 @@
 #define mk_pte(page, pgprot) \
 ({					\
 	pte_t __pte;			\
-					\
-	pte_val(__pte) = ((unsigned long) __va((page-mem_map)*(unsigned long)PAGE_SIZE + pgprot_val(pgprot))); \
+	pte_val(__pte) = pagenum_to_physical(page - mem_map) + pgprot_val(pgprot); \
 	if(pte_present(__pte)) pte_mknewprot(pte_mknewpage(__pte)); \
 	__pte;				\
 })
 
 /* This takes a physical page address that is used by the remapping functions */
 #define mk_pte_phys(physpage, pgprot) \
-({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); __pte; })
+({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); BUG(); __pte; })
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
--- ../2.4.17.uml.clean/include/asm-um/processor-generic.h	Mon Mar 25 17:27:28 2002
+++ ./include/asm-um/processor-generic.h	Fri Apr  5 20:49:51 2002
@@ -115,7 +115,7 @@
 extern struct task_struct *alloc_task_struct(void);
 extern void free_task_struct(struct task_struct *task);
 
-#define get_task_struct(tsk)	atomic_inc(&virt_to_page(tsk)->count)
+#define get_task_struct(tsk)	atomic_inc(&virtual_to_page(tsk)->count)
 
 extern void release_thread(struct task_struct *);
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
--- ../2.4.17.uml.clean/include/linux/bootmem.h	Thu Nov 22 20:47:23 2001
+++ ./include/linux/bootmem.h	Fri Apr  5 20:49:51 2002
@@ -35,11 +35,11 @@
 extern void __init free_bootmem (unsigned long addr, unsigned long size);
 extern void * __init __alloc_bootmem (unsigned long size, unsigned long align, unsigned long goal);
 #define alloc_bootmem(x) \
-	__alloc_bootmem((x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem((x), SMP_CACHE_BYTES, virtual_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem((x), SMP_CACHE_BYTES, 0)
 #define alloc_bootmem_pages(x) \
-	__alloc_bootmem((x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem((x), PAGE_SIZE, virtual_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low_pages(x) \
 	__alloc_bootmem((x), PAGE_SIZE, 0)
 extern unsigned long __init free_all_bootmem (void);
@@ -50,9 +50,9 @@
 extern unsigned long __init free_all_bootmem_node (pg_data_t *pgdat);
 extern void * __init __alloc_bootmem_node (pg_data_t *pgdat, unsigned long size, unsigned long align, unsigned long goal);
 #define alloc_bootmem_node(pgdat, x) \
-	__alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES, virtual_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_node(pgdat, x) \
-	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, virtual_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, 0)
--- ../2.4.17.uml.clean/mm/bootmem.c	Fri Dec 21 18:42:04 2001
+++ ./mm/bootmem.c	Fri Apr  5 18:42:10 2002
@@ -51,7 +51,7 @@
 	pgdat_list = pgdat;
 
 	mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);
-	bdata->node_bootmem_map = phys_to_virt(mapstart << PAGE_SHIFT);
+	bdata->node_bootmem_map = logical_to_virtual(mapstart << PAGE_SHIFT);
 	bdata->node_boot_start = (start << PAGE_SHIFT);
 	bdata->node_low_pfn = end;
 
@@ -214,12 +214,12 @@
 			areasize = 0;
 			// last_pos unchanged
 			bdata->last_offset = offset+size;
-			ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
+			ret = logical_to_virtual(bdata->last_pos*PAGE_SIZE + offset +
 				bdata->node_boot_start);
 		} else {
 			remaining_size = size - remaining_size;
 			areasize = (remaining_size+PAGE_SIZE-1)/PAGE_SIZE;
-			ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
+			ret = logical_to_virtual(bdata->last_pos*PAGE_SIZE + offset +
 				bdata->node_boot_start);
 			bdata->last_pos = start+areasize-1;
 			bdata->last_offset = remaining_size;
@@ -228,7 +228,7 @@
 	} else {
 		bdata->last_pos = start + areasize - 1;
 		bdata->last_offset = size & ~PAGE_MASK;
-		ret = phys_to_virt(start * PAGE_SIZE + bdata->node_boot_start);
+		ret = logical_to_virtual(start * PAGE_SIZE + bdata->node_boot_start);
 	}
 	/*
 	 * Reserve the area now:
@@ -265,7 +265,7 @@
 	 * Now free the allocator bitmap itself, it's not
 	 * needed anymore:
 	 */
-	page = virt_to_page(bdata->node_bootmem_map);
+	page = virtual_to_page(bdata->node_bootmem_map);
 	count = 0;
 	for (i = 0; i < ((bdata->node_low_pfn-(bdata->node_boot_start >> PAGE_SHIFT))/8 + PAGE_SIZE-1)/PAGE_SIZE; i++,page++) {
 		count++;
--- ../2.4.17.uml.clean/mm/memory.c	Fri Dec 21 18:42:05 2001
+++ ./mm/memory.c	Fri Apr  5 17:04:53 2002
@@ -796,7 +796,7 @@
 	unsigned long phys_addr, pgprot_t prot)
 {
 	unsigned long end;
-
+BUG();
 	address &= ~PMD_MASK;
 	end = address + size;
 	if (end > PMD_SIZE)
@@ -806,7 +806,7 @@
 		pte_t oldpage;
 		oldpage = ptep_get_and_clear(pte);
 
-		page = virt_to_page(__va(phys_addr));
+		page = physical_to_page(phys_addr);
 		if ((!VALID_PAGE(page)) || PageReserved(page))
 			set_pte(pte, mk_pte_phys(phys_addr, prot));
 		forget_pte(oldpage);
--- ../2.4.17.uml.clean/mm/page_alloc.c	Tue Nov 20 01:35:40 2001
+++ ./mm/page_alloc.c	Fri Apr  5 18:28:30 2002
@@ -444,7 +444,7 @@
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0)
-		__free_pages(virt_to_page(addr), order);
+		__free_pages(virtual_to_page(addr), order);
 }
 
 /*
@@ -735,7 +735,7 @@
 			struct page *page = mem_map + offset + i;
 			page->zone = zone;
 			if (j != ZONE_HIGHMEM)
-				page->virtual = __va(zone_start_paddr);
+				page->virtual = logical_to_virtual(zone_start_paddr);
 			zone_start_paddr += PAGE_SIZE;
 		}
--- ../2.4.17.uml.clean/mm/page_io.c	Tue Nov 20 00:19:42 2001
+++ ./mm/page_io.c	Fri Apr  5 18:27:58 2002
@@ -110,7 +110,7 @@
  */
 void rw_swap_page_nolock(int rw, swp_entry_t entry, char *buf)
 {
-	struct page *page = virt_to_page(buf);
+	struct page *page = virtual_to_page(buf);
 
 	if (!PageLocked(page))
 		PAGE_BUG(page);
--- ../2.4.17.uml.clean/mm/slab.c	Fri Dec 21 18:42:05 2001
+++ ./mm/slab.c	Fri Apr  5 18:08:32 2002
@@ -506,7 +506,7 @@
 static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
 {
 	unsigned long i = (1<<cachep->gfporder);
-	struct page *page = virt_to_page(addr);
+	struct page *page = virtual_to_page(addr);
 
 	/* free_pages() does not clear the type bit - we do that.
	 * The pages have been unlinked from their cache-slab,
@@ -1151,7 +1151,7 @@
 
 	/* Nasty!!!!!! I hope this is OK. */
 	i = 1 << cachep->gfporder;
-	page = virt_to_page(objp);
+	page = virtual_to_page(objp);
 	do {
 		SET_PAGE_CACHE(page, cachep);
 		SET_PAGE_SLAB(page, slabp);
@@ -1395,14 +1395,14 @@
 {
 	slab_t* slabp;
 
-	CHECK_PAGE(virt_to_page(objp));
+	CHECK_PAGE(virtual_to_page(objp));
 	/* reduces memory footprint
 	 *
	if (OPTIMIZE(cachep))
		slabp = (void*)((unsigned long)objp&(~(PAGE_SIZE-1)));
	else
 	 */
-	slabp = GET_PAGE_SLAB(virt_to_page(objp));
+	slabp = GET_PAGE_SLAB(virtual_to_page(objp));
 
 #if DEBUG
 	if (cachep->flags & SLAB_DEBUG_INITIAL)
@@ -1475,7 +1475,7 @@
 #ifdef CONFIG_SMP
 	cpucache_t *cc = cc_data(cachep);
 
-	CHECK_PAGE(virt_to_page(objp));
+	CHECK_PAGE(virtual_to_page(objp));
 	if (cc) {
 		int batchcount;
 		if (cc->avail < cc->limit) {
@@ -1557,8 +1557,8 @@
 {
 	unsigned long flags;
 #if DEBUG
-	CHECK_PAGE(virt_to_page(objp));
-	if (cachep != GET_PAGE_CACHE(virt_to_page(objp)))
+	CHECK_PAGE(virtual_to_page(objp));
+	if (cachep != GET_PAGE_CACHE(virtual_to_page(objp)))
 		BUG();
 #endif
 
@@ -1582,8 +1582,8 @@
 	if (!objp)
 		return;
 	local_irq_save(flags);
-	CHECK_PAGE(virt_to_page(objp));
-	c = GET_PAGE_CACHE(virt_to_page(objp));
+	CHECK_PAGE(virtual_to_page(objp));
+	c = GET_PAGE_CACHE(virtual_to_page(objp));
 	__kmem_cache_free(c, (void*)objp);
 	local_irq_restore(flags);
 }
--- ../2.4.17.uml.clean/mm/swapfile.c	Fri Dec 21 18:42:05 2001
+++ ./mm/swapfile.c	Fri Apr  5 18:28:55 2002
@@ -962,7 +962,7 @@
 		goto bad_swap;
 	}
 
-	lock_page(virt_to_page(swap_header));
+	lock_page(virtual_to_page(swap_header));
 	rw_swap_page_nolock(READ, SWP_ENTRY(type,0), (char *) swap_header);
 
 	if (!memcmp("SWAP-SPACE",swap_header->magic.magic,10))
From: Martin J. B. <Mar...@us...> - 2002-04-06 16:19:52
> - The unit of physical contiguity is called a 'section'. The size of a
>   section is defined by SECTION_SHIFT.
>
> ...
>
> +#define SECTION_SHIFT 20 /* 1 meg sections */

The buddy allocator assumes that it can return phys contig sections.
To avoid breaking this assumption, doesn't 2^SECTION_SHIFT have to
be >= (2^MAX_ORDER)*PAGE_SIZE? From our previous discussion, I thought
we'd said that your sections have to be at least that big .... Or am I
confused?

M.
From: Daniel P. <phi...@bo...> - 2002-04-06 20:20:32
On April 6, 2002 06:20 pm, Martin J. Bligh wrote:
> > - The unit of physical contiguity is called a 'section'. The size of
> >   a section is defined by SECTION_SHIFT.
> >
> > ...
> >
> > +#define SECTION_SHIFT 20 /* 1 meg sections */
>
> The buddy allocator assumes that it can return phys contig sections.
> To avoid breaking this assumption, doesn't 2^SECTION_SHIFT have to
> be >= (2^MAX_ORDER)*PAGE_SIZE? From our previous discussion, I thought
> we'd said that your sections have to be at least that big ....
> Or am I confused?

No, you're right. I should have set my demo SECTION_SHIFT to
MAX_ORDER + PAGE_SHIFT. Not that it caused a problem in this case ;-)

Realistic section sizes start in the large numbers of megabytes.

--
Daniel
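(With 2.4's MAX_ORDER of 10 and 4K pages, that works out to a minimum
SECTION_SHIFT of 22, i.e. 4MB sections. A compile-time guard along these
lines -- hypothetical, not in the posted patch -- would catch a too-small
setting:

#if SECTION_SHIFT < (MAX_ORDER + PAGE_SHIFT)
#error "SECTION_SHIFT too small: a 2^MAX_ORDER-page buddy block could span a section boundary"
#endif

Otherwise a maximum-order allocation could straddle a section boundary
and come back physically discontiguous.)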
From: Martin J. B. <Mar...@us...> - 2002-04-06 21:41:16
I'm curious as to how mem_map works under this new infrastructure.
As it's basically a contiguous array of struct pages, I would have
thought it indexed as a continuous array by the logical address space
... is that correct? I see this in your patch still ...

#define page_to_phys(page) ((page - mem_map) << PAGE_SHIFT)

is that just a leftover bit of stuff that hasn't been removed yet, or
am I misunderstanding how mem_map is now used?

M.

--On Friday, April 05, 2002 10:57 PM +0200 Daniel Phillips
<phi...@bo...> wrote:

> Here's the config_nonlinear patch for uml. [...]
From: Daniel P. <phi...@bo...> - 2002-04-07 01:37:13
On April 6, 2002 11:41 pm, Martin J. Bligh wrote:
> I'm curious as to how mem_map works under this new infrastructure.
> As it's basically a contiguous array of struct pages, I would have
> thought it indexed as a continuous array by the logical address space
> ... is that correct? I see this in your patch still ...
>
> #define page_to_phys(page) ((page - mem_map) << PAGE_SHIFT)
>
> is that just a leftover bit of stuff that hasn't been removed
> yet, or am I misunderstanding how mem_map is now used?

You were correct the first time - mem_map is still an array. It maps
directly onto the logical address space.

In today's revised patch, which I'm still tidying up, we have
virt_to_logical (subtract a constant) and logical_to_ordinal (shift
right by PAGE_SHIFT). I am also considering the wisdom of defining
ordinal_to_page (multiply by sizeof(struct page) and add mem_map) and
page_to_ordinal (subtract mem_map and divide by sizeof(struct page)).
These last two hooks aren't needed for config_nonlinear, but they are
probably needed for config_numa, i.e., you want to hide all the +/-
mem_map arithmetic.

--
Daniel
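A sketch of how those hooks might look, reconstructed from the
description above (the revised patch itself wasn't posted, so the names
and signatures here are assumptions):

static inline unsigned long virt_to_logical(void *v)
{
	return (unsigned long) v - PAGE_OFFSET;	/* subtract a constant */
}

static inline unsigned long logical_to_ordinal(unsigned long l)
{
	return l >> PAGE_SHIFT;			/* shift right by PAGE_SHIFT */
}

/* Not needed for config_nonlinear itself, but would let config_numa
   hide the mem_map arithmetic behind a hook: */
static inline struct page *ordinal_to_page(unsigned long n)
{
	return mem_map + n;	/* multiply by sizeof(struct page), add mem_map */
}

static inline unsigned long page_to_ordinal(struct page *page)
{
	return page - mem_map;	/* subtract mem_map, divide by sizeof(struct page) */
}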
From: Jack S. <st...@sg...> - 2002-04-08 17:40:04
>
> Here's the config_nonlinear patch for uml. I'm not going to go deeply into
> the theory here because for one thing we've all discussed it at some length,
> and I'm preparing a more detailed [rfc] for lkml. However, I wanted to post
> this on lse earlier rather than later though, so we can take a look at the
> concrete code as opposed to theorizing about it.
> ...

Daniel - I've downloaded your patch & am starting to take a look at it.

Before eliminating DISCONTIGMEM, we need to make sure that the Atlas version
of the DISCONTIG patch still works (or can be made to work). We should also
make sure that replacing the current DISCONTIG code is the best technical
approach to take.

Atlas (HP, IBM, Intel, NEC, SGI) have made substantial modifications to
DISCONTIG for IA64. These modifications are available on the Atlas site:

	https://sourceforge.net/projects/discontig/

This patch is based on Kanoj's DISCONTIG patch & fixes a number of the
problems that you referred to in earlier mail. In addition, it makes
DISCONTIG function correctly on a number of NUMA platforms with fairly
diverse requirements. The patch is currently running on a number of IA64
NUMA platforms.

The patch makes no substantial changes to any non-IA64 files. As currently
written, the patch is specific to IA64, but could likely be extended to
other platforms without much work. Aside from some packaging issues, the
majority of the changes would be in the area of discovering the memory
configuration of the platform. IA64 uses extensions that have been made to
the platform's ACPI tables.

Some of the platforms supported by the Atlas modifications to the DISCONTIG
patch have fairly bizarre requirements. I'm most familiar with the
requirements of the SGI system. (This system was designed by some of the
evil engineers that were referred to in earlier mail :-)

	- this is a NUMA system

	- no memory at physical address 0.

	- no specific physical address is guaranteed to exist. Lowest
	  POSSIBLE address is at 192GB, but even that address may not be
	  present. Lowest address will be 192GB+N*256GB where N is some
	  number in the range 0..2047.

	- Memory within a node is sparse. For example, on a 1 node system
	  with 4 1GB dimms, memory may be at:
		192GB - 193GB
		208GB - 209GB
		224GB - 225GB
		240GB - 241GB

	- multinode systems have VERY sparse memory. Base address for each
	  node's memory starts at 256GB intervals & may only contain a
	  few GB.

	- IO space is located WITHIN the physical address ranges of RAM.
	  On multinode systems, IO space is interspersed with normal RAM.
	  For example, if N is 0, IO space is at 64GB-192GB.

Do you see any problem adapting your patch to work with these
requirements???

As far as NUMA is concerned, all page struct entries for the memory on a
node are contained on the node. No off-node references are required to
reference a local page struct entry. References to off-node page structs
don't require off-node references except for the page-struct entry itself.

----------
FWIW (From the generic header file mmzone.h)

/*
 * General Concepts:
 *
 * - Nodes are numbered several ways:
 *
 *	compact node numbers - compact node numbers are a dense numbering
 *	of all the nodes in the system. An N node system will have compact
 *	nodes numbered 0 .. N-1. There is no significance to the node
 *	numbers. The compact node number assigned to a specific physical
 *	node may vary from boot to boot. The boot node is not necessarily
 *	node 0.
 *
 *	physical node numbers - Physical node numbers may not be dense
 *	nor do they necessarily start with 0. The exact significance of
 *	a physical node number is platform specific.
 *
 *	proximity domain numbers - these numbers are assigned by ACPI.
 *	Each platform must provide a platform specific function
 *	for mapping proximity node numbers to physical node numbers.
 *
 * Most of the code in the kernel uses compact node numbers to identify
 * nodes.
 *
 * - Memory is conceptually divided into chunks. A chunk is either
 *   completely present, or else the kernel assumes it is completely
 *   absent. Each node consists of a number of possibly discontiguous
 *   chunks.
 *
 * - A contiguous group of memory chunks that reside on the same node
 *   are referred to as a clump. Note that a clump may be partially
 *   present. (Note, on some hardware implementations, a clump is the
 *   same as a memory bank or a DIMM).
 *
 * - a node consists of multiple clumps of memory. From a NUMA
 *   perspective, accesses to all clumps on the node have the same
 *   latency. Except for zone issues, the clumps are treated as
 *   equivalent for allocation/performance purposes.
 *
 * - each node has a single contiguous mem_map array. The array contains
 *   page struct entries for every page on the node. There are no "holes"
 *   in the mem_map array. The node data area (see below) has pointers to
 *   the start of the mem_map entries for each clump on the node.
 *
 * - associated with each node is a pg_data_t structure. This structure
 *   contains the information used by the linux memory allocator for
 *   managing the memory on the node. The pg_data_t structure for a node
 *   is located on the node.
 *
 * - to minimize offnode memory references, a "node directory" is
 *   maintained on each node. This directory replicates frequently used
 *   read-only data structures that are used in macro evaluation.
 *   Examples include the addresses of the pernode pg_data structures
 *   for each node.
 *
 * - the MAP_NR function has been modified to be "clump aware" & uses the
 *   clump_mem_map_base array in the node data area for generating MAP_NR
 *   numbers.
 *
 * - the node data area contains an array of pointers to the mem_map
 *   entries for each clump of memory. The array is indexed by a platform
 *   specific function.
 *
 * - each cpu has a pointer to its node data area contained in its
 *   cpu_data structure.
 *
 * - each platform is responsible for defining the following constants &
 *   functions:
 *
 *	PLAT_BOOTMEM_ALLOC_GOAL(cnode,kaddr) - Calculate a "goal" value to
 *		be passed to __alloc_bootmem_node for allocating structures
 *		on nodes so that they don't alias to the same line in the
 *		cache as the previously allocated structure. You can return
 *		0 if your platform doesn't have this problem.
 *		(Note: need better solution but works for now ZZZ).
 *
 *	PLAT_CHUNKSIZE - defines the size of the platform memory chunk.
 *
 *	PLAT_CHUNKNUM(kaddr) - takes a kaddr & returns its chunk number
 *
 *	PLAT_CLUMP_MEM_MAP_INDEX(kaddr) - Given a kaddr, find the index
 *		into the clump_mem_map_base array of the page struct entry
 *		for the first page of the clump.
 *
 *	PLAT_CLUMP_OFFSET(kaddr) - find the byte offset of a kaddr within
 *		the clump that contains it.
 *
 *	PLAT_CLUMPSIZE - defines the size in bytes of the smallest clump
 *		supported on the platform.
 *
 *	PLAT_CLUMPS_PER_NODE - maximum number of clumps per node
 *
 *	PLAT_MAXCLUMPS - maximum number of clumps on all nodes combined
 *
 *	PLAT_MAX_COMPACT_NODES - maximum number of nodes in a system.
 *		(do not confuse this with the maximum node number.
 *		Nodes can be sparsely numbered).
 *
 *	PLAT_MAX_NODE_NUMBER - maximum physical node number plus 1
 *
 *	PLAT_MAX_PHYS_MEMORY - maximum physical memory address
 *
 *	PLAT_PXM_TO_PHYS_NODE_NUMBER(pxm) - convert a proximity_domain
 *		number (from ACPI) into a physical node number
 *
 *	PLAT_VALID_MEM_KADDR(kaddr) - tests a kaddr to see if it
 *		potentially represents a valid physical memory address.
 *		Return 1 if potentially valid, 0 otherwise. (This function
 *		generally tests to see if any invalid bits are set in the
 *		address).
 *
 */

(From the header file for SGI's mmzone.h)

/*
 * SGI SN2 Arch defined values
 *
 * An SN2 physical address is broken down as follows:
 *
 *	+-----------------------------------------+
 *	|         |      |    |    node offset    |
 *	| unused  | node | AS |-------------------|
 *	|         |      |    | cn | clump offset |
 *	+-----------------------------------------+
 *	 6       4 4    3 3  3 3  3 3            0
 *	 3       9 8    8 7  6 5  4 3            0
 *
 * bits 63-49	Unused - must be zero
 * bits 48-38	Node number. Note that some configurations do NOT
 *		have a node zero.
 * bits 37-36	Address space ID. Cached memory has a value of 3 (!!!).
 *		Chipset & IO addresses have other values.
 *		(Yikes!! The hardware folks hate us...)
 * bits 35-0	Node offset.
 *
 * The node offset can be further broken down as:
 *	bits 35-34	Clump (bank) number.
 *	bits 33-0	Clump (bank) offset.
 *
 * A node consists of up to 4 clumps (banks) of memory. A clump may be
 * empty, or may be populated with a single contiguous block of memory
 * starting at clump offset 0. The size of the block is (2**n) * 64MB,
 * where 0<n<9.
 *
 * Important notes:
 *	- IO space addresses are embedded within the range of valid memory
 *	  addresses.
 *	- All cached memory addresses have bits 36 & 37 set to 1's.
 *	- There is no physical address 0.
 *
 */

--
Thanks

Jack Steiner    (651-683-5302)   (vnet 233-5302)   st...@sg...
|
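To make the SN2 layout above concrete, here is a minimal sketch of the
corresponding field extractors. Only the bit positions come from Jack's
comment; the macro names themselves are hypothetical:

/* Hypothetical extractors for the SN2 physical address layout above.
 * Only the bit positions (63-49 unused, 48-38 node, 37-36 AS,
 * 35-34 clump, 33-0 clump offset) are taken from the comment. */
#define SN2_NODE(paddr)		(((paddr) >> 38) & 0x7ff)	/* bits 48-38 */
#define SN2_AS(paddr)		(((paddr) >> 36) & 0x3)		/* bits 37-36; 3 == cached */
#define SN2_CLUMP(paddr)	(((paddr) >> 34) & 0x3)		/* bits 35-34 */
#define SN2_CLUMP_OFFSET(paddr)	((paddr) & ((1UL << 34) - 1))	/* bits 33-0 */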
From: Martin J. B. <Mar...@us...> - 2002-04-09 05:43:46
|
> Some of the platforms supported by the atlas modifications
> to the DISCONTIG patch have fairly bizarre requirements.

From my understanding of Daniel's patch, it is of no consequence
whatsoever how sparse / discontiguous your physical memory layout is,
as long as your contiguous areas of physical ram are a multiple of
2^MAX_ORDER pages - this is just a trick to make sure the buddy
allocator always returns physically contiguous pages.

One of the really nice things about the way Daniel does this is that
it hides the complexities of discontiguous memory from the mem
allocation code, which is the heavily used path. You only have to
worry about it on setup, or memory hotplug.

I tried to look at your patch in some more detail (we took a brief
look before), but I think there's some dependency on a previous ia64
patch, which I don't recall, and can't find easily - can you point
me to it?

From what I could see from your patch, I don't see how __alloc_pages
walks through the clumps / chunks looking for a valid memory area.
Could you explain how this works? The only way I could think of
without the remapping Daniel is doing is to have a separate buddy
allocator for each clump, or even for each chunk ... or else you
just have one allocator per zone, and only free the pages into it
on init that are actually present? (the latter seems the more
feasible).

> Memory is conceptually divided into chunks. A chunk is either
> completely present, or else the kernel assumes it is completely
> absent. Each node consists of a number of possibly discontiguous
> chunks.
>
> A contiguous group of memory chunks that reside on the same node
> are referred to as a clump. Note that a clump may be partially
> present. (Note, on some hardware implementations, a clump is
> the same as a memory bank or a DIMM).
>
> a node consists of multiple clumps of memory. From a NUMA
> perspective, accesses to all clumps on the node have the same
> latency. Except for zone issues, the clumps are treated as
> equivalent for allocation/performance purposes.

I'm unsure why you need this many levels of indirection ... if a
clump is physically contiguous, why do you need to divide it into
chunks? Is a chunk just the smallest unit that can be taken on or
off line in a hot-swap operation?

> As far as NUMA is concerned, all page struct entries for the memory on a
> node are contained on the node. No off-node references are required to
> reference a local page struct entry. References to off-node page structs
> don't require off-node references except for the page-struct entry itself.

I don't think this would change for Daniel's stuff - as long as the
pagetables themselves are allocated on a node-local basis.

M.
|
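As a concrete illustration of the alignment constraint Martin describes, a
setup-time check might look like the sketch below. The helper name is
invented; MAX_ORDER is the usual buddy allocator constant:

/* Sketch of a hypothetical setup-time check: a physically contiguous
 * area is safe for the buddy allocator only if both its start and its
 * length are multiples of 2^MAX_ORDER pages, so that no maximal buddy
 * block can straddle a discontiguity. */
static int __init area_is_buddy_safe(unsigned long start_pfn,
				     unsigned long nr_pages)
{
	unsigned long mask = (1UL << MAX_ORDER) - 1;

	return ((start_pfn & mask) == 0) && ((nr_pages & mask) == 0);
}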
From: Daniel P. <phi...@bo...> - 2002-04-09 08:14:48
|
On April 9, 2002 07:44 am, Martin J. Bligh wrote:
> > As far as NUMA is concerned, all page struct entries for the memory on a
> > node are contained on the node. No off-node references are required to
> > reference a local page struct entry. References to off-node page structs
> > don't require off-node references except for the page-struct entry itself.
>
> I don't think this would change for Daniel's stuff - as long as the
> pagetables themselves are allocated on a node-local basis.

There is one thing I can't handle elegantly within the config_nonlinear
model: forcing all the struct pages belonging to one node to live on the
same node. However, it's not too hard to graft that on as part of a numa
patch that depends on config_nonlinear, should the extra capabilities of
config_nonlinear be desired for a particular numa architecture.

--
Daniel
|
From: Martin J. B. <Mar...@us...> - 2002-04-09 14:14:11
|
>> I don't think this would change for Daniel's stuff - as long as the
>> pagetables themselves are allocated on a node-local basis.
>
> There is one thing I can't handle elegantly within the config_nonlinear
> model: forcing all the struct pages belonging to one node to live on the
> same node. However, it's not too hard to graft that on as part of a numa
> patch that depends on config_nonlinear, should the extra capabilities of
> config_nonlinear be desired for a particular numa architecture.

I don't see why that's any harder with your patch than without -
you just alloc_bootmem the lmem_map from the node to which the
struct pages belong - just as we do today.

In fact, for a 32 bit NUMA architecture, it's impossible to do this
*without* your patch, as ZONE_NORMAL will all end up on node 0
(assuming a flat layout, and >= 1Gb in the first node). Thus all
the struct pages have to go in node 0, not on their own node.

Perhaps you see a problem I don't, but to me the situation seems to
be better with your patch than without ...

M.

PS. BTW - has anyone looked at killing the global mem_map kludge we
currently use for NUMA, and the dummy pointers we keep passing into
free_area_init_core? Dan and I were discussing this recently -
he pointed out mem_map ought not to be used for NUMA at all, there's
really no such concept, just the lmem_maps ... do we *really* need a
global pfn to work with for any given struct page?
|
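A rough sketch of the per-node allocation Martin is describing, using the
2.4-era discontigmem structures (numnodes, NODE_DATA(), pg_data_t) and the
alloc_bootmem_node macro that appears in the patch later in this thread;
the function name and exact fields are illustrative, not a definitive
implementation:

/* Illustrative only: give each node's lmem_map (its struct page array)
 * node-local backing by drawing it from that node's own bootmem.
 * alloc_bootmem_node allocates from the given pgdat's node, so the
 * page structs land on the node they describe. */
static void __init alloc_node_mem_maps(void)
{
	int nid;

	for (nid = 0; nid < numnodes; nid++) {
		pg_data_t *pgdat = NODE_DATA(nid);
		unsigned long size = pgdat->node_size * sizeof(struct page);

		pgdat->node_mem_map = alloc_bootmem_node(pgdat, size);
	}
}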
From: Daniel P. <phi...@bo...> - 2002-04-11 23:47:36
|
On April 9, 2002 04:14 pm, Martin J. Bligh wrote:
> >> I don't think this would change for Daniel's stuff - as long as the
> >> pagetables themselves are allocated on a node-local basis.
> >
> > There is one thing I can't handle elegantly within the config_nonlinear
> > model: forcing all the struct pages belonging to one node to live on
> > the same node. However, it's not too hard to graft that on as part of
> > a numa patch that depends on config_nonlinear, should the extra
> > capabilities of config_nonlinear be desired for a particular numa
> > architecture.
>
> I don't see why that's any harder with your patch than without -
> you just alloc_bootmem the lmem_map from the node to which the
> struct pages belong - just as we do today.

Exactly, I'm just pointing out that it's not obvious how one could do that
without the lmem_maps.

> In fact, for a 32 bit NUMA architecture, it's impossible to do this
> *without* your patch, as ZONE_NORMAL will all end up on node 0
> (assuming a flat layout, and >= 1Gb in the first node). Thus all
> the struct pages have to go in node 0, not on their own node.

Yes, as we observed in the irc discussion some weeks ago.

> Perhaps you see a problem I don't, but to me the situation seems to
> be better with your patch than without ...

That's what I think. However, in the interest of truth in advertising, I'm
not claiming it's a cure for all that ails you. I'm presenting it as a
more powerful, slightly more efficient replacement for config_discontigmem,
as possible inspiration for some cleanups in config_numa, and as a method
of distributing zone_normal across nodes on 32 bit numa.

> PS. BTW - has anyone looked at killing the global mem_map kludge we
> currently use for NUMA, and the dummy pointers we keep passing into
> free_area_init_core? Dan and I were discussing this recently -
> he pointed out mem_map ought not to be used for NUMA at all, there's
> really no such concept, just the lmem_maps ... do we *really* need a
> global pfn to work with for any given struct page?

In my next iteration of the patch I will look at bringing all uses of
mem_map inside the arch header files. This is most of the work needed to
purge it from the numa code, as would be proper.

--
Daniel
|
From: Jack S. <st...@sg...> - 2002-04-09 15:03:16
|
> > Some of the platforms supported by the atlas modifications
> > to the DISCONTIG patch have fairly bizarre requirements.
>
> From my understanding of Daniel's patch, it is of no consequence
> whatsoever how sparse / discontiguous your physical memory layout is,
> as long as your contiguous areas of physical ram are a multiple
> of 2^MAX_ORDER pages - this is just a trick to make sure the buddy
> allocator always returns physically contiguous pages.

Not a problem. (Although I violated this once & experienced very amusing
behavior :-)

> One of the really nice things about the way Daniel does this is that
> it hides the complexities of discontiguous memory from the mem
> allocation code, which is the heavily used path. You only have to
> worry about it on setup, or memory hotplug.

FWIW, Discontig has no changes in the memory allocator. There are a few
minor changes in the boot allocator, though.

> I tried to look at your patch in some more detail (we took a brief
> look before), but I think there's some dependency on a previous ia64
> patch, which I don't recall, and can't find easily - can you point
> me to it?

The patch has no dependencies on any other patches with the exception of
Mosberger's IA64 patch. If you download the DISCONTIG patch from
sourceforge & apply it to a 2.4.17 kernel + IA64 + DISCONTIG, it should be
complete & functional. It should boot & run on several NUMA platforms used
by Atlas members.

(It requires an additional patch to actually run on SGI SN hardware, but
that is unrelated to DISCONTIG. There are still a few mods to IA64 code
that we have had accepted into the IA64 patch).

> From what I could see from your patch, I don't see how __alloc_pages
> walks through the clumps / chunks looking for a valid memory area.
> Could you explain how this works? The only way I could think of
> without the remapping Daniel is doing is to have a separate buddy
> allocator for each clump, or even for each chunk ... or else you
> just have one allocator per zone, and only free the pages into it
> on init that are actually present? (the latter seems the more
> feasible).

The allocator is not aware of clumps or chunks. The allocator is aware
only of nodes. Each node has a single contiguous memmap structure that
contains all the pages on the node.

Consecutive entries in the memmap array are not necessarily consecutive
physical pages. The "holes" caused by non-contiguous clumps are not
reflected in the memmap array.

This works because:

	- page_to_phys() uses page_address() which uses "virtual" in
	  the page struct entry.

	- virt_to_page() uses the clump arrays & the clump map_nr
	  functions CLUMP_MEM_MAP_BASE() & CLUMP_MAP_NR(). These functions
	  use node-local data structures to do the mapping so that no
	  off-node memory references are made. On some architectures,
	  these functions could be purely arithmetic - no memory
	  references. This would not work, however, on the SGI systems
	  because of the sparseness of the nodes & the memory on the
	  nodes.

	- CHUNK sizes are always a multiple of 2^MAX_ORDER pages

> > Memory is conceptually divided into chunks. A chunk is either
> > completely present, or else the kernel assumes it is completely
> > absent. Each node consists of a number of possibly discontiguous
> > chunks.
> >
> > A contiguous group of memory chunks that reside on the same node
> > are referred to as a clump. Note that a clump may be partially
> > present. (Note, on some hardware implementations, a clump is
> > the same as a memory bank or a DIMM).
> >
> > a node consists of multiple clumps of memory. From a NUMA
> > perspective, accesses to all clumps on the node have the same
> > latency. Except for zone issues, the clumps are treated as
> > equivalent for allocation/performance purposes.
>
> I'm unsure why you need this many levels of indirection ... if a
> clump is physically contiguous, why do you need to divide it into
> chunks? Is a chunk just the smallest unit that can be taken on or
> off line in a hot-swap operation?

Chunks are present primarily for kern_addr_valid(). Since clumps may be
only partially present, we need something that has finer grain
resolution. The CHUNK is the unit that is completely present OR
completely absent.

Another way to look at it is that a CLUMP is like a DIMM. A CHUNK is like
the minimum DIMM size - all DIMMs are some multiple of the minimum DIMM
size. (This can get more complicated on some hardware).

> > As far as NUMA is concerned, all page struct entries for the memory on a
> > node are contained on the node. No off-node references are required to
> > reference a local page struct entry. References to off-node page structs
> > don't require off-node references except for the page-struct entry itself.
>
> I don't think this would change for Daniel's stuff - as long as the
> pagetables themselves are allocated on a node-local basis.

--
Thanks

Jack Steiner    (651-683-5302)   (vnet 233-5302)   st...@sg...
|
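For concreteness, the clump-aware lookup Jack describes might be sketched
as follows. CLUMP_MEM_MAP_BASE() and PLAT_CLUMP_OFFSET() are the helpers
named in the mmzone.h comments earlier in the thread, but the signatures
assumed here are guesses:

/* Sketch only: translate a kernel virtual address to its struct page
 * via node-local clump tables, with no global mem_map involved.
 * CLUMP_MEM_MAP_BASE() is assumed to return the page struct of the
 * first page of the clump containing kaddr; PLAT_CLUMP_OFFSET() the
 * byte offset of kaddr within that clump. */
static inline struct page *clump_virt_to_page(unsigned long kaddr)
{
	struct page *base = CLUMP_MEM_MAP_BASE(kaddr);

	return base + (PLAT_CLUMP_OFFSET(kaddr) >> PAGE_SHIFT);
}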
From: Daniel P. <phi...@bo...> - 2002-04-11 20:09:58
|
(sorry for the accidental reposting of the previous message)

On April 9, 2002 05:02 pm, Jack Steiner wrote:
> > One of the really nice things about the way Daniel does this is that
> > it hides the complexities of discontiguous memory from the mem
> > allocation code, which is the heavily used path. You only have to
> > worry about it on setup, or memory hotplug.
>
> FWIW, Discontig has no changes in the memory allocator. There are a few
> minor changes in the boot allocator, though.

You mean your incarnation of discontigmem has no changes in the
alloc_pages path, right? Because the incumbent version adds _alloc_pages,
which is a major change, and a broken one at that, since round robin
allocation makes no sense in the context of non-numa discontigmem usage.

> The allocator is not aware of clumps or chunks. The allocator is aware
> only of nodes. Each node has a single contiguous memmap structure that
> contains all the pages on the node.

Yes, this is also what I set out to achieve.

> Consecutive entries in the memmap array are not necessarily consecutive
> physical pages. The "holes" caused by non-contiguous clumps are not
> reflected in the memmap array.
>
> This works because:
>
>	- page_to_phys() uses page_address() which uses "virtual" in
>	  the page struct entry.

Heads up: ->virtual may be eliminated in 2.5, except for config_highmem.

> [...]
>
> > I'm unsure why you need this many levels of indirection ... if a
> > clump is physically contiguous, why do you need to divide it into
> > chunks? Is a chunk just the smallest unit that can be taken on or
> > off line in a hot-swap operation?
>
> Chunks are present primarily for kern_addr_valid(). Since clumps may be
> only partially present, we need something that has finer grain
> resolution. The CHUNK is the unit that is completely present OR
> completely absent.

When you add/remove chunks, do they appear at arbitrary physical
addresses?

--
Daniel
|
From: Kanoj S. <kan...@ya...> - 2002-04-11 21:46:05
|
--- Daniel Phillips <phi...@bo...> wrote:
> (sorry for the accidental reposting of the previous message)
>
> On April 9, 2002 05:02 pm, Jack Steiner wrote:
> > > One of the really nice things about the way Daniel does this is
> > > that it hides the complexities of discontiguous memory from the
> > > mem allocation code, which is the heavily used path. You only have
> > > to worry about it on setup, or memory hotplug.
> >
> > FWIW, Discontig has no changes in the memory allocator. There are a
> > few minor changes in the boot allocator, though.
>
> You mean your incarnation of discontigmem has no changes in the
> alloc_pages path, right? Because the incumbent version adds
> _alloc_pages, which is a major change, and a broken one at that, since
> round robin allocation makes no sense in the context of non-numa
> discontigmem usage.

It depends ... if you can round robin allocations, you might be able to
prevent any single chunk from going below the low water mark that
triggers page reclamation, simply because pages get freed up after a
while.

Of course, this depends on how you are doing page reclamation. The idea
should be to enable architectures to use a single kswapd for all chunks,
or have 1 per chunk, or some other combination. Basically, whatever is
ideal for the architecture.

Kanoj
|
From: Daniel P. <phi...@bo...> - 2002-04-12 00:18:08
|
On April 11, 2002 11:46 pm, Kanoj Sarcar wrote:
> > You mean your incarnation of discontigmem has no changes in the
> > alloc_pages path, right? Because the incumbent version adds
> > _alloc_pages, which is a major change, and a broken one at that,
> > since round robin allocation makes no sense in the context of
> > non-numa discontigmem usage.
>
> It depends ... if you can round robin allocations, you might be able
> to prevent any single chunk from going below the low water mark that
> triggers page reclamation, simply because pages get freed up after a
> while.

You might not hurt too badly if the discontig chunks are not too
different in size, but it will never be as good as allocating all from
one unified space.

The only valid factor that determines whether an allocation space should
be unified or partitioned is: does the *user* of the memory need to give
different interpretations to different allocation spaces? E.g., highmem
vs zone_normal; one numa node vs another.

> Of course, this depends on how you are doing page reclamation. The
> idea should be to enable architectures to use a single kswapd for all
> chunks, or have 1 per chunk, or some other combination. Basically,
> whatever is ideal for the architecture.

Hmm, I think there are some basic principles at work here and that this
corner of the kernel can be much more consistent across architectures
than it is now.

--
Daniel
|
From: Kanoj S. <kan...@ya...> - 2002-04-12 00:36:19
|
--- Daniel Phillips <phi...@bo...> wrote:
> On April 11, 2002 11:46 pm, Kanoj Sarcar wrote:
> > > You mean your incarnation of discontigmem has no changes in the
> > > alloc_pages path, right? Because the incumbent version adds
> > > _alloc_pages, which is a major change, and a broken one at that,
> > > since round robin allocation makes no sense in the context of
> > > non-numa discontigmem usage.
> >
> > It depends ... if you can round robin allocations, you might be able
> > to prevent any single chunk from going below the low water mark that
> > triggers page reclamation, simply because pages get freed up after a
> > while.
>
> You might not hurt too badly if the discontig chunks are not too
> different in size, but it will never be as good as allocating all from
> one unified space.

Take the example of 5 chunks with 20 pages each in them on a non numa
discontig machine. Say there is a user which asks for 19 pages, maybe one
or more at a time, uses the 19 pages for a while, then releases all of
them. If your strategy is to allocate from chunk 0 all the time till it
runs low, then you will pressurize chunk 0 to go thru page reclamation,
before moving to chunk 1. OTOH, if you round-robin'ed, you would never go
thru any pressurization.

> The only valid factor that determines whether an allocation space
> should be unified or partitioned is: does the *user* of the memory
> need to give different interpretations to different allocation spaces?
> E.g., highmem vs zone_normal; one numa node vs another.

This is definitely one factor that determines which chunk to allocate
from (and you can optimize on x86 for highmem-ok allocations to start
from the last chunk, and directmem-only allocations not to search higher
chunks ... I think there might be some patches on oss.sgi.com/projects/numa
from IBM's Hubertus Franke about this), but 64 bit platforms don't have
this highmem issue.

What I am trying to stress is that even non numa discontiguous machines
might have a reason to need different policies for how to ripple thru
chunks during page allocation. For numa machines, it goes without saying
that the user might want to specify allocation search order.

Kanoj
|
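To put numbers on Kanoj's example, here is a toy user-space simulation
(not kernel code) of the two strategies; the 5-page watermark is an
assumed figure, purely for illustration:

#include <stdio.h>

#define CHUNKS    5
#define PAGES     20
#define WATERMARK 5	/* assumed reclaim trigger, for illustration */

int main(void)
{
	int first_fit[CHUNKS], round_robin[CHUNKS], i, min;

	for (i = 0; i < CHUNKS; i++)
		first_fit[i] = round_robin[i] = PAGES;

	for (i = 0; i < 19; i++) {
		first_fit[0]--;			/* always drain chunk 0 */
		round_robin[i % CHUNKS]--;	/* spread the load */
	}

	for (min = round_robin[0], i = 1; i < CHUNKS; i++)
		if (round_robin[i] < min)
			min = round_robin[i];

	/* first-fit leaves chunk 0 with 1 free page, well under the
	 * watermark, so reclaim kicks in; round-robin leaves at least
	 * 16 pages free in every chunk */
	printf("first-fit: chunk 0 has %d free (watermark %d)\n",
	       first_fit[0], WATERMARK);
	printf("round-robin: minimum free in any chunk is %d\n", min);
	return 0;
}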
From: Matthew W. <wi...@fc...> - 2002-04-12 12:46:24
|
On Thu, Apr 11, 2002 at 05:36:17PM -0700, Kanoj Sarcar wrote:
>
> --- Daniel Phillips <phi...@bo...> wrote:
> > You might not hurt too badly if the discontig chunks are not too
> > different in size, but it will never be as good as allocating all
> > from one unified space.
>
> Take the example of 5 chunks with 20 pages each in them on a non numa
> discontig machine. Say there is a user which asks for 19 pages, maybe
> one or more at a time, uses the 19 pages for a while, then releases
> all of them. If your strategy is to allocate from chunk 0 all the time
> till it runs low, then you will pressurize chunk 0 to go thru page
> reclamation, before moving to chunk 1. OTOH, if you round-robin'ed,
> you would never go thru any pressurization.

But take the example of a machine with three zones, one of 3.75GB, one
of 2GB and one of 256MB. With round-robin, you start to swap after
allocating 0.75GB of ram: each zone gets an equal share, so the 256MB
zone runs dry once 3 x 256MB = 768MB has been handed out, with over 5GB
still free in the other two zones. The current _alloc_pages code is
crap, as I've said before.

--
It's always legal to use Linux (TM) systems
http://www.gnu.org/philosophy/why-free.html
|
From: Kanoj S. <kan...@ya...> - 2002-04-12 17:21:17
|
--- Matthew Wilcox <wi...@fc...> wrote:
> On Thu, Apr 11, 2002 at 05:36:17PM -0700, Kanoj Sarcar wrote:
> >
> > --- Daniel Phillips <phi...@bo...> wrote:
> > > You might not hurt too badly if the discontig chunks are not too
> > > different in size, but it will never be as good as allocating all
> > > from one unified space.
> >
> > Take the example of 5 chunks with 20 pages each in them on a non
> > numa discontig machine. Say there is a user which asks for 19 pages,
> > maybe one or more at a time, uses the 19 pages for a while, then
> > releases all of them. If your strategy is to allocate from chunk 0
> > all the time till it runs low, then you will pressurize chunk 0 to
> > go thru page reclamation, before moving to chunk 1. OTOH, if you
> > round-robin'ed, you would never go thru any pressurization.
>
> But take the example of a machine with three zones, one of 3.75GB, one
> of 2GB and one of 256MB. With round-robin, you start to swap after
> allocating 0.75GB of ram. The current _alloc_pages code is crap, as
> I've said before.

Look at Hubertus' patch on oss.sgi.com/projects/numa that I pointed to
before, it applies intelligence in deciding which is the "best" chunk to
allocate from. Although it was written for x86 numa q with other
highmem/dma complications too.

There will be logic and counterlogic for any one scheme of allocation on
the wide range of platforms that people have. It's best to have a set of
policies that architectures can choose from. Hey, this isn't sounding too
different from numa policies now.

Kanoj
|
From: Daniel P. <phi...@bo...> - 2002-04-12 22:42:44
|
On Friday 12 April 2002 19:21, Kanoj Sarcar wrote:
> --- Matthew Wilcox <wi...@fc...> wrote:
> > But take the example of a machine with three zones, one of 3.75GB,
> > one of 2GB and one of 256MB. With round-robin, you start to swap
> > after allocating 0.75GB of ram. The current _alloc_pages code is
> > crap, as I've said before.
>
> Look at Hubertus' patch on oss.sgi.com/projects/numa that I pointed to
> before, it applies intelligence in deciding which is the "best" chunk
> to allocate from. Although it was written for x86 numa q with other
> highmem/dma complications too.
>
> There will be logic and counterlogic for any one scheme of allocation
> on the wide range of platforms that people have.

Respectfully, I disagree. Nothing beats allocating from a single,
homogeneous space if you can do that. In this case we've managed to buy
that ability and get a little change back on the purchase.

> It's best to have a set of policies that architectures can choose from.

Well, I'll go with Linus on this one. If we *can* come up with a
one-size-fits-all solution that's simple and optimal, and even better,
cross-architecture, then that's the one we want. As opposed to the
complex, hard-to-maintain one that works well but is different for every
architecture, confusing, and suffers bad source-level interactions with
another part of the system, i.e., numa.

> Hey, this isn't sounding too different from numa policies now.
>
> Kanoj

That's exactly why I'm trying to change it :-)

--
Daniel
|
From: Daniel P. <phi...@bo...> - 2002-04-08 22:06:08
|
Here's an updated version of the config_nonlinear patch, with improved
factoring of the primary functions:

   unsigned long logical_to_phys(unsigned long p)
   unsigned long phys_to_logical(unsigned long p)

   unsigned long ordinal_to_phys(unsigned long n)
   unsigned long phys_to_ordinal(unsigned long p)

which have the following simple definitions when config_nonlinear is not
defined:

   #define logical_to_phys(p) (p)
   #define phys_to_logical(p) (p)
   #define ordinal_to_phys(n) ((n) << PAGE_SHIFT)
   #define phys_to_ordinal(p) ((p) >> PAGE_SHIFT)

and otherwise, each is a table translation. I am trying the terminology
'ordinal' in this patch as a substitute for 'pagenum', so we have logical
and ordinal, related by PAGE_SHIFT.

I've adopted the traditional abbreviations 'phys' and 'virt' for physical
and virtual, which makes the patch smaller, but otherwise I don't like
those names much - I'd rather write 'physical_to_virtual' than
'phys_to_virt'. That's just me though, I'd appreciate opinions.

I've had some time now to think about how this patch relates to
config_numa, and my feeling is, it's orthogonal. It is not a lower layer
for numa. The practical implication is that config_discontigmem can be
removed entirely and config_numa can be written in the way that is best
for numa support, and be freed from the necessity to support the non-numa
discontig usage in certain architectures, for which the config_nonlinear
approach is better in every way.

For 32 bit numa, config_nonlinear is needed in order to provide some
zone_normal memory on every node, mapped to the local memory of that node.
In this case, the combination of config_nonlinear and config_numa could be
optimized to avoid extra table lookups in some cases. At this point, I
have not thought deeply about the details.

--- ../2.4.17.uml.clean/arch/um/config.in	Mon Mar 25 17:27:25 2002
+++ ./arch/um/config.in	Fri Apr 5 10:30:08 2002
@@ -36,6 +36,7 @@
 bool '2G/2G host address space split' CONFIG_HOST_2G_2G
 bool 'Symmetric multi-processing support' CONFIG_UML_SMP
 define_bool CONFIG_SMP $CONFIG_UML_SMP
+bool 'Support for nonlinear physical memory' CONFIG_NONLINEAR
 string 'Default main console channel initialization' CONFIG_CON_ZERO_CHAN \
 	"fd:0,fd:1"
 string 'Default console channel initialization' CONFIG_CON_CHAN "xterm"
--- ../2.4.17.uml.clean/arch/um/kernel/process_kern.c	Mon Mar 25 17:27:26 2002
+++ ./arch/um/kernel/process_kern.c	Sat Apr 6 11:07:55 2002
@@ -501,12 +501,8 @@
 #ifdef CONFIG_SMP
 	return("(Unknown)");
 #else
-	unsigned long addr;
-
-	if((addr = um_virt_to_phys(current,
-				   current->mm->arg_start)) == 0xffffffff)
-		return("(Unknown)");
-	else return((char *) addr);
+	unsigned long addr = um_virt_to_phys(current, current->mm->arg_start);
+	return addr == 0xffffffff? "(Unknown)": phys_to_virt(addr);
 #endif
 }
 
--- ../2.4.17.uml.clean/arch/um/kernel/um_arch.c	Mon Mar 25 17:27:27 2002
+++ ./arch/um/kernel/um_arch.c	Sat Apr 6 11:18:22 2002
@@ -270,6 +270,44 @@
 extern int jail;
 void *brk_start;
 
+#ifdef CONFIG_NONLINEAR
+unsigned long psection[MAX_SECTIONS];
+unsigned long vsection[MAX_SECTIONS];
+
+static int init_nonlinear(void)
+{
+	unsigned i, shift = SECTION_SHIFT - PAGE_SHIFT;
+
+	memset(psection, -1, sizeof(psection));
+	memset(vsection, -1, sizeof(vsection));
+	for (i = 0; i < MAX_SECTIONS; i++)
+		psection[i] = (i ^ (i >= 2)) << shift;
+
+	for (i = 0; i < MAX_SECTIONS; i++)
+		if (~psection[i] && psection[i] >> shift < MAX_SECTIONS)
+			vsection[psection[i] >> shift] = i << shift;
+
+	return 0;
+}
+
+static void show_nonlinear(void)
+{
+	int i;
+	printk(">>> logical section to physical num: ");
+	for (i = 0; i < MAX_SECTIONS; i++) printk("%lx ", psection[i]); printk("\n");
+	printk(">>> physical section to logical num: ");
+	for (i = 0; i < MAX_SECTIONS; i++) printk("%lx ", vsection[i]); printk("\n");
+}
+
+#else
+# ifndef nil
+# define nil do { } while (0)
+# endif
+
+#define init_nonlinear() nil
+#define show_nonlinear() nil
+#endif
+
 int linux_main(int argc, char **argv)
 {
 	unsigned long start_pfn, end_pfn, bootmap_size;
@@ -294,6 +332,9 @@
 	/* Start physical memory at least 4M after the current brk */
 	uml_physmem = ROUND_4M(brk_start) + (1 << 22);
 
+	init_nonlinear();
+	show_nonlinear();
+
 	setup_machinename(system_utsname.machine);
 
 	argv1_begin = argv[1];
@@ -322,10 +363,10 @@
 	setup_memory();
 	high_physmem = uml_physmem + physmem_size;
 
-	start_pfn = PFN_UP(__pa(uml_physmem));
-	end_pfn = PFN_DOWN(__pa(high_physmem));
+	start_pfn = PFN_UP(virt_to_logical(uml_physmem));
+	end_pfn = PFN_DOWN(virt_to_logical(high_physmem));
 	bootmap_size = init_bootmem(start_pfn, end_pfn - start_pfn);
-	free_bootmem(__pa(uml_physmem) + bootmap_size,
+	free_bootmem(virt_to_logical(uml_physmem) + bootmap_size,
 		     high_physmem - uml_physmem - bootmap_size);
 	uml_postsetup();
 
--- ../2.4.17.uml.clean/drivers/char/mem.c	Fri Dec 21 18:41:54 2001
+++ ./drivers/char/mem.c	Sat Apr 6 11:43:52 2002
@@ -79,7 +79,7 @@
 	unsigned long end_mem;
 	ssize_t read;
 
-	end_mem = __pa(high_memory);
+	end_mem = virt_to_logical(high_memory);
 	if (p >= end_mem)
 		return 0;
 	if (count > end_mem - p)
@@ -101,7 +101,7 @@
 		}
 	}
 #endif
-	if (copy_to_user(buf, __va(p), count))
+	if (copy_to_user(buf, logical_to_virt(p), count))
 		return -EFAULT;
 	read += count;
 	*ppos += read;
@@ -114,12 +114,12 @@
 	unsigned long p = *ppos;
 	unsigned long end_mem;
 
-	end_mem = __pa(high_memory);
+	end_mem = virt_to_logical(high_memory);
 	if (p >= end_mem)
 		return 0;
 	if (count > end_mem - p)
 		count = end_mem - p;
-	return do_write_mem(file, __va(p), p, buf, count, ppos);
+	return do_write_mem(file, logical_to_virt(p), p, buf, count, ppos);
 }
 
 #ifndef pgprot_noncached
@@ -178,7 +178,7 @@
 		  test_bit(X86_FEATURE_CENTAUR_MCR, &boot_cpu_data.x86_capability) )
 	  && addr >= __pa(high_memory);
 #else
-	return addr >= __pa(high_memory);
+	return addr >= virt_to_phys(high_memory);	// bogosity alert!!
 #endif
 }
 
@@ -200,7 +200,7 @@
 	/*
 	 * Don't dump addresses that are not real memory to a core file.
 	 */
-	if (offset >= __pa(high_memory) || (file->f_flags & O_SYNC))
+	if (offset >= virt_to_logical(high_memory) || (file->f_flags & O_SYNC))
 		vma->vm_flags |= VM_IO;
 	if (remap_page_range(vma->vm_start, offset, vma->vm_end-vma->vm_start,
--- ../2.4.17.uml.clean/fs/proc/kcore.c	Fri Sep 14 01:04:43 2001
+++ ./fs/proc/kcore.c	Mon Apr 8 00:06:28 2002
@@ -50,7 +50,7 @@
 	memset(&dump, 0, sizeof(struct user));
 	dump.magic = CMAGIC;
-	dump.u_dsize = (virt_to_phys(high_memory) >> PAGE_SHIFT);
+	dump.u_dsize = (logical_to_virt(high_memory) >> PAGE_SHIFT);
 #if defined (__i386__) || defined(__x86_64__)
 	dump.start_code = PAGE_OFFSET;
 #endif
@@ -58,7 +58,7 @@
 	dump.start_data = PAGE_OFFSET;
 #endif
 
-	memsize = virt_to_phys(high_memory);
+	memsize = virt_to_logical(high_memory);
 	if (p >= memsize)
 		return 0;
 	if (count > memsize - p)
@@ -239,7 +239,7 @@
 	phdr->p_flags	= PF_R|PF_W|PF_X;
 	phdr->p_offset	= dataoff;
 	phdr->p_vaddr	= PAGE_OFFSET;
-	phdr->p_paddr	= __pa(PAGE_OFFSET);
+	phdr->p_paddr	= virt_to_phys(PAGE_OFFSET);
 	phdr->p_filesz	= phdr->p_memsz = ((unsigned long)high_memory - PAGE_OFFSET);
 	phdr->p_align	= PAGE_SIZE;
 
@@ -256,7 +256,7 @@
 		phdr->p_flags	= PF_R|PF_W|PF_X;
 		phdr->p_offset	= (size_t)m->addr - PAGE_OFFSET + dataoff;
 		phdr->p_vaddr	= (size_t)m->addr;
-		phdr->p_paddr	= __pa(m->addr);
+		phdr->p_paddr	= virt_to_phys(m->addr);
 		phdr->p_filesz	= phdr->p_memsz	= m->size;
 		phdr->p_align	= PAGE_SIZE;
 	}
@@ -382,7 +382,7 @@
 	}
 #endif
 	/* fill the remainder of the buffer from kernel VM space */
-	start = (unsigned long)__va(*fpos - elf_buflen);
+	start = (unsigned long) logical_to_virt(*fpos - elf_buflen);
 
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
--- ../2.4.17.uml.clean/include/asm-i386/io.h	Wed Mar 27 23:31:33 2002
+++ ./include/asm-i386/io.h	Sat Apr 6 11:13:25 2002
@@ -60,20 +60,6 @@
 #endif
 
 /*
- * Change virtual addresses to physical addresses and vv.
- * These are pretty trivial
- */
-static inline unsigned long virt_to_phys(void *address)
-{
-	return __pa(address);
-}
-
-static inline void * phys_to_virt(unsigned long address)
-{
-	return __va(address);
-}
-
-/*
  * Change "struct page" to physical address.
  */
 #define page_to_phys(page)	((page - mem_map) << PAGE_SHIFT)
--- ../2.4.17.uml.clean/include/asm-um/page.h	Wed Mar 27 23:31:33 2002
+++ ./include/asm-um/page.h	Sat Apr 6 11:12:38 2002
@@ -29,30 +29,81 @@
 #endif /* __ASSEMBLY__ */
 
+#define __va_space (8*1024*1024)
+
 extern unsigned long uml_physmem;
+extern unsigned long max_mapnr;
 
-#define __va_space (8*1024*1024)
+static inline int VALID_PAGE(struct page *page)
+{
+	return page - mem_map < max_mapnr;
+}
 
-static inline unsigned long __pa(void *virt)
+static inline void *logical_to_virt(unsigned long p)
 {
-	return (unsigned long) (virt) - PAGE_OFFSET;
+	return (void *) ((unsigned long) p + PAGE_OFFSET);
 }
 
-static inline void *__va(unsigned long phys)
+static inline unsigned long virt_to_logical(void *v)
 {
-	return (void *) ((unsigned long) (phys) + PAGE_OFFSET);
+// assert(it's a kernel virtual);
+	return (unsigned long) v - PAGE_OFFSET;
 }
 
-static inline struct page *virt_to_page(void *kaddr)
+#ifdef CONFIG_NONLINEAR
+#define MAX_SECTIONS (32)
+#define SECTION_SHIFT 20	/* 1 meg sections */
+#define SECTION_MASK (~(-1 << SECTION_SHIFT))
+
+extern unsigned long psection[MAX_SECTIONS];
+extern unsigned long vsection[MAX_SECTIONS];
+
+static inline unsigned long logical_to_phys(unsigned long p)
 {
-	return mem_map + (__pa(kaddr) >> PAGE_SHIFT);
+	return (psection[p >> SECTION_SHIFT] << PAGE_SHIFT) + (p & SECTION_MASK);
 }
 
-extern unsigned long max_mapnr;
+static inline unsigned long phys_to_logical(unsigned long p)
+{
+	return (vsection[p >> SECTION_SHIFT] << PAGE_SHIFT) + (p & SECTION_MASK);
+}
 
-static inline int VALID_PAGE(struct page *page)
+static inline unsigned long ordinal_to_phys(unsigned long n)
 {
-	return page - mem_map < max_mapnr;
+	return ( psection[n >> (SECTION_SHIFT - PAGE_SHIFT)]
+		+ (n & (SECTION_MASK >> PAGE_SHIFT)) ) << PAGE_SHIFT;
+}
+
+static inline unsigned long phys_to_ordinal(unsigned long p)
+{
+	return vsection[p >> SECTION_SHIFT] + ((p & SECTION_MASK) >> PAGE_SHIFT);
+}
+
+#else
+#define logical_to_phys(p) (p)
+#define phys_to_logical(p) (p)
+#define ordinal_to_phys(n) ((n) << PAGE_SHIFT)
+#define phys_to_ordinal(p) ((p) >> PAGE_SHIFT)
+#endif /* CONFIG_NONLINEAR */
+
+static inline struct page *virt_to_page(void *v)
+{
+	return mem_map + (virt_to_logical(v) >> PAGE_SHIFT);
+}
+
+static inline struct page *phys_to_page(unsigned long p)
+{
+	return mem_map + phys_to_ordinal(p);
+}
+
+static inline unsigned long virt_to_phys(void *v)
+{
+	return logical_to_phys(virt_to_logical(v));
+}
+
+static inline void *phys_to_virt(unsigned long p)
+{
+	return logical_to_virt(phys_to_logical(p));
 }
 
 #endif
--- ../2.4.17.uml.clean/include/asm-um/pgtable.h	Mon Mar 25 17:27:28 2002
+++ ./include/asm-um/pgtable.h	Sat Apr 6 11:13:25 2002
@@ -197,9 +197,8 @@
 #define page_address(page) ({ if (!(page)->virtual) BUG(); (page)->virtual; })
 #define __page_address(page) ({ PAGE_OFFSET + (((page) - mem_map) << PAGE_SHIFT); })
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))
-#define pte_page(x) \
-	(mem_map+((unsigned long)((__pa(pte_val(x)) >> PAGE_SHIFT))))
-#define pte_address(x) ((void *) ((unsigned long) pte_val(x) & PAGE_MASK))
+#define pte_page(x) (mem_map + phys_to_ordinal(pte_val(x)))
+#define pte_address(x) (phys_to_virt(pte_val(x) & PAGE_MASK))
 
 static inline pte_t pte_mknewprot(pte_t pte)
 {
@@ -313,18 +312,17 @@
  * and a page entry and page directory to the page they refer to.
  */
-#define mk_pte(page, pgprot) \
-({ \
-	pte_t __pte; \
- \
-	pte_val(__pte) = ((unsigned long) __va((page-mem_map)*(unsigned long)PAGE_SIZE + pgprot_val(pgprot))); \
-	if(pte_present(__pte)) pte_mknewprot(pte_mknewpage(__pte)); \
-	__pte; \
-})
+static inline pte_t mk_pte(struct page *page, pgprot_t pgprot) {
+	pte_t pte;
+	pte_val(pte) = ordinal_to_phys(page - mem_map) + pgprot_val(pgprot);
+	if (pte_present(pte))
+		pte_mknewprot(pte_mknewpage(pte));
+	return pte;
+}
 
 /* This takes a physical page address that is used by the remapping functions */
 #define mk_pte_phys(physpage, pgprot) \
-({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); __pte; })
+({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); BUG(); __pte; })
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
--- ../2.4.17.uml.clean/include/linux/bootmem.h	Thu Nov 22 20:47:23 2001
+++ ./include/linux/bootmem.h	Sat Apr 6 11:18:19 2002
@@ -35,11 +35,11 @@
 extern void __init free_bootmem (unsigned long addr, unsigned long size);
 extern void * __init __alloc_bootmem (unsigned long size, unsigned long align, unsigned long goal);
 #define alloc_bootmem(x) \
-	__alloc_bootmem((x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem((x), SMP_CACHE_BYTES, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem((x), SMP_CACHE_BYTES, 0)
 #define alloc_bootmem_pages(x) \
-	__alloc_bootmem((x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem((x), PAGE_SIZE, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low_pages(x) \
 	__alloc_bootmem((x), PAGE_SIZE, 0)
 extern unsigned long __init free_all_bootmem (void);
@@ -50,9 +50,9 @@
 extern unsigned long __init free_all_bootmem_node (pg_data_t *pgdat);
 extern void * __init __alloc_bootmem_node (pg_data_t *pgdat, unsigned long size, unsigned long align, unsigned long goal);
 #define alloc_bootmem_node(pgdat, x) \
-	__alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_node(pgdat, x) \
-	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, 0)
--- ../2.4.17.uml.clean/mm/bootmem.c	Fri Dec 21 18:42:04 2001
+++ ./mm/bootmem.c	Sat Apr 6 11:47:26 2002
@@ -51,7 +51,7 @@
 	pgdat_list = pgdat;
 
 	mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);
-	bdata->node_bootmem_map = phys_to_virt(mapstart << PAGE_SHIFT);
+	bdata->node_bootmem_map = logical_to_virt(mapstart << PAGE_SHIFT);
 	bdata->node_boot_start = (start << PAGE_SHIFT);
 	bdata->node_low_pfn = end;
 
@@ -214,12 +214,12 @@
 			areasize = 0;	// last_pos unchanged
 			bdata->last_offset = offset+size;
-			ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
+			ret = logical_to_virt(bdata->last_pos*PAGE_SIZE + offset +
 				bdata->node_boot_start);
 		} else {
 			remaining_size = size - remaining_size;
 			areasize = (remaining_size+PAGE_SIZE-1)/PAGE_SIZE;
-			ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
+			ret = logical_to_virt(bdata->last_pos*PAGE_SIZE + offset +
 				bdata->node_boot_start);
 			bdata->last_pos = start+areasize-1;
 			bdata->last_offset = remaining_size;
 		}
@@ -228,7 +228,7 @@
 	} else {
 		bdata->last_pos = start + areasize - 1;
 		bdata->last_offset = size & ~PAGE_MASK;
-		ret = phys_to_virt(start * PAGE_SIZE + bdata->node_boot_start);
+		ret = logical_to_virt(start * PAGE_SIZE + bdata->node_boot_start);
 	}
 	/*
 	 * Reserve the area now:
--- ../2.4.17.uml.clean/mm/memory.c	Fri Dec 21 18:42:05 2001
+++ ./mm/memory.c	Mon Apr 8 00:09:23 2002
@@ -806,7 +806,7 @@
 		pte_t oldpage;
 		oldpage = ptep_get_and_clear(pte);
 
-		page = virt_to_page(__va(phys_addr));
+		page = phys_to_page(phys_addr);
 		if ((!VALID_PAGE(page)) || PageReserved(page))
 			set_pte(pte, mk_pte_phys(phys_addr, prot));
 		forget_pte(oldpage);
--- ../2.4.17.uml.clean/mm/page_alloc.c	Tue Nov 20 01:35:40 2001
+++ ./mm/page_alloc.c	Sat Apr 6 11:42:43 2002
@@ -735,7 +735,7 @@
 			struct page *page = mem_map + offset + i;
 			page->zone = zone;
 			if (j != ZONE_HIGHMEM)
-				page->virtual = __va(zone_start_paddr);
+				page->virtual = logical_to_virt(zone_start_paddr);
 			zone_start_paddr += PAGE_SIZE;
 		}
|
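For reference, a worked trace of init_nonlinear() from the patch above,
assuming UML's PAGE_SHIFT of 12, so shift == SECTION_SHIFT - PAGE_SHIFT == 8:

/* (i >= 2) evaluates to 0 or 1, so sections 0 and 1 map to themselves
 * and every later even/odd pair is swapped:
 *
 *   i (logical section)   i ^ (i >= 2)   psection[i] (page number)
 *   0                     0              0x000
 *   1                     1              0x100
 *   2                     3              0x300
 *   3                     2              0x200
 *   4                     5              0x500
 *   5                     4              0x400
 *
 * With 1MB sections and 4K pages, each section is 0x100 pages, so this
 * is exactly the mapping described in the follow-up message: megabytes
 * 2 and 3 trade places, 4 and 5 trade places, and so on, while the
 * first two megabytes stay put. */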
From: Daniel P. <phi...@bo...> - 2002-04-11 23:47:30
|
Here's a updated version of the config_nonlinear patch, with improved factoring of the primary functions: unsigned long logical_to_phys(unsigned long p) unsigned long phys_to_logical(unsigned long p) unsigned long ordinal_to_phys(unsigned long n) unsigned long phys_to_ordinal(unsigned long p) which have the following simple definitions when config_nonlinear is not defined: #define logical_to_phys(p) (p) #define phys_to_logical(p) (p) #define ordinal_to_phys(n) ((n) << PAGE_SHIFT) #define phys_to_ordinal(p) ((p) >> PAGE_SHIFT) and otherwise, each is a table translation. I am trying the terminology 'ordinal' in this patch as a substitute for 'pagenum', so we have logical and ordinal, related by PAGE_SHIFT. I've adopted the tradition abbreviations 'phys' and 'virt' for physical and virtual, which makes the patch smaller, but otherwise I don't like those names much - I'd rather write 'physical_to_virtual' than 'phys_to_virt'. That's just me though, I'd appreciate opinions. I've had some time now to think about how this patch relates to config_numa, and my feeling is, it's orthogonal. It is not a lower layer for numa. The practical implication is that config_discontigmem can be removed entirely and config_numa can be written in the way that is best for numa support, and be freed from the necessity to support the non-numa discontig usage in certain architectures, for which the config_nonlinear approach is better in every way. For 32 bit numa, config_nonlinear is needed in order to provide some zone_normal memory on every node, mapped to the local memory of that node. In this case, the combination of config_nonlinear and config_numa could be optimized to avoid extra table lookups in some cases. At this point, I have not thought deeply about the details. --- ../2.4.17.uml.clean/arch/um/config.in Mon Mar 25 17:27:25 2002 +++ ./arch/um/config.in Fri Apr 5 10:30:08 2002 @@ -36,6 +36,7 @@ bool '2G/2G host address space split' CONFIG_HOST_2G_2G bool 'Symmetric multi-processing support' CONFIG_UML_SMP define_bool CONFIG_SMP $CONFIG_UML_SMP +bool 'Support for nonlinear physical memory' CONFIG_NONLINEAR string 'Default main console channel initialization' CONFIG_CON_ZERO_CHAN \ "fd:0,fd:1" string 'Default console channel initialization' CONFIG_CON_CHAN "xterm" --- ../2.4.17.uml.clean/arch/um/kernel/process_kern.c Mon Mar 25 17:27:26 2002 +++ ./arch/um/kernel/process_kern.c Sat Apr 6 11:07:55 2002 @@ -501,12 +501,8 @@ #ifdef CONFIG_SMP return("(Unknown)"); #else - unsigned long addr; - - if((addr = um_virt_to_phys(current, - current->mm->arg_start)) == 0xffffffff) - return("(Unknown)"); - else return((char *) addr); + unsigned long addr = um_virt_to_phys(current, current->mm->arg_start); + return addr == 0xffffffff? 
"(Unknown)": phys_to_virt(addr); #endif } --- ../2.4.17.uml.clean/arch/um/kernel/um_arch.c Mon Mar 25 17:27:27 2002 +++ ./arch/um/kernel/um_arch.c Sat Apr 6 11:18:22 2002 @@ -270,6 +270,44 @@ extern int jail; void *brk_start; +#ifdef CONFIG_NONLINEAR +unsigned long psection[MAX_SECTIONS]; +unsigned long vsection[MAX_SECTIONS]; + +static int init_nonlinear(void) +{ + unsigned i, shift = SECTION_SHIFT - PAGE_SHIFT; + + memset(psection, -1, sizeof(psection)); + memset(vsection, -1, sizeof(vsection)); + for (i = 0; i < MAX_SECTIONS; i++) + psection[i] = (i ^ (i >= 2)) << shift; + + for (i = 0; i < MAX_SECTIONS; i++) + if (~psection[i] && psection[i] >> shift < MAX_SECTIONS) + vsection[psection[i] >> shift] = i << shift; + + return 0; +} + +static void show_nonlinear(void) +{ + int i; + printk(">>> logical section to physical num: "); + for (i = 0; i < MAX_SECTIONS; i++) printk("%lx ", psection[i]); printk("\n"); + printk(">>> physical section to logical num: "); + for (i = 0; i < MAX_SECTIONS; i++) printk("%lx ", vsection[i]); printk("\n"); +} + +#else +# ifndef nil +# define nil do { } while (0) +# endif + +#define init_nonlinear() nil +#define show_nonlinear() nil +#endif + int linux_main(int argc, char **argv) { unsigned long start_pfn, end_pfn, bootmap_size; @@ -294,6 +332,9 @@ /* Start physical memory at least 4M after the current brk */ uml_physmem = ROUND_4M(brk_start) + (1 << 22); + init_nonlinear(); + show_nonlinear(); + setup_machinename(system_utsname.machine); argv1_begin = argv[1]; @@ -322,10 +363,10 @@ setup_memory(); high_physmem = uml_physmem + physmem_size; - start_pfn = PFN_UP(__pa(uml_physmem)); - end_pfn = PFN_DOWN(__pa(high_physmem)); + start_pfn = PFN_UP(virt_to_logical(uml_physmem)); + end_pfn = PFN_DOWN(virt_to_logical(high_physmem)); bootmap_size = init_bootmem(start_pfn, end_pfn - start_pfn); - free_bootmem(__pa(uml_physmem) + bootmap_size, + free_bootmem(virt_to_logical(uml_physmem) + bootmap_size, high_physmem - uml_physmem - bootmap_size); uml_postsetup(); --- ../2.4.17.uml.clean/drivers/char/mem.c Fri Dec 21 18:41:54 2001 +++ ./drivers/char/mem.c Sat Apr 6 11:43:52 2002 @@ -79,7 +79,7 @@ unsigned long end_mem; ssize_t read; - end_mem = __pa(high_memory); + end_mem = virt_to_logical(high_memory); if (p >= end_mem) return 0; if (count > end_mem - p) @@ -101,7 +101,7 @@ } } #endif - if (copy_to_user(buf, __va(p), count)) + if (copy_to_user(buf, logical_to_virt(p), count)) return -EFAULT; read += count; *ppos += read; @@ -114,12 +114,12 @@ unsigned long p = *ppos; unsigned long end_mem; - end_mem = __pa(high_memory); + end_mem = virt_to_logical(high_memory); if (p >= end_mem) return 0; if (count > end_mem - p) count = end_mem - p; - return do_write_mem(file, __va(p), p, buf, count, ppos); + return do_write_mem(file, logical_to_virt(p), p, buf, count, ppos); } #ifndef pgprot_noncached @@ -178,7 +178,7 @@ test_bit(X86_FEATURE_CENTAUR_MCR, &boot_cpu_data.x86_capability) ) && addr >= __pa(high_memory); #else - return addr >= __pa(high_memory); + return addr >= virt_to_phys(high_memory); // bogosity alert!! #endif } @@ -200,7 +200,7 @@ /* * Don't dump addresses that are not real memory to a core file. 
*/ - if (offset >= __pa(high_memory) || (file->f_flags & O_SYNC)) + if (offset >= virt_to_logical(high_memory) || (file->f_flags & O_SYNC)) vma->vm_flags |= VM_IO; if (remap_page_range(vma->vm_start, offset, vma->vm_end-vma->vm_start, --- ../2.4.17.uml.clean/fs/proc/kcore.c Fri Sep 14 01:04:43 2001 +++ ./fs/proc/kcore.c Mon Apr 8 00:06:28 2002 @@ -50,7 +50,7 @@ memset(&dump, 0, sizeof(struct user)); dump.magic = CMAGIC; - dump.u_dsize = (virt_to_phys(high_memory) >> PAGE_SHIFT); + dump.u_dsize = (logical_to_virt(high_memory) >> PAGE_SHIFT); #if defined (__i386__) || defined(__x86_64__) dump.start_code = PAGE_OFFSET; #endif @@ -58,7 +58,7 @@ dump.start_data = PAGE_OFFSET; #endif - memsize = virt_to_phys(high_memory); + memsize = virt_to_logical(high_memory); if (p >= memsize) return 0; if (count > memsize - p) @@ -239,7 +239,7 @@ phdr->p_flags = PF_R|PF_W|PF_X; phdr->p_offset = dataoff; phdr->p_vaddr = PAGE_OFFSET; - phdr->p_paddr = __pa(PAGE_OFFSET); + phdr->p_paddr = virt_to_phys(PAGE_OFFSET); phdr->p_filesz = phdr->p_memsz = ((unsigned long)high_memory - PAGE_OFFSET); phdr->p_align = PAGE_SIZE; @@ -256,7 +256,7 @@ phdr->p_flags = PF_R|PF_W|PF_X; phdr->p_offset = (size_t)m->addr - PAGE_OFFSET + dataoff; phdr->p_vaddr = (size_t)m->addr; - phdr->p_paddr = __pa(m->addr); + phdr->p_paddr = virt_to_phys(m->addr); phdr->p_filesz = phdr->p_memsz = m->size; phdr->p_align = PAGE_SIZE; } @@ -382,7 +382,7 @@ } #endif /* fill the remainder of the buffer from kernel VM space */ - start = (unsigned long)__va(*fpos - elf_buflen); + start = (unsigned long) logical_to_virt(*fpos - elf_buflen); if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) tsz = buflen; --- ../2.4.17.uml.clean/include/asm-i386/io.h Wed Mar 27 23:31:33 2002 +++ ./include/asm-i386/io.h Sat Apr 6 11:13:25 2002 @@ -60,20 +60,6 @@ #endif /* - * Change virtual addresses to physical addresses and vv. - * These are pretty trivial - */ -static inline unsigned long virt_to_phys(void *address) -{ - return __pa(address); -} - -static inline void * phys_to_virt(unsigned long address) -{ - return __va(address); -} - -/* * Change "struct page" to physical address. 
*/ #define page_to_phys(page) ((page - mem_map) << PAGE_SHIFT) --- ../2.4.17.uml.clean/include/asm-um/page.h Wed Mar 27 23:31:33 2002 +++ ./include/asm-um/page.h Sat Apr 6 11:12:38 2002 @@ -29,30 +29,81 @@ #endif /* __ASSEMBLY__ */ +#define __va_space (8*1024*1024) + extern unsigned long uml_physmem; +extern unsigned long max_mapnr; -#define __va_space (8*1024*1024) +static inline int VALID_PAGE(struct page *page) +{ + return page - mem_map < max_mapnr; +} -static inline unsigned long __pa(void *virt) +static inline void *logical_to_virt(unsigned long p) { - return (unsigned long) (virt) - PAGE_OFFSET; + return (void *) ((unsigned long) p + PAGE_OFFSET); } -static inline void *__va(unsigned long phys) +static inline unsigned long virt_to_logical(void *v) { - return (void *) ((unsigned long) (phys) + PAGE_OFFSET); +// assert(it's a kernel virtual); + return (unsigned long) v - PAGE_OFFSET; } -static inline struct page *virt_to_page(void *kaddr) +#ifdef CONFIG_NONLINEAR +#define MAX_SECTIONS (32) +#define SECTION_SHIFT 20 /* 1 meg sections */ +#define SECTION_MASK (~(-1 << SECTION_SHIFT)) + +extern unsigned long psection[MAX_SECTIONS]; +extern unsigned long vsection[MAX_SECTIONS]; + +static inline unsigned long logical_to_phys(unsigned long p) { - return mem_map + (__pa(kaddr) >> PAGE_SHIFT); + return (psection[p >> SECTION_SHIFT] << PAGE_SHIFT) + (p & SECTION_MASK); } -extern unsigned long max_mapnr; +static inline unsigned long phys_to_logical(unsigned long p) +{ + return (vsection[p >> SECTION_SHIFT] << PAGE_SHIFT) + (p & SECTION_MASK); +} -static inline int VALID_PAGE(struct page *page) +static inline unsigned long ordinal_to_phys(unsigned long n) { - return page - mem_map < max_mapnr; + return ( psection[n >> (SECTION_SHIFT - PAGE_SHIFT)] + + (n & (SECTION_MASK >> PAGE_SHIFT)) ) << PAGE_SHIFT; +} + +static inline unsigned long phys_to_ordinal(unsigned long p) +{ + return vsection[p >> SECTION_SHIFT] + ((p & SECTION_MASK) >> PAGE_SHIFT); +} + +#else +#define logical_to_phys(p) (p) +#define phys_to_logical(p) (p) +#define ordinal_to_phys(n) ((n) << PAGE_SHIFT) +#define phys_to_ordinal(p) ((p) >> PAGE_SHIFT) +#endif /* CONFIG_NONLINEAR */ + +static inline struct page *virt_to_page(void *v) +{ + return mem_map + (virt_to_logical(v) >> PAGE_SHIFT); +} + +static inline struct page *phys_to_page(unsigned long p) +{ + return mem_map + phys_to_ordinal(p); +} + +static inline unsigned long virt_to_phys(void *v) +{ + return logical_to_phys(virt_to_logical(v)); +} + +static inline void *phys_to_virt(unsigned long p) +{ + return logical_to_virt(phys_to_logical(p)); } #endif --- ../2.4.17.uml.clean/include/asm-um/pgtable.h Mon Mar 25 17:27:28 2002 +++ ./include/asm-um/pgtable.h Sat Apr 6 11:13:25 2002 @@ -197,9 +197,8 @@ #define page_address(page) ({ if (!(page)->virtual) BUG(); (page)->virtual; }) #define __page_address(page) ({ PAGE_OFFSET + (((page) - mem_map) << PAGE_SHIFT); }) #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT)) -#define pte_page(x) \ - (mem_map+((unsigned long)((__pa(pte_val(x)) >> PAGE_SHIFT)))) -#define pte_address(x) ((void *) ((unsigned long) pte_val(x) & PAGE_MASK)) +#define pte_page(x) (mem_map + phys_to_ordinal(pte_val(x))) +#define pte_address(x) (phys_to_virt(pte_val(x) & PAGE_MASK)) static inline pte_t pte_mknewprot(pte_t pte) { @@ -313,18 +312,17 @@ * and a page entry and page directory to the page they refer to. 
 */
-#define mk_pte(page, pgprot) \
-({ \
-	pte_t __pte; \
-	\
-	pte_val(__pte) = ((unsigned long) __va((page-mem_map)*(unsigned long)PAGE_SIZE + pgprot_val(pgprot))); \
-	if(pte_present(__pte)) pte_mknewprot(pte_mknewpage(__pte)); \
-	__pte; \
-})
+static inline pte_t mk_pte(struct page *page, pgprot_t pgprot) {
+	pte_t pte;
+	pte_val(pte) = ordinal_to_phys(page - mem_map) + pgprot_val(pgprot);
+	if (pte_present(pte))
+		pte_mknewprot(pte_mknewpage(pte));
+	return pte;
+}
 
 /* This takes a physical page address that is used by the remapping functions */
 #define mk_pte_phys(physpage, pgprot) \
-({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); __pte; })
+({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); BUG(); __pte; })
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
--- ../2.4.17.uml.clean/include/linux/bootmem.h	Thu Nov 22 20:47:23 2001
+++ ./include/linux/bootmem.h	Sat Apr 6 11:18:19 2002
@@ -35,11 +35,11 @@
 extern void __init free_bootmem (unsigned long addr, unsigned long size);
 extern void * __init __alloc_bootmem (unsigned long size, unsigned long align, unsigned long goal);
 #define alloc_bootmem(x) \
-	__alloc_bootmem((x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem((x), SMP_CACHE_BYTES, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem((x), SMP_CACHE_BYTES, 0)
 #define alloc_bootmem_pages(x) \
-	__alloc_bootmem((x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem((x), PAGE_SIZE, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low_pages(x) \
 	__alloc_bootmem((x), PAGE_SIZE, 0)
 extern unsigned long __init free_all_bootmem (void);
@@ -50,9 +50,9 @@
 extern unsigned long __init free_all_bootmem_node (pg_data_t *pgdat);
 extern void * __init __alloc_bootmem_node (pg_data_t *pgdat, unsigned long size, unsigned long align, unsigned long goal);
 #define alloc_bootmem_node(pgdat, x) \
-	__alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node((pgdat), (x), SMP_CACHE_BYTES, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_node(pgdat, x) \
-	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, virt_to_logical((void *) MAX_DMA_ADDRESS))
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, 0)
--- ../2.4.17.uml.clean/mm/bootmem.c	Fri Dec 21 18:42:04 2001
+++ ./mm/bootmem.c	Sat Apr 6 11:47:26 2002
@@ -51,7 +51,7 @@
 	pgdat_list = pgdat;
 
 	mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);
-	bdata->node_bootmem_map = phys_to_virt(mapstart << PAGE_SHIFT);
+	bdata->node_bootmem_map = logical_to_virt(mapstart << PAGE_SHIFT);
 	bdata->node_boot_start = (start << PAGE_SHIFT);
 	bdata->node_low_pfn = end;
@@ -214,12 +214,12 @@
 			areasize = 0;
 			// last_pos unchanged
 			bdata->last_offset = offset+size;
-			ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
+			ret = logical_to_virt(bdata->last_pos*PAGE_SIZE + offset +
 				bdata->node_boot_start);
 		} else {
 			remaining_size = size - remaining_size;
 			areasize = (remaining_size+PAGE_SIZE-1)/PAGE_SIZE;
-			ret = phys_to_virt(bdata->last_pos*PAGE_SIZE + offset +
+			ret = logical_to_virt(bdata->last_pos*PAGE_SIZE + offset +
 				bdata->node_boot_start);
 			bdata->last_pos = start+areasize-1;
 			bdata->last_offset = remaining_size;
@@ -228,7 +228,7 @@
 	} else {
 		bdata->last_pos = start + areasize - 1;
 		bdata->last_offset = size & ~PAGE_MASK;
-		ret = phys_to_virt(start * PAGE_SIZE + bdata->node_boot_start);
+		ret = logical_to_virt(start * PAGE_SIZE + bdata->node_boot_start);
 	}
 	/*
 	 * Reserve the area now:
--- ../2.4.17.uml.clean/mm/memory.c	Fri Dec 21 18:42:05 2001
+++ ./mm/memory.c	Mon Apr 8 00:09:23 2002
@@ -806,7 +806,7 @@
 	pte_t oldpage;
 	oldpage = ptep_get_and_clear(pte);
 
-	page = virt_to_page(__va(phys_addr));
+	page = phys_to_page(phys_addr);
 	if ((!VALID_PAGE(page)) || PageReserved(page))
 		set_pte(pte, mk_pte_phys(phys_addr, prot));
 	forget_pte(oldpage);
--- ../2.4.17.uml.clean/mm/page_alloc.c	Tue Nov 20 01:35:40 2001
+++ ./mm/page_alloc.c	Sat Apr 6 11:42:43 2002
@@ -735,7 +735,7 @@
 		struct page *page = mem_map + offset + i;
 		page->zone = zone;
 		if (j != ZONE_HIGHMEM)
-			page->virtual = __va(zone_start_paddr);
+			page->virtual = logical_to_virt(zone_start_paddr);
 		zone_start_paddr += PAGE_SIZE;
 	}
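
Note that the mm/memory.c hunk above uses phys_to_page, which is defined
elsewhere in the patch set rather than in these hunks. Roughly, it composes
the ordinal translation (phys_to_ordinal, see the follow-up message below)
with the usual mem_map lookup, along these lines:

	/* physical address -> struct page, via the ordinal (page number) translation */
	#define phys_to_page(p) (mem_map + phys_to_ordinal(p))

which replaces the old virt_to_page(__va(...)) round trip through the
virtual address.
|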
From: Daniel P. <phi...@bo...> - 2002-04-28 16:22:19
|
Hi Jeff,

I updated this patch set and repackaged it for you. Here's the original
message, from lse-tech:

On Thursday 11 April 2002 02:08, Daniel Phillips wrote:
> Here's an updated version of the config_nonlinear patch, with improved
> factoring of the primary functions:
>
>    unsigned long logical_to_phys(unsigned long p)
>    unsigned long phys_to_logical(unsigned long p)
>
>    unsigned long ordinal_to_phys(unsigned long n)
>    unsigned long phys_to_ordinal(unsigned long p)
>
> which have the following simple definitions when config_nonlinear is not
> defined:
>
>    #define logical_to_phys(p) (p)
>    #define phys_to_logical(p) (p)
>    #define ordinal_to_phys(n) ((n) << PAGE_SHIFT)
>    #define phys_to_ordinal(p) ((p) >> PAGE_SHIFT)
>
> and otherwise, each is a table translation. I am trying the terminology
> 'ordinal' in this patch as a substitute for 'pagenum', so we have logical
> and ordinal, related by PAGE_SHIFT.
>
> I've adopted the traditional abbreviations 'phys' and 'virt' for physical
> and virtual, which makes the patch smaller, but otherwise I don't like
> those names much - I'd rather write 'physical_to_virtual' than
> 'phys_to_virt'. That's just me though; I'd appreciate opinions.
>
> I've had some time now to think about how this patch relates to
> config_numa, and my feeling is, it's orthogonal. It is not a lower layer
> for numa. The practical implication is that config_discontigmem can be
> removed entirely and config_numa can be written in the way that is best
> for numa support, freed from the necessity to support the non-numa
> discontig usage in certain architectures, for which the config_nonlinear
> approach is better in every way.
>
> For 32 bit numa, config_nonlinear is needed in order to provide some
> zone_normal memory on every node, mapped to the local memory of that
> node. In this case, the combination of config_nonlinear and config_numa
> could be optimized to avoid extra table lookups in some cases. At this
> point, I have not thought deeply about the details.

The final patch in this set changes uml's virtual to/from physical mapping
so that every 2nd megabyte of 'physical' memory is swapped, except for the
first two megabytes, which I didn't change because then I would have had to
change the way the kernel program text etc. is copied at bootup. See the
dump of the section table at bootup to confirm the remapping.

As far as I know, everything continues to function properly with this
strange mapping, though I have not exercised it heavily at this point. To
be sure, there is no advantage in doing this with uml that I know of. This
is a testbed for the new config_nonlinear, which is quite important for the
embedded system I'm working on at the moment, and also opens new
possibilities for numa. Not to mention that it offers a way to clean up the
config_discontig mess.

Patches apply with -p1, in this order:

   early.page.generic-2.4.17
   early.page.i386-2.4.17
   early.page.uml-2.4.17
   nonlinear.generic-2.4.17
   nonlinear.uml-2.4.17

-- 
Daniel
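
P.S. For concreteness, here is a rough sketch of what one of these table
translations might look like with config_nonlinear enabled. Take the names
and constants as illustrative only - see nonlinear.generic-2.4.17 for the
real definitions:

	#define SECTION_SHIFT 25			/* e.g., 32 MB sections */
	#define SECTION_MASK ((1UL << SECTION_SHIFT) - 1)

	/* One entry per logical section, giving the physical section base */
	extern unsigned long logical_to_phys_map[];

	static inline unsigned long logical_to_phys(unsigned long p)
	{
		/* high bits index the small section table, low bits pass through */
		return logical_to_phys_map[p >> SECTION_SHIFT] + (p & SECTION_MASK);
	}

The other three primitives are the same kind of lookup on the high bits,
with the shift by PAGE_SHIFT folded in for the ordinal pair.
|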