From: Andy P. <at...@us...> - 2001-09-19 13:54:23
Update of /cvsroot/linux-vax/kernel/Documentation/vax
In directory usw-pr-cvs1:/tmp/cvs-serv26794

Modified Files:
    memory.txt
Added Files:
    task-memory.txt
Log Message:
Update documentation to reflect new memory layout for tasks

--- NEW FILE ---
$Id: task-memory.txt,v 1.1 2001/09/19 13:54:19 atp Exp $

atp Sept 2001

For more details on the memory layout and details of the process page
tables, see the memory.txt file in this directory.

If you see this message in your system logs, then this file is for you:

    VAXMM: process 81292000 exceeded TASK_WSMAX (64MB) addr 4000000
    VAXMM pte_alloc: sending SIGSEGV to process 81292000
    VM: killing process as
    vax-dec-linux-gcc: Internal compiler error: program as got fatal signal 9

Due to the constraints of the VAX MMU, we need to decide at compile time
how much virtual address space to allocate to user processes. The number
of processes and the amount of memory are limited by a set of #defines in
the file

    include/asm-vax/mm/task.h

These let us size the number of tasks and the amount of virtual address
space each one is allowed. Those defines are:

TASK_WSMAX
    This is the "process address space" in P0. This is normal memory.
    If you run out of RAM, then this is the one to pay attention to.
    In VMS terms this is like WSMAX. TASK_WSMAX is the sum of TASK_TXTMAX
    and TASK_MMAPMAX.

TASK_TXTMAX
    This is the largest program that can be run. The default value is
    about 6Mb. (Bear in mind that the program size on disk may not reflect
    its size in memory, as it may have lots of debugging information and
    other stuff that won't be loaded as a running program.)

TASK_MMAPMAX
    This is the memory used for the mmap() system call, and hence by the
    malloc library routine. This is the amount of address space available
    for allocation by a running program. The default value is about 58Mb.
    If you see a warning about WSMAX being exceeded whilst running a
    program, this is the one to increase.

TASK_STKMAX
    The amount of address space in the P1 region.
    This is the amount of stack memory allocated to the process. The
    default value is 4 Mb.

TASK_MAXUPRC
    The maximum number of user processes allowed to run at any one time.
    This is like BALSETCNT on VMS. The default value is 64.

TASK_WSMAX = TASK_TXTMAX + TASK_MMAPMAX

Decide if you want to run bigger programs (increase TXTMAX), let the
programs have more memory (MMAPMAX), or run more programs (MAXUPRC).
However, don't set the sizes much larger than you need: the bigger these
variables are, the more RAM you lose to the system page table (and that's
unavailable for user processes).

Index: memory.txt
===================================================================
RCS file: /cvsroot/linux-vax/kernel/Documentation/vax/memory.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- memory.txt  2000/04/16 22:07:18     1.2
+++ memory.txt  2001/09/19 13:54:19     1.3
@@ -1,4 +1,283 @@
 $Id$
+ATP 20010910
+
+Note:
+
+This is more a discussion document about the memory map on the vax
+architecture. For discussion of VAX memory management, the compromises
+made by this port, and how that affects most people, see the file
+task-memory.txt in this directory.
+
+0) Terminology
+
+   PAGE_OFFSET is set to be 0x80000000. So Physical Memory address 0 is
+   mapped to Virtual address PAGE_OFFSET. This is the start of the
+   VAX S0 segment, and limits the physical memory to 1024Mb. But, hey,
+   find me a VAX with more than 1024Mb RAM.
+
+   PAGE_SIZE. A page is 4096 bytes long.
+   PAGELET_SIZE. A pagelet is 512 bytes long.
+
+   See include/asm-vax/mm/pagelet*.h
+
+   The Hardware page size on a VAX is 512 bytes. Hardware pages
+   are called pagelets.
+
+   The pagelet layer, implemented in asm-vax/mm/ and arch/vax/mm
+   (pgalloc.c mostly), groups pagelets into logical pages of 4096 bytes.
+
+   The rule here is: any data structure likely to be seen by
+   arch-independent code uses pages. Any arch-specific code may use
+   pagelets, but it's highly discouraged.
+   There is one exception to this, which is the S0 part of the process pgd
+   (page directory). The Linux arch independent code never goes near the
+   S0 page table, as it's unaware that it exists (thankfully). We keep the
+   S0 base and length pair in pagelets. The P0 and P1 sections' base and
+   length registers are kept in pages, for consistency, and converted
+   on the fly, when the registers in the PCB (process control block)
+   are updated. The S0 section is only ever touched at boot, and becomes
+   frozen by the time processes start. It's only ever touched by
+   VAX arch code.
+
+   A page table entry (pte, type pte_t) maps a page. Each pte
+   is in fact a structure (struct pagecluster_t) that describes the
+   underlying pagelet ptes for that page (hwpte, hwpte_t).
+
+   Why do we have a pagelet layer? Well, it's a long story, but it
+   makes life a lot easier elsewhere.
+
+1) Memory map.
+
+   The memory map has stabilised a little. Here is what it looks like
+   as of Sept 2001. I feel that RPB should live in a well known place too.
+
+   Virtual         Length        Description
+   80000000        1 page        bootmap (mem_map)
+   80001000        1Mb-1page     Free
+   80100000        kern_size     Kernel code data and bss sections
+   SPT_BASE        SPT_SIZE      Pagelet (512 bytes) aligned start of
+                                 system page table.
+                                 Length depends on physical memory, plus
+                                 other variables - see below.
+   iomap_base      IOMAP_SIZE    i/o remapping area. A set of ptes in the
+                                 system page table we can use for remapping
+                                 device io ports. e.g. microvax prom
+                                 registers, ethernet card CSR regs. The
+                                 start of this is page aligned (4096 bytes)
+   vmallocmap_base
+                   VMALLOC_SIZE  vmalloc() area.
+   TASKPTE_START   see below     TASKPTE area. Stores P0 and P1 page tables
+                                 for user processes. Sized at compile time.
+                                 See below.
+
+   TASKPTE_END     max_pfn*4096  Free (May contain VMB bitmaps on the last page)
+
+2) System page table.
+
+   The system page table as far as TASKPTE_START is initialised in boot/head.S,
+   the early boot assembly code.
+   The initialisation of iomap and vmalloc should probably move to mm/init.c.
+   paging_init() in mm/init.c initialises the remainder, which at present
+   is the task pte area. Once paging_init() has returned, there are no
+   further alterations to the system page table.
+
+   The following are equivalent.
+   S0 base register:   SPT_BASE, swapper_pg_dir[2].br, pg0.
+   S0 length register: SPT_LEN, swapper_pg_dir[2].lr
+
+   SPT_SIZE is the size in bytes of the SPT.
+
+   The system page table must be pagelet aligned.
+
+3) TASKPTE areas.
+
+   An area must be set aside in system space to hold process page tables.
+   This is the TASKPTE area. This is sized at kernel compile time (currently)
+   using the variables defined in include/asm-vax/mm/task.h. The task pte
+   area is composed of TASK_MAXUPRC "slots". Each slot is laid out like this:
+
+   name    size                 description
+   p0pmd   2 pages              Fake P0 page mid level directory
+   p1pmd   2 pages              Fake P1 page mid level directory
+   p0pte   set by TASK_WSMAX    P0 page table
+   p1pte   set by TASK_STKMAX   P1 page table
+
+   Slots are aligned to 8192 bytes.
+   The page mid level directories are needed because the linux MM code
+   needs to keep track of which ptes are allocated across the entire
+   address space. It's easier to fake a page midlevel directory, each entry
+   of which is a 4 byte longword pointing at the relevant part of the
+   page table.
+   The TASK_WSMAX define limits how much virtual address space is allocated
+   to the process P0 region. This is composed of two sections, the text
+   section and the data section. The amount of address space allocated to
+   each is defined by TASK_TXTMAX and TASK_MMAPMAX. TASK_STKMAX limits
+   the amount of P1 space available.
+
+   The need to restrict the virtual address spaces is imposed by the VAX
+   MM hardware. Each process has potentially 1Gb P0 and 1Gb P1 space
+   available to it. However, the allocation is not sparse, like it is
+   on CPUs with a tree structured MMU.
+   If a process allocates a page 200MB into its P0 space, then we must
+   increase the P0 length register to include the pte that describes this
+   page at 200MB. That makes all the intervening addresses in the page table
+   from 0 to 200MB be part of the P0 page table too. (The PTEs may be invalid,
+   or the addresses that they would occupy be used by something else, but
+   they are there as far as the MMU is concerned).
+
+   Once we have mapped all of the intervening space, we can set the page
+   table base and length registers to the right values to point at the
+   base of the page table, and the length in ptes, up to 200MB.
+
+   In contrast, on an alpha or i386 for example, one only needs to allocate
+   a single page (plus one more for the pmd if on an alpha) and enter
+   it into the correct slot in the pgd.
+
+   Additionally, the base and length registers for a P0 page table
+   point at a region that must be contiguous in S0 space. This makes
+   expansion hard, as there is a very specific S0 virtual address needed to
+   map any given address in a P0 pagetable. If that address is already
+   occupied by something else then either you cannot expand, or you
+   must move the other user of that virtual address. That's not
+   feasible.
+
+   The obvious solution here is to map a P0 or P1 process page table
+   in its entirety, from 0 to 1024Mb, into S0 space. This avoids
+   the expansion problem. We just reserve a chunk of S0 address space
+   for as many P0 and P1 page tables as we need. Each is located in
+   a specific range of S0 virtual address space. We can then map in
+   actual physical pages to hold the P0 page table ptes for addresses
+   on an as needed basis. They just need to be mapped to specific
+   S0 addresses.
+
+   The problem with that is that the S0 page table, which manages the
+   S0 address space, is located in _physical_ memory. The same problems
+   as above are in place, with the exception that specific physical
+   addresses are needed.
+   So if we reserve a chunk of virtual address space, then we are effectively
+   allocating S0 ptes (sptes) that map that space. One spte maps one page of
+   S0 address space. If we reserve enough S0 space for the page tables for
+   one process's P0 and P1 address space (2048MB), then we are reserving
+      2048*1024*1024 / 4096 = 524288 pages of P0/1 space
+                            = 524288 P0/1 ptes.
+   Each pte is 32 bytes in size. So the amount of S0 space we
+   need to reserve to hold this page table is
+      524288 * 32 = 16 Mb.
+   16Mb of S0 space is
+      16 * 1024 * 1024 / 4096 = 4096 pages of S0 space
+                              = 4096 S0 ptes.
+   Each pte is 32 bytes in size. So the amount of
+   physical memory we need to allocate to the S0 page table is
+      4096*32 = 128 kb.
+
+   If we allow 64 processes, then we are tying up
+      64 * 128 kb = 8 Mb
+   So we have lost 8 Mb of contiguous physical memory.
+
+   And this is just RAM to hold the S0 page table. This does not
+   include the allocated pages which hold the P0 page table.
+   (Admittedly these can be any page returned by __get_free_page(),
+   so there is no need for contiguity.)
+
+   Most processes have small memory requirements, so this 8 Mb is
+   mostly unused. Most VAXes have a small amount of RAM. For
+   later model 3100 series between 8 and 16 Mb is not an
+   unusual amount of RAM. Earlier systems will typically have
+   less. We cannot afford to waste this much RAM, so we take the
+   step of limiting the virtual address spaces to more practical
+   values. At the time of writing the values were set like this:
+
+   TASK_TXTMAX    6Mb    Maximum program size
+   TASK_MMAPMAX   58Mb   Maximum amount of address space
+                         available for allocation.
+   TASK_STKMAX    4Mb    Maximum stack size
+   TASK_MAXUPRC   64     Maximum number of processes
+
+   Which allows large programs like gcc to run with some headroom.
+
+   The space taken up by the process page tables with these
+   values is:
+      68 * 1024 * 1024 / 4096 = 17408 P0/1 pages
+                              = 17408 P0/1 ptes
+      17408 * 32              = 544 kb S0 space
+      544 * 1024 / 4096       = 136 S0 pages
+                              = 136 S0 ptes.
+      136 * 32                = 4352 bytes of RAM.
+      for 64 processes, this is = 272 Kb.
+
+   Which is not that much. The S0 page table needs to be allocated
+   in a block of contiguous physical memory, so we allocate it
+   in its entirety right at the start of the boot process.
+
+   I suppose it is theoretically possible to shift pages around
+   and expand the S0 page table on a running system, but I think
+   it would be nigh on impossible to backtrace the users of a
+   given physical page. One could swap out all the pages needed,
+   but doing that whilst in the middle of modifying the system
+   page table is prone to error to say the least. That just
+   leaves the problem of shuffling things around in the S0
+   virtual address space to expand the process page tables.
+
+   However, all the systems I know of on the VAX fix the process
+   virtual address space in this way, or similarly, taking the
+   lead from VMS.
+
+   The actual pages that hold the process page tables are
+   allocated on demand, so only as much physical memory as is
+   actually needed to hold the process PTEs is used. The PMD
+   keeps track of which pages in the process page table are allocated
+   (because our PGD holds the base and length registers, amongst
+   other things).
+
+   Room for Improvement
+   --------------------
+
+   We waste space with the pgd.
+
+   We could use the TASK_xxxx macros to set default values. New values
+   could be supplied as a kernel command line argument, so that we only
+   need to reboot, not recompile, to alter the page table sizes.
+
+   We could condense the pmd down into a smaller number of pages,
+   but this requires smarter pmd_xxx routines to emulate the missing
+   bits of the process pmds when linux scans the pmds.
+
+   We need to eliminate the PGD_SPECIAL botch.
+
+   PGD/PMD/PTE.
+   ------------
+
+   In Linux, the pgd is the highest level division of virtual address
+   space. For the VAX the mapping is clear. A process has 4 main
+   sections in the 32 bit address space: P0, P1, S0 and S1, each of which
+   is 1024Mb in size.
+
+   P0   0x00000000 - 0x3fffffff   "Process space"
+   P1   0x40000000 - 0x7fffffff   "Process stack space"
+   S0   0x80000000 - 0xbfffffff   "System Space"
+   S1   0xc0000000 - 0xffffffff   "Unreachable/Reserved"
+
+   Each one of these has a pgd entry in a page table. Each pgd_t is
+   a structure defined in include/asm-vax/mm/pagelet.h, which includes
+   the base and length registers for that segment.
+
+   Each page is 4096 bytes in size. Each pte is 32 bytes in size.
+   So each page allocated to a page table holds
+      4096/32 = 128 ptes.
+
+   Each page of ptes in a page table therefore maps
+      128*4096 = 512 kb of address space.
+
+   So, in order to map the whole of one segment (one pgd_t) we need
+      1024*1024/512 = 2048 pages of ptes in the page table.
+
+   To keep track of which pages are allocated, we need to keep a
+   PMD. Each pmd_t is a longword (4 bytes) so we need
+      2048 * 4 / 4096 = 2 pages per PMD.
+
+   These are located at the start of the task slot.
+
+   -- atp Sept. 2001.
+
 KPH 20000416
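
The sizing arithmetic in memory.txt above can be checked with a short
script. What follows is only a sketch in Python and is not part of the
port: the constant and function names are made up for illustration and
do not correspond to kernel symbols. Like the worked figures, it counts
only the p0pte and p1pte areas and ignores the pmd pages at the start of
each slot.

```python
# Sketch of the page table sizing arithmetic from memory.txt.
# All names here are hypothetical; none of them are kernel symbols.

PAGE_SIZE = 4096       # bytes per Linux page on this port
PTE_SIZE = 32          # bytes per pte_t, as stated in memory.txt
PMD_ENTRY_SIZE = 4     # bytes per pmd_t (one longword)
KB = 1024
MB = 1024 * 1024


def pte_table_bytes(address_space_bytes):
    """Bytes of page table needed to map this much address space."""
    pages = address_space_bytes // PAGE_SIZE
    return pages * PTE_SIZE


def s0_table_bytes(p0_mb, p1_mb):
    """Physical RAM used in the S0 page table for one task slot."""
    # S0 space reserved for the process page tables themselves.
    reserved_s0 = pte_table_bytes((p0_mb + p1_mb) * MB)
    # The sptes that map that reserved S0 space live in physical memory.
    return pte_table_bytes(reserved_s0)


def pmd_pages_per_segment(segment_mb=1024):
    """Pages needed for a fake pmd covering one full segment."""
    pte_pages = pte_table_bytes(segment_mb * MB) // PAGE_SIZE
    return pte_pages * PMD_ENTRY_SIZE // PAGE_SIZE


if __name__ == "__main__":
    # Defaults quoted above: 64Mb of P0, 4Mb of P1, 64 task slots.
    print(s0_table_bytes(64, 4))                  # 4352 bytes per slot
    print(64 * s0_table_bytes(64, 4) // KB)       # 272 (Kb for 64 slots)
    # Full 1024Mb P0 plus 1024Mb P1 per process.
    print(s0_table_bytes(1024, 1024) // KB)       # 128 (kb per slot)
    print(64 * s0_table_bytes(1024, 1024) // MB)  # 8 (Mb for 64 slots)
    print(pmd_pages_per_segment())                # 2
```

Changing the arguments to s0_table_bytes() gives a rough idea of how much
contiguous physical memory a different choice of TASK_WSMAX, TASK_STKMAX
and TASK_MAXUPRC would tie up in the system page table.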