From: Andy P. <at...@us...> - 2001-09-19 15:53:42
Update of /cvsroot/linux-vax/www/htdocs/docs
In directory usw-pr-cvs1:/tmp/cvs-serv27950/docs

Added Files:
    README assembler.txt cpu.txt docframe.html index.html interrupts.txt
    ka43-interrupts.txt memory.txt mopd-instructions.txt syscall.txt
    task-memory.txt xdelta.txt
Log Message:
Attempt to update the web server

--- NEW FILE ---
Last updated Jul 10, 2000

GETTING STARTED

To play with this port you need the following:

1. The cross-compiler and binutils
2. The kernel sources
3. A MOP server (mopd)
4. A VAX with an ethernet card or SCSI interface

Unfortunately, there are a few large downloads involved to get up and running...

1. The cross-compiler and binutils

First download the following:

From ftp://linux-vax.sourceforge.net/pub/linux-vax/tools/sources/

    binutils-2.9.1.0.25.tar.bz2
    egcs-1.1.2.tar.bz2

From ftp://linux-vax.sourceforge.net/pub/linux-vax/tools/patches/

    binutils-2.9.1.0.25-20000219.patch.bz2
    egcs-1.1.2-20000219.patch.bz2

From ftp://linux-vax.sourceforge.net/pub/linux-vax/tools/

    build-vax.sh
    one-tree-vax.sh

Create a new directory to unpack all this stuff in, untar the egcs and binutils tarballs, apply the patches and copy in the shell scripts. (DOWNLOADS below stands for the directory you downloaded the files to. The patches are bzip2-compressed, so they are decompressed on the fly.)

    $ mkdir vax-cross
    $ cd vax-cross
    $ tar --use-compress-program=bzip2 -xvf DOWNLOADS/binutils-2.9.1.0.25.tar.bz2
    $ tar --use-compress-program=bzip2 -xvf DOWNLOADS/egcs-1.1.2.tar.bz2
    $ cd binutils-2.9.1.0.25.current
    $ bzcat DOWNLOADS/binutils-2.9.1.0.25-20000219.patch.bz2 | patch -p1
    $ cd ../egcs-1.1.2.current
    $ bzcat DOWNLOADS/egcs-1.1.2-20000219.patch.bz2 | patch -p1
    $ cd ..
    $ cp DOWNLOADS/one-tree-vax.sh DOWNLOADS/build-vax.sh .

Then create the combined binutils/egcs source tree and build it:

    $ ./one-tree-vax.sh
    $ ./build-vax.sh

These should complete without errors. If you get errors, something is seriously wrong and you probably won't get a correctly-installed toolchain.

All object files and binaries will be created in vax-cross/b-vax-dec-linux without touching the source trees.
Then install them:

    $ su -c './build-vax.sh install'

This will create programs in /usr/local/bin prefixed with vax-dec-linux- (for example /usr/local/bin/vax-dec-linux-gcc) and directories /usr/local/vax-dec-linux and /usr/local/lib/gcc-lib/vax-dec-linux.

This will not touch your current GCC installation.

2. The kernel sources

Grab the sources from CVS:

    $ cvs -d:pserver:ano...@cv...:/cvsroot/linux-vax login

(hit return at the password prompt).

    $ cvs -z9 -d:pserver:ano...@cv...:/cvsroot/linux-vax co kernel

cd into the kernel dir created by cvs and do

    $ make oldconfig

to create a default .config. (Don't go playing with the config, please. It will probably just break the compile.)

Compile a network-bootable image by doing

    $ make mopboot

This will generate plenty of compiler and linker warnings, but you should end up with a vmlinux.SYS file of about 280K.

If you are hacking around in arch/vax, you can do a quicker re-compile by doing

    $ make mopbootx

which just rebuilds stuff in arch/vax and re-links the kernel.

If you have your VAX and Linux machine on the same SCSI chain and you've got a scratch disk handy, you can do

    $ make diskboot && dd if=vmlinux.dsk of=/dev/sdX

and then tell your VAX to boot from this disk. This is faster than netbooting.

NOTE THAT THIS WILL DESTROY ANY FILESYSTEM ALREADY ON THE DISK.
YOU HAVE BEEN WARNED.

3. A MOP server (mopd)

Sources at http://www.mssl.ucl.ac.uk/~atp/linux-vax/download/mopd-linux.tar.gz

Compile and install.

Create the directory /tftpboot/mop. mopd looks here, and here only, when searching for boot images. Create a link from /tftpboot/mop/<ether>.SYS to the vmlinux.SYS file in your development tree. <ether> is the ethernet address of your VAX in _lowercase_ with no separators. For example, mine is 08002b0db20f.SYS.

It can be useful to run mopd with the -d switch to see what it receives from the network.

4. A VAX with an ethernet card or SCSI interface.
As we don't really have any hardware support in yet, hardware requirements are pretty minimal:

    CPU
    Serial console
    8 MB ram
    Ethernet card

So far we've had success reports from people with the following machines:

    VAXstation 2000
    VAXstation 3100/m30
    VAXstation 3100/m76
    VAXstation 3500
    VAXstation II/GPX

First you'll want to get your VAX to stop at the >>> console prompt at power up. There is usually a switch on the CPU board, front panel or rear panel (depending on the model) to select this. Look for a circle with a dot inside.

Hook your VAX up to a standalone terminal, such as a VT-series terminal or a serial port on your PC. The VAX will probably have an MMJ serial connector. I can't find a URL with the pin-out info for this guy.

If you have an OS installed (e.g. VMS, Ultrix, NetBSD), it would be a good idea to take your disks offline, if your VAX has a handy way to do this. For example, the VS3500 has front panel switches to take the internal disks offline.

At the >>> prompt, try B <return>, B XQA0 or B ESA0 and see if one of them tries to netboot (watch the output of mopd -d).

If it looks like mopd sent over a boot image, let us know what happens. Depending on your hardware, you might get a kernel version banner and some diagnostic output. However, if we don't support your serial console hardware, you'll probably just get an error message such as 'HLT INST' and return to the >>> prompt. If this happens, do the following:

    >>> E PC
    >>> E PSL
    >>> E SP
    >>> E/V @
    >>> E
    >>> E
    >>> E
    >>> E
    >>> E
    >>> E

And send us the output. This will hopefully give us clues as to how to get your serial console supported.

If your VAX has a SCSI interface and you have an external SCSI connector on your Linux box, you can connect both of them to the same SCSI bus. (Make sure the host adapters in each machine have different SCSI IDs. VAXen usually ship with the host adapter set to ID 6, PCs are usually ID 7.) Then you can copy a kernel image onto a disk on the bus and boot from there.
NOTE THAT THIS WILL DESTROY ANY FILESYSTEM ALREADY ON THE DISK.
YOU HAVE BEEN WARNED.

--- NEW FILE ---
The GNU assembler in Andy's cross compilation tool set is a little different from DEC's MACRO32 assembler. This file summarises the differences.

1. #, ^ and @ become $, ` and *

   In VAX MACRO you might write:

       movl #0, 8(r5)
       movl #0, @8(r5)
       movl #0, L^8(r5)

   In gas, these are written:

       movl $0, 8(r5)
       movl $0, *8(r5)
       movl $0, L`8(r5)

2. ^X becomes 0x

   Hex constants are prefixed with 0x, rather than ^x. Similarly, a leading zero not followed by an x implies octal. Therefore the following instructions are equivalent:

   VAX MACRO:

       movl #64, r0
       movl #^x40, r0
       movl #^o100, r0

   gas:

       movl $64, r0
       movl $0x40, r0
       movl $0100, r0

--- NEW FILE ---
$Id: cpu.txt,v 1.1 2001/09/19 15:53:39 atp Exp $

INTRODUCTION
============

This file attempts to collate all the CPUs that we know about, how they are identified and any quirks or bugs that we need to watch for.

VAX CPUs are identified with model numbers beginning with KA followed by 2 or 3 digits. Multiple DEC systems may use the same CPUs, with different surrounding hardware (and slightly different firmware in some cases), but the basic operation should be much the same.

These CPUs fall into families that seem to have various codenames (such as RIGEL and MARIAH). Where possible, we will try to use the KAxx designations, rather than the codenames.

A VAX CPU is identified during boot by first examining internal processor register 0x3E (PR$_SID). The high byte of this register seems to denote the processor family. The meaning of the low 3 bytes depends on the family.

SUPPORTED CPUS
==============

    KA42
    KA43
    KA46
    KA410
    KA630
    KA650

UNSUPPORTED CPUS
================

    KA41
    KA52
    KA55
    KA60
    KA620
    KA640
    KA655
    KA660
    KA730
    KA750
    KA780
    KA785
    KA790

*******************************************************************************
*******************************************************************************

KA650
=====

Description: Q-22 bus single-board CPU.
M-number is M7620. Based on the CVAX implementation of VAX. Sometimes called a MicroVAX III. The only I/O on the CPU itself is the console serial port. Shipped in: VAXstation 3500 Identification: PR$_SID: The high byte is 0x0A. This indicates a CVAX-based CPU. The low byte holds the microcode revision. SIDEX at 20040004: The high byte is 0x01. This seems to indicate a Qbus CPU. Bits 16 to 23 hold the firmware revision. Bits 8 to 15 contain 0x01. This means KA650. The meaning of bits 0 to 7 is unknown. Notes: The KA650's firmware is held in a pair of 27512 EPROMs. Some units shipped with firmware versions that didn't even have a HELP command. Looking at the KA650 firmware, it looks like the same firmware is used in the KA640 and KA655 as well. There are a lot of CASEx instructions that dispatch on bits 8 to 15 of SIDEX. ******************************************************************************* ******************************************************************************* KA43 ==== Description: Integrated CPU and mainboard based on the RIGEL implementation of VAX. The board also contains two NCR5380 SCSI controllers, an AMD LANCE ethernet controller and a DZ11-compatible serial controller. Maximum memory is 32MB. Online copy of the VAXstation 3100 Model 76 Owner's Guide (EK-VX31M-UG) available at http://www.whiteice.com/~williamwebb/intro/DOC-i.html. Shipped in: VAXstation 3100 Model 76 Identification: PR$_SID: The high byte is 0x0B. This seems to indicate a RIGEL-based CPU. The meaning of the low 3 bytes is unknown. SIDEX at 20040004: The meaning of the SIDEX is unknown. Notes: Sharing memory with the LANCE chip requires a bit of hackery. Physical memory is accessible from 0x00000000 to 0x01ffffff (as normal), but is also accessible via the "DIAGMEM" region of I/O space from 0x28000000 to 0x29ffffff. To prevent strange behaviour (such as memory read parity error machine checks), you _must_ read and write the memory shared with the LANCE via the DIAGMEM region. 
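The relationship between a physical address and its DIAGMEM alias can be sketched as below. This is only an illustration of the address arithmetic in these notes, not code from the port: the function names are hypothetical, and the 512-byte hardware page size (a 9-bit shift) is an assumption taken from the VAX architecture. It also shows why the PFN bits mentioned in these notes come out as 0x00140000.

```python
# Constants from the KA43 notes; all helper names here are hypothetical.
DIAGMEM_BASE = 0x28000000   # start of the DIAGMEM alias in I/O space
RAM_LIMIT = 0x02000000      # 32MB: physical RAM spans 0x00000000-0x01ffffff
HW_PAGE_SHIFT = 9           # assumption: VAX hardware pages are 512 bytes

# Bits to OR into a hardware PTE's PFN field to redirect it to DIAGMEM.
DIAGMEM_PFN_BITS = DIAGMEM_BASE >> HW_PAGE_SHIFT


def diagmem_alias(phys):
    """Return the DIAGMEM address that aliases physical address `phys`."""
    if not 0 <= phys < RAM_LIMIT:
        raise ValueError("address is outside the 32MB of KA43 physical memory")
    return phys | DIAGMEM_BASE


def diagmem_pfn(pfn):
    """Return the page frame number of the DIAGMEM alias of frame `pfn`."""
    return pfn | DIAGMEM_PFN_BITS


print(hex(diagmem_alias(0x00123400)))
print(hex(DIAGMEM_PFN_BITS))
```

Because RAM ends below 0x02000000, ORing in the base is equivalent to adding it, so the same mask works for every frame of physical memory.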
One way to do this is to kmalloc() a region for the LANCE structures and buffers and modify the PTEs for this region to OR in bits 0x00140000 in the PFN field (to make them point to the DIAGMEM region). Actually, using get_free_pages() might be a better idea, since there might be other data structures sharing pages with this region, because kmalloc() doesn't page-align. Another way is to calculate the physical addresses behind the kmalloc()ed region and ioremap() them. This has the disadvantage of using twice as many PTEs. It looks like this might be needed for DMA to the SCSI controllers as well. --- NEW FILE --- <html> <BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#009900" VLINK="#990032" ALINK="#6F7463" FONT FACE="Helvetica"> <FONT FACE="Helvetica, Arial"><FONT SIZE=+1> <h1>Documentation</h1> This area holds all sorts of documentation generated by the project, from HOWTO's to some technical discussion documents. <h2>Howtos</h2> <ul> <li><a href="mopd-instructions.txt">How to use mopd</a> <li><a href="README">How to get started hacking with Linux/VAX</a> </ul> <p> <h2>Technical</h2><p> <UL> <LI><a href="assembler.txt">Description of Assembler Syntax</a> <LI><a href="cpu.txt">Notes on CPU/System types</a> <li><a href="interrupts.txt">Interrupt handling</a> <li><a href="ka43-interrupts.txt">Worked example of interrupt decoding (KA43)</a> <li><a href="memory.txt">Memory map and discussion</a> <li><a href="syscall.txt">How syscalls work.</a> <li><a href="task-memory.txt">Task memory layout, limitations and WSMAX</a> <li><a href="xdelta.txt">Xdelta</a> </UL> <SCRIPT> <!-- document.write("<font size=-1>Last modified "+document.lastModified+"</font>"); // --> </SCRIPT> </body> </html> --- NEW FILE --- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML> <HEAD> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <META NAME="GENERATOR" CONTENT="Mozilla/4.10 [en] (X11; I; Linux 2.0.36 i586) [Netscape]"> <META NAME="Author" 
CONTENT="atp"> <TITLE>Linux/VAX Porting Project</TITLE> </HEAD> <frameset rows="130,*" border=0> <frame src="../header.html" name=header> <frame src="docframe.html" name=body > </frameset> </HTML> --- NEW FILE --- 20000709 KPH Here's how I intend to deal with interrupt and exception dispatching. o During boot time, trap_init fills the whole SCB with stray handlers. Since the CPU might save some longwords of data on the stack after an exception, we can't just continue from one of these exceptions in the general case. (However, interrupts from devices that come through the second and subsequent pages of the SCB should be continuable.) The stray handlers might help out with autoprobing interrupts if we decide to implement probe_irq_on() and probe_irq_off(). Dammit, I hate using the term IRQ when talking about VAXen. It just seems so PC-centric... o When an interrupt (or exception) occurs and the CPU dispatches to the handler address in the SCB, the only clue we have as to the interrupt or exception number is the handler address. There is no other way to tell which interrupt happened. This implies that every interrupt or exception handler must have a unique address. o When a driver (or other code) calls request_irq(), we allocate a data structure (let's call it irqvector) that contains a struct irqaction and a little bit of in-line code. This code just pushes PC on the stack and jumps to the generic handler. (It does this by executing a JSB instruction.) This generic handler sees a stack that looks like: SP: handler_PC (inside the irqvector) (maybe) exception info saved PC saved PSL The generic handler builds the required pt_regs struct by duplicating the saved PC/PSL and saving all the other registers. This makes the stack look like: SP: saved R0 saved R1 ... 
saved R11 saved FP saved AP saved SP saved PC saved PSL saved R0 handler PC (inside the irqvector) (maybe) exception info saved PC saved PSL (The second saved R0 is because we need a working register in the handler code.) The generic handler then obtains the handler PC from back up the stack, then passes this PC, the addr of the pt_regs and exception info to a dispatcher function. This function is responsible for calculating the start address of the irqvector structure and calling irqaction.handler(). When control returns to the generic handler, it restores the registers, clears the stack down as far as the original saved PC and PSL and does an REI. Anyone playing around with this stuff really needs to read the Interrupts and Exceptions chapter in the VAX Architecture Reference Manual. --- NEW FILE --- $Id: ka43-interrupts.txt,v 1.1 2001/09/19 15:53:39 atp Exp $ This info was obtained by trawling through a running VMS 7.2 on a VAXstation 3100/m76 (KA43 CPU) with the System Dump Analyzer (ANALYZE/SYSTEM). First off, this is the SCB (system control block): SDA> examine exe$gl_scb EXE$GL_SCB: 81258000 "..%." SDA> examine 81258000:81258000+3fc 80BD6E09 80002491 8000A801 80002119 .!.......$...n½. 81258000 800025F8 80B723A4 80002518 80E5CFC0 ..å..%...#o.Ø%.. 81258010 80B722C0 80B723AC 80BB35B8 80B7223C <"o..5».¬#o.."o. 81258020 80BB3479 80002118 800021D0 80002308 .#...!...!..y4». 81258030 80002300 800022F8 80B724D8 80B725E0 à%o.b$o.Ø"...#.. 81258040 8000A819 8000A811 8000A809 80002118 .!.............. 81258050 80002118 80002520 80002118 8000A821 !....!.. %...!.. 81258060 80002118 80002118 80002118 80002118 .!...!...!...!.. 81258070 80BE0C00 80BD04D0 80002118 80002118 .!...!....½..... 81258080 80C4E921 80C4E621 80002118 80BD3E91 .>½..!..!æÄ.!éÄ. 81258090 80C4E639 80C4E631 80C4E629 80C4E641 AæÄ.)æÄ.1æÄ.9æÄ. 812580A0 80002118 800027B1 80002118 80002471 q$...!..±'...!.. 812580B0 80E60A9C 80E60A00 80002118 80C4E739 9çÄ..!....æ...æ. 
812580C0 80002118 80002118 80002118 80002118 .!...!...!...!.. 812580D0 80002118 80002118 80002118 80002118 .!...!...!...!.. 812580E0 80C50A5D 80C50A25 80002118 80002118 .!...!..%.Å.].Å. 812580F0 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258100 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258110 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258120 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258130 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258140 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258150 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258160 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258170 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258180 80002119 80002119 80002119 80002119 .!...!...!...!.. 81258190 80002119 80002119 80002119 80002119 .!...!...!...!.. 812581A0 80002119 80002119 80002119 80002119 .!...!...!...!.. 812581B0 80002119 80002119 80002119 80002119 .!...!...!...!.. 812581C0 80002119 80002119 80002119 80002119 .!...!...!...!.. 812581D0 80002119 80002119 80002119 80002119 .!...!...!...!.. 812581E0 80002119 80002119 80002119 80002119 .!...!...!...!.. 812581F0 8000A829 8000A829 8000A829 80E642D9 .Bæ.)...)...)... 81258200 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258210 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258220 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258230 8000A829 80DBB389 80DBB351 8000A829 )...Q.......)... 81258240 8000A829 8000A829 8000A829 80D875D1 Ñub.)...)...)... 81258250 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258260 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258270 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258280 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258290 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812582A0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812582B0 8000A829 8000A829 80DC33C9 80DC3391 .3Ü.É3Ü.)...)... 
812582C0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812582D0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812582E0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812582F0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258300 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258310 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258320 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258330 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258340 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258350 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258360 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258370 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258380 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 81258390 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812583A0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812583B0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812583C0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812583D0 8000A829 8000A829 8000A829 8000A829 )...)...)...)... 812583E0 80D9CF11 80DB9BD1 8000A829 8000A829 )...)...Ñ....... 812583F0 (The hex data reads right-to-left and the ASCII reads left-to-right, i.e. VMS DUMP format.) The part we're interested in here is the second page (containing the device vectors). Most of these vectors are set to 8000a829 (which corresponds to address 8000a828, and the CPU should switch to the interrupt stack): SDA> examine/instr 8000a828 UBA$UNEXINT: JMP @#MCHK+00700 UBA$UNEXINT+00006: HALT So, these are unexpected interrupts and will lead to code near the machine check handling code. 
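The vector numbers and handler addresses listed next can be derived mechanically from the dump. The sketch below is an illustration only: the function name is hypothetical, and it assumes (per the VAX architecture) that each SCB entry is a longword whose low two bits are a dispatch code rather than part of the address, with code 1 meaning "service on the interrupt stack" as noted above.

```python
SCB_BASE = 0x81258000  # value of EXE$GL_SCB in the dump above


def decode_scb_entry(entry_addr, entry_value):
    """Split one SCB longword into (vector number, handler address, code).

    entry_addr  -- virtual address of the SCB entry
    entry_value -- the longword stored there
    """
    vector = (entry_addr - SCB_BASE) // 4   # one longword per vector
    handler = entry_value & ~0x3            # strip the two low code bits
    code = entry_value & 0x3                # 1 = use the interrupt stack
    return vector, handler, code


# The entry at 81258244 holds 80DBB351 in the dump.
vector, handler, code = decode_scb_entry(0x81258244, 0x80DBB351)
print(hex(vector), hex(handler), code)
```

Applying the same function to every entry in the second page that is not 8000A829 reproduces the list of used vectors.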
The other (used) vectors are:

    vector addr   vector number   handler addr
    81258200      80              80e642d8
    81258244      91              80dbb350
    81258248      92              80dbb388
    81258250      94              80d875d0
    812582c0      b0              80dc3390
    812582c4      b1              80dc33c8
    812583f8      fe              80db9bd0
    812583fc      ff              80d9cf10

These interrupt handler addresses are contained within the CRB (Channel Request Block) for the relevant device or controller. Let's chase these down.

First vector 0x80:

    SDA> examine/instr 80e642d8:80e642d8+10
    MCHK+006F8: INCL @#IO$GL_UBA_INT0
    MCHK+006FE: BRB MCHK+00700
    MCHK+00700: REI

This just increments an interrupt counter and dismisses the interrupt. (So the unexpected interrupt handler above effectively just dismisses the interrupt.)

Next vector 0x91:

    SDA> examine/instr/noskip 80DBB350;2
    80DBB350: PUSHR #3F
    80DBB352: JSB @#GABDRIVER+00942

So, this interrupt is probably handled by GABDRIVER. Let's verify this by looking at GABDRIVER's data structures:

    SDA> show device ga
    ...
    --- Primary Channel Request Block (CRB) 80DBB300 ---
    Reference count 1          Wait queue empty
    IDB address 80DB5640       Unit init. 80E0D589       Int. service 80E0DCC2
    ADP address 80D87300       Ctrl. init. 80E0D4ED
    ...
    SDA> format 80DBB300
    80DBB300 CRB$L_FQFL   00000000
    ...
    80DBB350 CRB$L_INTD   9F163FBB
    80DBB354              80E0DCC2   GABDRIVER+00942
    ...
    80DBB388 CRB$L_INTD2  9F163FBB
    SDA>

So the interrupt handler is 0x50 bytes into the CRB. Interestingly, there's another interrupt handler 0x88 bytes into the CRB as well. This corresponds to vector 0x92 in the table above.

    SDA> examine/instr 80DBB388;2
    80DBB388: PUSHR #3F
    80DBB38A: JSB @#GABDRIVER+0145C

So this device uses two interrupt vectors.
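Since the handler code sits at a fixed offset inside the CRB, the CRB address for each remaining vector follows by subtraction. A small sketch, assuming the 0x50 and 0x88 offsets seen in the GABDRIVER CRB hold for the other drivers too (the function and constant names are hypothetical):

```python
CRB_INTD_OFFSET = 0x50    # offset of CRB$L_INTD (first handler) in the CRB
CRB_INTD2_OFFSET = 0x88   # offset of CRB$L_INTD2 (second handler)


def crb_base(handler_addr, second_vector=False):
    """Return the CRB start address for a handler address from the SCB."""
    offset = CRB_INTD2_OFFSET if second_vector else CRB_INTD_OFFSET
    return handler_addr - offset


# Handler addresses taken from the vector table above.
for handler, second in [(0x80d875d0, False), (0x80dc3390, False),
                        (0x80dc33c8, True), (0x80db9bd0, False),
                        (0x80d9cf10, False)]:
    print(hex(handler), "->", hex(crb_base(handler, second)))
```

Two handlers that resolve to the same CRB base (one via each offset) belong to the same device, which is how the paired vectors are recognised.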
This means that the CRBs for the other vectors are:

    vector addr   num   handler addr   CRB address   driver
    81258200      80    80e642d8                     (no driver)
    81258244      91    80dbb350       80dbb300      GABDRIVER
    81258248      92    80dbb388       80dbb300      GABDRIVER
    81258250      94    80d875d0       80d87580      ESDRIVER
    812582c0      b0    80dc3390       80dc3340      YEDRIVER
    812582c4      b1    80dc33c8       80dc3340      YEDRIVER
    812583f8      fe    80db9bd0       80db9b80      PKNDRIVER (PKA)
    812583fc      ff    80d9cf10       80d9cec0      PKNDRIVER (PKB)

GABDRIVER is the framebuffer driver, ESDRIVER is ethernet, YEDRIVER is a terminal port driver (I think) and PKNDRIVER is a SCSI port driver.

So, to summarize, the KA43 device interrupts are:

    Framebuffer        0x91, 0x92
    LANCE ethernet     0x94
    DZ11 serial        0xb0, 0xb1
    NCR5380 SCSI:
        internal       0xfe
        external       0xff

--- NEW FILE ---
$Id: memory.txt,v 1.1 2001/09/19 15:53:39 atp Exp $

ATP 20010910

Note: This is more a discussion document about the memory map on the VAX architecture. For discussion of VAX memory management, the compromises made by this port, and how that affects most people, see the file task-memory.txt in this directory.

0) Terminology

PAGE_OFFSET is set to be 0x80000000. So physical memory address 0 is mapped to virtual address PAGE_OFFSET. This is the start of the VAX S0 segment, and limits the physical memory to 1024Mb. But, hey, find me a VAX with more than 1024Mb RAM.

PAGE_SIZE. A page is 4096 bytes long.

PAGELET_SIZE. A pagelet is 512 bytes long. See include/asm-vax/mm/pagelet*.h

The hardware page size on a VAX is 512 bytes. Hardware pages are called pagelets. The pagelet layer, implemented in asm-vax/mm/ and arch/vax/mm (pgalloc.c mostly), groups pagelets into logical pages of 4096 bytes.

The rule here is: any data structure likely to be seen by arch-independent code uses pages. Any arch-specific code may use pagelets, but it's highly discouraged.

There is one exception to this, which is the S0 part of the process pgd (page directory). The Linux arch-independent code never goes near the S0 page table, as it is unaware that it exists (thankfully).
We keep the S0 base and length pair in pagelets. The P0 and P1 sections' base and length registers are kept in pages, for consistency, and converted on the fly when the registers in the PCB (process control block) are updated. The S0 section is only ever touched at boot, and becomes frozen by the time processes start. It's only ever touched by vax arch code.

A page table entry (pte, type pte_t) maps a page. Each pte is in fact a structure (struct pagecluster_t) that describes the underlying pagelet ptes for that page (hwpte, hwpte_t).

Why do we have a pagelet layer? Well, it's a long story, but it makes life a lot easier elsewhere.

1) Memory map.

The memory map has stabilised a little. Here is what it looks like as of Sept 2001. I feel that the RPB should live in a well known place too.

    Virtual          Length         Description
    80000000         1 page         bootmap (mem_map)
    80001000         1Mb-1page      Free
    80100000         kern_size      Kernel code data and bss sections
    SPT_BASE         SPT_SIZE       Pagelet (512 bytes) aligned start of
                                    system page table. Length depends on
                                    physical memory, plus other variables -
                                    see below.
    iomap_base       IOMAP_SIZE     i/o remapping area. A set of ptes in the
                                    system page table we can use for
                                    remapping device io ports, e.g. microvax
                                    prom registers, ethernet card CSR regs.
                                    The start of this is page aligned
                                    (4096 bytes)
    vmallocmap_base  VMALLOC_SIZE   vmalloc() area.
    TASKPTE_START    see below      TASKPTE area. Stores P0 and P1 page
                                    tables for user processes. Sized at
                                    compile time. See below.
    TASKPTE_END      max_pfn*4096   Free (May contain VMB bitmaps on the
                                    last page)

2) System page table.

The system page table as far as TASKPTE_START is initialised in boot/head.S, the early boot assembly code. The initialisation of iomap and vmalloc should probably move to mm/init.c.

paging_init() in mm/init.c initialises the remainder, which at present is the task pte area. Once paging_init() has returned, there are no further alterations to the system page table.

The following are equivalent.
    S0 base register:   SPT_BASE, swapper_pg_dir[2].br, pg0.
    S0 length register: SPT_LEN, swapper_pg_dir[2].lr

SPT_SIZE is the size in bytes of the SPT.

The system page table must be pagelet aligned.

3) TASKPTE areas.

An area must be set aside in system space to hold process page tables. This is the TASKPTE area. This is sized at kernel compile time (currently) using the variables defined in include/asm-vax/mm/task.h

The task pte area is composed of TASK_MAXUPRC "slots". Each slot is laid out like this:

    name    size                  description
    p0pmd   2 pages               Fake P0 page mid level directory
    p1pmd   2 pages               Fake P1 page mid level directory
    p0pte   set by TASK_WSMAX     P0 page table
    p1pte   set by TASK_STKMAX    P1 page table

Slots are aligned to 8192 bytes.

The page mid level directories are needed because the Linux MM code needs to keep track of which ptes are allocated across the entire address space. It's easier to fake a page mid level directory, each entry of which is a 4 byte longword pointing at the relevant part of the page table.

The TASK_WSMAX define limits how much virtual address space is allocated to the process P0 region. This is composed of two sections, the text section and the data section. The amount of address space allocated to each is defined by TASK_TXTMAX and TASK_MMAPMAX.

TASK_STKMAX limits the amount of P1 space available.

The need to restrict the virtual address spaces is imposed by the VAX MM hardware. Each process has potentially 1Gb P0 and 1Gb P1 space available to it. However, the allocation is not sparse, like it is on CPUs with a tree structured MMU.

If a process allocates a page 200MB into its P0 space, then we must increase the P0 length register to include the pte that describes this page at 200MB. That makes all the intervening addresses in the page table from 0 to 200MB be part of the P0 page table too. (The PTEs may be invalid, or the addresses that they would occupy be used by something else, but they are there as far as the MMU is concerned).
Once we have mapped all of the intervening space, we can set the page table base and length registers to the right values to point at the base of the page table, and the length in ptes, up to 200MB.

In contrast, on an alpha or i386 for example, one only needs to allocate a single page (plus one more for the pmd if on an alpha) and enter it into the correct slot in the pgd.

Additionally, the base and length registers for a P0 page table point at a region that must be contiguous in S0 space. This makes expansion hard, as there is a very specific S0 virtual address needed to map any given address in a P0 pagetable. If that address is already occupied by something else then either you cannot expand, or you must move the other user of that virtual address. That's not feasible.

The obvious solution here is to map a P0 or P1 process page table in its entirety, from 0 to 1024Mb, into S0 space. This avoids the expansion problem. We just reserve a chunk of S0 address space for as many P0 and P1 page tables as we need. Each is located in a specific range of S0 virtual address space. We can then map in actual physical pages to hold the P0 page table ptes for addresses on an as needed basis. They just need to be mapped to specific S0 addresses.

The problem with that is that the S0 page table, which manages the S0 address space, is located in _physical_ memory. The same problems as above are in place, with the exception that specific physical addresses are needed.

So if we reserve a chunk of virtual address space, then we are effectively allocating S0 ptes (sptes) that map that space. One spte maps one page of S0 address space.

If we reserve enough S0 space for the page tables for one process's P0 and P1 address space (2048MB), then we are reserving

    2048*1024*1024 / 4096 = 524288 pages of P0/1 space
                          = 524288 P0/1 ptes.

Each pte is 32 bytes in size (it describes the eight 4-byte pagelet ptes behind one 4096-byte page). So the amount of S0 space we need to reserve to hold this page table is

    524288 * 32 = 16 Mb.
16Mb of S0 space is:

    16 * 1024 * 1024 / 4096 = 4096 pages of S0 space
                            = 4096 S0 ptes.

Each pte is 32 bytes in size. So the amount of physical memory we need to allocate to the S0 page table is:

    4096*32 = 128 kb.

If we allow 64 processes, then we are tying up:

    64 * 128 = 8Mb

So we have lost 8 Mb of contiguous physical memory. And this is just RAM to hold the S0 page table. This does not include the allocated pages which hold the P0 page table. (Admittedly these can be any page returned by __get_free_page(), so there is no need for contiguity.)

Most processes have small memory requirements, so this 8 Mb is mostly unused. Most VAXes have a small amount of RAM. For later model 3100 series between 8 and 16 Mb is not an unusual amount of RAM. Earlier systems will typically have less.

We cannot afford to waste this much RAM, so we take the step of limiting the virtual address spaces to more practical values. At the time of writing the values were set like this:

    TASK_TXTMAX    6Mb    Maximum program size
    TASK_MMAPMAX   58Mb   Maximum amount of address space available
                          for allocation.
    TASK_STKMAX    4Mb    Maximum stack size
    TASK_MAXUPRC   64     Maximum number of processes

This allows large programs like gcc to run with some headroom.

The space taken up by the process page tables with these values is:

    68 * 1024 * 1024 / 4096 = 17408 P0/1 pages
                            = 17408 P0/1 ptes
    17408 * 32              = 544 kb S0 space
    544 * 1024 / 4096       = 136 S0 pages
                            = 136 S0 ptes.
    136 * 32                = 4352 bytes of RAM.

For 64 processes, this is 272 Kb, which is not that much.

The S0 page table needs to be allocated in a block of contiguous physical memory, so we allocate it in its entirety right at the start of the boot process. I suppose it is theoretically possible to shift pages around and expand the S0 page table on a running system, but I think it would be nigh on impossible to backtrace the users of a given physical page.
One could swap out all the pages needed, but doing that whilst in the middle of modifying the system page table is prone to error to say the least. That just leaves the problem of shuffling things around in the S0 virtual address space to expand the process page tables. However, all the systems I know of on the VAX fix the process virtual address space in this way, or similarly, taking the lead from VMS. The actual pages allocated to hold the process page tables are done on demand, so only as much physical memory as is actually needed to hold the process PTEs is used. The PMD keeps track of which pages in the process page table are allocated (Because our PGD holds the base and length registers, amongst other things). Room for Improvement -------------------- We waste space with the pgd. We can use the TASK_xxxx macros to set default values. New values can be supplied as a kernel command line argument, so that we only need to reboot, not recompile to alter the page table sizes. We can condense the pmd down into a smaller number of pages, but this requires smarter pmd_xxx routines to emulate the missing bits of the process pmds, when linux scans the pmds. We need to eliminate the PGD_SPECIAL botch. PGD/PMD/PTE. ------------ In Linux, the pgd is the highest level division of virtual address space. For the VAX the mapping is clear, A process has 4 main sections in the 32 bit address space. P0, P1, S0 and S1, each of which is 1024Mb in size. P0 0x00000000 - 0x3fffffff "Process space" P1 0x40000000 - 0x7fffffff "Process stack space" S0 0x80000000 - 0xbfffffff "System Space" S1 0xc0000000 - 0xffffffff "Unreachable/Reserved" Each one of these has a pgd entry in a page table. Each pgd_t is a structure defined in include/asm-vax/mm/pagelet.h, which includes the base and length registers for that segment. Each page is 4096 bytes in size. Each pte is 32 bytes in size. So each page allocated to a page table holds 4096/32 = 128 ptes. 
Each page of ptes in a page table therefore maps;

    128 * 4096 = 512 kb of address space.

So, in order to map the whole of one segment (one pgd_t) we need

    1024 * 1024 / 512 = 2048 pages of ptes in the page table.

To keep track of which pages are allocated, we need to keep a PMD. Each pmd_t is a longword (4 bytes) so we need

    2048 * 4 / 4096 = 2 pages per PMD.

These are located at the start of the task slot.

-- atp Sept. 2001.

KPH 20000416

We need to decide on what the overall memory map in S0 space will look like. Here's what I think:

    Start      Length
    80000000   1MB          Spare space left over from kernel load time.
                            Will be put on the kernel's free list.
    80100000   kern_size    Kernel code, data and bss sections
    pg0        spt_size     System page table. Length will be dependent on
                            physical memory size plus some extra space for
                            mapping I/O pages
    mem_map    memsize*40   The mem_map array contains one entry for each
                            physical page of ram.
               remainder    Remaining pages are put on free list

======================================================================

KPH 20000107 (2.2.10-991101-kh5)

After a little discussion with Andy, it looks like we'll create a full-size system page table (SPT) in the asm code in head.S. This SPT needs to have one entry for each physical page of memory and additional entries to do any I/O space and ROM mapping required. This page table needs to be physically contiguous.

We also need to define a region for the interrupt stack. 4KB should be plenty. (Might be a good idea to put canary values at the bottom and check them periodically.)

We need an SCB (system control block, contains the interrupt and exception dispatch vectors).

======================================================================

KPH 19991118 (2.2.10-991101-kh2)

Here's what happens with memory management during boot time:

o VMB locates a region of good memory and leaves a little space for a small stack.
On my VAXstation 3500, this is always physical address 0x00005000. The initial SP is 0x00005200, leaving 1 page for a stack. (If your memory has no faults, then you could grow the stack below 0x00005000, but VMB makes no guarantees about those pages.)

o VMB loads the kernel image via MOP. On my machine, the load address is always 00005800.

o VMB calls the entry point (512 bytes into the image - that's why there is a page of zeroes tagged onto the front of the MOP image). Again, on my machine, that means that 'start' in head.S gets called at 00005A00.

o head.S then copies the whole loaded image up to 00100000 (1 MB). Once VM is enabled, virtual address 80100000 will be mapped to this physical address. The kernel image is linked with a base address of 80100000 (see arch/vax/vmlinux.lds).

o The BSS section is filled with zeroes.

o At this point, head.S jumps from somewhere near 00005A00 to the corresponding point above 00100000 (that's the jump to 'reloc' in head.S). Note that SP is still down at 00005200.

o A system page table is built at physical address 00200000 (2MB). 16384 (0x4000) page table entries (PTEs) are created. Each is marked as valid and protection is set to user write. The page frame numbers (PFNs) in these PTEs are set to map the lower 8MB of physical memory. The System Base Register (SBR) and System Length Register (SLR) are loaded with 00200000 and 4000 to point to this page table. Once VM is turned on, the addresses 80000000 to 807fffff will map to the first 8MB of physical memory. But, we haven't turned on VM yet...

o To enable VM and start running the kernel code in S0-space above 80100000, we need to do two things:

    1. Set the MAPEN processor register to 1
    2. Jump to an address in (the now valid) S0 space.

  However, immediately after we've set MAPEN, the PC still contains an address somewhere above 00100000. The CPU now interprets this as a virtual address in P0-space. We have to arrange for this address to be valid, otherwise we'll crash and burn...
  To make this address valid, we need to make a P0 page table that will be active when MAPEN is set. First we work out how many pages there are from the start of memory to the _VAX_start_mm code (i.e. _VAX_start_mm's page frame number, or PFN). We have a small, 8-page P0 page table that we fill with this PFN (and the 7 following PFNs). Then we load the P0 Base Register with a value that points to the correct distance _before_ our little P0 page table such that the first entry in the table maps _VAX_start_mm. For example:

    o _VAX_start_mm gets loaded at 00005C00
    o head.S relocates it to 00100200, which is PFN 801
    o Assume p0_table is at 00100280. This will be mapped by virtual address 80100280 once MAPEN is set.
    o We fill our little P0 page table to map PFNs 801 to 808
    o We set P0BR to 80100280 - (801*4). The *4 is because a PTE is 4 bytes. P0LR is set to 809.

  Note that we're counting on the fact that nothing is going to refer to any address between 00000000 and 001001ff. If something does refer to an address in this range, we're in trouble because the PTEs for these addresses are not initialized correctly.

o We load P1BR and P1LR with 'sensible' values to prevent the CPU from freaking out.

o Next we have to fix up the addresses on the stack. Note that SP still points to somewhere below 00005200 (on my machine, anyway...). _VAX_start_mm is called via a CALLS from head.S, so there is exactly one full stack frame that needs fixing up:

    o The saved AP, FP and PC are incremented by 0x80000000 to point to the corresponding addresses in S0 space once VM gets turned on (remember that physical addresses 00000000 to 007fffff will be mapped by 80000000 to 807fffff).
    o The current SP and FP are incremented by 80000000.

o R6 is loaded with the physical address of 'vreloc' in mmstart.S and incremented by 80000000 to give vreloc's soon-to-be-valid virtual address.

o MAPEN is set to 1.
As mentioned above, PC still contains a virtual address that is something above 00100200, but our fake P0 page table maps that to the same physical address.

o We jump to vreloc's virtual address held in R6.

o Job is done... return to head.S which then calls start_kernel.

Some thoughts on the above:

1. On older VAXen (780-era), VMB only tries to find 64Kb of good memory. If this is still true on newer VAXen, then this won't be enough to hold the full Linux kernel. Instead, what we'll probably have to do to boot on machines with some bad memory is:

    o VMB loads a small boot loader which creates a system page table that maps all good memory pages (or maybe maps all pages and marks bad ones as invalid).
    o This boot loader then enables VM and loads the kernel proper.

   This isn't so nice: pulling the whole kernel across MOP, as we do now, means we don't have to write boot-time device drivers.

2. What happens if the kernel image is too big to fit between 00005A00 and 00100000 (i.e. is 1MB or bigger)? Well, first we'll have to relocate the kernel by starting the copy at the top and working down. Secondly, we'll have to make sure that all code that runs before the jump to 001xxxxx is at the start of the image. (Not a problem, actually... The linker script will take care of that.)

3. What about machines with more than 8MB? Or less than 8MB? What's the best place to pull the memory size and good/bad info from the RPB? Perhaps in head.S when we're building the system page table?

--- NEW FILE ---

mopd instructions. atp. 16th March 1999

mopd speaks the maintenance operations protocol, which is currently the easiest way of getting the test images into the microvax. The mopd I use can be downloaded from http://www.linux-vax.org/downloads/

This is the netbsd version with enhancements by Karl Maftoum.

Usage;

On your Linux/i386 system:

as root

    ifconfig eth0 promisc
    mopd -a

mopd will look for files of the form /tftpboot/mop/<yourVAXesMACaddress>.SYS e.g.
    [atp@mssllc /]$ ls /tftpboot/mop/
    08002b0fbba6.SYS

Obviously you will need to change that to match your microvax's MAC address. You also may wish to SET HALT 3 before you BOOT ESA0 at the console.

--- NEW FILE ---

$Id: syscall.txt,v 1.1 2001/09/19 15:53:39 atp Exp $

This file describes how syscalls work on the VAX.

When userland wants to do a system call, it calls a wrapper function in the standard way (so we get a standard call frame built on the stack). This wrapper then simply does a CHMK (change mode to kernel) instruction, specifying the number of the syscall:

In file user-app.c:

    fd = creat(filename, mode);

In libc:

    #define CHMK(x) __asm__("chmk %0" : : "g" (x) : )

    int creat(const char *filename, mode_t mode)
    {
        CHMK(__NR_creat);
    }

In the kernel, the exception handler for change-mode-to-kernel exceptions will get control. At this point, the stack looks like:

    SP: <local stack frame>
        struct pt_regs *            (points to pt_regs further up on stack)
        void *excep_info            (points to info pushed by hardware further up on stack)
        ...
        struct pt_regs saved_regs
        ...
        syscall_number              (pointed to by excep_info pointer above)
        saved PC
        saved PSL

The saved PSL, saved PC and syscall number are pushed by the hardware when executing the CHMK instruction. The saved_regs are pushed by the common exception handler code and eventually we end up calling chmk_handler():

    void chmk_handler(struct pt_regs *regs, void *excep_info)
    {
        int syscall = *(int *)excep_info;
        ...

The next step is to collect the arguments from user-space. We cannot assume that they will be on the user stack since the app may have called creat() via a CALLG instruction. However, we do know that the AP (argument pointer register) inside creat() in libc will point to the argument list. So we pull AP out of the pt_regs structure.
This will point to a standard VAX argument list, which starts with the number of arguments:

    AP:     arg_count   (should be less than 256, if not return error because userland is breaking the rules)
    AP+4:   arg1
    AP+8:   arg2
    ...

Of course, this is all completely untrusted so we have to be careful to check all user-land accesses. We also need to copy the complete argument list to kernel space before passing it to the actual syscall function (which will do final validation). (Otherwise another user-land task could modify a pointer argument after we've verified that it points to accessible memory, but before we actually dereference it.)

We try to copy the whole argument list to the kernel stack and then do a CALLS to the actual syscall handler.

--- NEW FILE ---

$Id: task-memory.txt,v 1.1 2001/09/19 15:53:39 atp Exp $

atp Sept 2001

For more details on the memory layout and details of the process page tables, see the memory.txt file in this directory.

If you see this message in your system logs, then this file is for you;

    VAXMM: process 81292000 exceeded TASK_WSMAX (64MB) addr 4000000
    VAXMM pte_alloc: sending SIGSEGV to process 81292000
    VM: killing process as
    vax-dec-linux-gcc: Internal compiler error: program as got fatal signal 9

Due to the constraints of the VAX MMU, we need to decide at compile time how much virtual address space to allocate to user processes. The number of processes and the amount of memory is limited by a set of #defines in the file include/asm-vax/mm/task.h. This allows us to size the number of tasks and the amount of virtual address space each one is allowed.

Those defines are;

TASK_WSMAX
    This is the "process address space" in P0. This is normal memory. If you run out of RAM, then this is the one to pay attention to. In VMS terms this is like WSMAX. TASK_WSMAX is the sum of TASK_TXTMAX and TASK_MMAPMAX.

TASK_TXTMAX
    This is the largest program that can be run. The default value is about 6Mb.
    (Bear in mind that the program size on disk may not reflect its size in memory, as it may have lots of debugging information and other stuff that won't be loaded as a running program.)

TASK_MMAPMAX
    This is the memory used for the mmap() system call, and hence by the malloc library routine. This is the amount of address space available for allocation by a running program. The default value is about 58Mb. If you see a warning about WSMAX being exceeded whilst running a program, this is the one to increase.

TASK_STKMAX
    The amount of address space in the P1 region. This is the amount of stack memory allocated to the process. The default value is 4 Mb.

TASK_MAXUPRC
    The maximum number of user processes allowed to run at any one time. This is like BALSETCNT on VMS. The default value is 64.

TASK_WSMAX = TASK_TXTMAX + TASK_MMAPMAX

Decide if you want to run bigger programs (increase TXTMAX) or let the programs have more memory (MMAPMAX), or more programs (MAXUPRC). However, don't set the sizes too much larger than you need, as you will lose more RAM to the system page table (and that's unavailable for user processes) the bigger these variables are.

--- NEW FILE ---

KPH - 20000206

Here's how to use XDELTA as a kernel debugger on a VS3500 with VMB 5.3.

1. Edit arch/vax/boot/head.S and add a HALT instruction somewhere near the beginning and recompile. (Replacing one of the initial 4 NOPs might be useful, since that won't change the layout of the linked image.) Insert a BPT instruction where you want a breakpoint.

2. Boot with a boot parameter of 20 (hex). This tells VMB to load XDELTA and trigger a breakpoint. What this actually does is make the initial SCB vectors for machine check, reserved operand, access violation, page fault, trace and breakpoint point into code in XDELTA. (XDELTA is copied to RAM along with the rest of VMB when you enter the BOOT command.)

3. Before trying to locate a boot device and load an image, VMB will stop at a breakpoint:

    >>>e\e\b/20
    (BOOT/R5:20 XQA0)

    1 brk at 000004EB

4. Type '4000/'. This will 'open' the address 4000 (which is the base of VMB's SCB). Hit Ctrl-J repeatedly to examine up as far as address 402C:

    1 brk at 000004EB

    4000/00000D0D
    00004004/0000313C
    00004008/00000D0D
    0000400C/00000D0D
    00004010/00000D0D
    00004014/00000D0D
    00004018/0000313C
    0000401C/00000D0D
    00004020/0000313C
    00004024/0000313C
    00004028/00003251
    0000402C/000031F1

5. Note the values at addresses 4004 (machine check), 4018 (res opr), 4020 (accvio), 4024 (page fault), 4028 (trace) and 402C (bpt). Unfortunately, VMB clobbers these before passing control to the loaded kernel image.

6. Type ';P' and hit RETURN to continue from the breakpoint. VMB will load the kernel as normal, transfer control to it and hit the HALT you inserted in step 1.

7. Now, use the console's DEPOSIT command to set the SCB vectors recorded above to point into XDELTA again and CONTINUE:

    >>> D/P 4004 313C
    >>> D 4018 313C
    >>> D 4020 313C
    >>> D 4024 313C
    >>> D 4028 3251
    >>> D 402C 31F1
    >>> C

8. When your BPT instruction is reached, XDELTA will gain control again.

Some additional notes:

o There seems to be a problem with the above method. The S command (single step) causes the machine to lock up when used after step 8. Front-panel halt switch, or a BREAK from the console, is required to restore life.

o You can find the manual for XDELTA at the Compaq OpenVMS web site: http://www.openvms.digital.com:8000/72final/4540/4540pro.html Warning! It's pretty primitive...

o The version of XDELTA in VMB 5.3 doesn't include 'instruction' mode, so you can't disassemble instructions. Since there is an EXAMINE/INSTRUCTION console command, there is code somewhere in the ROM to decode instructions. It shouldn't be too difficult to hack XDELTA to use this code. The XDELTA code itself is very simple.

o The ROM-based XDELTA won't work once VM has been enabled.
This limits its usefulness at present. However, the XDELTA code looks to be 100% position-independent, so the kernel should be able to copy it (either from ROM or from RAM) at boot, and hook the relevant SCB vectors up to it. (The kernel could also patch XDELTA to add support for instruction mode at this point.)
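
Step 7 and the last note above both come down to writing six longwords into the SCB. As a rough sketch only, the table below restates the vector offsets and the XDELTA entry values recorded in step 5 on one VS3500 with VMB 5.3; they may well differ on other machines or VMB versions. The names xdelta_vec, xdelta_vectors and hook_xdelta are invented for illustration, and the code works on a plain array standing in for the SCB rather than on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* One SCB slot to redirect: byte offset into the SCB and the value to store. */
struct xdelta_vec {
    uint32_t offset;   /* byte offset from the SCB base */
    uint32_t value;    /* XDELTA entry value recorded in step 5 */
};

/* Values observed on one VS3500 with VMB 5.3 (see the dump in step 4). */
static const struct xdelta_vec xdelta_vectors[] = {
    { 0x04, 0x313C },  /* machine check */
    { 0x18, 0x313C },  /* reserved operand */
    { 0x20, 0x313C },  /* access violation */
    { 0x24, 0x313C },  /* page fault */
    { 0x28, 0x3251 },  /* trace */
    { 0x2C, 0x31F1 },  /* breakpoint */
};

/*
 * Store the recorded vectors into `scb`, an array of longwords standing
 * in for the SCB. `delta` is added to each value: pass 0 for XDELTA at
 * its original location, or a hypothetical relocation offset (assumed
 * to be longword-aligned) if the kernel has copied XDELTA elsewhere.
 */
static void hook_xdelta(uint32_t *scb, uint32_t delta)
{
    size_t i;

    for (i = 0; i < sizeof(xdelta_vectors) / sizeof(xdelta_vectors[0]); i++)
        scb[xdelta_vectors[i].offset / 4] = xdelta_vectors[i].value + delta;
}
```

Calling hook_xdelta(scb, 0) on a zeroed array has the same effect on the six slots as the DEPOSIT commands in step 7. Whether a simple offset is enough once VM is enabled depends on XDELTA really being position-independent, which the note above only suspects.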