\part{System Architecture}% -*- Dictionary: int:design -*-
\chapter{Package and File Structure}
\section{RCS and build areas}
The CMU CL sources are maintained using RCS in a hierarchical directory
structure which supports:
\begin{itemize}
\item shared RCS config file across a build area,
\item frozen sources for multiple releases, and
\item separate system build areas for different architectures.
\end{itemize}
Since this organization maintains multiple copies of the source, it is somewhat
space intensive. But it is easy to delete and later restore a copy of the
source using RCS snapshots.
There are three major subtrees of the root \verb|/afs/cs/project/clisp|:
\begin{description}
\item[rcs] holds the RCS source (suffix \verb|,v|) files.
\item[src] holds ``checked out'' (but not locked) versions of the source files,
and is subdivided by release. Each release directory in the source tree has a
symbolic link named ``{\tt RCS}'' which points to the RCS subdirectory of the
corresponding directory in the ``{\tt rcs}'' tree. At top-level in a source tree
is the ``{\tt RCSconfig}'' file for that area. All subdirectories also have a
symbolic link to this RCSconfig file, allowing the configuration for an area to
be easily changed.
\item[build] compiled object files are placed in this tree, which is subdivided
by machine type and version. The CMU CL search-list mechanism is used to allow
the source files to be located in a different tree than the object files. C
programs are compiled by using the \verb|tools/dupsrcs| command to make
symbolic links to the corresponding source tree.
\end{description}
In order to modify a file in RCS, it must be checked out with a lock to
produce a writable working file. Each programmer checks out files into a
personal ``play area'' subtree of \verb|clisp/hackers|. These trees duplicate
the structure of source trees, but are normally empty except for files actively
being worked on.
See \verb|/afs/cs/project/clisp/pmax_mach/alpha/tools/| for
various tools we use for RCS hacking:
\begin{description}
\item[rcs.lisp] Hemlock (editor) commands for RCS file manipulation
\item[rcsupdate.c] Program to check out all files in a tree that have been
modified since last checkout.
\item[updates] Shell script to produce a single listing of all RCS log
entries in a tree since a date.
\item[snapshot-update.lisp] Lisp program to generate a shell script which
generates a listing of updates since a particular RCS snapshot ({\tt RCSSNAP})
file was created.
\end{description}
You can easily operate on all RCS files in a subtree using:
\begin{verbatim}
find . -follow -name '*,v' -exec <some command> {} \;
\end{verbatim}
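The idiom above can be wrapped in small shell functions. The following is a minimal sketch; the names \verb|rcs_files| and \verb|for_each_rcs_file| are hypothetical helpers introduced here for illustration and are not part of the CMU CL tools.

```shell
# List every RCS master file (suffix ",v") under a directory, following
# symbolic links as in the find command above.
# rcs_files is a hypothetical helper name, not one of the CMU CL tools.
rcs_files() {
    find "${1:-.}" -follow -name '*,v' -print | sort
}

# Run a command once per RCS file. The file name is appended as the last
# argument of the command, playing the role of {} in the find idiom.
# for_each_rcs_file is likewise a hypothetical name.
for_each_rcs_file() {
    dir=$1
    shift
    find "$dir" -follow -name '*,v' -exec "$@" {} \;
}
```

For example, \verb|for_each_rcs_file . rlog -h| would print the RCS header information for every file in the current subtree, assuming the standard RCS tools are installed.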
\subsection{Configuration Management}
Config files are useful, especially in combination with ``{\tt snapshot}''. You
can snapshot any particular version, giving an RCSconfig that designates that
configuration. You can also use config files to specify the system as of a
particular date. For example:
\begin{verbatim}
<3-jan-91
\end{verbatim}
in the config file will cause the version as of 3-jan-91 to be checked
out, instead of the latest version.
\subsection{RCS Branches}
Branches and named revisions are used together to allow multiple paths of
development to be supported. Each separate development has a branch, and each
branch has a name. This project uses branches in two somewhat different cases
of divergent development:
\begin{itemize}
\item For systems that we have imported from the outside, we generally assign a
``{\tt cmu}'' branch for our local modifications. When a new release comes
along, we check it in on the trunk, and then merge our branch back in.
\item For the early development and debugging of major system changes, where
the development and debugging is expected to take long enough that we wouldn't
want the trunk to be in an inconsistent state for that long.
\end{itemize}
\section{Releases}
We name releases according to the normal alpha, beta, default convention.
Alpha releases are frequent, intended primarily for internal use, and are thus
not held to as high a standard of documentation and configuration management.
Alpha releases are designated by the date on which the system was
built; the alpha releases for different systems may not be in exact
correspondence, since they are built at different times.
Beta and default releases are always based on a snapshot, ensuring that all
systems are based on the same sources. A release name is an integer and a
letter, like ``15d''. The integer is the name of the source tree which the
system was built from, and the letter represents the release from that tree:
``a'' is the first release, etc. Generally the numeric part increases when
there are major system changes, whereas changes in the letter represent
bug-fixes and minor enhancements.
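The naming scheme for beta and default releases can be illustrated with a few shell functions that take a name such as ``15d'' apart. This is only a sketch of the convention described above; the function names are hypothetical and do not correspond to any existing tool.

```shell
# Split a release name such as "15d" into its source-tree number and its
# release letter. All function names here are hypothetical.
release_tree() {
    # Strip the trailing letter: "15d" -> "15".
    printf '%s\n' "${1%[a-z]}"
}

release_letter() {
    # Strip everything up to the last digit: "15d" -> "d".
    printf '%s\n' "${1##*[0-9]}"
}

# Succeed only for names of the form <integer><single lowercase letter>.
is_release_name() {
    case $1 in
        ''|*[!0-9a-z]*) return 1 ;;
    esac
    case "$(release_tree "$1")" in
        ''|*[!0-9]*) return 1 ;;
    esac
    case "$(release_letter "$1")" in
        [a-z]) return 0 ;;
        *) return 1 ;;
    esac
}
```

Alpha releases are named by build date rather than by this scheme, so \verb|is_release_name| rejects them.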
\section{Source Tree Structure}
A source tree (and the master ``{\tt rcs}'' tree) has subdirectories for each
major subsystem:
\begin{description}
\item[{\tt assembly/}] Holds the CMU CL source-file assembler, and has machine
specific subdirectories holding assembly code for that architecture.
\item[{\tt clx/}] The CLX interface to the X11 window system.
\item[{\tt code/}] The Lisp code for the runtime system and standard CL
utilities.
\item[{\tt compiler/}] The Python compiler. Has architecture-specific
subdirectories which hold backends for different machines. The {\tt generic}
subdirectory holds code that is shared across most backends.
\item[{\tt hemlock/}] The Hemlock editor.
\item[{\tt lisp/}] The C runtime system code and low-level Lisp debugger.
\item[{\tt pcl/}] CMU version of the PCL implementation of CLOS.
\item[{\tt tools/}] System building command files and source management tools.
\end{description}
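A quick way to check that a directory has the layout listed above is to look for each subsystem directory in turn. The following sketch assumes only the directory names given in this section; \verb|missing_subsystems| is a hypothetical helper, not one of the scripts in \verb|tools/|.

```shell
# Print the subsystem directories from the list above that are missing
# under a given source tree root (default: the current directory).
# missing_subsystems is a hypothetical helper name.
missing_subsystems() {
    root=${1:-.}
    for sub in assembly clx code compiler hemlock lisp pcl tools; do
        [ -d "$root/$sub" ] || printf '%s\n' "$sub"
    done
}
```

An empty result means that every subsystem directory is present.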
\section{Package structure}
Goals: with the single exception of LISP, we want to be able to export from the
package that the code lives in.
\begin{description}
\item[Mach, CLX...] --- These implementation-dependent system-interface
packages provide direct access to specific features available in the operating
system environment, but hide details of how OS communication is done.
\item[system] contains code that must know about the operating system
environment: I/O, etc. Hides the operating system environment. Provides OS
interface extensions such as {\tt print-directory}, etc.
\item[kernel] hides state and types used for system integration: package
system, error system, streams (?), reader, printer. Also, hides the VM, in
that we don't export anything that reveals the VM interface. Contains code
that needs to use the VM and SYSTEM interface, but is independent of OS and VM
details. This code shouldn't need to be changed in any port of CMU CL, but
won't work when plopped into an arbitrary CL. Uses SYSTEM, VM, EXTENSIONS. We
export ``hidden'' symbols related to implementation of CL: setf-inverses,
possibly some global variables.
The boundary between KERNEL and VM is fuzzy, but this fuzziness reflects the
fuzziness in the definition of the VM. We can make the VM large, and bring
everything inside, or we can make it small. Obviously, we want the VM to be
as small as possible, subject to efficiency constraints. Pretty much all of
the code in KERNEL could be put in VM. The issue is more what VM hides from
KERNEL: VM knows about everything.
\item[lisp] Originally, this package had all the system code in it. The
current ideal is that this package should have {\it no} code in it, and only
exist to export the standard interface. Note that the name has been changed by
x3j13 to common-lisp.
\item[extensions] contains code that any random user could have written: list
operations, syntactic sugar macros. Uses only LISP, so code in EXTENSIONS is
pure CL. Exports everything defined within that is useful elsewhere. This
package doesn't hide much, so it is relatively safe for users to use
EXTENSIONS, since they aren't getting anything they couldn't have written
themselves. Contrast this to KERNEL, which exports additional operations on
CL's primitive data structures: PACKAGE-INTERNAL-SYMBOL-COUNT, etc. Although
some of the functionality exported from KERNEL could have been defined in CL,
the kernel implementation is much more efficient because it knows about
implementation internals. Currently this package contains only extensions to
CL, but in the ideal scheme of things, it should contain the implementations of
all CL functions that are in KERNEL (the library.)
\item[VM] hides information about the hardware and data structure
representations. Contains all code that knows about this sort of thing: parts
of the compiler, GC, etc. The bulk of the code is the compiler back-end.
Exports useful things that are meaningful across all implementations, such as
operations for examining compiled functions, system constants. Uses COMPILER
and whatever else it wants. Actually, there are different {\it machine}{\tt
-VM} packages for each target implementation. VM is a nickname for whatever
implementation we are currently targeting.
\item[compiler] hides the algorithms used to map Lisp semantics onto the
operations supplied by the VM. Exports the mechanisms used for defining the
VM. All the VM-independent code in the compiler, partially hiding the compiler
intermediate representations. Uses KERNEL.
\item[eval] holds code that does direct execution of the compiler's ICR. Uses
KERNEL, COMPILER. Exports debugger interface to interpreted code.
\item[debug-internals] presents a reasonable, unified interface to
manipulation of the state of both compiled and interpreted code. (could be in
KERNEL) Uses VM, INTERPRETER, EVAL, KERNEL.
\item[debug] holds the standard debugger, and exports the debugger
\end{description}
\chapter{System Building}
It's actually rather easy to build a CMU CL core with exactly what you want in
it. But to do this you need two things: the source and a working CMU CL.
Basically, you use the working copy of CMU CL to compile the sources,
then run a process called ``genesis'' which builds a ``kernel'' core.
You then load whatever you want into this kernel core, and save it.
In the \verb|tools/| directory in the sources there are several files that
compile everything, and build cores, etc. The first step is to compile the C
startup code.
{\bf Note:} {\it the various scripts mentioned below have hard-wired paths in
them set up for our directory layout here at CMU. Anyone anywhere else will
have to edit them before they will work.}
\section{Compiling the C Startup Code}
There is a circular dependency between lisp/internals.h and lisp/lisp.map that
causes bootstrapping problems. The easiest way to get around this problem
is to make a fake lisp.nm file that has nothing in it but a version number:
\begin{verbatim}
% echo "Map file for lisp version 0" > lisp.nm
\end{verbatim}
and then run genesis with NIL for the list of files:
\begin{verbatim}
* (load ".../compiler/generic/new-genesis") ; compile before loading
* (lisp::genesis nil ".../lisp/lisp.nm" "/dev/null"
".../lisp/lisp.map" ".../lisp/lisp.h")
\end{verbatim}
Genesis will generate a whole bunch of warnings about things being undefined.
Ignore them, because it will also generate a correct lisp.h. You can then
compile lisp, producing a correct lisp.map:
\begin{verbatim}
% make
\end{verbatim}
and then use \verb|tools/do-worldbuild| and \verb|tools/mk-lisp| to build
\verb|kernel.core| and \verb|lisp.core| (see section \ref{building-cores}.)
\section{Compiling the Lisp Code}
The \verb|tools| directory contains various lisp and C-shell utilities for
building CMU CL:
\begin{description}
\item[compile-all*] Will compile lisp files and build a kernel core. It has
numerous command-line options to control what to compile and how. Try -help to
see a description. It runs a separate Lisp process to compile each
subsystem. Error output is generated in files with ``{\tt .log}'' extension in
the root of the build area.
\item[setup.lisp] Some lisp utilities used for compiling changed files in batch
mode and collecting the error output. Sort of a crude defsystem. Loads into the
``user'' package. See {\tt with-compiler-log-file} and {\tt comf}.
\item[{\it foo}com.lisp] Each system has a ``\verb|.lisp|'' file in
\verb|tools/| which compiles that system.
\end{description}
\section{Building Core Images}
\label{building-cores}
Both the kernel and final core build are normally done using shell script
drivers:
\begin{description}
\item[do-worldbuild*] Builds a kernel core for the current machine. The
version to build is indicated by an optional argument, which defaults to
``alpha''. The \verb|kernel.core| file is written either in the \verb|lisp/|
directory in the build area, or in \verb|/usr/tmp/|. The directory which
already contains \verb|kernel.core| is chosen. You can create a dummy version
with e.g. ``touch'' to select the initial build location.
\item[mk-lisp*] Builds a full core, with conditional loading of subsystems.
The version is the first argument, which defaults to ``alpha''. Any additional
arguments are added to the \verb|*features*| list, which controls system
loading (among other things.) The \verb|lisp.core| file is written in the
current working directory.
\end{description}
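The rule that \verb|kernel.core| is written to whichever directory already contains one can be sketched as a small shell function. This only illustrates the selection rule described above and is not the text of \verb|do-worldbuild|. The function name \verb|pick_core_dir| is hypothetical, and preferring the first candidate when several contain the file is an assumption.

```shell
# Print the first candidate directory that already contains kernel.core,
# or return non-zero if none does.
# pick_core_dir is a hypothetical name; taking the first match when more
# than one candidate qualifies is an assumption, not documented behavior.
pick_core_dir() {
    for dir in "$@"; do
        if [ -e "$dir/kernel.core" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

With this sketch, running \verb|touch| on \verb|kernel.core| in one of the candidate directories before the first build selects that directory, which mirrors the dummy-file trick mentioned above.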
These scripts load Lisp command files. When \verb|tools/worldbuild.lisp| is
loaded, it calls genesis with the correct arguments to build a kernel core.
Similarly, \verb|worldload.lisp|
builds a full core. Adding certain symbols to \verb|*features*| before
loading worldload.lisp suppresses loading of different parts of the
system. These symbols are:
\begin{description}
\item[:no-compiler] don't load the compiler.
\item[:no-clx] don't load CLX.
\item[:no-hemlock] don't load hemlock.
\item[:no-pcl] don't load PCL.
\item[:runtime] build a runtime code, implies all of the above, and then some.
\end{description}
Note: if you don't load the compiler, you can't (successfully) load the
pretty-printer or pcl. And if you compiled hemlock with CLX loaded, you can't
load it without CLX also being loaded.
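As an illustration of how a driver script might turn its extra arguments into feature symbols, the sketch below builds the Lisp forms that would add them to \verb|*features*| before \verb|worldload.lisp| is loaded. How \verb|mk-lisp| actually passes its arguments to Lisp is not shown in this document, so \verb|features_form| is a hypothetical helper and the generated forms are an assumption.

```shell
# Build Lisp forms that add each argument to *features* as a keyword.
# A leading colon on an argument is optional.
# features_form is hypothetical; it does not reproduce mk-lisp itself.
features_form() {
    form=""
    for feature in "$@"; do
        form="$form(pushnew :${feature#:} *features*) "
    done
    # Drop the trailing space left by the loop.
    printf '%s\n' "${form% }"
}
```

For example, \verb|features_form no-clx no-hemlock| produces forms that would suppress loading of CLX and Hemlock.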