File Date Author Commit
 example 2021-10-14 stes stes [bb9703] Initial Commit
 rules 2021-10-14 stes stes [bb9703] Initial Commit
 src 2021-10-14 stes stes [bb9703] Initial Commit
 ANNOUNCE 2021-10-14 stes stes [bb9703] Initial Commit
 CATALOG-CARD 2021-10-14 stes stes [bb9703] Initial Commit
 Makefile 2021-10-14 stes stes [bb9703] Initial Commit
 mac.me 2021-10-14 stes stes [bb9703] Initial Commit
 producer.html 2021-10-14 stes stes [bb9703] Initial Commit
 producer.me 2021-10-14 stes stes [bb9703] Initial Commit
 readme.me 2021-10-14 stes stes [bb9703] Initial Commit

.C "Producer: Smalltalk-80 to Objective-C Translator"
.(l C
Brad J. Cox
Productivity Products International
75 Glen Road
Sandy Hook, CT 06482
(203) 426 1875.
.)l
.pp
Smalltalk-80 is a tool for turning raw concepts into working software 
prototypes. Objective-C is a tool for turning proven concepts into fast,
commercial-quality, production systems. Producer is a tool for bridging
the gap between prototyping and production by automatically translating 
Smalltalk-80 sources into Objective-C sources. The translation is guided
by a rule base in which the programmer describes how differences between 
the Smalltalk-80 prototyping environment and the Objective-C production
environment should be resolved when translating the code.
.pp
At SIGGRAPH-87, PPI will announce a library of user interface components 
from which programmers build applications with iconic user interfaces.
The library and applications built using it are portable across diverse window
systems, initially X-Windows, SunWindows and Hewlett Packard's window
system. While the Objective-C user interface classes are different from
Smalltalk's, they are similar enough that Producer can usually bridge the
differences with some hand-tuning of the translated output.  We confidently 
hope that Objective-C, this library and Producer will make automatic 
translation of Smalltalk-80 prototypes a routine part of many companies'
software development lifecycle.
.pp
I'm distributing Producer to enlist your help in testing the practicality of
this notion.

.H "Disclaimer"
.pp
Producer is not a mature software product but an embryo that could grow to
maturity someday.  Specifically, it is not supported or warranted in any way.
It was written by me, an individual employed by PPI, and has been released
prior to maturity by me as an individual with the consent of the company.
This document will make its strengths and some of its present shortcomings
clear.
.pp
However, even in its present state, Producer demonstrates that automatic
translation is technically feasible and its present implementation provides
a capable foundation on which to build. Since the market for Smalltalk-80 
translators is insufficient for PPI to pursue presently, we've released
Producer for you to make what use of it you can.
.pp
I do ask that you keep me informed of your experiences in using it in its
current state, and PPI requests that you feed back any improvements so that 
we can offer a fully supported translation product in the future. PPI retains
the copyright and all other applicable rights. For example, you may not 
sell products that contain any part of the Producer distribution without 
PPI's permission.

.H "How it works"
.pp
The following is a brief description of how Producer works internally.
This was written from my recollection of how I left the code over a
year ago. It may be inaccurate in places.
.pp
Producer is basically a compiler. Its lexical analyzer (written in lex)
divides Smalltalk-80 text into lexemes, and its parser (written in yacc)
recognizes valid lexeme sequences and constructs an abstract representation
of the program as an expression tree. The expression tree consists of 
instances of Objective-C classes; e.g. Method, Statement, Expression,
Message, and Variable. The grammar was derived from the syntax diagrams
in Goldberg and Robson; \fISmalltalk-80: The Language and its
Implementation\fP; Addison Wesley; 1986.
.pp
The grammar was extended to recognize rules, which may also appear
in the lexeme stream. Rules are enclosed in { braces } to help fend off
shift-reduce conflicts from yacc. The parser stores the rules in separate
data structures for use during code generation.
.pp
At certain points, the parser sends the top of the expression tree a 
gen message to trigger code generation\**. Recall that Smalltalk-80
is an extremely simple language with basically two components: data
references (variables, literals, etc.) and messages. Rules may influence
how each case is treated during code generation.
.(f
\** I now regard this as a major architectural flaw whenever I see it in any
application. It represents a key departure from an important but often
ignored rule of object-oriented design.  The expression tree classes should
be abstract so that they could be reused in other tools. But their code 
generation methods pollute the abstraction with knowledge about a particular
concrete interface; Objective-C. The code generation methods should have been
provided in a separate hierarchy of classes that know how to connect the
abstract classes to one of many potential concrete interfaces. This rule 
is simply a generalization of the model/view/controller paradigm to apply 
to interfaces of any kind, not just user interfaces.
.)f
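.pp
As a rough illustration of the separation the footnote above argues for, the
following C sketch keeps the expression tree free of any target-language
knowledge and moves the output conventions into a separate table. It is not
taken from Producer's sources: Node, Target, append and generate are
hypothetical names, and reducing a target language to four punctuation
strings is a simplifying assumption.

```c
#include <assert.h>
#include <string.h>

/* Abstract expression tree: knows nothing about any target language. */
typedef struct Node {
    const char *text;            /* leaf text, or the selector of a message */
    const struct Node *receiver; /* NULL for a leaf */
    const struct Node *argument; /* NULL for a leaf */
} Node;

/* One concrete interface: how a target language spells a message send. */
typedef struct {
    const char *open, *afterReceiver, *afterSelector, *close;
} Target;

const Target objectiveC = { "[", " ", "",  "]" };
const Target smalltalk  = { "(", " ", " ", ")" };

/* Append s to the NUL-terminated string in out without overflowing it. */
void append(char *out, size_t size, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 < size)
        strncat(out, s, size - used - 1);
}

/* Walk the tree once; every language-specific decision comes from t. */
void generate(const Node *n, const Target *t, char *out, size_t size)
{
    assert(n != NULL && t != NULL);
    if (n->receiver == NULL) {      /* leaf: variable or literal */
        append(out, size, n->text);
        return;
    }
    append(out, size, t->open);
    generate(n->receiver, t, out, size);
    append(out, size, t->afterReceiver);
    append(out, size, n->text);
    append(out, size, t->afterSelector);
    generate(n->argument, t, out, size);
    append(out, size, t->close);
}
```

Calling generate on the same tree with objectiveC or with smalltalk yields two
different renderings without touching the Node definition, which is the point
of keeping the tree classes abstract.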
.pp
Code generation proceeds in two passes. The first pass collects typing
information for each symbol and message by examining the expression
tree from the bottom up. The bottom-most nodes are either literals whose
type is immediately obvious (e.g. 1, 2.3, or 'string'), or they are symbols
whose type can be known or unknown. Symbol types become known either as
the result of a previous type inferencing operation or because their type
was specified in a rule. Unknown symbols default to id when first referenced.
.pp
Most of the internal nodes are messages.  Message typing is slightly more
complicated because a message can have several translations: different
rules may specify different translations for different receiver and
argument types, and each translation may compute a different type. Because
types are assigned bottom up, the types of the receiver and the arguments
are already known when a message node is reached, so a translation for that
selector is chosen by searching a table of possible translations for one
matching the receiver and argument types.
.pp
In all cases, unless overridden by a specific rule, default translations
are used. These amount to a fairly literal translation from Smalltalk-80
syntax to Objective-C syntax. The one exception is that Smalltalk literal
constants translate to C literal constants. As a result, with no rule for
Integer arithmetic, 2+2 translates to [2 plus:2], which is \fIguaranteed\fP
to fail catastrophically in Objective-C. The integer 2 is an object only in
Smalltalk!
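.pp
The table search and the default fallback described above might look roughly
like this in C. Everything here is a sketch under stated assumptions: MsgRule,
findRule and translate are hypothetical names, the two table entries are
invented for illustration rather than copied from generic.ru, and only
one-argument messages are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One translation of a selector for specific receiver and argument types. */
typedef struct {
    const char *selector, *receiverType, *argType;
    const char *resultType;               /* type the translation computes */
    const char *open, *separator, *close; /* pieces of the C output */
} MsgRule;

/* Hypothetical rule table; real rules would be loaded from rule files. */
static const MsgRule rules[] = {
    { "+", "int", "int", "int", "",        " + ", ""  },
    { "+", "PT",  "PT",  "PT",  "ptPlus(", ", ",  ")" },
};

/* Return the first rule matching selector and operand types, else NULL. */
const MsgRule *findRule(const char *selector,
                        const char *receiverType, const char *argType)
{
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const MsgRule *r = &rules[i];
        if (strcmp(r->selector, selector) == 0 &&
            strcmp(r->receiverType, receiverType) == 0 &&
            strcmp(r->argType, argType) == 0)
            return r;
    }
    return NULL;
}

/* Write the translation into out and return the type it computes. */
const char *translate(char *out, size_t size, const char *selector,
                      const char *receiver, const char *receiverType,
                      const char *arg, const char *argType)
{
    const MsgRule *r = findRule(selector, receiverType, argType);

    assert(out != NULL && size > 0);
    if (r != NULL) {
        snprintf(out, size, "%s%s%s%s%s",
                 r->open, receiver, r->separator, arg, r->close);
        return r->resultType;
    }
    /* Default: a literal message send whose result type is unknown (id).
       Spelling "+" as "plus:" follows the README's 2+2 example. */
    snprintf(out, size, "[%s %s%s]", receiver,
             strcmp(selector, "+") == 0 ? "plus:" : selector, arg);
    return "id";
}
```

With the int rule present, a + b stays ordinary C arithmetic; with no matching
rule the same selector falls back to the message-send form that the text warns
about.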
.pp
The moral: \fINever\fP believe the translator. \fIAlways\fP monitor it 
closely. Remember the 90-10 rule. The automatic translation concept is
capable, with suitable rules, of automatically translating only 90% of
an application correctly; the other 10% (where the bugs will have 
congregated) is still up to you.

.H "Implementation Status"
.pp
Producer currently represents about three man-weeks of effort, spent in two
intensive bursts separated by about a year. The most recent burst was nearly
a year and a half ago.  The first burst was to demonstrate the feasibility 
and practicality of the translation concept. The second burst was in the
course of preparing a paper that, coauthored with Kurt Schmucker, will appear
in the OOPSLA-87 proceedings. A (very) early draft is provided with this
distribution. 
.pp
Considering how quickly it was developed, the translator does an effective
job of translation.  I refer you to the paper for discussions of the strengths
and limitations of the translation concept.  This section discusses the
current implementation of this concept, in the form of the items on my own
must-do list for the planned, but not yet completed, third stage of
Producer's evolution.
.np
Smalltalk-80 fileout format uses '!' delimiters in a fashion that I was
never able to formalize correctly in Producer's yacc grammar. The symptom
is that the translator will generate syntax errors in nearly every translated
file for certain of these delimiters. I'm told that fileout format has been
documented in a paper somewhere, but I've never worked the repairs back into
the code. The fix should be local to gram.y.
.np
The translator loads its rule base by reading files of rules as if they
were concatenated with the sources to be translated. The rule-specification
syntax is abysmal, primarily because it was chosen to minimize the amount 
of time I spent struggling with shift-reduce conflicts from yacc, rather than
making the rules intelligible to users. Smalltalk's formal grammar seemed
unreasonably difficult for yacc to swallow, and I suspect the problem may
lie in some mistake I've made in translating Smalltalk-80 syntax diagrams
into yacc specifications.
.np
The program contains extensive provisions for reporting its cogitations in
type inferencing. The various error, warning, logging, and debugging messages
need to be tuned for greater utility.
.np
The code was based on an as yet unreleased library (phylum) called "Substrate",
which supports features that are not yet in our standard product set, like
Blocks, Coroutining, and exception handling.  I made a fast editing pass
to remove any dependencies on these nonstandard library features. I also 
added a file, Substrate.h, that defines stylistic conventions that I adhere
to in all my work. See USE, IMPORT, EXPORT, etc in the sources.
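.pp
For the first item above, one way to sidestep the grammar problem would be to
split the fileout into chunks before the parser ever sees a delimiter. The C
sketch below assumes the commonly described chunk convention, in which a
single '!' ends a chunk and a doubled '!!' stands for a literal '!' inside
one. I have not checked this against Producer's input files or gram.y, and
nextChunk is a hypothetical helper, not part of the distribution.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the next chunk from *cursor into out (truncating to size - 1
   characters) and advance *cursor past its terminator.
   Returns 1 if a chunk was read, 0 at end of input. */
int nextChunk(const char **cursor, char *out, size_t size)
{
    const char *p = *cursor;
    size_t n = 0;

    assert(out != NULL && size > 0);
    if (*p == '\0')
        return 0;
    while (*p != '\0') {
        if (*p == '!') {
            if (p[1] == '!') {
                p++;            /* doubled: keep one literal '!' */
            } else {
                p++;            /* single: end of this chunk */
                break;
            }
        }
        if (n + 1 < size)
            out[n++] = *p;
        p++;
    }
    out[n] = '\0';
    *cursor = p;
    return 1;
}
```

Each chunk returned this way could then be handed to the existing parser on
its own, so the grammar would never need a production for the delimiter.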
.pp
The preceding problems are superficial and easily repaired. The following
ones are somewhat more substantial in that they involve design work in 
addition to coding work.
.np
The type inferencing machinery infers types of newly-encountered (unknown)
messages and variables by seeing how they are combined with variables and
messages whose types are known a priori or else determined earlier through
inferencing.  The only types that are known a priori are those of literals
like 1, 2.3, or 'string'. This generally provides insufficient typing
information from which to infer anything useful, so you should generally
provide variable rules to pin down types for key instance variables and
method arguments.
You do this with rules that state, in effect, that `the type of the Smalltalk
variable named foo is int, and the variable is called foobar in Objective-C'.
Presently rules have global scope. If different Smalltalk classes use the
name, foo, in ways that should be translated differently, different rule
sets must be provided manually to the translator. Creating and managing 
these application-specific rules sets adds to the translation effort and 
tends to make rules non-reusable across translations. The rules should be
organized with a scoping mechanism, ideally one based on inheritance.
.np
The inferencing logic is ad-hoc and quite possibly slow. However the main
bottleneck seems to be loading the rule-base; translation speed has never 
been a real problem. Inferencing is presently deductive, and a more inductive
scheme based on both forward and backward reasoning might produce higher
quality translations. At present, the translation of a given message
expression is determined exclusively by whatever information can be inferred
about the types of the receiver and arguments to that message (forward 
reasoning). Backward reasoning would also consider how the results of the
expression are used in other expressions.
.np
Producer does not presently handle non-trivial uses of Blocks correctly, i.e.
Block expressions that cannot be translated directly into C control
statements like if, while, or for, which Producer already handles just fine.
Nearly all occurrences of Smalltalk-80 Blocks could be handled without
changing the Objective-C language by adding a trivially simple Block class
to the library. A named instance variable holds a pointer to a static function
and indexed instance variables hold \fIcopies of\fP any variables that the
block accesses in the instantiation site\**. This copy could be taken
entirely automatically by copying the instantiation site's stack frame.
However I prefer to have more control over space than that. So I've been
using a scheme that requires the programmer (and someday the compiler) to
specify which variables are really accessed by the block as arguments to
the message that instantiates the block, like this:
.(C
 ... { 
	IMPORT void aStaticFunction();
	id var1 = something, var2 = something;
	aBlock = [Block function:aStaticFunction args:2, var1, var2];
	[anyObject do:aBlock];
	...
 }
 LOCAL void aStaticFunction(instantiationSiteVariables, value1, value2)
	struct { id var1, var2; } *instantiationSiteVariables;
	id value1, value2;
 {
	if ([instantiationSiteVariables->var1 someMessage])
		...
 }
.)C
.ip
The block will call the function when anyObject sends the block one of several
evaluation messages (value:arg1 or value:arg1 value:arg2 or ...). The first
argument is a \fIpointer\fP to the block's copy of the instantiation site's 
variables. The trailing arguments contain the arguments that the invocation 
site passed in the value: message.  I've used this approach extensively by
writing the static functions by hand, and am trying to get our staff to 
extend the language to provide some kind of language-level support to make
the syntax simpler.  This approach could be, but has not yet been, taken by
Producer.
.(f
\** In Smalltalk-80, the block seems to have access to the instantiation
site's variables, so that the block can change variables in the instantiation
site. In Objective-C the block receives a copy of the variables and cannot
use them to communicate with the instantiation site. I believe that this is
the sole functional difference between the two schemes.
.)f
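.pp
The Block scheme described in the last item above can be approximated in
plain C. This is a minimal sketch, not the Block class itself: the names
Block, blockNew, blockValue and blockFree are hypothetical, the captured
variables are copied into separately allocated storage rather than into
indexed instance variables, and only a single long argument is passed where
the real scheme passes one or more ids.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The block's function receives a pointer to the block's private copy of
   the instantiation site's variables, then the invocation argument. */
typedef long (*BlockFunction)(void *captured, long value);

typedef struct {
    BlockFunction function; /* pointer to a static function */
    void *captured;         /* copy of the variables the block accesses */
} Block;

/* Create a block, copying size bytes of variables from the site. */
Block *blockNew(BlockFunction function, const void *vars, size_t size)
{
    Block *b;

    assert(function != NULL && vars != NULL && size > 0);
    b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->captured = malloc(size);
    if (b->captured == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->captured, vars, size);
    b->function = function;
    return b;
}

/* Evaluate the block, as the value: message would. */
long blockValue(Block *b, long value)
{
    return b->function(b->captured, value);
}

void blockFree(Block *b)
{
    free(b->captured);
    free(b);
}

/* Example block body: accumulates into its own copy of total. */
struct Counter { long total; };

long addToTotal(void *captured, long value)
{
    struct Counter *c = captured;
    c->total += value;
    return c->total;
}
```

Evaluating a block built from addToTotal twice with the argument 5, starting
from a Counter whose total is 10, returns 15 and then 20, while the Counter
at the instantiation site still holds 10. That is the copy semantics the
footnote describes.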
.pp
The inferencing machinery's primary current virtue is that it can be made
to work for selected test cases. It leaves lots to be desired. Call me
if you decide to extend it so that I can prevent unnecessary duplication of
effort.
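.pp
The inheritance-based scoping of variable rules suggested earlier in this
section does not exist in Producer. A minimal C sketch of what it could look
like follows, with VarRule, RuleScope and lookupRule as hypothetical names:
each class gets its own scope, and a lookup that fails there continues in
the scope of the superclass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "The Smalltalk variable named foo has type int and is called foobar
   in Objective-C." */
typedef struct {
    const char *smalltalkName, *type, *objcName;
} VarRule;

/* Rules local to one class, chained to the superclass's rules. */
typedef struct RuleScope {
    const struct RuleScope *parent; /* NULL at the root (global rules) */
    const VarRule *rules;
    size_t count;
} RuleScope;

/* Search the innermost scope first, then each enclosing scope in turn. */
const VarRule *lookupRule(const RuleScope *scope, const char *name)
{
    size_t i;

    assert(name != NULL);
    for (; scope != NULL; scope = scope->parent)
        for (i = 0; i < scope->count; i++)
            if (strcmp(scope->rules[i].smalltalkName, name) == 0)
                return &scope->rules[i];
    return NULL;
}
```

Under this arrangement a class that uses foo differently overrides the rule
in its own scope, and every other class keeps inheriting the global rule, so
one rule file could serve several translations.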

.H "About the distribution"
.pp
The top level of the distribution consists of
.(C
total 88
-rw-r--r--  1 cox           181 Jun 22 14:32 Makefile
-rw-r--r--  1 cox         26592 Jun 22 14:30 README
drwxr-xr-x  2 cox           512 Jun 22 14:19 example
-rw-r--r--  1 cox           166 Jun 16 13:18 log
-rw-r--r--  1 cox           997 Jun 15 11:09 mac.me
-rw-r--r--  1 cox         26751 Jun 15 11:02 producer.me
-rw-r--r--  1 cox         21444 Jun 22 14:29 readme.me
drwxr-xr-x  2 cox           512 Jun 12 10:22 rules
drwxr-xr-x  2 cox          3072 Jun 22 14:31 src
.)C
The Makefile governs formatting of the two documents: this README (from
readme.me) and the draft of the OOPSLA-87 paper (from producer.me). The
mac.me file contains text formatting macros that are common to both papers,
used like this:
.(C
nroff -me mac.me producer.me >Producer.f
.)C
.pp
The rules directory contains a single file, generic.ru, that represents a 
first pass at an application-independent rules base. This set of rules 
translates Smalltalk to the conventions used in my prototype version of 
the user interface library.
.pp
For example, it translates Smalltalk Integer operations to C int operations,
and it translates Smalltalk Point operations to C macros that manage points
as type PT, a pair of 16-bit coordinates packed into a 32-bit C int. For
instance, pt(x,y) invokes a C macro that trims and shifts two ints, x and y,
to fit side by side in a 32-bit integer, and ptPlus(p,q) invokes a macro that
computes the vector sum of two points, p and q.
.(C
rules:
total 35
-rw-r--r--  1 cox         35567 Jun 12 10:22 generic.ru
.)C
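.pp
The point-packing convention described above can be sketched as follows.
These are not the macros from the user interface library: PT, pt and ptPlus
are the names used in the text, but their definitions here are my own guess,
ptX, ptY and ptField are hypothetical helpers, placing x in the high half is
an assumption, and PT is modeled as an unsigned 32-bit type so that the
shifts are well defined.

```c
#include <stdint.h>

/* Assumption: an unsigned 32-bit type stands in for the 32-bit C int. */
typedef uint32_t PT;

/* Trim x and y to 16 bits each and pack them side by side.
   Assumption: x occupies the high half, y the low half. */
#define pt(x, y) \
    ((PT)((((uint32_t)(x) & 0xFFFFu) << 16) | ((uint32_t)(y) & 0xFFFFu)))

/* Sign-extend a 16-bit field back to an int (hypothetical helper). */
static int ptField(uint32_t bits)
{
    return (bits & 0x8000u) ? (int)bits - 0x10000 : (int)bits;
}

/* Hypothetical accessors for the two coordinates. */
#define ptX(p) ptField(((PT)(p) >> 16) & 0xFFFFu)
#define ptY(p) ptField((PT)(p) & 0xFFFFu)

/* Vector sum of two points; note that p and q are evaluated twice. */
#define ptPlus(p, q) pt(ptX(p) + ptX(q), ptY(p) + ptY(q))
```

Adding the coordinates separately, rather than adding the packed words,
keeps a carry out of the low half from spilling into the high half when a
coordinate is negative.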
.pp
The example directory contains a fragment from the video animation program that
appears at the end of the Smalltalk-80 video tape. BounceInBoxNode.st is
the Smalltalk-80 source file, animation.ru contains the application-specific
rule set, BounceInBoxNode.m is the translated version built by Producer as
invoked by Makefile\**.
.(f
\** The full source for the animation program is not provided. My copyright
paranoia argued against providing even this fragment.
.)f
.(C
example:
total 7
-rw-r--r--  1 cox          1730 Jun 16 10:24 BounceInBoxNode.m
-rw-r--r--  1 cox           868 Jun 16 10:18 BounceInBoxNode.st
-rw-r--r--  1 cox           394 Jun 16 10:20 Makefile
-rw-r--r--  1 cox          2178 Jun 16 10:18 animation.ru
-rw-r--r--  1 cox           185 Jun 16 10:24 log
-rw-r--r--  1 cox           239 Jun 16 10:18 st80.h
.)C
.pp
The log file records the results of the translation session. The syntax
error is innocuous, the result of the aforementioned problem in the grammar
in handling '!' delimiters.
.(C
Producer -c ../rules/generic.ru animation.ru BounceInBoxNode.st >BounceInBoxNode.m
error 7:BounceInBoxNode.st: tegory:'Graphics-Animation'!! : syntax error
*** Error code 1 (ignored)
.)C
.pp
The src directory contains the sources for Producer, with its own Makefile.
The Substrate.h header file, which is automatically included by the
Producer.h header file, is technically a part of an internal lower level
library, Substrate, on which Producer was originally developed. Substrate.h
was copied and changed superficially so that Producer compiles correctly
without the Substrate library.
.(C
src:
total 70
-rw-r--r--  1 cox           483 Jun 12 10:21 AbstractTranslation.m
-rw-r--r--  1 cox           282 Jun 12 10:21 ArgumentList.m
-rw-r--r--  1 cox           897 Jun 12 10:21 Block.m
-rw-r--r--  1 cox           143 Jun 12 10:21 CharConstant.m
-rw-r--r--  1 cox          2205 Jun 12 10:21 Class.m
-rw-r--r--  1 cox           630 Jun 12 10:21 Comment.m
-rw-r--r--  1 cox           176 Jun 12 10:21 Constant.m
-rw-r--r--  1 cox          2032 Jun 12 10:21 Expr.m
-rw-r--r--  1 cox          1243 Jun 12 10:21 FunctionTranslation.m
-rw-r--r--  1 cox          1484 Jun 12 10:21 Identifier.m
-rw-r--r--  1 cox          1248 Jun 12 10:21 IdentifierTranslation.m
-rw-r--r--  1 cox           105 Jun 12 10:21 List.m
-rw-r--r--  1 cox          1985 Jun 15 11:55 METHODDECLS.m
-rw-r--r--  1 cox          1384 Jun 15 11:51 Makefile
-rw-r--r--  1 cox          4302 Jun 12 10:21 Method.m
-rw-r--r--  1 cox          3136 Jun 12 10:21 Msg.m
-rw-r--r--  1 cox           583 Jun 12 10:21 MsgArgPattern.m
-rw-r--r--  1 cox           828 Jun 12 10:21 MsgNamePattern.m
-rw-r--r--  1 cox          1280 Jun 12 10:21 MsgTranslation.m
-rw-r--r--  1 cox           775 Jun 12 10:21 MsgTranslator.m
-rw-r--r--  1 cox          1868 Jun 12 10:21 Node.m
-rw-r--r--  1 cox           229 Jun 12 10:21 NumberConstant.m
-rw-r--r--  1 cox          1402 Jun 15 11:27 Producer.h
-rw-r--r--  1 cox           306 Jun 12 10:21 Return.m
-rw-r--r--  1 cox           825 Jun 12 10:21 Scope.m
-rw-r--r--  1 cox          3157 Jun 12 10:21 Selector.m
-rw-r--r--  1 cox           253 Jun 12 10:21 SelectorConstant.m
-rw-r--r--  1 cox           457 Jun 12 10:21 StArray.m
-rw-r--r--  1 cox           492 Jun 12 10:21 Stmt.m
-rw-r--r--  1 cox           381 Jun 12 10:21 StringConstant.m
-rw-r--r--  1 cox          1268 Jun 12 10:21 StringTranslation.m
-rw-r--r--  1 cox          2140 Jun 15 11:38 Substrate.h
-rw-r--r--  1 cox          1405 Jun 15 11:53 Symbol.m
-rw-r--r--  1 cox           452 Jun 12 10:21 Template.m
-rw-r--r--  1 cox           901 Jun 12 10:21 Type.m
-rw-r--r--  1 cox          1800 Jun 12 10:21 design.me
-rw-r--r--  1 cox          3271 Jun 12 10:21 gen.m
-rw-r--r--  1 cox          9007 Jun 12 10:21 gram.y
-rw-r--r--  1 cox          3601 Jun 12 10:21 lex.l
-rw-r--r--  1 cox          2212 Jun 12 10:21 main.m
-rw-r--r--  1 cox           260 Jun 12 10:21 st80.h
-rw-r--r--  1 cox           259 Jun 15 11:59 y.tab.h
.)C
.pp
The files are exactly as I left them nearly a year and a half ago, except 
for:
.np
The addition of this README document. An early draft of the OOPSLA-87
paper, sadly prior to Kurt Schmucker's improvements, is in producer.me.
.np
One recompilation pass to remove any obvious dependencies on my private
Substrate library and to verify that Producer compiles and runs correctly 
on the standard Foundation library. I tested the changes by verifying that 
the Makefile in the example directory ran to completion, but this is hardly
an ironclad guarantee. 

.H "Using Producer"
.pp
Flags controlling the translation process, source files, and rules files
are provided on the command line and are processed in the order they appear.
The flags are:\**
.(f
\** I'm working from memory about what these flags mean. Some may
be nonfunctional.
.)f
.ip -d:
Enable debugging functions (dbg()) scattered throughout the code. Seldom
useful.
.ip -m:
Enables the Objective-C Foundation library message tracing feature. Seldom
useful in Producer.
.ip -a:
Enables the Objective-C Foundation library allocation tracing feature. Seldom
useful in Producer.
.ip -l:
Enables printing of each lexical token as produced by lex. Useful only for
debugging lex.l.
.ip -g:
Enables automatic redirection of each class into a separate file based
on the class name parsed from the input file. Automatically puts class
Foobar into file Foobar.m. 
.(q
CAREFUL! This puts at risk other files whose name might coincide with
a Smalltalk-80 class name!
.)q
.ip -s:
Generate Smalltalk-80 sources in the output file as Objective-C comments
(the default).
.ip -c:
Don't generate Smalltalk-80 sources in the output file.
.ip -i:
Generate information that was thought at one time to be useful when 
debugging rules.
.ip -M:
Send storeOn: to the message rule dictionary just before terminating 
as a debugging aid.
.ip -I:
Send storeOn: to the variable rule dictionary just before terminating 
as a debugging aid.
.pp
Typically, the generic rules in rules/generic.ru are specified first, then 
any application-specific rules, then a single Smalltalk-80 source file.
Unless -g is set, the translated output appears on stdout. The various
creaks, groans and mumbles that can be elicited about the translation 
process itself appear on stderr.
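.pp
The in-order processing of the command line can be pictured with the C sketch
below. It is not main.m: Job and planJobs are hypothetical names, only the -s
and -c flags from the list above are modeled, and the idea that a flag
affects just the files named after it is my reading of "processed in the
order they appear", not something I have verified in the sources.

```c
#include <assert.h>
#include <stddef.h>

/* A file to process, with the flag state in effect when it was reached. */
typedef struct {
    const char *path;
    int emitSources;   /* 1: include Smalltalk-80 source as comments */
} Job;

/* Walk argv left to right; flags change state, other arguments are files.
   Returns the number of jobs recorded (at most max). */
size_t planJobs(int argc, char **argv, Job *jobs, size_t max)
{
    int emitSources = 1;   /* -s is the documented default */
    size_t n = 0;
    int i;

    assert(jobs != NULL);
    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg[0] == '-' && arg[1] != '\0') {
            if (arg[1] == 's')
                emitSources = 1;
            else if (arg[1] == 'c')
                emitSources = 0;
            /* every other flag is ignored in this sketch */
        } else if (n < max) {
            jobs[n].path = arg;
            jobs[n].emitSources = emitSources;
            n++;
        }
    }
    return n;
}
```

Given the arguments -c generic.ru -s Foo.st, this records generic.ru with
source comments off and Foo.st with them on, which shows why the position of
a flag relative to the files would matter under this reading.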
.pp
For the syntax for writing new rules, refer to the examples in generic.ru
and animation.ru, and if necessary, the rules section of the grammar in gram.y. 
.pp
And good luck! Let me know how you fare...