C Without C

Rob Chapman


My life as a programmer began the day I started taking the heads off my toys. That was when I first took an interest in the parts that make up the whole of things. A milestone of that young eagerness for disassembly was my first reassembly (with a few extra parts, and some lost). The next milestone was the assembly of something new: a whole made of parts, in which the parts were not the whole. This youthful exuberance continues in my current life as I read, learn, create and dissect, using a toolkit which is both the whole and the parts.

Last year at this conference, I gave a show-and-tell about a Translator engine I had been working on (see "Translator Frameworks and Stack Verification", published in the 1991 Rochester proceedings). Since then I have done little work on the engine itself and have focused instead on creating simple Translator frameworks by solving simple translation problems.

In this paper, I focus on a framework for translating Forth into C. It includes rules for interpreting, compiling, commenting, naming restrictions, code-data space separation and the VFM (virtual Forth machine). The paper is a snapshot of my work so far on a set of rules for translating Forth to C. Enjoy.


There are several reasons for undertaking this project:


Figure 1. Translation engine for translating input to output governed by a set of rules. This may be thought of as a basic language gate with two inputs (Input Stream and Translation Rules) and one output.

With a set of translation rules, Forth can be translated to C:

( botForth Kernel : Mar 10, 1991  Rob Chapman )
( Assume virtual Forth machine exists )
( ==== Cell ==== )
: CELL  ( -- n' )  cell ;
: CELLS  ( n -- n' )  CELL * ;

( ==== Stacks ==== )
( ==== Data stack primitives ==== )
: SWAP  ( m \ n -- n \ m )  swap ;
: DUP  ( m -- m \ m )  dup ;
: DROP  ( m -- )  drop ;
: ?DUP  ( n -- [n] \ n )  DUP  IF  DUP  ENDIF ;

Translates to:

/* botForth Kernel : Mar 10, 1991  Rob Chapman */

#include "kernel.h"

/* Virtual Forth Machinery */
unsigned int *sp,data_stack[64],*rp,return_stack[64]; /* Stacks */
unsigned int index;        /* loops */
void ***ip,**wp;    /* Threader */

/* ==== Cell ==== */

CELL()  /* -- n' */
        *sp=sizeof(unsigned int);

CELLS()  /* n -- n' */
        *sp*=sizeof(unsigned int);     /* CELL* */

/* ==== Stacks ==== */

/* ==== Data stack primitives ==== */

SWAP()  /* m \ n -- n \ m */

DUP()  /* m -- m \ m */

DROP()  /* m -- */
QUESTION_DUP()  /* n -- [n] \ n */
        if(*sp++)   /* IF */

Figure 2. An example of some Forth code which is translated to C code. The C code was produced by the translator, not by a human. Note the translation of the comment about a VFM into actual C code.


My software toolkit, which contains the botForth kernel, is the code being translated. The translation is managed by a makefile on Unix, which recompiles whatever is needed when the rules or the kernel change. The input file is kernel.f, and the rules files are c, headers and cnames. Two C files are produced: kernel.h and kernel.c. All the headers are kept in kernel.h; the VFM and the source code for all the bodies are in kernel.c. The kernel.c file is meant to be compiled and run with an off-the-shelf C compiler.

From Forth to C

Currently there are some 500 rules in 7 rule sets, and each rule set handles a different part of the translation. For instance, most source code is compiled, so one rule set is dedicated to translating compilable Forth source code into compilable C source code. Some of the rule sets are discussed below.
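The paper does not show the syntax of its rules, so the following is only a hypothetical sketch of how one rule set might behave. A table maps Forth tokens to C text, and a default rule turns any word without an entry into a procedure call. The table entries and the names struct rule, compile_rules and translate are assumptions, not the translator's actual rules.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One hypothetical rule: a Forth token and the C text it becomes. */
struct rule {
    const char *forth;
    const char *c;
};

/* A tiny, illustrative "compile" rule set (not the paper's rules). */
static const struct rule compile_rules[] = {
    { "IF",    "if(*sp++) {\t/* IF */" },
    { "ELSE",  "} else {\t/* ELSE */" },
    { "ENDIF", "}\t/* ENDIF */" },
    { "drop",  "sp++;\t/* drop */" },
};

/* Translate whitespace-separated Forth tokens into C, one operation per line.
   A token with no matching rule falls through to the default rule and becomes
   a procedure call. Returns 0 if the output buffer is too small. */
int translate(const char *forth, char *out, size_t outsz)
{
    char token[64];
    int consumed;
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    while (sscanf(forth, " %63s%n", token, &consumed) == 1) {
        const char *c = NULL;
        int n;

        forth += consumed;
        for (size_t i = 0; i < sizeof compile_rules / sizeof compile_rules[0]; i++)
            if (strcmp(token, compile_rules[i].forth) == 0) {
                c = compile_rules[i].c;
                break;
            }
        if (c != NULL)
            n = snprintf(out + used, outsz - used, "%s\n", c);
        else
            n = snprintf(out + used, outsz - used, "%s();\n", token);   /* default rule */
        if (n < 0 || (size_t)n >= outsz - used)
            return 0;                                                   /* output did not fit */
        used += (size_t)n;
    }
    return 1;
}
```

For example, translating the body of ?DUP (DUP IF DUP ENDIF) gives four lines: DUP();, the if(*sp++) line, DUP(); and the closing brace. Name translation, such as ?DUP to QUESTION_DUP, would belong to a separate rule set and is not attempted here.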


Every Forth word becomes a C procedure, and most word calls are translated directly into C procedure calls. Control structures are translated into C control structures. For example:

: CMOVE ( src \ dest \ count -- )


CMOVE() /* src \ dest \ count -- */
for(*--rp=index,index=*sp++;index;index--) /* FOR */
} /* NEXT */
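Below is a self-contained sketch of the same FOR ... NEXT pattern. The extracted figure does not preserve the loop body, the restoring of the saved index after NEXT, or the final drop of the two addresses, so those parts are assumptions. Cells are uintptr_t rather than the paper's unsigned int so that addresses fit on 64-bit hosts. The loop variable is named loop_index (hypothetical) to avoid clashing with the legacy BSD index() function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in VFM (assumption): both stacks grow downward, as the paper's
   *--sp and *--rp pushes imply. */
typedef uintptr_t cell;
static cell data_stack[64], *sp = data_stack + 64;
static cell return_stack[64], *rp = return_stack + 64;
static cell loop_index;   /* the paper's "index" */

/* CMOVE ( src \ dest \ count -- ) in the translator's FOR ... NEXT style.
   The enclosing loop's index is saved on the return stack so loops can nest. */
void CMOVE(void)
{
    assert(sp <= data_stack + 61);   /* src, dest and count are present */
    assert(rp > return_stack);       /* room to save the outer index */
    for (*--rp = loop_index, loop_index = *sp++; loop_index; loop_index--) { /* FOR */
        /* count has been popped: sp[0] is dest and sp[1] is src */
        *(unsigned char *)sp[0] = *(unsigned char *)sp[1];   /* copy one byte */
        sp[0]++;                                             /* advance dest */
        sp[1]++;                                             /* advance src */
    }                                /* NEXT */
    loop_index = *rp++;              /* restore the enclosing loop's index */
    sp += 2;                         /* drop src and dest */
}
```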

All code is laid out one operation per line, which turns a horizontal style of Forth code into a vertical style of C code. In the toolkit translation, 6 pages of Forth produce about 50 pages of C.


Stack comments are kept and appear right after the procedure is declared. Any other comment is placed on the same line as the last word compiled before it. A comment that appears on a line by itself is assumed to be a heading for the next section of code, and it is preceded by a blank line.
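As Figure 2 shows, only the delimiters change when a comment is translated: ( -- n' ) becomes /* -- n' */. A minimal sketch of that conversion follows. The function name forth_comment_to_c is hypothetical, and the placement rules (after the declaration, on the line of the last compiled word, heading lines) are not modeled. One hazard is that a Forth comment may mention the word */, which would end a C comment early. The sketch simply rejects such text rather than guess how the real rules handle it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a Forth comment such as "( -- n' )" into the equivalent C block
   comment, keeping the text between the parentheses unchanged.
   Returns 0 if the input is not a parenthesis comment, if its text contains
   the C comment terminator, or if out is too small. */
int forth_comment_to_c(const char *in, char *out, size_t outsz)
{
    size_t len = strlen(in);
    const char *text = in + 1;
    size_t n;

    if (len < 2 || in[0] != '(' || in[len - 1] != ')')
        return 0;
    n = len - 2;                          /* length of the text inside ( ) */
    for (size_t i = 0; i + 1 < n; i++)
        if (text[i] == '*' && text[i + 1] == '/')
            return 0;                     /* would end the C comment early */
    if (n + 5 > outsz)                    /* opener + text + closer + NUL */
        return 0;
    memcpy(out, "/*", 2);
    memcpy(out + 2, text, n);
    memcpy(out + 2 + n, "*/", 3);         /* closer plus terminating NUL */
    return 1;
}
```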

Naming Restrictions

In C, names may contain only alphanumerics and the underbar, and the first character must not be a digit. Forth places no restriction on name composition, except that it is hard to have a blank in a name. To translate Forth names into legal C names, the nonalphanumeric characters are replaced with their pronunciation; for example, ?DUP becomes QUESTION_DUP. Sometimes the pronunciation is context dependent. For example:

>R, R> and U>
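The pronunciations the paper chose for these names did not survive, so the following sketch uses a made-up table. Only ? to QUESTION is confirmed, by QUESTION_DUP in Figure 2. The sketch is context free: every character is always spelled the same way, so >R, R> and U> become GREATER_R, R_GREATER and U_GREATER, whereas the paper's rules can choose a spelling from context. A leading digit is spelled out, since C names cannot start with one. C keywords are not guarded against. The names c_name, pronounce and append_token are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pronunciation table; only '?' is confirmed by Figure 2. */
static const char *pronounce(unsigned char c)
{
    switch (c) {
    case '?':  return "QUESTION";
    case '>':  return "GREATER";
    case '<':  return "LESS";
    case '=':  return "EQUAL";
    case '@':  return "FETCH";
    case '!':  return "STORE";
    case '+':  return "PLUS";
    case '-':  return "MINUS";
    case '*':  return "STAR";
    case '/':  return "SLASH";
    case ',':  return "COMMA";
    case '.':  return "DOT";
    case '\'': return "TICK";
    default:   return NULL;
    }
}

static const char *const digit_names[10] = {
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"
};

static int is_name_char(unsigned char c) { return isalnum(c) || c == '_'; }

/* Append one token to out, separating tokens with '_'. Returns 0 if it does not fit. */
static int append_token(char *out, size_t outsz, const char *tok, size_t toklen)
{
    size_t used = strlen(out);
    size_t sep = used ? 1 : 0;

    if (used + sep + toklen + 1 > outsz)
        return 0;
    if (sep)
        out[used++] = '_';
    memcpy(out + used, tok, toklen);
    out[used + toklen] = '\0';
    return 1;
}

/* Build a legal C identifier from a Forth name. Runs of letters, digits and '_'
   are copied; every other character (and a leading digit) is spelled out.
   Returns 0 for an empty name or if out is too small. */
int c_name(const char *forth, char *out, size_t outsz)
{
    char code[16];
    size_t i = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    while (forth[i] != '\0') {
        unsigned char c = (unsigned char)forth[i];
        if (is_name_char(c) && !(i == 0 && isdigit(c))) {
            size_t run = 0;
            while (forth[i + run] != '\0' && is_name_char((unsigned char)forth[i + run]))
                run++;
            if (!append_token(out, outsz, forth + i, run))
                return 0;
            i += run;
        } else {
            const char *p = isdigit(c) ? digit_names[c - '0'] : pronounce(c);
            if (p == NULL) {              /* no pronunciation: use the character code */
                snprintf(code, sizeof code, "CHAR%u", (unsigned)c);
                p = code;
            }
            if (!append_token(out, outsz, p, strlen(p)))
                return 0;
            i++;
        }
    }
    return out[0] != '\0';
}
```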




C doesn't allow code to be executed while it is compiling, so this mode of Forth must be translated into something a C compiler will accept. Data structures that are built up at compile time are translated into a C struct{}. For example:

CREATE prompt 20 ALLOT ( contains count prefixed string for prompt )


char name12[32];
/* contains count prefixed string for prompt */

*--sp=(unsigned int)&name13;

Data structures are name-numbered, and each one is pushed onto the stack by a procedure that is given the Forth name of the data structure. While a data structure is being built, all its pieces are accumulated in memory and then flushed out when the next Forth definition begins. This lets the translator work in one pass and still pick up all the pieces from ALLOT, , (comma) and C,.
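The one-pass accumulate-and-flush scheme can be sketched as follows, under stated assumptions. The bytes of the current structure are held in a buffer, and starting any new definition writes them out as a name-numbered C array, followed by a procedure that pushes the array's address. The names create, allot, c_comma, begin_colon_definition and flush_structure are hypothetical, as are the starting number 12 and the exact output format. , (comma) would append one cell's worth of bytes in the same way and is omitted for brevity. Forth names would also need the name translation described earlier before being used as C procedure names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char pending_name[32];        /* Forth name given to CREATE */
static unsigned char pending[256];   /* bytes laid down so far */
static size_t pending_len;
static int have_pending;             /* is a structure being built? */
static int next_number = 12;         /* name-number for the next array (arbitrary) */

static char c_text[2048];            /* generated C source */
static size_t c_len;

static void emit(const char *s)
{
    size_t n = strlen(s);
    assert(c_len + n < sizeof c_text);
    memcpy(c_text + c_len, s, n + 1);
    c_len += n;
}

/* Write the pending structure as a C array plus a procedure that pushes its address. */
void flush_structure(void)
{
    char line[128];

    if (!have_pending)
        return;
    /* C forbids zero-length arrays, so an empty structure still gets one byte */
    snprintf(line, sizeof line, "unsigned char name%d[%zu]={", next_number,
             pending_len ? pending_len : (size_t)1);
    emit(line);
    if (pending_len == 0)
        emit("0");
    for (size_t i = 0; i < pending_len; i++) {
        emit(i ? "," : "");
        snprintf(line, sizeof line, "%u", (unsigned)pending[i]);
        emit(line);
    }
    emit("};\n");
    snprintf(line, sizeof line, "%s()\n{\n\t*--sp=(unsigned int)&name%d;\n}\n",
             pending_name, next_number);
    emit(line);
    next_number++;
    have_pending = 0;
    pending_len = 0;
}

/* CREATE: a new definition starts, so whatever was being built is flushed first. */
void create(const char *name)
{
    flush_structure();
    snprintf(pending_name, sizeof pending_name, "%s", name);
    have_pending = 1;
}

/* ALLOT: reserve n zero-filled bytes in the current structure. */
void allot(size_t n)
{
    assert(have_pending && pending_len + n <= sizeof pending);
    memset(pending + pending_len, 0, n);
    pending_len += n;
}

/* C, : lay down one byte in the current structure. */
void c_comma(unsigned char b)
{
    assert(have_pending && pending_len < sizeof pending);
    pending[pending_len++] = b;
}

/* Any other definition, such as a colon definition, also triggers the flush. */
void begin_colon_definition(void)
{
    flush_structure();
}
```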

Code-Data Space Separation

In Forth, there is no distinction between code space and data space; headers and bodies are usually contiguous in memory. In C, this is not allowed. This doesn't create much of a problem, since all the bodies of the Forth words are translated into code and all the headers are translated into data structures. By assuming an indirect threaded model, we can have the inner interpreter pointer (the code field in figForth) simply point to the body:

struct{void *link; unsigned char name[5]; void (*tick)();}\

Using the name of the procedure without the () places the address of that procedure in the data structure.
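The headers-as-data idea can be sketched as a linked list of C structs whose last field points at the body procedure. This is a simplified, hypothetical version. Every header here uses one struct type with a fixed 8-byte counted name, whereas the paper's struct line above sizes the name array per word (name[5]). find is a made-up dictionary search, not the kernel's own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in data stack that grows downward, like the paper's *--sp pushes. */
static unsigned int data_stack[64];
static unsigned int *sp = data_stack + 64;

/* Bodies: ordinary C procedures (code space). */
void DUP(void)
{
    assert(sp > data_stack && sp < data_stack + 64);   /* room, and one item present */
    unsigned int top = *sp;
    *--sp = top;
}

void DROP(void)
{
    assert(sp < data_stack + 64);
    sp++;
}

/* Headers: data structures (data space) linking names to bodies. */
struct header {
    const struct header *link;   /* previous header, NULL at the end of the chain */
    unsigned char name[8];       /* count byte followed by the characters */
    void (*tick)(void);          /* procedure name without () gives its address */
};

static const struct header DUP_header  = { NULL,        {3, 'D', 'U', 'P'},      DUP };
static const struct header DROP_header = { &DUP_header, {4, 'D', 'R', 'O', 'P'}, DROP };
static const struct header *latest = &DROP_header;

/* Search the headers, newest first, for a counted name matching the C string. */
const struct header *find(const char *name)
{
    size_t len = strlen(name);
    for (const struct header *h = latest; h != NULL; h = h->link)
        if ((size_t)h->name[0] == len && memcmp(h->name + 1, name, len) == 0)
            return h;
    return NULL;
}
```

Looking a word up and calling through its tick field runs the C body, so the dictionary stays pure data while the bodies stay pure code.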

When a word is ticked, the address of its inner interpreter field is pushed onto the stack. For instance:

: LITERAL ( n -- [n] ) compile @


LITERAL() /* n -- [n] */
if(*sp++) /* IF */
*--sp=(unsigned int)LIT; /* ' */


By assuming an indirect threaded model, we can freely mix the C code with code produced by the VFM compiler. The kernel can compile code (simply with , (comma)) so that it may be extended. Indirect threaded code (ITC) is compiled into data space and interpreted by one of the four inner interpreters: INNER-VARIABLE, INNER-CONSTANT, INNER-: or INNER-DOES.

: VARIABLE ( n -- ) CREATE , ;
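To make the inner-interpreter idea concrete, here is a minimal indirect-threaded sketch in C. It is not the paper's VFM. It models each word as a code field plus a parameter field, types ip more strictly than the paper's void ***ip, and uses intptr_t cells. Only three inner interpreters are implemented (colon, constant and variable); DOES is left out. All C names are hypothetical spellings. The threaded body corresponds to : SUM ( -- n ) TEN COUNTER @ + ; with 10 CONSTANT TEN and a variable COUNTER holding 32.

```c
#include <assert.h>
#include <stdint.h>

/* A word is a code field (what runs it) plus a parameter field. */
struct word {
    void (*code)(const struct word *self);   /* inner interpreter or C body */
    const void *param;                       /* threaded body, value or data address */
};

static intptr_t data_stack[64], *sp = data_stack + 64;   /* grows downward */
static const struct word *const *return_stack[64];       /* saved ip values */
static const struct word *const **rp = return_stack + 64;
static const struct word *const *ip;                     /* next cell of threaded code */
static int running;

static void push(intptr_t n) { assert(sp > data_stack); *--sp = n; }
static intptr_t pop(void) { assert(sp < data_stack + 64); return *sp++; }

/* Inner interpreters */
void INNER_COLON(const struct word *self)
{
    assert(rp > return_stack);
    *--rp = ip;                  /* remember where the caller was */
    ip = self->param;            /* continue in this word's threaded body */
}
void INNER_CONSTANT(const struct word *self) { push(*(const intptr_t *)self->param); }
void INNER_VARIABLE(const struct word *self) { push((intptr_t)self->param); }

/* Primitives: C bodies run directly from the code field */
void EXIT(const struct word *self) { (void)self; assert(rp < return_stack + 64); ip = *rp++; }
void PLUS(const struct word *self) { (void)self; intptr_t n = pop(); push(pop() + n); }
void FETCH(const struct word *self) { (void)self; push(*(const intptr_t *)pop()); }
void HALT(const struct word *self) { (void)self; running = 0; }

static intptr_t ten_value = 10;      /* 10 CONSTANT TEN */
static intptr_t counter_cell = 32;   /* COUNTER, holding 32 */

static const struct word w_exit    = { EXIT, 0 };
static const struct word w_plus    = { PLUS, 0 };
static const struct word w_fetch   = { FETCH, 0 };
static const struct word w_halt    = { HALT, 0 };
static const struct word w_ten     = { INNER_CONSTANT, &ten_value };
static const struct word w_counter = { INNER_VARIABLE, &counter_cell };

/* : SUM ( -- n )  TEN COUNTER @ + ; */
static const struct word *const sum_body[] = { &w_ten, &w_counter, &w_fetch, &w_plus, &w_exit };
static const struct word w_sum = { INNER_COLON, sum_body };

/* The NEXT loop: fetch the next word from the threaded code and run its code field. */
void execute(const struct word *w)
{
    const struct word *const top[] = { w, &w_halt };
    ip = top;
    running = 1;
    while (running) {
        const struct word *next = *ip++;
        next->code(next);
    }
}
```

Running execute(&w_sum) leaves 42 (10 + 32) on the data stack. Because each word carries its own code field, C primitives and threaded definitions can sit side by side in one threaded body. That is the kind of mixing the ITC paragraph relies on.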


The output files kernel.c and kernel.h pass the compile test, which means the translator is producing compilable code with no name translation problems. Small portions of the code have actually been run to test parts of the VFM, but the C version of the kernel has not yet been run.


Possibilities for the future include:

: FOO ( -- ) ( Forth code ) { /* C code */ } ( Forth code ) ;


The goal is to write C code without having to write or think in C. I've had enough experience writing C code to call myself a novice expert, but I find it much easier to think and solve problems in Forth. With a set of rules that translate Forth to C, I am free to program in Forth and still produce C. My boss is happy because I am producing C code (pretty C code, at that!), and I'm happy because I'm programming in Forth. The virtual machine assembler is now C.
