Active Objects

Rob Chapman

1. Abstract

This paper gives an overview of a method for creating relocatable Forth objects that are independent of platform and processor. It is based on the technology of active-object files. Active-object files contain binary objects, as ordinary object files do, but they also contain the activation algorithms. This allows the loader to be much simpler, and loader extensions may be contained in the file itself.

2. ACT GOOFY

A simple example illustrates what an active-object file might look like. This example file is composed of one piece of data, a text string, and an activation method for displaying the text. We can look at the file's contents by reading it into memory and doing a memory dump:
bot: CREATE X 100 ALLOT
bot: " GOOFY" FILE  X 100 READ-BYTES  CLOSE  X 4 DUMP
         30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
946530:  0A 41 4F 2D 4D 45 53 53 41 47 45 28 41 68 68 68 .AO-MESSAGE(Ahhh
946540:  79 79 79 75 68 2C 20 61 68 68 68 79 79 79 75 68 yyyuh, ahhhyyyuh
946550:  2C 20 61 68 68 68 79 79 79 75 68 2C 20 67 61 72 , ahhhyyyuh, gar
946560:  77 73 68 21 B6 DB 00 03 DB 6D B6 DB 6D B6 DB 6D wsh!.....m..m..m
The activation method is AO-MESSAGE. When the file is activated, AO-MESSAGE reads the string that follows it and prints it to the screen:
bot: ACT GOOFY
Loading object file...Ahhhyyyuh, ahhhyyyuh, ahhhyyyuh, garwsh!
Object file loaded.
The file was created with the following code:
( ==== Test memos ==== )
: WRITE-STRING  ( s -- )  COUNT  DUP WRITE  WRITE-BYTES ;
: MEMO  ( message \ file -- )  NEW BINARY FILE
   " AO-MESSAGE" WRITE-STRING  WRITE-STRING  CLOSE ;
: TEST  ( -- )  " Ahhhyyyuh, ahhhyyyuh, ahhhyyyuh, garwsh!" " GOOFY"  MEMO ;
TEST
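To make the byte layout in the dump concrete, here is a minimal Python model of the same file. It assumes, as the dump and WRITE-STRING suggest, that each string is stored as a one-byte count followed by its characters. The names write_string, memo and activate are hypothetical stand-ins for WRITE-STRING, MEMO and ACT, and the METHODS table is an assumed stand-in for the host dictionary in which AO-MESSAGE is looked up.

```python
import io

def write_string(out, s: str) -> None:
    """Write a counted string: one length byte, then the bytes (like WRITE-STRING)."""
    data = s.encode("ascii")
    if len(data) > 255:
        raise ValueError("a counted string holds at most 255 bytes")
    out.write(bytes([len(data)]))
    out.write(data)

def read_string(inp) -> str:
    """Read a counted string back from the stream."""
    count = inp.read(1)[0]
    return inp.read(count).decode("ascii")

def memo(message: str) -> bytes:
    """Build the file: the method name first, then its data (like MEMO)."""
    out = io.BytesIO()
    write_string(out, "AO-MESSAGE")
    write_string(out, message)
    return out.getvalue()

def ao_message(inp) -> str:
    """Activation method: read the string that follows and display it."""
    text = read_string(inp)
    print(text)
    return text

# Hypothetical host dictionary: the only method names this loader knows.
METHODS = {"AO-MESSAGE": ao_message}

def activate(image: bytes) -> str:
    """Look up the first name in the file and let that method consume the rest."""
    inp = io.BytesIO(image)
    method = METHODS[read_string(inp)]
    return method(inp)

goofy = memo("Ahhhyyyuh, ahhhyyyuh, ahhhyyyuh, garwsh!")
print(goofy[:12].hex(" "))  # 0a 41 4f 2d 4d 45 53 53 41 47 45 28
activate(goofy)
```

The 52 bytes produced match the dump up to offset 0x34; the bytes after that in the dump are whatever was already in the buffer. The loader in this sketch only knows how to look up a name and run it; what happens to the rest of the file is decided by the method the file names.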

3. Text, Binary Blobs and Active Objects

Ideally, a program would exist in a single form which may be active, quiescent or morphing. Tradition so far has dictated two forms: text and binary blobs.
Text is a factoring of the program into manageable parts for changing and understanding (the morphing and quiescent states), whereas the binary blob is just one part: the collection of behaviours and information defined as an application (the active state).
The region between text and binary blobs is the domain of active objects. In this region we have a range of parts with a cornucopia of compilation semantics. For example, some parts of the source code exist only to alter compilation semantics, and this information is lost when specific binary blobs are created. To capture the compilation semantics of the source code, these parts are compiled as active objects.

When code is compiled into a binary object file, it is just a collection of data. The algorithms that transform this data into a running program reside on the host computer that loads and links the object file. Active objects, on the other hand, contain their own loading and linking semantics. Many of these active objects may exist in an active-object file. The host computer wishing to use the program activates the first object of the file, and this object then activates the rest of the file. The active objects can decide how they participate in constructing a working program in computer memory. Some active objects may deal with files external to the one being activated, allowing multiple applications to be built from common parts.

4. Relocatable Forth

Concept


An active-object file is used to contain a relocatable Forth program. The file is created from a set of compiling rules defined for the Translator (as described in the paper “Translator Frameworks” from the Rochester Conference ’91). These rules process the source code, strip out useless information (comments), transform the text into a form usable by the host computer, collect relocation information, keep track of compiler decisions and store all this information as an active-object file. Most of the time, the file will consist of a user dictionary image with collections of relocation data. There are six types of relocation collections:

Additionally, an active-object file may contain a collection of commands which may be executed to assist in relocating the program.
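As a rough illustration of what two kinds of relocation collection might do, the sketch below places a dictionary image at an arbitrary base address. It is loosely modelled on the INTERNAL-REF and OUTLINK words in the glossary, but the representation is an assumption for illustration: the image is a list of cell values already read from the file, internal references hold image-relative addresses, and references into existing code are symbolic names. The function relocate and the symbol names DOCOL, DUP and EXIT with their addresses are hypothetical.

```python
def relocate(image, base, internal_refs, outlinks, host_symbols):
    """Return a relocated copy of image (a list of cell values).

    base          -- address where the image will start in the host dictionary
    internal_refs -- indices of cells holding image-relative addresses
    outlinks      -- (index, name) pairs that must point into existing code
    host_symbols  -- address of each existing word on this particular host
    """
    loaded = list(image)
    for i in internal_refs:
        loaded[i] = image[i] + base        # pointer within the loaded code
    for i, name in outlinks:
        loaded[i] = host_symbols[name]     # pointer from loaded code to existing code
    return loaded

# A tiny colon definition: code field, DUP, a call to another loaded word, EXIT.
image = [0, 0, 0x10, 0]
internal_refs = [2]
outlinks = [(0, "DOCOL"), (1, "DUP"), (3, "EXIT")]
host_symbols = {"DOCOL": 0x0100, "DUP": 0x0220, "EXIT": 0x0108}
print([hex(c) for c in relocate(image, 0x4000, internal_refs, outlinks, host_symbols)])
# ['0x100', '0x220', '0x4010', '0x108']
```

The same file loaded on a different host, or at a different base, only changes host_symbols and base; the relocation collections themselves stay the same.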

Cell Alignment

When reconstructing data spaces that contain a mix of data unit sizes, such as cells and strings, we must make sure that the memory alignment rules are followed. For instance, a header for a word contains a cell for link-listing the headers, a string of bytes for the name, and another cell pointing to the word’s inner interpreter.
The second cell must start at a cell boundary in memory, so the string may need to be padded with some number of bytes. For memory which is 16 bits wide, the padding is 0 or 1 byte; for 32-bit memory, it is 0 to 3 bytes. In an active-object file, access is byte-wide, so there are no padding bytes. When transferring the header from an active-object file to a user dictionary in memory, one must allow for alignment in one of the following ways:
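The padding arithmetic described for headers can be checked with a short Python sketch. It lays out a header (link cell, counted name, padding, then the cell pointing to the inner interpreter) for a given cell size. The functions padding and header_layout and their field names are hypothetical, and the sketch ignores any other fields a real header might carry.

```python
def padding(offset: int, cell: int) -> int:
    """Bytes needed to advance offset to the next multiple of cell."""
    return -offset % cell

def header_layout(name: str, cell: int, start: int = 0) -> dict:
    """Byte offsets of each header field, assuming start is cell-aligned."""
    link = start
    name_at = link + cell                  # counted string follows the link cell
    name_end = name_at + 1 + len(name)     # count byte plus the characters
    pad = padding(name_end, cell)          # 0..cell-1 bytes, never stored in the file
    code_field = name_end + pad            # must begin on a cell boundary
    return {"link": link, "name": name_at, "pad": pad,
            "code_field": code_field, "end": code_field + cell}

print(header_layout("SWAP", 4))  # {'link': 0, 'name': 4, 'pad': 3, 'code_field': 12, 'end': 16}
print(header_layout("SWAP", 2))  # {'link': 0, 'name': 2, 'pad': 1, 'code_field': 8, 'end': 10}
```

Because the active-object file stores the header without padding, the loader recomputes pad from the target's own cell size, which is what keeps the same file usable on 16-bit and 32-bit machines.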

Byte Ordering

Byte storage in a cell differs between platforms. For instance, on an Intel µP, a byte-fetch from a stored cell will return the least significant byte, while on a Motorola µP it returns the most significant byte. Values in the active-object file are stored most significant byte first, and since the file is read serially, the most significant byte of each value is read first. Copying those bytes straight into memory gives the right cell value only on processors that also store the most significant byte first, so to read a cell from the file, one cannot simply do:
HERE  DUP CELL READ-BYTES  @   ( THIS IS NOT PORTABLE! )
We can, however, use the data stack to reconstruct a cell value from its bytes. The no-brainer, simple, straightforward, let-the-compiler-speed-it-up approach is:
0  CELL  FOR  256 *  READ OR  NEXT  ( read a cell from a file )
Now we can store the cell in memory without caring about its byte order.
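The same shift-and-OR loop can be written in Python to show why it is portable: it only combines byte values arithmetically, so the result never depends on how the host lays out a cell in memory. The sketch assumes, as described above, that the file holds each cell most significant byte first; read_cell is a hypothetical name.

```python
import io
import sys

def read_cell(stream, cell_size: int) -> int:
    """Rebuild a cell from cell_size bytes stored most significant byte first."""
    value = 0
    for _ in range(cell_size):
        byte = stream.read(1)
        if not byte:
            raise EOFError("file ended in the middle of a cell")
        value = (value << 8) | byte[0]   # same as: 256 * READ OR
    return value

raw = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(read_cell(io.BytesIO(raw), 4)))       # 0x12345678 on every host

# Reinterpreting the raw bytes in the host's own byte order is what
# "HERE DUP CELL READ-BYTES @" amounts to; the result varies by processor.
print(hex(int.from_bytes(raw, sys.byteorder)))  # 0x78563412 on a little-endian host
```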

Numbers or Words

Without a dictionary to look up words in, we must somehow decide whether a token is a word or a number. In a regular Forth compiler, this is done by assuming that it is a word and looking it up in the dictionary. If it is not found, it is assumed to be a number; if it fails to be a number, an error occurs.
When compiling relocatable Forth, we assume that it is a number first, since we have no dictionary in which to check whether it is a word. If it is not a number, then we assume it is a word and compile relocation information for it. This precludes defining words whose names are valid numbers, which usually isn’t a problem except perhaps in hex, where a name such as ADD is also a valid number. I also found out that my number-checking algorithm had a bug in it: a minus sign on its own was accepted as a 0 (try “ -” NUMBER on your system and see if it interprets it as a zero!).
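A sketch of the number-first decision, assuming a simple number syntax: an optional leading minus sign followed by at least one digit valid in the current base. The functions to_number and classify are hypothetical. The sketch shows the two points made above: a lone minus sign is rejected rather than read as zero, and in hex a name such as ADD is taken as a number.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_number(token: str, base: int = 10):
    """Return the value of token in base, or None if it is not a number."""
    negative = token.startswith("-")
    digits = token[1:] if negative else token
    if not digits:                    # a lone "-" must not become 0
        return None
    value = 0
    for ch in digits.upper():
        d = DIGITS.find(ch)
        if d < 0 or d >= base:        # not a digit in this base
            return None
        value = value * base + d
    return -value if negative else value

def classify(token: str, base: int = 10):
    """Number first; only if that fails is the token treated as a word."""
    n = to_number(token, base)
    if n is not None:
        return ("number", n)
    return ("word", token)            # compiled with relocation information

print(classify("-"))        # ('word', '-')
print(classify("ADD", 16))  # ('number', 2781)
```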

Defining Words

Words which add new words to the dictionary must have translation rules defined for them. This includes CONSTANT, VARIABLE, : and CREATE. Any word which uses CREATE must have a Translator rule defined for it as well. These rules produce relocation information for the link cell and for the pointer to the inner interpreter.
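As a sketch of what such a rule might emit, the hypothetical function constant_rule below handles CONSTANT: it appends a header and body to an image and records where relocation is needed for the link and for the inner-interpreter pointer. The record kinds borrow the glossary names INTERNAL-REF and OUTLINK, but the image model (a list of cells, with the name as a single entry), the ObjectFile class, and the symbols "host-latest" and DOCON are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObjectFile:
    image: list = field(default_factory=list)        # cells, plus names as single entries
    relocations: list = field(default_factory=list)  # (kind, index, target) records
    latest: Optional[int] = None                      # index of the newest link cell

def constant_rule(obj: ObjectFile, name: str, value: int) -> None:
    """Hypothetical Translator rule for:  value CONSTANT name"""
    link_at = len(obj.image)
    if obj.latest is None:
        # First header: its link points back into the host's existing dictionary.
        obj.relocations.append(("OUTLINK", link_at, "host-latest"))
    else:
        # Later headers: link to the previous header in the loaded code.
        obj.relocations.append(("INTERNAL-REF", link_at, obj.latest))
    obj.image.append(0)          # link cell, filled in at load time
    obj.image.append(name)       # name (a counted string in a real file)
    code_at = len(obj.image)
    obj.relocations.append(("OUTLINK", code_at, "DOCON"))
    obj.image.append(0)          # pointer to the constant's inner interpreter
    obj.image.append(value)      # the value itself needs no relocation
    obj.latest = link_at

obj = ObjectFile()
constant_rule(obj, "TEN", 10)
constant_rule(obj, "ZERO", 0)
print(obj.relocations)
```

Note that only the link and the code field produce relocation records; the constant's value is plain data and travels through the file unchanged.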

Immediate Words

Immediate words are a mechanism used to extend the compiler. Since the compiler is now a set of rules defined for the Translator, immediate words must be defined as rules. If there is no rule for an immediate word, it is assumed to be a non-immediate word and compiled as such. When the file is loaded, if the word turns out to be immediate, it is reported as an error.
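The fallback described above can be sketched as follows: tokens with a Translator rule are handled by that rule, everything else is compiled as an ordinary reference, and the loader reports an error for any referenced word that is immediate on the host. The rule table, the host's immediate flags, and both functions are hypothetical; the example supposes ASCII is immediate on the host but has no rule.

```python
def compile_token(token, rules, output):
    """Compile one token: use its rule if one exists, else an ordinary reference."""
    rule = rules.get(token)
    if rule is not None:
        rule(output)                    # the rule decides what to emit
    else:
        output.append(("ref", token))   # assumed to be non-immediate

def immediate_errors(output, host_immediate):
    """At load time: plain references to words that are immediate on the host."""
    return [name for kind, name in output
            if kind == "ref" and host_immediate.get(name, False)]

# Stand-in for whatever IF's real rule would emit.
rules = {"IF": lambda out: out.append(("rule", "IF"))}
host_immediate = {"DUP": False, "IF": True, "ASCII": True, "DROP": False}

output = []
for tok in ["DUP", "IF", "ASCII", "DROP"]:
    compile_token(tok, rules, output)
print(immediate_errors(output, host_immediate))  # ['ASCII']
```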

Conditional Compiles

Probably the trickiest part of creating active objects is handling decisions about how to compile a program which must be made on the machine that will run it. Usually this means that the source code has parts which are included only on certain machines.
Some associated syntax guides the compiler through the proper path in the source code when it is compiled on the target machine.
An active-object file contains all the conditional parts. The decisions about which parts to use on the host computer are made at load time by active objects in the file.
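One way such a load-time decision could work uses the branch and 0branch words from the glossary: the file carries a host test, then 0branch with a skip count, then the machine-specific part. In this sketch the file is a list of tokens, skip counts are measured in tokens, and the host query CELL-SIZE, the "=" test and the part names are all assumptions rather than the actual file encoding.

```python
def select_parts(tokens, host):
    """Walk a token list, honouring branch/0branch; return the parts that get loaded."""
    stack, kept, ip = [], [], 0
    while ip < len(tokens):
        tok = tokens[ip]
        ip += 1
        if isinstance(tok, int):
            stack.append(tok)                 # numbers go on the data stack
        elif tok == "CELL-SIZE":              # hypothetical host query
            stack.append(host["cell"])
        elif tok == "=":
            b, a = stack.pop(), stack.pop()
            stack.append(-1 if a == b else 0)  # Forth-style true/false
        elif tok == "branch":
            ip += tokens[ip] + 1              # skip the count, then that many tokens
        elif tok == "0branch":
            count = tokens[ip]
            ip += 1
            if stack.pop() == 0:              # branch only on a zero flag
                ip += count
        else:
            kept.append(tok)                  # a part that is loaded on this host
    return kept

FILE = ["CELL-SIZE", 4, "=", "0branch", 3, "part-32", "branch", 1, "part-16", "common"]
print(select_parts(FILE, {"cell": 4}))  # ['part-32', 'common']
print(select_parts(FILE, {"cell": 2}))  # ['part-16', 'common']
```

Both variants travel in the same file; only the flag computed on the host decides which one is loaded.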

5. Glossary

Activating an active-object file is much like interpreting Forth. A string is parsed from the file and looked up in the dictionary. If it is found, it is executed; otherwise it is interpreted as a number. The following words comprise the interface for active-object support:
ACTIVATOR  ( -- )  this word is usually the first word in the file and it interprets the rest of the file until a zero is encountered.
REPEAT-OBJECT ( -- )  this will repeat a method over a collection of data.
LOAD-AOF  ( -- )  is for loading an active-object file from an active-object file.
LOAD-IMAGE  ( a -- )  for reading a collection of bytes into memory.
EXTERNAL-VALUE  ( -- )  for setting a value in existing code.
INTERNAL-VALUE  ( -- )  for setting a value in the loaded code.
INTERNAL-REF  ( -- )  for setting a pointer within the loaded code.
EXTERNAL-REF  ( -- )  for setting a pointer within the existing code.
INLINK  ( -- )  for setting a pointer from existing code to the loaded code.
OUTLINK  ( -- )  for setting a pointer from the loaded code to existing code.
OUTLIST  ( -- )  a specialization of OUTLINK which allows many references to the same point in the existing code to be link-listed for efficiency.
branch  ( -- )  allows execution to continue along a nonlinear path in the file.
0branch  ( f -- )  same as branch, but the branch is taken only if the top of the data stack contains zero.
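A minimal Python sketch of the activation loop described at the start of this glossary: read a string, look it up, execute it if found, otherwise treat it as a number, and stop at a zero. It assumes tokens are counted strings (as in the GOOFY file), that a zero count byte ends activation, and 16-bit cells stored most significant byte first. Only LOAD-IMAGE is modelled, with a hypothetical encoding (a length cell followed by that many bytes); the Activator class and its methods are illustrative names.

```python
import io

CELL = 2  # assumed: 16-bit cells, most significant byte first

class Activator:
    """Hypothetical model of ACTIVATOR interpreting an active-object file."""

    def __init__(self, data: bytes, memory_size: int = 256):
        self.inp = io.BytesIO(data)
        self.stack = []
        self.memory = bytearray(memory_size)
        self.words = {"LOAD-IMAGE": self.load_image}   # the host's dictionary

    def read_string(self) -> str:
        count = self.inp.read(1)
        if not count:
            raise EOFError("file ended before the terminating zero")
        return self.inp.read(count[0]).decode("ascii")

    def read_cell(self) -> int:
        value = 0
        for byte in self.inp.read(CELL):   # shift-and-OR, as in Byte Ordering
            value = (value << 8) | byte
        return value

    def load_image(self) -> None:
        """LOAD-IMAGE ( a -- ): copy the next block of bytes to address a."""
        addr = self.stack.pop()
        length = self.read_cell()          # assumed: a length cell precedes the bytes
        block = self.inp.read(length)
        if len(block) != length or addr + length > len(self.memory):
            raise ValueError("image block truncated or out of range")
        self.memory[addr:addr + length] = block

    def run(self) -> None:
        while True:
            token = self.read_string()
            if token == "":                # zero count: end of activation
                return
            word = self.words.get(token)
            if word is not None:
                word()                     # found: execute it
            else:
                self.stack.append(int(token))  # not found: number (ValueError if not)

def counted(s: str) -> bytes:
    return bytes([len(s)]) + s.encode("ascii")

data = (counted("16") + counted("LOAD-IMAGE")
        + (3).to_bytes(CELL, "big") + b"\x01\x02\x03" + b"\x00")
act = Activator(data)
act.run()
print(act.memory[16:19].hex())  # 010203
```

Extending the loader means adding entries to the dictionary; the relocation words above would be further entries of the same kind.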

6. Bizarre Thoughts 

Suppose we implement the virtual Forth machine (VFM) in files. The kernel is an active-object file. When activated, it creates files for a data stack, a return stack and the user dictionary. With a good disk cache, this would run at close to normal program speed, depending on the overhead of file access translations, but it wouldn’t take up any computer memory.