Translator Frameworks

Rob Chapman

What?

This paper describes a simple framework for building powerful translator applications. Translators are used to "translate input to output". A translator application is defined by adding sets of rules to a framework. The simplest framework has no translation rules; all input is ignored and there is no output.

Figure1. A translator translates input to output governed by sets of rules.

Why?

If you have ever used AWK, SED or a compiler, you have used a translator. With the framework presented here, it is possible to build simple translator applications such as: text filters, extendable optimizing compilers, code verifiers, glossary generators, language translators, text formatters, code analyzers, or for that matter, if you can describe it, you can build it. Why?: because its simple.

How?

Input and output flow management is done with the supplied machinery. Rules and rule sets are defined to manage the flow of input to output.

Supplied machinery

The input is modeled as a stream of characters. These characters are read from a text input buffer (tib) which is terminated with a null character. Count prefixed strings, parsed from tib, are kept in an input queue (inputq) where they are used to find the longest match phrase in a rule set.

Parsing and input stream editing are accomplished by capturing parsed strings in an input queue. The translator executes rules when they match what is in the input queue. The translator will execute the rule with the longest match phrase that matches the input queue.

Figure 2. Words are parsed out of tib and placed at HERE as a count prefixed string. If there is an identical existing string, it is used, otherwise the string is copied to the buffer: unknown. The resultant string is pushed into the inputq for rule matching. The strings in the inputq are used to find a matching rule. When one is found, the new phrase is STUFFed into the inputq.

Input stream

The input stream is composed of two parts:

  1. The text input buffer (tib):

which is parsed with:

  1. Count prefixed strings parsed from tib are maintained in an input queue (inputq) where a queue is modeled in the following manner:

Editing is done by altering the inputq. New strings may be introduced at either end of the inputq.

One could think of the input queue as a cache for the text input buffer which serves three purposes:

  1. rule matching is faster since the input stream is only parsed once,
  2. a continuum is maintained across text lines by carrying over strings from the current line to the next line via the input queue,
  3. it allows easy editing of the input stream.

Rules

Translators are built by defining rule sets. Rules for translating are defined by the word {.

The syntax for defining a rule is:

Match Phrase

There may be one or more match phrases for a rule. This may be done for design clarity and easier maintenance.

When defining a rule whose match phrase is a sub phrase of an existing match phrase, the shorter one should be defined first. Currently, the shorter phrase will "short out" any longer match phrases. In the future, the longer phrases will not be "shorted out". This could be accomplished by sorting match phrases from a root so that you're always looking for one longer and arranging them for that circumstance.

When a phrase from the input stream matches a phrase in a rule set, its action is performed and the new phrase is stuffed into the input stream. An action may change rule sets or exercise the other tools available within the framework.

To deal with unknown input, a rule may be defined which has no words as a match phrase. This is called a default rule. If there are no rules to match the inputq and there is no default rule, the translation stops with a message.

Actions

Possible actions include rule set selection, inputq editing and parsing. Actions may also make use of the underlying virtual machine which consists of 2 stacks, memory, an ALU and control flow.

New phrase

The new phrase is the set of strings which gets stuffed back into the inputq when the match phrase is removed.

Translator tools

The translator may be controlled by the following words:

Prototyping Tools

These tools aid in the interactive building of rule sets:

Reflections

This paper has gone through many revisions (read: total rewrites!) and I appreciate the patience of the folks at JFAR for putting up with me.

Some folks prefer to read a manual while others prefer to learn by example. This paper is the manual while the companion paper "Stack Verification" is an example of a translator application framework. Had I the time, I would love to include some of the other translator applications frameworks which I have brewed up over the last few months to make my work easier. Some of these applications are very powerful yet they all remain simple.

I am convinced that there is something important brewing here and the words which comes to mind are "a free form phrase language". I've wondered what it would be like to have two people solving a problem: one person describes it in text pertinent to the problem domain while the other person writes translation rules to translate the text into a compileable programming language.


A PDF version of the original paper is also available.