TU: Timbre Unit II

Rob Chapman


Table of Content

Abstract

Introduction

Form

Tu

Recent Work

Future


Abstract

A discussion on new additions to Timbre, a translator engine, the future and some recent applications:


Introduction

Fundamental to all computing is translation. Timbre has been built in recognition of this. The translation is separate from the description.

In general, solutions are described in the appropriate language for the problem domain and then they are translated to a more appropriate form for execution. For example, this is true from the very beginning of a product life cycle. In one possible route, albeit quite simplified, a customer makes their needs known in a language appropriate to them and marketing translates these needs into marketing requirements. Engineering then translates these requirements into a product requirements document. This might again be translated into software and hardware requirements. Then the requirements are translated into software and/or hardware descriptions. These are translated, usually by a compiler, into a binary format understood by a processor or by configurable logic.

For one time translations, there usually isn't much need for automating any of these steps and for very small projects, many of these steps are skipped. But if there are translations that need to be done repeatedly, most obviously a compiler, then the process should be automated.

Timbre is a script driven engine with the capacity to automate the translation process. True translation is clean and makes no changes to the source and doesn't need the output to be adjusted subsequently.


Form

Translation has the general form of:

{ pattern }[ control ]{ substitution }

The pattern can be anything from a single character to a phrase of words. Blank space is used as a separator between words and is not considered part of the pattern. The control portion exerts control over the environment and makes things happen. It could be just getting or putting a value or it might be a complex sequence of actions that could change the current pattern generation. The substitution gives life to the form; it allows a 'next generation' pattern to be put into the pattern space. If it is the same as the matching pattern, then the form may run again.

These concepts have been explained in previous papers but what is new here, is the complete set of forms for Timbre:

  1. [ ] control
  2. [ ]{ } pattern generator
  3. [ { } ] controlled pattern generator
  4. [ { } ]{ } semi-controlled pattern generator
  5. { } recognition
  6. { }{ } substitution
  7. { }[ ] action
  8. { }[ ]{ } translation
  9. { }[ { } ] controlled substitution
  10. { }[ { } ]{ } transmorf

Originally, Timbre was introduced with the forms 6, 7 and 8. Subsequently, a way of selecting substitutions was required and the forms 9 and 10 were introduced. In forms 9 and 10, by mixing the substitute phrase with the control, selected substitutions can be made. Since then, I've added control and recognition. Forms 1 and 2 allow for forms 3 and 4. Forms 1 to 4, provide control over the input pattern and augmentation to the dictionary for use later. Form 5, does nothing other than recognition and allowing entry of a phrase into the Timbre dictionary. These 10 forms make a complete set. They are the base frameworks from which all rules are built.

Default Behaviours

In Timbre, the default behaviour for parsing the input is to use blank space as a delimiter. The parsing behaviour may be changed from this by using the control forms. The interesting analogy to Forth, is this would be like pulling back one state from the interpreter state to the parser state and defining it's behaviour.

Timbre's default parsing behaviour may be described as:

[ INPUT?  IF  GET-WORD inputq PUSH  ELSE  SHELL-END  ENDIF ]

If no rules match the current parsed input, then the default behaviour may be described as:

{ }[ ." No default rule." SHELL-ABORT ]


Tu

Now that we have discussed the forms of Timbre, let us discuss the syntax, or Tu. The basic Tu of Timbre is with curly braces for phrases and square brackets for control. There is an alternate Tu for the phrases which allows phrases to contain curly brackets. In this Tu, vertical bars are used to delimit the phrases. The Tu must be the same within a rule but can be different for different rules in a rule set. For example, if we want to write a translation which includes both curly brackets and vertical bars then we could write them like this:

{ | }{ vertical bar }
| { || left curly |
| } || right curly |

These three rules are of the basic form 6, substitution. They match the three symbols in the input and replace them with an english phrase which may be used to trigger further rules. The three rules do two things: they turn a symbol into readable English, and they change them into a common form. By using readable English, there is no need for comments (then unlike other languages, the comments compile and run).

C Tu

So far, the control Tu is modeled after the syntax of Forth, where there is a stack and words are blank delimited. There is an alternate Tu for the control as well. In this Tu, the control portion is delimited by curly braces and the syntax of the control code is modeled after C. The Tu of the patterns is then square brackets and vertical bars:

| pattern |{ /* C style code */ }| substitute |
[ pattern ]{ /* C style code */ }[ substitute ]

This Tu is more appealing to the C programmer's camp. The control part is similar to a C function definition and the pattern may be thought of as the name of the function. The basic forms are the same for all Tu.


Recent Work

I've been using Timbre for many years now on PC, Unix and Mac platforms for many different translations and with many different types of inputs and outputs. In the brief space allotted to me in this paper I will focus on two recent and interesting translations.

Forth To C

Previous translations for turning Forth source code into C source code were based on the assumption that the Virtual Forth Machine (VFM) was available. While this simplified the translation, it created source that was awkward for C programmers (but not so awkward for Forth/C programmers) and the C compilers weren't as adept at producing optimal code when faced with the additional data resource of a stack as pointed out in a paper by Anton Ertl .

Recently I developed a hand-held system for a customer. They wanted me to develop the code for the product and then hand it over to them and they would maintain it. The hand-held unit was a PoBox from New Micros which comes with a 68HC11 microcontroller with MaxForth in ROM. The unit also has a 20 key keypad, 2x16 LCD display, timers and EEPROM. The customer said that they wanted all the code in C.

Initially, I started developing the code in C, but the host compile, serial download, lack of a target debugger all started taking their toll on development time so I switched to Forth. The customer initially only had a vague idea of what the product should do, so I knew that getting working prototypes to them early on was paramount. Using Forth allowed me to do this. I sent them 30 revisions before they felt that the product was almost right. Since the code was nearing perfection (a coder's dream), it was time to write it in C.

The Forth code was about 1700 lines and if I tried to write it all in C from scratch, I wouldn't meet the deadline and more importantly, I'd have to debug it all over again. If I translated it into C and assumed a VFM existed, I would alienate the customer. So I decided it was time to revamp the Forth to C translator. This time the output would be stand alone, maintainable C code.

Moving the VFM

Since I'd already worked out the translations needed for C name spaces and data spaces, the next thing was to translate the usage of a stack for handling parameters to the C paradigm of passing parameters with named scoped variables. This seemed like an impossible task but like all daunting tasks, its become easier once you break them down.

The VFM that the Forth code assumes is there, is removed from the code by putting it into the translation. Values which get pushed to the stack in the code are now kept on a stack as text during the translation. If, during the translation, there is an inherent stack operation such as DUP, the text on the data stack is duplicated. The action that was supposed to be done by the stack primitive of the VFM is done during the translation. Values accumulate on the stack until they are required by consumers or transformers. Since Forth assumes two stacks, data and return, there are two stacks in the translation which mimic the movement of information in the source code. Text from the source is accumulated on the stacks until it is needed and then finally output to the output file.

In this example, the 8 is recognized in the translation by the default rule as a number and is pushed onto the data stack. The next word parsed, is recognized by a specific rule as data and it gets pushed to the data stack as well. Then finally the ! is encountered and it is recognized as a consumer of two stack items. Its action is to output the text values from the data stack in the right order with the added C syntax.

Data, Producer, Consumer, Transformer and Functions

I decided to categorize the Forth words into five categories based upon their stack deposition:

  1. Data - these words would just be pushed onto the data stack (i.e. a variable)
  2. Producer - these words produce a value and take no input (i.e. HERE)
  3. Consumer - these words consume input but produce nothing (i.e. !)
  4. Transformer - these words consume and produce (i.e. MAX)
  5. Function - these words consume nothing and produce nothing. (i.e. QUERY)

Data and Producer are the same in Forth but in C the producer is followed by parentheses.

Results

Actual results sometimes say more than theory itself. This is an excerpt from some of the code I had to translate.

: DELETE5 ( -- ) KPDATA
CASE
del# { field @ IF RUN-DELETE3 ELSE RUN-DELETE2 THEN }
can# { RUN-DELETE4 }
ENDCASE ;

void delete5(void) /* */
{
switch(kpdata())
{
case DELETE:
if (field)
{
run_delete3();
}
else
{
run_delete2();
}
break;
case CANCEL:
run_delete4();
break;
}
}

The names have been down-cased with the exception of constants which in C have now become #define s and it is customary to have them as uppercase. KPDATA is a producer which is held onto by the translation data stack until it is used by the switch statement. Inside the case statement, the variable field is fetched to make a decision on which function to call. This is translated to just the data structure name in C since it is an rvalue, it will act like a fetch is being performed. It is worth noting that there is an empty comment that has been added to the C code which will be filled in later by the programmer and then used later to generate documentation.

Here's another word translated. Note how the strings, and arguments have been altered to match the C syntax and how some of the names have been translated as well.

: DELETE ( -- ) " delete LOGGER" 0 LCD-LINE
" CONCENTRATOR " 1 LCD-LINE FIRST-FIELD KEYPAD= DELETE1 ;

void delete(void) /* */
{
lcd_str("delete LOGGER",0);
lcd_str("CONCENTRATOR ",1);
first_field();
keypad_is(delete1);
}

Since I didn't fully finish the Forth to C translation rules, some files translated 100%, while the worst file only translated 60%. This still ended up saving me a lot of coding, and the code that was translated, had already been debugged.

In the end, 24K of compiled Forth code plus a 12K Forth kernel was replaced with 20K of compiled C code and the resulting code was noticeably faster as the customer pointed out. On the other hand, the size of the source file increased by a factor of four.

Web Documentation

Documentation is the most despised task of a programmer. The euphoric moment is when the code is done and debugged. After that, the excitement is gone and it's usually drudgery. Since I perceived this task as one of simple translation, I decided to turn Timbre loose on it as well.

Glossary and Index

There are three useful ways for a maintenance programmer to view the code:

These views by themselves are okay but they can be readily enhanced by turning them into web pages and hypertexting them.

The rule set for generating the web pages was built using a general C parsing framework. The framework has no output, it just knows how to walk through the C code. The web rules placed on top of the framework generate all the output.

This is the main parsing state of the parsing rules:

RULE-SET c ( framework for parsing C source )
[ CPARSE ]
{ }[ inputq PULL ]{ check token }
 { check token }[ DROP ]
{ / * }[ RULES>  comment >RULES ]{ /* }
| {   |[ RULES>  block   >RULES ]
{ (   }[ RULES>  paren   >RULES ]{ ( }
{ #   }[ 0 WORD ]
{ int }{ char }[ RULES>  declaration >RULES ]

A rule set is declared and rules are added. The default action to gather patterns is CPARSE. The default translation for input is to pull it out of the input stream and check to see if it is a token. The other rules jump off to different parsing states, leaving the current rule set on the data stack so that it can be returned too. To handle the curly braces of C, the vertical bar Tu is used.

This is a part of the web rules:

declaration RULES
{ check token }[ DUP NAME? IF tuple CELL + ! ELSE DROP ENDIF ]
{ ; }[ .TUPLE CHANGE-RULES ]
{ ( }[ ` ( ` ) PARSE-PHRASE prototypes CHANGE-RULES ]

The declarations are the heart of the matter. They are responsible for outputting the words. The words are built up as tuples and output when they are done.

This is part of the index web page (all underlined words are hypertexted):

 Index

 
 a m, am edit
 back dataset ok, back record, back record ok, bcd, binary
 calibrated, check dataset, check log, choose cursor, clear concentrator, clear datalog, clip, create dataset, create record, cursor, cursor off
 dataset, dataset, date, day edit, day of week, decre char, delete, delete1, delete5

When clicked on, the word will take you to its entry in the glossary:

delete.c

 
void first_field(void) hilite the first field
void last_field(void) hilite the last field
void delete1(void) the selection menu keypad, logger or concentator
void delete5(void) last chance keypad: delete or cancel
void delete(void) initial entry point


Future

I will continue to use and enhance Timbre. For those interested in learning more or actually downloading and using Timbre, its web page is:

http://www.compusmart.ab.ca/rc/Timbre

"A translation is an intersection of ideas."


1 "It Has Timbre" and "Translator Frameworks", Rob Chapman

2 "C Without C" and "Bilingual Programming", Rob Chapman

3 http://www.complang.tuwien.ac.at/papers/ertl&maierhofer95.ps.gz

A PDF version of the original paper is also available.