zondag 30 december 2012

This project is a kind of research project for gaining insight in the original list processing used in Lisp, together with the memory management of it. For this, a small C library should be set up, which manages a relatively small amount of memory (e.g. 16k bytes, but parametrised), to be used for cons'ing lists the Lisp way, and then building upon that to check ideas, to check fragmentation, what would be the influence of adding new data types of different size, and so on.

The evolution of it should serve to guide the implementation and optimisation of a register-based processor on an FPGA platform. If possible, it should also guide the evolution of a Lisp-like system compiler (like C now is).

There are two parts necessary, which should be simplified as much as possible. In order to create lists in the available memory space, it is necessary to describe them in an external language, really called Lisp. At that point, the resemblance ends, because no interpretation of the lists will be done. At the start, it will really be only
description of listinternal representation of list.

The internal representation will also be used to investigate the usage of tagged data types, which in C translate into the usage of unions.

What would then be the main points of the research project, in order to execute it as a project?
Define S-expressions and their basic semantics
This part will define the minimum representation necessary in order to write S-expressions. The basic semantics is about the entities that need to be defined in order to represent meaningful data.
Define a basic internal representation
This is a studying phase, where the basic ways that are currently used to represent S-expressions and internally used data structures are be taken stock for further usage in the next phase.
Implementation of the internal representation
This part builds from the previous one an implementation.
Initialization of the environment
Here the necessary environment and its pointers will be prepared for usage.
Allocation and deallocation of cells
This part concerns the algorithm for using the bofl pointer to return a fresh cell, and the eofl pointer to return used cells back to the free list.
Build a translator for S-expressions
This part will be about reading an input stream and validating this input as correct S-expressions or a list of S-expressions, and transforming them into data structures.
Map symbols and strings onto internal list representation
In this part, the interpretation of the S-expressions and the mapping to their internal representations are taken together in order to build a working implementation of a system that can take either a manually entered expression or a file, and use that to build the representation in reality. This will do the allocation in the simplest way possible.
Map lists onto internal list representation
Continuation of the previous part
Printing expressions
It should be possible to turn all kinds of expressions back into a readable form.
Extended diagnostics
It is difficult to debug the reading and writing of generalized expressions.
Build an evaluator
After reading and printing S-expressions has been finished, this is the next part: Running a program as entered by the translator.
The following two parts should be taken into account long term, but are not really necessary to build and test the evaluator.
Memory management
This part should build further upon the previous one in order to look at ways that the usage of memory can be optimised, and to see what effects this has on the run-time speed of all the algorithms.
Extended data types
The basic data types used will probably map on a very limited base type. The investigation here will add other data types, both with fixed and varying size, in order to see what needs to be changed to the previous implementation on how memory needs to be handled, and what its run-time effects will be.

Results (4 December 2012)


Looking through the above article topic by topic, it seems that the stated goals all have been reached, except the garbage collection. But
  • A small C library has been set up, which is used to manage cell memory
  • All management of data structures was done through use of cons, car, cdr, rplaca, rplacd
  • After implementing most of it, the design of a simple 16 bit ISA was started
  • Everything was described using Lisp S-expressions
  • A union type was used to represent the basic data types in C
  • An evaluator was built and extended

After this research, an assembler for a simple 16 bit processor was implemented in Lisp, and using Lisp structures as assembler statements. This makes it possible to use the list manipulation possibility of Common Lisp to easily add new structures, which are immediately written as basic assembler before being translated.
After that, a simulator for the 16 bit processor was implemented, also in Common Lisp.
The next part is effectively starting again the implementation of the system, but now in assembler. Since everything is based upon Lisp structures, the goal is to develop patterns which can be used to describe a Lisp system in terms of Lisp and going down to assembler, by rewriting higher-level structures in lower level structures.

maandag 1 oktober 2012

When I started this blog, I did not know yet that I would do something completely different from 2010 to 2012. In the summer of 2010, I decided to get my MSc degree in Electronics/ICT, what would be the closest that I could get to a degree in software engineering. An additional benefit was that I also got a fairly thorough introduction into VHDL. I used this with this board to also build my master's thesis (there weren't any interesting topics in SE, unfortunately).

Last week I got my degree then officially.

Since a week or two I have been coding a simple environment for encoding S-expressions. It currently consists of the following parts:
  • A system for managing cells. Each cell is either a CONS or part of a string. All cells are consistently 4 bytes in size. There are two functions here: one to get the address of a new cell, and one to return a cell back to the free list.
  • Functions to build a string or a symbol
  • Functions to build lists
  • A test harness with unit tests
One purpose of this part is only to encode S-expressions into a list or tree representation. Processing of this structures will come later. Another purpose was to check if it was possible to implement all this without stop-and-go garbage collection.

The mechanism used to avoid this is that all cells are at the beginning initialised to belong to a free list. This is a singly linked list with a pointer to the beginning and a pointer to the end of it. Allocation of a cell will always return a cell at the beginning of the list, while deallocation will append the passed cell to the end of the list. The pointers at the beginning and the end will be adjusted accordingly. All other functions build on top of this mechanism. In the development, the usage of unit tests is beneficial. Several defects have been detected through those, even in the later stages of development.

This comes of course at a price. One should make sure that no cyclic structures are created. In a stop-and-go collector it is much easier to clean up such structures.

Another price is that strings use double the space, since each string is composed of a cell with a pointer to the next cell of the string and two characters. Optimising this could be done with a separate pool for strings, but that would double the work needed. Now there is only one mechanism which can be used for everything.

The next part to code will then be the S-expression encoder and decoder. It should not only be possible to encode an expression like

(loop
    for i in (quote (1 2 3 4 5 6 7 8 9 10))
    do (print i))
 

But also to print an S-expression based upon the internal list structure.