maandag 1 oktober 2012

When I started this blog, I did not know yet that I would do something completely different from 2010 to 2012. In the summer of 2010, I decided to get my MSc degree in Electronics/ICT, what would be the closest that I could get to a degree in software engineering. An additional benefit was that I also got a fairly thorough introduction into VHDL. I used this with this board to also build my master's thesis (there weren't any interesting topics in SE, unfortunately).

Last week I got my degree then officially.

Since a week or two I have been coding a simple environment for encoding S-expressions. It currently consists of the following parts:
  • A system for managing cells. Each cell is either a CONS or part of a string. All cells are consistently 4 bytes in size. There are two functions here: one to get the address of a new cell, and one to return a cell back to the free list.
  • Functions to build a string or a symbol
  • Functions to build lists
  • A test harness with unit tests
One purpose of this part is only to encode S-expressions into a list or tree representation. Processing of this structures will come later. Another purpose was to check if it was possible to implement all this without stop-and-go garbage collection.

The mechanism used to avoid this is that all cells are at the beginning initialised to belong to a free list. This is a singly linked list with a pointer to the beginning and a pointer to the end of it. Allocation of a cell will always return a cell at the beginning of the list, while deallocation will append the passed cell to the end of the list. The pointers at the beginning and the end will be adjusted accordingly. All other functions build on top of this mechanism. In the development, the usage of unit tests is beneficial. Several defects have been detected through those, even in the later stages of development.

This comes of course at a price. One should make sure that no cyclic structures are created. In a stop-and-go collector it is much easier to clean up such structures.

Another price is that strings use double the space, since each string is composed of a cell with a pointer to the next cell of the string and two characters. Optimising this could be done with a separate pool for strings, but that would double the work needed. Now there is only one mechanism which can be used for everything.

The next part to code will then be the S-expression encoder and decoder. It should not only be possible to encode an expression like

(loop
    for i in (quote (1 2 3 4 5 6 7 8 9 10))
    do (print i))
 

But also to print an S-expression based upon the internal list structure.