Friday, December 22, 2006

Karl-Steinbuch-Stipendium

If you occupy yourself with interesting computer science projects and ideas in your spare time, have you (as a student in Baden-Württemberg) ever thought about having your commitment sponsored? Do consider whether it makes sense for you to apply for a scholarship. Currently open (and recurrently so) is the call for applications for the Karl-Steinbuch-Stipendium.

Tuesday, December 12, 2006

Constraint Programming

Lately I have been looking into a possible implementation of the UML-Reasoner. To get a feeling for the "species richness" of a class diagram at the object level (the instance level), I played around a bit. My starting point was two classes with an association established between them. There were to be at most five instances of each class. I wrote simple routines that generated, from two sets ({a1, a2, a3, ...} and {b1, b2, b3, ...}) of five elements each, all possible relations from one set to the other -- each time under different assumptions. For example, one can regard the pair (a1,b1) as equivalent to (a2,b2). After all, what matters is only the link from an instance of the first set to an instance of the second set. One can argue differently, however, when instance relations are considered over time. Then (a1,b1) can be the consequence of a different constellation than (a2,b2) and can also find a different continuation. Really? At times I got lost in hairy chains of reasoning.

Be that as it may, one thing became clear to me: depending on the assumptions, my space of possibilities explodes and takes on almost frightening dimensions. How is the entire space of possibilities supposed to be generated and checked against design constraints in any reasonable amount of time? Is there another approach?
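
Just to get a sense of the scale (a sketch of my own, not the original routines): a relation between two five-element sets is an arbitrary subset of the 5 x 5 = 25 candidate pairs, so there are already 2^25 = 33,554,432 relations before any of the finer distinctions come into play.

A = ['a1', 'a2', 'a3', 'a4', 'a5']
B = ['b1', 'b2', 'b3', 'b4', 'b5']
pairs = [(a, b) for a in A for b in B]   # the 25 candidate links

print 2 ** len(pairs)                    # 33554432 possible relations

def relations(pairs):
    # Enumerate every relation as a list of pairs via a bitmask:
    # bit i of the mask decides whether pair i is in the relation.
    for mask in xrange(2 ** len(pairs)):
        yield [p for i, p in enumerate(pairs) if mask & (1 << i)]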

I turned to Prolog, a logic programming language. But somehow that did not convince me. Prolog proceeds in a similar fashion: the entire search space is generated and then the logical constraints are checked. That is the so-called "generate and test" approach. It does not get you any further either.

So I started to study constraint programming, the next obvious approach. There is a pleasantly thin and good book on the subject by Thom Frühwirth and Slim Abdennadher, "Essentials of Constraint Programming" (Springer, 2003), which lets you work your way into the topic within an hour or two, provided you skip the mathematics. Constraint programming follows the "constrain and generate" methodology. The approach is described very nicely in the book "Concepts, Techniques, and Models of Computer Programming" by Peter Van Roy and Seif Haridi (MIT Press, 2004) -- the next book I consulted. You can also try it out yourself in the programming language Oz with the help of Mozart. The book by Van Roy and Haridi is based on Oz.

However, you do not need a special programming language to do constraint programming (CP). In principle, CP can be "loaded" as a library, provided such a library is available for your programming language. It even works entirely without a library once you have understood the principle. By chance I came across an article by Peter Norvig (Director of Research at Google, by the way) that demonstrates the idea of constraint programming, with its interplay of constraint propagation and search, using a Sudoku solver as an example; see "Solving Every Sudoku Puzzle". The code is written in Python, is astonishingly short, and proves to be efficient even for "hard" Sudokus.
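
The core of the idea fits into a few lines. Here is a toy sketch of my own (not Norvig's code): three variables A, B, C with domains drawn from {1, 2, 3}, pairwise constrained to be different. Propagation prunes the domains until a fixpoint is reached; only then does search branch on an undecided variable.

def propagate(domains, constraints):
    # Remove the value of every decided variable from the domains
    # of its neighbours, until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        for x, y in constraints:
            for a, b in ((x, y), (y, x)):
                if len(domains[a]) == 1:
                    v = list(domains[a])[0]
                    if v in domains[b]:
                        domains[b] = domains[b] - set([v])
                        changed = True
    return min(len(d) for d in domains.values()) > 0   # no empty domain left?

def search(domains, constraints):
    if not propagate(domains, constraints):
        return None                              # dead end: a domain ran empty
    undecided = [v for v in domains if len(domains[v]) > 1]
    if not undecided:
        return domains                           # every variable is decided
    var = undecided[0]                           # branch: try each value in turn
    for value in sorted(domains[var]):
        trial = dict((k, set(d)) for k, d in domains.items())
        trial[var] = set([value])
        solution = search(trial, constraints)
        if solution:
            return solution
    return None

domains = {'A': set([1, 2, 3]), 'B': set([1, 2, 3]), 'C': set([3])}
constraints = [('A', 'B'), ('B', 'C'), ('A', 'C')]
print search(domains, constraints)               # prints one admissible assignment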

If you have worked your way through Norvig's example, you may already suspect it too: I believe the key to a practicable implementation of the UML-Reasoner lies hidden here.

Thursday, December 07, 2006

Inversion of Control

Did you ever try to implement an interpreter for the Scheme programming language yourself? I enjoyed it a lot -- and still enjoy it, since I haven't finished my little Scheme project yet. It is a lot of fun to implement Scheme in Scheme, but I use Python in order to make sure I understand each and every bit of the Scheme language. While a basic Scheme implementation is easy to realize, a more advanced implementation supporting so-called continuations is a nice challenge. First you have to understand the concept of continuations, which is quite a challenge in its own right. Second, you need to have an idea of how to make them happen in your host language.

Just to give you a rough idea: In Scheme, you get a sort of read and restore access to the current state of the computational process. There is a special procedure in Scheme which you can use anywhere in a Scheme program; the procedure is called "call with current continuation" ("call/cc" for short) -- but the name is not so important here. When executed, this special procedure saves the current state of the computational process and packages this information within something called a continuation. A continuation is a first-class entity in Scheme; you can pass it around and store it. Once a continuation gets called, the state of the computational process stored within the continuation is restored. Simply speaking, program execution continues at exactly the point where the continuation was created. To give you an analogy, think of exceptions as you know them from other programming languages. When you "try" something, the "except" clause restores the computational state of affairs in case an exception is thrown, no matter when and where deeper down in the call chain the exception is raised. Unfortunately, exceptions are functionally weaker than continuations: you can build an exception system with continuations, but not the other way around. You cannot store the resume point of a "try", leave its calling context, and jump back into it later on. That's the kind of thing you can do with continuations. Exceptions are a poor man's continuations.
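
You can see the asymmetry in a small experiment. The following sketch (my own illustration, in Python rather than Scheme, with made-up names) emulates a restricted form of call/cc with an exception: it hands out an "escape" procedure that can abort a computation upwards, but unlike a real continuation it cannot be stored and resumed later on.

class Escape(Exception):
    def __init__(self, value):
        self.value = value

def call_ec(f):
    # Call f with an escape procedure -- a one-shot, upward-only
    # imitation of call/cc built on exceptions.
    def escape(value):
        raise Escape(value)
    try:
        return f(escape)
    except Escape, e:
        return e.value

print call_ec(lambda k: 1 + 2)        # 3: the escape is never used
print call_ec(lambda k: 1 + k(42))    # 42: k aborts the pending addition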

Besides Scheme, there are only a few languages that natively support continuations. Among those is Ruby. In most cases, however, you are left without any support for continuations -- and that's not too bad. It forces you to really understand what continuations are and how to implement them. That's the situation I was faced with.

If you do not have full control over the computational process, as is needed for an implementation of continuations, you have to think of a way to regain execution control. What you end up with is either writing a compiler that transforms Scheme programs into a form called Continuation Passing Style (CPS), or writing an interpreter that maintains its own call stack. The call stack represents the state of the computational process. Having access to the call stack is key for an implementation of continuations: the current continuation is simply the current state of the call stack.

I opted for writing an interpreter. In the following, I will show you how I did this with Python. But before you hesitate to read on: You do not need to know or understand Scheme. What I'm after in this post is not Scheme; it's how to regain execution control. Scheme and the implementation of continuations are just one motivating example. I'll give you more examples later on; examples that are relevant for anybody with an interest in software design.

To set the scene, I need to tell you something about a cool feature in Python, namely generators. A generator looks like a regular function definition. As you know from other languages, a "return" statement within a function terminates execution of the function and possibly returns some data. Within a generator definition you do not use "return" statements. Instead, you use "yield" statements much like "return" statements, but a "yield" works differently: it returns data but does not terminate the generator. The execution state of the generator is preserved, and execution can be resumed later on. (Sounds like another poor man's continuations, eh?!) To give you an example, here is an extract of an interactive Python session:


>>> def countdown():
...     yield 2
...     yield 1
...     yield 0
...


If you try to call this function, it won't work as you might expect. It doesn't return a 2 or something like that. A generator object is returned instead.


>>> countdown()
<generator object at 0x...>



With the generator object you can iteratively step from one "yield" statement to the next via the generator's "next()" method, get some value back from the "yield" statement and, later on, resume execution where it left off.


>>> cd = countdown()
>>> cd.next()
2
>>> cd.next()
1
>>> cd.next()
0
>>> cd.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>


The exception "StopIteration" indicates that the generator is done. Generators support the same protocol as iterators do, so a generator can be used within a "for" statement. Creating the generator object, calling "next()", and exiting when StopIteration is raised -- all this is done automagically for you in the background.


>>> for i in countdown(): print i
...
2
1
0


A more flexible definition of countdown could look like this:


>>> def countdown(n):
...     assert n >= 0
...     while n >= 0:
...         yield n
...         n = n - 1
...


I think you get the basic idea. With "yield", a "function" (actually a generator) can return control to its caller at any point where a "yield" statement appears. A "function" does not need to finish its computation before it can return something.

In my Scheme interpreter I use generators to return control whenever the evaluator of a basic Scheme expression needs to call the evaluator of another Scheme expression. Instead of doing something like this (what you would conventionally do) ...


def do_something(n):
    result = do_something_else(n)
    return result


... I do this, a slight modification to the code in order to return control whenever I call another function:


def do_something(n):
    result = (yield do_something_else, n)
    yield result


Got it? Control is returned to the caller of "do_something". The generator hands the function object and its arguments over to the caller. Now it is up to the caller to call "do_something_else" on behalf of "do_something". After that, the result of "do_something_else" is sent into the generator and the generator proceeds. Just to give you an insight into how I use this technique in my Scheme interpreter, here is a code snippet without further comment.


class If(object):
    def __init__(self, test, consequent, alternate=None):
        self.test = test
        self.consequent = consequent
        self.alternate = alternate

    def eval(self, env):
        if (yield Command(eval, self.test, env)) != False:
            yield Command(eval, self.consequent, env)
        elif self.alternate:
            yield Command(eval, self.alternate, env)

    def __repr__(self):
        repr = "(if %s %s)" % (self.test, self.consequent)
        return self.alternate and (repr[:-1] + " %s)" \
            % self.alternate) or repr
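
To see what the caller's side looks like, here is a minimal sketch of such a driver (simplified; my actual interpreter differs in the details, and "run" and its tuple convention are made up for this post). It keeps an explicit stack of generators -- which is exactly the call stack a continuation would have to capture -- and requires Python 2.5 for "send()":

def run(main, *args):
    stack = [main(*args)]           # our own, explicit call stack of generators
    value = None                    # value to send into the topmost generator
    while stack:
        try:
            item = stack[-1].send(value)
        except StopIteration:       # generator fell off its end:
            stack.pop()             # hand the last value up unchanged
            continue
        # (simplification: a tuple starting with a callable is
        # always read as a call request, never as a return value)
        if isinstance(item, tuple) and callable(item[0]):
            func, fargs = item[0], item[1:]
            result = func(*fargs)
            if hasattr(result, 'send'):    # callee is itself a generator:
                stack.append(result)       # "call" it by pushing it
                value = None
            else:                          # ordinary function: feed result back
                value = result
        else:                       # a plain value acts as "return"
            stack.pop()
            value = item
    return value

With the definitions from above, "run(do_something, 20)" drives the computation to completion. Because the nesting now lives in our own stack instead of Python's, the interpreter can inspect and copy it at any time -- which is exactly the raw material continuations are made of.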


The overall pattern behind this approach is Inversion of Control (IoC): delegate method calls to a centralized instance (in this case I used "yield" statements for that purpose) so that this instance gains full control over who is called and when.

By the way, I don't feel comfortable calling Inversion of Control a pattern. It's not that easy to boil IoC down. It cannot be captured by a simple code template. I would say it's a meta-pattern or a design principle. There are other techniques to achieve IoC. Metaprogramming is one example.

Whatever you call it, Inversion of Control is a very powerful thing. IoC is about getting control over the flow of execution. You can use it to achieve quite fancy things. It might be the basis for an architectural infrastructure. Let me give you some examples.

Suppose you do not want to use threads, since there is some cost penalty associated with them. You choose to go for cooperative multithreading instead. What is cooperative multithreading about? The basic idea is that of multiple concurrent tasks, only one actually holds the thread of control at any time. The task returns control on its own initiative back to a dispatcher and scheduler. That's why it is called "cooperative": each task is assumed to give control back as soon as possible, so that other tasks on the scheduler's list can continue. Guess how one could implement this? With the help of generators, using the technique I just showed you.
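
Here is a minimal sketch (my own, just to make the idea concrete): each task is a generator, and "yield" is the cooperative act of giving control back to the scheduler.

from collections import deque

def scheduler(tasks):
    queue = deque(tasks)          # round-robin list of runnable tasks
    while queue:
        task = queue.popleft()
        try:
            task.next()           # run the task up to its next "yield"
        except StopIteration:
            continue              # the task has finished: drop it
        queue.append(task)        # otherwise re-schedule it at the back

def worker(name, steps):
    for i in range(steps):
        print "%s: step %d" % (name, i)
        yield None                # cooperate: hand control back

scheduler([worker("A", 3), worker("B", 2)])   # interleaves A and B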

Another example: your language does not support pre- and post-conditions (see my post about "Netze spannen mit Design by Contract"). Again, use Inversion of Control. It's up to the controller to call a method. Before calling a method, the controller just needs to check whether a method is registered representing the pre-condition and call it beforehand. Similarly, the method representing the post-condition is called after the main method. By the way, you can use this to add aspect-orientation to your system!
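
A sketch of this idea (my own, with made-up names; the same dispatch point is where aspects would be woven in): the controller looks up registered conditions before and after performing the actual call.

preconditions = {}    # function name -> predicate over the arguments
postconditions = {}   # function name -> predicate over the result

def call(func, *args):
    pre = preconditions.get(func.__name__)
    if pre and not pre(*args):
        raise AssertionError("pre-condition of %s violated" % func.__name__)
    result = func(*args)
    post = postconditions.get(func.__name__)
    if post and not post(result):
        raise AssertionError("post-condition of %s violated" % func.__name__)
    return result

def isqrt(n):
    return int(n ** 0.5)

preconditions["isqrt"] = lambda n: n >= 0
postconditions["isqrt"] = lambda r: r >= 0

print call(isqrt, 16)    # 4
print call(isqrt, -1)    # raises AssertionError before isqrt is ever run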

A final example: suppose you want to monitor a certain component in your system. You are interested in all method calls going out and coming in. Since the interaction with the environment is complex, you log all method calls and the results they return. You record everything. Now you can remove the environment and let the component re-run a certain scenario: the controller "replays" the recorded reactions and stimuli of the environment. Thereby, you can test and debug your component under test in isolation.
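
Sketched in the same spirit (again with invented names, assuming the component routes all its outgoing calls through the controller): a recording controller logs each call, and a replaying controller serves the logged results back without the real environment.

class Recorder(object):
    def __init__(self, env):
        self.env = env
        self.log = []                 # (name, args, result) triples
    def call(self, name, *args):
        result = getattr(self.env, name)(*args)
        self.log.append((name, args, result))
        return result

class Replayer(object):
    def __init__(self, log):
        self.log = list(log)
    def call(self, name, *args):
        expected_name, expected_args, result = self.log.pop(0)
        assert (name, args) == (expected_name, expected_args)
        return result                 # canned answer, no environment needed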

This was a rather long post. But I hope that you now understand that without Inversion of Control, something very valuable is missing in your toolbox. Inversion of Control is an important technique for structuring your code. Some projects make heavy use of this principle (see e.g. Apache Excalibur). It is one of those magic techniques wizards use ;-)

Sunday, December 03, 2006

Haskell Course

Haskell is a modern functional programming language with many interesting features such as pattern matching, currying, and lazy evaluation. Mark C. Chu-Carroll has started a Haskell course on his blog "Good Math, Bad Math", which I can only recommend to anyone interested in programming languages beyond Java, C#, PHP, Python, Ruby, Perl and the like (that is, all those referred to as imperative languages). I am very curious to see Mark explain monads to us and what, for him, constitutes the "glue" quality of Haskell.