Thursday, January 28, 2010

Factor @ Heilbronn University

It was an experiment -- and it went much better than I had imagined: I used Factor (a concatenative programming language) as the subject of study in a project week at Heilbronn University in a course called "Software Engineering of Complex Systems" (SECS). Maybe we are the first university in the world where concatenative languages in general and Factor in particular are used and studied. Factor is the most mature concatenative programming language around. Its creator, Slava Pestov, and a handful of developers have done an excellent job.

Why concatenative programming? Why Factor?

Over the years I experimented with a lot of different languages and approaches. I ran experiments using Python, Scheme and also Prolog in my course. It turned out that I found myself mainly teaching how to program in Python, Scheme or Prolog (which is still valuable for the students) instead of covering my main concern: mastering complexity. In another approach I used XML as a tool for lightweight modeling to explore and study some techniques. The approach is innovative and still worth developing further, but I wasn't satisfied.

My goal in the course "Software Engineering of Complex Systems" is to present and discuss practical techniques to conquer complexity in software systems. I did and still do a lot of research in this area. To make a long story short, I have come to the conclusion that Language-Driven Software Engineering (LDSE) is a very powerful and promising approach to conquer complexity. It's more than creating and using Domain Specific Languages (DSLs). It's consistently designing and creating languages throughout all levels and layers of a software implementation.

During my research I stumbled across Joy and Factor and learned about the concatenative paradigm. A series of excellent articles written by Manfred von Thun, creator of Joy, taught me the theory and fundamentals of concatenative languages. Factor, Slava Pestov's implementation of a practical concatenative language, turned out to be the best showcase I could think of: Factor is almost self-contained and extends itself by creating vocabularies of words using other vocabularies of words. Programs in Factor are written in the very same style. Factor is LDSE in action. I realized that it's the concatenative paradigm which forces you to design software from a language-driven point of view.

How did the course go?

First I discussed the issue of complexity in software systems. Before I used Factor in the project week, I introduced the concatenative paradigm in the course. I presented its mathematical foundation and used a pattern-driven approach to define the semantics of concatenative words as syntactic transformations. Finally, we defined a simplified grammar of Factor, which we extended to cover pattern matching and template instantiation. All this served to prepare the ground for Factor. We also reflected on and discussed at length what complexity is and how it can be managed.
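A minimal Python sketch, not taken from the course material, can illustrate what "words as syntactic transformations" means. It assumes a toy language with integers and five words; the names `RULES` and `rewrite` and the rule set itself are hypothetical. Each word is described by how many items to its left it matches (the pattern) and by what it puts in their place (the template).

```python
# Illustrative sketch: concatenative words as rewrite rules on a token sequence.
# RULES maps a word to (number of matched items, template function).
# All names here are assumptions made for this example.

RULES = {
    "dup":  (1, lambda x: [x, x]),       # x dup    => x x
    "drop": (1, lambda x: []),           # x drop   => (nothing)
    "swap": (2, lambda x, y: [y, x]),    # x y swap => y x
    "+":    (2, lambda x, y: [x + y]),   # x y +    => sum of x and y
    "*":    (2, lambda x, y: [x * y]),   # x y *    => product of x and y
}

def rewrite(program):
    """Reduce a program (a list of ints and word names) from left to right."""
    done = []                            # already-reduced prefix: values only
    for token in program:
        if token in RULES:
            arity, template = RULES[token]
            if len(done) < arity:
                raise ValueError(f"word {token!r} needs {arity} value(s)")
            split = len(done) - arity
            args = done[split:]          # the items matched by the pattern
            del done[split:]
            done.extend(template(*args)) # instantiate the template in their place
        else:
            done.append(token)           # literals rewrite to themselves
    return done

print(rewrite([3, "dup", "*"]))          # prints [9]
print(rewrite([1, 2, "swap", "drop"]))   # prints [2]
```

Because a program is just a sequence, reducing `a + b` gives the same result as first reducing `a` and then continuing with `b`. That property is what makes the semantics compositional in this toy model.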

During the project week, my students (about 30 people) worked with Factor on four out of five days, for almost 8 hours per day. For my students it was full contact combat with an almost unknown programming paradigm and an exotic language. But they did really well. On day 1, the students worked through the introductory material supplied with Factor. On day 2, we studied types and object-orientation in Factor. On day 3, we studied parsing and macros. On these two days, the students worked with tutorials and worked their way through a number of exercises, which required them to write tiny programs in Factor to pass unit tests. On day 4, we worked on a topic unrelated to Factor. On day 5, two students gave a thorough presentation of their project work, a real-world application in Factor that they had built in another course in the previous semester. We concluded the week by discussing and reflecting on Factor's capabilities and the power of the concatenative paradigm in general.

Before I forget to mention it: Tim, research assistant in the software engineering department and PhD student, created the tutorials and the exercises and helped out a lot in class. Without him, the course wouldn't have been possible!

The students enjoyed the week very much. The evaluation of the course shows that they liked getting a new and different viewpoint on software development, object-orientation, parsing etc. They definitely realized and experienced that Factor helps them become better software engineers, although Java is their main language.

Isn't Factor, aren't concatenative languages, too esoteric to be useful?

Yes and no. There is no question that Factor is a niche language no one in industry shows interest in (besides Google, so far ;-). There might be some companies out there which use Forth and might be open to "concatenative thinking". However, even though the concatenative paradigm is almost unknown, concatenative languages are functional languages, and functional languages are gaining in popularity. There's little doubt that learning functional programming broadens a student's scope and complements their skill set.

The fun part is that the concatenative approach to functional programming is much simpler than the lambda calculus, which is traditionally taught. The math is simple and poses no intellectual barrier, and formal transformations are easy to understand since there are no variable bindings and nested scopes. Key concepts are stripped to their bare minimum. Did you ever try to explain the idea of continuations in Scheme? You might spend a good amount of time explaining continuations and running exercises. It's not unlikely that some students still don't get it. Continuations seem to be an extremely complex thing and appear to be somewhat mystical. In Factor, and in concatenative languages in general, continuations are a triviality! In principle, a continuation is a snapshot of the data stack and the call stack. No big deal, since you juggle both stacks all the time. Are generic functions a specialty of CLOS? They come out naturally in a pattern-based approach to defining concatenative words.
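The "snapshot of both stacks" view can be made concrete with a small Python sketch. This is a toy machine under stated assumptions, not how Factor implements continuations: the class `Machine`, the word `save`, and the user-defined word `inc-twice` are all hypothetical names invented for the illustration.

```python
import copy

# Hypothetical user-defined word; "save" captures a continuation halfway through.
DEFS = {"inc-twice": [1, "+", "save", 1, "+"]}

class Machine:
    def __init__(self, program):
        self.data = []                 # data stack
        self.calls = [list(program)]   # call stack: pending tokens, one list per frame
        self.saved = None              # continuation captured by "save"

    def snapshot(self):
        # In this toy model a continuation is nothing but a copy of both stacks.
        return copy.deepcopy((self.data, self.calls))

    def resume(self, continuation):
        # Re-entering a continuation just reinstalls the copied stacks.
        self.data, self.calls = copy.deepcopy(continuation)

    def run(self):
        while self.calls:
            frame = self.calls[-1]
            if not frame:              # frame exhausted: return to the caller
                self.calls.pop()
                continue
            token = frame.pop(0)
            if token == "save":
                self.saved = self.snapshot()
            elif token == "+":
                b, a = self.data.pop(), self.data.pop()
                self.data.append(a + b)
            elif token == "*":
                b, a = self.data.pop(), self.data.pop()
                self.data.append(a * b)
            elif token in DEFS:
                self.calls.append(list(DEFS[token]))   # call: push a new frame
            else:
                self.data.append(token)                # literal value
        return self.data

m = Machine([5, "inc-twice", 2, "*"])
print(m.run())            # prints [14]

m.resume(m.saved)         # jump back to the point where "save" ran
m.data[-1] = 100          # pretend a different value was on the stack there
print(m.run())            # prints [202]
```

At the moment `save` runs, the data stack holds `[6]` and the call stack holds the rest of `inc-twice` plus the rest of the main program. Resuming replays exactly that remainder, which is all there is to a continuation in this model.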

But my point goes beyond that. The way you create abstractions and refactor your programs in a concatenative language forces you to continuously reflect on your design decisions. You have enormous freedom in how you shape and constrain the design space of options at hand. It makes you think about words and vocabularies of words. It is thinking about creating and using languages. It combines software engineering and programming in a way I haven't experienced in any other paradigm. That's why I introduced Factor in my course: you start to engineer software, and you explore new ways of creating abstractions and designing frameworks.

Factor itself is an excellent case study for this approach. Factor starts from a relatively small kernel (which I -- admittedly -- haven't cleanly dissected yet) and then consistently adds feature after feature, using Factor to extend Factor. A neat concatenative kernel turns itself into a powerful piece of software using a language-driven approach right from the start. Slava Pestov proves that this approach does result in a fast, interactive and highly reflective language. For me, Factor is a masterpiece of software engineering! It's definitely worth studying!

Conclusion

What I experienced over the last two semesters is that some students become deeply attracted to Factor. Even among those who do not, almost all sense that there is a new world worth entering that takes them to a new level of understanding. It broadens their scope and skill set. Eventually, they'll leave the concatenative path to do their Java/C# assignments in other courses or when they program for a living. Still, I'm convinced that concatenative programming has an impact that lasts.

Do I sound too enthusiastic? Possibly, but I prefer to teach things I'm enthusiastic about! I'm still a student of the concatenative paradigm myself; I'm learning a lot about it each and every day. And one thing is for sure: I will continue to use Factor in the coming semesters.

---
Update (2010-01-29): I received quite a few requests to publish the Factor material we produced for the project week. The material is in German. Comments, ideas, corrections and improvements are welcome!

Day 1 - Intro: Getting started (Factor documentation), Q&As

Day 2 - Object-Orientation: Intro, Tutorial, Q&As

Day 3 - Parsing and Macros: Intro, Tutorial, Q&As

Day 4 - Unrelated Topic:

Day 5 - Real-World Application in Factor: Presentation, Report, Sources (thanks to Andreas Maier and Marcel Steinle)

By the way, Daniel Ehrenberg pointed out that Heilbronn University is not the first to use Factor in a course. That's great to hear. Factor is starting to spread!

In case you are interested in our research on concatenative languages, there is a paper available: "Concatenative Programming: An Overlooked Paradigm in Functional Programming".

Thursday, January 21, 2010

Scripting Languages

Recently, I had an interesting discussion about "What's the distinguishing feature of so-called scripting languages?" We easily agreed on calling Python, Ruby, Groovy, Tcl, Perl etc. scripting languages. But then the trouble started: What distinguishes Python, Ruby etc. from Java, C#, C++ and similar languages? Is it dynamic typing? Are they more introspective? Is it that meta-programming poses no difficulty at all in scripting languages?

Some whisper that a Python or Ruby programmer is as much as 2-5 times more productive than a Java/C# programmer. As a matter of fact, programs written in so-called scripting languages tend to be significantly shorter than their "unscripted" counterparts. Such a discussion typically turns into an almost religious debate about static and dynamic typing. Programs in Python and Ruby might be shorter but they are unsafe because of dynamic typing. Static typing is the way to go for large programs developed by many developers -- say the Java and C# advocates. And they have a point. Write unit tests, say the Pythoneers and Rubyists, which you are supposed to write anyhow. As a side-effect, your unit tests easily uncover all typing-related bugs. You're not better off with a statically typed language, they say.

While such discussions are interesting, our main question remains unanswered. What's the distinguishing feature of scripting languages? Most scripting languages are dynamically typed. But C#, for example, is catching up here. Are scripting languages interpreted languages? Python compiles to byte code internally, and so does Java. Do they have unique reflective and introspective capabilities? To some extent, yes, but Java and C# are also quite powerful in this respect. Is program size the only criterion? Regarding size, Haskell is a serious competitor. Haskell is statically typed (it requires a minimum of explicit type declarations) and quite dense in expressivity.

I think that the name "scripting language" is not very helpful anymore. It's historically motivated. In the early days of computing, users had to interact with their machines by typing commands at a command line. Soon, the command line was embedded in a so-called shell. Famous shells under Unix are the Bourne shell (sh), csh and tcsh; another specialized automation tool for software developers is "make". The shell provided means to automate -- script -- repetitive tasks. This kind of "programming" inspired languages like Perl. These languages weren't regarded as "serious" languages like, e.g., C/C++. They were typically interpreted and relatively slow in execution. However, these languages matured over time and inspired other designers to create "glue" languages like Python or Tcl/Tk. Because of their interpreted nature it was easy to add introspective features and meta-programming facilities. The idea of being a scripting or "glue" language vanished over time. They became full-fledged implementation languages in their own right and kept the philosophy of being flexible and easy to use to solve problems. I think it is not appropriate anymore to call them "scripting languages".

However, some of these "scripting languages" introduced a feature none of the compile-execute languages offered: The "programmable command line" languages introduced interactivity!

And that is the key point, the distinguishing feature: interactivity requires a language to be designed in a certain way. To be interactive, relatively small chunks of text must represent syntactically valid program fragments in order to query or incrementally modify the run-time environment.
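A hedged Python sketch of this requirement follows, using a toy Forth-like language rather than any real system. The class `Session`, its method `feed`, and the `: name body ;` definition form are assumptions made for the example. Every chunk handed to `feed` is a complete, valid fragment, and each one either queries or modifies the live environment.

```python
class Session:
    """Hypothetical minimal interactive environment for a Forth-like toy language."""

    def __init__(self):
        self.stack = []      # run-time data that persists between chunks
        self.words = {}      # user-defined words: name -> list of tokens

    def feed(self, text):
        """Evaluate one small chunk of text against the live environment."""
        tokens = text.split()
        if tokens and tokens[0] == ":":          # definition: ": name body ;"
            if len(tokens) < 3 or tokens[-1] != ";":
                raise SyntaxError("expected ': name body ;'")
            self.words[tokens[1]] = tokens[2:-1]
        else:
            self._run(tokens)
        return list(self.stack)                  # show the stack after each chunk

    def _run(self, tokens):
        for tok in tokens:
            if tok in self.words:
                self._run(self.words[tok])       # expand a user-defined word
            elif tok == "dup":
                self.stack.append(self.stack[-1])
            elif tok == "+":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif tok == "*":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a * b)
            else:
                self.stack.append(int(tok))      # anything else must be an integer

s = Session()
print(s.feed("3"))                  # prints [3]
print(s.feed(": square dup * ;"))   # prints [3]  (environment extended, stack untouched)
print(s.feed("square"))             # prints [9]
print(s.feed("4 square +"))         # prints [25]
```

Note that a single token such as `3` or `square` is already a valid program here. In a language whose smallest valid unit is a class or a compilation unit, the same incremental dialogue would need considerably more machinery.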

The way to interact with the run-time environment in non-interactive compile-execute languages is via the debugger, a tool that is rarely taught in combination with a non-interactive programming language. Working with a debugger is quite a different experience from working interactively at a command line. A debugger is built around a representation of the run-time model and usually establishes a bridge towards the language the original program is written in. Interactive languages connect your programming experience with the run-time model in a consistent language-related way but still might shield some implementation details from the programmer that a debugger shamelessly unveils.

So the point is that interactivity (a) has a strong impact on the syntactic design of a language and (b) establishes a certain way in which you perceive and experience the run-time environment of your language. This explains the shortness of interactive programs; it also explains the agility in the development process and its perceived productivity: quickly testing a program at the console results in immediate feedback to the programmer. This helps a lot in learning a language and in getting a better understanding of the run-time. It's just fun and feels cool. There are only two languages I know of which have taken the implications of interactivity (small chunks of text represent valid syntactic programs and an incremental run-time experience) to an extreme: Forth and Factor!

This does not mean that non-interactive languages are not useful and important! They just feel a bit different. Due to their lack of interactivity they feel less "handy", so to speak.