Tuesday, April 15, 2008

Towards Moore's Law Software: Part 2 of 3

(Note: This is part two in a three part series. Part one appeared yesterday.)

After more than 20 years at Microsoft, Charles Simonyi abruptly departed to start his own company, Intentional Software. As a parting gift for, uhm, you know, leading the development of Word and Excel, Microsoft let him use the patents for Intentional Programming that he had created at Microsoft Research. The only minor stipulation was that he'd have to write his own code for it from scratch.

That really hasn't stopped him or his company. Recently, he discussed his company's "Intentional Domain Workbench" product, which is built around these concepts. One of his clients is even using it to dramatically decrease the time it takes to develop software for pension plans and all their attendant rules. The working concept is that domain experts work in an environment and language highly tailored for them, while traditional programmers develop a generator that takes that domain-tailored code and turns it into real executable code.

For example, an expert in banking will naturally think in terms of customers, accounts, credits, debits, loans, debt, and interest (among other ideas) and will not want to be bothered with anything resembling Java or XML. A highly customized tool keeps the domain model and the programming implementation separated, rather than forcing programmers to "mentally unweave" the code into its parts, reason about it, and then "reweave" the solution. It's the "repeated unweaving and reweaving... burden on the programmers that introduces many programming errors into the software and can increase costs disproportionate to the size of the problem."
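To make the idea concrete, here's a minimal sketch of the "domain experts write rules, programmers write the generator" split. Everything here is invented for illustration (the rule names, the syntax, and the generator); it is nothing like Intentional's actual tooling, just the shape of the idea:

```python
# A toy "banking rules" mini-language: a domain expert writes declarative
# rules in a tiny tailored syntax; a small generator (the programmers' job)
# turns each rule into an executable function. All names and rules are
# invented for illustration.

RULES = """
interest: balance * rate
late_fee: 25 if days_overdue > 30 else 0
"""

def compile_rules(source):
    """Generate one callable per rule line of the form 'name: expression'."""
    funcs = {}
    for line in source.strip().splitlines():
        name, expr = (part.strip() for part in line.split(":", 1))
        # Compile the domain expression once; bind its free names at call time.
        code = compile(expr, f"<rule {name}>", "eval")
        funcs[name] = lambda code=code, **terms: eval(code, {}, terms)
    return funcs

rules = compile_rules(RULES)
print(rules["interest"](balance=1000.0, rate=0.05))  # 50.0
print(rules["late_fee"](days_overdue=45))            # 25
```

The point of the split is that the banking expert only ever touches the `RULES` text, while the generator below it is maintained by programmers and can grow more sophisticated without the rules changing.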

So Simonyi's solution puts business folks on one side of the fence and programmers on the other, but they happily work together by each focusing on the parts they care about.

Martin Fowler is another guy I respect who has been advocating a similar idea of a "Language Workbench," where you use tools to easily create new languages. The philosophy is that general-purpose languages don't let you think close enough to your problem to work efficiently. Such a workbench supports a similar idea that is picking up momentum now, Domain Specific Modeling (DSM), which goes back at least to the 1960s. In DSM, you typically use a Domain Specific Language (DSL) to model your exact organization rather than a general programming language like C#.

Some products are getting away with having their own language. In college, I frequently used Mathematica, which has a custom language as the flagship core of the product. Its vendor even makes it a selling point that an extensive amount of Mathematica's own code is written in this special language because they think it's so good.

Frequently the programmers of complex programs have to do as Charles likes to do and "go meta." This allows you to create and add elements to a highly customized language. If you become skilled at this, you can continually create more powerful constructs in the language to get things done faster. This is essentially the concept of "metaprogramming." Giles Bowkett gave my favorite definition of it:

"Skilled programmers can write better programmers than they can hire."

That's meta-programming.
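Bowkett's quip can be shown in miniature: instead of hand-writing a near-identical function for every case, you write one function that writes the others. The example below is my own illustration (the field names and rules are made up), not from any of the sources quoted here:

```python
# Metaprogramming in miniature: rather than hiring someone to hand-write a
# validator for each field, we write code that writes the validators for us.
# The fields and predicates below are invented for illustration.

def make_validator(field, predicate, message):
    """Generate a validation function for one named field of a record."""
    def validate(record):
        if not predicate(record.get(field)):
            raise ValueError(f"{field}: {message}")
        return True
    validate.__name__ = f"validate_{field}"
    return validate

# One line per rule; the "programmer we wrote" produces the functions.
checks = [
    make_validator("age", lambda v: isinstance(v, int) and v >= 0,
                   "must be a non-negative integer"),
    make_validator("email", lambda v: isinstance(v, str) and "@" in v,
                   "must contain '@'"),
]

record = {"age": 30, "email": "ada@example.com"}
print(all(check(record) for check in checks))  # True
```

Each new rule is one declarative line, and the generated functions stay consistent by construction; that consistency is what the hand-written versions tend to lose.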

Building from the Ground Up

So what might software look like if you built it from the ground up using powerful meta-programming ideas? What if the 1970s Xerox PARC crew, which included Simonyi, were all together again working on new ideas? One of the closest things to this dream is the Viewpoints Research Institute, founded by Alan Kay, whose research group includes many former Xerox PARC researchers. I haven't been able to stop thinking about one of their projects, which started as an NSF proposal with a very interesting title: "Steps Toward The Reinvention of Programming: A Compact And Practical Model of Personal Computing As A Self-Exploratorium."

In a video describing the project, Alan makes some very intriguing observations. The first is that if you can get the system down to 20,000 lines of code for everything (the OS and applications, a "from the metal to the end-user" design), you're talking about the size of a 400-page book, which is well within a student's grasp. The system could be introspective, essentially a "museum on itself."

Many other popular "systems" start with a small, very adaptive "kernel"; a good example is the U.S. Constitution, which fits comfortably in a shirt pocket and has served as an engine keeping our country thriving for 200+ years. One comment he made that really stuck out was that modern software is typically designed by accretion, much like a pyramid: we add around the edges rather than attempting a radical top-to-bottom design like the Empire State Building, which was built in about a year with great engineering precision. We keep adding to systems rather than looking back at their core assumptions to see if they still meet our needs.

A key tenet of his discussion is that we've lost 15 years of programming progress because of bad architecture, which led to a lost factor of 1000 in efficiency. We're essentially drowning in all the code. A hundred million lines of code can't be studied or fundamentally improved.

To "reduce the amount of code needed to make systems by a factor of 100, 1000, 10,000, or more," you have to do what they call "big things in very simple ways." This leads to where I got the title of this series:

"This enormous reduction of scale and effort would constitute a 'Moore’s Law' leap in software expressiveness of at least 3 and more like 4 orders of magnitude. It would also illuminate programming and possible futures for programming. It might not be enough to reach all the way to a 'reinvention of programming', but it might take us far enough up the mountain to a new plateau that would allow the routes to the next qualitative change to be seen more clearly. This is the goal and mission of our project."

Wow! I couldn't help but be impressed by their ambition. After reading that statement, my mind dreamed for a few minutes about what might be possible if the expressiveness of programming languages and systems could double every two years, just as the transistor count on microprocessors has roughly doubled every two years, leading to much more powerful chips.

That was my optimistic side, but then my mind was flooded with objections and questions...

Intermission: Responding to Critics

One thing that Alan mentioned is that problems we think are quite big are really much smaller if we could just think of them mathematically. Being a person who enjoys mathematics, I like the thought of making programming a bit more mathematical. However, people sometimes interpret thinking mathematically as having overly terse syntax. Some languages, like APL, are notorious for being terse, and this has led quite a few people to hate them. Here's an example of a line from J, an APL derivative:

mysteryFunction=: +/ % #

I obscured the name to make it less obvious. Can you guess what it does?

It computes the average of a vector. To be fair, it probably seems obvious to those that use the language every day.

A less terse, but equally strange to outsiders, class of languages is the Lisp family (including Scheme). Here's an example of finding the length of a list:

(define (len xs)
  (if (null? xs)
      0
      (+ 1 (len (cdr xs)))))

If you know the language, you can easily see that this is define-ing a function called len that works on a list named xs. If xs is null, it returns 0; otherwise, it removes one element from the list, finds the length of that smaller list (cdr xs), and adds one (+ 1) to it.

See? It's not too bad.

I say this sort of tongue-in-cheek since it took me several weeks in my programming languages class to feel comfortable in the language. To an outsider coming from a BASIC- or C++-derived language, it sort of looks like magic.
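For readers coming from those more familiar syntaxes, the identical recursion in Python makes the structure plain:

```python
def length(xs):
    """Recursive list length, mirroring the Scheme version above:
    an empty list has length 0; otherwise it is 1 plus the length
    of the rest of the list."""
    if not xs:                     # (null? xs)
        return 0
    return 1 + length(xs[1:])      # (+ 1 (len (cdr xs)))

print(length([10, 20, 30]))  # 3
```

Once you see that `cdr` is just "the rest of the list," the parentheses stop being magic and start being punctuation.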

Speaking on how Ruby's more readable syntax might be used to express Lisp's power in a clearer way, Giles Bowkett said:

You've got on the one hand incredible power wrapped up in parenthesis that no one else can read -- that makes you a wizard; but if you take that same incredible power and you put it in terms that anybody can use it, [then] you're not making yourself a wizard, you're making everybody who looks at your code, reads your code, uses your code, everybody becomes more effective. Instead of hoarding this magical treasure and wrapping it in layers of mystery so that only you can use it, you're making it a public resource.

That sounds interesting, but I don't think Lisp's syntax is really that bad or inaccessible. It's a bit different from what I grew up with, but I'm pretty sure that if it had been taught to me as my first language, it would feel even more natural. Surely if I can be taught that "x = x + 1" is not an insane oddity, then I can probably be taught anything.

But Lisp is more than syntax; it's arguably its own culture. One of its biggest supporters is Paul Graham, who implies that you can get away with not using Lisp for "glue" programs, like typical business applications, but if you want to write "sophisticated programs to solve hard problems," like an airline reservation system, then you'll end up spending more time and will write far more code using anything else but Lisp (or something very close to it).

I have a lot of respect for Graham and tend to agree with the core idea behind it. Tools like Java and C# are good, but we could do much better than how they're typically used. Furthermore, there tends to be resistance to even thinking about using any language these days besides something like C#, Java, PHP, or Python.

Brief Aside: What about the "Wasabi" Incident?

As we saw in Part 1, Charles' Intentional Software was founded on the idea that companies should develop their own language to solve their particular business problem, with the understanding that you'd use Simonyi's company's tools to do it. But what if you took this language-oriented programming idea and went about it in a seemingly less radical way? What if you just extended an existing language to meet your company's specific needs?

That's exactly what Joel Spolsky's Fog Creek did for their FogBugz product. They had written the application in ASP and VBScript targeting Windows-based servers, and were then faced with the strategic opportunity to also make it work on Linux using PHP. Instead of rewriting their code, Joel made the interesting decision to commission an in-house compiler that would emit VBScript and PHP from the original source code. Once the compiler was working, they even made enhancements to the VBScript source language. They called this superset of VBScript 'Wasabi.' Joel describes it as:

".. a very advanced, functional-programming dialect of Basic with closures and lambdas and Rails-like active records that can be compiled down to VBScript, JavaScript, PHP4 or PHP5. Wasabi is a private, in-house language written by one of our best developers that is optimized specifically for developing FogBugz; the Wasabi compiler itself is written in C#."
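The "one source, multiple targets" idea at Wasabi's core can be sketched in a few lines. This toy is nothing like the real Wasabi compiler (which was a full C# compiler for a rich language); it only shows the shape: parse a statement once, then run it through a separate emitter per target:

```python
# A toy one-source, two-target translator: parse a tiny assignment
# statement, then emit it in two different target syntaxes. Purely
# illustrative; real transpilers build a full syntax tree first.
import re

def parse_assignment(line):
    """Parse 'name = expression' into its two parts."""
    match = re.fullmatch(r"\s*(\w+)\s*=\s*(.+?)\s*", line)
    if not match:
        raise SyntaxError(f"not an assignment: {line!r}")
    return match.group(1), match.group(2)

def emit_vbscript(name, expr):
    return f"Dim {name}\n{name} = {expr}"

def emit_php(name, expr):
    # PHP variables need a '$' sigil; naively prefix every identifier.
    # (A real emitter would skip string literals, function names, etc.)
    php_expr = re.sub(r"\b([A-Za-z_]\w*)\b", r"$\1", expr)
    return f"${name} = {php_expr};"

name, expr = parse_assignment("total = price * quantity")
print(emit_vbscript(name, expr))
print(emit_php(name, expr))  # $total = $price * $quantity;
```

Once the parse step exists, adding a third target is just one more emitter function, which is why Fog Creek could later add JavaScript output without touching their source language.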

The mere thought of writing your own language, even as a superset of a standard language like VBScript, shocked quite a few people. Many thought (and still think) that Joel had done something really dumb. It was an amusing time to read all the responses. One of the most vocal critics was Jeff Atwood:

"However, *instead* of upgrading to ASP.NET, they decided to.. write their own proprietary language that compiles down to VBScript/PHP/JavaScript.

I just can't get behind that. It's a one-way ticket to crazytown. When was the last time you encountered a problem and thought to yourself, 'Mmm. What I really need here is MY VERY OWN PROGRAMMING LANGUAGE.'"


"My argument against Wasabi is this: when a developer decides the right solution to their problem is a new language, you don't have a solution. You have an even bigger problem."

So even extending a language sure rocks the boat. But I think it's not as big a deal as we make it. Atwood later toned it down a bit, saying that "nobody cares what your code looks like." Regardless, it's a sentiment felt by a lot of people.

Why do they feel this way?

I don't fault Atwood; he's certainly entitled to his own views. Given the modern state of tools for supporting custom languages, it does seem a bit "crazy" to go down that path. You certainly have to count the cost involved in creating a language or tool that's too "powerful." Reg Braithwaite warns us about this:

"What is the Turing tar-pit? It’s the place where a program has become so powerful, so general, that the effort to configure it to solve a specific problem matches or exceeds the effort to start over and write a program that solves the specific problem."

I think one of the fundamental barriers is that our industry doesn't have more widespread knowledge of Martin Fowler's (among many others') idea of a "Language Workbench," which can significantly ease the process.

Some companies are getting away with custom compilers that are similar to Wasabi. The Google Web Toolkit has a compiler that converts code from Java to JavaScript. Maybe it didn't get the shocked response that Wasabi did because it didn't extend its source language, Java. I think that since many people mistakenly believe that Java and JavaScript are similar, it didn't seem that radical.

The Real Reason 'Very High Level Languages' Haven't Taken Off?

I hate to admit it, but one of the lures to writing in traditional languages is that the brain power per line of code is relatively low enough that it doesn't hurt too much to think in. When I was writing an ML compiler in ML, I had to think much harder for every line of code. It wouldn't be uncommon for me to spend 30 minutes on a single line. However, one of the benefits was that once I had written that line, it usually made sense. An advantage to this approach is that I wasn't too disturbed that Standard ML didn't have a debugger at the time. It didn't really need one.

I think a part of the problem is that traditional languages make us feel that we're productive because we can generate lots of code quickly in them. But is this a good thing? Erik Meijer said it well:

"Do you want to get [your program] done, or do you want to get it right? ... We confuse productivity with getting it done quickly. We don't [account for] all the hours of debugging. We're doing the wrong accounting."

A truly productive language will help us write code that is right the first time without the need for a lot of debugging. When you do honest accounting of the whole development cycle, I can then agree with Fred Brooks and others that the number of lines of code written per day is approximately equivalent for a good programmer regardless of the language used. The real trick is how much you do in those lines.

Enough criticism. If the "Moore's Law" language of the future isn't necessarily APL, J, or excessive use of parentheses, then what on Earth might it look like?

Let's take a peek.

UPDATE: Part three is now available.


Peter Christensen said...

Jeff, you've saved me the trouble of writing something that has been on my mind for a while. I can't wait for Part 3!

Juha-Pekka Tolvanen said...

Nice post. You might be interested to see industry cases on creating and using domain-specific (modeling) languages at http://www.dsmforum.org/cases.html

Jeff Moser said...

Peter: Thanks for the encouragement. Hopefully part 3 lives up to your expectations :)

I checked out your blog and saw you mention the Chicago Lisp meeting. That definitely sounds interesting. It'd be neat to have something similar in Indianapolis.

Juha-Pekka: Thanks for the link! I checked out the cases. Do you see DSM mostly being used for workflow type scenarios? It seemed like MetaEdit+ is the popular tool to use. Have you seen ones built with Simonyi's tools?

Peter Christensen said...

Jeff, you should get in touch with Paul Beel (paul dot beel at gmail dot com) and check out his blog at http://novacode.wordpress.com/ . He has started a group in Terre Haute called the Indiana Programming Meetup. Their first meeting was about Common Lisp (http://novacode.wordpress.com/2008/03/26/meeting-main-topic-common-lisp/ ) and they had the guys from Paragent come and show off their system and CUSP, the CL plugin for Eclipse. It's not exclusively Lisp (next meeting is by a PHP guy) but might be up your alley.


Juha-Pekka Tolvanen said...

I see that one sort of domain-specific model is based on flow types, like user navigation and interaction, task workflow, or various services, like the call processing services described in detail in: http://www.metacase.com/support/45/manuals/Call%20Processing%20Language%20Example.pdf

Another sort of model describes static structures, like automotive architectures in AUTOSAR (http://www.metacase.com/images/SoftwareComponentDiagram.png) or insurance products (http://www.metacase.com/images/insurance.gif).

If behavior is a relevant part of the domain and there is a need for more comprehensive code generation, then various state machines are preferable models of computation.

Often in practice there is a need to combine multiple languages. Their concrete syntax does not need to be graphical but can also be based on matrix and table presentations along with some text. These possibilities naturally depend on the tool. A few years ago we wrote a paper on various language structures and how companies tend to build domain-specific languages, analyzing 20+ industry cases. If interested, you may find the paper here: http://users.jyu.fi/~jpt/TolKelSPLC2005.pdf

When it comes to Intentional Software I see several similarities, but I have only seen one of their cases presented at OOP last January (http://www.metacase.com/blogs/jpt/blogView?showComments=true&entry=3378978081). That talk mostly gave the rationale for raising the level of abstraction and creating domain-specific languages rather than discussing how the tool works in practice. Issues like how the languages are defined and maintained, or how they evolve along with the models already created, were unfortunately not addressed.

Jeff Moser said...

Peter Christensen: Paul Beel's group sounds interesting and I checked out his blog. Unfortunately, the 100 minute drive would make it a bit too far to attend regularly.

Thanks for the contact info though, never know when our paths might meet.

Juha-Pekka Tolvanen: Wow! Impressive experience, especially with the paper. I agree with the points you made.

I have a follow-up question though: in your opinion, based on all the work you've done, what do you think is holding these tools and ideas back from becoming more mainstream? Are they too niche, requiring too much training or customization?

Juha-Pekka Tolvanen said...

When it comes to adoption, I see no single reason holding domain-specific (modeling) languages back from becoming mainstream. One obvious reason why we don't hear much about success stories of using domain-specific languages and models is that they are often proprietary and contained within one company only. For instance, over 90% of our customers don't allow us to mention details regarding their DSM activities. And I understand this very well -- if you are 10x faster than your competitors, why would you reveal your advantage?

Jeff Moser said...

Juha-Pekka Tolvanen: Interesting thought -- I hadn't considered that companies might be actively hiding the fact that they're using these concepts for fear they'd lose an advantage.