Saturday, April 26, 2008

My "Better Know a Framework" Talk at IndyCodeCamp

The podcast ".NET Rocks!" has a segment called "Better Know A Framework" at the start of every show. The point of the short segment is simply to raise awareness of perhaps lesser-known areas of the .NET Framework. I've found that it's a great way to learn about things that I simply haven't looked at before.

When Aaron Lerch asked for session proposals for the IndyCodeCamp that he and some others were organizing, I thought it'd be fun to have a "Better Know a Framework" session that had much of the same feel as the segment on the show.

In order to prepare for the talk, I went back through every segment and made a few notes on what parts of the framework were covered, as well as documentation links for more information. In case that work might be helpful to anyone else, I'm including it as an "appendix" to this post.


Thanks to Mike Hall for the photo

The actual talk went pretty well. There were around 30 people at the session (mine ran concurrently with 3 other sessions). One of the highlights was the introduction, where the hosts of .NET Rocks! introduced me (via a recording). Another highlight was being able to throw out flip-flops with my company's logo on them to the people in the audience who were first to answer some of the questions I asked.

In trying to keep with the code camp manifesto, I intentionally had no slides. The presentation was basically me stepping through this C# solution.

I created it to show how to use some lesser-known classes and namespaces. I also briefly mentioned the "Turkey Test" and its implications for code.
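If you haven't run into the "Turkey Test" before, the idea is that culture-sensitive defaults can silently change what your code does. Here's a minimal C# sketch of my own (not taken from the talk's solution) showing the kind of bug it catches:

using System;
using System.Globalization;
using System.Threading;

class TurkeyTestDemo
{
    static void Main()
    {
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

        // In Turkish, ',' is the decimal separator and '.' groups digits,
        // so with the default parse settings "1.5" comes back as 15.
        Console.WriteLine(double.Parse("1.5"));

        // Casing is culture-sensitive too: the Turkish uppercase of 'i'
        // is 'İ' (dotted capital I), so this comparison fails.
        Console.WriteLine("file".ToUpper() == "FILE");           // False
        Console.WriteLine("file".ToUpperInvariant() == "FILE");  // True

        // The fix: say which culture you actually mean.
        Console.WriteLine(double.Parse("1.5", CultureInfo.InvariantCulture)); // parses as 1.5
    }
}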

Feel free to download the solution and run through it. If you couldn't make it to the talk, you can ask questions via the comments. If you were able to attend the event in person, please fill out the evaluation form online or leave comments here as well.

Thanks to Aaron Lerch and many others who helped organize Indianapolis's first code camp! It was a privilege to get the chance to speak there.

Appendix

Here's a very brief recap of every "Better Know a Framework" segment since the show started. If you'd like more information on a topic, I included a link to the show where you can listen to the segment, as well as a link to the MSDN documentation for what was covered. I put an asterisk (*) by the show number if I covered that topic in my demo source file.


Wednesday, April 16, 2008

Towards Moore's Law Software: Part 3 of 3

(Note: This is the third and final part of a series. Part one appeared Monday and part two appeared yesterday.)

First "STEPS"

Let's say we want to build the TCP/IP stack of an operating system. A traditional implementation might take 10,000 lines of code. What if you rethought the design from the ground up? What if you could make the IP packet handling code look almost identical to the RFC 791 diagram which defines IP? That's exactly what the Viewpoints team did. This is real code in their system:

and the TCP stack is similar to the RFC 793 diagram:

That's right; the ASCII art diagram is the code. How did they do it? Well, when you have a powerful "meta-meta language-language" like their IS, you can define languages in a form that reads like a traditional Backus-Naur Form (BNF) grammar representation:

With packet parsing out of the way, the implementation of their TCP algorithm is a little dense but still readable code that handles the SYN, ACK, and other responses:

With this as our basis, writing a service like the Daytime Service is quite simple:
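For a sense of scale, the Daytime protocol (RFC 867) just says: accept a TCP connection on port 13, write the current date and time as a line of text, and close. Here's roughly what that looks like in C# using the BCL's sockets -- my sketch for comparison, not the Viewpoints code:

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class DaytimeService
{
    static void Main()
    {
        // RFC 867: listen on TCP port 13 and reply with the current time.
        TcpListener listener = new TcpListener(IPAddress.Any, 13);
        listener.Start();

        while (true)
        {
            using (TcpClient client = listener.AcceptTcpClient())
            {
                byte[] reply = Encoding.ASCII.GetBytes(DateTime.Now.ToString("r") + "\r\n");
                client.GetStream().Write(reply, 0, reply.Length);
            }   // closing the connection ends the exchange
        }
    }
}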

All told, the entire stack is comfortably under 200 lines of code, without using any tricks like code-generation wizards. The graphics subsystem is similarly clever: it basically defines everything as a polygon (including fonts, windows, etc.) and comes in at well under 500 lines of code. They have to be this compact in order to meet their goal of an entire system in under 20,000 lines.

Another interesting metaphor in the project is the use of "massively parallel objects" or what they refer to as "particle fields," as a fundamental part of the system:

"Even less in the mainstream, the “particle/field” idea [PF] has been found more in specialty areas (such as finite-automata and FEM, swarm programming in biology, etc.), than as a general system building tool (although it is the center of systems such as Sketchpad and TEX, and is even found in an operating system such as MUSE). Traditional object-oriented design has tended to overbalance its attention on the objects and to give too rudimentary attention to message-passing. If the center of attention were to be shifted to messaging of all kinds, then the notion of “fields” immediately suggests itself as a more general way to think about inter-object relationships (the previous example of “ants distributing and sensing pheromones” is a good metaphor of this style)."

For example, this idea can be applied to text formatting, where you essentially treat every letter as its own object. Programming the solution then becomes much easier: each letter follows simple rules that only consider its immediate neighboring letters, and an extremely efficient message-passing system makes that simple code perform well. Here's a sample demonstration from their NSF proposal:

This reminds me of ideas from "emergence," the theory that explains how a flock of birds or an ant colony can do complex things even though each individual in the system thinks simple thoughts. An individual bird is only thinking in terms of simple rules like "try to follow the guy in front of you" and "get out of the way if something dangerous is nearby." These two rules alone can lead to the fantastically complicated formations that we see in the sky.

The metaphor of massively parallel objects with efficient messaging leads to algorithms that are simpler in concept because you can focus on the behavior of one little object rather than worry about how the whole system works.
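To make the metaphor concrete, here's a toy C# sketch of my own (nothing like their actual system) of the letters-as-objects idea: each letter decides its position by looking only at its immediate left neighbor, and line wrapping "emerges" from that purely local rule.

using System;

// Each Letter is its own tiny "particle" that senses only its immediate
// left neighbor; no central layout engine assigns positions.
class Letter
{
    public char Glyph;
    public Letter Left;       // the only other object this one knows about
    public int Column, Row;

    public void Place(int lineWidth)
    {
        Column = (Left == null) ? 0 : Left.Column + 1;
        Row = (Left == null) ? 0 : Left.Row;
        if (Column >= lineWidth)   // wrapping is a purely local decision
        {
            Column = 0;
            Row++;
        }
    }
}

class ParticleTextDemo
{
    static void Main()
    {
        Letter previous = null;
        foreach (char c in "massively parallel objects")
        {
            Letter letter = new Letter { Glyph = c, Left = previous };
            letter.Place(10);   // each letter places itself
            previous = letter;
        }
        Console.WriteLine("The text flowed onto " + (previous.Row + 1) + " lines.");
    }
}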

"I want it now!"

With the Viewpoints team only in the second year of a five-year project, we can get a feel for where the future of software development is going, but we can't realistically put it into production quite yet. What are our options, then?

Ted Neward likes to talk about how the next five years will be about programming languages attempting to bridge the "huge disconnect between the practitioners, the guys who get stuff done, and the academics, who think about how we get things done." He says this disconnect exists because "the academics and the practitioners don't talk to each other." I really think that over the next five years we'll see significant strides toward improving the situation. If you want to jump ahead of the curve, I think it's worthwhile to start imagining a dream language that would help you cut along the "natural joints" of a problem you work on. Can you think of an expressive language, like the ASCII art for parsing TCP/IP headers, that is targeted at your specific problem?

Another interesting observation Ted made was that:

"... programming languages, much like human languages, are expressions not just of concepts within them, but also the environment in which they were born."

That is, no language is perfect. Each language has a culture that gave birth to it, which was usually focused on a particular type of problem. I often find that I put myself in too much of a language box and, unfortunately, am happy in that box. This led me to ask Alan Kay about the best way to cope with a design problem I had. The power to think in a highly tailored, but not necessarily general, programming language might end up being the best solution to the problems we face.

If you go down this path, there are many tools and some good articles available now to help you get started writing your own custom language. If you want to be a bit more conservative, you can write an internal domain-specific language inside a host language like Ruby or C#. If you do go the custom route, you'll benefit from doing it now rather than 20 years ago because there are huge frameworks at your disposal, like Microsoft's Dynamic Language Runtime (DLR), that handle most of the "goo" involved in getting your own language up and running. You can leverage the many libraries built on .NET or the JVM so that you don't have to worry as much about supporting concerns like database access and XML handling if those aren't your primary concern. This is in contrast to a decade ago, when you would have had to build all of these supporting libraries just to get people to even consider your language seriously. Even a popular "new" language like Ruby took about 10-12 years to accumulate enough libraries to make it feasible for development. By standing upon the shoulders of frameworks, you can quickly build something that is production worthy.
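To give a taste of the internal-DSL option, here's a minimal C# 3.0 sketch of a fluent interface for the kind of pension-rule domain mentioned in part two. Every name in it (PensionPlan, VestAfter, and so on) is invented purely for illustration:

using System;

// A hypothetical fluent "pension rules" mini-language embedded in C#.
// The point is that the call chain reads close to the rule it encodes.
class PensionPlan
{
    private int vestingYears;
    private decimal matchRate;
    private decimal matchCap;

    public static PensionPlan Define() { return new PensionPlan(); }

    public PensionPlan VestAfter(int years)
    {
        vestingYears = years;
        return this;          // returning 'this' is what enables the chaining
    }

    public PensionPlan MatchContributions(decimal rate, decimal upToPercentOfSalary)
    {
        matchRate = rate;
        matchCap = upToPercentOfSalary;
        return this;
    }

    public override string ToString()
    {
        return string.Format("Vests after {0} years; employer matches {1:P0} up to {2:P0} of salary.",
                             vestingYears, matchRate, matchCap);
    }
}

class DslDemo
{
    static void Main()
    {
        // Reads almost like the business rule itself:
        PensionPlan plan = PensionPlan.Define().VestAfter(5).MatchContributions(0.5m, 0.06m);
        Console.WriteLine(plan);
    }
}

The design choice is the classic one for fluent interfaces: every method returns the object itself so that statements compose left to right, the way a domain expert would state the rule.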

You can even use Phoenix, Microsoft's new back-end optimizing compiler, for your language. That way, you don't have to worry about emitting machine code or IL: you just parse your language into a simple intermediate representation, and your generated code will be produced with the same efficiency as the compiler being used to build Windows 7. To get you started, the Phoenix SDK will include a LISP compiler as a sample app.

Final Thoughts

"The future is already here -- it's not just not evenly distributed" -- William Gibson

The widespread adoption of virtual machines like .NET's Common Language Runtime (CLR) is increasingly making it an option to use custom languages to solve particular problems, because your code can be easily used by more mainstream languages like C#. I think this is one of the critical reasons why Microsoft decided to include a functional language, F#, in an upcoming version of Visual Studio. Similarly, I enjoyed watching all the different ideas at the lang.net symposium because they showed the viability of the CLR for many different languages that would also integrate well with mainstream code.

I have to be honest and admit that I don't envision myself going off and writing my own language right now, but at least I'll be looking for places where a custom language might be a really good fit. I think I need to apply some meta-level thinking, perhaps by writing some of my Project Euler solutions in languages that make this easier, like Scala or F#, even though C# is easy for me to use.

This journey towards a "Moore's Law" increase in expressiveness has been a bit long, but I'm starting to see some practical benefits:

  1. I'm already feeling the expressive differences between C# 2 and C# 3. I'm starting to build on LINQ's good ideas in my production code by using ideas from C++'s Standard Template Library (STL) to make the simple ideas in the code clear and to get rid of some of the scaffolding "goo" that exists (see the sketch just after this list). Notable influences on my thinking in the .NET arena have been Wintellect's Power Collections and the NSTL project. Another variant on this idea is to see more day-to-day challenges as graph problems and use a library like Peli's QuickGraph to solve them.
  2. The "massively parallel objects" metaphor made possible by highly efficient message passing has changed the way that I have thought about some of my hobby programs. Instead of focusing on rules for how a single class can orchestrate hundreds or thousands of objects, I am happily thinking of the object as a bird in a flock. The good news is that this metaphor works well and the same overall macro goal is obtained. I think this is just the start to curbing the problem Alan mentioned of "[there] has been an enormous over-focus on objects and an under-focus on messaging (most so-called object oriented languages don’t really use the looser coupling of messaging, but instead use the much tighter “gear meshing” of procedure calls – this hurts scalability and interoperability)."
  3. Reflecting on the Viewpoints' IS meta-meta language-language and what can be built with it has really allowed me to see more of the beauty that real computer science has to offer. Even if I can't put those ideas immediately into production code, I could start down that path with something smaller, like applying the ideas of code generation. The latter is one of the ideas that essentially made Rails so popular: it's the code that you don't have to write yourself that makes it interesting.
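As an example of point 1, here are two ways to count word frequencies in C#. Both work; the second states the intent (group, then count) directly and drops the scaffolding:

using System;
using System.Collections.Generic;
using System.Linq;

class WordFrequencyDemo
{
    static void Main()
    {
        string[] words = "the quick brown fox jumps over the lazy dog the end".Split(' ');

        // The C# 2 way: explicit dictionary bookkeeping.
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            int soFar;
            counts.TryGetValue(word, out soFar);
            counts[word] = soFar + 1;
        }

        // The C# 3 way: say what you mean and let LINQ do the plumbing.
        var frequencies = from word in words
                          group word by word into g
                          orderby g.Count() descending
                          select new { Word = g.Key, Count = g.Count() };

        foreach (var f in frequencies)
            Console.WriteLine(f.Word + ": " + f.Count);
    }
}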

The bad news about going down the path of meta-level thinking and writing your own custom languages is that the tools are still relatively young, and it's a little more work now than it will be ten years from now. The good news is that if you spend the time at least exploring what smart guys like Alan and Charles are doing now, you'll probably have a comfortable advantage over your competitors when these ideas become mainstream. If it took object-oriented programming ideas a good 25+ years to become popular, the ideas behind language-oriented programming and things like massively parallel objects will probably take at least 10 years to hit wide adoption.

Real "Moore's Law" software will have what Gordon Moore noticed about the effects of his own "law" on hardware: it told the industry that they "[have] to move that fast or they fall behind." I think some of the ideas mentioned here might help get towards that philosophy. However, I'm just a secondhand observer. The best place to get started is to look at the original sources, or at the very least, the Viewpoints project overview video.

I'm interested in your thoughts. What do you think might lead to a "Moore's Law" increase in software expressiveness? Do you think it's even possible?

Tuesday, April 15, 2008

Towards Moore's Law Software: Part 2 of 3

(Note: This is part two in a three part series. Part one appeared yesterday.)

The end result of Charles Simonyi's abrupt departure from Microsoft after more than 20 years is that he started his own company, Intentional Software. As a parting gift for, uhm, you know, leading the development of Word and Excel, Microsoft let him use the patent for Intentional Programming that he had created at Microsoft Research. The only minor stipulation was that he'd have to write his own code for it from scratch.

That really hasn't stopped him or his company. Recently, he discussed his company's "Intentional Domain Workbench" product that is built around these concepts. One of his clients is even using it to dramatically decrease the time it takes to develop software for pension plans and all of their associated rules. The working concept is that domain experts work in an environment and language that is highly tailored for them, while the traditional programmers work on developing a generator that turns that domain-tailored code into real executable code.

For example, an expert in banking will naturally think in terms of customers, accounts, credits, debits, loans, debt, and interest (among other ideas) and will not want to be bothered with anything resembling Java or XML. Having a highly customized tool allows the domain model and the programming implementation to be kept separate, rather than forcing programmers to "mentally unweave" the code into its parts, reason about it, and then "reweave" the solution. It's the "repeated unweaving and reweaving... burden on the programmers that introduces many programming errors into the software and can increase costs disproportionate to the size of the problem."

So Simonyi's solution is that business folks are on one side of the fence and programmers are on the other, but they happily work together by focusing on the parts they each care about.

Martin Fowler is another guy I respect who has been advocating a similar idea, a "Language Workbench," where you use tools to easily create new languages. The philosophy is that general-purpose languages don't let you think close enough to your problem to do things efficiently. Such a workbench helps with a related idea that is picking up momentum now, Domain-Specific Modeling (DSM), which goes back at least to the 1960s. In DSM, you typically use a Domain-Specific Language (DSL) to model your exact organization rather than use a general programming language like C#.

Some products are getting away with having their own language. In college, I frequently used Mathematica, which has a custom language as the flagship core of the product. Mathematica's vendor even makes it a selling point that an extensive amount of Mathematica's code is written in this special language because they think it's so good.

Frequently the programmers of complex programs have to do as Charles likes to do and "go meta." This allows you to create and add elements to a highly customized language. If you become skilled at this, you can continually create more powerful constructs in the language to get things done faster. This is essentially the concept of "metaprogramming." Giles Bowkett gave my favorite definition of it:

"Skilled programmers can write better programmers than they can hire."

That's meta-programming.
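In .NET, one long-available way to "go meta" is System.CodeDom: a program can write, compile, and invoke another program at runtime. A minimal sketch (my example, not Giles's):

using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

class MetaDemo
{
    static void Main()
    {
        // Step 1: this program writes another program as a string.
        string source =
            "public class Generated" +
            "{" +
            "    public static int Double(int x) { return x * 2; }" +
            "}";

        // Step 2: compile it in memory...
        CSharpCodeProvider provider = new CSharpCodeProvider();
        CompilerParameters options = new CompilerParameters();
        options.GenerateInMemory = true;
        CompilerResults results = provider.CompileAssemblyFromSource(options, source);

        // Step 3: ...and run the code we just wrote.
        Type generated = results.CompiledAssembly.GetType("Generated");
        MethodInfo doubler = generated.GetMethod("Double");
        Console.WriteLine(doubler.Invoke(null, new object[] { 21 }));  // 42
    }
}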

Building from the Ground Up

So what might software look like if you built it from the ground up using powerful meta-programming ideas? What if the 1970s Xerox PARC crew, which included Simonyi, were all together again working on new ideas? One of the closest things to this dream is the Viewpoints Research Institute, founded by Alan Kay. Its research group includes many former Xerox PARC researchers. I haven't been able to stop thinking about one of their projects, which started out as an NSF proposal with a very interesting title: "Steps Toward The Reinvention of Programming: A Compact And Practical Model of Personal Computing As A Self-Exploratorium."

In a video describing the project, Alan makes some very intriguing observations. The first is that if you can get the system down to 20,000 lines of code for everything (that is, the OS and applications; a “from the metal to the end-user” design), you're talking about the size of a 400-page book, which is well within the grasp of a student. The system could be introspective and essentially a "museum on itself."

Many other popular "systems" start with a small, very adaptive "kernel"; a good example is the U.S. Constitution, which fits comfortably in a shirt pocket and has served as an engine to keep our country thriving for 200+ years. One interesting comment he made that really stuck out was that modern software is typically designed by accretion, much like a pyramid: we design by adding around the edges rather than being willing to do a radical top-to-bottom design like the Empire State Building, which was built in about a year with great engineering precision. We keep adding to systems rather than looking back at their core assumptions to see if they're still meeting our needs.

A key tenet of his discussion is that we've lost 15 years of programming progress because of bad architecture, which has cost us a factor of 1000 in efficiency. We're essentially drowning in all the code: a hundred million lines of code can't be studied or fundamentally improved.

To "reduce the amount of code needed to make systems by a factor of 100, 1000, 10,000, or more," you have to do what they call "big things in very simple ways." This leads to where I got the title of this series:

"This enormous reduction of scale and effort would constitute a 'Moore’s Law' leap in software expressiveness of at least 3 and more like 4 orders of magnitude. It would also illuminate programming and possible futures for programming. It might not be enough to reach all the way to a 'reinvention of programming', but it might take us far enough up the mountain to a new plateau that would allow the routes to the next qualitative change to be seen more clearly. This is the goal and mission of our project."

Wow! I couldn't help but be impressed by their ambition. After reading that statement, my mind began to dream for a few minutes about what might be possible if the expressiveness of programming languages and systems doubled every two years, just as the transistor count on microprocessors has roughly doubled every two years, leading to much more powerful chips. (The compounding is the point: ten such doublings amount to a factor of 2^10, or about 1000, which is the kind of three-to-four orders of magnitude leap they describe.)

That was my optimistic side, but then my mind was flooded with some objections and questions...

Intermission: Responding to Critics

One thing that Alan mentioned is that the problems we think are quite big are really much smaller if we could just think of them mathematically. Being a person who enjoys mathematics, I like the thought of making programming a bit more mathematical. However, people sometimes interpret thinking mathematically as having overly terse syntax. Some languages like APL are notorious for being terse, and this has led quite a few people to hate these languages. Here's an example of a line from J, an APL derivative:

mysteryFunction=: +/ % #

I obscured the name to make it less obvious. Can you guess what it does?

It computes the average of a vector: +/ sums the items, # counts them, and % divides the two. To be fair, it probably seems obvious to those who use the language every day.
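For comparison, here's the same function spelled out in C# (LINQ even ships this as Enumerable.Average):

using System.Collections.Generic;
using System.Linq;

static class VectorMath
{
    // J's "+/ % #" is a fork: sum (+/) divided by (%) the item count (#).
    public static double Average(IEnumerable<double> xs)
    {
        return xs.Sum() / xs.Count();
    }
}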

A less terse class of languages, though equally strange to outsiders, is the Lisp family (including Scheme). Here's an example of finding the length of a list:

(define (len xs)
  (if (null? xs)
      0
      (+ 1 (len (cdr xs)))))

If you know the language, you can easily see that this is define-ing a function called len that works on a list named xs. If xs is null, the length is 0; otherwise, take the rest of the list after the first element (cdr xs), find the length of that smaller list, and add one (+ 1) to it.

See? It's not too bad.

I say this sort of tongue-in-cheek since it took me several weeks in my programming languages class to feel comfortable in the language. As an outsider coming from a BASIC- or C++-derived language, it sort of looks like magic.

Speaking on how Ruby's more readable syntax might be used to express Lisp's power in a clearer way, Giles Bowkett said:

You've got on the one hand incredible power wrapped up in parenthesis that no one else can read -- that makes you a wizard; but if you take that same incredible power and you put it in terms that anybody can use it, [then] you're not making yourself a wizard, you're making everybody who looks at your code, reads your code, uses your code, everybody becomes more effective. Instead of hoarding this magical treasure and wrapping it in layers of mystery so that only you can use it, you're making it a public resource.

That sounds interesting, but I don't think that Lisp's syntax is really that bad or inaccessible. It's a bit different from what I grew up with, but I'm pretty sure that if it had been taught to me as my first language, it would probably feel even more natural. Surely if I can be taught that "x = x + 1" is not an insane oddity, then I can probably be taught anything.

But Lisp is more than syntax; it's arguably its own culture. One of its biggest supporters is Paul Graham, who implies that you can get away with not using Lisp for "glue" programs, like typical business applications, but that if you want to write "sophisticated programs to solve hard problems," like an airline reservation system, then you'll end up spending more time and writing far more code using anything but Lisp (or something very close to it).

I have a lot of respect for Graham and tend to agree with his core idea. Tools like Java and C# are good, but we could do much better than how they're typically used. Furthermore, there tends to be resistance these days to even thinking about using any other language besides something like C#, Java, PHP, or Python.

Brief Aside: What about the "Wasabi" Incident?

As we saw in Part 1, Charles' Intentional Software was founded on the idea that companies should develop their own language to solve their particular business problem. That statement is given in the context that you will use Simonyi's company's tools to do it. But what if you took this language-oriented programming idea and went about it in a seemingly less radical way? What if you just extended an existing language to meet your company's specific needs?

That's exactly what Joel Spolsky's company, Fog Creek, did for their FogBugz product. They had written the application in ASP and VBScript targeting Windows-based servers and were then faced with the strategic opportunity to also make it work on Linux using PHP. Instead of rewriting their code, Joel made the interesting decision to commission the development of an in-house compiler that would emit VBScript and PHP from the original source code. Once the compiler was working, they even made enhancements to the VBScript source language. They called this superset of VBScript "Wasabi." Joel describes it as:

".. a very advanced, functional-programming dialect of Basic with closures and lambdas and Rails-like active records that can be compiled down to VBScript, JavaScript, PHP4 or PHP5. Wasabi is a private, in-house language written by one of our best developers that is optimized specifically for developing FogBugz; the Wasabi compiler itself is written in C#."

The mere thought of writing your own language, even as a superset of a standard language like VBScript, shocked quite a few people. Many thought (and still think) that Joel had done something really dumb. It was an amusing time to read all the responses. One of the most vocal critics was Jeff Atwood:

"However, *instead* of upgrading to ASP.NET, they decided to.. write their own proprietary language that compiles down to VBScript/PHP/JavaScript.

I just can't get behind that. It's a one-way ticket to crazytown. When was the last time you encountered a problem and thought to yourself, 'Mmm. What I really need here is MY VERY OWN PROGRAMMING LANGUAGE.'"

and

"My argument against Wasabi is this: when a developer decides the right solution to their problem is a new language, you don't have a solution. You have an even bigger problem."

So even extending a language sure does rock the boat. But I think it's not as big of a deal as we make it out to be. Atwood later toned it down a bit, saying that "nobody cares what your code looks like." Regardless, his original sentiment is one felt by a lot of people.

Why do they feel this way?

I don't fault Atwood; he's certainly entitled to his own views. Given the modern state of tools to support custom languages, it does seem a bit "crazy" to go down that path. You certainly have to count the cost involved in creating a language or tool that's too "powerful." Reg Braithwaite warns us about this:

"What is the Turing tar-pit? It’s the place where a program has become so powerful, so general, that the effort to configure it to solve a specific problem matches or exceeds the effort to start over and write a program that solves the specific problem.

I think one of the fundamental barriers is that our industry doesn't have widespread knowledge of Martin Fowler's (among many others) idea of a "Language Workbench," which can significantly ease the process.

Some companies are getting away with custom compilers similar to Wasabi. The Google Web Toolkit has a compiler that converts code from Java to JavaScript. Maybe it didn't get the shocked response that Wasabi did because it didn't extend its source language, Java. I also think that since many people mistakenly believe Java and JavaScript are similar, it didn't seem that radical.

The Real Reason 'Very High Level Languages' Haven't Taken Off?

I hate to admit it, but one of the lures of writing in traditional languages is that the brainpower required per line of code is low enough that it doesn't hurt too much to think in them. When I was writing an ML compiler in ML, I had to think much harder about every line of code. It wasn't uncommon for me to spend 30 minutes on a single line. However, one of the benefits was that once I had written that line, it usually made sense. An advantage of this approach was that I wasn't too disturbed that Standard ML didn't have a debugger at the time; it didn't really need one.

I think part of the problem is that traditional languages make us feel productive because we can generate lots of code quickly in them. But is this a good thing? Erik Meijer said it well:

"Do you want to get [your program] done, or do you want to get it right? ... We confuse productivity with getting it done quickly. We don't [account for] all the hours of debugging. We're doing the wrong accounting."

A truly productive language will help us write code that is right the first time, without the need for a lot of debugging. When you do honest accounting of the whole development cycle, I can agree with Fred Brooks and others that the number of lines of code written per day is roughly the same for a good programmer regardless of the language used. The real trick is how much you accomplish in those lines.

Enough criticism. If the "Moore's Law" language of the future isn't necessarily APL, J, or excessive use of parentheses, then what on Earth might it look like?

Let's take a peek.

UPDATE: Part three is now available.