Wednesday, June 10, 2009

The First Few Milliseconds of an HTTPS Connection

Convinced from spending hours reading rave reviews, Bob eagerly clicked "Proceed to Checkout" for his gallon of Tuscan Whole Milk and...

Whoa! What just happened?

In the 220 milliseconds that flew by, a lot of interesting stuff happened to make Firefox change the address bar color and put a lock in the lower right corner. With the help of Wireshark, my favorite network tool, and a slightly modified debug build of Firefox, we can see exactly what's going on.

By agreement of RFC 2818, Firefox knew that "https" meant it should connect to port 443 at Amazon.com:

Most people associate HTTPS with SSL (Secure Sockets Layer) which was created by Netscape in the mid 90's. This is becoming less true over time. As Netscape lost market share, SSL's maintenance moved to the Internet Engineering Task Force (IETF). The first post-Netscape version was re-branded as Transport Layer Security (TLS) 1.0 which was released in January 1999. It's rare to see true "SSL" traffic given that TLS has been around for 10 years.

Client Hello

TLS wraps all traffic in "records" of different types. We see that the first byte out of our browser is the hex byte 0x16 = 22 which means that this is a "handshake" record:

The next two bytes are 0x0301 which indicate that this is a version 3.1 record which shows that TLS 1.0 is essentially SSL 3.1.

The handshake record is broken out into several messages. The first is our "Client Hello" message (0x01). There are a few important things here:

  • Random:


    There are four bytes representing the current Coordinated Universal Time (UTC) in the Unix epoch format, which is the number of seconds since January 1, 1970. In this case, 0x4a2f07ca. It's followed by 28 random bytes. This will be used later on.
  • Session ID:


    Here it's empty/null. If we had previously connected to Amazon.com a few seconds ago, we could potentially resume a session and avoid a full handshake.
  • Cipher Suites:


    This is a list of all of the encryption algorithms that the browser is willing to support. Its top pick is a very strong choice of "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA" followed by 33 others that it's willing to accept. Don't worry if none of that makes sense. We'll find out later that Amazon doesn't pick our first choice anyway.
  • server_name extension:


    This is a way to tell Amazon.com that our browser is trying to reach https://www.amazon.com/. This is really convenient because our TLS handshake occurs long before any HTTP traffic. HTTP has a "Host" header which allows a cost-cutting Internet hosting companies to pile hundreds of websites onto a single IP address. SSL has traditionally required a different IP for each site, but this extension allows the server to respond with the appropriate certificate that the browser is looking for. If nothing else, this extension should allow an extra week or so of IPv4 addresses.

Server Hello

Amazon.com replies with a handshake record that's a massive two packets in size (2,551 bytes). The record has version bytes of 0x0301 meaning that Amazon agreed to our request to use TLS 1.0. This record has three sub-messages with some interesting data:

  1. "Server Hello" Message (2):

    • We get the server's four byte time Unix epoch time representation and its 28 random bytes that will be used later.
    • A 32 byte session ID in case we want to reconnect without a big handshake.
    • Of the 34 cipher suites we offered, Amazon picked "TLS_RSA_WITH_RC4_128_MD5" (0x0004). This means that it will use the "RSA" public key algorithm to verify certificate signatures and exchange keys, the RC4 encryption algorithm to encrypt data, and the MD5 hash function to verify the contents of messages. We'll cover these in depth later on. I personally think Amazon had selfish reasons for choosing this cipher suite. Of the ones on the list, it was the one that was least CPU intensive to use so that Amazon could crowd more connections onto each of their servers. A much less likely possibility is that they wanted to pay special tribute to Ron Rivest, who created all three of these algorithms.
  2. Certificate Message (11):


    • This message takes a whopping 2,464 bytes and is the certificate that the client can use to validate Amazon's. It isn't anything fancy. You can view most of its contents in your browser:


  3. "Server Hello Done" Message (14):


    • This is a zero byte message that tells the client that it's done with the "Hello" process and indicate that the server won't be asking the client for a certificate.

Checking out the Certificate

The browser has to figure out if it should trust Amazon.com. In this case, it's using certificates. It looks at Amazon's certificate and sees that the current time is between the "not before" time of August 26th, 2008 and before the "not after" time of August 27, 2009. It also checks to make sure that the certificate's public key is authorized for exchanging secret keys.

Why should we trust this certificate?

Attached to the certificate is a "signature" that is just a really long number in big-endian format:

Anyone could have sent us these bytes. Why should we trust this signature? To answer that question, need to make a speedy detour into mathemagic land:

Interlude: A Short, Not Too Scary, Guide to RSA

People sometimes wonder if math has any relevance to programming. Certificates give a very practical example of applied math. Amazon's certificate tells us that we should use the RSA algorithm to check the signature. RSA was created in the 1970's by MIT professors Ron *R*ivest, Adi *S*hamir, and Len *A*dleman who found a clever way to combine ideas spanning 2000 years of math development to come up with a beautifully simple algorithm:

You pick two huge prime numbers "p" and "q." Multiply them to get "n = p*q." Next, you pick a small public exponent "e" which is the "encryption exponent" and a specially crafted inverse of "e" called "d" as the "decryption exponent." You then make "n" and "e" public and keep "d" as secret as you possibly can and then throw away "p" and "q" (or keep them as secret as "d"). It's really important to remember that "e" and "d" are inverses of each other.

Now, if you have some message, you just need to interpret its bytes as a number "M." If you want to "encrypt" a message to create a "ciphertext", you'd calculate:

C ≡ Me (mod n)

This means that you multiply "M" by itself "e" times. The "mod n" means that we only take the remainder (e.g. "modulus") when dividing by "n." For example, 11 AM + 3 hours ≡ 2 (PM) (mod 12 hours). The other recipient knows "d" which allows them to invert the message to recover the original message:

Cd ≡ (Me)d ≡ Me*d ≡ M1 ≡ M (mod n)

Just as interesting is that the person with "d" can "sign" a document by raising a message "M" to the "d" exponent:

Md ≡ S (mod n)

This works because "signer" makes public "S", "M", "e", and "n." Anyone can verify the signature "S" with a simple calculation:

Se ≡ (Md)e ≡ Md*e ≡ Me*d ≡ M1 ≡ M (mod n)

Public key cryptography algorithms like RSA are often called "asymmetric" algorithms because the encryption key (in our case, "e") is not equal to (e.g. "symmetric" with) the decryption key "d". Reducing everything "mod n" makes it impossible to use the easy techniques that we're used to such as normal logarithms. The magic of RSA works because you can calculate/encrypt C ≡ Me (mod n) very quickly, but it is really hard to calculate/decrypt Cd ≡ M (mod n) without knowing "d." As we saw earlier, "d" is derived from factoring "n" back to its "p" and "q", which is a tough problem.

Verifying Signatures

The big thing to keep in mind with RSA in the real world is that all of the numbers involved have to be big to make things really hard to break using the best algorithms that we have. How big? Amazon.com's certificate was "signed" by "VeriSign Class 3 Secure Server CA." From the certificate, we see that this VeriSign modulus "n" is 2048 bits long which has this 617 digit base-10 representation:

1890572922 9464742433 9498401781 6528521078 8629616064 3051642608 4317020197 7241822595 6075980039 8371048211 4887504542 4200635317 0422636532 2091550579 0341204005 1169453804 7325464426 0479594122 4167270607 6731441028 3698615569 9947933786 3789783838 5829991518 1037601365 0218058341 7944190228 0926880299 3425241541 4300090021 1055372661 2125414429 9349272172 5333752665 6605550620 5558450610 3253786958 8361121949 2417723618 5199653627 5260212221 0847786057 9342235500 9443918198 9038906234 1550747726 8041766919 1500918876 1961879460 3091993360 6376719337 6644159792 1249204891 7079005527 7689341573 9395596650 5484628101 0469658502 1566385762 0175231997 6268718746 7514321

(Good luck trying to find "p" and "q" from this "n" - if you could, you could generate real-looking VeriSign certificates.)

VeriSign's "e" is 2^16 + 1 = 65537. Of course, they keep their "d" value secret, probably on a safe hardware device protected by retinal scanners and armed guards. Before signing, VeriSign checked the validity of the contents that Amazon.com claimed on its certificate using a real-world "handshake" that involved looking at several of their business documents. Once VeriSign was satisfied with the documents, they used the SHA-1 hash algorithm to get a hash value of the certificate that had all the claims. In Wireshark, the full certificate shows up as the "signedCertificate" part:

It's sort of a misnomer since it actually means that those are the bytes that the signer is going to sign and not the bytes that already include a signature.

The actual signature, "S", is simply called "encrypted" in Wireshark. If we raise "S" to VeriSign's public "e" exponent of 65537 and then take the remainder when divided by the modulus "n", we get this "decrypted" signature hex value:

0001FFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFF00302130 0906052B0E03021A 05000414C19F8786 871775C60EFE0542 E4C2167C830539DB

Per the PKCS #1 v1.5 standard, the first byte is "00" and it "ensures that the encryption block, [when] converted to an integer, is less than the modulus." The second byte of "01" indicates that this is a private key operation (e.g. it's a signature). This is followed by a lot of "FF" bytes that are used to pad the result to make sure that it's big enough. The padding is terminated by a "00" byte. It's followed by "30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14" which is the PKCS #1 v2.1 way of specifying the SHA-1 hash algorithm. The last 20 bytes are SHA-1 hash digest of the bytes in "signedCertificate."

Since the decrypted value is properly formatted and the last bytes are the same hash value that we can calculate independently, we can assume that whoever knew "VeriSign Class 3 Secure Server CA"'s private key "signed" it. We implicitly trust that only VeriSign knows the private key "d."

We can repeat the process to verify that "VeriSign Class 3 Secure Server CA"'s certificate was signed by VeriSign's "Class 3 Public Primary Certification Authority."

But why should we trust that? There are no more levels on the trust chain.

The top "VeriSign Class 3 Public Primary Certification Authority" was signed by itself. This certificate has been built into Mozilla products as an implicitly trusted good certificate since version 1.4 of certdata.txt in the Network Security Services (NSS) library. It was checked-in on September 6, 2000 by Netscape's Robert Relyea with the following comment:

"Make the framework compile with the rest of NSS. Include a 'live' certdata.txt with those certs we have permission to push to open source (additional certs will be added as we get permission from the owners)."

This decision has had a relatively long impact since the certificate has a validity range of January 28, 1996 - August 1, 2028.

As Ken Thompson explained so well in his "Reflections on Trusting Trust", you ultimately have to implicitly trust somebody. There is no way around this problem. In this case, we're implicitly trusting that Robert Relyea made a good choice. We also hope that Mozilla's built-in certificate policy is reasonable for the other built-in certificates.

One thing to keep in mind here is that all these certificates and signatures were simply used to form a trust chain. On the public Internet, VeriSign's root certificate is implicitly trusted by Firefox long before you go to any website. In a company, you can create your own root certificate authority (CA) that you can install on everyone's machine.

Alternatively, you can get around having to pay companies like VeriSign and avoid certificate trust chains altogether. Certificates are used to establish trust by using a trusted third-party (in this case, VeriSign). If you have a secure means of sharing a secret "key", such as whispering a long password into someone's ear, then you can use that pre-shared key (PSK) to establish trust. There are extensions to TLS to allow this, such as TLS-PSK, and my personal favorite, TLS with Secure Remote Password (SRP) extensions. Unfortunately, these extensions aren't nearly as widely deployed and supported, so they're usually not practical. Additionally, these alternatives impose a burden that we have to have some other secure means of communicating the secret that's more cumbersome than what we're trying to establish with TLS (otherwise, why wouldn't we use that for everything?).

One final check that we need to do is to verify that the host name on the certificate is what we expected. Nelson Bolyard's comment in the SSL_AuthCertificate function explains why:

/* cert is OK. This is the client side of an SSL connection.
* Now check the name field in the cert against the desired hostname.
* NB: This is our only defense against Man-In-The-Middle (MITM) attacks! */

This check helps prevent against a man-in-the-middle attack because we are implicitly trusting that the people on the certificate trust chain wouldn't do something bad, like sign a certificate claiming to be from Amazon.com unless it actually was Amazon.com. If an attacker is able to modify your DNS server by using a technique like DNS cache poisoning, you might be fooled into thinking you're at a trusted site (like Amazon.com) because the address bar will look normal. This last check implicitly trusts certificate authorities to stop these bad things from happening.

Pre-Master Secret

We've verified some claims about Amazon.com and know its public encryption exponent "e" and modulus "n." Anyone listening in on the traffic can know this as well (as evidenced because we are using Wireshark captures). Now we need to create a random secret key that an eavesdropper/attacker can't figure out. This isn't as easy as it sounds. In 1996, researchers figured out that Netscape Navigator 1.1 was using only three sources to seed their pseudo-random number generator (PRNG). The sources were: the time of day, the process id, and the parent process id. As the researchers showed, these "random" sources aren't that random and were relatively easy to figure out.

Since everything else was derived from these three "random" sources, it was possible to "break" the SSL "security" in 25 seconds on a 1996 era machine. If you still don't believe that finding randomness is hard, just ask the Debian OpenSSL maintainers. If you mess it up, all the security built on top of it is suspect.

On Windows, random numbers used for cryptographic purposes are generated by calling the CryptGenRandom function that hashes bits sampled from over 125 sources. Firefox uses this function along with some bits derived from its own function to seed its pseudo-random number generator.

The 48 byte "pre-master secret" random value that's generated isn't used directly, but it's very important to keep it secret since a lot of things are derived from it. Not surprisingly, Firefox makes it hard to find out this value. I had to compile a debug version and set the SSLDEBUGFILE and SSLTRACE environment variables to see it.

In this particular session, the pre-master secret showed up in the SSLDEBUGFILE as:

4456: SSL[131491792]: Pre-Master Secret [Len: 48]
03 01 bb 7b 08 98 a7 49 de e8 e9 b8 91 52 ec 81 ...{...I.....R..
4c c2 39 7b f6 ba 1c 0a b1 95 50 29 be 02 ad e6 L.9{......P)....
ad 6e 11 3f 20 c4 66 f0 64 22 57 7e e1 06 7a 3b .n.? .f.d"W~..z;

Note that it's not completely random. The first two bytes are, by convention, the TLS version (03 01).

Trading Secrets

We now need to get this secret value over to Amazon.com. By Amazon's wishes of "TLS_RSA_WITH_RC4_128_MD5", we will use RSA to do this. You could make your input message equal to just the 48 byte pre-master secret, but the Public Key Cryptography Standard (PKCS) #1, version 1.5 RFC tells us that we should pad these bytes with random data to make the input equal to exactly the size of the modulus (1024 bits/128 bytes). This makes it harder for an attacker to determine our pre-master secret. It also gives us one last chance to protect ourselves in case we did something really bone-headed, like reusing the same secret. If we reused the key, the eavesdropper would likely see a different value placed on the network due to the random padding.

Again, Firefox makes it hard to see these random values. I had to insert debugging statements into the padding function to see what was going on:

wrapperHandle = fopen("plaintextpadding.txt", "a");
fprintf(wrapperHandle, "PLAINTEXT = ");
for(i = 0; i < modulusLen; i++)
{
fprintf(wrapperHandle, "%02X ", block[i]);
}
fprintf(wrapperHandle, "\r\n");
fclose(wrapperHandle);

In this session, the full padded value was:

00 02 12 A3 EA B1 65 D6 81 6C 13 14 13 62 10 53 23 B3 96 85 FF 24 FA CC 46 11 21 24 A4 81 EA 30 63 95 D4 DC BF 9C CC D0 2E DD 5A A6 41 6A 4E 82 65 7D 70 7D 50 09 17 CD 10 55 97 B9 C1 A1 84 F2 A9 AB EA 7D F4 CC 54 E4 64 6E 3A E5 91 A0 06 00 03 01 BB 7B 08 98 A7 49 DE E8 E9 B8 91 52 EC 81 4C C2 39 7B F6 BA 1C 0A B1 95 50 29 BE 02 AD E6 AD 6E 11 3F 20 C4 66 F0 64 22 57 7E E1 06 7A 3B

Firefox took this value and calculated "C ≡ Me (mod n)" to get the value we see in the "Client Key Exchange" record:

Finally, Firefox sent out one last unencrypted message, a "Change Cipher Spec" record:

This is Firefox's way of telling Amazon that it's going to start using the agreed upon secret to encrypt its next message.

Deriving the Master Secret

If we've done everything correctly, both sides (and only those sides) now know the 48 byte (256 bit) pre-master secret. There's a slight trust issue here from Amazon's perspective: the pre-master secret just has bits that were generated by the client, they don't take anything into account from the server or anything we said earlier. We'll fix that be computing the "master secret." Per the spec, this is done by calculating:

master_secret = PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random)

The "pre_master_secret" is the secret value we sent earlier. The "master secret" is simply a string whose ASCII bytes (e.g. "6d 61 73 74 65 72 ...") are used. We then concatenate the random values that were sent in the ClientHello and ServerHello (from Amazon) messages that we saw at the beginning.

The PRF is the "Pseudo-Random Function" that's also defined in the spec and is quite clever. It combines the secret, the ASCII label, and the seed data we give it by using the keyed-Hash Message Authentication Code (HMAC) versions of both MD5 and SHA-1 hash functions. Half of the input is sent to each hash function. It's clever because it is quite resistant to attack, even in the face of weaknesses in MD5 and SHA-1. This process can feedback on itself and iterate forever to generate as many bytes as we need.

Following this procedure, we obtain a 48 byte "master secret" of

4C AF 20 30 8F 4C AA C5 66 4A 02 90 F2 AC 10 00 39 DB 1D E0 1F CB E0 E0 9D D7 E6 BE 62 A4 6C 18 06 AD 79 21 DB 82 1D 53 84 DB 35 A7 1F C1 01 19

Generating Lots of Keys

Now that both sides have a "master secrets", the spec shows us how we can derive all the needed session keys we need using the PRF to create a "key block" where we will pull data from:

key_block = PRF(SecurityParameters.master_secret, "key expansion", SecurityParameters.server_random + SecurityParameters.client_random);

The bytes from "key_block" are used to populate the following:

client_write_MAC_secret[SecurityParameters.hash_size]
server_write_MAC_secret[SecurityParameters.hash_size]
client_write_key[SecurityParameters.key_material_length]
server_write_key[SecurityParameters.key_material_length]
client_write_IV[SecurityParameters.IV_size]
server_write_IV[SecurityParameters.IV_size]

Since we're using a stream cipher instead of a block cipher like the Advanced Encryption Standard (AES), we don't need the Initialization Vectors (IVs). Therefore, we just need two Message Authentication Code (MAC) keys for each side that are 16 bytes (128 bits) each since the specified MD5 hash digest size is 16 bytes. In addition, the RC4 cipher uses a 16 byte (128 bit) key that both sides will need as well. All told, we need 2*16 + 2*16 = 64 bytes from the key block.

Running the PRF, we get these values:

client_write_MAC_secret = 80 B8 F6 09 51 74 EA DB 29 28 EF 6F 9A B8 81 B0
server_write_MAC_secret = 67 7C 96 7B 70 C5 BC 62 9D 1D 1F 4A A6 79 81 61
client_write_key = 32 13 2C DD 1B 39 36 40 84 4A DE E5 6C 52 46 72
server_write_key = 58 36 C4 0D 8C 7C 74 DA 6D B7 34 0A 91 B6 8F A7

Prepare to be Encrypted!

The last handshake message the client sends out is the "Finished message." This is a clever message that proves that no one tampered with the handshake and it proves that we know the key. The client takes all bytes from all handshake messages and puts them into a "handshake_messages" buffer. We then calculate 12 bytes of "verify_data" using the pseudo-random function (PRF) with our master key, the label "client finished", and an MD5 and SHA-1 hash of "handshake_messages":

verify_data = PRF(master_secret, "client finished", MD5(handshake_messages) + SHA-1(handshake_messages)) [12]

We take the result and add a record header byte "0x14" to indicate "finished" and length bytes "00 00 0c" to indicate that we're sending 12 bytes of verify data. Then, like all future encrypted messages, we need to make sure the decrypted contents haven't been tampered with. Since our cipher suite in use is TLS_RSA_WITH_RC4_128_MD5, this means we use the MD5 hash function.

Some people get paranoid when they hear MD5 because it has some weaknesses. I certainly don't advocate using it as-is. However, TLS is smart in that it doesn't use MD5 directly, but rather the HMAC version of it. This means that instead of using MD5(m) directly, we calculate:

HMAC_MD5(Key, m) = MD5((Key ⊕ opad) ++ MD5((Key ⊕ ipad) ++ m)

(The ⊕ means XOR, ++ means concatenate, "opad" is the bytes "5c 5c ... 5c", and "ipad" is the bytes "36 36 ... 36").

In particular, we calculate:

HMAC_MD5(client_write_MAC_secret, seq_num + TLSCompressed.type + TLSCompressed.version + TLSCompressed.length + TLSCompressed.fragment));

As you can see, we include a sequence number ("seq_num") along with attributes of the plaintext message (here it's called "TLSCompressed"). The sequence number foils attackers who might try to take a previously encrypted message and insert it midstream. If this occurred, the sequence numbers would definitely be different than what we expected. This also protects us from an attacker dropping a message.

All that's left is to encrypt these bytes.

RC4 Encryption

Our negotiated cipher suite was TLS_RSA_WITH_RC4_128_MD5. This tells us that we need to use Ron's Code #4 (RC4) to encrypt the traffic. Ron Rivest developed the RC4 algorithm to generate random bytes based on a 256 byte key. The algorithm is so simple you can actually memorize it in a few minutes.

RC4 begins by creating a 256-byte "S" byte array and populating it with 0 to 255. You then iterate over the array by mixing in bytes from the key. You do this to create a state machine that is used to generate "random" bytes. To generate a random byte, we shuffle around the "S" array.

Put graphically, it looks like this:

To encrypt a byte, we xor this pseudo-random byte with the byte we want to encrypt. Remember that xor'ing a bit with 1 causes it to flip. Since we're generating random numbers, on average the xor will flip half of the bits. This random bit flipping is effectively how we encrypt data. As you can see, it's not very complicated and thus it runs quickly. I think that's why Amazon chose it.

Recall that we have a "client_write_key" and a "server_write_key." The means we need to create two RC4 instances: one to encrypt what our browser sends and the other to decrypt what the server sent us.

The first few random bytes out of the "client_write" RC4 instance are "7E 20 7A 4D FE FB 78 A7 33 ..." If we xor these bytes with the unencrypted header and verify message bytes of "14 00 00 0C 98 F0 AE CB C4 ...", we'll get what appears in the encrypted portion that we can see in Wireshark:

The server does almost the same thing. It sends out a "Change Cipher Spec" and then a "Finished Message" that includes all handshake messages, including the decrypted version of the client's "Finished Message." Consequently, this proves to the client that the server was able to successfully decrypt our message.

Welcome to the Application Layer!

Now, 220 milliseconds after we started, we're finally ready for the application layer. We can now send normal HTTP traffic that'll be encrypted by the TLS layer with the RC4 write instance and decrypt traffic with the server RC4 write instance. In addition, the TLS layer will check each record for tampering by computing the HMAC_MD5 hash of the contents.

At this point, the handshake is over. Our TLS record's content type is now 23 (0x17). Encrypted traffic begins with "17 03 01" which indicate the record type and TLS version. These bytes are followed by our encrypted size, which includes the HMAC hash.

Encrypting the plaintext of:

GET /gp/cart/view.html/ref=pd_luc_mri HTTP/1.1
Host: www.amazon.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009060911 Minefield/3.0.10 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
...

will give us the bytes we see on the wire:

The only other interesting fact is that the sequence number increases on each record, it's now 1 (and the next record will be 2, etc).

The server does the same type of thing on its side using the server_write_key. We see its response, including the tell-tale application data header:

Decrypting this gives us:

HTTP/1.1 200 OK
Date: Wed, 10 Jun 2009 01:09:30 GMT
Server: Server
...
Cneonction: close
Transfer-Encoding: chunked

which is a normal HTTP reply that includes a non-descriptive "Server: Server" header and a misspelled "Cneonction: close" header coming from Amazon's load balancers.

TLS is just below the application layer. The HTTP server software can act as if it's sending unencrypted traffic. The only change is that it writes to a library that does all the encryption. OpenSSL is a popular open-source library for TLS.

The connection will stay open while both sides send and receive encrypted data until either side sends out a "closure alert" message and then closes the connection. If we reconnect shortly after disconnecting, we can re-use the negotiated keys (if the server still has them cached) without using public key operations, otherwise we do a completely new full handshake.

It's important to realize that application data records can be anything. The only reason "HTTPS" is special is because the web is so popular. There are lots of other TCP/IP based protocols that ride on top of TLS. For example, TLS is used by FTPS and secure extensions to SMTP. It's certainly better to use TLS than inventing your own solution. Additionally, you'll benefit from a protocol that has withstood careful security analysis.

... And We're Done!

The very readable TLS RFC covers many more details that were missed here. We covered just one single path in our observation of the 220 millisecond dance between Firefox and Amazon's server. Quite a bit of the process was affected by the TLS_RSA_WITH_RC4_128_MD5 Cipher Suite selection that Amazon made with its ServerHello message. It's a reasonable choice that slightly favors speed over security.

As we saw, if someone could secretly factor Amazon's "n" modulus into its respective "p" and "q", they could effectively decrypt all "secure" traffic until Amazon changes their certificate. Amazon counter-balances this concern this with a short one year duration certificate:

One of the cipher suites that was offered was "TLS_DHE_RSA_WITH_AES_256_CBC_SHA" which uses the Diffie-Hellman key exchange that has a nice property of "forward secrecy." This means that if someone cracked the mathematics of the key exchange, they'd be no better off to decrypt another session. One downside to this algorithm is that it requires more math with big numbers, and thus is a little more computationally taxing on a busy server. The "Advanced Encryption Standard" (AES) algorithm was present in many of the suites that we offered. It's different than RC4 in that it works on 16 byte "blocks" at a time rather than a single byte. Since its key can be up to 256 bits, many consider this to be more secure than RC4.

In just 220 milliseconds, two endpoints on the Internet came together, provided enough credentials to trust each other, set up encryption algorithms, and started to send encrypted traffic.

And to think, all of this just so Bob can buy milk.

Friday, April 24, 2009

Using Obscure Windows COM APIs in .NET

Most native Windows APIs are simple to call from .NET. For example, if you need to do something special when showing a window, you can use the ShowWindow API using Platform Invocation Services (P/Invoke) like this:

[DllImport("user32.dll")]
static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

When you call this function, here's roughly what happens:

  1. The CLR calls LoadLibrary on the file (e.g. "user32.dll")
  2. The CLR then calls GetProcAddress on the function name (e.g. "ShowWindow") to get the address of where the function is located.

For the most part, it just magically works. If we had used a function like "MessageBox", the CLR would notice that it doesn't exist and would then pick between the ANSI version (e.g. "MessageBoxA") or the Unicode version (e.g. "MessageBoxW").

With the address in hand, it's easy to jump to it and you're all set. Simple and easy.

I was expecting a simple API like this when I was investigating how to register my program as the default handler for ".wav" files on Vista. In the pre-Vista days, most programs would write directly into a registry key for the file extension (e.g. "HKEY_CLASSES_ROOT\.wav") and move on. Problems come when your program wants to register itself as a handler for a "popular" extension like .MP3 or .HTM. Some programs go into an all out arms race with other programs in a fight of wills to make sure they keep the extension.

In Windows Vista and later, Microsoft wants us to use the new "Default Programs" feature. The idea is that you register what file extensions your program supports in the registry and then a nice UI allows people to easily pick which of those extensions they want to associate with your program. Digging around the documentation led me to discover that the bulk of the functionality was exposed via the IApplicationAssociationRegistration COM interface.

Ah, COM.

Over the years, I've tried to keep my distance from it. This irrational fear came from wizards that "next, next, finish"'d your way into thousands of lines of inscrutable code. It took me years of passing glances to finally understand its basics. Even then, when I needed to use it from .NET, I'd right click on my project references and click "Add Reference":

I'd pick the library I needed and then somehow I could use the types as if they were .NET objects. I didn't ask further questions and moved on.

Unfortunately, IApplicationAssociationRegistration was nowhere to be found on the "Add Reference" list since it doesn't seem to have a registered type library associated with it. Using my basic COM knowledge, I knew that if I wanted to use it I would need to know the interface identifier (IID) as well as a class identifier (CLSID) that pointed to a concrete implementation.

Following the MSDN documentation, I knew I'd probably find success in shobjidl.idl:

Sure enough, shobjidl.idl was sitting in my "C:\Program Files\Microsoft SDKs\Windows\v6.1\Include" directory and had this interface definition:

[
object,
uuid(4e530b0a-e611-4c77-a3ac-9031d022281b),
pointer_default(unique),
helpstring("Protocol URL and Extension File Application")
]
interface IApplicationAssociationRegistration : IUnknown
{
HRESULT QueryCurrentDefault(
[in, string] LPCWSTR pszQuery,
[in] ASSOCIATIONTYPE atQueryType,
[in] ASSOCIATIONLEVEL alQueryLevel,
[out, string] LPWSTR* ppszAssociation);

...
}

A little further down was the declaration for the concrete class (coclass) and its associated class id (CLSID):

// CLSID_ApplicationAssociationRegistration
[ uuid(591209c7-767b-42b2-9fba-44ee4615f2c7) ] coclass ApplicationAssociationRegistration
{
interface IApplicationAssociationRegistration;
}

In the IDL, we also see the definitions for the enums that the functions use:

typedef [v1_enum] enum tagASSOCIATIONLEVEL
{
AL_MACHINE,
AL_EFFECTIVE,
AL_USER,
} ASSOCIATIONLEVEL;

typedef [v1_enum] enum tagASSOCIATIONTYPE
{
AT_FILEEXTENSION,
AT_URLPROTOCOL,
AT_STARTMENUCLIENT,
AT_MIMETYPE,
} ASSOCIATIONTYPE;

Getting this to work in .NET was surprisingly easy. The basic idea is that the CLR has to have just enough information to find the types:

  1. The "ComImportAttribute" is almost as simple to use as DllImportAttribute. In addition, you need to use the GuidAttribute to specify the gigantic GUIDs.
  2. You use the "InterfaceTypeAttribute" to specify the basic interface(s) that the interface you're importing uses. In COM, all interfaces derive from IUnknown. If the interface supports scripting then it implements IDispatch. If you provide a speedy C++ way of accessing your interface (e.g. vtable definition) and the scripting IDispatch interface, you've got a "dual" interface.
  3. You need to translate the parameter types to their .NET equivalents. This is an incredibly mechanical process that's straightforward. If there is a chance that the underlying bits are different between COM and .NET (e.g. they're not blittable) then you need to use the MarshalAsAttribute to tell the CLR how to convert the types as necessary.
  4. You need to remember that COM handles errors by returning HRESULTs instead of natively using exceptions like .NET uses. By default, the CLR will make the last parameter that is an OUT parameter in the IDL to be the return value (it helps if it's marked by "retval"). Therefore, you can act as if the function really returns its last parameter and the CLR will automatically check the HRESULT and throw a corresponding .NET exception as needed.
  5. Optionally, and perhaps most controversially, you're free de-Hungarianize the parameter names and PascalCase the enum names to make them much more friendly looking to people in .NET. It's optional since it might confuse people that use MSDN documentation and expecting the original names.

In a minute or so, I translated the definitions and gladly got rid of the Hungarian prefixes by converting parameter names of "pszQuery" to just "query." I also converted all the enums and removed their unnecessary prefixes. The end result was this:

[ComImport]
[Guid("4e530b0a-e611-4c77-a3ac-9031d022281b")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
internal interface IApplicationAssociationRegistration
{
[return: MarshalAs(UnmanagedType.LPWStr)]
string QueryCurrentDefault( [MarshalAs(UnmanagedType.LPWStr)] string query,
AssociationType queryType,
AssociationLevel queryLevel);
[return: MarshalAs(UnmanagedType.Bool)]
bool QueryAppIsDefault([MarshalAs(UnmanagedType.LPWStr)] string query,
AssociationType queryType,
AssociationLevel queryLevel,
[MarshalAs(UnmanagedType.LPWStr)] string appRegistryName);
[return: MarshalAs(UnmanagedType.Bool)]
bool QueryAppIsDefaultAll(AssociationLevel queryLevel,
[MarshalAs(UnmanagedType.LPWStr)] string appRegistryName);
void SetAppAsDefault([MarshalAs(UnmanagedType.LPWStr)] string appRegistryName,
[MarshalAs(UnmanagedType.LPWStr)] string set,
AssociationType setType);
void SetAppAsDefaultAll([MarshalAs(UnmanagedType.LPWStr)] string appRegistryName);
void ClearUserAssociations();
}

Importing the concrete class that implements the interface was just a matter of specifying its CLSID:

[ComImport]
[Guid("591209c7-767b-42b2-9fba-44ee4615f2c7")]
internal class ApplicationAssociationRegistration
{
// coclass is implemented by the runtime callable wrapper
}

With all of that goo out of the way, you can use the interface like a normal .NET type:

var aa = new ApplicationAssociationRegistration();
var iaar = (IApplicationAssociationRegistration)aa;
string myCurrentMp3Player = iaar.QueryCurrentDefault(".mp3", AssociationType.FileExtension, AssociationLevel.Effective);

Behind the scenes, the runtime callable wrapper has to do something like this:

  1. Load in ole32.dll where COM functions reside.
  2. Call CoInitialize to initialize COM.
  3. Look up your CLSID and IID in the registry under HKEY_CLASSES_ROOT and find their associated DLL (in our case, "shell32.dll")
  4. Create a factory for your class.
  5. Use the factory to create an instance.
  6. Call QueryInterface to get the specific interface we want (e.g. IApplicationAssociationRegistration)
  7. Get a pointer to the function we want using the vtable.

After all that, we finally have a place to jump to like we did with P/Invoke.

Why bother with all of this? One reason is that Microsoft has a huge legacy investment in C and C++ in Windows. There's no compelling reason for them to rewrite things in .NET. A natural consequence is that the C++ code that implements their latest APIs will be exposed using COM for the foreseeable future. Recently, Microsoft has gone ahead and published .NET COM wrappers for some of the popular new APIs like the Libraries feature in Windows 7. With just a little work, you don't have to wait on Microsoft to do this for you.

Given that .NET was designed as a successor to COM, it's no surprise that Microsoft has made interoperability with it very seamless. The runtime callable wrapper does a good job of hiding most of the messier details. The garbage collector handles much of the bookkeeping involved with memory management that used to be the bane of COM programming. The runtime is very aware of typical COM semantics of when to allocate and free memory. It's not always perfect. Sometimes you can be pre-emptive and force your COM object to be cleaned up via Marshal.ReleaseComObject so you don't have to wait on the garbage collector, but you should be careful.

I just presented the basics of what I learned to get my job done. There's a lot more out there for more advanced scenarios. I've found the free book "COM and .NET Interop" by Andrew Troelsen to be helpful.

There's plenty of obscure Windows APIs out there for the taking. Enjoy!


kick it on DotNetKicks.com

Monday, March 16, 2009

How .NET Regular Expressions Really Work

Remember when you first tried to parse text?

My early BASIC programs were littered with IF statements that dissected strings using LEFT$, RIGHT$, MID$, TRIM$, and UCASE$. It took me hours to write a program that parsed a simple text file. Just trying to support whitespace and mixed casing was enough to drive me crazy.

Years later when I started programming in Java, I discovered the StringTokenizer class. I thought it was a huge leap forward. I no longer had to worry about whitespace. However, I still had to use functions like "substring" and "toUpperCase", but I thought that was as good as it could get.

And then one day I found regular expressions.

I almost cried when I realized that I could replace parsing code that took me hours to write with a simple regular expression. It still took me several years to become comfortable with the syntax, but the learning curve was worth the power obtained.

And yet with all of this love, I still had this nagging suspicion that I was doing it wrong. After reading Pragmatic Thinking and Learning, I was determined to try to imagine what life was like inside the code I wrote. But I just couldn't connect with a regular expression.

The last straw came recently when I was trying to help a coworker craft a regex to properly handle name/value string pairs with escaped strings. In the end, our regex worked, but I felt that it was duct-taped together. I knew there was a better way.

I picked up a copy of Jeffrey Friedl's book "Mastering Regular Expressions" and couldn't put it down. In less than a week, I had flown through 400+ pages and had finally started to feel like I understood how regular expressions worked. I finally had a sense for what backtracking really meant and I had a better idea for how a regex could go catastrophically out of control.

I had extremely high hopes for chapter 9 which covered the .NET regular expression "flavor." Since I work with .NET every day, I thought this would be the best chapter. I did learn a few things like how to properly use RegexOptions.ExplicitCapture, how to use the special per-match replacement sequences that Regex.Replace offers, how to save compiled regular expressions to a DLL, and how to match balanced parentheses -a feat that's theoretically not possible with a regex. Despite learning all of this in the chapter, I still didn't feel that I could "connect" with the very .NET regular expression engine that I know and love.

To be fair, the vast benefit of the book comes from the first six chapters that deal with how regular expressions work in general since regex implementations share many ideas. The book laid a solid foundation, but I wanted more.

I wanted to stop all my hand-waving at regular expressions and actually understand how they really work.

I knew I wanted to drill into the code. Although tools like Reflector are amazing, I knew I wanted to see the actual code. It's fairly easy now to step into the framework source code in the debugger. Unlike understanding the details of locking, which had me dive into C++ and x86 assembly, it was refreshing to see that the .NET regular expression engine was written entirely in C#.

I decided to use a really simple regular expression and search string and then follow it from cradle to grave. If you'd like to follow along at home, I've linked to relevant lines in the .NET regular expression source code.

My very simple regex consisted of looking for a basic URL:

string textToSearch = "Welcome to http://www.moserware.com/!";
string regexPattern = @"http://([^\s/]+)/?";
Match m = Regex.Match(textToSearch, regexPattern);
Console.WriteLine("Full uri = '{0}'", m.Value);
Console.WriteLine("Host ='{0}'", m.Groups[1].Value);

Our journey begins at Regex.Match where we checking an internal cache of the past 15 regex values to see if there a match for:

"0:ENU:http://([^\\s/]+)/?"

This is a compact representation of:

RegexOptions : Culture : Regex pattern

The regex doesn't find this in the cache, so it starts scanning the pattern. Note that out of respect for the authors, our regex pattern doesn't have any comments or whitespace in it:

// It would be nice to get rid of the comment modes, since the 
// ScanBlank() calls are just kind of duct-taped in.

We start creating an internal tree representation of the regex by adding a multi-character (aka "Multi") node to contain the "http://" part. Next, we see that the scanner made it to first real capture:

http://([^\s/]+)/?

This capture contains a character class that says that we don't want to match spaces or a forward slash. It is converted into an obscure five character string:

"\x1\x2\x1\x2F\x30\x64"

Later we'll see why it had to all fit in one string, but for now we can use a helpful comment to decode each character:

OffsetHex ValueMeaning
00x01The set should be negated
10x02There are two characters in the character part of the set
20x01There is one Unicode category
30x2FInclusive lower-bound of the character set. It's a '/' in Unicode
40x30Exclusive upper-bound of the character set. It's a '0' in Unicode
50x64This is a magic number that means the "Space" category.

Before I realized that this string had meaning, I was utterly confused.

As we continue scanning, we find a '+' quantifier:

http://([^\\s/]+)/?

This is noted as a Oneloop node since it's a "loop" of what came before (e.g. the character class set). It has arguments of 1 and Int32.MaxValue to denote 1 or more matches. We see that the next character isn't a '?', so we can assert this is not a lazy match which means it's a greedy match. 

The first group is recorded when we hit the ')' character. At the end of the pattern, we note a One (character) node for the '/' and we see it's followed by a '?'  which is just another quantifier, this time with a minimum of 0 and a maximum of 1.

All those nodes come together to give us this "RegexTree:"

We still need to convert the tree to code that the regular expression "machine" can execute later. The bulk of the work is done by an aptly named RegexCodeFromRegexTree function that has a decent comment:

/*
* The top level RegexCode generator. It does a depth-first walk
* through the tree and calls EmitFragment to emits code before
* and after each child of an interior node, and at each leaf.
*
* It runs two passes, first to count the size of the generated
* code, and second to generate the code.
*
* <CONSIDER>we need to time it against the alternative, which is
* to just generate the code and grow the array as we go.</CONSIDER>
*/

I love the anonymous "CONSIDER" comment and would have had a similar reaction. Instead of using an ArrayList or List<int> to store the op codes, which can automatically resize as needed, the code diligently goes through the entire RegexTree twice. The class is peppered with "if(_counting)" expressions that just increase a counter by the size they will use in the next pass.

As predicted by the comment, the bulk of the work is done by the 250 line switch statement that makes up the EmitFragment function. This function breaks up RegexTree "fragments" and converts them to a simpler RegexCode. The first fragment is:

EmitFragment(nodetype=RegexNode.Capture | BeforeChild, node=[RegexNode.Capture, Group=0, Length=-1], childIndex=0)

This is shorthand for emitting the RegexCode that should come before the children of the top level "RegexNode.Capture" node that represents group 0 and that goes until the end of the string (e.g. has length -1). The last 0 means that it's the 0th child of the parent node (this is sort of meaningless since it has no parent). The subsequent calls walk the rest of the tree:

EmitFragment(RegexNode.Concatenate | BeforeChild, [RegexNode.Concatenate], childIndex=0)
EmitFragment(RegexNode.Multi, [RegexNode.Multi, string="http://"], childIndex=0)
EmitFragment(RegexNode.Concatenate | AfterChild, [RegexNode.Concatenate], childIndex=0)
EmitFragment(RegexNode.Concatenate | BeforeChild, [RegexNode.Concatenate], childIndex=1)
EmitFragment(RegexNode.Capture | BeforeChild, [RegexNode.Capture, Group=1, -1], childIndex=0)
EmitFragment(RegexNode.SetLoop, [RegexNode.SetLoop, min=1, max=Int32.MaxValue], childIndex=0)
EmitFragment(RegexNode.Capture | AfterChild, [RegexNode.Capture, Group=1, Length=-1], childIndex=0)
EmitFragment(RegexNode.Concatenate | AfterChild, [RegexNode.Concatenate], childIndex=1)
EmitFragment(RegexNode.Concatenate | BeforeChild, [RegexNode.Concatenate], childIndex=2)
EmitFragment(RegexNode.Oneloop, [RegexNode.Oneloop, min=0, max=1, character='/'], childIndex=0)
EmitFragment(RegexNode.Concatenate | AfterChild, [RegexNode.Concatenate], childIndex=2)
EmitFragment(RegexNode.Capture | AfterChild, [RegexNode.Capture, Group=0, Length=-1], childIndex=0)

The reward for all this work is an integer array that describes the RegexCode "op codes" and their arguments. You can see that some instructions like "Setrep" take a string argument. These arguments point to offsets in a string table. This is why it was critical to pack everything about a set into the obscure string we saw earlier. It was the only way to pass that information to the instruction.

Decoding the code array, we see:

IndexInstructionOp Code/ArgumentString Table ReferenceDescription
0Lazybranch23 Lazily branch to the Stop instruction at offset 21.
1 21 
2Setmark31 Push our current state onto a stack in case we need to backtrack later.
3Multi12 Perform a multi-character match of string table item 0 which is "http://".
4 0"http://"
5Setmark31 Push our current state onto a stack in case we need to backtrack later.
6Setrep2 Perform a set repetition match of length 1 on the set stored at string table position 1, which represents [^\s/].
7 1"\x1\x2\x1\x2F\x30\x64"
8 1 
9Setloop5 Match the set [^\s/] in a loop at most Int32.MaxValue times.
10 1"\x1\x2\x1\x2F\x30\x64"
11 2147483647 
12Capturemark32 Capture into group #1, the string between the mark set by the last Setmark and the current position.
13 1 
14 -1 
15Oneloop3 Match Unicode character 47 (a '/') in a loop for a maximum of 1 time.
16 47 
17 1 
18Capturemark32 Capture into group #0, the contents between the first Setmark instruction and the current position.
19 0 
20 -1 
21Stop40 Stop the regex.

We can now see that our regex has turned into a simple "program" that will be executed later.

Prefix Optimizations

We could stop here, but we'd miss the fun "optimizations." With our pattern and search string, the optimizations will actually slow things down, but the code generator is oblivious to that. The basic idea behind prefix optimizations is to quickly jump to where the match might start. It does this by using a RegexFCD class that I'm guessing stands for "Regex First Character Descriptor."

With our regex, the FirstChars functions notices our "http://" 'Multi' node and determines that any match must start with an 'h'. If we had alternations, the first character of each alternation would be added to make a limited set of potential first characters. With this optimization alone, we can skip all characters in the text that aren't in this approved "white list" of first characters without having to execute any of the above RegexCode.

But wait... there's an even trickier optimization! The optimizer discovers that the first thing the regex must match is a simple string literal: a 'Multi' node. This means that we can use the RegexBoyerMoore class which applies the Boyer-Moore search algorithm.

The key insight is that we don't have to check each character of the text. We only need to look at last character to see if it's even worth checking the rest.

For example, if our sample text is "Welcome to http://www.moserware.com/!" and we're searching for "http://" which is 7 characters, we first look at the 7th character of the text which is 'e'. Since 'e' is not the 7th character of what we're looking for (which is a '/'), we know that there couldn't possibly be a match and so we don't need to bother checking all previous 6 characters because there isn't even an 'e' in what we're looking for. The tricky part is what to do if the what we find is in the string that we're trying to match, but it isn't the last '/' character.

The specifics are handled in straightforward way with some minor optimizations to reduce memory needs given 65,000+ possible Unicode characters. For each character, the maximum possible skip is calculated.

For "http://", we come up with this skip table:

CharacterCharacters to skip ahead
/0
:2
h6
p3
t4
all others7

This table tells us that if we find an 'e' then we can skip ahead 7 characters without even checking the previous 6 characters. If we find a 'p', then we can skip ahead at least 3 characters before performing a full check, and if we find a '/' then we could be on the last character and need to check other characters (e.g. skip ahead 0).

There is one more optimization that looks for anchors, but none apply to our regex, so it's ignored.

We're done! We made it to the end of the RegexWriter phase. The "RegexCode" internal representation consists of these critical parts:

  1. The regex code we created.
  2. The string table derived from the regex that the code uses (e.g. our "Multi" and "Setrep" instructions have string table references).
  3. The maximum size of our backtracking stack. (Ours is 7, this will make more sense later.)
  4. A mapping of named captures to their group numbers. (We don't have any in our regex, so this is empty.)
  5. The total number of captures. (We have 2.)
  6. The RegexBoyerMoore prefix that we calculated. (This applies to us since we have a string literal at the start.)
  7. The possible first characters in our prefix. (In our case, we calculated this to be an 'h'.)
  8. Our anchors. (We don't have any.)
  9. An indicator whether this should be a RightToLeft match. (In our case, we use the default which is false.)

Every regex passes through this step. It applies to our measly regex with a code size of 21 as much as it does to a gnarly RFC2822 compliant regex that has 175. These nine items completely describe everything that we'll do with our regex and they never change.

In need of an interpreter

Now that we have the RegexCode, the match method will run and create a RegexRunner which is the "driver" for the regex matching process. Since we didn't specify the "Compiled" flag, we'll use the RegexInterpreter runner.

Before the interpreter starts scanning, it notices that we have a valid Boyer-Moore prefix optimization and it uses it to quickly locate the start of the regex:

Index0123456789101112131415161718192021222324252627282930313233343536
CharacterWelcome to http://www.moserware.com/!
Scan Order      1    982 & 76543                   

It first looks at the 7th character and finds an 'e' instead of the '/' that it wanted. The skip table tells it that 'e' isn't in any possible match, so it jumps ahead 7 more characters where it finds a 't'. The skip table tells it to jump ahead 4 more characters where it finally finds the '/' it wanted. It then verifies that this is the last character of our "http://" prefix. With a valid prefix found, we prepare for a match in case we're lucky and the rest of the regex matches.

The bulk of the interpreter is in its "Go" method which is a 700 line switch statement that interprets the RegexCode we created earlier. The only interesting part is that the interpreter keeps two stacks to keep its state in case it needs to backtrack and abandon a path it took. The "run stack" records where in the search string an operation begins while the "run track" records the RegexCode instruction that could potentially backtrack. Any time there is a chance that the interpreter could go down a wrong path, it pushes its state onto these stacks so that it can potentially try something else later.

On our string, the following instructions execute:

  1. Lazybranch - This is a branch that is "lazy." It will only occur if we fail and have to backtrack to this instruction. In case there are problems, we push 11 (the string offset to the start of "http://") onto the "run stack" and 0 (the RegexCode offset for this instruction) onto the "run track." The branch is to code offset 21 which is the "Stop" instruction.
  2. Setmark - We save our position in case we have to backtrack.
  3. Multi - A multi-character match. The string to match is at offset 0 in the string table (which is "http://").
  4. Setmark - Another position save in case of a backtrack. Since the Multi code succeeded, we push our "run stack" offset of 18 (the start of "www.") and our "run track" code position of 5
  5. Setrep - Loads the "\x1\x2\x1\x2F\x30\x64" set representation at offset 1 in the string table that we calculated earlier. It reads an operand from the execution stack that we should verify that the set repeats exactly once. It calls CharInClassRecursive that does the following:
    1. It sees that the first character, 'w', is not in the character range ['/', '0'). This check corresponds to the '/' in the "[^\s/]" part of the regex.
    2. It next tries CharInCategory which notes that 'w' is part of the "LowercaseLetter" UnicodeCategory. The magic number 0x64 in our set tells us to do a Char.IsWhiteSpace check on it. This too fails.
    3. Although both checks fail, the interpreter sees that it needs to flip the result since it is a negated (^) set. This makes the character class match succeed.
  6. Setloop - A "loop" instruction is like a "rep" one except that it isn't forced to match anything. In our case, we see that we loop for a maximum of Int32.MaxValue times on the same set we saw in "Setrep." Here you can see that the code generation phase turned the "+" in "[^\s/]+" of the regex into a Setrep of 1 followed by a Setloop. This is equivalent to "[^\s/][^\s/]*". The loop keeps chomping characters until it finds the '/' which causes it to call BackwardNext() which sets the current position to just before the final '/'.
  7. CaptureMark - Here we start capturing group 1 by popping the "run stack" which gives us 18. Our current offset is 35. We capture the string between these two positions, "www.moserware.com", and keep it for later use in case the entire regex succeeds.
  8. Oneloop - Here we do a loop at most one time that will check for the '/' character. It succeeds.
  9. CaptureMark - We capture into group 0 the value between the offset on the "run stack", which is 11 (the start of "http://"), and the last character of the string at offset 36. The string between these offsets is "http://www.moserware.com/".
  10. Stop - We're done executing RegexCode and can stop the interpreter.

Since we stopped with successful captures, the Match is declared a success. Sure enough, if we look at our console window, we see:

Full uri = 'http://www.moserware.com/'
Host ='www.moserware.com'

Backtracking Down Unhappy Paths

I can hear the cursing shouts of ^#!@.*#!$ from the regex mob coming towards me. They're miffed that I used a toy regular expression with a pathetically easy search text that didn't do anything "interesting."

The mob really shouldn't be that worried. We already have all the essential tools we need to understand how things work.

One common issue that you have to deal with in a "real" regular expression is backtracking.

Let's say you have a search text and pattern like this:

string text = "This text has 1 digit in it";
string pattern = @".*\d"; Regex.Match(text, pattern);

You'd recognize the parse tree:

The only thing new about it is that the '.' pattern was translated into a "Notone" node that matches anything except one particular character (in our case, a line feed). We see that the set follows the obscure, but compact representation. The only thing new to report is that '\x09' is the magic number to represent all Unicode digits (which the Turkey Test showed is more than just [0-9]).

It's painful to watch the regex interpreter work so hard for this match. The ".*" puts it in a Notoneloop that goes right to the end of the string since it doesn't find a line feed ('\n'). It then looks for the Set that represents "\d" and it fails. It has no choice but to backtrack by executing the "RegexCode.Notoneloop | RegexCode.Back" composite instruction which backtracks one character by resetting the "run track" to be the Set instruction again, but this time it will start one character earlier.

Even in our insanely simple search string, the interpreter has to backtrack by executing "RegexCode.Notoneloop | RegexCode.Back" and retesting the Set a total of thirteen times.

An almost identical process occurs if we had used a lazy match regular expression like ".*?\d". The difference is that it does a "Notonelazy" instruction and then gets caught up in a "RegexCode.Notonelazy | RegexCode.Back" backtrack and Set match attempt that happens fourteen times. Each iteration of the loop causes the "Notonelazy" instruction to add one more character instead of removing one like the "Notoneloop" instruction had to. This is typical:

In situations where the decision is between "make an attempt" and "skip an attempt," as with items governed by quantifiers, the engine always chooses to first make the attempt for greedy quantifiers, and to first skip the attempt for lazy (non-greedy) ones.  Mastering Regular Expressions, p.159

If we had a little more empathy for the regex interpreter, we would have written "[^\d]*\d" and avoided all the backtracking, but it wouldn't have shown this common error.

Alternations such as "hello|world" are handled with backtracking. Before each alternative is attempted, the current position is saved on the "run track" and "run stack." If the alternate fails, the regex engine resets the position to what it was before the alternate was tried and the next alternate is attempted.

Now, we can even understand how more advanced concepts like atomic grouping work. If we use a regex like:

\w+:

to match the names of email headers as in:

Subject: Hello World!

Things will work well. The problem will come when we try to match against

Subject

We already know that there is going to be a backtracking since "\w+" will match the whole string and then backtracking will occur as the interpreter desperately tries to match a ':'. If we used atomic grouping, as in:

(?>\w+):

We would see that the generated RegexCode has two extra instructions of Setjump and Forejump in it. These instructions tell the interpreter to do unconditional jumps after matching the "\w+". As the comment for "Forejump" indicates, these unconditional jumps will "zap backtracking state" and be much more efficient for a failed match since backtracking won't occur.

Loose Ends

There are some minor details left. The first time you use any regex, a lot of work goes on initializing all the character classes that are stored as static variables. If you just timed a single Regex, your numbers would be highly skewed by this process.

Another common issue is whether you should use the RegexOptions.Compiled flag. Compiling is handled by the RegexCompiler class. The interesting aspects of the IL code generation is handled exactly like the interpreter, as indicated by this comment:

/* 
* The main translation function. It translates the logic for a single opcode at
* the current position. The structure of this function exactly mirrors
* the structure of the inner loop of RegexInterpreter.Go().
*
* The C# code from RegexInterpreter.Go() that corresponds to each case is
* included as a comment.
*
* Note that since we're generating code, we can collapse many cases that are
* dealt with one-at-a-time in RegexIntepreter. We can also unroll loops that
* iterate over constant strings or sets.
*/

We can see that there is some optimization in the generated code. The down side is that we have to generate all the code regardless of if we use all of it or not. The interpreter only uses what it needs. Additionally, unless we use Regex.CompileToAssembly to save the compiled code to a DLL, we'll end up doing the entire process of creating the parse tree, RegexCode, and code generation at runtime.

Thus, for most cases, it seems that RegexOptions.Compiled isn't worth the effort. But it's good to keep in mind that there are exceptions when performance is critical and your regex can benefit from it (otherwise, why have the option at all?).

Another option is RegexOptions.IgnoreCase that makes everything case insensitive. The vast majority of the process stays the same. The only difference is that all instructions that compare characters will convert each System.Char to lower case, mostly using the Char.ToLower method. This sounds reasonable, but it's not quite perfect. For example, in Koine Greek, the word for "moth" goes from uppercase to lowercase like this:

That is, in Greek, when a "sigma" (Σ) appears in lowercase at the end of a word, it uses a different letter (ς) than if it appeared anywhere else (σ). RegexOptions.IgnoreCase can't handle cases that need more context than a single System.Char even though the string comparison functions can handle this. Consider this example:

string mothLower = "σής"; 
string mothUpper = mothLower.ToUpper(); // "ΣΉΣ"
bool stringsAreEqualIgnoreCase = mothUpper.Equals(mothLower, StringComparison.CurrentCultureIgnoreCase);  // true
bool stringsAreEqualRegex = Regex.IsMatch(mothLower, mothUpper, RegexOptions.IgnoreCase); // false

This also means that .NET's Regex won't do well with characters outside the Basic Multilingual Plane that need to be represented by more than one System.Char as a "surrogate pair."

I bring all of these "cases" up because it obviously troubled one of the Regex programmers who wrote this comment twice:

// We do the ToLower character by character for consistency.  With surrogate chars, doing 
// a ToLower on the entire string could actually change the surrogate pair.  This is more correct
// linguistically, but since Regex doesn't support surrogates, it's more important to be
// consistent.

You can tell the author was fully anticipating the bug reports that eventually came as a result of this decision. Unfortunately, due to the way the code is structured, changing this behavior would take a hefty overhaul of the engine and would require a massive amount of regression testing. I'm guessing this is the reason why it won't be coming in a service pack anytime soon.

The last interesting option that affects most of the code is RegexOptions.RightToLeft. For the most part, this affects where the searching starts and how a "bump" is applied. When the engine wants to move forward or get the characters to the "right", it checks this option to see if it should move +1 or -1 character from the current position. It's a simple idea, but its implementation is with many "if(!runrtl)" statements spread throughout the code.

Finally, you might be interested in how Mono's regular expression compares with Microsoft's. The good news is that the code is also available online as well. In general, Mono's implementation is very similar. Here are some of the (minor) differences:

  • Mono's parse tree has a similar shape, but it uses more strongly typed classes. For example, sets such as [^\s/] are given their own class rather than encoded as a single string.
  • The Boyer-Moore prefix optimization is done in the QuickSearch class. It is calculated at run-time and is only used if the search string is longer than 5 characters.
  • The regex machine doesn't have a separate string table for referencing strings like "http://". Each character is passed in as an argument to the instruction.

Conclusion

Weighing in around 14,000 lines of code, .NET's regular expression engine takes awhile to digest. After getting over the shock of its size, it was relatively straightforward to understand. Seeing the real source code, with its occasional funny comments, provided insight that Reflector simply couldn't offer. In the end, we see that a .NET regular expression pattern is simply a compact representation for its internal RegexCode machine language.

This whole process has allowed me to finally connect with regular expressions and give them a splash of empathy. Seeing the horror of backtracking first hand in the debugger was enough for me to want to do everything in my power to get rid of it. Following the translation process down to the RegexCode level clued me into how my regex pattern will actually execute. Feeling the wind fly by a regex using the Boyer-Moore prefix optimization has encouraged me to do whatever I can to put string literals at the front of a pattern.

It's all these little things that add up to a blazingly fast regular expression.


kick it on DotNetKicks.com

Tuesday, February 3, 2009

Rebooting Computing: Why?

Have you seen “The Most Famous Chart in Computer Science Education?” The exact numbers and data sources vary, but the curve always looks similar:

I used data from college bound seniors who indicated on their SAT that they intended to major in “Computer and Information Sciences and Support Services.” The curve tends peak between 1999 and 2001 and then you see a huge decline that has just begun to bottom out to numbers less than half their peak value.

Some people like to explain away this drop on another curve:

Although there was a correlation of computer science enrollment and the stock market, you'll see that the curves diverge around 2003. A popular belief is that the bursting dot-com bubble scared some potential students, but then by 2003 parents thought most software jobs were being offshored and encouraged their kids to pick a different field.

Some jobs did go to Bangalore, but the total number of jobs actually grew in the US, even beyond their 1999 levels. There are still excellent job prospects for the long term. Even in hard times like the 1970's recession, companies like Apple and Microsoft were founded. In 10 years, we'll know of a several great companies that got their start in the current financial crisis. It's unfortunate how students and their parents have been misled about the reality.

We're faced with a pipeline problem. The fresh-outs you'll be looking to hire in 10 years are in middle school right now. Are you doing anything to woo them to a career in computing? What do you say to a bright young girl to at least consider looking at computer science?

I could start with my story. As a kid, I was captivated by how I could think of an idea for a program and then have a computer execute it exactly. It was as if I could put part of my mind inside the computer. My parents and friends thought this was magic. It was magic, but it was a magic I could understand. There's always cool new things being computed. It was magical to see how Deep Blue beat Kasparov long before I began to understand the beauty of how it worked. Every day I listen to MP3s that were created by a compression algorithm that gets rid of sound that my ear can't hear. Companies like Walmart sift terabytes of data to predict market demand so they can send extra strawberry Pop-Tarts when hurricanes are approaching. On a personal level, new algorithms predict with a high accuracy what types of movies we'll like. But that's just one tiny sliver of what's out there. Each person can have their own unique experience. There's a lot of great computing going on and the demand is only increasing.

Sometimes you'll hear honest hesitations about a career in computing because of fears that they'll “sit in front of a computer all day.” The sad irony is that this thinking causes people to go into fields like accounting, graphic design, marketing, or hundreds of other fields where they spend just as much time in front of a computer. The difference is that they'll spend most of their time using applications like Outlook, Word, or Excel and often have less fun than the software developers that are creating these programs. Moreover, working in the computing fields will give someone a chance to create future interfaces that don't have everyone in front of giant screens all day.

This isn't to say that there aren't boring software development jobs, but there are also plenty of great jobs. It's a great career. I sometime feel a little guilty that I can work in a field that I enjoy. As a field, we create software that powers business communications and connects you with friends. Software will be a pivotal role in gene sequencing and analyzing that will usher in customized medical treatments and drugs. Software will drive many great innovations of the future.

It's sad that kids aren't even given a chance to see the breadth and excitement of our field. I don't blame them. On the whole, we're doing a terrible job broadcasting our image.

Professors often teach computer science as if it is some sterile thing with nothing new in it. This is just not true; we're in our adolescence. Sure, we bumble around and do silly things at times, but it also means we're growing. We're in an incredible time. Mehran Sahami captured this well at the conclusion of his intro class at Stanford:

“Think about the time that you're living in. Don Knuth, who is considered the father of Computer Science is still alive and he's in this department. It's sort of like you're geometers and you're living in the time of Euclid... It's all happening now. Don't think of this stuff as dead people who did this stuff and it just happened and now you're forced to do it. You're living in it.”

Mehran is one of the few teachers that do a great job of sharing the new and exciting possibilities of computer science. Too often you see teachers that claim that computer science is some small box that is exactly what they're teaching.

When I've had the honor of having conversations with computing pioneers like Alan Kay, I often hear how their teachers in the 60's would admit that they didn't understand the full possibilities of computing, and they wanted their students do better. The early ARPA community with J.C.R. Licklider at the helm is a great example of this style.

Licklider encouraged and funded wild and imaginative ideas that caused a huge boom in computer science back in the 60's and early 70's. Unfortunately, this slowed down in the 1980s as funders became more conservative, took fewer risks, and as a result got more incremental improvements rather than something big. I think this is why Alan Kay has difficulty finding significant new inventions in computing since 1980. Alan's statement sounds crazy until you see just how much of what we use was started before 1980. Some will point to the web and the browser as a huge new invention, but even Marc Andreessen, speaking on how he was able to create the first graphical web browser revealed:

“I was able to do it so quickly because it was the icing on the cake that had been baking for 30 years.”

Indeed he had. In the mid-60's, some of the baking started when Licklider funded Doug Englebart's amazing oN-Line System that amazingly had hypertext links and was operated by a mouse. Len Kleinrock's Ph.D. proposal on packet theory in 1961 gave him a great start that ultimately led to his team sending the first message over ARPAnet in 1969. Vint Cerf and Bob Kahn had already published a paper on TCP/IP, the bedrock of the Internet protocols, by 1974. All of these technologies were well refined and in production by the time Tim Berners-Lee created HTTP in the early 90's to which Andreessen would add a graphical front end.

Are we willing to fund long-term “wild” and “crazy” ideas today to create Internet-sized future results? We've been too focused on short term results. It's not just academia; most companies focus on short-terms results that dismiss the fact that computing is a young field and miss what really matters:

“Many HR departments haven't figured this out yet, but in reality, It's less important to know Java, Ruby, .NET, or the iPhone SDK. There's always going to be a new technology or a new version of an existing technology to be learned. The technology itself isn't as important; it's the constant learning that counts.” - Pragmatic Thinking and Learning, p.145

Bummer, Now What?

Last March, I was given the opportunity to be on a design team to start tackling some of these problems. We knew that we couldn't change the whole field and that some people would want to keep it the way it is. But we also knew that we had to do something; we didn't want to settle for the status quo.

Our primary task was to plan a “summit” of the best people from academia, government, and industry and get them all in one room so that we could get a good sample of the entire field. We didn't want to let people have the chance to point fingers outside and say it was somebody else's problem.

After working on the basic concept for the summit, we needed to give it a name. I had enjoyed many side discussions of the great days of Licklider, PARC, and the early culture that accomplished great things. Sort of as a joke, I thought that we needed to “reboot” computer science to get rid of the cruft that had accumulated over time and get back to the excitement when the field was brand new. After some discussion, we decided to change “computer science” to a broader field of “computing” and use the “magic and beauty” of computer science to be the driver of the “rebooting.”

Only later would I realize the rebooting metaphor could be stretched a bit further. We can “reboot” the field without throwing away the good parts just as an operating system can reboot and depend on its valuable non-volatile memory being preserved. Rebooting doesn't mean that we'll go down the same crufty path (e.g. perhaps we have better “drivers” now). Most importantly, the domain name was available so we ran with it.

After nine months of planning and inviting over 220 people, we had our summit in January at the Computer History Museum. It was the first time that such a broad representation of the computing field came to work together in the same room.

The summit was guided by the Appreciative Inquiry process. It's a technique that has you work in small teams to discover a positive core of what's giving the field life and then uses that to start dreaming of a better future. As the three days unraveled, we made it to the “design” phase where we kicked off several projects that fell into three rough categories:

  • Education
    • K-8 Fundamentals (Creating engaging introductions of computing fundamentals at the elementary level)
    • Project/Problem Based Learning for Grades 7-14
    • CS in K-12: Essential Subject (Determining a path to get computing essentials introduced in the K-12 curriculum)
    • Recruiting CS Teachers (Significantly increase the number of computer science teachers)
    • National Curriculum for Multi-Disciplinary Collaboration
    • International Educational Repository (that would include classroom activities and ideas for the K-16 level)
  • Outreach
    • LabRats (Build learning communities that include after school activities focused on areas like computer science)
    • Recruiting Women & Minorities into CS
    • Image of Computing (Sort of like a marketing campaign to show how computing is changing the world)
    • Tools for Fun and Beauty (Providing software tools for people sharing the fun and beauty of computing)
    • Relevant Computer Science Intersecting with Socially Relevant Projects (e.g. providing infrastructure in third world countries or disaster response scenarios)
    • Defining Future Computing Requirements for IT Service Verticals (e.g. Health Care, Financial, Government)
  • Internal Growth
    • Parallel Worlds Initiative (Leverage the boom in multi-core to drive new areas of thinking in computing)
    • Open Artifacts (e.g. create hardware and software systems that can easily be inspected, understood, assembled, disassembled and reused in new ways to allow for exploration)
    • Rediscovering Computing “Gems” (Revisit ideas from the past that might have been previously abandoned because they were infeasible but now might be possible)
    • Computing Field Guide (Create a resource to show the breadth of computing to both novices and experts)

I joined the Computing Field Guide team. I think it'll be fun to see if we can come up with a way to leverage many of the great existing resources out there and learn some of the breadth of the field as a result.

In the end, the three day summit was just the beginning of a long journey. It was a bit chaotic and some were disappointed we didn't fix everything right then or do more, but I think the summit gave a good context for the issues our field faces. My best memories include all of the great people that I met and being able to engage in some great conversations.

Steve Jobs said “you can't connect the dots looking forward; you can only connect them looking backwards.” I think that holds true for this summit. There are still a lot of dots between here and there; wherever “there” might be. With a lot of hard work, I'm confident that the dots will connect somehow. There's an interesting road ahead and I want to be a part of it. We need to start baking cakes for future generations to ice.

Your Turn

What do you think of the current state of computer science? What is your dream for the future? What are some things that we can start now to improve the current situation? Are you involved with any existing effort? Would you like to join ours?

I'd love to hear your thoughts.

P.S. There are several blog posts by fellow “Rebooters” ([1] [2] [3] [4] [5] [6] [7] [8] [9]). There are some pictures on flickr and a Rebooting Computing Community site that's starting to have follow-on discussion and might eventually have videos of highlights from the summit. In addition, you might watch this great talk by Dr. Peter Denning who has been leading this effort for over five years.

Friday, January 2, 2009

Wetware Refactorings

Brain rot.

While others were enjoying the final months of their college senioritis, I was worried about brain rot. It hit me that in order to be bad or at best mediocre at something, all I had to do was... nothing. From a boring job to a bad marriage, it seemed that the secret to mediocrity was to coast along and not make any adjustments or actively work to make things better. Towards graduation, I knew that I had the option of going into the workplace and just let things happen, stop actively learning, and let my brain rot. It'd take awhile for people to realize that I'd stopped caring about learning. But by then, I'd be locked into a job where it didn't matter.

The alternative was to force myself to keep learning and accept the fact that it would never be easy. Fearful of brain rot, I decided to pick this second path. But I was on my own. Unlike school, there wouldn't be anyone forcing me to learn new things. Even with this decision, there was a slight feeling I was part of a losing game.

Throughout my school years, the consistent message about the brain was that you were given a limited number of brain cells at birth and you had to do your best to keep all the ones you could. At best, you could hope to somehow mature the ones that remained. Popular culture promoted this message and sometimes joked about it, such as Cliff's "Buffalo Theory" on Cheers:

"Well you see, Norm, it's like this... A herd of buffalo can only move as fast as the slowest buffalo and when the herd is hunted, it is the slowest and weakest ones at the back that are killed first. This natural selection is good for the herd as a whole, because the general speed and health of the whole group keeps improving by the regular killing of the weakest members. In much the same way, the human brain can only operate as fast as the slowest brain cells. Now, as we know, excessive drinking of alcohol kills brain cells. But naturally, it attacks the slowest and weakest brain cells first. In this way, regular consumption of beer eliminates the weaker brain cells, making the brain a faster and more efficient machine. And that, Norm, is why you always feel smarter after a few beers."

I kept this depressing view of brain cell scarcity until I learned about neurogenesis:

Fortunately, professor Elizabeth Gould thought otherwise. In a discovery that turned the field on its ear, she discovered neurogenesis -- the continued birth of new brain cells throughout adulthood. But here's the funny part. The reason researchers had never witnessed neurogenesis previously was because of the environment of their test subjects.

If you're a lab animal stuck in a cage, you will never grow new neurons.

If you're a programmer stuck in a drab cubicle, you will never grow new neurons.

On the other hand, in a rich environment with things to learn, observe, and interact with, you will grow plenty of new neurons and new connections between them.

For decades, scientists were misled because an artificial environment (sterile lab cages) created artificial data. Once again, context is key. Your working environment needs to be rich in sensory opportunities, or else it will literally cause brain damage. Pragmatic Thinking and Learning, p.67

Neurobics

While listening to a segment on Radio Lab, I finally realized that the brain doesn't have one central command structure that interprets the world. It's more likely that each sensory organ feeds in its own "note" to your mind and it's the combined musical "chord" that you perceive as a thought or memory.

In an attempt to take advantage of this discovery, I checked out the book "Keep Your Brain Alive: 83 neurobic exercises to help prevent memory loss and increase mental fitness." It was a bit odd for me to read since the book is targeted at people twice my age, but it highlighted the interesting concept of "neurobics" which are like neural aerobics. The book explains how neurobics take advantage of the decentralized nature of your brain:

      1. Involve one or more of your senses in a novel context. By blunting the sense you normally use, force yourself to rely on other senses to do an ordinary task. For instance: Get dressed for work with your eyes closed.
      2. Engage your attention To stand out from the background of everyday events and make your brain go into alert mode, an activity has to be unusual, fun, surprising, engage your emotions, or have meaning for you. (e.g. turn the pictures on your desk upside down)
      3. Break a routine activity in an unexpected, nontrivial way. Novelty just for its own sake is not highly neurobic. Routine breakers include taking a completely new route to work and shopping at a farmers market instead of a supermarket.
        - Keep Your Brain Alive, pp.33-34

Some of my favorite suggestions from the book include: My desk with the chessboard

  • Shower with your eyes closed. (p.43)
  • Brushing roulette: brush your teeth with your nondominant hand. (p.44) This also works for other routine activities like using your mouse.
  • The sightless start: enter and get ready to start your car with your eyes closed. (p.55)
  • Take brain breaks: a walk or social lunch break can help invigorate the mind. (p.81)
  • Have an ongoing chess game.

The last was the most interesting, and had me involve coworkers:

We know of one office where a chessboard was left out near the water cooler. Any employee could come to the board (preferably during a break), assess the situation, and make a move. It was an ongoing game, with no known players, and no winners or losers.

Even a novice chess player will weigh dozens of possible moves, attempt to visualize the consequences of each one, then select the move that offers some strategic advantage. This type of "Random-player" chess game doesn't allow anyone to develop a long-term strategy. But it does require visual-spatial thinking that is different from what most of us do at work. The brief gear switching provides a break from verbal, left-brain activities and lets the "working brain" take a breather. - Keep Your Brain Alive, p.83

In my case, the chess game didn't work out exactly like the book described, but it has been fun. I set up a small magnetic chess board on my desk and encouraged anyone to come and make a move. I thought it'd be interesting to have clear winner and loser. This posed a challenge of how to keep track of the game that had no time limit or fixed set of players. I used a business card to track of whose turn it is. When the side with the printing on it is turned up, it means that it's black's turn; otherwise it's white's turn. This "protocol" has worked out well and has allowed several coworkers to play throughout the day. As a side benefit, it's caused me to actively work at getting better at chess.

Refactoring Your Wetware

Last week, I read through Andy Hunt's excellent "Pragmatic Thinking & Learning: Refactor Your Wetware" that has many more pragmatic brain tips that are written from a programmer perspective. The book is centered around the Dreyfus Model of skill acquisition where learning starts at a beginner stage where you need a lot of rules to function and then transitions all the way up to an expert level where you rely heavily on intuition. Here are just a few of the many ideas presented in the book:

    • Add sensory experience to engage more of your brain: continuing on the neurobics theme, Andy Hunt has some interesting ideas (pp.74-75):

      • Instead of UML diagrams, try toy blocks or Lego bricks.
      • Use physical CRC cards to engage your touch and spatial senses.
      • Draw a picture ("not UML or an official diagram; just a picture")
      • Describe your design verbally.
      • Act out your software design with coworkers.
    • Create a "Pragmatic Investment Plan" where you treat your skills and what you want to learn with at least as much care as you take with your finances and apply similar skills (e.g. diversify, have short and long term goals). Make sure your goals are Specific, Measurable, Achievable, Relevant, and Time-Boxed (SMART) (pp.154-156)
    • Consider starting a reading group with people in the office to help learn a skill from a book. (p.164)
    • Make sure you have freedom to explore ideas by having a "starter kit" infrastructure of version control, unit testing, and build automation. This gives you the safety to fail, which is often needed when learning. (p.192)
    • Unscaffolding: artificially make things harder than they need to be so that when you have to do it for real, it seems a lot easier (e.g. train for running by tying weights to your ankles). I like how the author claims that this is why Ruby programmers should at least spend some time working in C++ (p.206)
    • Develop Your Exocortex: Have a place outside your brain where you can store and organize thoughts. The author recommends a personal wiki. I personally prefer to use Google Notebook. (p.221)

Be the Query

I enjoyed the discussion in the book on changing your perspective, especially when it comes to feeling like you're inside the system you're creating:

Imagine yourself as an integral component of the problem you're working on: suppose you are the database query or the packet on the network. When you get tired of waiting in line, what will you do? Who would you tell? - (p.107)

I think that this approach can help lead to a richer Object Thinking in software designs. For example, if you can picture yourself as a database query, I think you'll have much more empathy for what's going on. This thinking might influence your design to avoid painful table scans and favor indexes.

My oblique strategies coffee cup and my Rubik's cube that I like to play with.

Another way to change your perspective is to use oblique strategies:

These questions and statements force you to draw analogies and to think more deeply about the problem. They are a great resource to draw on when you' get stuck. - (p.108)

Some example strategies include:

  • Use fewer notes
  • Don't be afraid of things because they're easy to do.
  • What wouldn't you do?

I printed a bunch of cards with each strategy on them and keep them by my desk in a coffee mug. It's a new addition, so I don't have long term results, but it is fun to take one out and think of how it might relate to a problem I'm working on when I get stuck.

Keep Your Brain Cache Warm

Many brain changing ideas require you to focus. This is hard to do when our jobs often require us to multitask or our culture encourages us to check email frequently in order to respond quickly and look good among our peers. There are dangers to this:

Scientists agree that trying to focus on several things at once means you'll do poorly at each of them.

And if that wasn't bad enough, a controversial study done in the United Kingdom noted that if you constantly interrupt your task to check email or respond to an IM text message, your effective IQ drops ten points.

By comparison, smoking a marijuana joint drops your IQ a mere four points. - Pragmatic Thinking and Learning, p.228

This is really hard to overcome. Since reading this, I've tried to avoid checking Outlook more than once an hour. After each check, I shut it down to prevent the distraction. I've found that it's helpful to actually shut it down; otherwise it's too tempting to ALT+TAB to it or get lured in by a notification message.

If you replace "phone" with "email", you can see that Peopleware described over two decades ago why most people are working far below their potential of being in a state of "flow" where you actually get stuff done:

If the average incoming [email] takes five minutes and your reimmersion period is fifteen minutes, the total cost of that [email] in flow time (work time) lost is twenty minutes. A dozen [emails] use up half a day. A dozen other interruptions and the rest of the work day is gone. This is what guarantees, "You never get anything done around here between 9 and 5." - Peopleware 2nd edition, p.63

Therefore, for the sake of your sanity and focus, be careful before flushing your brain cache by doing a context switch. The time saved can be used for more creative thinking and getting things done.

Is this All a Bunch of Hooey?

I admit that some of these ideas sound silly. I haven't yet worked up the courage to bring Legos to a design meeting yet, but I can see how it would help use a different neural pathway and activate more gray matter. Surely that couldn't hurt, right? It'd probably bring a new perspective and overcome a road block. This is probably why some of the most creative companies encourage playful behavior at work. This might be seen as a horrible waste of time to traditional management (and shareholders for that matter), but it's much more likely to use more of your brain which in turn might create better solutions.

One alternative to all of this is to just coast along and not take thinking and learning into your own hands. Left on their own, your company is likely to put you through a sheep dip using a "training course" in an artificial environment that will easily wear off just like a farmer might dunk his sheep into a chemical bath to get rid of parasites which will also wear off and need to be repeated.

While sometimes uncomfortable at first, I've enjoyed incorporating some "wetware refactoring" into my daily life as a way to overcome brain rot and aid learning and thinking. It sure beats a sheep dip.

What do you do? Do you have specific wetware refactorings that you do? I'd love to hear them; the "crazier" they are the more likely they'll probably activate different neural pathways and probably lead to a better result. Let me know in the comments.