Wednesday, June 10, 2009

The First Few Milliseconds of an HTTPS Connection

Convinced from spending hours reading rave reviews, Bob eagerly clicked "Proceed to Checkout" for his gallon of Tuscan Whole Milk and...

Whoa! What just happened?

In the 220 milliseconds that flew by, a lot of interesting stuff happened to make Firefox change the address bar color and put a lock in the lower right corner. With the help of Wireshark, my favorite network tool, and a slightly modified debug build of Firefox, we can see exactly what's going on.

By agreement of RFC 2818, Firefox knew that "https" meant it should connect to port 443 at Amazon.com:

Most people associate HTTPS with SSL (Secure Sockets Layer) which was created by Netscape in the mid 90's. This is becoming less true over time. As Netscape lost market share, SSL's maintenance moved to the Internet Engineering Task Force (IETF). The first post-Netscape version was re-branded as Transport Layer Security (TLS) 1.0 which was released in January 1999. It's rare to see true "SSL" traffic given that TLS has been around for 10 years.

Client Hello

TLS wraps all traffic in "records" of different types. We see that the first byte out of our browser is the hex byte 0x16 = 22 which means that this is a "handshake" record:

The next two bytes are 0x0301, which indicates that this is a version 3.1 record and shows that TLS 1.0 is essentially SSL 3.1.

The handshake record is broken out into several messages. The first is our "Client Hello" message (0x01). There are a few important things here:

  • Random:


    There are four bytes representing the current Coordinated Universal Time (UTC) in the Unix epoch format, which is the number of seconds since January 1, 1970. In this case, 0x4a2f07ca. It's followed by 28 random bytes. This will be used later on.
  • Session ID:


    Here it's empty/null. If we had previously connected to Amazon.com a few seconds ago, we could potentially resume a session and avoid a full handshake.
  • Cipher Suites:


    This is a list of all of the encryption algorithms that the browser is willing to support. Its top pick is a very strong choice of "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA" followed by 33 others that it's willing to accept. Don't worry if none of that makes sense. We'll find out later that Amazon doesn't pick our first choice anyway.
  • server_name extension:


    This is a way to tell Amazon.com that our browser is trying to reach https://www.amazon.com/. This is really convenient because our TLS handshake occurs long before any HTTP traffic. HTTP has a "Host" header which allows cost-cutting Internet hosting companies to pile hundreds of websites onto a single IP address. SSL has traditionally required a different IP for each site, but this extension allows the server to respond with the appropriate certificate that the browser is looking for. If nothing else, this extension should allow an extra week or so of IPv4 addresses.
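The four timestamp bytes in the "Random" field can be decoded by hand. Here is a minimal Python sketch, assuming only the value 0x4a2f07ca quoted above (the variable names are made up for illustration):

```python
from datetime import datetime, timezone

# First four bytes of the ClientHello "Random" field, as quoted above.
gmt_unix_time = 0x4A2F07CA

# Interpret the value as seconds since January 1, 1970 (UTC).
client_time = datetime.fromtimestamp(gmt_unix_time, tz=timezone.utc)
print(client_time.isoformat())  # 2009-06-10T01:09:30+00:00
```

The decoded time lines up with the "Date" header that shows up in Amazon's HTTP reply at the end of this walkthrough.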

Server Hello

Amazon.com replies with a handshake record that's a massive two packets in size (2,551 bytes). The record has version bytes of 0x0301 meaning that Amazon agreed to our request to use TLS 1.0. This record has three sub-messages with some interesting data:

  1. "Server Hello" Message (2):

    • We get the server's four byte Unix epoch time representation and its 28 random bytes that will be used later.
    • A 32 byte session ID in case we want to reconnect without a big handshake.
    • Of the 34 cipher suites we offered, Amazon picked "TLS_RSA_WITH_RC4_128_MD5" (0x0004). This means that it will use the "RSA" public key algorithm to verify certificate signatures and exchange keys, the RC4 encryption algorithm to encrypt data, and the MD5 hash function to verify the contents of messages. We'll cover these in depth later on. I personally think Amazon had selfish reasons for choosing this cipher suite. Of the ones on the list, it was the one that was least CPU intensive to use so that Amazon could crowd more connections onto each of their servers. A much less likely possibility is that they wanted to pay special tribute to Ron Rivest, who created all three of these algorithms.
  2. Certificate Message (11):


    • This message takes a whopping 2,464 bytes and is the certificate that the client can use to validate Amazon's identity. It isn't anything fancy. You can view most of its contents in your browser:


  3. "Server Hello Done" Message (14):


    • This is a zero byte message that tells the client that it's done with the "Hello" process and indicates that the server won't be asking the client for a certificate.

Checking out the Certificate

The browser has to figure out if it should trust Amazon.com. In this case, it's using certificates. It looks at Amazon's certificate and sees that the current time is between the "not before" time of August 26, 2008 and the "not after" time of August 27, 2009. It also checks to make sure that the certificate's public key is authorized for exchanging secret keys.

Why should we trust this certificate?

Attached to the certificate is a "signature" that is just a really long number in big-endian format:

Anyone could have sent us these bytes. Why should we trust this signature? To answer that question, we need to make a speedy detour into mathemagic land:

Interlude: A Short, Not Too Scary, Guide to RSA

People sometimes wonder if math has any relevance to programming. Certificates give a very practical example of applied math. Amazon's certificate tells us that we should use the RSA algorithm to check the signature. RSA was created in the 1970's by MIT professors Ron *R*ivest, Adi *S*hamir, and Len *A*dleman who found a clever way to combine ideas spanning 2000 years of math development to come up with a beautifully simple algorithm:

You pick two huge prime numbers "p" and "q." Multiply them to get "n = p*q." Next, you pick a small public exponent "e" which is the "encryption exponent" and a specially crafted inverse of "e" called "d" as the "decryption exponent." You then make "n" and "e" public and keep "d" as secret as you possibly can and then throw away "p" and "q" (or keep them as secret as "d"). It's really important to remember that "e" and "d" are inverses of each other (their product leaves a remainder of 1 when divided by (p-1)*(q-1)).

Now, if you have some message, you just need to interpret its bytes as a number "M." If you want to "encrypt" a message to create a "ciphertext", you'd calculate:

C ≡ M^e (mod n)

This means that you multiply "M" by itself "e" times. The "mod n" means that we only take the remainder when dividing by "n" (the "modulus"). For example, 11 AM + 3 hours ≡ 2 (PM) (mod 12 hours). The recipient knows "d" which allows them to invert the message to recover the original message:

C^d ≡ (M^e)^d ≡ M^(e*d) ≡ M^1 ≡ M (mod n)

Just as interesting is that the person with "d" can "sign" a document by raising a message "M" to the "d" exponent:

M^d ≡ S (mod n)

This works because the signer makes public "S", "M", "e", and "n." Anyone can verify the signature "S" with a simple calculation:

S^e ≡ (M^d)^e ≡ M^(d*e) ≡ M^(e*d) ≡ M^1 ≡ M (mod n)

Public key cryptography algorithms like RSA are often called "asymmetric" algorithms because the encryption key (in our case, "e") is not equal to (i.e. "symmetric" with) the decryption key "d". Reducing everything "mod n" makes it impossible to use the easy techniques that we're used to such as normal logarithms. The magic of RSA works because you can calculate/encrypt C ≡ M^e (mod n) very quickly, but it is really hard to calculate/decrypt C^d ≡ M (mod n) without knowing "d." As we saw earlier, "d" is derived from "p" and "q", and recovering those by factoring "n" is a tough problem.
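To make the arithmetic concrete, here is a toy sketch in Python. The primes, exponent, and message are tiny, made-up values chosen only so the numbers fit on a line. Real keys use primes hundreds of digits long, and real implementations add padding before encrypting or signing.

```python
# Toy RSA with deliberately tiny, hypothetical primes (never use sizes like this).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # kept secret, just like p and q
e = 17                     # public "encryption exponent"
d = pow(e, -1, phi)        # private "decryption exponent" (needs Python 3.8+)

M = 65                     # the message, interpreted as a number smaller than n

C = pow(M, e, n)           # encrypt: C = M^e mod n
recovered = pow(C, d, n)   # decrypt: C^d mod n gives M back

S = pow(M, d, n)           # sign: S = M^d mod n
verified = pow(S, e, n)    # verify: S^e mod n gives M back

print(n, d, C, recovered, verified)
```

Python's three-argument pow() does the modular exponentiation without ever building the full power, which is what keeps the same operations practical with 2048 bit numbers.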

Verifying Signatures

The big thing to keep in mind with RSA in the real world is that all of the numbers involved have to be big to make things really hard to break using the best algorithms that we have. How big? Amazon.com's certificate was "signed" by "VeriSign Class 3 Secure Server CA." From the certificate, we see that this VeriSign modulus "n" is 2048 bits long which has this 617 digit base-10 representation:

1890572922 9464742433 9498401781 6528521078 8629616064 3051642608 4317020197 7241822595 6075980039 8371048211 4887504542 4200635317 0422636532 2091550579 0341204005 1169453804 7325464426 0479594122 4167270607 6731441028 3698615569 9947933786 3789783838 5829991518 1037601365 0218058341 7944190228 0926880299 3425241541 4300090021 1055372661 2125414429 9349272172 5333752665 6605550620 5558450610 3253786958 8361121949 2417723618 5199653627 5260212221 0847786057 9342235500 9443918198 9038906234 1550747726 8041766919 1500918876 1961879460 3091993360 6376719337 6644159792 1249204891 7079005527 7689341573 9395596650 5484628101 0469658502 1566385762 0175231997 6268718746 7514321

(Good luck trying to find "p" and "q" from this "n" - if you could, you could generate real-looking VeriSign certificates.)

VeriSign's "e" is 2^16 + 1 = 65537. Of course, they keep their "d" value secret, probably on a safe hardware device protected by retinal scanners and armed guards. Before signing, VeriSign checked the validity of the contents that Amazon.com claimed on its certificate using a real-world "handshake" that involved looking at several of their business documents. Once VeriSign was satisfied with the documents, they used the SHA-1 hash algorithm to get a hash value of the certificate that had all the claims. In Wireshark, the full certificate shows up as the "signedCertificate" part:

It's sort of a misnomer since it actually means that those are the bytes that the signer is going to sign and not the bytes that already include a signature.

The actual signature, "S", is simply called "encrypted" in Wireshark. If we raise "S" to VeriSign's public "e" exponent of 65537 and then take the remainder when divided by the modulus "n", we get this "decrypted" signature hex value:

0001FFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFF00302130 0906052B0E03021A 05000414C19F8786 871775C60EFE0542 E4C2167C830539DB

Per the PKCS #1 v1.5 standard, the first byte is "00" and it "ensures that the encryption block, [when] converted to an integer, is less than the modulus." The second byte of "01" indicates that this is a private key operation (i.e. it's a signature). This is followed by a lot of "FF" bytes that are used to pad the result to make sure that it's big enough. The padding is terminated by a "00" byte. It's followed by "30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14" which is the PKCS #1 v2.1 way of specifying the SHA-1 hash algorithm. The last 20 bytes are the SHA-1 hash digest of the bytes in "signedCertificate."

Since the decrypted value is properly formatted and the last bytes are the same hash value that we can calculate independently, we can assume that whoever knew "VeriSign Class 3 Secure Server CA"'s private key "signed" it. We implicitly trust that only VeriSign knows the private key "d."
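The format check described above can be sketched in a few lines of Python. The function name is made up for illustration, and the block is rebuilt from the pieces quoted above (218 "FF" bytes bring it to 256 bytes) rather than computed from the real signature. Actually comparing the digest would also require the "signedCertificate" bytes, which aren't reproduced here.

```python
# DigestInfo prefix that identifies SHA-1, as quoted above.
SHA1_DIGEST_INFO = bytes.fromhex("3021300906052B0E03021A05000414")

def extract_sha1_digest(block: bytes) -> bytes:
    """Check the 00 01 FF..FF 00 layout and return the trailing SHA-1 digest."""
    if block[:2] != b"\x00\x01":
        raise ValueError("not a PKCS #1 v1.5 signature block")
    separator = block.index(b"\x00", 2)      # first 00 byte after the padding
    padding = block[2:separator]
    if len(padding) < 8 or set(padding) != {0xFF}:
        raise ValueError("bad padding")
    rest = block[separator + 1:]
    if not rest.startswith(SHA1_DIGEST_INFO):
        raise ValueError("unexpected hash algorithm identifier")
    digest = rest[len(SHA1_DIGEST_INFO):]
    if len(digest) != 20:
        raise ValueError("SHA-1 digests are 20 bytes long")
    return digest

# Rebuild the 256 byte "decrypted" value shown above from its parts.
quoted_digest = bytes.fromhex("C19F8786871775C60EFE0542E4C2167C830539DB")
block = b"\x00\x01" + b"\xff" * 218 + b"\x00" + SHA1_DIGEST_INFO + quoted_digest

digest = extract_sha1_digest(block)
print(digest.hex().upper())
```

A verifier would then hash the "signedCertificate" bytes with SHA-1 itself and accept the signature only if the two digests match.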

We can repeat the process to verify that "VeriSign Class 3 Secure Server CA"'s certificate was signed by VeriSign's "Class 3 Public Primary Certification Authority."

But why should we trust that? There are no more levels on the trust chain.

The top "VeriSign Class 3 Public Primary Certification Authority" was signed by itself. This certificate has been built into Mozilla products as an implicitly trusted good certificate since version 1.4 of certdata.txt in the Network Security Services (NSS) library. It was checked-in on September 6, 2000 by Netscape's Robert Relyea with the following comment:

"Make the framework compile with the rest of NSS. Include a 'live' certdata.txt with those certs we have permission to push to open source (additional certs will be added as we get permission from the owners)."

This decision has had a relatively long impact since the certificate has a validity range of January 28, 1996 - August 1, 2028.

As Ken Thompson explained so well in his "Reflections on Trusting Trust", you ultimately have to implicitly trust somebody. There is no way around this problem. In this case, we're implicitly trusting that Robert Relyea made a good choice. We also hope that Mozilla's built-in certificate policy is reasonable for the other built-in certificates.

One thing to keep in mind here is that all these certificates and signatures were simply used to form a trust chain. On the public Internet, VeriSign's root certificate is implicitly trusted by Firefox long before you go to any website. In a company, you can create your own root certificate authority (CA) that you can install on everyone's machine.

Alternatively, you can get around having to pay companies like VeriSign and avoid certificate trust chains altogether. Certificates are used to establish trust by using a trusted third-party (in this case, VeriSign). If you have a secure means of sharing a secret "key", such as whispering a long password into someone's ear, then you can use that pre-shared key (PSK) to establish trust. There are extensions to TLS to allow this, such as TLS-PSK, and my personal favorite, TLS with Secure Remote Password (SRP) extensions. Unfortunately, these extensions aren't nearly as widely deployed and supported, so they're usually not practical. Additionally, these alternatives impose a burden that we have to have some other secure means of communicating the secret that's more cumbersome than what we're trying to establish with TLS (otherwise, why wouldn't we use that for everything?).

One final check that we need to do is to verify that the host name on the certificate is what we expected. Nelson Bolyard's comment in the SSL_AuthCertificate function explains why:

/* cert is OK. This is the client side of an SSL connection.
* Now check the name field in the cert against the desired hostname.
* NB: This is our only defense against Man-In-The-Middle (MITM) attacks! */

This check helps protect against a man-in-the-middle attack because we are implicitly trusting that the people on the certificate trust chain wouldn't do something bad, like sign a certificate claiming to be from Amazon.com unless it actually was Amazon.com. If an attacker is able to modify your DNS server by using a technique like DNS cache poisoning, you might be fooled into thinking you're at a trusted site (like Amazon.com) because the address bar will look normal. This last check implicitly trusts certificate authorities to stop these bad things from happening.

Pre-Master Secret

We've verified some claims about Amazon.com and know its public encryption exponent "e" and modulus "n." Anyone listening in on the traffic can know this as well (as evidenced by the fact that we can read it straight out of Wireshark captures). Now we need to create a random secret key that an eavesdropper/attacker can't figure out. This isn't as easy as it sounds. In 1996, researchers figured out that Netscape Navigator 1.1 was using only three sources to seed their pseudo-random number generator (PRNG). The sources were: the time of day, the process id, and the parent process id. As the researchers showed, these "random" sources aren't that random and were relatively easy to figure out.

Since everything else was derived from these three "random" sources, it was possible to "break" the SSL "security" in 25 seconds on a 1996 era machine. If you still don't believe that finding randomness is hard, just ask the Debian OpenSSL maintainers. If you mess it up, all the security built on top of it is suspect.

On Windows, random numbers used for cryptographic purposes are generated by calling the CryptGenRandom function that hashes bits sampled from over 125 sources. Firefox uses this function along with some bits derived from its own function to seed its pseudo-random number generator.

The 48 byte "pre-master secret" random value that's generated isn't used directly, but it's very important to keep it secret since a lot of things are derived from it. Not surprisingly, Firefox makes it hard to find out this value. I had to compile a debug version and set the SSLDEBUGFILE and SSLTRACE environment variables to see it.

In this particular session, the pre-master secret showed up in the SSLDEBUGFILE as:

4456: SSL[131491792]: Pre-Master Secret [Len: 48]
03 01 bb 7b 08 98 a7 49 de e8 e9 b8 91 52 ec 81 ...{...I.....R..
4c c2 39 7b f6 ba 1c 0a b1 95 50 29 be 02 ad e6 L.9{......P)....
ad 6e 11 3f 20 c4 66 f0 64 22 57 7e e1 06 7a 3b .n.? .f.d"W~..z;

Note that it's not completely random. The first two bytes are, by convention, the TLS version (03 01).

Trading Secrets

We now need to get this secret value over to Amazon.com. By Amazon's wishes of "TLS_RSA_WITH_RC4_128_MD5", we will use RSA to do this. You could make your input message equal to just the 48 byte pre-master secret, but the Public Key Cryptography Standard (PKCS) #1, version 1.5 RFC tells us that we should pad these bytes with random data to make the input equal to exactly the size of the modulus (1024 bits/128 bytes). This makes it harder for an attacker to determine our pre-master secret. It also gives us one last chance to protect ourselves in case we did something really bone-headed, like reusing the same secret. If we reused the key, the eavesdropper would likely see a different value placed on the network due to the random padding.

Again, Firefox makes it hard to see these random values. I had to insert debugging statements into the padding function to see what was going on:

/* Debug only: append the padded plaintext block to a log file as hex. */
wrapperHandle = fopen("plaintextpadding.txt", "a");
if (wrapperHandle != NULL)
{
    fprintf(wrapperHandle, "PLAINTEXT = ");
    for (i = 0; i < modulusLen; i++)
    {
        fprintf(wrapperHandle, "%02X ", block[i]);
    }
    fprintf(wrapperHandle, "\r\n");
    fclose(wrapperHandle);
}

In this session, the full padded value was:

00 02 12 A3 EA B1 65 D6 81 6C 13 14 13 62 10 53 23 B3 96 85 FF 24 FA CC 46 11 21 24 A4 81 EA 30 63 95 D4 DC BF 9C CC D0 2E DD 5A A6 41 6A 4E 82 65 7D 70 7D 50 09 17 CD 10 55 97 B9 C1 A1 84 F2 A9 AB EA 7D F4 CC 54 E4 64 6E 3A E5 91 A0 06 00 03 01 BB 7B 08 98 A7 49 DE E8 E9 B8 91 52 EC 81 4C C2 39 7B F6 BA 1C 0A B1 95 50 29 BE 02 AD E6 AD 6E 11 3F 20 C4 66 F0 64 22 57 7E E1 06 7A 3B
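The layout of this block can be checked with a short Python sketch. The function name is made up for illustration. The layout itself (a "00 02" header, non-zero random padding, a "00" separator, then the payload) is the PKCS #1 v1.5 encryption format.

```python
# The 128 byte padded value quoted above, 16 bytes per line.
PADDED_HEX = """
00 02 12 A3 EA B1 65 D6 81 6C 13 14 13 62 10 53
23 B3 96 85 FF 24 FA CC 46 11 21 24 A4 81 EA 30
63 95 D4 DC BF 9C CC D0 2E DD 5A A6 41 6A 4E 82
65 7D 70 7D 50 09 17 CD 10 55 97 B9 C1 A1 84 F2
A9 AB EA 7D F4 CC 54 E4 64 6E 3A E5 91 A0 06 00
03 01 BB 7B 08 98 A7 49 DE E8 E9 B8 91 52 EC 81
4C C2 39 7B F6 BA 1C 0A B1 95 50 29 BE 02 AD E6
AD 6E 11 3F 20 C4 66 F0 64 22 57 7E E1 06 7A 3B
"""

def strip_encryption_padding(block: bytes) -> bytes:
    """Return the payload of a 00 02 <non-zero padding> 00 <payload> block."""
    if block[:2] != b"\x00\x02":
        raise ValueError("not a PKCS #1 v1.5 encryption block")
    separator = block.index(b"\x00", 2)   # padding bytes are never zero
    if separator - 2 < 8:
        raise ValueError("padding must be at least 8 bytes")
    return block[separator + 1:]

block = bytes.fromhex("".join(PADDED_HEX.split()))
pre_master_secret = strip_encryption_padding(block)
print(len(block), len(pre_master_secret), pre_master_secret[:2].hex())
```

The payload that falls out is the same 48 byte pre-master secret from the SSLDEBUGFILE dump, starting with the "03 01" version bytes.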

Firefox took this value and calculated "C ≡ M^e (mod n)" to get the value we see in the "Client Key Exchange" record:

Finally, Firefox sent out one last unencrypted message, a "Change Cipher Spec" record:

This is Firefox's way of telling Amazon that it's going to start using the agreed upon secret to encrypt its next message.

Deriving the Master Secret

If we've done everything correctly, both sides (and only those sides) now know the 48 byte (384 bit) pre-master secret. There's a slight trust issue here from Amazon's perspective: the pre-master secret just has bits that were generated by the client. They don't take anything into account from the server or anything we said earlier. We'll fix that by computing the "master secret." Per the spec, this is done by calculating:

master_secret = PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random)

The "pre_master_secret" is the secret value we sent earlier. The "master secret" is simply a string whose ASCII bytes (i.e. "6d 61 73 74 65 72 ...") are used. We then concatenate the random values that were sent in the ClientHello and ServerHello (from Amazon) messages that we saw at the beginning.

The PRF is the "Pseudo-Random Function" that's also defined in the spec and is quite clever. It combines the secret, the ASCII label, and the seed data we give it by using the keyed-Hash Message Authentication Code (HMAC) versions of both MD5 and SHA-1 hash functions. Half of the secret is sent to each hash function and the two outputs are XOR'd together. It's clever because it is quite resistant to attack, even in the face of weaknesses in MD5 and SHA-1. This process can feed back on itself and iterate forever to generate as many bytes as we need.
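Here is a minimal Python sketch of the TLS 1.0 PRF as RFC 2246 describes it. The function names are made up for illustration. The two 32 byte "random" values are placeholders, since only the first four bytes of the client random appear in this article, so the output will not match the real master secret from this session.

```python
import hmac

def p_hash(hash_name: str, secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand secret and seed into `length` bytes by chaining HMAC outputs."""
    output = b""
    a = seed                                                   # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hash_name).digest()            # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hash_name).digest()
    return output[:length]

def tls10_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.0 PRF: P_MD5(first half) XOR P_SHA-1(second half)."""
    half = (len(secret) + 1) // 2                # halves share a byte if the length is odd
    s1, s2 = secret[:half], secret[len(secret) - half:]
    md5_part = p_hash("md5", s1, label + seed, length)
    sha1_part = p_hash("sha1", s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_part, sha1_part))

# Pre-master secret from the SSLDEBUGFILE dump earlier in the article.
pre_master_secret = bytes.fromhex(
    "03 01 bb 7b 08 98 a7 49 de e8 e9 b8 91 52 ec 81"
    "4c c2 39 7b f6 ba 1c 0a b1 95 50 29 be 02 ad e6"
    "ad 6e 11 3f 20 c4 66 f0 64 22 57 7e e1 06 7a 3b"
)
client_random = bytes(32)            # placeholder, NOT the real value
server_random = bytes([0xAA]) * 32   # placeholder, NOT the real value

master_secret = tls10_prf(
    pre_master_secret, b"master secret", client_random + server_random, 48
)
print(len(master_secret))
```

The same function covers the other PRF uses in the handshake by swapping the label and asking for a different number of bytes.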

Following this procedure, we obtain a 48 byte "master secret" of

4C AF 20 30 8F 4C AA C5 66 4A 02 90 F2 AC 10 00 39 DB 1D E0 1F CB E0 E0 9D D7 E6 BE 62 A4 6C 18 06 AD 79 21 DB 82 1D 53 84 DB 35 A7 1F C1 01 19

Generating Lots of Keys

Now that both sides have a "master secret", the spec shows us how we can derive all the session keys we need by using the PRF to create a "key block" that we will pull data from:

key_block = PRF(SecurityParameters.master_secret, "key expansion", SecurityParameters.server_random + SecurityParameters.client_random);

The bytes from "key_block" are used to populate the following:

client_write_MAC_secret[SecurityParameters.hash_size]
server_write_MAC_secret[SecurityParameters.hash_size]
client_write_key[SecurityParameters.key_material_length]
server_write_key[SecurityParameters.key_material_length]
client_write_IV[SecurityParameters.IV_size]
server_write_IV[SecurityParameters.IV_size]

Since we're using a stream cipher instead of a block cipher like the Advanced Encryption Standard (AES), we don't need the Initialization Vectors (IVs). Therefore, we just need a Message Authentication Code (MAC) key for each side, two in total, that are 16 bytes (128 bits) each since the specified MD5 hash digest size is 16 bytes. In addition, the RC4 cipher uses a 16 byte (128 bit) key, and each side needs one of those as well. All told, we need 2*16 + 2*16 = 64 bytes from the key block.

Running the PRF, we get these values:

client_write_MAC_secret = 80 B8 F6 09 51 74 EA DB 29 28 EF 6F 9A B8 81 B0
server_write_MAC_secret = 67 7C 96 7B 70 C5 BC 62 9D 1D 1F 4A A6 79 81 61
client_write_key = 32 13 2C DD 1B 39 36 40 84 4A DE E5 6C 52 46 72
server_write_key = 58 36 C4 0D 8C 7C 74 DA 6D B7 34 0A 91 B6 8F A7
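Carving up the key block is just slicing in the order listed above. In this Python sketch the function name is made up for illustration, and the 64 byte key block is reassembled from the four values printed above rather than recomputed with the PRF, because the full client and server randoms aren't shown in this article.

```python
def split_key_block(key_block: bytes, mac_size: int, key_size: int, iv_size: int) -> dict:
    """Slice a key block into the six values, in the order the spec lists them."""
    layout = [
        ("client_write_MAC_secret", mac_size),
        ("server_write_MAC_secret", mac_size),
        ("client_write_key", key_size),
        ("server_write_key", key_size),
        ("client_write_IV", iv_size),
        ("server_write_IV", iv_size),
    ]
    parts, offset = {}, 0
    for name, size in layout:
        parts[name] = key_block[offset:offset + size]
        offset += size
    return parts

# Key block reassembled from the values above (an assumption for illustration).
key_block = bytes.fromhex(
    "80 B8 F6 09 51 74 EA DB 29 28 EF 6F 9A B8 81 B0"
    "67 7C 96 7B 70 C5 BC 62 9D 1D 1F 4A A6 79 81 61"
    "32 13 2C DD 1B 39 36 40 84 4A DE E5 6C 52 46 72"
    "58 36 C4 0D 8C 7C 74 DA 6D B7 34 0A 91 B6 8F A7"
)

# RC4_128_MD5: 16 byte MAC secrets, 16 byte keys, and no IVs.
keys = split_key_block(key_block, mac_size=16, key_size=16, iv_size=0)
for name, value in keys.items():
    print(name, "=", value.hex(" ").upper())
```

With a block cipher suite, the same call would use a non-zero iv_size and ask the PRF for a correspondingly longer key block.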

Prepare to be Encrypted!

The last handshake message the client sends out is the "Finished" message. This is a clever message that proves that no one tampered with the handshake and it proves that we know the key. The client takes all bytes from all handshake messages and puts them into a "handshake_messages" buffer. We then calculate 12 bytes of "verify_data" using the pseudo-random function (PRF) with our master secret, the label "client finished", and an MD5 and SHA-1 hash of "handshake_messages":

verify_data = PRF(master_secret, "client finished", MD5(handshake_messages) + SHA-1(handshake_messages)) [12]

We take the result and add a handshake header byte "0x14" to indicate "finished" and length bytes "00 00 0c" to indicate that we're sending 12 bytes of verify data. Then, like all future encrypted messages, we need to make sure the decrypted contents haven't been tampered with. Since our cipher suite in use is TLS_RSA_WITH_RC4_128_MD5, this means we use the MD5 hash function.

Some people get paranoid when they hear MD5 because it has some weaknesses. I certainly don't advocate using it as-is. However, TLS is smart in that it doesn't use MD5 directly, but rather the HMAC version of it. This means that instead of using MD5(m) directly, we calculate:

HMAC_MD5(Key, m) = MD5((Key ⊕ opad) ++ MD5((Key ⊕ ipad) ++ m))

(The ⊕ means XOR, ++ means concatenate, "opad" is the bytes "5c 5c ... 5c", and "ipad" is the bytes "36 36 ... 36").
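The formula translates almost word for word into Python. This sketch follows the HMAC definition in RFC 2104 (including zero-padding the key to MD5's 64 byte block size, which the formula above leaves implicit). The key and message are arbitrary illustration values, and the result is compared against Python's built-in hmac module.

```python
import hashlib
import hmac

MD5_BLOCK_SIZE = 64  # bytes

def hmac_md5(key: bytes, message: bytes) -> bytes:
    """HMAC-MD5 written out by hand: MD5((K xor opad) ++ MD5((K xor ipad) ++ m))."""
    if len(key) > MD5_BLOCK_SIZE:
        key = hashlib.md5(key).digest()          # overly long keys are hashed first
    key = key.ljust(MD5_BLOCK_SIZE, b"\x00")     # then zero-padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)     # Key xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)     # Key xor opad
    inner = hashlib.md5(inner_key + message).digest()
    return hashlib.md5(outer_key + inner).digest()

key = b"an example key"
message = b"an example message"
by_hand = hmac_md5(key, message)
from_library = hmac.new(key, message, "md5").digest()
print(by_hand.hex(), by_hand == from_library)
```

In practice you'd call the library version. The hand-rolled one is only here to show there's no magic hiding behind the formula.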

In particular, we calculate:

HMAC_MD5(client_write_MAC_secret, seq_num + TLSCompressed.type + TLSCompressed.version + TLSCompressed.length + TLSCompressed.fragment);

As you can see, we include a sequence number ("seq_num") along with attributes of the plaintext message (here it's called "TLSCompressed"). The sequence number foils attackers who might try to take a previously encrypted message and insert it midstream. If this occurred, the sequence numbers would definitely be different than what we expected. This also protects us from an attacker dropping a message.
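A rough Python sketch of the record MAC follows. The function name is made up for illustration, and the field widths come from the TLS 1.0 spec: an 8 byte sequence number, a 1 byte type, a 2 byte version, and a 2 byte length. The fragment is a placeholder, since the complete "Finished" message bytes aren't reproduced in this article.

```python
import hmac
import struct

def tls10_record_mac(mac_secret: bytes, seq_num: int, content_type: int,
                     version: int, fragment: bytes) -> bytes:
    """HMAC-MD5 over seq_num + type + version + length + fragment."""
    header = struct.pack("!QBHH", seq_num, content_type, version, len(fragment))
    return hmac.new(mac_secret, header + fragment, "md5").digest()

# client_write_MAC_secret from the key block section above.
client_write_MAC_secret = bytes.fromhex("80B8F6095174EADB2928EF6F9AB881B0")

# Placeholder fragment: the Finished header followed by 12 zero bytes, NOT real verify_data.
fragment = bytes.fromhex("1400000C") + bytes(12)

first = tls10_record_mac(client_write_MAC_secret, 0, 22, 0x0301, fragment)
replayed = tls10_record_mac(client_write_MAC_secret, 1, 22, 0x0301, fragment)
print(first.hex(), replayed.hex())
```

The same fragment produces a different MAC once the sequence number changes, which is what makes replayed, reordered, or dropped records detectable.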

All that's left is to encrypt these bytes.

RC4 Encryption

Our negotiated cipher suite was TLS_RSA_WITH_RC4_128_MD5. This tells us that we need to use Ron's Code #4 (RC4) to encrypt the traffic. Ron Rivest developed the RC4 algorithm to generate random bytes based on a key of up to 256 bytes (ours is 16 bytes). The algorithm is so simple you can actually memorize it in a few minutes.

RC4 begins by creating a 256-byte "S" byte array and populating it with 0 to 255. You then iterate over the array by mixing in bytes from the key. You do this to create a state machine that is used to generate "random" bytes. To generate a random byte, we shuffle around the "S" array.

Put graphically, it looks like this:

To encrypt a byte, we xor this pseudo-random byte with the byte we want to encrypt. Remember that xor'ing a bit with 1 causes it to flip. Since we're generating random numbers, on average the xor will flip half of the bits. This random bit flipping is effectively how we encrypt data. As you can see, it's not very complicated and thus it runs quickly. I think that's why Amazon chose it.
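Here's a small Python sketch of RC4 as it's commonly described: key scheduling followed by the byte generator. The function names are made up for illustration, and the demo key and plaintext are arbitrary values rather than anything from this session.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for `key`, forever."""
    # Key scheduling: start with 0..255 and mix in bytes from the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Generation: keep shuffling "S" and emit one byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]

def rc4(key: bytes, data: bytes) -> bytes:
    """XOR `data` with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, rc4_keystream(key)))

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex().upper())
print(rc4(b"Key", ciphertext))
```

Because encryption is just an XOR with the keystream, running the function a second time with the same key recovers the plaintext. In TLS each direction keeps one generator running for the whole connection instead of restarting it for every record.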

Recall that we have a "client_write_key" and a "server_write_key." This means we need to create two RC4 instances: one to encrypt what our browser sends and the other to decrypt what the server sent us.

The first few random bytes out of the "client_write" RC4 instance are "7E 20 7A 4D FE FB 78 A7 33 ..." If we xor these bytes with the unencrypted header and verify message bytes of "14 00 00 0C 98 F0 AE CB C4 ...", we'll get what appears in the encrypted portion that we can see in Wireshark:
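The XOR of the bytes quoted above is easy to work out. This short Python sketch uses only the nine keystream bytes and nine plaintext bytes given in the text, so it covers just the start of the encrypted record:

```python
keystream = bytes.fromhex("7E 20 7A 4D FE FB 78 A7 33")
plaintext = bytes.fromhex("14 00 00 0C 98 F0 AE CB C4")

# XOR each plaintext byte with the matching keystream byte.
ciphertext = bytes(k ^ p for k, p in zip(keystream, plaintext))
print(ciphertext.hex(" ").upper())  # 6A 20 7A 41 66 0B D6 6C F7
```

Note how the two "00" plaintext bytes come out as the raw keystream bytes "20 7A", which is one reason a keystream must never be reused.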

The server does almost the same thing. It sends out a "Change Cipher Spec" and then a "Finished Message" that includes all handshake messages, including the decrypted version of the client's "Finished Message." Consequently, this proves to the client that the server was able to successfully decrypt our message.

Welcome to the Application Layer!

Now, 220 milliseconds after we started, we're finally ready for the application layer. We can now send normal HTTP traffic that'll be encrypted by the TLS layer with the client RC4 write instance and decrypt traffic with the server RC4 write instance. In addition, the TLS layer will check each record for tampering by computing the HMAC_MD5 hash of the contents.

At this point, the handshake is over. Our TLS record's content type is now 23 (0x17). Encrypted traffic begins with "17 03 01" which indicates the record type and TLS version. These bytes are followed by our encrypted size, which includes the HMAC hash.
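Every TLS record starts with the same five byte header, so a tiny parser can tell handshake records from application data. In this Python sketch the function name is made up for illustration and the length bytes in the examples ("01 20" and "00 10") are hypothetical, since the actual sizes only appear in the Wireshark screenshots.

```python
import struct

CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

def parse_record_header(header: bytes):
    """Split a 5 byte TLS record header into (type name, version, length)."""
    content_type, major, minor, length = struct.unpack("!BBBH", header[:5])
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), length

# "17 03 01" as described above, followed by a hypothetical two byte length.
print(parse_record_header(bytes.fromhex("17 03 01 01 20")))
# A handshake record like the ones earlier in the connection (length again hypothetical).
print(parse_record_header(bytes.fromhex("16 03 01 00 10")))
```

The version pair (3, 1) is the "SSL 3.1" way of saying TLS 1.0.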

Encrypting the plaintext of:

GET /gp/cart/view.html/ref=pd_luc_mri HTTP/1.1
Host: www.amazon.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009060911 Minefield/3.0.10 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
...

will give us the bytes we see on the wire:

The only other interesting fact is that the sequence number increases on each record. It's now 1 (and the next record will be 2, etc).

The server does the same type of thing on its side using the server_write_key. We see its response, including the tell-tale application data header:

Decrypting this gives us:

HTTP/1.1 200 OK
Date: Wed, 10 Jun 2009 01:09:30 GMT
Server: Server
...
Cneonction: close
Transfer-Encoding: chunked

which is a normal HTTP reply that includes a non-descriptive "Server: Server" header and a misspelled "Cneonction: close" header coming from Amazon's load balancers.

TLS is just below the application layer. The HTTP server software can act as if it's sending unencrypted traffic. The only change is that it writes to a library that does all the encryption. OpenSSL is a popular open-source library for TLS.

The connection will stay open while both sides send and receive encrypted data until either side sends out a "closure alert" message and then closes the connection. If we reconnect shortly after disconnecting, we can re-use the negotiated keys (if the server still has them cached) without using public key operations, otherwise we do a completely new full handshake.

It's important to realize that application data records can be anything. The only reason "HTTPS" is special is because the web is so popular. There are lots of other TCP/IP based protocols that ride on top of TLS. For example, TLS is used by FTPS and secure extensions to SMTP. It's certainly better to use TLS than to invent your own solution. Additionally, you'll benefit from a protocol that has withstood careful security analysis.

... And We're Done!

The very readable TLS RFC covers many more details that were missed here. We covered just one single path in our observation of the 220 millisecond dance between Firefox and Amazon's server. Quite a bit of the process was affected by the TLS_RSA_WITH_RC4_128_MD5 Cipher Suite selection that Amazon made with its ServerHello message. It's a reasonable choice that slightly favors speed over security.

As we saw, if someone could secretly factor Amazon's "n" modulus into its respective "p" and "q", they could effectively decrypt all "secure" traffic until Amazon changes their certificate. Amazon counter-balances this concern with a short one year duration certificate:

One of the cipher suites that was offered was "TLS_DHE_RSA_WITH_AES_256_CBC_SHA" which uses the Diffie-Hellman key exchange that has a nice property of "forward secrecy." This means that if someone cracked the key exchange for one session, they'd be no better off when trying to decrypt any other session. One downside to this algorithm is that it requires more math with big numbers, and thus is a little more computationally taxing on a busy server. The "Advanced Encryption Standard" (AES) algorithm was present in many of the suites that we offered. It's different than RC4 in that it works on 16 byte "blocks" at a time rather than a single byte. Since its key can be up to 256 bits, many consider this to be more secure than RC4.

In just 220 milliseconds, two endpoints on the Internet came together, provided enough credentials to trust each other, set up encryption algorithms, and started to send encrypted traffic.

And to think, all of this just so Bob can buy milk.

UPDATE: I wrote a program that walks through the handshake steps mentioned in this article. I posted it to GitHub.

158 comments:

Drew said...

Great post, Jeff. I'm curious to know if there is a way to disable certain algorithms in FF? Could you cut down that list of 34 algorithms? What will Amazon choose if you make FF refuse to use TLS_RSA_WITH_RC4_128_MD5?

Jeff Moser said...

Drew: You can disable cipher suites by typing "about:config" in your address bar and then filter for "RSA"

You'll see a huge list, including "security.ssl3.rsa_rc4_128_md5", if you set this to false and then go back to Amazon.com, you'll see that it now picks "TLS_RSA_WITH_AES_256_CBC_SHA"

Amazon is willing to spend extra time using 256 bit AES, but they still really don't like the expensive Diffie-Hellman key exchange :)

Eddie P said...

Thanks.. now i feel as dumb as a rock... excuse me while i go stare blankly at the walll

Will Shaver said...

This is a fantastic post. Thanks for taking the time to write it!

robburke said...

Really superb post -- tremendously informative and an engaging read!! Thank you Jeff.

Anonymous said...

I learned something, thank you!

Anonymous said...

Thanks for this post, Jeff. Great stuff.

Jim C said...

I love it when somebody like yourself takes the time to make an ambitious post like this one - thanks for contributing this body of information to the searchable web.

- you've likely helped more people than you think you have...

sriram srinivasan said...

Thanks much for the lovely presentation. This is great work.

Eric Frenkiel said...

Fantastic post, Jeff! Your ability to clearly explain difficult concepts is unmatched =) I look forward to dropping by more often.

Anonymous said...

Thanks very much for taking the time to post this article.. Truly fascinating and informative read.

Mr T's Fashion Consultant said...

Great article, very informative. I like Wireshark too. Can you tell us how you launched Wireshark (command line options or whatever) to get the captured data? Thanks for any info.

Adam said...

At least with newer versions of Firefox, you left out an important part of many new ecommerce HTTPS connections - OCSP verification of the validity of the certificate from the CA itself, in real time.
The particular Amazon cert you used didn't make use of this (only the new EV certs do), but for the 'fully secure' new status bar and color change in FF3+ and IE8, you have to have an EV cert with a valid OCSP responder.

Gabe Sumner said...

> If nothing else, this extension should allow an extra week or so of IPv4 addresses.

:) Made me chuckle!

chkno said...

Note: RFC 4217 FTP with TLS is more often referred to as FTPS rather than SFTP. SFTP is more commonly used to refer to the SSH File Transfer Protocol.

John Doe said...

DH is not preferred because it's costlier. It only does the key exchange, so authentication still needs to be done using DSA or RSA (hence the name TLS_DHE_RSA_WITH_AES_256_CBC_SHA), while RSA can be used for both key exchange and auth. DH additionally might require 'server-key-exchange' SSL messages. Lastly, most crypto cards support a full RSA handshake operation, which can't be used for DH.

C.Santini said...

Thank you! Great post!!
my little contribute for RSA understanding:
RSA encryption by hand :)

Anonymous said...

Thank you.

Tony Oppenheim said...

Wonderful post! Coincidentally I've just been reading Steven Levy's great book "Crypto: How the Code Rebels Beat the Government Saving Privacy in the Digital Age" which tells the backstory of how RSA and many of the other encryption methods we now use came to be.

Thanks for posting!

Anonymous said...

I read about the SRP protocol about a week ago. It is so brilliant. It would be really nice if it were supported in browsers and web servers.

Nice post.

Anonymous said...

Excellent write up. Thank you for putting this together!!

Anonymous said...

What an excellent walkthrough. It was geekily exhilarating. You took the time to annotate this beautifully so the many might enjoy it. Wunderbar.

Anonymous said...

always mind your p's and q's

JeremyH said...

Nice post. One other thing that goes on even before the HTTPS connect is a DNS lookup to connect to a server called www.amazon.com. Mozilla will first see if 'www.amazon.com' is in the DNS cache and if not will start to recursively resolve the domain name, starting at the root ('.'), then the TLD ('.com.'), then the domain ('amazon.com.'). It all happens before the initial TCP connection. This looks like this:

; <<>> DiG 9.4.2 <<>> +trace www.amazon.com
;; global options: printcmd
. 297443 IN NS J.ROOT-SERVERS.NET.
. 297443 IN NS L.ROOT-SERVERS.NET.
. 297443 IN NS D.ROOT-SERVERS.NET.
. 297443 IN NS G.ROOT-SERVERS.NET.
. 297443 IN NS I.ROOT-SERVERS.NET.
. 297443 IN NS K.ROOT-SERVERS.NET.
. 297443 IN NS M.ROOT-SERVERS.NET.
. 297443 IN NS B.ROOT-SERVERS.NET.
. 297443 IN NS F.ROOT-SERVERS.NET.
. 297443 IN NS H.ROOT-SERVERS.NET.
. 297443 IN NS C.ROOT-SERVERS.NET.
. 297443 IN NS A.ROOT-SERVERS.NET.
. 297443 IN NS E.ROOT-SERVERS.NET.
;; Received 512 bytes from 68.87.71.226#53(68.87.71.226) in 32 ms

com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
;; Received 492 bytes from 199.7.83.42#53(L.ROOT-SERVERS.NET) in 60 ms

amazon.com. 172800 IN NS udns1.ultradns.net.
amazon.com. 172800 IN NS udns2.ultradns.net.
;; Received 116 bytes from 192.35.51.30#53(f.gtld-servers.net) in 102 ms

www.amazon.com. 887 IN NS ns-923.amazon.com.
www.amazon.com. 887 IN NS ns-921.amazon.com.
www.amazon.com. 887 IN NS ns-912.amazon.com.
www.amazon.com. 887 IN NS ns-911.amazon.com.
;; Received 180 bytes from 204.74.101.1#53(udns2.ultradns.net) in 20 ms

www.amazon.com. 60 IN A 72.21.207.65
;; Received 48 bytes from 72.21.192.209#53(ns-921.amazon.com) in 23 ms

spdalton said...

Brilliant post! Was a great way to see real application for the introduction to networking course I'm taking. Thanks!

Anonymous said...

Great post! Learnt something new today. Thanks.

Anonymous said...

Great post!

Anonymous said...

FTP over SSL is called FTPS. There's enough confusion already regarding SFTP vs. FTPS so please fix your mistake.

Vicente said...

Nice article!

Jan Michael Yu said...

Awesome post. My mind went blank halfway through but I'm saving this for a later read. I wish classes were taught like this in school.

Jeff Moser said...

Will Shaver, robburke, Anonymous #1/#2/#3/#4/#5/#7/#10/#11, sriram srinivasan, and Vicente: Thanks for the positive feedback! I really appreciate all of it. As I was writing this post over the past four weeks, I was worried that it'd be too long and no one would read it (even after I cut a few thousand words). It's been very encouraging to see the response.

Eddie P: That wasn't the intent -- hopefully TLS is more exciting than staring at a wall :)

Jim C: Thanks for the encouragement. My hope is that this might encourage others to see the beautiful protocols that underlie everyday Internet usage.

Eric Frenkiel - It'd be great to see you back. Feel free to click the "Subscribe" buttons on the right hand side (either via RSS or email)

Mr T's Fashion Consultant: I downloaded Wireshark from wireshark.org, installed it, and then ran it. I then clicked "Capture", then "Interfaces" and clicked "Start" on the interface that was generating packets. Once you do this, it'll start to capture all traffic. I then ran the debug Firefox build (Minefield) and went to Amazon. After all this I clicked "Capture", then "Stop". In the protocol column, I found the first "TLSv1" protocol packet and then right-clicked and then clicked on "Follow TCP Stream." Everything else came from looking at the specific packets and seeing the Wireshark decoding of the bytes.

Adam: Very true on OCSP's, CRL's, and the EV extended checks.

Gabe Sumner: Glad you enjoyed it - I tried to keep things lighthearted :-)

chkno/Anonymous #12: You're right. I've gone ahead and made the change from SFTP to FTPS.

John Doe: Point taken on DH for just key exchange and RSA for authentication. This is honestly what I meant as I had linked to the appropriate Wikipedia articles. What I was trying to indicate was DH is nice from a "forward secrecy" perspective. The "downsides" if you can call them that, are more math == more CPU and you have ephemeral keys which means that a network admin has a harder time snooping on the traffic.

C.Santini: Thanks. What do you think of the math with huge numbers? :)

Tony Oppenheim: Sounds like an interesting book. Did anything in particular stick out from the backstory that you thought was interesting?

Anonymous #6: SRP is one of those hidden Internet gems. It's too bad it's not more popular (and therefore, available in more products).

Anonymous #8: Nice use of "geekily exhilarating" :)

Anonymous #9: I don't mind the p's and q's, just so long as they're prime.

JeremyH: Yep, the DNS query answer came in right before the Client Hello. One thing that I didn't cover here is DNSSEC, which digitally signs DNS responses. It's not widely deployed (yet), but it helps to protect against DNS cache poisoning.

spdalton: Glad it was helpful. Computers offer the ultimate lab science. I'd highly encourage you to play around with Wireshark to see what's happening on your own network. I think it's fun to see what's actually going on.

Jan Michael Yu: Don't worry if it takes a little for it to sink in. It took me a few weeks of exploring to feel comfortable with it. Feel free to ask follow-up questions if something doesn't make sense.

Dom said...

You no doubt have one already but you could probably submit this as a Masters paper and come away with Honors from most top universities. Nice work!

sanjaya said...

wow.. this is amazing though takes time to get into bits and bytes, but now have some idea about how protocol works.

Anonymous said...

Great post! Thanks dude!

pats said...

Every time I read that secure traffic flows through a tunnel, I wonder: do all these data packets go through the 'tunnel'? My definition of a tunnel is something that connects client and server so that no one can read the actual data flowing through it, since it is encrypted. If that's wrong, correct me.

Tony Oppenheim said...

@Jeff Moser: "Did anything in particular stick out from the backstory that you thought was interesting?"

Mostly that the NSA did everything they could to keep RSA and other private sector encryption methods from coming into use. It's thru the efforts of really just a few dedicated/brilliant/eccentric/obsessed people that we have the level of security for commercial use we enjoy today. If it wasn't for them e-commerce as we know it wouldn't be happening.

Or at least that's how it seems from my reading. Though I'm no expert in this field.

Anonymous said...

This is monumental work, Jeff. I would love to read more from you.

Siu said...

Thank you for the post! Very useful!

I will need to print this to understand everything.

Anonymous said...

Thanks a lot for this enlightening article. It is both detailed and technically correct, as well as understandable. Very well structured and written from a didactical point of view.

Anonymous said...

Awesome article. Please make a pretty PDF version of it!

John Doe said...

@Jeff - Should submit it to phrack. Would be real cool to have a phrack entry!

Niranjan said...

Excellent post, very informative and easy to understand!

Nguyen said...

such a good research :D

earthgecko said...

Fascinating article, thanks. Found the RSA part very interesting. It does just go to demonstrate how much data is flying around... and how quick it does fly around.

Anonymous said...

You are totally awesome for this post, great work!!! Oh, and I almost forgot, Firefox rules! Kudos

Pdizzle said...

Great post thanks a lot. This ties in well with the SSCP class I just took.

Dave said...

I read your article while wearing my Three Wolf Moon T-Shirt (link below), and I was actually able to understand the math! It's amazing, because the last math class I had was in my senior year of elementary school. Thanks to your wonderful explanation of the SSL handshake, I can now encrypt and decrypt human conversation in real time. I will try using what I have learned here to decrypt the mathematical language of the wolves in order to discover their plan to reach the moon again before the Russians can beat us there. Thank you!



Three Wolf Moon T-Shirt

Anonymous said...

Interesting hack. What are you going to do with this knowledge? Hack Firefox users who access Paypal?

Jeremy said...

Great post, thank you very much. I'm a graduate student in computer science, but my focus is databases so I don't often get to see the networking end of the wire.

Jeff Moser said...

Sanjaya, Anonymous #12/#13/#14/#15, Siu, John Doe, Niranjan, Nguyen, earthgecko, Pdizzle, Jeremy: Thanks for the kind words and spending the time to comment. I appreciate it.

Dom: That'd be nice, but I'm sure it's a bit tougher than that :) No masters for me, just a Bachelor's in Computer Science and Math. I'm just a guy that likes exploring with Wireshark.

Pats: Tunnels allow normally unencrypted traffic to appear as encrypted when on the wire. SSH is a popular example. You can sort of think of TLS/SSL as a secure "tunnel" for HTTPS traffic.

Tony Oppenheim: Thanks for sharing that. I read on Wikipedia that Clifford Cocks invented a similar system to RSA 4-5 years earlier for the UK intelligence agency GCHQ, but didn't really implement or deploy it. I always enjoy the history behind the technology.

Anonymous #15: Any advice on what you'd like a "prettier PDF version" to look like? Like a magazine (two-columns?)

Dave: glad the shirt gave you its claimed super powers :) It'd be nice if practical math like RSA was taught in high school. I think it'd be more engaging... at least a few weeks of it.

Anonymous #16: I have no hacking intents... it's more of just seeing all these protocols that are used everyday as having an elegance to them.

Anonymous said...

Wow,
this explanation ROCKS!

Can you please do the same for SIP/RTP kind of traffic (in combination with wireshark)?

Sasi said...

wow !! Clear & Well explained ... Thanks Jeff.

Anonymous said...

Great post, thanks! BTW this post has made it to the front page of digg.com

Waan said...

Great post,Jeff.... Thank you!

Anonymous said...

I think i learned more from this than my Cisco Security class... and we spent like 3 weeks on this...

JaKe said...

A PDF would be great! Just a simple straight from the page print of the text, graphics and hyperlinks. Nothing fancy, like two-column, required, the graphics won't let you anyway ;) .

CVillalobos said...

Okay. That was cool thanks for the info. Detailed and a fun way to start the morning.

Anonymous said...

Great Info, great article. Thank you so much..

Anonymous said...

Great Article, very interesting and useful Jeff!!!

Ankur said...

Good stuff, Jeff! Truly informative.

Television Online said...

Possibly the most intense milliseconds that you'll never actually notice.

Anonymous said...

wow, you have no life

k@beza said...

Oh my god !! Oh my god !! You can see the Matrix !!
Fuc*ing insaneeeeeeeeeeee !

sulumits retsambew said...

very deep explanation , thank you

Peter Vatistas said...

Awesome post! I'm a big fan of Wireshark.

David Thomson said...

I read one comment about SRP. I implemented this in a web browser called SupraBrowser

http://sourceforge.net/projects/suprabrowser

It supports encryption of all traffic using 3DES (with pluggable algorithms), based on a Diffie-Hellman key exchange augmented by the result of a bi-directional SRP authentication (http://srp.stanford.edu).

tarmo said...

Good stuff. Well explained.

Anonymous said...

Thanks a lot for the post, it was very informative.

Rajat Swarup said...

Some additional information:
For really large primes p and q such that n = p × q, the public exponent e and private exponent d are chosen so that:
e × d ≡ 1 (mod (p-1) × (q-1))
The value (p-1) × (q-1) is Euler's totient function evaluated at n. :-)
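To make the relation between e and d concrete, here is a small Python sketch with toy numbers. The primes 61 and 53 and the exponent 17 are picked only for illustration (real keys use primes hundreds of digits long), and `rsa_private_exponent` is a hypothetical helper name, not anything from the post.

```python
from math import gcd


def rsa_private_exponent(p: int, q: int, e: int) -> int:
    """Return d such that e * d ≡ 1 (mod (p-1) * (q-1))."""
    phi = (p - 1) * (q - 1)  # Euler's totient of n = p * q
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    return pow(e, -1, phi)  # modular inverse, available since Python 3.8


p, q, e = 61, 53, 17              # toy values, far too small for real use
d = rsa_private_exponent(p, q, e)
print(d)                          # 2753
print((e * d) % ((p - 1) * (q - 1)))  # 1
```

With these numbers, n = 3233 and (p-1) × (q-1) = 3120, and 17 × 2753 = 46801 = 15 × 3120 + 1, so the congruence holds.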

Rajat Swarup said...

Btw...I forgot to mention...this was an awesome post! :-)

Anonymous said...

Thanks for a thorough and ... just great post. Awesome!

Anonymous said...

@Mr.Moser... So for example, the 'client_write_mac_secret' is an associative array, whose hash_value's value is part or all of the key_block? Am I understanding that part correctly?

By the way, great post! And I don't mean great in the way in which it is normally cheaply used in our day to day activities. Unless blogger.com goes down, I think this post will be indexed highly on the internets after you and I are long gone. You might want to back this info up in a couple other places chief!

Anonymous said...

Outstanding article. In the STARTTLS E-Mail world it is necessary to force Wireshark to interpret the SMTP stream as SSL, as they don't seem to have found a robust way to detect the transition from plain to encrypted yet. Once this is done, everything works just the same as described here.

Anonymous said...

@Mr.Moser... just for clarification on a typo in my last post a few moments ago, when I said "hash_value"'s value ...I meant "hash_size"'s value. Please clarify if you have a minute. If you explain this one, I will understand the others. It's an associative array, correct? But how is that value populated or determined? I'm foggy on that.

Anonymous said...

wow,
this article is perhaps a reply to a loosely based question 'dad, where did I come from?'

and your explanation is a very good elaboration of facts and figures.

great effort and good job.
well done and yes I am enlightened.

Rajasekhar said...

I think this is one of most clearly written blog-articles I have read in recent times. Thanks a ton for the tech details. I am yet to get a full grasp of it, but when I do, it will definitely add to my tech armor :) Great work!

Suebtas said...

Awesome tutorial SSL article.

Mary-Frances Beesorchard said...

Thank You! This article makes such a clear study-plan for beginners.

Jeff Moser said...

Anonymous #19/#20/#21/#22/#24/#25/#29, Sasi, Waan, CVillalobos, Ankur, sulumits retsambew, Peter Vatistas, tarmo, Rajasekhar, Suebtas, Mary-Frances Beesorchard: Thanks for the kind comments and for stopping by.

Anonymous #18: SIP/RTP is unencrypted and Wireshark handles it fairly well on its own. I'd encourage you to download Wireshark and just take a few seconds of captures and then explore it in Wireshark. That'd probably be more insightful.

JaKe: Per your request, I created a PDF version of this post. Does this work for you?

Television Online: Yeah, I typically don't notice them most times, but now I respect those milliseconds more.

David Thomson: Glad to see use of SRP.

Rajat Swarup: Yep. If you check out the links I had in the RSA section, you'll see that I linked to the totient function along with Fermat's Little Theorem, the Chinese Remainder theorem, and the original RSA paper that outlines this as well.

Anonymous #26/#28: The "client_write_MAC_secret" is the first "hash_size" bytes (in this case, 16 bytes since it's MD5) of "key_block". The "key_block" is the output of the Pseudo Random Function (PRF) that I mentioned. You don't need to hash it any more. This is the "key" of the keyed-HMAC version of MD5 that I described in the post.

For completeness:

client_write_MAC_secret are bytes 0..[SecurityParameters.hash_size] - 1 of the PRF generated "key_block"
server_write_MAC_secret are bytes [SecurityParameters.hash_size] .. [SecurityParameters.hash_size + SecurityParameters.hash_size] - 1 of the PRF generated "key_block"
client_write_key are bytes [SecurityParameters.hash_size + SecurityParameters.hash_size]..[SecurityParameters.hash_size + SecurityParameters.hash_size + SecurityParameters.key_material_length] - 1 of the PRF generated "key_block"
server_write_key are bytes [SecurityParameters.hash_size + SecurityParameters.hash_size + SecurityParameters.key_material_length]..[SecurityParameters.hash_size + SecurityParameters.hash_size + SecurityParameters.key_material_length + SecurityParameters.key_material_length] - 1 of the PRF generated "key_block"
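The byte ranges above amount to slicing one buffer four times. Here is a rough Python sketch of that partitioning. The names `SessionKeys` and `split_key_block` are mine, and the sizes used in the example (16-byte MD5 MAC secrets, 16-byte RC4 keys) are assumptions for a TLS_RSA_WITH_RC4_128_MD5 session.

```python
from typing import NamedTuple


class SessionKeys(NamedTuple):
    client_write_MAC_secret: bytes
    server_write_MAC_secret: bytes
    client_write_key: bytes
    server_write_key: bytes


def split_key_block(key_block: bytes, hash_size: int,
                    key_material_length: int) -> SessionKeys:
    """Cut the PRF output into the four secrets, in the order listed above."""
    needed = 2 * hash_size + 2 * key_material_length
    if len(key_block) < needed:
        raise ValueError(f"key_block must be at least {needed} bytes")
    sizes = [hash_size, hash_size, key_material_length, key_material_length]
    parts, offset = [], 0
    for size in sizes:
        parts.append(key_block[offset:offset + size])
        offset += size
    return SessionKeys(*parts)


# Stand-in key_block: bytes 0..63 so the slices are easy to eyeball.
keys = split_key_block(bytes(range(64)), hash_size=16, key_material_length=16)
print(keys.client_write_MAC_secret.hex())  # bytes 0..15
print(keys.server_write_key.hex())         # bytes 48..63
```

For block ciphers, TLS 1.0 also takes a client and a server IV from the bytes that follow; RC4 is a stream cipher, so there are none in this case.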

Does that help? Feel free to check out the RFC I linked to if you want even more details.

Anonymous #27: It's a little trickier to detect a transition. A good heuristic would be to look for the TLS version bytes and content types... along with seemingly random data.
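As a rough sketch of that heuristic (not how Wireshark does it), the Python below checks whether a chunk of bytes starts with something shaped like a TLS record header: a known content type, a 3.x version, and a plausible length. The function name, the accepted version range, and the length bound are my own choices for the example.

```python
import struct

# Record content types: change_cipher_spec, alert, handshake, application_data
TLS_CONTENT_TYPES = {20, 21, 22, 23}
# TLS 1.0 caps the encrypted fragment length at 2^14 + 2048 bytes
MAX_RECORD_LENGTH = 2**14 + 2048


def looks_like_tls_record(data: bytes) -> bool:
    """Heuristic only: True if data begins with a plausible TLS record header."""
    if len(data) < 5:
        return False
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return (content_type in TLS_CONTENT_TYPES
            and major == 3 and minor <= 3      # SSL 3.0 through TLS 1.2
            and length <= MAX_RECORD_LENGTH)


print(looks_like_tls_record(bytes.fromhex("1603010005") + bytes(5)))  # True
print(looks_like_tls_record(b"EHLO mail.example.com\r\n"))            # False
```

Plain SMTP commands are ASCII text, so their first byte never lands on one of the four content types. A stricter check could also confirm that several records in a row parse cleanly.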

Jimmy In Madrid (J.I.M) said...

Amazing! great work there!

Inv said...

Hi Jeff, great explanation. Just one thing is really unclear to me - DNS poisoning: The attacker obtains the certificate from amazon.com, I enter "amazon.com" in the browser, and the browser goes to the attacker's site, which responds with the valid amazon.com certificate signed by Verisign.
How does the browser tell this is an attack?

Jeff Moser said...

Jimmy In Madrid (J.I.M): Thanks!

Inv: Great question! Note that if an attacker did this, they'd run into trouble in the "Trading Secrets" section that I described. Without knowing Amazon.com's private key, they couldn't decrypt the pre-master secret that the client sends out because the certificate from Verisign has Amazon's public key. Thus, the client would use that public key (and not one an attacker generated).

WJG in Denver said...

Excellent article! I will need to re-read this a few times to really feel like I understand it but I like how you have explained such a complicated topic in plain language.

Please clarify one thing, though:

"keep "d" as secret as you possibly can"

but -

"The other recipient knows "d" which allows them to invert the message to recover the original message"

So is "d" secret or not? I believe this could be rephrased to say that this happens when someone else is sending you a message and you know "d", not the other recipient. Am I correct?

Jeff Moser said...

WJG in Denver: You're right, my wording in the second part is confusing. I should probably take out the word "other" there.

With RSA, "d" exponent should always be kept secret/private. The "e" exponent and "n" modulus are made public.

What I meant in that section is that you can calculate "C ≡ M^e (mod n)" because you created "M" and you're sending it to someone who has made their "e" and "n" values public.

In this case, the person you're sending "C" to is the recipient. That is, the receiver is the person who made his "e" and "n" values public. We assume he knows the "d" value that corresponds to the published "e" value. Because he knows "d" (and is keeping it secret), he can invert/undo our encrypted value "C" to recover "M".
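Here is a toy round trip in Python to go with that description. The key values (n = 3233, e = 17, d = 2753) are tiny textbook numbers chosen for illustration, and the function names are made up. Real RSA also pads the message before exponentiating, which this sketch leaves out.

```python
def rsa_encrypt(m: int, e: int, n: int) -> int:
    """Sender's side: C ≡ M^e (mod n), using only the public values."""
    return pow(m, e, n)


def rsa_decrypt(c: int, d: int, n: int) -> int:
    """Recipient's side: M ≡ C^d (mod n), using the secret exponent d."""
    return pow(c, d, n)


n, e = 3233, 17   # published by the recipient
d = 2753          # kept secret by the recipient

m = 65                      # the sender's message, as a number smaller than n
c = rsa_encrypt(m, e, n)
print(c)                    # 2790
print(rsa_decrypt(c, d, n)) # 65
```

The sender never touches "d". Only the holder of "d" can turn 2790 back into 65.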

Does that help clarify things?

j. montgomery, CISSP, GNET, GSEC said...

Excellent post! I enjoyed the deep-dive. Keep up the good work.

Pranav said...

Nice post! Thanks!

l0b0 said...

This must be the best security article I've seen in years - Thank you Jeff! Security is so hard (even for seasoned developers) that text like this probably does more for security than any number of new algorithms.

Hex said...

Gr8 Post............
i learnt a new thing today....

thanx a lot and keep posting such things

harsha said...

Thanks a lot,
I was searching a similar tutorial,
But after so many day i found this one,
Please add search tag AS tutorial to SSL/TLS or decrypting SSL/TLS,
So that many like me will be helped

Witek Baryluk said...

David Thomson: Hi, it was me who pointed out SRP. I see that SRP is implemented in upcoming Firefox https://wiki.mozilla.org/Firefox/3.next/hitlist For the server side I only found GnuTLS to support it (using the standard TLS-SRP protocol: RFC5054).

suprabrowser is an interesting project. I will look at it, but I think using AES256 is better. It is safer and faster. Inventing your own cryptographic protocols is very very very hard, and most of the time the implementations are not secure.

Anonymous said...

Brilliant post, I'll translate it with your permission to Spanish and will trackback to your original article.

Jeff Moser said...

"j. montgomery, CISSP, GNET, GSEC", Pranav, l0b0, Hex: Thanks!

harsha: The words in your comment should be indexed by Google with a link back to this article.

Witek Baryluk: Cool find on TLS-SRP in Firefox. That'll be great if it encourages other browsers to adopt it.

Anonymous #28: Feel free to translate this post to Spanish. My only request is that the top of the translation say that it was translated from my blog with a trackback. Feel free to make it better while translating ;)

Zac said...

@Jeff, thanks for an incredibly insightful post in to the inner workings of HTTPS -- probably the best description I've ever read!

@Adam, OCSP verification is actually available for all VeriSign SSL certificates, EV or otherwise (though it's certainly true that some other CAs don't have an OCSP responder configured for their non-EV certificates).

If you want to get really picky, the CA/Browser Forum standard for EV SSL certificates doesn't mandate the use of OCSP - CAs are required to make revocation information available; CRLs are still an acceptable way of doing so (per the guidelines doco at http://www.cabforum.org/documents.html).

Niranjan Patil said...

This is a wonderful post. I appreciate the amount of pain you have taken to research and write. Thanks for sharing with the community.
I have read many books and watched videos (by so called professionals) but none come close to the simplicity and completeness of your article in explaining concepts with RSA, TLS and just plain security. Your article has high degree of guaranteed understanding for any reader!

Johns said...

Great post!!
It's very useful for my RSA understanding. Thank you very much!

Anonymous said...

One thing that I didn't see in your use of Wireshark was the use of the server's private key. In order to decode SSL/TLS traffic, both the server's private key and certificate are required (this is documented at http://blogs.sun.com/beuchelt/entry/decrypting_ssl_traffic_with_wireshark ). No sane CIO is going to let you download the private key to your laptop, run wireshark to diagnose a problem, and then walk out of the building with the key still on your laptop. If you're consulting and they allow this to happen, you owe it to your own reputation, and their continued security to make sure they understand why this is a very bad idea.

just my 2 cents

Jeff Moser said...

Zac, Niranjan Patil, Johns: Thanks for the supportive comments!

Anonymous #29: Note that I didn't need the server's private key here and was able to decode everything. The reason, as mentioned in the post, is that I modified Firefox to tell me all data that is (usually) secret. Firefox knows the pre-master secret and encrypts it with the server's public key. Once this happens, only the server could decrypt it with its private key. However, given that Firefox knows the pre-master secret, it can derive everything.

I had to write a custom program to do the decrypting for me since this mode isn't natively supported by Wireshark.
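For anyone curious what "derive everything" involves, here is a minimal Python sketch of the TLS 1.0 pseudo random function as I read RFC 2246. It is not the custom program mentioned above, and the function names are mine. It stretches the pre-master secret into the master secret and then into the key_block.

```python
import hashlib
import hmac


def p_hash(hash_name: str, secret: bytes, seed: bytes, length: int) -> bytes:
    """RFC 2246 P_hash: chain HMACs until `length` bytes are produced."""
    out, a = b"", seed                                   # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()      # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hash_name).digest()
    return out[:length]


def tls10_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.0 PRF: P_MD5 over the first half of the secret XOR P_SHA-1 over the second."""
    half = (len(secret) + 1) // 2          # halves share a byte if the length is odd
    s1, s2 = secret[:half], secret[len(secret) - half:]
    md5_part = p_hash("md5", s1, label + seed, length)
    sha_part = p_hash("sha1", s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_part, sha_part))


def derive_key_block(pre_master_secret: bytes, client_random: bytes,
                     server_random: bytes, length: int) -> bytes:
    master_secret = tls10_prf(pre_master_secret, b"master secret",
                              client_random + server_random, 48)
    # Note the swapped order of the randoms for key expansion.
    return tls10_prf(master_secret, b"key expansion",
                     server_random + client_random, length)
```

Given the 48-byte pre-master secret from the client plus the two random values that cross the wire in the clear, `derive_key_block(..., 64)` would yield enough bytes for the MAC secrets and RC4 keys of an RC4_128_MD5 session.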

For the general case (where you can't modify the client), then your advice holds.

Thanks for the comment.

Anonymous said...

Great Post! I am looking for more posts like this from you. It cleared up a lot of questions in my mind regarding https.

website design nyc said...

nice collection

Helgin said...

Hi, I wonder:
1. Can I translate this into Russian?
2. Maybe I did not understand everything correctly, but I get the impression that an eavesdropper hears the random data from the server and the client and knows Amazon's public key.
Knowing that, won't he be able to reconstruct the initial "finished message" and calculate the encryption keys from it?
He may not know the secret key FF generated, but he can generate a set of probable variants and compare them to the one that was actually sent...

Jeff Moser said...

Anonymous #30/website design nyc: Thanks!

Helgin:

1. Feel free to translate it to Russian. My only request is that the top of the translation say that it was translated from my blog with a link back to this post. Thanks for your willingness to do this!

2. The "finished message" verifies that both sides know the secret keys already. Without the pre-master secret key, an attacker would have a lot of "possible variants" to try (on the order of 2^128). I can't think of a way beyond brute force attacking the key exchange or the symmetric cipher.

That said, there are other ways of getting the secret information :)

Paul Morriss said...

"There are four bytes representing the current Coordinated Universal Time (UTC) in the Unix epoch format, which is the number of seconds since January 1, 1970."
Only four? I hope someone fixes that before 2038. I want to pay for my retirement home with a secure online transaction.

Matthew said...

great post, wireshark is an extremely useful tool, add to that firebug for http debugging and you have to wonder how people managed before they came along.

Asgeir S. Nilsen said...

Thanks for an informative post.

Could you please comment on my post TLS: A Broken Trust Model about the inherent flaw of the trust model used both for establishing session key and verifying the identity of the server?

Jaans said...

You had me at "handshake"

Jeff Moser said...

Paul Morriss: Hopefully we have a newer protocol version in 29 years :) If we don't, it should still be OK since the client and server values would both overflow to the same value (in theory).
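As a quick sketch of the four-byte timestamp, the Python below decodes the value from the Client Hello in the post (0x4a2f07ca) and the largest value the field can hold. It assumes the field is read as an unsigned 32-bit integer, which is how RFC 2246 declares gmt_unix_time. The helper name is made up.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_gmt_unix_time(four_bytes: bytes) -> datetime:
    """Interpret four big-endian bytes as unsigned seconds since the Unix epoch."""
    (seconds,) = struct.unpack("!I", four_bytes)
    return EPOCH + timedelta(seconds=seconds)


print(decode_gmt_unix_time(bytes.fromhex("4a2f07ca")))  # a timestamp in June 2009
print(decode_gmt_unix_time(b"\xff\xff\xff\xff"))        # the field's last second, in 2106
```

Read as unsigned, the field lasts until early 2106. The 2038 limit applies when the same four bytes are treated as a signed value.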

Matthew: True, Firebug is a good utility as well.

Jaans: Reminds me of the Microsoft Exchange blog: "You had me at EHLO" :-)

Jeff Moser said...

Asgeir S. Nilsen: You're right that trust is a very important part of the process. I addressed similar concerns in the "Verifying Signatures" part of the post. In that section I linked to "Reflections on Trusting Trust" which showed that you always have to trust something. If you haven't read that already, I highly encourage it -- it's a classic in computer security.

I can restate the general idea by modifying a quote from Bruce Schneier: "if you think cryptography can solve your [trust] problem, then you don't understand your [trust] problem and you don't understand cryptography"

This problem has been around for ages. Consider Shakespeare's Hamlet Act 5, Scene 2 where Hamlet forged a letter from the king to get Rosencrantz and Guildenstern killed (who just so happened to be carrying the letter). The trust in that case was the king's seal.

I think cryptography has made the logistics easier (it's far easier to forge a seal than it is to forge a digital signature), but the trust issue remains.

Jeff Moser said...

Asgeir S. Nilsen:

1. In your proposal for "A Better TLS Key Exchange" you recommend a solution where the browser creates a digital certificate that lasts for the duration of a browser session. I don't follow how this would help because in an RSA key exchange the client generates the pre-master secret and sends it to the server, so the server has to know how to decode it. If the client encrypted the pre-master secret with his freshly generated certificate private key, then anyone who saw the certificate (which would have to be transmitted in the clear) could decrypt it with the public key on the new cert. Thus, all security would be lost.

Now, let's say that you didn't go this route but instead wanted the server to generate the pre-master secret. The client would send his self-signed certificate to the server and the server could reply with the pre-master secret encrypted using the client's public key from the certificate. Nothing is stopping a man in the middle from exploiting this by intercepting your self-signed certificate and replacing it with his own self signed certificate (since it's self-signed, the server wouldn't know any better). You could improve the situation by also verifying the server's signature, but then you're back to trusting the certificate chain as your ultimate trust.

Note that nothing is stopping you from removing Certificate Authorities (CAs) that you don't trust. You could create your own trust scheme based off certificates. You could also use existing tools like OpenPGP and the RFC 5081 that modifies TLS to support its keys. This way you could create your own web of trust based on personally meeting people.

Another option is to take the Kerberos approach where everyone trusts the same person. RFC 2712 modifies TLS to do this. Again, it'd require a trusted third party.

Originally, SSL/TLS certificates were hard to get and good checks were made. Over time, those checks were watered down. I think that Extended Validation Certificates go back to the original purpose. However, you're still trusting a company to do its job right.

Are certificates perfect? By no means! However, they provide a good means of *communicating* trust, not necessarily creating trust. This has to be implicit somewhere. You have to implicitly trust the CA *and* the people that wrote the cryptography library *and* that the code wasn't modified on its way to your machine *and* that your OS is running it correctly *and* that your CPU is executing it properly, etc.

2. You mention using shared secrets. This works and is the basis for the schemes I mentioned in the post such as TLS-PSK and my favorite, TLS-SRP (which goes out of its way to protect the secrets themselves).

Fabian said...

Insightful! Great job man!

Fritz said...

Thank you Jeff, what a wonderful microscopic view into this ant-heap of clever actions. You had even already answered my question about why a copied server certificate wouldn’t help anybody in the middle (in your reply to Inv’s question of June 13, 2009 5:14 PM). Klasse! Fritz Jörn from Bonn, Fritz@Joern.com

Travis said...

Shows just how much goes by in a few milliseconds!

At one time this maybe would've surprised me, but after getting involved in the field of wireless, I've realized that a few milliseconds is a LONG time...

Jeff Moser said...

Fabian: Thanks!

Fritz: Glad I could help. A lot of cool stuff happens in the "ant-heap".

Travis: Right -- that reminds me of a time in college when I was working on an EE lab project with a gate that took something like 1200 nanoseconds to work. I remember saying "that's so slow!" And then I thought for a second and started laughing. It's incredibly fast in normal terms, but it's all relative to the scale you're used to (in this case, a few nanoseconds was my new "normal").

Thanks for stopping by!

Eric said...

Hi! Your blog is simply super. you have create a differentiate. more templates easy to download Thanks for the sharing this website. it is very useful professional knowledge.

Anonymous said...

Great Post. Very good indeed.
I was trying to get the PDF of it but the link seems broken?! I hope you can put it up again, PLEASE... ^_^

It's an amazing work, very clear (even for those like me who don't understand much of this)! Thank you.

Not anonymous:
Diogo Oliveira <-(my name)

Post Scriptum: Got milk¿? ^_^

Jeff Moser said...

Eric: Your comment looks a little spammy, but I do know I need to make the site look better. I'll trust you had good intentions :) Thanks.

Diogo Oliveira: I recreated it and uploaded it to a different place in both Word (.docx) and PDF format. Do these links work for you?

Anonymous said...

Diogo Oliveira again... ^_^

Thank you Jeff, the PDF is good to go. Once more, I must congratulate you on such fine work, and say that it was 1 AM in Portugal when I read this article and posted my comment. And using the words of Eddie P: "now i feel as dumb as a rock... excuse me while i go stare blankly at the walll" ^_^

Petri said...

Amazing post, Jeff. Maybe you could write a Volume IV to Stevens' TCP/IP Illustrated?

Anonymous said...

Thanks for this post!!! It took me most of the day to go through, but it was so worth it.
Cheers

PJ said...

I feel that your post is worthy of grooming.

Although it was published less than six months ago, all of its links to koders are dead. For instance, this is the correct link to the SSL_AuthCertificate function.

You inevitably had to deal with NSS, and "Mozilla's built-in certificate policy" links directly to a resource on mozilla.org. Therefore, when updating the links, I suggest you reference Mozilla's own documentation. Methinks it is CERT_CheckCertValidTimes() that does the job of "seeing that the current time is between the 'not before' time of August 26th, 2008 and before the 'not after' time of August 27, 2009."
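
For readers following along, the validity-window check quoted above can be illustrated in a few lines of Python. This is only a sketch of the idea, not NSS's implementation: the function name `within_validity_window` is made up for illustration, and the midnight UTC times are assumed because the quote gives only the dates.

```python
from datetime import datetime, timezone

def within_validity_window(not_before, not_after, now=None):
    """Return True if `now` falls inside [not_before, not_after]."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_before <= now <= not_after

# Dates come from the post; the midnight UTC times are an assumption.
NOT_BEFORE = datetime(2008, 8, 26, tzinfo=timezone.utc)
NOT_AFTER = datetime(2009, 8, 27, tzinfo=timezone.utc)

# The post is dated June 10, 2009, which falls inside the window.
post_date = datetime(2009, 6, 10, tzinfo=timezone.utc)
print(within_validity_window(NOT_BEFORE, NOT_AFTER, post_date))  # True
```

A real library also has to parse the certificate's time encoding before it can compare anything, which this sketch skips.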

Sanne said...

Hey!
We were kinda wondering if you can explain how SSL works, since you seem to know these things pretty well :) We're two Swedish girls studying information and communication technology, but sometimes all the technical descriptions on the net make us confused, so maybe you can explain it so we understand. We know it's a long shot, but hey, one can always try :)
Krams
/Sanne and Jolina

waldner said...

Super cool! Thank you very much, this is exactly the kind of article I love reading.

Anonymous said...

Thank you, thank you, thank you. Just the right level of detail for me, with plenty of links to dive deeper where I'm curious to know more. I wish that many more people wrote as clearly as you--I was informed and entertained!

Ingvar Helgarson said...

Absolutely an eye opener. Incredibly useful post.

shawn said...

Thank you for this great information. Great explanation of the topic.

mundi said...

Really good post, but I have a question:
Could you write the details of how the MAC is calculated for the finished message in your example?
I have some problems finding out how to do it exactly.
Thanks in advance

Jeff Moser said...

Petri, waldner, Anonymous #31, Ingvar Helgarson, shawn: Thanks!

PJ: They still seem to work for me.

Sanne: I tried my best at HTTPS. Hopefully it was helpful.

mundi: I show exactly how this is done in the code I posted on GitHub. Specifically for your question, see here.
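
For readers without the GitHub code at hand, here is a minimal Python sketch of how the 12-byte `verify_data` inside a TLS 1.0 Finished message is derived, following RFC 2246 (sections 5 and 7.4.9). It assumes the question is about `verify_data` rather than the record-layer MAC, it is not the code linked above, and the function names and dummy inputs are made up for illustration.

```python
import hashlib
import hmac

def p_hash(hash_name, secret, seed, length):
    """RFC 2246 P_hash: expand secret and seed into `length` bytes."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hash_name).digest()
    return out[:length]

def tls10_prf(secret, label, seed, length):
    """TLS 1.0 PRF: P_MD5 over the first half of the secret XOR P_SHA1 over the second."""
    half = (len(secret) + 1) // 2  # halves share a byte if the length is odd
    s1, s2 = secret[:half], secret[len(secret) - half:]
    md5_part = p_hash("md5", s1, label + seed, length)
    sha_part = p_hash("sha1", s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_part, sha_part))

def finished_verify_data(master_secret, handshake_messages, label=b"client finished"):
    """verify_data = PRF(master_secret, label, MD5(msgs) + SHA1(msgs))[0..11]."""
    seed = (hashlib.md5(handshake_messages).digest()
            + hashlib.sha1(handshake_messages).digest())
    return tls10_prf(master_secret, label, seed, 12)

# Dummy inputs for illustration only; a real handshake supplies the 48-byte
# master secret and the concatenation of every handshake message sent so far.
demo = finished_verify_data(bytes(48), b"all handshake messages so far")
print(demo.hex())
```

The server's Finished message uses the label `b"server finished"` and also covers the client's Finished message in its hashes. Once built, the Finished message is protected by the record layer like any other record, which is where the separate per-record MAC comes in.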

Anonymous said...

wow!

Jeff Moser said...

A reader emailed me a link to this description of RSA that's pretty good. That, along with the RSA Wikipedia page and perhaps my simple example of cracking RSA on Stack Overflow might help others that have difficulty with that part of this post.

James Shupe said...

They changed to RC4, like most other major HTTPS websites, after the disclosure of the BEAST attack in late 2011.

RC4 is the only popular cipher that isn't vulnerable (since it's a stream cipher rather than a block cipher in CBC mode).

Jeff Moser said...

James Shupe: Yeah, it makes sense given B.E.A.S.T announcements that came out after this post. For others that are interested in this topic, I recommend this episode of Security Now.

Anonymous said...

Oddly, if you use SSL offload on an ELB (elastic load balancer) then Amazon does *not* prefer RC4. Nor does it disable client-initiated renegotiation.

Reported by the Qualys SSL test: https://www.ssllabs.com/ssltest/index.html

All this means I had a better result on SSL before moving onto ELB. Amazon really puzzles me sometimes.

Sujith said...

One of the best technical articles I read on the web. Great post, thank you.

Sujith said...

One of the best technical articles I've ever read on the web. Thank you for posting.

Manoj S said...

Thanks for the wonderful post, Jeff. I came across your post while looking for HTTPS client code, either in Java or any other language.

I would appreciate it if the handshakes were explained in more detail: their order, packet structure, etc.

I am trying to hand-code a simple SSL client in C using good old sockets.

I am sure you would have a clue about where to look for that kind of info.

Again, a great post :-)

Anonymous said...

One of the best technical posts I've ever come across! You mixed the technical content well with your narrative. Well done, sir. Well done.

Sriram Iyer said...

The most descriptive and informative article on TLS I have read on the web. !!!! Respect !!!!

Java E said...

You said it's short but it's not that short :) Good work though.

Anonymous said...

Hey, the encrypted pre-master secret sent by the client is larger than the unencrypted PKCS#1-encoded pre-master secret.
Actually, encrypted_message_size = unencrypted_message_size + 2
Why is that?
jkjs

arkanoid said...

One interesting missing bit: what EDH is, why it isn't likely to be there, and why it should be.

arkanoid said...

(Subscribed to comments. I wonder why I can't do that before I post.)

Anonymous said...

Congress could learn a lot from this process.

Jim said...

The Host: header is what the web server (e.g., Apache) uses to allow multiple web sites on the same IP address. But it's of no use during the SSL/TLS setup, since it isn't sent or seen until after all that is done.

SSL/TLS uses the Common Name (CN) and Subject Alternative Name attributes of the server certificate to inform the client (e.g., Firefox) which names are allowed. If you typed www.amazon.com into Firefox and ended up at someone else's web server -- say, due to DNS cache poisoning or a forgotten /etc/hosts override -- and that server didn't have a forged server cert, Firefox would not find www.amazon.com or *.amazon.com in the server certificate offered, and the connection would end immediately to prevent man-in-the-middle attacks. Firefox would pop up a warning dialog wherein you could tell it to proceed anyway, if you aren't intimidated by the scary warnings it displays.
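
The name check described in this comment can be sketched in Python. This is a simplified illustration, not Firefox's or NSS's actual logic: the function names are hypothetical, and it assumes a wildcard may appear only as the entire left-most label and matches exactly one label.

```python
def hostname_matches(pattern, hostname):
    """Compare one certificate name (possibly '*.example.com') to a hostname."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard stands in for exactly one label
    if p_labels[0] == "*":
        # Require at least two fixed labels so that '*.com' never matches.
        return len(h_labels) > 2 and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

def cert_allows(hostname, cert_names):
    """True if any name taken from the certificate covers the hostname."""
    return any(hostname_matches(name, hostname) for name in cert_names)

names = ["amazon.com", "*.amazon.com"]  # example names, not a real certificate
print(cert_allows("www.amazon.com", names))    # True
print(cert_allows("www.evil.example", names))  # False
```

Real clients apply more rules than this, such as handling internationalized names and IP addresses, so treat the sketch as a picture of the idea only.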

Bernd said...

I guess Amazon (like Google) prefers RC4 over the block ciphers to defend against the BEAST attack.

Thorton said...

I know it's been said before, but GREAT post!

Anonymous said...

All I can say is that this was the most technically detailed and still understandable explanation of how a real-life HTTPS connection works that I have seen on the internet so far. All the other respected PKI/SSL/TLS experts on the internet should be ashamed that none of them documented this earlier for the community!

soder

kkb said...

great-great post, thank you

synapse said...

Great post, thanks!

sami said...

Great post - Tuscan milk.. great to start it off

cert said...

Great in-depth article!

Criação said...

Now that is what I call a dissection...
I too feel dumb before your expertise and at the same time a little proud to learn something new.
Thanks for this epic post!

Semih Akalin said...

This reminds me of an old project. Explain all the bits that are communicated and computed across all APIs involved, when a user presses a key, and a set of pixels appear on the screen spelling "a".

Insaf M. said...

Bloody fantastic! Well written and explained!

rudie dirkx said...

You might want to mention https://en.wikipedia.org/wiki/Server_Name_Indication (which you sort of did, but with a different name). SNI FTW and still we 'need' 1 IP per certificate =( Stupid hosting companies.

GREAT article!

rudie dirkx said...

How long will a handshaked session last? The endpoints didn't agree on a TTL/Keep-alive thing... Is it until either point denies the current encryption, which will trigger a new handshake? What about HTTP? Keep-alive is usually 300s, which means a new socket after that. New handshake or reuse previous?

Thanks. Great to finally see in such detail what the hell my browser is doing all the time.

Anonymous said...

Very interesting article, and I imagine it will be looked at many times in light of the recent claims that US and UK spy agencies are able to crack HTTPS.

My question is: where does the private key reside in HTTPS transactions? If it's on my local machine, why can't I see it or where it's stored?

Many thanks.

KP said...

Great blog post, this. It seems that the encryption of the HTTP packets happens in the transport layer below it. So even if a site is running on HTTPS, can any browser extension, or anyone else who has access to the DOM, manage to read form fields like passwords or credit card info before the encryption happens? I have been wondering about that.

yvenu said...

Nice explanation. Thanks.

yvenu said...

"client sends out because the certificate from Verisign has Amazon's public key. Thus, the client would use that public key (and not one an attacker generated)."

Hi Jeff,
Does this mean that the certificate from Verisign (which is pre-loaded into the browser) will have the public keys of all the sites it signed?