Somebody asked me about https (vs. http) a few days back, and it triggered a DFS in my head's neural flora to dig out the answer. The answer I found was good enough for the questioner but not good enough for me, because of the holes in it! So I did some research on the web and in books to fill in those holes, and decided to write it down here for future reference. In this post I will talk about some basics of cyber-security and zoom in a bit on https (or SSL). Let's first look at the aspects that fall under the umbrella of security:
- Confidentiality/Authenticity, i.e. we do not want our information to be visible to others, or we want it to be visible only to a handful of people (aka the "authentic" ones)
- Integrity, i.e. we do not want the information we are conveying to get distorted or modified in transit before it reaches its intended destination
- Availability, i.e. we do not want the services we provide to others to get disrupted or become inaccessible to them
If any of these aspects is compromised, we call it a "security breach". For example, consider the infamous "Man-in-the-middle attacks", where an attacker places himself between the two parties of a conversation and either steals the information, compromising Confidentiality, or distorts it, compromising Integrity. Then there are "Denial-of-Service attacks", which compromise the Availability aspect of security. Now, let's look briefly at the most common techniques adopted to ensure Confidentiality and Integrity (I will not cover Availability in this post).
Data Encryption
If the two parties involved in a conversation follow an encryption scheme that an eavesdropper cannot guess, then the conversation becomes secure. Key-based encryption schemes are the most common in the field of security. In such schemes, you encrypt your message using a key (simply a number or a sequence of bytes) and decrypt it using either the same key or another key that is mathematically coupled to it. How you do the encryption with the key depends on the encryption algorithm you are using. A simple algorithm could be just doing an XOR of the message with the key. Consider encrypting a message M with a key K to yield the ciphertext E(M):
Encryption: E(M) = M xor K
Decryption: M = E(M) xor K
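The two equations above can be tried out with a few lines of Python. This is only a minimal sketch: the function name xor_cipher, the sample message and the key are made up for illustration, and the key is simply repeated when the message is longer than the key.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

message = b"attack at dawn"
key = b"K3y"  # hypothetical shared secret, known to both parties

ciphertext = xor_cipher(message, key)    # E(M) = M xor K
recovered = xor_cipher(ciphertext, key)  # M = E(M) xor K
assert recovered == message

# Anyone who knows (or guesses) a piece of the plaintext can XOR it with the
# ciphertext and read the repeating key straight off the result.
leaked = xor_cipher(ciphertext, message)
assert leaked[:len(key)] == key
```

The same function does both encryption and decryption, because XOR-ing twice with the same key cancels out.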
So, both the parties involved in the communication need to know K. Such a scheme is called Symmetric Key Encryption. Of course, this XOR algorithm is simple to crack when a single, constant, repeating key is used. The algorithms normally used for Symmetric Encryption, like AES, Blowfish etc., are much more complex and difficult to crack. The difficulty with Symmetric Key Encryption is that the same key needs to be shared between the 2 parties, so the question is: how do you share the key securely? Thankfully, there are already techniques for this, like Diffie-Hellman Key Exchange. The other way is to use Asymmetric Key Encryption, where you don't need to share a secret key at all! The following picture depicts what happens in Asymmetric Key Encryption (aka Public Key Encryption)
Basically, there are 2 entangled keys for one user, as against one shared key in Symmetric Encryption. One of the keys is known to all (called public), while the other key is known only to the user (called private). As shown in the picture, if somebody wants to send a message to the user, he will encrypt the message using the user's public key, and the user can then decrypt it using his private key. A message encrypted with a public key can only be decrypted with the corresponding private key, and the private key is known only to the user. It looks like magic, but it's Number Theory. For the mathematics involved here, please look at the RSA implementation. Usually, symmetric and asymmetric encryption schemes are used in tandem, where asymmetric encryption is used to share the key that will later be used for symmetric encryption.
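To take a little of the magic away, here is a toy sketch of RSA-style encryption with a public and a private key. The tiny primes, the exponent and the message value are picked for illustration only; real keys use primes that are hundreds of digits long together with a padding scheme, so please do not treat this as a usable implementation.

```python
# Toy RSA with tiny textbook primes (illustration only, not secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)    # used only while generating the keys
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

def encrypt(m: int) -> int:
    """Anyone can do this: it needs only the public key (n, e)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the owner can do this: it needs the private key (n, d)."""
    return pow(c, d, n)

message = 65               # a message encoded as a number smaller than n
ciphertext = encrypt(message)
assert decrypt(ciphertext) == message
```

In the tandem setup mentioned above, the "message" would be a randomly generated symmetric key rather than the actual data.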
Digital Signatures
With data encryption we are able to ensure that the Confidentiality aspect of security is preserved. For maintaining Integrity, Digital Signatures are the most common technique. Digitally signing a document (or any data) is a two-step process: 1. Generate a hash of the input document using a hashing algorithm like SHA or MD5 (MD5 is no longer considered safe for this purpose), 2. Use your private key to encrypt the generated hash. The encrypted hash is basically the digital signature. The sender sends the document along with the signature to the receiver. The receiver first uses the public key of the sender to decrypt the signature and recover the hash, then regenerates the hash from the document and compares the two hashes. If they do not match, somebody has tampered with the document or with the signature. Note that nobody other than the sender can regenerate the signature after modifying the document, as he will not have the private key of the sender. Now we will see how the data encryption techniques and digital signatures come into play in https or SSL.
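Before that, the sign-and-verify steps described above can be sketched in Python. This is a toy under stated assumptions: the key pair is built from two small, well-known Mersenne primes, the hash is SHA-256 reduced modulo N, and there is no padding. The names sign, verify and digest are mine, not from any library; real signatures use much larger keys and a standardized padding scheme.

```python
import hashlib

# Toy RSA key pair (illustration only, far too small to be secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, kept by the signer

def digest(document: bytes) -> int:
    """Step 1: hash the document (reduced mod N to fit the toy key)."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N

def sign(document: bytes) -> int:
    """Step 2: transform the hash with the private key."""
    return pow(digest(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Receiver: recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest(document)

document = b"Pay Bob 10 dollars"
signature = sign(document)
assert verify(document, signature)
assert not verify(b"Pay Bob 1000 dollars", signature)  # tampering is detected
```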
How does https ensure secure communication?
There is one more player in the https protocol in addition to the sender and the receiver: the "Certificate Authority" (CA). CAs are authorized organizations that you trust, analogous to government organizations. Some of the CAs are Verisign, Symantec, Google Internet Authority etc. Websites that want their customers to connect securely obtain a certificate (SSL or X.509) from one of these CAs. This certificate contains the public key of the website, details of the organization that owns the website, details of the CA, and a digital signature made by the CA using its private key.
Now, let's look at what happens when we try to connect to such a website using a web browser. After the browser sends a connection request to a secure website, the web server responds with its SSL certificate. The browser now needs to verify that the certificate is genuine. Looking at the certificate, the browser knows which CA has signed it, and using the public key of that CA (browsers and operating systems ship with the public keys of the CAs they trust) it verifies the integrity of the certificate, as discussed above in the digital signatures section. This ensures that nobody in the middle has tampered with the certificate. Next, the browser generates a random session key, encrypts it with the public key of the server (which was present in the certificate) and sends it to the web server. In this manner, a session key is securely exchanged between the browser and the web server.
After this, both the web server and the browser can securely exchange data using Symmetric Encryption with the session key. Anybody who snoops (and who might even replay the SSL certificate) will not be able to decode the communication, as he has neither the private key of the web server nor, therefore, the session key. (Newer versions of the protocol typically agree on the session key with a Diffie-Hellman style exchange instead, but the idea of combining certificates, asymmetric and symmetric encryption is the same.) This is how https keeps the conversation confidential and tamper-evident, provided the CA and the private keys involved are not compromised.
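The whole exchange can be simulated end to end in a few dozen lines of Python. To be clear, this is a toy model of the steps described above and not the real SSL/TLS protocol: the key pairs are built from small Mersenne primes, the "certificate" is just a byte string, the domain example.test is made up, and keystream_xor is a stand-in I wrote in place of a real symmetric cipher like AES.

```python
import hashlib
import secrets

E = 65537  # public exponent shared by both toy key pairs

def make_keypair(p: int, q: int):
    """Return (public, private) toy RSA keys as (modulus, exponent) pairs."""
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))

def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def keystream_xor(data: bytes, session_key: int) -> bytes:
    """Toy symmetric cipher: XOR with a keystream derived from the session key."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(f"{session_key}:{offset}".encode()).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

ca_public, ca_private = make_keypair(2**61 - 1, 2**89 - 1)
server_public, server_private = make_keypair(2**107 - 1, 2**127 - 1)

# 1. The CA issues a certificate: the server's details plus the CA's signature.
cert_body = f"example.test|{server_public[0]}|{server_public[1]}".encode()
cert_signature = pow(digest(cert_body, ca_private[0]), ca_private[1], ca_private[0])

# 2. The browser checks the signature with the CA's public key.
recovered_hash = pow(cert_signature, ca_public[1], ca_public[0])
assert recovered_hash == digest(cert_body, ca_public[0])

# 3. The browser picks a random session key and encrypts it for the server.
session_key = secrets.randbits(128)
wrapped_key = pow(session_key, server_public[1], server_public[0])

# 4. Only the server can unwrap it, using its private key.
server_session_key = pow(wrapped_key, server_private[1], server_private[0])
assert server_session_key == session_key

# 5. Both sides now talk using symmetric encryption with the session key.
request = keystream_xor(b"GET /account HTTP/1.1", session_key)
assert keystream_xor(request, server_session_key) == b"GET /account HTTP/1.1"
```

An eavesdropper in this model sees cert_body, cert_signature, wrapped_key and request, but without the server's private key he cannot recover session_key from wrapped_key.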
