Sunday 5 April 2020

Encryption, Hashing and Encoding - don't get confused


When I first started reading about these topics I was very confused. So here I am, writing a small post about them.




1) Encoding:

  • Encoding is neither hashing nor encryption. It is just representing a message or a number in a different system.

Let's say this is '0' (the digit 0); in ASCII (a 7-bit encoding) it is 0x30.
Similarly, let's say this is 15 (in base 10); it can also be written as F (in base 16) or 17 (in base 8).

"But this is not something hidden, even though it looks like it."

So what is this string: 0001 0101 1001 1010? Ask your non-programmer friend and he might not be able to decode it, but programmers will easily see that it is a number written in binary and will reply that it is 5530 (0x159A in hex).

I remember that joke which says there are 10 types of engineers, which basically means there are 2 types of engineers (10 in binary is 2).

So, encoding is just representing your message in a different system, be it ASCII, Unicode, UTF-8, base10, base64, base2, base8 or base16. Anyone who knows the system can convert it back; no key or secret is involved.
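To make the examples above concrete, here is a small sketch using only the Python standard library. The variable names and the b"hello" sample are my own picks for illustration; the numbers are the ones from the text.

```python
import base64

# The character '0' is stored as the number 0x30 (48 in decimal) in ASCII.
zero_code = ord("0")

# The same quantity, 15, written in three different bases.
as_hex = format(15, "X")  # base 16
as_oct = format(15, "o")  # base 8
as_bin = format(15, "b")  # base 2

# Reading the binary string from the text back as a base-10 number.
decoded = int("0001010110011010", 2)

# Base64 is also just an encoding: anyone can reverse it, no key needed.
b64 = base64.b64encode(b"hello")
round_trip = base64.b64decode(b64)

print(hex(zero_code), as_hex, as_oct, as_bin, decoded, b64, round_trip)
```

Notice that every step can be undone by anyone, which is exactly why encoding gives you no secrecy.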


2) Hashing:


For hashing, remember this: hashing is a one-way function, i.e. you can't use the hash to find out which message generated it. There is nothing like "unhashing".

These are some popular hash algorithms (taken from the output of print(hashlib.algorithms_guaranteed)):
'sha256', 'sha384', 'sha224', 'sha512', 'sha1', 'md5'

A quick caution: even though md5 and sha1 are still used, practical collision attacks against both have been demonstrated, so if you are working on something critical use 'sha256' or 'sha512'.

If you have a hacker mindset and want to know about these vulnerabilities, watch < Dr Mike Pound's videos on the Computerphile channel >

You can check out the Python library hashlib for hashing functions.

import hashlib
myString = " This is very long string , I just want to show you that even for this long string , the generated hash will have fixed length only and you cannot reverse it "
result = hashlib.sha256(myString.encode())  # hash the UTF-8 bytes of the string

hex_val = result.hexdigest()  # the digest as a hexadecimal string
print(hex_val)
print(len(hex_val))  # 64 hex characters = 256 bits

Output:

1963675e2d3b5c1326d4c565ddd2d060224fc61cb5491c8f0856a9dfaea16154
64

In the above output there are 64 hex digits (nibbles) of 4 bits each, so the hash is 64 x 4 = 256 bits in total.

Where is hashing actually used?

1) Data Integrity

So, I want to send you this data:

" This is very long string , I just want to show you that even for this long string , the generated hash will have fixed length only and you cannot reverse it "

But what if, due to network problems, some bytes are lost or corrupted on the way (you know this can occur with UDP)? How do we guarantee that the message you received is correct and authentic?

We will send the hash along with the message:

Msg : " This is very long string , I just want to show you that even for this long string , the generated hash will have fixed length only and you cannot reverse it "
Hash : 1963675e2d3b5c1326d4c565ddd2d060224fc61cb5491c8f0856a9dfaea16154

So, when you receive this message you recalculate the hash on your side, compare it with the one I sent, and confirm that data integrity is maintained.
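The check on the receiving side can be sketched roughly like this. The helper names sha256_hex and is_intact and the sample message are hypothetical, made up just for this illustration.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message.encode()).hexdigest()


def is_intact(received_msg: str, received_hash: str) -> bool:
    """Receiver side: recompute the hash and compare it with the one received."""
    return sha256_hex(received_msg) == received_hash


# Sender side: compute the hash and send it along with the message.
sent_msg = "This is the message I want to send you"
sent_hash = sha256_hex(sent_msg)

print(is_intact(sent_msg, sent_hash))       # message arrived unchanged
print(is_intact(sent_msg[:-3], sent_hash))  # last bytes were lost on the way
```

This catches accidental damage only; it does nothing against someone who changes the message on purpose and recomputes the hash.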

But there are BAD BAD guys in this innocent world.

What if someone changes your message, recalculates the hash, and sends you both the new message and the new hash? You will think that the data is still authentic, but is it?

So, what's the solution?

The answer can be HMAC. (This is part of the "keyed hashing algorithm" family, which uses a secret key to generate the hash; without this key no one can modify the message and recalculate the correct hash.)
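Here is a minimal sketch of the idea with Python's standard hmac module. The key, the messages and the helper names sign and verify are all hypothetical; in a real system the key has to be shared between the two parties in some safe way first.

```python
import hashlib
import hmac

# Hypothetical secret known only to the sender and the receiver.
secret_key = b"shared-secret-key"


def sign(message: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


msg = b"Pay 10 dollars to David"
tag = sign(msg, secret_key)

# The attacker changes the message but does not know the key,
# so any tag he computes is made with the wrong key.
tampered = b"Pay 1000 dollars to David"
forged_tag = sign(tampered, b"attacker-guess")

print(verify(msg, tag, secret_key))              # genuine message
print(verify(tampered, tag, secret_key))         # changed message, old tag
print(verify(tampered, forged_tag, secret_key))  # changed message, forged tag
```

Only the first check should pass, because both failing cases lack a tag made with the real key.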





2) Creating a short index for a group of data / creating a hash map:

This is used in database systems to speed up searches on multiple fields.

Let's say your citizen table is something like this:
First name     |  Last name  |   District  
I want to search for First name = “David”, Last name = “Lee”, District = “Arizona”.

Then I can create a hash of DavidLeeArizona, store it, and search by the hash instead of comparing all three fields.

One thing you should remember: a hash is not encryption, it is a one-way function.
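A rough sketch of that lookup in Python, using a plain dict as a stand-in for the database index. The row_key helper and the second sample row are my own inventions for the example; a real database builds its indexes internally.

```python
import hashlib


def row_key(first: str, last: str, district: str) -> str:
    """Hash the concatenated fields, e.g. 'DavidLeeArizona', into one short key."""
    return hashlib.sha256((first + last + district).encode()).hexdigest()


# Hypothetical rows of the citizen table.
citizens = [
    ("David", "Lee", "Arizona"),
    ("Maria", "Lopez", "Texas"),
]

# Build the index once: hash -> row.
index = {row_key(*row): row for row in citizens}

# A search is now a single lookup by hash instead of three field comparisons.
found = index.get(row_key("David", "Lee", "Arizona"))
missing = index.get(row_key("David", "Lee", "Texas"))
print(found, missing)
```

One practical note: plain concatenation can be ambiguous ("DavidLee" + "Arizona" gives the same string as "David" + "LeeArizona"), so putting a separator between the fields before hashing is a safer choice.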

3) Encryption:

Encryption is basically converting a message into a hidden message that only the intended party can read and no one else.
An encrypted message is also called ciphertext.
There is an interesting movie on this topic: The Imitation Game (about the Enigma machine in WW2).

Julius Caesar used to encrypt his messages, and his method was named the Caesar cipher:
Can anyone read this message?
Aopz tlzzhnl pz lujyfwalk dpao Jhlzhy jpwoly.
This is just “This message is encrypted with Caesar cipher.” with each letter shifted by 7.
You can check out this website: < https://cryptii.com/pipes/caesar-cipher >
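If you want to try it yourself, here is a small sketch of the Caesar cipher in Python. The function name caesar is my own; it shifts only the letters A-Z and a-z and leaves everything else untouched.

```python
def caesar(text: str, shift: int) -> str:
    """Shift every ASCII letter by `shift` positions, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # spaces and punctuation stay as they are
    return "".join(out)


secret = caesar("This message is encrypted with Caesar cipher.", 7)
plain = caesar(secret, -7)  # decrypting is just shifting back

print(secret)
print(plain)
```

Here the "key" is just the shift value, and the same key is used to encrypt and decrypt.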

This is a very old technique and is far too weak for the modern world of the WWW: there are only 25 possible shifts, so anyone can simply try them all.
Here are some quick terms:

1) AES:

Note that DES is now deprecated, and AES is recommended.

AES is a "symmetric key algorithm" which uses the same 128, 192, or 256 bit key for both encryption and decryption (the effort needed to brute-force an AES key increases exponentially with key length). (ref: < https://blog.syncsort.com/2019/03/data-security/aes-vs-rsa-encryption-differences/ >)

Note the words: "symmetric key algorithm"

Advantage of AES:
It is fast to encrypt and decrypt a message, if you already have the key.

Disadvantage?
Both parties must know the key first. So how do we transfer the key? By sending an email? What if the email is hacked?

And what if you don’t know the other party and you want a trust factor?
The answer is: "public-key cryptography", also called "asymmetric encryption".

2) RSA:

In RSA each party has its own pair of 2 different keys: a public key and a private key.

You need to keep your private key safe and you never share it. But you send your public key to anyone who wants to send you a message.

Whoever wants to send you a message will encrypt it with your public key.
Note that this works only one way: a message encrypted with a public key can only be decrypted with the private key of the same pair, so only you can decrypt this message.

This is the basis of modern HTTPS communication.
But RSA encryption and decryption take a long time (even with the correct key), so RSA is not used to encrypt the actual message; that is done with a symmetric key (AES) instead.

RSA can be used for “short messages”, and hence it can be used to transfer the “symmetric key” secretly, and both parties don’t need to know each other beforehand.
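To see the public/private idea with actual numbers, here is a toy sketch of textbook RSA in plain Python. Everything in it is an assumption for illustration: the tiny primes 61 and 53, the exponent 17 and the message 65 are the usual classroom values, there is no padding, and it is in no way secure. Real RSA uses keys of 2048 bits or more through a proper crypto library.

```python
# Toy RSA with tiny primes: for illustration only, never use this for real data.
p, q = 61, 53
n = p * q                # modulus, part of both keys
phi = (p - 1) * (q - 1)  # used only while generating the keys
e = 17                   # public exponent, must share no factor with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)   # you give this to everyone
private_key = (d, n)  # you keep this to yourself


def encrypt(message: int, key: tuple) -> int:
    """Anyone can do this with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(cipher: int, key: tuple) -> int:
    """Only the holder of the private key can do this."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


message = 65  # the message must be a number smaller than n
cipher = encrypt(message, public_key)
recovered = decrypt(cipher, private_key)
print(cipher, recovered)
```

The message has to be smaller than n, which is one reason RSA suits short messages such as a symmetric key.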




And I recommend you read Stack Overflow for whatever new questions you have about this.


3) Diffie-Hellman:


So if the RSA-AES combo is the real deal, why do we want something new?
The answer is that some people want perfect forward secrecy.




Basically, perfect forward secrecy ensures that the keys used in the exchange are destroyed once the session is over. We don’t reuse the key for another session.

What is the advantage of this?

Let's say a hacker gets hold of your key at some point; the hacker can still only read the messages of that one session. Once the session ends, the key is useless. Nice, isn’t it?

Even though the two are often confused, Diffie-Hellman is not an encryption algorithm like RSA but rather a key exchange system.

At the end of a Diffie-Hellman exchange, both parties arrive at the same common number secretly, without ever sending that number over the network.
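The exchange can be sketched with toy numbers in a few lines of Python. The names alice and bob and the values p = 23, g = 5 and the two secrets are classroom assumptions only; real exchanges use numbers that are thousands of bits long (or elliptic curves) and fresh random secrets for every session.

```python
# Toy Diffie-Hellman exchange: tiny numbers for illustration only.
p, g = 23, 5  # public values that everyone, including the hacker, may know

alice_secret = 6  # never leaves Alice's machine
bob_secret = 15   # never leaves Bob's machine

# Each side sends only this public value over the network.
alice_public = pow(g, alice_secret, p)
bob_public = pow(g, bob_secret, p)

# Each side combines the other's public value with its own secret.
alice_shared = pow(bob_public, alice_secret, p)
bob_shared = pow(alice_public, bob_secret, p)

print(alice_public, bob_public, alice_shared, bob_shared)
```

Both sides end with the same number, which can then serve as the symmetric key for that session. If new secrets are picked for every session and thrown away afterwards, you get the forward secrecy described above.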



So what is used in the real world?

All three: Diffie-Hellman, RSA and AES. Most of the time the security of the web (“https://”) uses a combo of all three to keep you secure.

With this we conclude. Let me know if you have any ideas for improvement.
For questions, search Stack Overflow.
