Saturday, July 12, 2014

Unix Shell: Compare Files(2)

1. File checksum matching
terminal:
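1 - 5) Print the contents of the files o1, o2, o3, o4 and o5.
6) Use the md5sum command to calculate the checksum of every file in the current directory whose name starts with o.
md5sum prints a 128-bit digest as 32 hexadecimal digits. Two files with different content almost never share a digest by accident, so matching sums indicate identical content. MD5 does not resist deliberately crafted collisions, so it should not be relied on for security.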
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o1  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o2  
 Hello New York!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o3  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o4  
 Hello New York!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o5  
 Hello Boston!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ md5sum ./o*  
 59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 0226fbaf072dbabf30369c2f7f162ffa ./o2  
 59ca0efa9f5633cb0371bbc0355478d8 ./o3  
 0226fbaf072dbabf30369c2f7f162ffa ./o4  
 b569b3ceb65fa2f141b687943f45e4bb ./o5  
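
The comparison above can also be scripted for a single pair of files. The following is a minimal sketch, not part of the original post: same_content is a hypothetical helper name, and it assumes md5sum and awk are available.

```shell
#!/bin/bash
# same_content FILE1 FILE2
# Exit status 0 if both files have the same MD5 sum, 1 if the sums differ,
# 2 if either argument is not a readable regular file.
# (same_content is a hypothetical helper, not a standard command.)
same_content() {
  [ -f "$1" ] && [ -r "$1" ] && [ -f "$2" ] && [ -r "$2" ] || return 2
  local sum1 sum2
  # md5sum prints "<sum>  <name>"; awk keeps only the sum.
  sum1=$(md5sum < "$1" | awk '{ print $1 }')
  sum2=$(md5sum < "$2" | awk '{ print $1 }')
  [ "$sum1" = "$sum2" ]
}
```

With the files from the transcript, same_content o1 o3 should succeed and same_content o1 o2 should fail.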

Example: using md5sum to find identical files
find_identical_files:
 #! /bin/bash  
   
 md5sum ./o* |  
 #At this point of time, the text we get is:  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 #0226fbaf072dbabf30369c2f7f162ffa ./o2  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o3  
 #0226fbaf072dbabf30369c2f7f162ffa ./o4  
 #b569b3ceb65fa2f141b687943f45e4bb ./o5  
 #  
 #In the following awk script we count, for each checksum,  
 #how many times it has occurred so far. We only output records  
 #whose checksum occurs more than once. In this case:  
 #when the count is 1, we just save the entire record.  
 #When the count reaches 2, we print the saved first  
 #occurrence. For every record whose count is 2 or more,  
 #we also print the record itself.  
 awk '{  
   count[$1]++  
   if(count[$1] == 1) first[$1]=$0  
   if(count[$1] == 2) print first[$1]  
   if(count[$1] >= 2) print $0  
 }' | sort |  
 #We pipe the duplicate records through sort, so they are  
 #sorted by checksum, which means files with the same  
 #checksum are now adjacent.  
 #Current text we get is:  
 #0226fbaf072dbabf30369c2f7f162ffa ./o2  
 #0226fbaf072dbabf30369c2f7f162ffa ./o4  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o3  
 #  
 #The following awk script separates the groups of files.  
 #We compare the current checksum with the previous  
 #record's checksum. If it is different,  
 #we print one empty line first.  
 awk 'BEGIN { first=1 }  
 {   
   if (first == 1)  
   {  
     first = 0;  
     last = $1;  
     print $0;  
   }  
   else  
   {  
     if(last != $1) print "";  
     print $0;  
     last=$1;  
   }  
 }'  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./find_identical_files  
 0226fbaf072dbabf30369c2f7f162ffa ./o2  
 0226fbaf072dbabf30369c2f7f162ffa ./o4  
   
 59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 59ca0efa9f5633cb0371bbc0355478d8 ./o3  

2. Digital Signature Verification
Basic Concept:
Private Key: the key known only to its owner.
Public Key: the key that may be known by anyone.
The two keys form a pair: a message encrypted with one key can only be decrypted with the other key.

Example:
Alice wants to send a message to everyone and wants everyone to know that the message was indeed written by her. She uses her private key to encrypt (sign) the message, and others use her public key to decrypt it. Everyone can be confident that the message was written by Alice, since her public key can only decrypt a message that was encrypted with her private key.

Alice wants to send a message to Peter only and wants Peter to know that it was indeed written by her. Alice signs the message with her own private key and then encrypts it with Peter's public key. Peter decrypts it with his private key and checks the signature with Alice's public key. Others cannot read the message since they don't have Peter's private key, and Peter is sure that the message was written by Alice since only her private key could have produced the signature. A private key is never given to anyone else.
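
The sign-and-verify idea can be tried out locally. The sketch below uses the openssl command line tool instead of gpg, purely for illustration, because it can create a throwaway key pair without any prompts. The function names and file names are hypothetical. It signs a SHA-256 hash of the file, not the whole file.

```shell
#!/bin/bash
# Hypothetical helpers built on the openssl command line tool.

# make_keypair PRIVATE_PEM PUBLIC_PEM: create an RSA key pair.
make_keypair() {
  openssl genpkey -algorithm RSA -out "$1" 2>/dev/null &&
    openssl pkey -in "$1" -pubout -out "$2"
}

# sign_file PRIVATE_PEM FILE SIGNATURE: sign the SHA-256 hash of FILE.
sign_file() {
  openssl dgst -sha256 -sign "$1" -out "$3" "$2"
}

# verify_file PUBLIC_PEM FILE SIGNATURE: exit status 0 only if SIGNATURE
# was made over FILE with the matching private key.
verify_file() {
  openssl dgst -sha256 -verify "$1" -signature "$3" "$2" >/dev/null 2>&1
}
```

Alice would run make_keypair alice.key alice.pub once, publish alice.pub, and sign with sign_file alice.key message.txt message.sig. Anyone holding alice.pub can then run verify_file alice.pub message.txt message.sig, which should fail as soon as message.txt is changed.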

==================================================
gpg command:
The gpg command can use a public key to verify a signature, i.e. to verify that a file was indeed signed by the person it claims to come from.

http://pgp.mit.edu/ is a public keyserver hosted by MIT where a public key can be looked up by its key ID. The key ID is a short identifier derived from the public key, and it identifies the key pair. The private key is known only to the author. The public key can be queried by key ID.

On the main page of "http://pgp.mit.edu/", we enter the key ID 0xD333CBA1 in the "Search String" field under "Extract a key".

We can then get the public key and save it into the file temp.key.

terminal:
The --import option imports the public key into gpg's local keyring.
 gpg --import temp.key  

gpg verifies a signature using the public keys available in its local keyring. That is why we have to import the public key first; otherwise gpg cannot find a usable public key to check the signature against.
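
Once the key is imported, the verification step itself might look like the sketch below. It assumes the author published a detached signature file next to the download. verify_signature is a hypothetical wrapper, and any file names passed to it are placeholders.

```shell
#!/bin/bash
# verify_signature SIGNATURE FILE
# Check a detached signature against FILE using keys already imported
# into the local keyring. Exit status 0 means a good signature; 2 means
# an input file could not be read.
# (verify_signature is a hypothetical wrapper around gpg --verify.)
verify_signature() {
  local sig=$1 file=$2
  if [ ! -r "$sig" ] || [ ! -r "$file" ]; then
    echo "verify_signature: cannot read $sig or $file" >&2
    return 2
  fi
  gpg --verify "$sig" "$file"
}
```

For example, verify_signature package.tar.gz.sig package.tar.gz should report a good signature only if the matching public key is in the keyring and the file is unchanged.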
================================================
But it is very annoying to fetch the public key from the website every time.
The gpg command can fetch the public key itself, given the keyserver and the key ID.
terminal:
1) The --keyserver option specifies the server name. The --search-keys option specifies the key ID to search for.
gpg lists the matching key, and we enter the number 1 to make gpg import that public key.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ gpg --keyserver pgp.mit.edu --search-keys 0xD333CBA1  
 gpg: searching for "0xD333CBA1" from hkp server pgp.mit.edu  
 (1)    Jim Meyering <jim@meyering.net>  
     Jim Meyering <meyering@gnu.org>  
     Jim Meyering <meyering@pobox.com>  
     Jim Meyering <meyering@ascend.com>  
     Jim Meyering <meyering@lucent.com>  
     Jim Meyering <meyering@redhat.com>  
     Jim Meyering <meyering@na-net.ornl.gov>  
      1024 bit DSA key D333CBA1, created: 1999-09-26  
 Keys 1-1 of 1 for "0xD333CBA1". Enter number(s), N)ext, or Q)uit > 1  
 gpg: requesting key D333CBA1 from hkp server pgp.mit.edu  
 gpg: /home/aubinxia/.gnupg/trustdb.gpg: trustdb created  
 gpg: key D333CBA1: public key "Jim Meyering <jim@meyering.net>" imported  
 gpg: no ultimately trusted keys found  
 gpg: Total number processed: 1  
 gpg:        imported: 1  
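
When the key ID is already known, the menu can be skipped with the --recv-keys option. The following is a minimal sketch: fetch_key is a hypothetical wrapper, the input check is an added assumption, and the call needs network access to the keyserver.

```shell
#!/bin/bash
# fetch_key KEYSERVER KEYID
# Import a public key by ID without the interactive menu.
# (fetch_key is a hypothetical wrapper around gpg --recv-keys.)
fetch_key() {
  local server=$1 keyid=$2
  # Accept only hexadecimal key IDs, with an optional 0x prefix.
  case ${keyid#0x} in
    ''|*[!0-9A-Fa-f]*)
      echo "fetch_key: not a hexadecimal key ID: $keyid" >&2
      return 2 ;;
  esac
  gpg --keyserver "$server" --recv-keys "$keyid"
}
```

For the key used above this would be fetch_key pgp.mit.edu 0xD333CBA1. Short 8-digit key IDs are not unique, so it is worth running gpg --fingerprint D333CBA1 afterwards and comparing the full fingerprint with one published by the author.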
