Saturday, July 12, 2014

Unix Shell: Merge User Accounts(1)

Problem:
We have two sets of user accounts, u1.passwd and u2.passwd. Each file contains a few lines of user records, and each record has the following format:
<username>:<password>:<userid>
Our goal is to merge the two sets of user accounts together.

We are going to run into four kinds of situations when doing the merge.
1) the same username and the same user id exist in both files
2) different usernames share the same user id across the two files
3) the same username has different user ids in the two files
4) a username and user id exist in only one file, not the other
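
To make the four cases concrete, here is a small sketch (an illustration only, not part of the merge itself) that labels each record of the second file with the case it falls into when compared against the first file. The array names uid_of and name_of and the temporary directory are our own choices for this example, and it only labels records from the second file.

```shell
#!/bin/sh
# Illustration only: write the two sample files into a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'xx:pw1:0' 'xx1:pw1:1' 'xx3:pw3:2' > "$tmp/u1.passwd"
printf '%s\n' 'xx:pw1:0' 'xx2:pw2:1' 'xx3:pw3:3' 'xx4:pw4:4' > "$tmp/u2.passwd"

cases=$(awk -F: '
  # First file: remember the uid of each name and the name of each uid.
  NR == FNR { uid_of[$1] = $3; name_of[$3] = $1; next }
  # Second file: compare each record with what the first file recorded.
  ($1 in uid_of) && uid_of[$1] == $3 { print "case 1: " $0; next }
  ($3 in name_of) && !($1 in uid_of) { print "case 2: " $0; next }
  ($1 in uid_of)                     { print "case 3: " $0; next }
                                     { print "case 4: " $0 }
' "$tmp/u1.passwd" "$tmp/u2.passwd")

printf '%s\n' "$cases"
# case 1: xx:pw1:0
# case 2: xx2:pw2:1
# case 3: xx3:pw3:3
# case 4: xx4:pw4:4
rm -rf "$tmp"
```

Reading both files in one awk run with the NR == FNR test keeps the example short; it assumes the first file is not empty.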

1. Step 1, physically merge the 2 files together.
1) Print out the first user record file
2) Print out the second user record file
3) Sort both user record files together and copy the output to the merge1 file
tee command: copies data from standard input to both standard output and a file
4) Print out the merge1 file
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat u1.passwd  
 xx:pw1:0  
 xx1:pw1:1  
 xx3:pw3:2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat u2.passwd  
 xx:pw1:0  
 xx2:pw2:1  
 xx3:pw3:3  
 xx4:pw4:4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sort u1.passwd u2.passwd | tee merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  
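
If you want to convince yourself that tee really sends the same data to both places, the following sketch repeats step 1 in a scratch directory and compares what was shown on standard output with what was saved in merge1. The variable names shown, saved and lines are our own, chosen just for this example.

```shell
#!/bin/sh
# Illustration only: recreate the two sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'xx:pw1:0' 'xx1:pw1:1' 'xx3:pw3:2' > "$tmp/u1.passwd"
printf '%s\n' 'xx:pw1:0' 'xx2:pw2:1' 'xx3:pw3:3' 'xx4:pw4:4' > "$tmp/u2.passwd"

# tee writes its input to the file AND passes it on to standard output,
# which we capture here instead of letting it reach the terminal.
shown=$(sort "$tmp/u1.passwd" "$tmp/u2.passwd" | tee "$tmp/merge1")
saved=$(cat "$tmp/merge1")
lines=$(wc -l < "$tmp/merge1" | tr -d ' ')

if [ "$shown" = "$saved" ]; then
  echo "standard output and merge1 match ($lines records)"
fi
rm -rf "$tmp"
```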

2. Split merge1 into dupusers, dupuids, and unique1
We are going to split the merged user accounts into 3 files:
dupusers: it contains all user records having the same username but different user ids
dupuids: it contains all user records having the same user id but different usernames
unique1: it contains all remaining records: those that appear in only one file, plus one copy of each record that has the same username and user id in both files

script:
 #Set the field separator to ":"
 BEGIN { FS=":" }  
   
 #We use two arrays, name and uid, to index the records seen so far:
 #name is keyed by username and uid is keyed by user id.

 #If name already contains this record's username, there is another
 #record with the same name. It must come from the other file, since
 #usernames and uids are unique within a single file. If uid also
 #contains this record's uid, we treat the record as a duplicate, do
 #nothing, and keep the earlier record. Otherwise we have found two
 #records with the same name but different uids, one from each file.
 #We put these 2 records into dupusers and remove the earlier record
 #from both the name and uid arrays.
 #Note we won't find 3 records with the same name or the same uid,
 #since we are merging only 2 files.

 #If name doesn't contain this username but uid contains this uid,
 #we have found two records with the same uid but different names.
 #We put these two records into dupuids and remove the earlier
 #record from both the name and uid arrays.

 #If neither name nor uid contains this record, it is a new record,
 #and we insert it into both arrays.

 #In the end, we put the records remaining in name into "unique1".
 #The order of a for-in loop over an awk array is unspecified, so
 #the order of records in "unique1" may vary.
   
 {  
   if($1 in name)  
   {  
     if($3 in uid)  
       ;  
     else  
     {  
       print name[$1] > "dupusers"  
       print $0 > "dupusers"  
       delete name[$1]  
   
       remove_uid_by_name($1)  
     }  
   } else if ($3 in uid)  
   {  
     print uid[$3] > "dupuids"  
     print $0 > "dupuids"  
     delete uid[$3]  
   
     remove_name_by_uid($3)  
   } else  
     name[$1] = uid[$3] = $0  
 }  
   
 END {  
   for(i in name)  
     print name[i] > "unique1"  
   
   close("unique1")  
   close("dupusers")  
   close("dupuids")  
 }  
   
 function remove_uid_by_name(n,  i,f)  
 {  
   for(i in uid)  
   {  
     split(uid[i], f, ":")  
     if(f[1] == n)  
     {  
       delete uid[i]  
       break  
     }  
   }  
 }  
   
 function remove_name_by_uid(id,  i,f)  
 {  
   for(i in name)  
   {  
     split(name[i], f, ":")  
     if(f[3] == id)  
     {  
       delete name[i]  
       break  
     }  
   }  
 }  

terminal:
1) Print out the merge1 file which was generated in step 1
2) Run the awk script with merge1 as its input
3 - 5) Print out dupusers, dupuids and unique1, generated by running the above awk script.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk -f script <merge1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat dupusers  
 xx3:pw3:2  
 xx3:pw3:3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat dupuids  
 xx1:pw1:1  
 xx2:pw2:1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique1  
 xx4:pw4:4  
 xx:pw1:0  
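
As a quick cross-check on the result, we can list the usernames and user ids that occur more than once in merge1 with cut, sort and uniq -d. This is only a sanity check under the assumption that merge1 holds the seven records shown above; it does not replace the awk script, and the variable names dup_names and dup_uids are our own.

```shell
#!/bin/sh
# Illustration only: recreate merge1 as shown above in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'xx1:pw1:1' 'xx2:pw2:1' 'xx3:pw3:2' 'xx3:pw3:3' \
              'xx4:pw4:4' 'xx:pw1:0' 'xx:pw1:0' > "$tmp/merge1"

# uniq -d prints one copy of each value that appears on adjacent lines
# more than once, so the input has to be sorted first.
dup_names=$(cut -d: -f1 "$tmp/merge1" | LC_ALL=C sort | uniq -d)
dup_uids=$(cut -d: -f3 "$tmp/merge1" | sort -n | uniq -d)

echo "usernames seen more than once:"
printf '%s\n' "$dup_names"
echo "user ids seen more than once:"
printf '%s\n' "$dup_uids"
rm -rf "$tmp"
```

For the sample data this reports the usernames xx and xx3 and the user ids 0 and 1. That agrees with the files above: xx with uid 0 is the exact duplicate kept once in unique1, xx3 is the pair in dupusers, and uid 1 is the pair in dupuids.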
