Sunday, July 27, 2014

Unix Shell: Process Schedule(2)

1. batch command
The batch command queues up commands and runs them when the system load level permits (by default, when the load average drops below a threshold).
terminal:
1) Run the batch command; we can then type in the commands to be put into the batch queue
2) Enter the following two commands, then press Ctrl-D
./script_1
echo "Hello world!" > temp2.txt
3) List the temp files: both have been generated.
4 - 5) Print out the contents of both files
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ batch  
 warning: commands will be executed using /bin/sh  
 at> ./script_1  
 at> echo "Hello world!" >temp2.txt  
 at> <EOT>  
 job 18 at Sun Jul 27 15:24:00 2014  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lrt temp*  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jul 27 15:25 temp.txt  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jul 27 15:25 temp2.txt  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat temp.txt  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat temp2.txt  
 Hello world!  

2. crontab
crontab is used to run a specified command at scheduled times. It is very helpful for maintenance tasks that need to run repeatedly at fixed intervals.

crontab file format:
mm     hh     dd     mon    weekday          command
00-59  00-23  01-31  01-12  0-6 (0 = Sunday)

A hyphen denotes a range; * matches every possible value.
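Ranges, lists, and step values (*/n, supported by Vixie cron and most modern crons) can be combined in these fields. The entries below are illustrative only; the script paths are hypothetical:

```
# mm   hh    dd  mon dow  command
30     2     *   *   *    /home/aubinxia/backup.sh   # 02:30 every day
0      9-17  *   *   1-5  /home/aubinxia/check.sh    # on the hour, 9am-5pm, Mon-Fri
*/10   *     *   *   *    /home/aubinxia/poll.sh     # every 10 minutes (step value)
```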

owntab:
1) The first line tells cron to run the ls command every minute
2) The second line tells cron to run the ls command at 23:15 every Sunday
 0-59 * * * * ls >>/home/aubinxia/Desktop/xxdev/temp.txt  
 15 23 * * 0 ls >>/home/aubinxia/Desktop/xxdev/temp.txt  

terminal:
1) The -l option makes crontab list the current schedule
2) Since there is no schedule yet, we install our own schedule from the owntab file
3) After a few minutes, print the content of temp.txt: crontab is working and has started writing output into temp.txt. Based on the content, we can tell that cron runs the command in the user's home directory.
4) Use the -r option to remove the current schedule
5) Use the -l option to list the current schedule; all schedules have been removed.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ crontab -l  
 no crontab for aubinxia  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ crontab owntab  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat temp.txt  
 Desktop  
 Documents  
 Downloads  
 examples.desktop  
 Music  
 Pictures  
 Public  
 script_1  
 script_1~  
 Templates  
 Videos 
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ crontab -r
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ crontab -l
 no crontab for aubinxia

Unix Shell: Process Schedule(1)

1. sleep command
The sleep command makes the process wait for the specified number of seconds before continuing. During this period, the process consumes essentially no CPU resources.
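A quick way to see the wait in action is to time a sleep; this minimal sketch measures the wall-clock seconds around the call:

```shell
#! /bin/bash

# Measure the wall-clock time spent in sleep.
start=$(date +%s)
sleep 2
end=$(date +%s)
echo "elapsed: $((end - start)) seconds"
```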

script_2:
 #! /bin/bash  
   
 a=1  
 while [ $a -ne 0 ]  
 do  
   ((a++))  
   sleep 5  
 done  

terminal:
1) Launch the script_2 process in the background
2) Run ps to list the running processes; script_2 is alive now.
3) Run "ps aux | head -n 1" to show the header line
4) Grep the script_2 line out of "ps aux"; it shows that the mostly-sleeping script_2 process occupies only 0.2% of the CPU.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_2 &  
 [2] 5293  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2833 pts/1  00:00:01 bash  
  2968 pts/1  00:00:32 emacs  
  5293 pts/1  00:00:00 script_2  
  5294 pts/1  00:00:00 sleep  
  5295 pts/1  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps aux | head -n 1  
 USER    PID %CPU %MEM  VSZ  RSS TTY   STAT START  TIME COMMAND  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps aux | grep script_2  
 aubinxia 5293 0.2 0.0  5296 1324 pts/1  S  13:33  0:00 /bin/bash ./script_2  
 aubinxia 5302 0.0 0.0  4680  820 pts/1  S+  13:33  0:00 grep --color=auto script_2  


2. at command
The at command schedules a given script or command to run once at a specified time.

script_1:
 #! /bin/bash  
   
 echo "Hello world!" > temp.txt  

terminal:
1) Run the script immediately; the output says the job is scheduled to run at 14:16
2) After 14:16, list temp.txt; its last modification time is exactly 14:16
3) Run the script 1 minute from now; the output says the job is scheduled to run at 14:18
4) After 14:18, list temp.txt; the last modification time is exactly 14:18, meaning it was just regenerated.
5) Run the script 2 hours from now; the output says the job is scheduled to run 2 hours later.
6) Run the script 1 day from now; the output confirms it.
7) Run the script 1 month from now; the output confirms it.
8) Run the script 1 year from now; the output confirms it.
9) The at -l option lists all currently scheduled jobs. The first number is the job number used to identify each job.

 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at now -f ./script_1  
 warning: commands will be executed using /bin/sh  
 job 6 at Sun Jul 27 14:16:00 2014  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lrt temp.txt  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jul 27 14:16 temp.txt  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at now + 1 minutes -f ./script_1  
 warning: commands will be executed using /bin/sh  
 job 7 at Sun Jul 27 14:18:00 2014  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lrt temp.txt  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jul 27 14:18 temp.txt  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at now + 2 hours -f ./script_1
 warning: commands will be executed using /bin/sh
 job 8 at Sun Jul 27 16:20:00 2014
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at now + 1 days -f ./script_1
 warning: commands will be executed using /bin/sh
 job 9 at Mon Jul 28 14:21:00 2014
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at now + 1 months -f ./script_1
 warning: commands will be executed using /bin/sh
 job 10 at Wed Aug 27 14:21:00 2014
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at now + 1 years -f ./script_1
 warning: commands will be executed using /bin/sh
 job 12 at Mon Jul 27 14:21:00 2015
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at -l
 8 Sun Jul 27 16:20:00 2014 a aubinxia
 12 Mon Jul 27 14:21:00 2015 a aubinxia
 9 Mon Jul 28 14:21:00 2014 a aubinxia
 11 Mon Jul 27 14:21:00 2015 a aubinxia
 10 Wed Aug 27 14:21:00 2014 a aubinxia

terminal:
1) Wrong input: with the pm suffix, the hour number must be 01 - 12
2) Drop the pm suffix and use 24-hour time; then it is accepted.
3) Or keep the pm suffix but write the correct 12-hour number
4) Schedule the process to run at 2:00pm on 07/28 this year.
5) Schedule the process to run at 2:00pm on 07/28 next year: 2015.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at 14:00pm -f ./script_1  
 Hour too large for PM. Last token seen: pm  
 Garbled time  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at 14:00 -f ./script_1  
 warning: commands will be executed using /bin/sh  
 job 13 at Mon Jul 28 14:00:00 2014  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at 02:00pm -f ./script_1  
 warning: commands will be executed using /bin/sh  
 job 14 at Mon Jul 28 14:00:00 2014  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at 02:00pm 28 July -f ./script_1  
 warning: commands will be executed using /bin/sh  
 job 15 at Mon Jul 28 14:00:00 2014  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at 02:00pm 28 July 2015 -f ./script_1  
 warning: commands will be executed using /bin/sh  
 job 16 at Tue Jul 28 14:00:00 2015  

terminal:
1) Use the -l option to list all pending jobs
2) Use "atrm 16" to remove job 16
3) List the pending jobs again; job 16 has been removed.
4) Use the -r option (same as atrm) to remove job 15
5) List the pending jobs again; job 15 has been removed.
6) Run the atq command to list all pending jobs (same as at -l)
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at -l | sort -k1n  
 8    Sun Jul 27 16:20:00 2014 a aubinxia  
 9    Mon Jul 28 14:21:00 2014 a aubinxia  
 10    Wed Aug 27 14:21:00 2014 a aubinxia  
 11    Mon Jul 27 14:21:00 2015 a aubinxia  
 12    Mon Jul 27 14:21:00 2015 a aubinxia  
 13    Mon Jul 28 14:00:00 2014 a aubinxia  
 14    Mon Jul 28 14:00:00 2014 a aubinxia  
 15    Mon Jul 28 14:00:00 2014 a aubinxia  
 16    Tue Jul 28 14:00:00 2015 a aubinxia  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ atrm 16  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at -l | sort -k1n  
 8    Sun Jul 27 16:20:00 2014 a aubinxia  
 9    Mon Jul 28 14:21:00 2014 a aubinxia  
 10    Wed Aug 27 14:21:00 2014 a aubinxia  
 11    Mon Jul 27 14:21:00 2015 a aubinxia  
 12    Mon Jul 27 14:21:00 2015 a aubinxia  
 13    Mon Jul 28 14:00:00 2014 a aubinxia  
 14    Mon Jul 28 14:00:00 2014 a aubinxia  
 15    Mon Jul 28 14:00:00 2014 a aubinxia  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at -r 15  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ at -l | sort -k1n  
 8    Sun Jul 27 16:20:00 2014 a aubinxia  
 9    Mon Jul 28 14:21:00 2014 a aubinxia  
 10    Wed Aug 27 14:21:00 2014 a aubinxia  
 11    Mon Jul 27 14:21:00 2015 a aubinxia  
 12    Mon Jul 27 14:21:00 2015 a aubinxia  
 13    Mon Jul 28 14:00:00 2014 a aubinxia  
 14    Mon Jul 28 14:00:00 2014 a aubinxia  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ atq  
 8    Sun Jul 27 16:20:00 2014 a aubinxia  
 12    Mon Jul 27 14:21:00 2015 a aubinxia  
 14    Mon Jul 28 14:00:00 2014 a aubinxia  
 9    Mon Jul 28 14:21:00 2014 a aubinxia  
 13    Mon Jul 28 14:00:00 2014 a aubinxia  
 11    Mon Jul 27 14:21:00 2015 a aubinxia  
 10    Wed Aug 27 14:21:00 2014 a aubinxia  

Unix Shell: Process Tracing

Process tracing identifies all the system calls a process makes. This is very helpful for debugging a process when we don't have the source code.

script_1:
 #! /bin/bash  
   
 echo "Hello world!"  

1. Trace all system calls script_1 has made:
terminal:
Running script_1 under strace prints every system call it makes.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace ./script_1  
 execve("./script_1", ["./script_1"], [/* 64 vars */]) = 0  
 brk(0)                 = 0x9c64000  
 access("/etc/ld.so.nohwcap", F_OK)   = -1 ENOENT (No such file or directory)  
 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb772e000  
 access("/etc/ld.so.preload", R_OK)   = -1 ENOENT (No such file or directory)  
 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3  

   ......

 read(255, "#! /bin/bash\n\necho \"Hello world!"..., 34) = 34
 write(1, "Hello world!\n", 13Hello world!
 )          = 13
 read(255, "", 34)                       = 0
 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
 exit_group(0)                           = ?
 +++ exited with 0 +++

2. Trace only specified system calls script_1 has made:
terminal:
1) Output only the "write" system calls script_1 makes
2) Output the "open" and "write" system calls script_1 makes
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace -e write ./script_1  
 write(1, "Hello world!\n", 13Hello world!  
 )     = 13  
 +++ exited with 0 +++  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace -e trace=open,write ./script_1  
 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3  
 open("/lib/i386-linux-gnu/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = 3  
 open("/lib/i386-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3  
 open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3  
 open("/dev/tty", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3  
 open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3  
 open("/proc/meminfo", O_RDONLY|O_CLOEXEC) = 3  
 open("/usr/lib/i386-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3  
 open("./script_1", O_RDONLY|O_LARGEFILE) = 3  
 write(1, "Hello world!\n", 13Hello world!  
 )     = 13  
 +++ exited with 0 +++  

3. Output the tracing result to an external file
terminal:
1) Run strace to capture only the "write" system calls of script_1, sending the trace output to the external file trace.txt
2) Print the content of trace.txt
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace -e write -o trace.txt ./script_1  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat ./trace.txt  
 write(1, "Hello world!\n", 13)     = 13  
 +++ exited with 0 +++  

4. Trace the running process
script_2:
This is a simple script running a never-ending loop.
 #! /bin/bash  
   
 a=1  
 while [ $a -ne 0 ]  
 do  
   ((a++))  
   sleep 5  
 done  

terminal:
1) Run the ps command; script_2 is already running
2) As the root user (required in this case), attach to the process with id 3519, which is script_2. strace then keeps printing the system calls as the process makes them.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2833 pts/1  00:00:00 bash  
  2968 pts/1  00:00:17 emacs  
  3519 pts/1  00:00:00 script_2  
  3554 pts/1  00:00:00 sleep  
  3555 pts/1  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sudo strace -p 3519  
 Process 3519 attached  
 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 3557  
 rt_sigaction(SIGINT, {SIG_DFL, [], 0}, {0x8087c50, [], 0}, 8) = 0  
 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0  
 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3557, si_status=0, si_utime=0, si_stime=2} ---  
 waitpid(-1, 0xbfd244f8, WNOHANG)    = -1 ECHILD (No child processes)  
 sigreturn() (mask [])          = 0  
 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0  
 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 
 ......
 

5. Trace the process with timestamp
terminal:
The -t option makes strace prefix each traced system call with the time of day.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace -t -e trace=open,write ./script_1  
 11:29:40 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3  
 11:29:40 open("/lib/i386-linux-gnu/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = 3  
 11:29:40 open("/lib/i386-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3  
 11:29:40 open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3  
 11:29:40 open("/dev/tty", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3  
 11:29:40 open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3  
 11:29:40 open("/proc/meminfo", O_RDONLY|O_CLOEXEC) = 3  
 11:29:40 open("/usr/lib/i386-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3  
 11:29:40 open("./script_1", O_RDONLY|O_LARGEFILE) = 3  
 11:29:40 write(1, "Hello world!\n", 13Hello world!  
 ) = 13  
 11:29:40 +++ exited with 0 +++  

6. Trace the process with relative execution time
terminal:
The -r option prefixes each system call with the time elapsed since the previous call.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace -r -e trace=open,write ./script_1  
    0.000000 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3  
    0.003371 open("/lib/i386-linux-gnu/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = 3  
    0.015814 open("/lib/i386-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3  
    0.001950 open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3  
    0.020927 open("/dev/tty", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3  
    0.001489 open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3  
    0.011242 open("/proc/meminfo", O_RDONLY|O_CLOEXEC) = 3  
    0.009761 open("/usr/lib/i386-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3  
    0.021213 open("./script_1", O_RDONLY|O_LARGEFILE) = 3  
    0.005412 write(1, "Hello world!\n", 13Hello world!  
 ) = 13  
    0.008233 +++ exited with 0 +++  

7. Trace the process with system calls report
terminal:
The -c option makes strace generate a statistics report of all (traced) system calls. In this case, it shows that open was called 9 times and write once.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ strace -c -e trace=open,write ./script_1  
 Hello world!  
 % time     seconds  usecs/call     calls    errors syscall  
 ------ ----------- ----------- --------- --------- ----------------  
 100.00    0.000319          35         9           open  
   0.00    0.000000           0         1           write  
 ------ ----------- ----------- --------- --------- ----------------  
 100.00    0.000319                    10           total  

8. Process Accounting
Whenever a process runs, the system records accounting information in system files (process accounting must be enabled for this). We can print this accounting information to see what kinds of processes have been launched since accounting started.

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sa | head -n 10  
   1553   516.40re    1.17cp     0avio   1344k  
    65   101.18re    0.26cp     0avio   2446k  ***other*  
     2    0.44re    0.21cp     0avio   22785k  update-apt-xapi  
    908   81.84re    0.16cp     0avio   1055k  sleep  
     5    0.19re    0.09cp     0avio   5012k  command-not-fou  
     2    0.27re    0.08cp     0avio   1680k  apt-get  
     4   62.64re    0.06cp     0avio   35264k  unity-panel-ser  
     2    0.18re    0.06cp     0avio    994k  mandb  
    27    2.73re    0.04cp     0avio    603k  strace  
     2    0.39re    0.02cp     0avio   13126k  ubuntu-sso-logi  

9. /proc File system
Unix saves process-related information into files under the /proc directory.
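For example, the shell can inspect its own /proc entry using its pid; a minimal Linux-specific sketch:

```shell
#! /bin/bash

# $$ is the shell's own pid; /proc/$$ is its entry in the proc file system.
echo "my pid: $$"
ls /proc/$$ | head -n 5            # same kind of listing as for any /proc/<pid>
tr '\0' ' ' < /proc/$$/cmdline     # the command line, NUL-separated on disk
echo
```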

terminal:
1) Run the ps command to list the currently running processes
2) List the directory /proc/5293; 5293 is the process id of script_2. The directory contains all the process's runtime information, organized as files.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2833 pts/1  00:00:02 bash  
  5293 pts/1  00:00:10 script_2  
  7907 pts/1  00:00:07 emacs  
  8532 pts/1  00:00:00 sleep  
  8533 pts/1  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls /proc/5293  
 attr    comm       fd    loginuid  mountstats   pagemap   sessionid syscall  
 autogroup  coredump_filter fdinfo  map_files net      personality smaps   task  
 auxv    cpuset      gid_map maps    ns       projid_map  stack   timers  
 cgroup   cwd       io    mem    oom_adj    root     stat    uid_map  
 clear_refs environ     latency mountinfo oom_score   sched    statm   wchan  
 cmdline   exe       limits  mounts   oom_score_adj schedstat  status 

Saturday, July 26, 2014

Unix Shell: Process Control(3)

1. Trap the signal
script_1:
Use the trap command to set up customized handlers for the HUP and TERM signals
 #! /bin/bash  
   
 trap "echo Catch the HUP signal!" HUP  
 trap "echo Catch the terminal signal! ; rm ./temp ; exit 0" TERM  
   
 echo "" >temp  
 a=1  
 while [ $a -ne 0 ]  
 do  
   ((a++))  
   echo $a >>temp  
   sleep 5  
 done  


terminal:
1) Launch the script_1 process
2) Run ps; the script_1 process is already launched
3) Output the temp file generated by script_1; the process is appending text to the file
4) Send the HUP signal to the script_1 process
5) The script_1 process catches the HUP signal and prints its message on standard output
6) Run ps; the script_1 process is still alive
7) List the temp file; it has not been deleted
8) Send the TERM signal to the script_1 process
9) The script_1 process catches the TERM signal and prints its message
10) Run ps; the script_1 process is gone
11) List the temp file; it has been deleted.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [2] 3448  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:09 emacs  
  3448 pts/0  00:00:00 script_1  
  3449 pts/0  00:00:00 sleep  
  3450 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat temp  
   
 2  
 3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -HUP 3448  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ Catch the HUP signal!  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:09 emacs  
  3448 pts/0  00:00:00 script_1  
  3456 pts/0  00:00:00 sleep  
  3457 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lrt temp  
 -rw-rw-r-- 1 aubinxia aubinxia 15 Jul 26 17:17 temp  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -TERM 3448  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ Catch the terminal signal!  
   
 [2]+ Done          ./script_1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:09 emacs  
  3462 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lrt temp  
 ls: cannot access temp: No such file or directory  

2. Other Signals -EXIT
The EXIT pseudo-signal is caught just before the shell process exits.
./script_1:
 #! /bin/bash  
   
 trap "echo Catch the EXIT signal" EXIT  
   
 a=1  
 while [ $a -ne 0 ]  
 do  
   date >/dev/null  
   sleep 5  
 done  

terminal:
1) Launch the script_1 process
2) Kill the script_1 process (here with HUP); before exiting, it catches the EXIT signal
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [2] 3524  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -HUP 3524  
 Catch the EXIT signal  

3. Other Signals -DEBUG
script_1:
 #! /bin/bash  
   
 trap "echo Catch the DEBUG signal" DEBUG  
   
 a=2  
 echo "Hello world! a=" $a  
 #output:  
 #Catch the DEBUG signal  
 #Catch the DEBUG signal  
 #Hello world! a= 2  
 #  
 #This indicates that the DEBUG signal handler runs  
 #before each simple command  
   
 ((a++))  
 echo "Hello world! a="$a  
 #output:  
 #Catch the DEBUG signal  
 #Catch the DEBUG signal  
 #Hello world! a=3  
 #  
 #This also indicates that the DEBUG signal handler runs  
 #before each simple command.  

4. Other Signals -ERR
script_1:
 #! /bin/bash  
   
 trap "echo Catch the ERR signal" ERR  
   
 echo "file information: " $(ls no-such-file)  
 #output:  
 #ls: cannot access no-such-file: No such file or directory  
 #file information:   
   
 ls no_such_file  
 #output:  
 #ls: cannot access no_such_file: No such file or directory  
 #Catch the ERR signal  

Only the 2nd "ls" command triggers the ERR trap: the first failing ls runs inside a command substitution, while the enclosing echo command itself succeeds.

Unix Shell: Process Control(2)

1. Terminate the process
When terminating a process, we should first try HUP, then TERM, and only lastly KILL. The HUP and TERM signals give the process a chance to clean up its temporary files.
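That escalation can be wrapped in a small helper; this is only a sketch (the function name graceful_kill is our own), waiting briefly between signals before escalating:

```shell
#! /bin/bash

# Escalate from gentle to forceful: HUP, then TERM, then KILL.
graceful_kill() {
  local pid=$1
  local sig
  for sig in HUP TERM KILL; do
    kill -s "$sig" "$pid" 2>/dev/null || return 0  # process already gone
    sleep 1
    kill -0 "$pid" 2>/dev/null || return 0         # no longer alive
  done
}

sleep 60 &         # a throwaway background process to demonstrate on
victim=$!
graceful_kill "$victim"
```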

script_1:
 #! /bin/bash  
   
 echo "" >temp  
 a=5  
 while [ $a -ne 0 ]  
 do  
   ((a++))  
   echo $a >>temp  
   sleep 5  
 done  

terminal:
1 - 3) Start script_1 3 times
4) Run the ps command; there are 3 script_1 processes
5) Open the temp file; the 3 script_1 processes are all appending text into temp
6) Send the HUP signal to 3285, which kills that process
7) Send the TERM signal to 3287, which also kills it
8) Send the KILL signal to 3289, which also kills it
9) Run ps; all 3 script_1 processes have been killed.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [2] 3285  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [3] 3287  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [4] 3289  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:04 emacs  
  3285 pts/0  00:00:00 script_1  
  3287 pts/0  00:00:00 script_1  
  3288 pts/0  00:00:00 sleep  
  3289 pts/0  00:00:00 script_1  
  3290 pts/0  00:00:00 sleep  
  3291 pts/0  00:00:00 sleep  
  3292 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat temp  
   
 6  
 7  
 7  
 7  
 8  
 8  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -HUP 3285  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -TERM 3287  
 [2]  Hangup         ./script_1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -KILL 3289  
 [3]- Terminated       ./script_1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:04 emacs  
  3312 pts/0  00:00:00 ps  
 [4]+ Killed         ./script_1  

We should be cautious with the KILL signal, since it can leave temporary files (like lock files) behind in the file system, which may cause problems the next time the process runs.
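A defensive pattern is to register the cleanup with trap, so that HUP/TERM (but not KILL) still remove the temporary file; the lock-file path here is hypothetical:

```shell
#! /bin/bash

# Remove the lock file on normal exit; turn catchable signals into
# normal exits so the EXIT trap still runs. KILL bypasses all of this.
lockfile=/tmp/myjob.$$.lock          # hypothetical lock-file path
trap 'rm -f "$lockfile"' EXIT
trap 'exit 1' HUP TERM

touch "$lockfile"
# ... real work would go here ...
```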

Kill the process by name:
terminal:
1 - 3) Start the script_1 process 3 times; now we have 3 script_1 processes running, with process IDs 3333, 3335 and 3338
4) Run the ps command to list all running processes. There are 3 script_1 processes running now.
5) Run the pgrep command to list all process IDs whose name is script_1; we get 3 results: 3333, 3335, 3338
6) Run the pkill command to send the HUP signal to all processes named script_1; be cautious, it may kill more processes than we expect.
7) Run ps; all script_1 processes are killed now.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [2] 3333  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [3] 3335  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [4] 3338  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:05 emacs  
  3333 pts/0  00:00:00 script_1  
  3335 pts/0  00:00:00 script_1  
  3337 pts/0  00:00:00 sleep  
  3338 pts/0  00:00:00 script_1  
  3339 pts/0  00:00:00 sleep  
  3340 pts/0  00:00:00 sleep  
  3341 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ pgrep script_1  
 3333  
 3335  
 3338  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ pkill -HUP script_1  
 [2]  Hangup         ./script_1  
 [3]- Hangup         ./script_1  
 [4]+ Hangup         ./script_1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2954 pts/0  00:00:00 bash  
  3162 pts/0  00:00:05 emacs  
  3357 pts/0  00:00:00 ps  

Friday, July 25, 2014

Unix Shell: Process Control(1)

1. kill command
The kill command doesn't actually kill the process itself; it just sends a signal to the process.
Each process has the right to decide how to interpret the signal (with a few exceptions, such as KILL).

terminal:
kill -l lists all signals supported by the kill command
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -l  
  1) SIGHUP     2) SIGINT     3) SIGQUIT     4) SIGILL     5) SIGTRAP  
  6) SIGABRT     7) SIGBUS     8) SIGFPE     9) SIGKILL    10) SIGUSR1  
 11) SIGSEGV    12) SIGUSR2    13) SIGPIPE    14) SIGALRM    15) SIGTERM  
 16) SIGSTKFLT    17) SIGCHLD    18) SIGCONT    19) SIGSTOP    20) SIGTSTP  
 21) SIGTTIN    22) SIGTTOU    23) SIGURG    24) SIGXCPU    25) SIGXFSZ  
 26) SIGVTALRM    27) SIGPROF    28) SIGWINCH    29) SIGIO    30) SIGPWR  
 31) SIGSYS    34) SIGRTMIN    35) SIGRTMIN+1    36) SIGRTMIN+2    37) SIGRTMIN+3  
 38) SIGRTMIN+4    39) SIGRTMIN+5    40) SIGRTMIN+6    41) SIGRTMIN+7    42) SIGRTMIN+8  
 43) SIGRTMIN+9    44) SIGRTMIN+10    45) SIGRTMIN+11    46) SIGRTMIN+12    47) SIGRTMIN+13  
 48) SIGRTMIN+14    49) SIGRTMIN+15    50) SIGRTMAX-14    51) SIGRTMAX-13    52) SIGRTMAX-12  
 53) SIGRTMAX-11    54) SIGRTMAX-10    55) SIGRTMAX-9    56) SIGRTMAX-8    57) SIGRTMAX-7  
 58) SIGRTMAX-6    59) SIGRTMAX-5    60) SIGRTMAX-4    61) SIGRTMAX-3    62) SIGRTMAX-2  
 63) SIGRTMAX-1    64) SIGRTMAX      

2. Suspend and continue one process
script_1:
 #! /bin/bash  
   
 a=5  
 while [ $a -ne 0 ]  
 do  
   ((a++))  
   echo $a  
   sleep 5  
 done  

terminal:
1) Start the script_1 process in the background
2) script_1 outputs "6, 7" on standard output
3) Run ps to list the running processes; script_1 is running now.
4) script_1 outputs "8, 9" on standard output
5) 8513 is the process id; use kill to send the STOP signal and suspend the process
6) script_1 no longer outputs numbers: it is suspended now. Run ps; script_1 is still alive, but stopped.
7) Use kill to send the CONT signal so the script_1 process resumes running.
8) script_1 outputs "10, 11, 12" on standard output
9) Use kill -9 to kill the script_1 process for good
10) Run ps; the script_1 process is no longer alive.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [2] 8513  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ 6  
 7  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2664 pts/0  00:00:01 bash  
  2838 pts/0  00:00:43 emacs  
  8513 pts/0  00:00:00 script_1  
  8515 pts/0  00:00:00 sleep  
  8516 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ 8  
 9  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -STOP 8513  
   
 [2]+ Stopped         ./script_1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2664 pts/0  00:00:01 bash  
  2838 pts/0  00:00:43 emacs  
  8513 pts/0  00:00:00 script_1  
  8522 pts/0  00:00:00 sleep <defunct>  
  8523 pts/0  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -CONT 8513  
 10  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ 11  
 12  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ kill -9 8513  
 [2]+ Killed         ./script_1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  2664 pts/0  00:00:01 bash  
  2838 pts/0  00:00:43 emacs  
  8526 pts/0  00:00:00 sleep  
  8527 pts/0  00:00:00 ps  

4. How Processes Handle Signals
The KILL and STOP signals can't be caught or ignored by any process. Other signals are not acted on by a sleeping process until it wakes up and is scheduled again.

Also, given a signal, a process can choose to:
1) Ignore the signal
2) Execute the default action
3) Execute the customized action (with trap command)
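In bash, all three choices are expressed with the trap builtin; a minimal sketch using SIGUSR1:

```shell
#! /bin/bash

# 1) Ignore the signal: an empty handler string.
trap '' USR1
kill -USR1 $$                      # nothing happens

# 3) Customized action: a handler command.
trap 'echo "caught USR1"' USR1
kill -USR1 $$                      # prints: caught USR1

# 2) Restore the default action with "-" (for USR1 that is termination,
#    so we don't send the signal again here).
trap - USR1
```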

terminal:
man signal(7): default actions of signals
     Signal   Value   Action  Comment  
     ──────────────────────────────────────────────────────────────────────  
     SIGHUP    1    Term  Hangup detected on controlling terminal  
                    or death of controlling process  
     SIGINT    2    Term  Interrupt from keyboard  
     SIGQUIT    3    Core  Quit from keyboard  
     SIGILL    4    Core  Illegal Instruction  
     SIGABRT    6    Core  Abort signal from abort(3)  
     SIGFPE    8    Core  Floating point exception  
     SIGKILL    9    Term  Kill signal  
     SIGSEGV   11    Core  Invalid memory reference  
   
     SIGPIPE   13    Term  Broken pipe: write to pipe with no  
                    readers  
     SIGALRM   14    Term  Timer signal from alarm(2)  
     SIGTERM   15    Term  Termination signal  
     SIGUSR1  30,10,16  Term  User-defined signal 1  
     SIGUSR2  31,12,17  Term  User-defined signal 2  
     SIGCHLD  20,17,18  Ign   Child stopped or terminated  
     SIGCONT  19,18,25  Cont  Continue if stopped  
     SIGSTOP  17,19,23  Stop  Stop process  
     SIGTSTP  18,20,24  Stop  Stop typed at terminal  
     SIGTTIN  21,21,26  Stop  Terminal input for background process  
     SIGTTOU  22,22,27  Stop  Terminal output for background process  

Unix Shell: Process Monitor

1. Processes in Unix
1) Each process is assigned a time slice, and when it expires the CPU is switched to another process's context. The time slice is very short, so users normally can't detect the context switching.
2) A scheduler controls and manages the process switching.
3) Each process is assigned a priority that decides the order in which processes run. Time-critical processes normally get higher priority and run before less important ones.
4) Each process has a kernel context; data structures inside the kernel record the process-specific information.
5) load average: over a given time period, the average number of processes in a runnable or uninterruptible state.
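On Linux, the load-average numbers come from /proc/loadavg, which uptime itself reads; a minimal sketch:

```shell
#! /bin/bash

# The first three fields of /proc/loadavg are the 1-, 5- and 15-minute
# load averages; the remaining fields cover running/total tasks and last pid.
read one five fifteen rest < /proc/loadavg
echo "1min=$one 5min=$five 15min=$fifteen"
```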

terminal:
1) Run uptime. "10:51:27" is the current time; "up 57 min" indicates the system has been up for 57 minutes; "2 users" indicates 2 users are logged in; "load average:" gives the average number of runnable or uninterruptible processes over the last 1, 5 and 15 minutes.
2) Run ps to get the list of processes currently running.
3) Start a new process with script_1, which is a never-ending loop.
4) After a minute, run the uptime command again. The 1-minute load average has increased, since we have one more runnable process now.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ uptime  
  10:51:27 up 57 min, 2 users, load average: 0.36, 0.84, 0.86  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  3069 pts/1  00:00:00 bash  
  3188 pts/1  00:00:00 ps  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 &  
 [1] 3189  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ uptime  
  10:52:02 up 57 min, 2 users, load average: 0.70, 0.87, 0.87  

2. Listing Process
terminal
PID: process id
TIME: the accumulated CPU time the process has used so far; it indicates that script_1 has consumed 11 seconds of CPU time.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ps  
  PID TTY     TIME CMD  
  3069 pts/1  00:00:00 bash  
  3294 pts/1  00:00:11 script_1  
  3297 pts/1  00:00:00 ps  
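Since TIME is CPU time rather than wall-clock age, it helps to see both side by side; ps can print elapsed time with the standard etime format specifier (a small sketch using a throwaway sleep process):

```shell
# Start a mostly-idle background process
sleep 30 &
pid=$!

# time  = accumulated CPU time (stays near 00:00:00 for sleep)
# etime = wall-clock time elapsed since the process started
ps -o pid= -o time= -o etime= -o comm= -p "$pid"

kill "$pid"
```

An idle process like sleep shows a growing etime but an essentially zero TIME, while our busy-loop script_1 would show both growing together.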

3. List the Most CPU-Intensive Processes
terminal:
After running the "top" command in the terminal shell, we get a description of the processes currently running in the system. The first one is our script_1 (infinite loop), which consumes 69.9% of CPU resources.
 top - 15:32:02 up 6 min, 2 users, load average: 1.68, 1.66, 0.92  
 Tasks: 153 total,  2 running, 151 sleeping,  0 stopped,  0 zombie  
 %Cpu(s): 80.4 us, 19.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st  
 KiB Mem:  3585280 total, 1122744 used, 2462536 free,  42132 buffers  
 KiB Swap: 3667964 total,    0 used, 3667964 free.  423488 cached Mem  
   
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND                  
  2822 aubinxia 20  0  5292  1072  916 R 69.9 0.0  0:31.98 script_1                  
  2112 aubinxia 20  0 443328 197876 37128 S 15.0 5.5  1:06.06 compiz                   
  1128 root   20  0 288992 68544 11212 S 9.5 1.9  0:37.00 Xorg                    
  2653 aubinxia 20  0 128452 19212 12348 S 1.6 0.5  0:02.38 gnome-terminal               
  1838 aubinxia 20  0  38544  4452  3468 S 0.7 0.1  0:02.31 ibus-daemon                
  2738 aubinxia 20  0 602468 208492 43468 S 0.7 5.8  0:28.66 firefox                  
  2825 aubinxia 20  0  5528  1408  1004 R 0.7 0.0  0:00.30 top                    
   4 root   20  0    0   0   0 S 0.3 0.0  0:01.48 kworker/0:0                
  1137 root   20  0  36988  6428  3612 S 0.3 0.2  0:01.15 accounts-daemon              
  1881 aubinxia 20  0 125812 16840 11004 S 0.3 0.5  0:01.06 unity-panel-ser              
  1892 aubinxia 20  0 116480 17556 10132 S 0.3 0.5  0:01.01 ibus-ui-gtk3                
  2063 aubinxia 20  0  29580  5756  3236 S 0.3 0.2  0:00.46 ibus-engine-sim
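top can also run non-interactively, which is handy for logging from scripts; with procps top (as on most Linux systems), -b selects batch mode and -n limits the number of refreshes:

```shell
# One snapshot of top in batch mode: print the summary area plus
# the first few process lines, then exit instead of refreshing
top -b -n 1 | head -n 12
```

Batch mode writes plain text to stdout, so the output can be redirected to a file or piped into grep/awk.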

4. Self-Made Top Script
owntop:
 #! /bin/bash  
   
 IFS='  
     '  
 HEADFLAGS="-n 8"  
 PSFLAGS=aux  
 SLEEPFLAGS=5  
   
 HEADER="`ps $PSFLAGS | head -n 1`"  
 #Run the "ps aux | head -n 1" meaning that we only  
 #retrieve the first line of its output  
 #The $HEADER content is:  
 #USER    PID %CPU %MEM  VSZ  RSS TTY   STAT START  TIME COMMAND  
   
 while true  
 do  
   clear #Clear the screen   
   uptime #Print out the basic process information  
   
   echo "$HEADER" #Print out the HEADER content  
   ps $PSFLAGS |  
     sed -e 1d | #Remove the first header line  
       sort -k3nr -k1,1 -k2n | #sort the remaining lines  
        head $HEADFLAGS #only retrieve the first 8 lines  
   #-k3nr means sort based on the 3rd column(CPU usage), taking it as a number,  
   #using reversed order(descending)  
   #-k1,1 means sort based on the first column(user name) only, as a string  
   #-k2n means sort based on the 2nd column(PID), taking it as a number, in  
   #ascending order  
   
   sleep $SLEEPFLAGS  
   #Sleep 5 seconds and then do the looping again  
   #meaning it will refresh the process information for  
   #every 5 seconds  
 done  


terminal:
After running owntop, it clears the screen and keeps refreshing the process information every 5 seconds.
  16:04:35 up 39 min, 2 users, load average: 1.24, 1.24, 1.14  
 USER    PID %CPU %MEM  VSZ  RSS TTY   STAT START  TIME COMMAND  
 aubinxia 2112 29.4 6.0 468940 216972 ?    Rl  15:26 11:11 compiz  
 root   1128 14.6 2.0 294876 72424 tty7   Ss+ 15:25  5:39 /usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch  
 aubinxia 2738 3.6 6.4 607324 233000 ?    Sl  15:28  1:18 /usr/lib/firefox/firefox  
 aubinxia 2838 0.9 0.7 134176 26840 pts/0  Sl  15:39  0:14 emacs owntop  
 aubinxia 1838 0.5 0.1 38676 4452 ?    Ssl 15:26  0:13 /usr/bin/ibus-daemon --daemonize --xim  
 aubinxia 3848 0.5 0.0  5300 1328 pts/0  S+  16:04  0:00 /bin/bash ./owntop  
 aubinxia 2653 0.4 0.5 128460 21240 ?    Sl  15:28  0:10 gnome-terminal  
 root     4 0.3 0.0   0   0 ?    S  15:25  0:08 [kworker/0:0]  
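The multi-key sort at the heart of owntop can be checked in isolation on made-up records (the user/PID/CPU values below are fabricated for the demo):

```shell
# Column layout mimics "user pid cpu"; -k3nr sorts CPU descending,
# ties fall back to user name (-k1,1), then numeric PID (-k2n)
printf '%s\n' \
  'alice 300 5.0' \
  'bob 100 9.5' \
  'alice 200 5.0' |
  sort -k3nr -k1,1 -k2n
# bob 100 9.5
# alice 200 5.0
# alice 300 5.0
```

The 9.5 record wins on CPU; the two 5.0 records tie, so the user-name and then the numeric PID keys decide their order.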

5. List Processes by User
pusers:
 #! /bin/bash  
   
 IFS='  
     '  
 EGREPFLAGS=  
   
 #Read in the egrep flags from script arguments  
 while test $# -gt 0  
 do  
   if test -z "$EGREPFLAGS"  
   then  
     EGREPFLAGS="$1"  
   else  
     EGREPFLAGS="$EGREPFLAGS|$1"  
   fi  
   shift  
 done  
   
 #Setup the default egrep flags  
 if test -z "$EGREPFLAGS"  
 then  
   EGREPFLAGS="."  
 else  
   EGREPFLAGS="^ *($EGREPFLAGS)"  
 fi  
   
 case "`uname -s`" in  
 *BSD | Darwin) PSFLAGS="-a -e -o user,ucomm -x" ;;  
 *) PSFLAGS="-e -o user,comm" ;;  
 esac  
   
 echo $PSFLAGS  
 #Because "uname -s" returns "Linux" here  
 #output: -e -o user,comm  
   
 #ps $PSFLAGS  
 ps -e -o user,comm | #list all processes with 2 columns: user and command  
  sed -e 1d | head -n 10 | #remove the first line and retrieve next 10 lines  
 #output at this step:  
 #root   init  
 #root   kthreadd  
 #root   ksoftirqd/0  
 #root   kworker/0:0  
 #root   kworker/0:0H  
 #root   rcu_sched  
 #root   rcu_bh  
 #root   migration/0  
 #root   watchdog/0  
 #root   khelper  
    sort -b -k1,1 -k2,2 | #sort by first and 2nd column, -b: ignore leading blanks  
    uniq -c |   
 #output at this step:  
 #   1 root   init  
 #   1 root   khelper  
 #   1 root   ksoftirqd/0  
 #   1 root   kthreadd  
 #   1 root   kworker/0:0  
 #   1 root   kworker/0:0H  
 #   1 root   migration/0  
 #   1 root   rcu_bh  
 #   1 root   rcu_sched  
 #   1 root   watchdog/0  
     sort -b -k2,2 -k1nr,1 -k3,3 |  
      awk '{  
          user = (LAST == $2)? "" : $2  
          LAST=$2  
          printf("%-15s\t%2d\t%s\n", user, $1, $3)  
         }' #Reformat each record, if user name is same as last one, ignore it.  
   
 #Final Output:  
 #root           1    init  
 #             1    khelper  
 #             1    ksoftirqd/0  
 #             1    kthreadd  
 #             1    kworker/0:0  
 #             1    kworker/0:0H  
 #             1    migration/0  
 #             1    rcu_bh  
 #             1    rcu_sched  
 #             1    watchdog/0  
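The heart of pusers compresses into a single pipeline; a minimal sketch that counts processes per (user, command) pair, busiest pairs first:

```shell
# user= / comm= print the columns without a header line, so no
# "sed 1d" step is needed; uniq -c prepends the per-pair count
ps -e -o user= -o comm= |
  sort -b -k1,1 -k2,2 |
  uniq -c |
  sort -rn |
  head -n 5
```

This drops the egrep filtering and per-user formatting of the full script, but shows the same sort | uniq -c | sort counting idiom.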

Sunday, July 20, 2014

awk: spell checker example

 #Usage:  
 #  awk [-v Dictionaries="sysdict1 sysdict2 ..."] -f spell.awk -- \  
 #    [=suffixfile1 =suffixfile2 ...] [+dict1 +dict2 ...] \  
 #    [-strip] [-verbose] [file(s)]  
   
 BEGIN { initialize() }  
    { spell_check_line() }  
 END { report_exception() }  
   
 function initialize()  
 {  
   NonWordChars="[^" "'" "A-Z" "a-z" "]"  
   
   get_dictionaries()  
   scan_options()  
   load_dictionaries()  
   load_suffixes()  
   order_suffixes()  
 }  
   
 function get_dictionaries(    key, files)  
 {  
   #Try to get the dictionaries string from environment variable  
   #Whenever awk starts, it will inherit the environment variable  
   #and store in "ENVIRON", which is one associative array  
   #Index is variable name, mapping to the variable value  
   if ((Dictionaries == "") && ("DICTIONARIES" in ENVIRON))  
     Dictionaries = ENVIRON["DICTIONARIES"]  
   
   #If dictionaries doesn't contain anything, we setup our own  
   #Dictionary Files  
   if (Dictionaries == "")  
   {  
 #    DictionaryFiles["/usr/share/dict/american-english"]++  
 #    DictionaryFiles["/usr/share/dict/british-english"]++  
   }  
   else  
   {  
     #This will split the string by space, put each item into  
     #files array. Array index starts from 1  
     split(Dictionaries, files)  
     for(key in files)  
       DictionaryFiles[files[key]]++  
   }  
 }  
   
 function scan_options(   k)  
 {  
   #Handle the awk input argument in this function  
   #ARGC is the number of all arguments  
   #ARGV is the array of all arguments, index starting from 0  
   #and the first item is command itself "awk", so we starts  
   #from the 2nd argument(index starts from 1)  
   
   #We have to setup each "ARGV[k]" to empty string, otherwise  
   #awk will take it as input file and then look for this file  
   #Normally it will complain "no such file or directory"  
   for(k = 1; k < ARGC; k++)  
   {  
     if(ARGV[k] == "-strip")  
     {  
       #If the argument is "-strip", then setup the global  
       #config variable "Strip"  
       ARGV[k]=""  
       Strip=1  
     }  
     else if(ARGV[k] == "-verbose")  
     {  
       #If the argument is "-verbose", then setup the global  
       #config variable "Verbose"  
       ARGV[k]=""  
       Verbose=1  
     }  
     else if(ARGV[k] ~ /^=/)  
     {  
       #If the argument starts with "=", then increase the  
       #NSuffixFiles and associative array SuffixFiles  
       NSuffixFiles++  
       SuffixFiles[substr(ARGV[k],2)]++  
       ARGV[k]=""  
     }  
     else if(ARGV[k] ~ /^[+]/)  
     {  
       #If the argument starts with "+", then increase the  
       #item in associative array DictionaryFiles  
       DictionaryFiles[substr(ARGV[k], 2)]++  
       ARGV[k]=""  
     }  
   }  
   
  #Remove trailing empty arguments (for nawk)  
  #If empty arguments are left at the end, nawk won't  
  #read from standard input, so we need to decrease ARGC until  
  #meeting a non-empty argument  
   while ((ARGC > 0) && (ARGV[ARGC-1] == ""))  
     ARGC--  
 }  
   
 function load_dictionaries()  
 {  
   #Iterate each file in DictionaryFiles, for each  
   #file, read each line as a word, and save the word  
   #at associative array "Dictionary"  
   for(file in DictionaryFiles)  
   {  
     while((getline word < file) > 0)  
       Dictionary[tolower(word)]++  
     close(file)  
   }  
 }  
   
 function load_suffixes(   file, k, line, n, parts)  
 {  
   #If number of suffix files is larger than 0, then we iterate   
   #SuffixFiles, read each suffix rule line from each file  
   if(NSuffixFiles > 0)  
   {  
     for(file in SuffixFiles)  
     {  
       while((getline line < file) >0)  
       {  
         #For each "rule", strip comments, leading whitespace  
         #and trailing whitespace  
         sub(" *#.*$", "", line) # strip comments  
         sub("^[ \t]+", "", line) # strip leading whitespace  
         sub("[ \t]+$", "", line) # strip trailing whitespace  
         if(line == "")  
           continue  
   
         #Split each items in line, assign items into array parts  
         #Save first item (suffix) into array Suffixes  
         #Save remaining items(replacement) into array Replacement  
         n=split(line, parts)  
         Suffixes[parts[1]]++  
         Replacement[parts[1]]=parts[2]  
   
         for(k=3; k<=n; k++)  
           Replacement[parts[1]]=Replacement[parts[1]] " " parts[k]  
       }  
       close(file)  
     }  
   }  
   else  
   {  
     #If user doesn't specify the replacement file, setup default  
     #suffix rules  
     Suffixes["ed"]=1;  
     Suffixes["ing"]=1;  
     Replacement["ed"]="\"\" e"  
     Replacement["ing"]="\"\""  
   }  
 }  
   
 function order_suffixes(   i, j, key)  
 {  
   #Save all suffixes into array OrderedSuffix  
   NOrderedSuffix=0  
   for(key in Suffixes)  
     OrderedSuffix[++NOrderedSuffix] = key  
     
   #Sort the OrderedSuffix, make it be from long  
   #to short  
   for(i=1; i<NOrderedSuffix;i++)  
     for(j=i+1; j<=NOrderedSuffix;j++)  
       if(length(OrderedSuffix[i]) < length(OrderedSuffix[j]))  
         swap(OrderedSuffix, i, j)  
 }  
   
 function swap(a,i,j,  temp)  
 {  
   temp = a[i]  
   a[i] = a[j]  
   a[j] = temp  
 }  
   
 function spell_check_line(   k, word)  
 {  
   #For each record line, we replace the non word characters  
   #with white spaces  
   gsub(NonWordChars, " ")  
   
  #Iterate over each field, strip leading and trailing apostrophes  
   #then call spell check method  
   for(k=1;k <= NF;k++)  
   {  
     word=$k  
     sub("^'+", "", word) #strip leading apostrophies  
     sub("'+$", "", word) #strip trailing apostrophies  
     if(word != "")  
       spell_check_word(word)  
   }  
 }  
   
 function spell_check_word(word,   key, lc_word, location, w, wordlist)  
 {  
   #Convert the input word to lowercase, and check in Dictionary   
   #associative arrays  
   lc_word=tolower(word)  
   if(lc_word in Dictionary)  
     return  
   else  
   {  
     #If not found in Dictionary associative arrays, then strip  
     #the suffix if user specified to do that, and check words  
     #after stripping suffixes in Dictionary  
     if(Strip)  
     {  
       strip_suffixes(lc_word, wordlist)  
       for(w in wordlist)  
         if(w in Dictionary)  
           return  
     }  
       
     #If the word still doesn't get found at Dictionary after stripping  
     #off the suffix, then we shall save the word into array Exception  
     location = Verbose ? (FILENAME ":" FNR ":") : ""  
     if(lc_word in Exception)  
       Exception[lc_word] = Exception[lc_word] "\n" location word  
     else  
       Exception[lc_word] = location word  
   }  
 }  
   
 function strip_suffixes(word, wordlist,    ending, k, n, regexp)  
 {  
   #wordlist array is used to save all words generated after stripping  
   #suffix. In the beginning, we use split to clear up the wordlist  
   split("", wordlist)  
   
   #Iterate each suffix. For each suffix, if it matches with the word  
   #in the end, we strip the end by using substr. RSTART is the number  
   #indicating from where regexp starts matching.  
   for(k=1; k <= NOrderedSuffix; k++)  
   {  
     regexp=OrderedSuffix[k]  
     if(match(word, regexp))  
     {  
       word=substr(word, 1, RSTART-1)  
   
       #We check the Replacement associative array, if there is no  
        #replacement, then we save the original word in "wordlist"  
       #otherwise, we split items in Replacement string, add each  
       #item to end-stripped word, generating new word, and save   
       #into the wordlist  
       if(Replacement[regexp] == "")  
         wordlist[word]=1  
       else  
       {  
         split(Replacement[regexp], ending)  
         for(n in ending)  
         {  
           if(ending[n] == "\"\"")  
             ending[n] = ""  
           wordlist[word ending[n]]=1  
         }  
       }  
   
       break  
     }  
   }  
 }  
   
 function report_exception()  
 {  
   for(key in Exception)  
     print Exception[key]  
 }  
   

An awk program is compiled into a compact internal representation and then interpreted at runtime by a virtual machine.
Its built-in functions, however, are implemented in the underlying implementation language, currently C.
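The ENVIRON and ARGV handling that spell.awk relies on can be seen in a tiny standalone run (the DICTIONARIES value below is made up for the demo):

```shell
# ENVIRON maps inherited environment variable names to values;
# ARGV[0] is the interpreter name, file arguments start at ARGV[1]
DICTIONARIES="dict1 dict2" awk 'BEGIN {
  n = split(ENVIRON["DICTIONARIES"], files)
  for (i = 1; i <= n; i++)
    print i, files[i]
  print "ARGC =", ARGC
}'
# 1 dict1
# 2 dict2
# ARGC = 1
```

With no file arguments, ARGC is 1 (just the interpreter itself), which is why scan_options must blank out processed ARGV entries rather than shifting them.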

Sunday, July 13, 2014

Unix Shell: Spell Checker

1. comm command
Given two files, the comm command can detect the lines common to both files, the lines unique to file1, and the lines unique to file2.
Note: both files need to be sorted first

terminal:
1 - 2) Print out the content of  t1 and t2, each contains two lines of strings
3) Use the comm command to get common and unique lines from the two files; the --check-order option makes comm verify the sort order as it proceeds. At this step, it complains that the input file is not sorted yet.
4 - 5) sort files t1 and t2, output the result to sorted_t1 and sorted_t2 separately
6 - 7) Print out the file content of sorted_t1 and sorted_t2, both files are already sorted
8) Use comm command to get common and unique lines.
First column: "Hello Los Angeles" means this line only exists at first file sorted_t1
Second column: "Hello New York" means this line only exists at second file sorted_t2
Third column: "Hello world" means this line exists at both files.
9) The -1 option suppresses the first column (lines unique to file 1)
10) The -2 option suppresses the second column (lines unique to file 2)
11) The -3 option suppresses the third column (lines common to both files)
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat t1  
 Hello world!  
 Hello Los Angeles!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat t2  
 Hello world!  
 Hello New York!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ comm --check-order t1 t2  
         Hello world!  
 comm: file 1 is not in sorted order  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sort <t1 >sorted_t1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sort <t2 >sorted_t2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat sorted_t1  
 Hello Los Angeles!  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat sorted_t2  
 Hello New York!  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ comm --check-order sorted_t1 sorted_t2  
 Hello Los Angeles!  
     Hello New York!  
         Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ comm --check-order -1 sorted_t1 sorted_t2  
 Hello New York!  
     Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ comm --check-order -2 sorted_t1 sorted_t2  
 Hello Los Angeles!  
     Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ comm --check-order -3 sorted_t1 sorted_t2  
 Hello Los Angeles!  
     Hello New York!  
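The suppression options can also be combined; for example, -12 keeps only the third column, i.e. the intersection of the two files (throwaway files f1/f2 are created just for this demo):

```shell
# Both inputs must already be sorted
printf 'apple\nbanana\ncherry\n' > f1
printf 'banana\ncherry\ndate\n' > f2

# Suppress columns 1 and 2: only lines common to both files remain
comm -12 f1 f2
# banana
# cherry

rm -f f1 f2
```

Likewise, comm -13 keeps only lines unique to the second file, which is exactly the combination the spell checker below uses.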

2. A simple self-made spell checker program with "comm" command
terminal:
1) Print out the file content of owndict, a simple self-made "dictionary"; the spell checker works by comparing owndict and t1
2) Print out the file content of t1, which contains a misspelled word: "worlds"
3) Sort owndict, and output the result into sorted_dict
4) Print out the file content of sorted_dict
5) Use the comm command to list lines that exist only in t1 but not in sorted_dict. The -13 option suppresses the output of the first and third columns, i.e. lines unique to sorted_dict and lines common to both files. Any line that exists only in t1 but not in our dictionary "sorted_dict" is output and treated as a misspelled word
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat owndict  
 Hello  
 world  
 New  
 York  
 Los  
 Angelesaubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat t1  
 Hello  
 worlds  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sort <owndict >sorted_dict  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat sorted_dict  
 Angeles  
 Hello  
 Los  
 New  
 world  
 York  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ comm -13 sorted_dict t1  
 worlds  
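For real prose, the text first has to be split into one word per line before comm can compare it against the dictionary; a sketch of the classic pipeline (the dictionary contents and sample sentence are made up, and LC_ALL=C pins a single collation for both sides):

```shell
# Build a small sorted dictionary
printf '%s\n' Angeles Hello Los New York world | LC_ALL=C sort > dict.sorted

# tr turns every run of non-letters into a newline, sort -u
# deduplicates the words, and comm -13 reports words missing
# from the dictionary ("-" reads the word list from stdin)
echo "Hello worlds of New York" |
  tr -cs 'A-Za-z' '\n' |
  LC_ALL=C sort -u |
  LC_ALL=C comm -13 dict.sorted -
# of
# worlds

rm -f dict.sorted
```

Both "of" (absent from the tiny dictionary) and the misspelled "worlds" are flagged, which is why real spell checkers need a large dictionary.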

3. Spell command
The spell command can be used to find misspelled words in a text file
terminal:
1) Print out the file content of t1; the first line's "color" is American English, and the second line's "colour" is British English
2) Print out the file content of owndict, our own "dictionary"
3) Use the spell command to find misspelled words; in this case, "colour" in the second line gets picked. By default, spell uses American English as the standard.
4) The -b option tells spell to use British English as the standard
5) The -d option lets the user specify their own dictionary file; since owndict doesn't contain "colour", "colour" in t1 gets picked as a misspelled word.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat t1  
 I love the blue color!  
 I love the red colour!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat owndict  
 I  
 love  
 the  
 blue  
 red  
 color  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ spell t1  
 colour  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ spell -b t1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ spell -d owndict t1  
 colour  
Note:
If the locale changes, we need to re-sort the dictionary under the new locale's collation rules; otherwise the comparison results will be wrong.
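The collation difference is easy to demonstrate: under the C locale, sort compares raw byte values, so every uppercase letter sorts before every lowercase one, whereas dictionary-style locales such as en_US.UTF-8 would put "apple" first. Pinning LC_ALL on both the dictionary sort and the comparison avoids the mismatch:

```shell
# Byte-order (C locale) collation: 'B' (0x42) sorts before 'a' (0x61)
printf 'apple\nBanana\n' | LC_ALL=C sort
# Banana
# apple
```

Feeding a dictionary sorted under one locale to comm running under another is exactly the situation that triggers "file 1 is not in sorted order".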

Unix Shell: Merge User Accounts(3)

5. Process dupusers and dupuids
make_old_new_list:
 #! /bin/bash  
   
 rm -f old_new_list  
   
 #Setup the Field Separator  
 old_ifs=$IFS  
 IFS=:  
   
 #Read in the user password and uid  
 #Inside the loop, read in the second line, since  
 #we are reading from dupusers, there are supposed to  
 #be two lines with same user names but different user  
 #ids, otherwise, we put error information to standard  
 #error output  
   
 #Inside the loop, we put the username, first uid, and  
 #second uid to old_new_list. Put the username, password  
 #and second uid to unique2  
 while read user passwd uid  
 do  
   if read user2 passwd2 uid2  
   then  
     if [ $user = $user2 ]  
     then  
       printf "%s\t%s\t%s\n" $user $uid $uid2 >> old_new_list  
       echo "$user:$passwd:$uid2"  
     else  
       echo "$0: out of sync: $user and $user2" >&2  
       exit 1  
     fi  
   else  
     echo $0: no duplicate for $user >&2  
     exit 1  
   fi  
 done < dupusers > unique2  
 IFS=$old_ifs  
   
 #Count how many records dupuids has, we need to generate  
 #that number of new user ids. By plan, we are going to  
 #generate a new user id for each record in dupuids  
 count=$(wc -l < dupuids)  
   
 #Setup the positional parameters inside the script, then  
 #it would have an array of new ids  
 set -- $(./newuid -c $count unique_ids)  
 IFS=:  
   
 #Read in each record from dupuids, for each record  
 #put in the new uid into unique3  
 while read user passwd uid  
 do  
   newuid=$1  
   shift  
   echo "$user:$passwd:$newuid"  
   printf "%s\t%s\t%s\n" $user $uid $newuid >> old_new_list  
 done < dupuids > unique3  

terminal:
1. Print out the content of file "dupusers"
2. Print out the content of file "dupuids"
3. Run make_old_new_list
4. Print out the file content of "old_new_list"
It combines the two "xx3" records from dupusers into one record, whose new user id is 3. For xx1 and xx2 from dupuids, the duplicate user ids are replaced by newly generated ones (5 and 6)
5. Print out the content of unique2, which is from processing dupusers
6. Print out the content of unique3, which is from processing dupuids
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat dupusers  
 xx3:pw3:2  
 xx3:pw3:3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat dupuids  
 xx1:pw1:1  
 xx2:pw2:1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./make_old_new_list  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat old_new_list  
 xx3    2    3  
 xx1    1    5  
 xx2    1    6  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique2  
 xx3:pw3:3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique3  
 xx1:pw1:5  
 xx2:pw2:6  

6. Combine unique[123] to get the final new user records
terminal:
1) Print out content of u1.passwd
2) Print out content of u2.passwd
3) Print out content of unique1, which includes users who have both same usernames and same userids in both u1.passwd and u2.passwd, or users whose user names and user ids only exist in one password file but not the other
4) Print out content of unique2, which includes users who have same user names but different user ids in u1.passwd and u2.passwd.
5) Print out content of unique3, which includes users who have different user names but same user ids in u1.passwd and u2.passwd.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat u1.passwd  
 xx:pw1:0  
 xx1:pw1:1  
 xx3:pw3:2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat u2.passwd  
 xx:pw1:0  
 xx2:pw2:1  
 xx3:pw3:3  
 xx4:pw4:4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique1  
 xx4:pw4:4  
 xx:pw1:0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique2  
 xx3:pw3:3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique3  
 xx1:pw1:5  
 xx2:pw2:6  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sort -k 3 -t : -n unique[123] >final.passwd  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat final.passwd  
 xx:pw1:0  
 xx3:pw3:3  
 xx4:pw4:4  
 xx1:pw1:5  
 xx2:pw2:6  

Saturday, July 12, 2014

Unix Shell: Merge User Accounts(2)

3. Create list of in-use user ids
terminal:
1) Print out the file of merge1's content
2) Use ":" as the field separator, print out the 3rd field(user id field), with merge1 as the input
3) Pipe the output of the last step to the sort command, sorting all records numerically (-n option) and removing duplicate user ids (-u option); write the output to the unique_ids file
4) List the unique_ids file
5) Print out the content of unique_ids file
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk -F: '{print $3}' < merge1  
 1  
 1  
 2  
 3  
 4  
 0  
 0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk -F: '{print $3}' < merge1 | sort -n -u >unique_ids  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lrt unique_ids  
 -rw-rw-r-- 1 aubinxia aubinxia 10 Jul 12 17:49 unique_ids  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique_ids  
 0  
 1  
 2  
 3  
 4  

4. Given the list of in-use user ids, generate new unused user ids
newuid:
 #! /bin/bash  
   
 count=1  
   
 #Get the count option, to decide how many user ids  
 #we need to generate  
 while getopts "c:" opt  
 do  
   case $opt in  
   c) count=$OPTARG ;;  
   esac  
 done  
   
 shift $(($OPTIND - 1))  
   
 #IDFILE is the file containing all in-use user ids  
 IDFILE=$1  
   
 awk -v count=$count '  
 BEGIN {  
   # Read in all in-use ids and save those ids into  
   # array uidlist  
   for(i=1; getline id > 0; i++)  
   {  
     uidlist[i]=id  
   }  
   
   totalids=i  
     
   # Starting from the 2nd id, check if it is different from  
   # previous id, if yes, then try to pick up all un-used ids  
   # in between.  
   for(i=2; i<totalids; i++)  
   {  
     if(uidlist[i-1]!=uidlist[i])  
     {  
       for(j=uidlist[i-1]+1; j<uidlist[i];j++)  
       {  
         print j  
         if(--count == 0)  
           exit  
       }  
     }  
   }  
   
   # If we still do not get enough user ids, then we  
   # start from the last user id in array, until getting  
   # enough user ids  
   
   if(count != 0)  
   {  
     nextuid=uidlist[totalids-1]+1  
     while(count != 0)  
     {  
       print nextuid  
       nextuid++  
       count--  
     }  
   }  
 }' $IDFILE  

terminal:
1) Print out unique_ids file content, which is generated in last step
2) Use newuid script to generate 3 un-used new user ids based on unique_ids
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique_ids  
 0  
 1  
 2  
 3  
 4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./newuid -c 3 unique_ids  
 5  
 6  
 7  
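The gap-filling logic is easier to see with a gapped id list; the sketch below is a condensed rewrite of newuid's BEGIN block (not the script itself) that asks for 4 new ids given the in-use list 0 1 3 5, so it first fills the holes (2, 4) and then counts past the largest id:

```shell
printf '%s\n' 0 1 3 5 |
awk -v count=4 '
  { if (NR > 1)                 # fill any gap before the current id
      for (j = prev + 1; j < $1 && count > 0; j++) { print j; count-- }
    prev = $1 }
  END { while (count-- > 0) print ++prev }'
# 2
# 4
# 6
# 7
```

Reusing the holes keeps the allocated uid range compact instead of always growing past the maximum.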

Unix Shell: Merge User Accounts(1)

Problem:
We have 2 sets of user accounts, u1.passwd and u2.passwd. Each file contains a number of user records, and each record has the following format:
<username>:<password>:<userid>
Our goal is to merge 2 sets of user account together.

We are going to face four kinds of situations when doing the merge:
1) the same username with the same uid exists in both files
2) different usernames with the same uid exist in both files
3) the same username with different uids exists in both files
4) a username and uid exist in only one file, not the other

1. Step 1, physically merge 2 files together.
1) Print out first user record file
2) Print out second user record file
3) Sort both user record files and copy the output to merge1 file
tee command: copy data from standard input to standard output or file
4) Print out merge1 file
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat u1.passwd  
 xx:pw1:0  
 xx1:pw1:1  
 xx3:pw3:2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat u2.passwd  
 xx:pw1:0  
 xx2:pw2:1  
 xx3:pw3:3  
 xx4:pw4:4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sort u1.passwd u2.passwd | tee merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  

2. Split merge1 into dupusers, dupuids, and unique1
We are going to split the merged user accounts into 3 files:
dupusers: contains all user records having the same username but different user ids
dupuids: contains all user records having the same user id but different usernames
unique1: contains all remaining records, i.e. those that are identical in both files or appear in only one file

script:
 #Set the field separator to ":"  
 BEGIN { FS=":" }  
   
 #We use two arrays to index all records,name and uid.  
 #For each record, if both name and uid don't contain this record,  
 #It will be inserted into both arrays  
   
 #If name contains this record's username, there is another record  
 #with the same name (it must be from the other file; within one  
 #file, all usernames and uids are unique). If the uid array also  
 #contains this record's uid, then we do nothing and just keep the  
 #other record there. Otherwise, we have found two records with the  
 #same name but different uids; these two records must come from the  
 #two files separately. We then put these 2 records into dupusers,  
 #and remove the record from the name and uid arrays.  
 #Note we won't find 3 records with the same name or the same uid,  
 #since we are merging only 2 files.  
   
 #If name doesn't contain this record's username, but uid contains  
 #its uid, it means we have found two records with the same uid but  
 #different names. We then put these two records into the dupuids  
 #file, and remove the two records from the name and uid arrays  
   
 #If neither the name nor the uid array contains this record, it is  
 #a new record, and we insert it into both arrays.  
   
 #In the end, we put remaining records in name into "unique1" file.  
   
 {  
   if($1 in name)  
   {  
     if($3 in uid)  
       ;  
     else  
     {  
       print name[$1] > "dupusers"  
       print $0 > "dupusers"  
       delete name[$1]  
   
       remove_uid_by_name($1)  
     }  
   } else if ($3 in uid)  
   {  
     print uid[$3] > "dupuids"  
     print $0 > "dupuids"  
     delete uid[$3]  
   
     remove_name_by_uid($3)  
   } else  
     name[$1] = uid[$3] = $0  
 }  
   
 END {  
   for(i in name)  
     print name[i] > "unique1"  
   
   close("unique1")  
   close("dupusers")  
   close("dupuids")  
 }  
   
 function remove_uid_by_name(n,  i,f)  
 {  
   for(i in uid)  
   {  
     split(uid[i], f, ":")  
     if(f[1] == n)  
     {  
       delete uid[i]  
       break  
     }  
   }  
 }  
   
 function remove_name_by_uid(id,  i,f)  
 {  
   for(i in name)  
   {  
     split(name[i], f, ":")  
     if(f[3] == id)  
     {  
       delete name[i]  
       break  
     }  
   }  
 }  

terminal:
1) Print out the merge1 file which is generated at step1
2) run the awk script with the input merge1
3 - 5) Print out dupusers, dupuids and unique1, generated by running above awk script.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat merge1  
 xx1:pw1:1  
 xx2:pw2:1  
 xx3:pw3:2  
 xx3:pw3:3  
 xx4:pw4:4  
 xx:pw1:0  
 xx:pw1:0  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk -f script <merge1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat dupusers  
 xx3:pw3:2  
 xx3:pw3:3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat dupuids  
 xx1:pw1:1  
 xx2:pw2:1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat unique1  
 xx4:pw4:4  
 xx:pw1:0  

Unix Shell: Compare Files(2)

1. File checksum matching
terminal:
1 - 5) Print out file content of o1, o2, o3, o4 and o5
6) Use the md5sum command to calculate the checksum of all files whose names start with o in the local directory.
md5sum generates a 128-bit checksum, printed as 32 hexadecimal digits, which can "almost" uniquely identify a file's content.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o1  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o2  
 Hello New York!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o3  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o4  
 Hello New York!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat o5  
 Hello Boston!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ md5sum ./o*  
 59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 0226fbaf072dbabf30369c2f7f162ffa ./o2  
 59ca0efa9f5633cb0371bbc0355478d8 ./o3  
 0226fbaf072dbabf30369c2f7f162ffa ./o4  
 b569b3ceb65fa2f141b687943f45e4bb ./o5  
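Checksums are typically stored and re-verified later with md5sum's -c option; a small sketch using throwaway file names:

```shell
echo "Hello world!" > demo.txt
md5sum demo.txt > demo.md5

# Verify the file against the stored checksum
md5sum -c demo.md5
# demo.txt: OK

# Any modification makes verification fail with a non-zero status
echo "tampered" > demo.txt
md5sum -c demo.md5 || echo "checksum mismatch"

rm -f demo.txt demo.md5
```

The non-zero exit status on mismatch makes -c convenient in scripts that guard downloads or backups.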

Example about using md5sum to find identical files
find_identical_files:
 #! /bin/bash  
   
 md5sum ./o* |  
 #At this point of time, the text we get is:  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 #0226fbaf072dbabf30369c2f7f162ffa ./o2  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o3  
 #0226fbaf072dbabf30369c2f7f162ffa ./o4  
 #b569b3ceb65fa2f141b687943f45e4bb ./o5  
 #  
 #The following awk script counts, for each record, the  
 #number of occurrences of its checksum. We only output  
 #records whose checksum occurs more than once: when a  
 #checksum is seen for the first time, we save the entire  
 #record; when it is seen a second time, we print that  
 #saved first record; and for every occurrence beyond the  
 #first, we also print the current record itself.  
 awk '{  
   count[$1]++  
   if(count[$1] == 1) first[$1]=$0  
   if(count[$1] == 2) print first[$1]  
   if(count[$1] >= 2) print $0  
 }' | sort |  
 #We pipe the identical records through sort, so they are  
 #ordered by checksum, which makes files with the same  
 #checksum adjacent.  
 #Current text we get is:  
 #0226fbaf072dbabf30369c2f7f162ffa ./o2  
 #0226fbaf072dbabf30369c2f7f162ffa ./o4  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 #59ca0efa9f5633cb0371bbc0355478d8 ./o3  
 #  
 #The following awk script separates the different groups  
 #of files: we compare the current record's checksum with  
 #the previous record's checksum, and if they differ,  
 #we print one empty line.  
 awk 'BEGIN { first=1 }  
 {   
   if (first == 1)  
   {  
     first = 0;  
     last = $1;  
     print $0;  
   }  
   else  
   {  
     if(last != $1) print "";  
     print $0;  
     last=$1;  
   }  
 }'  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./find_identical_files  
 0226fbaf072dbabf30369c2f7f162ffa ./o2  
 0226fbaf072dbabf30369c2f7f162ffa ./o4  
   
 59ca0efa9f5633cb0371bbc0355478d8 ./o1  
 59ca0efa9f5633cb0371bbc0355478d8 ./o3  
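On systems with GNU coreutils, both awk passes can be replaced by uniq, which can group duplicates directly; a sketch:

```shell
# -w32 compares only the first 32 characters of each line
# (the md5 checksum); --all-repeated=separate keeps only
# lines whose checksum is duplicated and separates each
# group of identical files with a blank line.
md5sum ./o* | sort | uniq -w32 --all-repeated=separate
```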

2. Digital Signature Verification
Basic Concept:
Private Key: the key known only to its owner.
Public Key: the key potentially known to anyone.
Either key can be used to encrypt a message, and the message can then be decrypted only with the other key.

Example:
Alice wants to send a message to everyone, and wants to prove that the message was indeed written by her. She can use her private key to encrypt (sign) the message, and everyone else can use her public key to decrypt it. Everyone can be confident that the message was written by Alice, since her public key can only decrypt a message that was encrypted with her private key.

Alice wants to send a message that only Peter can read. She encrypts the message with Peter's public key, and Peter decrypts it with his own private key. Nobody else can read the message, since only Peter holds the matching private key. If Alice additionally signs the message with her own private key, Peter can use Alice's public key to confirm that the message was indeed written by her.

==================================================
gpg command:
this command can be used to check a file against a signature with the signer's public key, i.e. to verify that the file was indeed produced by the assumed person.

http://pgp.mit.edu/ is a well-known public keyserver, run by MIT, where a public key can be looked up by its key ID. The key ID identifies a key pair: the private key is known only to its owner, while the matching public key can be queried by key ID.

At the main page of "http://pgp.mit.edu/", enter the key ID 0xD333CBA1 into the search box and extract the key.

We get the public key, and save it into a temp.key file.

terminal:
The --import option imports the public key into gpg's internal keyring.
 gpg --import temp.key  

gpg verifies the message with a public key available in its internal keyring, confirming that it was signed by the specified person. That is why we have to import the public key first; otherwise, gpg cannot find a usable public key to verify the message.
================================================
But it is annoying to fetch the public key from the website every time. gpg can also fetch a public key directly, given a keyserver and a key ID.
terminal:
1) The --keyserver option specifies the server name; the --search-keys option specifies the key ID.
We get the public key we want, and entering the number 1 makes gpg import it.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ gpg --keyserver pgp.mit.edu --search-keys 0xD333CBA1  
 gpg: searching for "0xD333CBA1" from hkp server pgp.mit.edu  
 (1)    Jim Meyering <jim@meyering.net>  
     Jim Meyering <meyering@gnu.org>  
     Jim Meyering <meyering@pobox.com>  
     Jim Meyering <meyering@ascend.com>  
     Jim Meyering <meyering@lucent.com>  
     Jim Meyering <meyering@redhat.com>  
     Jim Meyering <meyering@na-net.ornl.gov>  
      1024 bit DSA key D333CBA1, created: 1999-09-26  
 Keys 1-1 of 1 for "0xD333CBA1". Enter number(s), N)ext, or Q)uit > 1  
 gpg: requesting key D333CBA1 from hkp server pgp.mit.edu  
 gpg: /home/aubinxia/.gnupg/trustdb.gpg: trustdb created  
 gpg: key D333CBA1: public key "Jim Meyering <jim@meyering.net>" imported  
 gpg: no ultimately trusted keys found  
 gpg: Total number processed: 1  
 gpg:        imported: 1
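Once a public key is in the keyring, gpg --verify checks a signature against it. The complete sign/verify round trip can be sketched with a throwaway key in a temporary keyring; all names below are illustrative, and GnuPG >= 2.1 is assumed:

```shell
#! /bin/sh
# Use a temporary keyring so the real one is untouched.
export GNUPGHOME="$(mktemp -d)"
chmod 700 "$GNUPGHOME"

# Generate a throwaway key pair non-interactively.
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Alice <alice@example.com>' default default never

echo 'Hello world!' > msg.txt

# Sign msg.txt with Alice's private key (produces msg.txt.sig)...
gpg --batch --pinentry-mode loopback --passphrase '' \
    --detach-sign msg.txt

# ...and verify the signature with her public key.
gpg --verify msg.txt.sig msg.txt
```

The last command reports "Good signature from ..." when the file matches the signature; editing msg.txt afterwards makes verification fail.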