Sunday, June 29, 2014

Unix Shell: Temporary Files(1)

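1. Five ways to create temporary files
terminal:
1) Use "cat /dev/null > temp1" to create temp1; if the file already exists, it is truncated to zero length
2) Use 'printf "" > temp2' to create temp2; an existing file is truncated as well
3) Same as point 1, but ">>" appends instead of truncating, so an existing file keeps its content
4) Same as point 2, but with ">>", so an existing file keeps its content
5) Use the "touch" command to create temp5; an existing file keeps its content and only its timestamps are updated

touch is the best choice here because it is the least error-prone: it can never truncate an existing file by accident.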
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat /dev/null > temp1  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ printf "" > temp2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat /dev/null >> temp3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ printf "" >> temp4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch temp5  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -l  
 total 0  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:56 temp1  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:56 temp2  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:56 temp3  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:56 temp4  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:56 temp5  
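
The five commands only behave differently when the file already exists. Here is a minimal sketch of that case, assuming a scratch directory from mktemp -d; the file name demo.txt is made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no real file is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'data\n' > demo.txt                 # 5 bytes of content

touch demo.txt                             # updates timestamps only
size_after_touch=$(( $(wc -c < demo.txt) ))

printf "" >> demo.txt                      # appends nothing, content survives
size_after_append=$(( $(wc -c < demo.txt) ))

printf "" > demo.txt                       # ">" truncates the file first
size_after_truncate=$(( $(wc -c < demo.txt) ))

echo "$size_after_touch $size_after_append $size_after_truncate"
```

Only the last command, the one using ">", empties the file.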

2. Change different times with the "touch" command
terminal:
1) List all times of temp5
2) Run touch on temp5 without any options
3) List all times of temp5 again: all three times have changed
4) Run touch with the -m option: only the modification time and the last change time are updated
5) List all times of temp5
6) Run touch with the -a option: only the access time and the last change time are updated
7) List all times of temp5
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat temp5  
  File: ‘temp5’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134297   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-29 16:56:52.768351035 -0400  
 Modify: 2014-06-29 16:56:52.768351035 -0400  
 Change: 2014-06-29 16:56:52.768351035 -0400  
  Birth: -  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch temp5  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat temp5  
  File: ‘temp5’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134297   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-29 17:05:56.020347187 -0400  
 Modify: 2014-06-29 17:05:56.020347187 -0400  
 Change: 2014-06-29 17:05:56.020347187 -0400  
  Birth: -  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch -m temp5  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat temp5  
  File: ‘temp5’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134297   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-29 17:05:56.020347187 -0400  
 Modify: 2014-06-29 17:06:12.892347068 -0400  
 Change: 2014-06-29 17:06:12.892347068 -0400  
  Birth: -  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch -a temp5  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat temp5  
  File: ‘temp5’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134297   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-29 17:06:29.168346952 -0400  
 Modify: 2014-06-29 17:06:12.892347068 -0400  
 Change: 2014-06-29 17:06:29.168346952 -0400  
  Birth: -  

3. Override time with the "touch" command
The -t option lets the user set the time stamp explicitly, but the value must follow the format [[CC]YY]MMDDhhmm[.ss].
terminal:
1) Wrong format: a year alone is not enough
2) Wrong format: the hour and minute are missing
3) Right format: 2014-06-01 17:00
4) List all times of temp5: both the access and the modification time are set to the given value, while the last change time is set to the current time and cannot be overridden
5) Right format: the ".53" at the end gives the seconds
6) List all times of temp5: again, both the access and the modification time are set, and the last change time is the current time
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch -t 2014 temp5  
 touch: invalid date format ‘2014’  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch -t 20140601 temp5  
 touch: invalid date format ‘20140601’  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch -t 201406011700 temp5  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat temp5  
  File: ‘temp5’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134297   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-01 17:00:00.000000000 -0400  
 Modify: 2014-06-01 17:00:00.000000000 -0400  
 Change: 2014-06-29 17:16:33.384342673 -0400  
  Birth: -  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch -t 201406011700.53 temp5  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat temp5  
  File: ‘temp5’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134297   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-01 17:00:53.000000000 -0400  
 Modify: 2014-06-01 17:00:53.000000000 -0400  
 Change: 2014-06-29 17:17:32.532342254 -0400  
  Birth: -  
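
The -t format can be checked in a script by setting a time and reading it back. This is only a sketch, and it assumes GNU coreutils, where "date -r FILE" prints the modification time of a file; other systems may need a different command to read the time back.

```shell
#!/bin/bash
# Assumes GNU coreutils: "date -r FILE" prints the file's modification time.
file=$(mktemp)

# Format is [[CC]YY]MMDDhhmm[.ss]: 2014-06-01 17:00:53, local time.
touch -t 201406011700.53 "$file"

# Print the modification time back in the same layout.
mtime=$(date -r "$file" +%Y%m%d%H%M.%S)
echo "$mtime"

rm -f "$file"
```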

Unix Shell: File Time

1. ctime, mtime, atime
atime: last access time.
mtime: last modification time.
ctime: last change time.

2. Get the different times of a file:
terminal:
The stat command lists all three times of a given file.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ stat test1  
  File: ‘test1’  
  Size: 0         Blocks: 0     IO Block: 4096  regular empty file  
 Device: 801h/2049d    Inode: 134252   Links: 1  
 Access: (0664/-rw-rw-r--) Uid: ( 1000/aubinxia)  Gid: ( 1000/aubinxia)  
 Access: 2014-06-29 15:22:47.468391015 -0400  
 Modify: 2014-06-29 15:22:47.468391015 -0400  
 Change: 2014-06-29 15:22:47.468391015 -0400  
  Birth: -  
=========================================
The ls command can also show each of the three times, one at a time.
"ls -l" shows the last modification time
"ls -lu" shows the last access time
"ls -lc" shows the last change time
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -l test2  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 15:22 test2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lu test2  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 15:22 test2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lc test2  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 15:22 test2  

3. Last access time
The last time the file's content was read.
1) List the last access time of test3, which is 16:21
2) Run "cat" to read test3
3) List the last access time: it has changed to 16:28
4) Run "cat" to read test3 again
5) List the last access time: this time it has not changed. The most likely reason is that the file system is mounted with the "relatime" option (the default on Linux): a read updates the access time only if the current access time is not later than the modification or change time, or if it is more than 24 hours old. After step 2 the access time is already later than both, so the second read leaves it alone.
6) Append "Hello world!" to test3
7) List the last access time: changing the content does not change the last access time
8) Run "cat" again to read test3. The modification time is now later than the access time, so this read updates the access time
9) List the last access time: it has been updated to 16:29

 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lu test3  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:21 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lu test3  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:28 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lu test3  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:28 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo "Hello world!" >> test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lu test3  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jun 29 16:28 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ cat test3  
 Hello world!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lu test3  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jun 29 16:29 test3  
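
The transcript above is consistent with the "relatime" rule used by Linux mounts. The sketch below models that rule only; it is not the kernel code, and the helper name needs_atime_update and the small time values are made up for illustration. It assumes the file system really is mounted with relatime.

```shell
#!/bin/bash
# Sketch of the relatime decision. All four arguments are times in
# seconds: current atime, mtime, ctime, and the time of the read.
needs_atime_update() {
  awk -v atime="$1" -v mtime="$2" -v ctime="$3" -v now="$4" 'BEGIN {
    day = 24 * 60 * 60
    if (mtime >= atime || ctime >= atime || now - atime >= day)
      print "yes"
    else
      print "no"
  }'
}

# File created at t=100 and read at t=200: atime equals mtime, so update.
first_read=$(needs_atime_update 100 100 100 200)
# Read again at t=300: atime (200) is newer than mtime and ctime, so skip.
second_read=$(needs_atime_update 200 100 100 300)
# Content appended at t=400, read at t=500: mtime is newer, so update.
after_append=$(needs_atime_update 200 400 400 500)

echo "$first_read $second_read $after_append"
```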

4. Last change time
The time when the file's status (inode data such as permissions or owner) or content was last changed.
terminal:
1) List the last change time of test3: 16:29
2) Change the file status (permission)
3) List the last change time of test3: 16:36, so changing the file status updates the last change time
4) Append a new string to test3
5) List the last change time of test3: 16:37, so changing the file content also updates the last change time
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lc test3  
 -rw-rw-r-- 1 aubinxia aubinxia 13 Jun 29 16:29 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ chmod +x test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lc test3  
 -rwxrwxr-x 1 aubinxia aubinxia 13 Jun 29 16:36 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo "Hello world!" >> test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -lc test3  
 -rwxrwxr-x 1 aubinxia aubinxia 26 Jun 29 16:37 test3  

5. Last modification time:
The time when the file's content was last changed.
1) List the last modification time of test4: 16:21
2) Change the file's status (permission)
3) List the last modification time of test4: still 16:21, so changing the file status does not change the last modification time
4) Append "Hello world!" to test4
5) List the modification time of test4: 16:40, so changing the file content updates the last modification time
terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -l test4  
 -rw-rw-r-- 1 aubinxia aubinxia 0 Jun 29 16:21 test4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ chmod +x test4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -l test4  
 -rwxrwxr-x 1 aubinxia aubinxia 0 Jun 29 16:21 test4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo "Hello world!" >> test4  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -l test4  
 -rwxrwxr-x 1 aubinxia aubinxia 13 Jun 29 16:40 test4  
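
The difference between the change time and the modification time can also be checked in a script. This is a sketch that assumes GNU stat, where "-c %Y" prints the modification time and "-c %Z" prints the change time, both in seconds since the epoch.

```shell
#!/bin/bash
# Assumes GNU stat: %Y is mtime and %Z is ctime, in epoch seconds.
file=$(mktemp)

# Push mtime into the past so any later update is easy to spot.
touch -m -t 201406291621 "$file"
mtime_before=$(stat -c %Y "$file")

chmod +x "$file"                           # status change only
mtime_after=$(stat -c %Y "$file")
ctime_after=$(stat -c %Z "$file")

if [ "$mtime_after" -eq "$mtime_before" ] && [ "$ctime_after" -gt "$mtime_after" ]; then
  result="mtime kept, ctime updated"
else
  result="unexpected"
fi
echo "$result"
rm -f "$file"
```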

Unix Shell: List Files

1. List files
terminal:
 1) The echo command prints the file names that match the given pattern (the shell expands the pattern)
 2) The ls command lists all files whose names start with t
 3) When its output goes to a pipe instead of a terminal, ls prints one name per line
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo t*  
 test test1 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls t*  
 test test1 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls t* | cat  
 test  
 test1  
 test3  

2. list files in one column
terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -1  
 2  
 test  
 test1  
 test3  

3. List files that do not exist
terminal:
1) ls requires the given file to exist; otherwise it reports an error
2) echo has no such requirement. If no file matches, the shell leaves the pattern unexpanded and echo just prints it literally
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls k*  
 ls: cannot access k*: No such file or directory  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo k*  
 k*  
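
Because an unmatched pattern is passed on literally, a script can test whether a pattern matched anything. Here is a minimal sketch; the helper name has_match is made up for illustration, and the sketch assumes the default shell settings (no nullglob or failglob).

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch test1 test3

# Succeeds if at least one file matches the pattern given in $1.
has_match() {
  # Unquoted on purpose so that the shell expands the pattern.
  set -- $1
  [ -e "$1" ]
}

if has_match 't*'; then t_result="match"; else t_result="no match"; fi
if has_match 'k*'; then k_result="match"; else k_result="no match"; fi
echo "t*: $t_result, k*: $k_result"
```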

4. Provide no files to list
terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo *  
 2 test test1 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls *  
 2 test test1 test3  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls  
 2 test test1 test3  

5. List hidden files
Hidden files are files whose names start with "." (dot)
terminal:
1) The touch command creates three empty hidden files
2) Since all files are hidden, "*" matches nothing, so "echo *" just prints "*" literally
3) For the same reason, "ls *" looks for a file literally named "*", and there is no such file
4) ls without arguments lists all non-hidden files, so it prints nothing
5) Hidden files can be matched with ".*", meaning any file whose name starts with a "." (dot). Besides the three files created by touch, this also matches the following 2 entries:
. : represents the current directory
.. : represents the parent directory
6) In the output of "ls .*", the first line lists the hidden files; the "." and ".." sections then show the contents of the current directory and of the parent directory
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ touch .one .two .three  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo *  
 *  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls *  
 ls: cannot access *: No such file or directory  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo .*  
 . .. .one .three .two  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls .*  
 .one .three .two  
   
 .:  
   
 ..:  
 h1 h2 xxdev  

6. List directory entries themselves, not their contents
Sometimes, when running "ls .*" or similar commands, we do not want ls to descend into the matched directories; we only want the entries themselves. The -d option does that.
terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls .*  
 .one .three .two  
   
 .:  
   
 ..:  
 h1 h2 xxdev  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -d .*  
 . .. .one .three .two  
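
To list hidden files without "." and "..", one common approach is the pattern ".[!.]*". Here is a sketch using the same three file names as above:

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch .one .two .three

# ".[!.]*" needs a second character that is not a dot, so it skips
# "." and ".." (it would also skip a name such as "..data").
hidden=$(ls -d .[!.]* | wc -l)
hidden=$((hidden))                         # strip padding some wc versions add
echo "$hidden"
```

With the three files created above, the count is 3.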

7. List all files(including hidden files)
terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -a  
 . .. .one .three .two  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$  

8. List files with more information
terminal:
1st column: the first character is the file type: d means directory, - means ordinary file, l means symbolic link. The next 9 characters are the file permissions for user, group and others: r for read, w for write, x for execute, and - if the permission is absent.

2nd column: link count, the number of hard links to the file. In this listing only "folder" is a directory; its count is 2 because its own "." entry is a second link to it.

3rd and 4th columns: file owner and group

5th column: file size in bytes

6th, 7th and 8th columns: last modification time
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ls -l  
 total 4  
 drwxrwxr-x 2 aubinxia aubinxia 4096 Jun 29 15:23 folder  
 -rw-rw-r-- 1 aubinxia aubinxia  0 Jun 29 15:22 test1  
 -rw-rw-r-- 1 aubinxia aubinxia  0 Jun 29 15:22 test2  
 -rw-rw-r-- 1 aubinxia aubinxia  0 Jun 29 15:22 test3  

awk: numeric functions

1. Mathematical Functions
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   print atan2(90, 1); # arctangent of 90/1, in radians
   print cos(90); # cosine of 90 radians (not degrees)
   print exp(10); # e raised to the power 10
   print int(10.23); # integer part of 10.23
   print log(10); # natural logarithm of 10
   print sin(90); # sine of 90 radians (not degrees)
   print sqrt(5); # square root of 5
   
 #output:  
 #1.55969  
 #-0.448074  
 #22026.5  
 #10  
 #2.30259  
 #0.893997  
 #2.23607  
 }'  
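
The trigonometric functions of awk work in radians, so an angle in degrees has to be converted first. Here is a small sketch; the variable name pi is my own choice, since awk has no built-in constant for it.

```shell
#!/bin/bash
deg_result=$(awk 'BEGIN {
  pi = atan2(0, -1);             # a common awk idiom for pi
  # Convert degrees to radians before calling sin and cos.
  printf("%.4f %.4f\n", sin(90 * pi / 180), cos(60 * pi / 180));
}')
echo "$deg_result"
```

This prints 1.0000 0.5000, the sine of 90 degrees and the cosine of 60 degrees.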

2. Random Number
Without srand(), many awk implementations seed rand() with the same fixed value, so every run returns the same sequence of numbers.
rand() returns a random number in the range [0,1)
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   for(i=0; i<3; i++)  
   {  
     print rand();  
   }  
 }  
 '  
   
 #output:  
 #0.411633  
 #0.322733  
 #0.179324  
   
 awk 'BEGIN {  
   for(i=0; i<3; i++)  
   {  
     print rand();  
   }  
 }  
 '  
   
 #output:  
 #0.411633  
 #0.322733  
 #0.179324  
========================================
srand() sets the seed, so rand() can return a different sequence of random numbers on each run.
script_1:

 #! /bin/bash  
   
 awk 'BEGIN {  
   srand(); #Without a seed argument, srand() uses the
        #current time (in seconds) as the seed
   for(i=0; i<3; i++)  
   {  
     print rand();  
   }  
 }  
 '  
   
 #output:  
 #0.582961  
 #0.827199  
 #0.736815  
   
 awk 'BEGIN {  
   srand(999);  
   for(i=0; i<3; i++)  
   {  
     print rand();  
   }  
 }  
 '  
   
 #output:  
 #0.537788  
 #0.605322  
 #0.650132  
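
A typical use of rand() is to scale it to an integer range. The sketch below rolls a die with a fixed seed; the helper name roll and the seed 999 are only for illustration.

```shell
#!/bin/bash
# roll SEED COUNT: print COUNT dice rolls (1..6) for a fixed seed.
roll() {
  awk -v seed="$1" -v count="$2" 'BEGIN {
    srand(seed);
    sep = "";
    for (i = 0; i < count; i++) {
      # rand() is in [0,1), so int(rand() * 6) is in 0..5.
      printf("%s%d", sep, int(rand() * 6) + 1);
      sep = " ";
    }
    print "";
  }'
}

first=$(roll 999 5)
second=$(roll 999 5)
echo "$first"
echo "$second"      # same seed, so the same rolls as the line above
```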
   

Saturday, June 28, 2014

awk: string functions(3)

1. String Reconstruction
awk has no built-in function that joins array elements back into a string, but we can write one ourselves.
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   arr[1]="Hello";  
   arr[2]="world!";  
   
   print reconstruct(arr, 2, " ");  
   #Output:  
   #Hello world!  
 }  
   
 function reconstruct(arr, len, fs,  i, s)  
 {  
   if(len >= 1)  
   {  
     s=arr[1];  
     for(i=2;i<=len;i++)  
       s=s fs arr[i];  
   }  
   
   return s;  
 }'  

2. String Formatting
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   printf("%10d\n", 100);  
   printf("%10s\n", "Hello");  
 #output:
 #       100
 #     Hello
   
   s1=sprintf("%10d", 100);  
   s2=sprintf("%10s", "Hello");  
   print s1;  
   print s2;  
 #output:
 #       100
 #     Hello
 }'  
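
Besides a plain width, the format string accepts flags and a precision. Here is a short sketch of three common ones:

```shell
#!/bin/bash
formatted=$(awk 'BEGIN {
  # %-10s pads on the right, %5.2f rounds to 2 decimals,
  # %05d pads with leading zeros.
  printf("%-10s|%5.2f|%05d\n", "Hello", 3.14159, 42);
}')
echo "$formatted"
```

The result is "Hello     | 3.14|00042".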

awk: string functions(2)

1. String Substitution
text:
 Hello world! Amazing world!  
 Aloha world! Ni Hao world!  


script_1:
 #! /bin/bash  
   
 awk '{  
   sub("world","New York", $0);  
   print $0;  
 }' text  
   
 #output:  
 #Hello New York! Amazing world!  
 #Aloha New York! Ni Hao world!  
 #sub replaces "world" with "New York", but only the
 #first occurrence in each record
   
 awk '{  
   sub("world", "New York");  
   print $0;  
 }' text  
 #output:  
 #Hello New York! Amazing world!  
 #Aloha New York! Ni Hao world!  
 #If the third parameter of sub is omitted, it
 #works on $0, the record itself
   
 awk '{  
   gsub("world", "New York", $0);  
   print $0;  
 }' text  
 #output:  
 #Hello New York! Amazing New York!  
 #Aloha New York! Ni Hao New York!  
 #The only difference between sub and gsub is that gsub
 #replaces all occurrences of "world"
   
 awk '{  
   gsub("world", "New York");  
   print $0;  
 }' text  
 #output:  
 #Hello New York! Amazing New York!  
 #Aloha New York! Ni Hao New York!  
 #If the 3rd parameter is omitted, gsub also
 #works on $0, the record itself
   
 awk '{  
   gsub("world", "&&");  
   print $0;  
 }' text  
 #output:  
 #Hello worldworld! Amazing worldworld!  
 #Aloha worldworld! Ni Hao worldworld!  
 #"&" in the replacement stands for the matched text
   
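 awk '{
   gsub("world", "\\&\\&");
   print $0;
 }' text
 #output:
 #Hello &&! Amazing &&!
 #Aloha &&! Ni Hao &&!
 #A backslash before "&" turns off its special meaning, so
 #"&" is treated literally. Inside a string constant the
 #portable way to write that backslash is to double it: "\\&"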
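
Both sub and gsub also return the number of replacements they made, which is handy for counting. Here is a small sketch on the first line of the text file above:

```shell
#!/bin/bash
counted=$(printf 'Hello world! Amazing world!\n' | awk '{
  # gsub returns how many replacements it made.
  n = gsub("world", "New York");
  print n ": " $0;
}')
echo "$counted"
```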

2. String splitting
text:
  Hello : world!  


script_1:
 #! /bin/bash  
   
 awk '{  
   split($0, arr);  
   for(i in arr)  
   {  
     print arr[i];  
   }  
 }' text  
 #output:  
 #Hello  
 #:  
 #world!  
   
 #If the 3rd parameter of split is omitted, it uses the
 #FS variable as the separator to split the string and
 #puts each part into the array. Note that "for (i in arr)"
 #does not guarantee any particular order
   
 awk '{  
   split($0, arr, "[ ]");  
   for(i in arr)  
   {  
     print arr[i];  
   }  
 }' text  
 #output:  
 #  
 #Hello  
 #:  
 #world!  
   
 #A single white space is the separator here, and the line
 #starts with one, so the first string we get is empty
   
 awk '{  
   split($0, arr, ":");  
   for(i in arr)  
   {  
     print arr[i];  
   }  
 }' text  
 #Output:  
 # Hello   
 # world!  
   
 #The colon is the separator here, so the string is
 #split into the 2 parts around the colon
   
 echo =========  
   
 awk 'BEGIN {  
   len=split("Hello", arr, "");  
   print len;  
   for(i in arr)  
   {  
     print arr[i];  
   }  
   
   len=split("", arr);  
   print len;  
 }'  
 #Output:  
 #5  
 #H  
 #e  
 #l  
 #l  
 #o  
 #0  
   
 #With the empty string as the separator, gawk and mawk make
 #each character one entry in the array (POSIX leaves this
 #case unspecified).
 #The return value of split is the number of elements. If we
 #give the empty string as the first parameter, the array is
 #cleared and split returns 0
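
Since "for (i in arr)" may visit the elements in any order, a counting loop over the return value of split is the safer way to keep the original order. A sketch, with a made-up input string:

```shell
#!/bin/bash
ordered=$(awk 'BEGIN {
  n = split("3:Chicago:5:Los Angeles", parts, ":");
  # Loop over 1..n instead of "for (i in parts)" to keep the order.
  for (i = 1; i <= n; i++)
    printf("%d=%s\n", i, parts[i]);
}')
echo "$ordered"
```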

Wednesday, June 25, 2014

awk: string functions(1)

1. Substring
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   print substr("Hello", 1, 2);  
   #Output: He  
   #Meaning: start from first character, extract 2 chars  
   
   print substr("Hello", 1);  
   #Output: Hello  
   #If omitting the 3rd parameter, will just extract remaining  
   #characters  
   
   print substr("Hello", 0);  
   print substr("Hello", -1, -2);  
   #substr counts positions from 1, not 0. If the start or
   #length is out of range, the result may differ between
   #awk implementations
 }'  
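
awk has no function for taking the last characters of a string, but substr and length can be combined for that. A minimal sketch:

```shell
#!/bin/bash
last3=$(awk 'BEGIN {
  s = "Hello";
  n = 3;
  # Start n characters before the end and take the rest.
  print substr(s, length(s) - n + 1);
}')
echo "$last3"
```

For "Hello" and n = 3 this gives "llo".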

2. letter case conversion
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   print tolower("ABCdef(123)");  
   print toupper("ABCdef(123)");  
   
   #output:  
   #abcdef(123)  
   #ABCDEF(123)  
 }'  

3. string searching
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   print index("Hi world!", "world");  
   #output 4;
   #This means index also counts positions from 1
   # instead of 0

   print index("Hi world!", "WORLD");
   #output 0;
   #meaning the string was not found
   
   print index(tolower("Hi world!"), tolower("WORLD"));  
   #output 4;  
   #Ignore the letter case, by converting them to lower case  
   
   print rindex("Hi world Hi world!", "world");  
   #output 13  
   #rindex is a self-made function that searches from the end of the string
 }  
   
 function rindex(str, find, k, ls, lf)  
 {  
   ls=length(str);  
   lf=length(find);  
   
   if(lf>ls) return 0;  
   
   for(k=ls-lf+1;k>0;k--)  
   {  
     if(substr(str, k, lf)==find)  
       return k;  
   }  
   
   return 0;  
 }  
 '  

4. String matching
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   str="Hello world! Amazing world!";
   print str ~ "w.*!";  
   #output 1
   #meaning: str matches the regular expression "w.*!"

   print str ~ "k.*!";
   #output 0
   #meaning: str does not match the regular expression
   
   print match(str, "world!");  
   #output 7
   #meaning: str matches the regular expression, and the
   #match starts at index 7

   print RSTART, RLENGTH;
   #Output: 7 6
   #As a side effect, match sets the variables RSTART
   #and RLENGTH to the starting index and the length
   #of the matched string

   print substr(str, RSTART, RLENGTH);
   #output: world!
   #We can use RSTART and RLENGTH to extract the matched
   #string
 }'  
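
match only reports the first match. To find every match, one possible approach is to cut off the part that has already been searched and call match again. A sketch (the variable name offset is my own):

```shell
#!/bin/bash
positions=$(awk 'BEGIN {
  s = "Hello world! Amazing world!";
  offset = 0;                    # characters already cut off the front
  sep = "";
  while (match(s, /world/)) {
    printf("%s%d", sep, offset + RSTART);
    sep = " ";
    offset += RSTART + RLENGTH - 1;
    s = substr(s, RSTART + RLENGTH);
  }
  print "";
}')
echo "$positions"
```

For this string the two matches start at 7 and 22.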

Tuesday, June 24, 2014

awk: functions

1. awk function basic
script_1:
 #! /bin/bash  
   
 awk 'BEGIN{  
 #The following example demonstrates that awk passes
 #string variables by value (a copy, not a reference).
 #That is why v1 and v2 in the BEGIN section do not
 #change even though they are swapped inside func1.
   
   v1="Hello world!";  
   v2="Amazing world!";  
   
   res=func1(v1, v2);  
    
   print v1;  
   print v2;  
   print res;  
   #output:  
   #Hello world!  
   #Amazing world!  
   #100  
   
 #The following example demonstrates that awk passes
 #array variables by reference, not by copy. The code
 #in func2 adds 2 entries to the array, and the caller
 #sees them. Also, func2 does not return any value; here
 #the result is an empty string, but this is
 #implementation dependent, and some other platforms
 #may return numeric 0.
   va[0]="Hello";  
   va[1]="world!";  
   
   res=func2(va);  
   for(i in va)  
     print va[i];  
   #output (the order of "for (i in va)" is not guaranteed):
   #0  
   #1  
   #Hello  
   #world!  
   
   print res;  
   #output: empty line  
 }  
   
 function func1(v1, v2)  
 {  
   temp=v1;  
   v1=v2;  
   v2=temp;  
   
   return 100;  
 }  
   
 function func2(va)  
 {  
   va["Hello"]=0;  
   va["world"]=1;  
 }  
 '  

2. Management of Variables
A bad way:
script_1:
 #! /bin/bash  
   
 #Variable name clashes are a big source of awk bugs.
 #If a function uses a variable whose name is not in
 #its parameter list, that variable is global. If the
 #name is in the parameter list, it is local and hides
 #any global variable with the same name.

 #In the following example, we expect p to stay unchanged:
 #after calling find_key, we want to print the original
 #string saved in "p". But find_key uses p as its loop
 #variable, which is global there, so p is overwritten
 #with the key at which the loop stopped.
 awk 'BEGIN {  
   p="Hello world!";  
   va[0]="Hello";  
   va[1]="World";  
   key=find_key(va, "World");  
   print p;  
   print key;  
   
   #Output:  
   #1  
   #1  
 }  
 function find_key(va, value)  
 {  
   for(p in va)  
   {  
     if(va[p] == value)  
       return p;  
   }  
   return "";  
 }'  
========================================
A good way:
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   p="Hello world!";  
   va[0]="Hello";  
   va[1]="world!";  
   key = find_key(va, "world!");  
    
   print p;  
   print key;  
   
   #Output:  
   #Hello world!  
   #1  
 }  
   
 # We declare p explicitly as an extra parameter, so the
 # local p hides the global one and the two values
 # no longer clash.
 function find_key(va, value, p)  
 {  
   for(p in va)  
   {  
     if(va[p] == value)  
       return p;  
   }  
   
   return "";  
 }'  

3. Recursive Function
Write a function to get the greatest common divisor
./script_1:
 #! /bin/bash  
   
 awk -v x="$1" -v y="$2" 'BEGIN {
   r=getMaxCd(x,y);  
   print "getMaxCd(",x,",",y,")=",r;  
 }  
   
 # We need to declare the "r" at the argument list to  
 # hide the global "r" variable  
 function getMaxCd(x, y,  r)  
 {  
   x=int(x);  
   y=int(y);  
   print x,"%",y;  
   r=x % y;  
   return (r==0)? y : getMaxCd(y, r);  
 }'  

terminal:
 aubinxia@aubinxia-fastdev:~$ ./script_1 10 4  
 10 % 4  
 4 % 2  
 getMaxCd( 10 , 4 )= 2  
 aubinxia@aubinxia-fastdev:~$ ./script_1 100 33  
 100 % 33  
 33 % 1  
 getMaxCd( 100 , 33 )= 1  
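
The same calculation can be written as a loop instead of a recursion. The sketch below is an alternative, not the script above; the helper name gcd is my own, and the loop also handles a second argument of 0 without dividing by zero.

```shell
#!/bin/bash
# gcd X Y: iterative Euclid; gcd(x, 0) is x, so nothing is divided by zero.
gcd() {
  awk -v x="$1" -v y="$2" 'BEGIN {
    x = int(x);
    y = int(y);
    while (y != 0) {
      r = x % y;
      x = y;
      y = r;
    }
    print x;
  }'
}

g1=$(gcd 10 4)
g2=$(gcd 100 33)
g3=$(gcd 7 0)
echo "$g1 $g2 $g3"
```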

Sunday, June 22, 2014

awk: external command

1. Run the external command with "system"
text2:
 3 Chicago  
 5 Los Angeles  
 1 Boston  
 4 Atlantic  

script_1:
 #! /bin/bash  
   
 awk '{  
   print $0 > "text";  
 }  
 END {  
   close("text");
   #Close the file so that its buffered content (the 4
   #records from text2) is written to "text". The
   #argument must be the same string that was used in
   #the redirection.

   system("sort < text");
   #Run the external command. The return value of system
   #is the exit status of the external command. The
   #command shares the standard output and standard
   #error of awk. In this case the standard output of awk
   #is the terminal, so the sorted content is printed there.
 }' text2  
   
 #output:  
 #1 Boston  
 #3 Chicago  
 #4 Atlantic  
 #5 Los Angeles  
   
 #The following awk program is the same as above, except
 #that the standard output of awk is redirected to text3,
 #so the output of the external command run by
 #system goes to text3 as well
 awk '{
   print $0 > "text";
 }
 END {
   close("text");
   system("sort < text");  
 }' text2 >text3  
   
 cat text3;  
 #output:  
 #1 Boston  
 #3 Chicago  
 #4 Atlantic  
 #5 Los Angeles  
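
The return value of system can be used to check whether the external command succeeded. Here is a minimal sketch with two trivial commands:

```shell
#!/bin/bash
status=$(awk 'BEGIN {
  rc = system("exit 3");         # run by the shell, which exits with 3
  ok = system("true");           # a command that always succeeds
  print rc, ok;
}')
echo "$status"
```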

2. Run the external command with pipeline
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   shell="/bin/bash"  
   command="echo Hello world!";  
   print command | shell;  
   
   print "======="  
   
   command="v=\"Amazing world!\"";  
   print command | shell;  
   
   print "======="  
   
   command="echo $v";  
   print command | shell;  
   close(shell);  
 }'  
   
 #output:  
 #=======  
 #=======  
 #Hello world!  
 #Amazing world!  
   
 # awk buffers what it writes to the pipe, so bash does
 # not see any of the commands until the buffer is
 # flushed, which happens here at "close". That is why
 # the 2 separator lines are output first and the shell
 # output is printed last.
 # Because all three commands go to the same bash
 # process, they run like one script, and "v" is still
 # set when "echo $v" runs.
   
 awk 'BEGIN {  
   command="v=\"Hello world!\"";  
   shell="/bin/bash";  
   print command | shell;  
   close(shell);  
     
   command="echo $v";  
   print command | shell;  
   close(shell);  
 }'  
 #output: empty line  
   
 #Since we close the shell after feeding the first command
 #to it, the next print to the same pipe makes awk start
 #a new shell process, which does not know the variable
 #"v" at all. It outputs an empty line, since v is
 #empty there.

awk: Output Redirection

1. Truncate vs Append
text:
 Hello world!  
 Hello Amazing world!  

text2:
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  

script_1:
 #! /bin/bash  
   
 # print ">" truncates the file only when the file is
 # opened, which is the first time a print statement
 # writes to "text". Every later print in the same run
 # appends to the end of the existing content.
 awk '{  
   print $0 > "text";  
 }' text2;  
   
 cat text;  
 #output:  
 #Hello Chicago  
 #Hello Los Angeles  
 #Hello Boston  
 #Hello Atlantic  
   
 # Here we close the file after every write, so each
 # print has to open "text" again, and every open
 # truncates the file. In the end, "text" contains
 # only the last record.
 awk '{  
   print $0 > "text";  
   close("text");  
 }' text2;  
   
 cat text;  
 #Output:  
 #Hello Atlantic  
   
 #print ">>" will append content to the end of "text"  
 #file. When opening the file, it won't truncate the  
 #existing file.  
 awk '{  
   print $0 >> "text";  
 }' text2;  
   
 cat text;  
 #output:  
 #Hello Atlantic  
 #Hello Chicago  
 #Hello Los Angeles  
 #Hello Boston  
 #Hello Atlantic  

2.Pipeline to external command
text2:
 3 Chicago  
 5 Los Angeles  
 1 Boston  
 4 Atlantic  

script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   command = "sort -k1 > text";  
 }  
 {  print $0 | command; }  
 END { close(command); }' text2  
   
 #The first print starts "command" and feeds it through a
 #pipe. close(command) flushes the remaining input, signals
 #end of input and waits for the command to finish; only
 #then can sort write its sorted output to the text file.
   
 cat text;  
 #output:  
 #1 Boston  
 #3 Chicago  
 #4 Atlantic  
 #5 Los Angeles  
   
 awk 'BEGIN {  
   command = "sort -k1 >text";  
 }  
 { print $0 | command;   
  close(command);}' text2  
   
 #Since we call "close" after every record, the command
 #runs once per record with only one line of input. Each
 #run truncates the "text" file, so in the end the file
 #contains only the last line of text2.
   
 cat text;    
 #output:  
 #4 Atlantic  
   
 awk 'BEGIN {  
   command = "sort -k1 >>text";  
 }  
 { print $0 | command;   
  close(command);}' text2  
 #Each "close" flushes the input buffer and lets the
 #command finish with only one record of input. With ">>"
 #that record is appended to the end of the text file.
 #So in the end, the four records of text2 are appended
 #to the text file in their original order.
   
 cat text;  
 #output:  
 #4 Atlantic  
 #3 Chicago  
 #5 Los Angeles  
 #1 Boston  
 #4 Atlantic  

awk: User Controlled Input(2)

1. User Controlled Input: from terminal:
text:
 Hello world!  
 Hello Amazing world!  

script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   getline < "/dev/tty";  
 #script stops here, waiting for input:  
 #Input:  
 #My Input  
   
   print "BEGIN:" $0;  
   print "BEGIN, NF, NR:",NF,NR;  
 #Output:  
 #My Input  
 #BEGIN:My Input  
 #BEGIN, NF, NR: 2 0  
   
   getline v < "/dev/tty";  
 #Input:  
 #My Own Input  
 #getline reads the input into variable v instead of $0,
 #so NF is not touched.
   
   print "BEGIN:" v;  
   print "BEGIN, NF, NR:", NF, NR;  
 #Output:  
 #My Own Input  
 #BEGIN:My Own Input  
 #BEGIN, NF, NR: 2 0  
 }  
 { print $0; }' text  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 My Input  
 BEGIN:My Input  
 BEGIN, NF, NR: 2 0  
 My Own Input  
 BEGIN:My Own Input  
 BEGIN, NF, NR: 2 0  
 Hello world!  
 Hello Amazing world!  

2. User Controlled Input: while loop
text:
 Hello world!  
 Hello Amazing world!  

text2:
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  

script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   # getline returns 1 on success,
   # 0 at end of file
   # and -1 on error, so test for "> 0"; a bare
   # "while (getline < file)" loops forever on error
   while( (getline < "text2") > 0 )
   {  
     print "BEGIN:" $0;  
   }  
 }  
 { print $0; }' text  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 BEGIN:Hello Chicago  
 BEGIN:Hello Los Angeles  
 BEGIN:Hello Boston  
 BEGIN:Hello Atlantic  
 Hello world!  
 Hello Amazing world!  
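
The return value -1 is easy to see with a file that cannot be opened. A minimal sketch; the path /nonexistent/file is a made-up name that is assumed not to exist.

```shell
#!/bin/bash
result=$(awk 'BEGIN {
  file = "/nonexistent/file";    # hypothetical path, assumed to be missing
  r = (getline line < file);
  if (r < 0)
    print "cannot read " file;
  else
    print "status " r;
}')
echo "$result"
```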

3. User Controlled Input: from external command
text2:
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  

script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   command="cat text2";  
   # Since we do not close the command after each  
   # "getline", every iteration reads the record that  
   # follows the one read in the previous iteration.  
   while((command | getline v)>0)  
     print v;  
   close(command);  
 }'  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  
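
To see what close changes for a command, the sketch below reads the first output line, closes the command, and reads again. The helper name first_line_twice and the sample command string are assumptions for illustration only.

```shell
# first_line_twice CMD: read the first output line of CMD two times.
# (Hypothetical helper; awk hands the command string to sh.)
first_line_twice() {
  awk -v cmd="$1" 'BEGIN {
    cmd | getline a     # starts the command and reads its first line
    close(cmd)          # ends the command, so the next read restarts it
    cmd | getline b     # the first line again, not the second
    close(cmd)
    print a
    print b
  }'
}

first_line_twice 'echo one; echo two'
```

Without the first close, the second getline would continue in the same output stream and return the second line.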

awk: User Controlled Input(1)

1. User Controlled Input -- from standard input
text2:
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  

script_1:
 #! /bin/bash  
   
 awk 'BEGIN {   
   print $0;  
 #output: empty line;  
 #At this time, $0 is still empty  
   
   getline;  
 #Read the next record from the main input (text2 here)  
 #into $0, updating NF and NR at the same time.  
   
   print "BEGIN:" $0;  
   print "BEGIN: NF, NR:", NF, NR;  
 #output:  
 #BEGIN:Hello Chicago  
 #BEGIN: NF, NR: 2 1  
   
   getline v;  
 #Read the next record from the main input into  
 #variable v. NR is updated, BUT NF IS NOT,  
 #since $0 is not touched at all.  
   
   print "BEGIN: v:",v;  
   print "BEGIN: NF, NR:", NF, NR;  
 #output:  
 #BEGIN: v: Hello Los Angeles  
 #BEGIN: NF, NR: 2 2  
 #NF still describes $0 ("Hello Chicago"), not v.  
 #v holds 3 fields, but since $0 is not touched,  
 #NF stays at 2.  
 }  
 { print $0; }  
 #The BEGIN section has already read 2 records, so  
 #this action is applied to the remaining records only  
 ' text2  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
   
 BEGIN:Hello Chicago  
 BEGIN: NF, NR: 2 1  
 BEGIN: v: Hello Los Angeles  
 BEGIN: NF, NR: 2 2  
 Hello Boston  
 Hello Atlantic  
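
Since "getline v" leaves NF alone, the field count of v has to be computed separately. The sketch below does that with split; the helper name getline_demo and the sample input are assumptions made for this illustration.

```shell
# getline_demo: compare plain "getline" with "getline v" on three input lines.
# (Hypothetical helper name; the input is fed through a pipe.)
getline_demo() {
  printf 'a b\nc d e\nf\n' | awk 'BEGIN {
    getline               # sets $0, NF and NR
    print NF, NR
    getline v             # sets v and NR, leaves $0 and NF alone
    print NF, NR, v
    n = split(v, parts)   # count the fields of v explicitly
    print n
  }'
}

getline_demo
```

The three printed lines are "2 1", "2 2 c d e" and "3": NF stays at 2 after the second read, while split reports the 3 fields that v really holds.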

2. User Controlled Input -- from external file
text:
 Hello world!  
 Hello Amazing world!  

text2:
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  

script_1:
 #! /bin/bash  
   
 awk 'BEGIN {   
   getline <"text";  
 #This is the first read from the "text" file, so awk  
 #opens it and does the necessary underlying work.  
 #getline then reads one record from "text" into $0  
 #and updates NF. NR is not updated, because the record  
 #does not come from the main input.  
   
   print "BEGIN:" $0;  
   print "BEGIN: NF, NR:" NF, NR;  
 #Output:  
 #BEGIN:Hello world!  
 #BEGIN: NF, NR:2 0  
   
   close("text");  
 #This closes the "text" file, so the next getline  
 #re-opens it and starts reading from the beginning  
 #of the file again.  
 #Without this close, the next getline would read the  
 #record that follows the one already read above.  
   
   getline v<"text";  
 #Since we closed the "text" file above, getline has to  
 #re-open it and read from the beginning again. Because  
 #getline reads into variable v instead of $0, NF is not  
 #updated.  
   
   print "BEGIN:" v;  
   print "BEGIN: NF, NR:" NF, NR;  
 #Output:  
 #BEGIN:Hello world!  
 #BEGIN: NF, NR:2 0  
 }  
 { print $0; }  
 #The BEGIN section does not read from the main input,  
 #so this action is applied to all records of text2.  
 ' text2  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 BEGIN:Hello world!  
 BEGIN: NF, NR:2 0  
 BEGIN:Hello world!  
 BEGIN: NF, NR:2 0  
 Hello Chicago  
 Hello Los Angeles  
 Hello Boston  
 Hello Atlantic  
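
The effect of close on a file can be checked side by side. The following sketch reads twice from the same file, with and without a close in between; the helper name reread, its second parameter and the temporary file are assumptions for this illustration.

```shell
# reread FILE FLAG: read FILE twice and print the second result.
# FLAG=1 closes the file between the two reads, FLAG=0 does not.
# (reread is a hypothetical helper name.)
reread() {
  awk -v f="$1" -v do_close="$2" 'BEGIN {
    getline a < f
    if (do_close) close(f)
    getline b < f
    print b
  }'
}

demo=$(mktemp)
printf 'Hello world!\nHello Amazing world!\n' > "$demo"
reread "$demo" 0    # second record of the file
reread "$demo" 1    # first record again, because the file was re-opened
rm -f "$demo"
```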

awk: Statements(3)

1. Comprehensive Iterative Example:
./script_1:
 #! /bin/bash  
   
 awk -v num="$1" 'BEGIN {  
   n=num;  
 #Treat any input below 2 as 2  
   m=n=(num>=2)? n:2;  
   factors = "";  
 #Trial division: k advances only when it no longer divides m  
   for(k=2; (m>1) && (k^2 <= n);)  
   {  
     if(int(m % k) != 0)  
     {   
       k++;  
       continue;  
     }  
   
     m/=k;  
     factors = (factors == "")? ("" k) : (factors "*" k);  
   }  
   
 #Whatever is left in m is one more prime factor, larger than sqrt(n)  
   if( (m > 1) && ( m < n) )  
     factors = factors "*" m;  
   print n, (factors=="")? "is a prime number":("=" factors)  
 }'  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 2  
 2 is a prime number  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 3  
 3 is a prime number  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 4  
 4 =2*2  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 5  
 5 is a prime number  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1 6  
 6 =2*3  

2. Array Member Testing
awk needs only constant time to test array membership.
text2:
 1 Chicago  
 4 Boston  
 3 Atlantic  

./script_1:
 #! /bin/bash  
   
 awk '{  
   cities[$1]=$2;  
 }  
 END {  
   for(i=1; i<=4; i++)  
   {  
     if (i in cities)  
       print i ":" cities[i];  
   }  
 }' text2  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 1:Chicago  
 3:Atlantic  
 4:Boston  
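
One reason to prefer "in" over comparing cities[i] with an empty string is that merely referencing an element creates it. The sketch below illustrates this; the helper names membership_demo and count are assumptions for the example.

```shell
# membership_demo: the "in" test leaves the array alone, a plain reference does not.
# (membership_demo and count are hypothetical names used only here.)
membership_demo() {
  awk '
    # Count the elements with a loop instead of relying on length(array).
    function count(arr,   k, n) { n = 0; for (k in arr) n++; return n }
    BEGIN {
      cities[1] = "Chicago"
      if (!(2 in cities)) print "2 is absent"
      print count(cities)     # the "in" test did not add an element
      x = cities[2]           # a plain reference creates cities[2]
      print count(cities)
    }'
}

membership_demo
```

The count goes from 1 to 2 only after the plain reference, so a membership test written as a comparison would silently grow the array.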

3. awk Flow Control
Skip further patterns for this record:
text2:
 start Chicago  
 skip Boston  
 end Atlantic  

script_1:
After executing next, awk skips all remaining statements and rules for the current record, including the "print $0", and moves on to the next record.
 #! /bin/bash  
   
 awk '/skip/ { next; print $0;}  
 { print $0; } ' text2  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 start Chicago  
 end Atlantic 
=======================================
Exit from current awk script:
text2:
 start Chicago  
 process Los Angeles  
 exit Boston  
 end Atlantic  

script_1:
 #! /bin/bash  
   
 awk '/exit/ { exit 20; }  
 { print $0; } ' text2  
   

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 start Chicago  
 process Los Angeles  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ echo $?  
 20  
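
An exit inside a rule does not bypass the END section: awk still runs it and then leaves with the given status. A minimal sketch, with the helper name exit_demo and the sample input as assumptions:

```shell
# exit_demo: "exit 20" in a rule stops reading input, but END still runs.
# (exit_demo is a hypothetical helper name.)
exit_demo() {
  printf 'start\nexit\nend\n' | awk '
    /exit/ { exit 20 }
    { print }
    END { print "END ran" }'
}

# The function returns the awk exit status, so report it on failure.
exit_demo || echo "status: $?"
```

This prints "start", then "END ran", then "status: 20".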

Saturday, June 21, 2014

awk: Statements(2)

1. Iterative Statements: for(2)
text2:
 2 Chicago  
 1 Boston  
 3 Atlanta  

script_1:
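i iterates over the indices of cities. The order in which a for-in loop visits the indices is not specified, so it may differ between awk implementations.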
 #! /bin/bash  
   
 awk '{ cities[$1] = $2; }  
    END {  
    for ( i in cities )  
    {  
      print i ":" cities[i];  
    }}' text2  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 1:Boston  
 2:Chicago  
 3:Atlanta  
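
When the output order matters, one option is to sort outside awk. The sketch below pipes the for-in output through sort; the helper name sorted_cities and the temporary file are assumptions for this illustration.

```shell
# sorted_cities FILE: print "index:city" pairs ordered by numeric index.
# (sorted_cities is a hypothetical helper name.)
sorted_cities() {
  awk '{ cities[$1] = $2 }
       END { for (i in cities) print i ":" cities[i] }' "$1" |
    sort -t: -k1,1n
}

demo=$(mktemp)
printf '2 Chicago\n1 Boston\n3 Atlanta\n' > "$demo"
sorted_cities "$demo"
rm -f "$demo"
```

Sorting in a separate process keeps the awk part portable, at the cost of one extra pipe stage.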

2. Iterative Statements: for(3)
text2 is same as above

script_1:
"break" leaves the current loop; "continue" skips the remaining statements of the current iteration and goes on to the next one. The output below assumes that the indices are visited in the order 1, 2, 3.
 #! /bin/bash  
   
 awk '{ cities[$1] = $2; }  
    END {  
    for ( i in cities )  
    {  
      if (i==2)  
        break;  
      print i ":" cities[i];  
    }  
      
    print "";  
   
    for (i in cities)  
    {  
      if (i==1)  
        continue;  
      print i ":" cities[i];  
    }  
 }' text2  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 1:Boston  
   
 2:Chicago  
 3:Atlanta  

3. Iterative Statements: while
text2:
 Hello Chicago  
 Hello Boston  
   

script_1:
 #! /bin/bash  
   
 awk '{  
   i=0;  
   while(i<=NF)  
   {  
     print $i;  
     i++;  
   }  
 }' text2  

terminal:
The while loop checks the condition before each iteration and keeps running while it is true. Since i starts at 0, the whole record ($0) is printed before its fields. The final empty line of output comes from the last empty line in "text2", which awk still treats as one record.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 Hello Chicago  
 Hello  
 Chicago  
 Hello Boston  
 Hello  
 Boston  
   

awk: Statements(1)

1. Conditional Statements:

text2:
 1 Hello world!  
 2 Hello Chicago!  
 3 Hello Boston!  

script_1:
 #! /bin/bash  
   
 awk '{  
   if($1 == 1)  
     print $0;  
   else if($1 == 2)  
     print $2;  
   else   
     print $3;  
   }' text2;  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 1 Hello world!  
 Hello  
 Boston!  

2. Iterative Statements: for(1)
text2:
 Hello world!  
 Hello Chicago!  

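script_1:
"i=0" is executed once, at the very beginning.
"i<=NF" is checked before each iteration; the loop continues while it is true.
"i++" is executed after each iteration.
Since i starts at 0, "print $i" first prints the whole record ($0) and then each field.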
 #! /bin/bash  
   
 awk '{  
   for (i=0; i<=NF; i++)  
   {  
     print $i;  
   }  
   }' text2;  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 Hello world!  
 Hello  
 world!  
 Hello Chicago!  
 Hello  
 Chicago!  
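
If only the fields are wanted, without the whole record first, the loop can start at 1. A small sketch, with the helper name fields_only as an assumption:

```shell
# fields_only: print each field on its own line, skipping $0.
# (fields_only is a hypothetical helper name.)
fields_only() {
  awk '{ for (i = 1; i <= NF; i++) print $i }'
}

printf 'Hello world!\nHello Chicago!\n' | fields_only
```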

Don't use floating-point numbers in loop conditions:
script_1:
 #! /bin/bash  
   
 awk 'BEGIN {  
   for (i=1; i>=0; i-=0.05)  
   {  
     print i;  
   }  
   }'  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ ./script_1  
 1  
 0.95  
 0.9  
 0.85  
 0.8  
 0.75  
 0.7  
 0.65  
 0.6  
 0.55  
 0.5  
 0.45  
 0.4  
 0.35  
 0.3  
 0.25  
 0.2  
 0.15  
 0.1  
 0.05  
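The final 0 is never printed. 0.05 cannot be represented exactly in binary floating point (the stored value is slightly larger than 0.05), so the rounding errors accumulate and i ends up slightly below 0 after the last subtraction, which makes "i>=0" false.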
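
A common way around this is to drive the loop with an integer counter and derive the fractional value from it. A minimal sketch, with the helper name countdown as an assumption:

```shell
# countdown: print 1, 0.95, ..., 0.05, 0 using an integer loop counter.
# (countdown is a hypothetical helper name.)
countdown() {
  # k takes the exact integer values 20, 19, ..., 0;
  # only the printed value is fractional.
  awk 'BEGIN { for (k = 20; k >= 0; k--) print k / 20 }'
}

countdown
```

Because the condition compares integers, the loop runs exactly 21 times and the last line printed is 0.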

awk: one-line examples(3)

1. Convert double-spaced lines to single-spaced lines
text2:
 1 Hello  
   
 2 World  
   
 3 Hello  
   
 4 Chicago!  
   
 5 Hello  
   

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk -v RS="\n *\n" '{ print; }' text2  
 1 Hello  
 2 World  
 3 Hello  
 4 Chicago!  
 5 Hello  
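
The RS regular expression above relies on an awk that accepts a regex as record separator. As a different technique that needs no such support, the pattern NF prints only lines that contain at least one field. This is a sketch with the helper name drop_blank_lines as an assumption; note that it removes every blank line, so it is not an exact inverse of double spacing when the input has runs of blank lines.

```shell
# drop_blank_lines [FILE...]: print only lines that contain at least one field.
# (drop_blank_lines is a hypothetical helper name.)
drop_blank_lines() {
  awk 'NF' "$@"
}

printf '1 Hello\n\n2 World\n  \n3 Hello\n' | drop_blank_lines
```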

2. Locate lines whose length exceeds the upper limit
text2:
 Hello  
 Hello Chicago!  
 Hello Los Angeles!  

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ grep -E "^.{9,}" text2  
 Hello Chicago!  
 Hello Los Angeles!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk 'length($0)>8' text2  
 Hello Chicago!  
 Hello Los Angeles!  

3. Strip the markup tags
text2:
 <head>Hello<head />  
 <body>Hello Chicago!<body />  
 <end>Hello Los Angeles!<end />  

terminal:
We set the record separator to a regular expression that matches a markup tag, and the output record separator to a single space. awk then prints each record followed by a space. (A regular expression as RS is supported by gawk and mawk, but it is not required by POSIX.)
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk 'BEGIN { ORS=" "; RS="<[^<>]*>" } { print }' text2  
  Hello   
  Hello Chicago!   
  Hello Los Angeles!   
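
A different technique that does not depend on a regex record separator is to delete the tags inside each line with gsub. The sketch below assumes that no tag spans more than one line; the helper name strip_tags is hypothetical.

```shell
# strip_tags [FILE...]: remove every <...> tag from each line.
# (strip_tags is a hypothetical helper name; tags split across lines are not handled.)
strip_tags() {
  awk '{ gsub(/<[^<>]*>/, ""); print }' "$@"
}

printf '<head>Hello<head />\n<body>Hello Chicago!<body />\n' | strip_tags
```

Unlike the RS version, this keeps the original line structure and adds no extra spaces around the text.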

4. Extract title tag from xml
text2:
 <title>Unix Shell</title>    
 <body>Unix shell is very awesome!</body>  
   <title>Algorithm</title>  
 <body>Algorithm is very awesome</body>   
     <title>Machine Learning</title>   
 <body>Machine Learning is very awesome</body>  

terminal:
The first command prints every record that contains a title element, using awk's default action. The second command pipes that output to sed, which removes the spaces before each opening title tag.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '/<title *>.*<\/title *>/' text2  
 <title>Unix Shell</title>    
   <title>Algorithm</title>  
     <title>Machine Learning</title>   

 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '/<title *>.*<\/title *>/' text2 | \  
 > sed -e 's/ *<title>/<title>/g'  
 <title>Unix Shell</title>    
 <title>Algorithm</title>  
 <title>Machine Learning</title>   
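
The same result can also be produced by awk alone, by trimming the leading whitespace before printing. A sketch under that assumption, with the hypothetical helper name titles:

```shell
# titles [FILE...]: print lines that contain a title element, without leading blanks.
# (titles is a hypothetical helper name.)
titles() {
  awk '/<title *>.*<\/title *>/ { sub(/^[ \t]+/, ""); print }' "$@"
}

printf '  <title>Algorithm</title>\n<body>Algorithm is very awesome</body>\n' | titles
```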

awk: one-line examples(2)

1. Simulation of grep
text2 is same as above

terminal:
1) grep with the regular expression "Hello".
2) With awk, every record that matches the regular expression "Hello" triggers the default action, which prints the record.
3) For each matching record, we print a custom format made of the file name, the record number (line number), and the record itself.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ grep Hello text2  
 1 Hello  
 3 Hello  
 5 Hello  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '/Hello/' text2  
 1 Hello  
 3 Hello  
 5 Hello  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '/Hello/ { print FILENAME,":", FNR,":", $0 }' text2  
 text2 : 1 : 1 Hello  
 text2 : 3 : 3 Hello  
 text2 : 5 : 5 Hello  

2. Line restriction search
text2 is same as above

terminal:
1) -e introduces the sed script. By default, sed prints every line after processing it; -n disables this automatic printing. The p command then prints each line in the address range 1,4, so the command prints the first 4 lines.

2) For each record whose line number is between 1 and 4 and which matches the regular expression "Hello", awk executes the default action and prints it.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ sed -n -e 1,4p text2  
 1 Hello  
 2 World  
 3 Hello  
 4 Chicago!  
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '(FNR>=1) && (FNR<=4) && /Hello/' text2  
 1 Hello  
 3 Hello  

3. Swap columns
text2 is same as above

terminal:
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '{ print $2,$1; }' text2  
 Hello 1  
 World 2  
 Hello 3  
 Chicago! 4  
 Hello 5  

4. Convert the column separators
text2 is same as above

terminal:
In the "BEGIN" section, we set "OFS" to a tab. Whenever a field is assigned, awk rebuilds $0 by joining all fields with the current OFS. The assignment $1=$1 does not change the value of the field, but it forces this rebuild, so $0 is printed with tabs between the fields.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk 'BEGIN { OFS="\t"; } { $1=$1; print; }' text2  
 1    Hello  
 2    World  
 3    Hello  
 4    Chicago!  
 5    Hello  
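
The same rebuild works for any output separator. The sketch below turns whitespace-separated columns into comma-separated ones; the helper name to_csv is an assumption, and it does no quoting of fields that contain commas.

```shell
# to_csv [FILE...]: join the fields of each line with commas.
# (to_csv is a hypothetical helper name; fields are not quoted or escaped.)
to_csv() {
  awk 'BEGIN { OFS = "," } { $1 = $1; print }' "$@"
}

printf '1 Hello\n2   World\n' | to_csv
```

Runs of spaces between fields collapse into a single comma, because awk splits on whitespace before it rejoins the fields with OFS.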

5. Convert carriage-return/newline line terminator to newline terminator:
text2 is same as above

terminal:
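We use carriage-return/newline as the record separator to split the input into records. print then writes each record followed by the output record separator, which is a newline by default. (An RS longer than one character works in gawk and mawk; POSIX leaves its behavior unspecified.)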
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk 'BEGIN { RS="\r\n"; } { print; }' text2  
 1 Hello  
 2 World  
 3 Hello  
 4 Chicago!  
 5 Hello  
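
For an awk that does not accept a multi-character RS, a different technique is to keep the default newline separator and strip a trailing carriage return from each record. A sketch, with the helper name crlf_to_lf as an assumption:

```shell
# crlf_to_lf [FILE...]: remove one trailing carriage return from every line.
# (crlf_to_lf is a hypothetical helper name.)
crlf_to_lf() {
  awk '{ sub(/\r$/, ""); print }' "$@"
}

printf '1 Hello\r\n2 World\r\n' | crlf_to_lf
```

Lines that already end in a plain newline pass through unchanged, so the function is safe to run on mixed input.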

6. Convert single-spaced lines to double-spaced lines
text2 is same as above
terminal:
1) The first command sets the output record separator to two newlines; for each record, awk then prints the record followed by those two newlines.
2) The second command also sets the output record separator to two newlines. The pattern "1" is always true, so for each record awk executes the default action: print the record followed by the output record separator.
3) The third command leaves the output record separator as a single newline, but its action uses two print statements, so two output record separators are written per record.
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk 'BEGIN { ORS="\n\n"; } { print; } ' text2  
 1 Hello  
   
 2 World  
   
 3 Hello  
   
 4 Chicago!  
   
 5 Hello  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk 'BEGIN { ORS="\n\n"; } 1' text2  
 1 Hello  
   
 2 World  
   
 3 Hello  
   
 4 Chicago!  
   
 5 Hello  
   
 aubinxia@aubinxia-fastdev:~/Desktop/xxdev$ awk '{ print;print "";}' text2  
 1 Hello  
   
 2 World  
   
 3 Hello  
   
 4 Chicago!  
   
 5 Hello