Sunday, June 15, 2014

awk: overview

1. awk Command Line format

./awkCommand:
 {   
   print "Output from awk command";  
   print $1,$2;  
 }  


./script_1:
 #! /bin/bash  
   
 #awk format:  
 #awk [ -F fs ] [ -v var=value ... ] 'program' [ -- ] \  
 #  [ var=value ... ] [ file(s) ]  
 #-F sets the input field separator (the FS variable)
 #-v assigns a variable before "program" starts, so it is
 #already visible in BEGIN actions
 #-- marks the end of the command options
 #A var=value operand placed among the file names is assigned
 #when awk reaches it, just before the next file is read
   
 #awk [ -F fs ] [ -v var=value ... ] -f programfile [ -- ] \  
 #  [ var=value ... ] [ file(s) ]  
 #If the program is too long to write on the command line,
 #the -f option reads it from a program file instead
   
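 #Change the field separator to ":"
 #The spaces around ":" stay inside the fields, so $1 is
 #"Hello " and $2 is " world!"
 #(printf gets an explicit format so that a "%" in the data
 #cannot be mistaken for a format specifier)
 echo "Hello : world!" | awk -F: '{ printf "%s%s\n", $1, $2 }'
 #output (two spaces between the words):
 #Hello  world!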
   
 #Change the field separator to " : " (space : space)
 echo "Hello : world!" | awk -F " : " '{ printf "%s%s\n", $1, $2 }'
 #output:  
 #Helloworld!  
   
 #Use the default field separator
 #By default, fields are separated by runs of spaces and
 #tabs, so this line has 3 fields, printed back to back
 echo "Hello : world!" | awk '{ printf "%s%s%s\n", $1, $2, $3 }'
 #output:  
 #Hello:world!  
   
 #Set a variable with -v
 #Set the output field separator (OFS) to "##"
 #We have to use print here:
 #print adds a newline at the end automatically, and it
 #puts OFS between the arguments separated by ","
 echo "Hello world!" | awk -v 'OFS=##' '{ print $1,$2 }'  
 #output:  
 #Hello##world!  
   
 echo "Hello world!" | awk -f awkCommand  
 #output:  
 #Output from awk command  
 #Hello world!  
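The format comments above mention var=value operands placed among the file names, but none of the commands uses one. Here is a minimal sketch of how that behaves. The two scratch files and the variable name operand_demo are made up for this illustration only.

```shell
#!/bin/bash
# Hypothetical scratch files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'a b\n' > "$tmpdir/first"
printf 'c d\n' > "$tmpdir/second"

# -v sets OFS before the program starts, so it applies to "first".
# The operand OFS=## is assigned only when awk reaches it, that is
# after "first" has been processed and before "second" is read.
operand_demo=$(awk -v 'OFS=-' '{ print $1, $2 }' \
  "$tmpdir/first" 'OFS=##' "$tmpdir/second")
echo "$operand_demo"
#output:
#a-b
#c##d

rm -r "$tmpdir"
```

This is why a value needed inside a BEGIN action should be passed with -v: an operand assignment has not happened yet when BEGIN runs.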
   

2. Programming Model
awk treats its input as a sequence of records (by default, each line is one record) and splits each record into fields.

pattern { action }   Run the action for each line that matches the pattern.
pattern              Print each matching line to standard output.
        { action }   Run the action for every line.
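To make the record and field model concrete, the sketch below prints the built-in variables NR (number of the current record) and NF (number of fields in it) for two sample lines. It also shows that a pattern does not have to be a regular expression. The shell variable names fields_demo and three_fields are only for this illustration.

```shell
#!/bin/bash
# For every record, print its number, its field count and its last field.
fields_demo=$(printf 'Hello world!\nHello New York!\n' |
  awk '{ print NR ": " NF " fields, last is " $NF }')
echo "$fields_demo"
#output:
#1: 2 fields, last is world!
#2: 3 fields, last is York!

# A pattern can also be an expression: print only records with 3 fields.
three_fields=$(printf 'Hello world!\nHello New York!\n' | awk 'NF == 3')
echo "$three_fields"
#output:
#Hello New York!
```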

awkInput:
 Hello world!  
 Hello New York!  
 Hello Chicago!  


awkCommand:
 /Chicago/   
 /Chicago/ { print $1,"Middle West!" }  
 /Middle West!/ { print $1, "America!" }  


./script_1:
 #! /bin/bash  
   
 #Run the action for each line of input file  
 awk '{ print $0 }' <awkInput  
 #output:  
 #Hello world!  
 #Hello New York!  
 #Hello Chicago!  
   
 #If no pattern is given, the action runs against every
 #input line.
 #With several actions, awk reads one line, runs every
 #action against it in order, and only then moves on to
 #the next line.
 awk '{ print $0 } { print "||",$0,"||" }' <awkInput  
 #output: ($0 means entire input line)  
 #Hello world!  
 #|| Hello world! ||  
 #Hello New York!  
 #|| Hello New York! ||  
 #Hello Chicago!  
 #|| Hello Chicago! ||  
   
 #For each line that matches the pattern "Hello", the
 #default action prints the line to standard output.
 awk '/Hello/' < awkInput  
 #Output:  
 #Hello world!  
 #Hello New York!  
 #Hello Chicago!  
   
 #For the line matching pattern /Chicago/, output it  
 #with a special format  
 awk '/Chicago/ { print "||",$0,"||" }' < awkInput  
 #Output:  
 #|| Hello Chicago! ||  
   
 awk -f awkCommand < awkInput  
 #First rule: the pattern /Chicago/ has no action, so the
 #default action prints the matching line to standard output.
 #Second rule: for lines matching /Chicago/, the action
 #prints:
 #"Hello Middle West!"
 #Note: print does not change the current record, which
 #is still "Hello Chicago!"
 #Third rule: the current record (still "Hello Chicago!")
 #is tested against the pattern /Middle West!/. It does
 #not match, so the action is not executed.
   
 #output:  
 #Hello Chicago!  
 #Hello Middle West!  
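The third rule above never fires because print leaves the record untouched. For contrast, here is a small sketch in which the first rule assigns to a field instead of printing. Assigning to a field rebuilds the current record, so a later rule does see the new text. The variable name modify_demo is only for this illustration.

```shell
#!/bin/bash
# Assigning to $2 rebuilds the current record as "Hello Middle West!",
# so the second rule's pattern now matches it.
modify_demo=$(printf 'Hello world!\nHello Chicago!\n' |
  awk '/Chicago/ { $2 = "Middle West!" }
       /Middle West!/ { print $1, "America!" }')
echo "$modify_demo"
#output:
#Hello America!
```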
   

3. BEGIN and END
The BEGIN pattern: its action is executed once, before the first input record is read.
The END pattern: its action is executed once, after the last input record has been processed.

awkInput is the same as above.

awkCommand:
 /Chicago/   
 /Chicago/ { print $1,"Middle West!" }  
 /Middle West!/ { print $1, "America!" }  
 BEGIN { print "Begin awk 1!" }  
 END { print "End awk!" }  
 BEGIN { print "Begin awk 2!" }  


./script_1:
 #! /bin/bash  
   
 #The BEGIN action is executed at the very beginning.
 #Since there is no other "pattern action" pair, awk
 #exits afterwards without reading the input.
 awk 'BEGIN { print "Begin awk!" }' <awkInput  
 #Output:  
 #Begin awk!  
   
 #The END action is executed at the very end.
 #Unlike the BEGIN-only case, awk reads the whole input
 #first, although no rule prints any of it.
 awk 'END{ print "End awk!" }' <awkInput  
 #Output:  
 #End awk!  
   
 #Besides the BEGIN and END actions, we provide the
 #pattern /Hello/. For each line that matches it, the
 #default action prints the line to standard output.
 #The BEGIN action runs at the very beginning and the
 #END action at the very end.
 awk 'BEGIN { print "Begin awk!" } END { print "End awk!" } /Hello/' <awkInput  
 #Output:  
 #Begin awk!  
 #Hello world!  
 #Hello New York!  
 #Hello Chicago!  
 #End awk!  
   
 #It doesn't matter where the BEGIN and END actions appear
 #in the program: BEGIN actions are executed at the beginning
 #and END actions at the end.
 #If there is more than one BEGIN or END action, they are
 #executed in the order they appear.
 awk -f awkCommand <awkInput  
 #output:  
 #Begin awk 1!  
 #Begin awk 2!  
 #Hello Chicago!  
 #Hello Middle West!  
 #End awk!  
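A common use of BEGIN and END is to print a header before the input and a summary after it. The following is a minimal sketch of that idea, not taken from the scripts above. The counter name hello, the sample input and the variable count_demo are made up for this illustration.

```shell
#!/bin/bash
# BEGIN prints a header before any input is read.
# The middle rule counts the matching records.
# END prints the summary; NR still holds the number of records read.
count_demo=$(printf 'Hello world!\nGoodbye world!\nHello Chicago!\n' |
  awk 'BEGIN { print "Counting greetings" }
       /Hello/ { hello++ }
       END { print hello, "of", NR, "lines matched" }')
echo "$count_demo"
#output:
#Counting greetings
#2 of 3 lines matched
```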
