Thursday, June 19, 2014

awk: Patterns and Actions

1. Patterns:
text:
 Hello Chicago!  
 Hello New York!  
 Hello Los Angeles!  

text2:
 Hello world!  
 Hello amazing world!  

script_1:
 #! /bin/bash  
   
 #For each record, if it matches the regular expression
 #"Chicago", the action is carried out.
 awk '/Chicago/ { print $0; }' text  
 #output:  
 #Hello Chicago!  
   
 #The statement above is equivalent to the following one.
 #The "~" operator means "matches": if $0 matches the regular
 #expression "Chicago", the action is carried out.
 awk '$0 ~ /Chicago/ { print $0; }' text  
 #output:  
 #Hello Chicago!  
   
 #A bare regular expression is a shorthand that awk handles
 #specially. The general form of a pattern is an expression
 #that evaluates to true or false. Here, only records with
 #more than 2 fields are printed.
 awk 'NF>2 { print $0; }' text  
 #output:  
 #Hello New York!  
 #Hello Los Angeles!  
   
 #We can provide a more complicated expression. We only
 #output the record if the number of fields is greater than 2
 #and FILENAME is equal to "text".
 awk 'NF>2 && FILENAME=="text" { print $0; }' text text2  
 #output:  
 #Hello New York!  
 #Hello Los Angeles!  
   
 #We use a range pattern here: records are printed from record
 #number 2 through record number 4. NR keeps counting across
 #input files, so record 4 is the first record of text2.
 awk 'NR==2, NR==4 { print $0; }' text text2  
 #output:  
 #Hello New York!  
 #Hello Los Angeles!  
 #Hello world!  
   
 #We use a range pattern in which each part is a regular
 #expression. We extract records starting from the one matching
 #"New York" through the one matching "Los Angeles".
 awk '/New York/, /Los Angeles/ { print $0; }' text text2  
 #output:  
 #Hello New York!  
 #Hello Los Angeles!  
   
 #The following example shows the values of FILENAME, FNR, NR
 #and NF inside the BEGIN section. No input has been read yet,
 #so they are all still uninitialized.
 #Also, if the program contains only a BEGIN part, awk does not
 #process any input text, so we don't need to provide an input
 #file in this case.
 awk 'BEGIN {  
   print FILENAME; #output: empty string  
   print FNR;    #output: 0  
   print NR;    #output: 0  
   print NF;    #output: 0  
 }'  
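
To rerun the pattern examples without creating the files by hand, the sketch below recreates text and text2 in a scratch directory and repeats a few of the patterns. The scratch directory and the variable names (tmpdir, match_out, range_out, first_out) are assumptions made for illustration and are not part of script_1. The last command uses FNR instead of NR to show that FNR restarts at 1 for each input file.

```shell
#!/bin/bash
# Sketch only: recreate the two sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Hello Chicago!\nHello New York!\nHello Los Angeles!\n' > "$tmpdir/text"
printf 'Hello world!\nHello amazing world!\n' > "$tmpdir/text2"

# Regular expression pattern; equivalent to '$0 ~ /Chicago/'.
match_out=$(awk '/Chicago/ { print $0 }' "$tmpdir/text")

# Range on NR: NR keeps counting across files, so record 4 comes from text2.
range_out=$(awk 'NR==2, NR==4 { print $0 }' "$tmpdir/text" "$tmpdir/text2")

# FNR restarts for every file, so this selects the first record of each file.
first_out=$(awk 'FNR==1 { print $0 }' "$tmpdir/text" "$tmpdir/text2")

echo "$match_out"
echo "$range_out"
echo "$first_out"

# Remove the scratch directory again.
rm -rf "$tmpdir"
```

Capturing each result in a variable keeps the three cases separate, which makes it easy to compare them with the outputs listed in script_1.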
   

2. Actions:
 #! /bin/bash  
   
 #Here only the pattern is given and the action is omitted.
 #The expression 1 is true for every record, so awk carries
 #out the default action: print the record followed by the
 #output record separator, which is a newline by default.
 echo "Hello world!" | awk '1'  
 #output: "Hello world!"
   
 #A bare "print" statement prints the entire record and adds
 #the output record separator at the end. Omitting the pattern
 #means that awk carries out the action for every record.
 echo "Hello world!" | awk '{ print; }'  
 #output: "Hello world!"  
   
 #print always appends the output record separator at the end.
 #A "," between arguments makes print put the output field
 #separator between them; arguments written side by side
 #without a comma are concatenated.
 echo "Hello world!" | awk '{   
   print $1,$2;  
   print $1 $2;  
 }'  
 #output:  
 #Hello world!  
 #Helloworld!  
   
 #We change OFS to "\n", so print uses a newline to separate
 #the fields.
 echo "Hello world!" | awk '{  
   OFS="\n";  
   print $1, $2;  
 }'  
 #output:  
 #Hello  
 #world!  
   
 #$0 is already populated before the action body is entered,
 #so changing OFS does not affect $0. Assigning to any field
 #variable, however, causes awk to rebuild $0 using the
 #current value of OFS.
 echo "Hello world!" | awk '{  
   OFS="\n";  
   print $0;  
   #output: Hello world!  
   
   $1=$1;  
   print $0;  
   #output:  
   #Hello  
   #world!  
 }'  
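
The examples above assign OFS inside the action, which runs again for every record. A common alternative is to set it once, either in a BEGIN block or with the -v option. The sketch below compares the two and repeats the $1=$1 rebuild of $0. The "-" separator and the variable names are assumptions chosen for illustration.

```shell
#!/bin/bash
# Set OFS once in a BEGIN block instead of in the per-record action.
begin_out=$(echo "Hello world!" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

# -v assigns the variable before the program starts running.
v_out=$(echo "Hello world!" | awk -v OFS="-" '{ print $1, $2 }')

# Without a field assignment, $0 keeps its original spacing.
untouched_out=$(echo "Hello world!" | awk 'BEGIN { OFS = "-" } { print $0 }')

# Assigning to a field rebuilds $0 with the current OFS.
rebuilt_out=$(echo "Hello world!" | awk 'BEGIN { OFS = "-" } { $1 = $1; print $0 }')

echo "$begin_out"      # Hello-world!
echo "$v_out"          # Hello-world!
echo "$untouched_out"  # Hello world!
echo "$rebuilt_out"    # Hello-world!
```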
   
