Unix Shell, Programming and Multithreading: awk: Program Elements(3)

1. Built-in Variables
./text:

 Hello world!!  
 Amazing world!

./text_2:

 Hello Chicago!  
 Hello New York!

FILENAME: name of the current input file

 #! /bin/bash  
   
 cat text | awk '{  
   print "FILENAME:",FILENAME;  
 }'  
   
 #output: text has 2 lines, which are 2 records. awk applies
 #the pattern/action to each record once, so it prints 2 lines.
 #Because the file content arrives through a pipe, awk reads it
 #from standard input instead of opening the file itself, so
 #FILENAME is not "text"; here it is printed as "-".
 #FILENAME: -
 #FILENAME: -
   
 awk '{  
   print "FILENAME:",FILENAME;  
 }' text  
   
 #output:  
 #FILENAME: text  
 #FILENAME: text

=========================================
FNR: record number within the current file

 #! /bin/bash  
   
 cat text | awk '{  
   print "FNR:",FNR;  
   #Output: FNR is the number of the current record within the
   #current file. awk applies the action to each record in turn,
   #so each record number is printed once.
   #FNR: 1
   #FNR: 2
 }'

==========================================
FS: input field separator

 #! /bin/bash  
   
 cat text | awk '{  
   print "FS:",FS;  
   FS="w";  
   print $1,$2;  
   #output:
   #FS:
   #Hello world!!
   #FS: w
   #Amazing  orld!

   #FS is the input field separator.
   #The output looks confusing until you follow the order of events:
   #awk splits a record into fields before it enters the action body.
   #For the first record, "Hello world!!", FS still has its default
   #value (a single space), so the action prints FS as a space and
   #$1, $2 are "Hello" and "world!!".
   #The assignment FS="w" only takes effect from the next record on.
   #The second record is therefore split on "w": $1 is "Amazing "
   #(with its trailing space) and $2 is "orld!", which is why two
   #spaces appear in the last output line.
 }'  
   
 #Normally we set FS outside of the action body
 cat text | awk -v FS=w '{
   print $1,$2;
 }'
 #output: ($1 keeps its trailing space, hence the two spaces)
 #Hello  orld!!
 #Amazing  orld!
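
Besides -v, FS can also be set before the first record is read with a BEGIN block or with the -F option. The following is a minimal sketch, not part of the original examples: the variable names input, begin_out and opt_out are made up for illustration, and the sample lines are fed through printf so the file text is not required. The fields are joined with "|" to make the trailing space in $1 visible.

```shell
#!/bin/bash
# Sample input mirrors the two lines of ./text (assumption: no trailing spaces).
input='Hello world!!
Amazing world!'

# 1. Assign FS in a BEGIN block, which runs before any record is read.
begin_out=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "w" } { print $1 "|" $2 }')

# 2. Use the -F option, the command-line shorthand for setting FS.
opt_out=$(printf '%s\n' "$input" | awk -F w '{ print $1 "|" $2 }')

printf '%s\n' "$begin_out"
# Hello |orld!!
# Amazing |orld!
```

Both forms apply to every record, including the first one, which is the difference from assigning FS inside the action body.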

==========================================
NF: number of fields in current record

 #! /bin/bash  
   
 cat text | awk '{  
   print NF;  
   NF=8;  
 }'  
   
 #Output:  
 #2  
 #2  
 #  
 #NF is set to the number of fields in the current record
 #before awk enters the action body. So although we assign
 #8 to NF in the action, NF is recomputed when the next record
 #is read and is 2 again.
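
To make the per-record recomputation easier to see, here is a small sketch with records of different lengths. The input lines and the variable name nf_out are invented for this illustration. It also uses $NF, which refers to the last field of the current record.

```shell
#!/bin/bash
# Two records with different field counts (illustrative input).
nf_out=$(printf 'a b c\nd e\n' | awk '{ print NF, $NF }')

printf '%s\n' "$nf_out"
# 3 c
# 2 e
```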

===========================================
NR: total number of records read so far in the current awk run.
The difference from FNR: FNR is the record number within the current file and restarts at 1 for each new file, while NR keeps counting across files. The following example feeds two files, text and text_2, to awk to show both behaviors.

 #! /bin/bash  
   
 awk '{  
   print FNR;  
 }' text text_2  
   
 #output:  
 #1  
 #2  
 #1  
 #2  
   
 awk '{  
   print NR;  
 }' text text_2  
 #output:  
 #1  
 #2  
 #3  
 #4
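
The three variables can also be printed side by side. The sketch below recreates the two sample files in a scratch directory so that it runs on its own; the directory handling and the variable names dir and counters are assumptions added for this illustration, not part of the original scripts.

```shell
#!/bin/bash
# Recreate the two sample files in a temporary directory.
dir=$(mktemp -d)
printf 'Hello world!!\nAmazing world!\n' > "$dir/text"
printf 'Hello Chicago!\nHello New York!\n' > "$dir/text_2"

# Run from inside the directory so FILENAME shows the bare file names.
counters=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' text text_2)

printf '%s\n' "$counters"
# text 1 1
# text 2 2
# text_2 1 3
# text_2 2 4

rm -r "$dir"
```

FNR restarts at 1 when awk moves on to text_2, while NR continues with 3 and 4.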

=============================================
OFS: output field separator

 #! /bin/bash  
   
 echo "" | awk '{  
   print "Hello", "world!"  
 }'  
 #output: Hello world!
 #The default OFS is a single space, which print puts
 #between the comma-separated output fields
   
 echo "" | awk '{  
   OFS="::";  
   print "Hello", "world!";  
 }'  
 #output:
 #Hello::world!
 #OFS is set to "::", which now separates the output fields
   
 echo "" | awk -v OFS=:: '{  
   print "Hello", "world!";  
 }'  
 #output:
 #Hello::world!
 #Here OFS is set outside of the action body
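
OFS is only used in two situations: between the comma-separated arguments of print, and when awk rebuilds $0 after a field has been assigned. The following sketch shows both, together with the cases where OFS plays no role. The input line and the variable name ofs_out are made up for this illustration.

```shell
#!/bin/bash
ofs_out=$(echo 'a b c' | awk -v OFS='-' '{
  print $1 $2      # no comma: plain string concatenation, OFS is not used
  print $1, $2     # comma: OFS is inserted between the two arguments
  print $0         # $0 has not been rebuilt yet, original spacing is kept
  $1 = $1          # assigning to any field rebuilds $0 with OFS
  print $0
}')

printf '%s\n' "$ofs_out"
# ab
# a-b
# a b c
# a-b-c
```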

=============================================
ORS: output record separator

 #! /bin/bash  
   
 awk '{  
   print $1, $2;  
 }' text  
   
 #output:
 #Hello world!!
 #Amazing world!
 #By default, ORS is the newline character, which means
 #awk adds "\n" to the end of each output record
   
 awk '{  
   ORS="::"  
   print $1,$2  
 }' text  
 #output:
 #Hello world!!::Amazing world!::
 #We changed the output record separator to "::", so
 #awk writes "::" after each output record instead of a newline

 #ORS no longer supplies a newline, so finish the line ourselves
 printf "\n"

=============================================
RS: input record separator

 #! /bin/bash  
   
 awk '{  
   print $0;  
 }' text  
   
 #output:
 #Hello world!!
 #Amazing world!

 #By default, RS is the newline character, so each record
 #is just one line of the file
   
 awk '{   
   RS=" ";  
   print $0;  
 }' text  
   
 #output:
 #Hello world!!
 #Amazing
 #world!
 #(followed by an empty line: the last record still ends with
 #the newline of the file, because newline is no longer a separator)

 #Before entering the action body, awk has already read the
 #next record according to the current value of RS.
 #For the first record, RS still has its default value, newline,
 #so the first record is "Hello world!!". By the time the 2nd
 #record is read, RS has been changed to a space, so the 2nd
 #record is "Amazing" and the 3rd one is "world!"
   
 awk -v RS=' ' '{  
   print $0;  
 }' text  
 #output:
 #Hello
 #world!!
 #Amazing
 #world!
 #(followed by an empty line, because the last record keeps
 #the trailing newline of the file)

 #The reliable way is to set RS outside of the action body.
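
With RS set to a space, the newline inside the file stops being a separator and becomes ordinary record content. The sketch below makes the record boundaries visible by printing NR and wrapping each record in brackets. The sample lines mirror ./text and the variable name rs_out is an assumption for this illustration.

```shell
#!/bin/bash
rs_out=$(printf 'Hello world!!\nAmazing world!\n' |
  awk -v RS=' ' '{ printf "%d:[%s]\n", NR, $0 }')

printf '%s\n' "$rs_out"
# 1:[Hello]
# 2:[world!!
# Amazing]
# 3:[world!
# ]
```

There are only three records, not four: "world!!" and "Amazing" belong to the same record and are separated by the embedded newline, and the last record carries the trailing newline of the input.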

Thursday, June 19, 2014
