Saturday, May 3, 2014

Unix Shell Extended Regular Expression

1. Match single character

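Everything is the same as BRE: inside a bracket expression, special characters such as "[" lose their special meaning and can be listed literally. A "\" inside brackets is itself a literal character in POSIX bracket expressions, in both BRE and ERE, so no escaping is needed there.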

test:
 #! /bin/bash  
 []\  

terminal:
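1) First command uses BRE to match the "[" inside a bracket expression.
2) Second command uses ERE. The "\" is not actually required: the pattern is unquoted, so the shell removes the backslash and grep receives the same pattern "[[]" as in the first command. Quoting the pattern (for example '[[]') keeps the shell from touching it.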
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep [[] ./test  
 []\  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E [\[] ./test  
 []\  
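As a quick check of the bracket rule, here is a minimal sketch that rebuilds the test file in a temporary location and quotes every pattern so the shell passes it to grep unchanged. The mktemp file and the variable names are assumptions for illustration, not part of the original session. It counts matching lines with grep -c.

```shell
#!/bin/bash
# Hypothetical scratch copy of the test file shown above.
tmp=$(mktemp)
printf '%s\n' '#! /bin/bash' '[]\' > "$tmp"

# "[" listed literally inside a bracket expression, in BRE and in ERE.
bre_count=$(grep -c '[[]' "$tmp")
ere_count=$(grep -E -c '[[]' "$tmp")

# A backslash inside brackets is just another literal member of the set.
backslash_count=$(grep -E -c '[\]' "$tmp")

echo "BRE: $bre_count  ERE: $ere_count  backslash: $backslash_count"
rm -f "$tmp"
```

All three counts should be the same, since only the second line of the file contains "[" or "\".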

2. Backreferences are not part of POSIX extended regular expressions (GNU grep -E supports them as an extension)
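To make the point concrete, here is a small sketch of a backreference in BRE, where POSIX defines it, next to the same idea under -E. The ERE form is accepted by GNU grep as an extension, so treat that line as an assumption about your grep rather than portable behavior. The input lines and variable names are made up for the example.

```shell
#!/bin/bash
# BRE backreference: \(hij\) captures, \1 requires the same text again.
bre_hits=$(printf '%s\n' 'a' 'aaa' 'hijhijhij' | grep -c '\(hij\)\1')

# Same idea with -E; works with GNU grep, but is not guaranteed by POSIX.
ere_hits=$(printf '%s\n' 'a' 'aaa' 'hijhijhij' | grep -E -c '(hij)\1')

echo "BRE: $bre_hits  ERE (extension): $ere_hits"
```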

3. Multiple occurrences of pattern

test:
 #! /bin/bash  
 a  
 aaa  
 hijhijhij  

terminal:
1) First command: "a?" means "0 or 1 a". Because it can match the empty string, every line in ./test is selected.
2) Second command: "a+" means "1 or more a", so only lines containing at least one 'a' are selected (the first line qualifies because of the 'a' in "bash").
3) Third command: "aa+" means "one a followed by at least one more a", so only lines containing at least 2 consecutive 'a' are selected.
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E a? ./test  
 #! /bin/bash  
 a  
 aaa  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E a+ ./test  
 #! /bin/bash  
 a  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E aa+ ./test  
 aaa  
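The same three patterns can be checked by counting matches instead of reading the output. This is a sketch under a couple of assumptions: the test file is rebuilt in a temporary file, and the patterns are quoted, because an unquoted "a?" is also a shell glob and could expand to a file name. The last line shows the BRE contrast, where "+" is an ordinary character.

```shell
#!/bin/bash
# Hypothetical scratch copy of the test file shown above.
tmp=$(mktemp)
printf '%s\n' '#! /bin/bash' 'a' 'aaa' 'hijhijhij' > "$tmp"

optional=$(grep -E -c 'a?' "$tmp")    # 0 or 1 "a": matches every line
one_plus=$(grep -E -c 'a+' "$tmp")    # 1 or more "a"
two_plus=$(grep -E -c 'aa+' "$tmp")   # 2 or more consecutive "a"
bre_plus=$(grep -c 'a+' "$tmp")       # BRE: looks for a literal "a+"

echo "a?: $optional  a+: $one_plus  aa+: $two_plus  BRE a+: $bre_plus"
rm -f "$tmp"
```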

4. Exact Number of Occurrences of Patterns

test file is the same as above.

terminal:
1) First command: a{1} matches exactly one "a". The pattern is not anchored, so any line containing at least one 'a' is selected.
2) Second command: a{2} matches exactly two consecutive 'a', so any line containing at least two consecutive 'a' is selected; a{2,} matches two or more consecutive 'a'.
3) Third command: a{1,3} matches a run of one to three consecutive 'a'.
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E a{1} ./test  
 #! /bin/bash  
 a  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E a{2} ./test  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "a{1,3}" ./test  
 #! /bin/bash  
 a  
 aaa  
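Because grep looks for the pattern anywhere in the line, "exactly two" only becomes visible when the whole line has to match. The sketch below uses grep's -x option (match the entire line) for that. The input lines are hypothetical and chosen to contain runs of one to four 'a'.

```shell
#!/bin/bash
# Hypothetical input: runs of one to four "a", one per line.
input=$(printf '%s\n' 'a' 'aa' 'aaa' 'aaaa')

# Unanchored: any line with two consecutive "a" somewhere in it.
unanchored=$(printf '%s\n' "$input" | grep -E 'a{2}')

# -x forces the pattern to cover the whole line.
exact_two=$(printf '%s\n' "$input" | grep -E -x 'a{2}')
two_or_more=$(printf '%s\n' "$input" | grep -E -x 'a{2,}')
one_to_three=$(printf '%s\n' "$input" | grep -E -x 'a{1,3}')

printf 'a{2} unanchored:\n%s\n' "$unanchored"
printf 'a{2} whole line:\n%s\n' "$exact_two"
printf 'a{2,} whole line:\n%s\n' "$two_or_more"
printf 'a{1,3} whole line:\n%s\n' "$one_to_three"
```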

5. Alternation

test file is the same as above.

terminal:
1. First command selects lines containing "aaa" or "hij".
2. Second command selects lines containing at least two consecutive 'a' or "hij".
Note: the pipe operator has the lowest precedence, so each alternative extends all the way to the left end and to the right end of the expression (or of the enclosing group).
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "aaa|hij" ./test  
 aaa  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "a{2,}|hij" ./test  
 aaa  
 hijhijhij  
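The reach of the pipe operator is easiest to see next to a grouped version of the same alternation. In this sketch the input lines are made up for the example, and -x makes grep match whole lines so the difference shows up in the output.

```shell
#!/bin/bash
# Hypothetical input lines.
input=$(printf '%s\n' 'ab' 'cd' 'abd' 'acd')

# Without a group the alternatives are "ab" and "cd".
whole=$(printf '%s\n' "$input" | grep -E -x 'ab|cd')

# With a group the choice is only between "b" and "c".
grouped=$(printf '%s\n' "$input" | grep -E -x 'a(b|c)d')

printf 'ab|cd:\n%s\n' "$whole"
printf 'a(b|c)d:\n%s\n' "$grouped"
```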

6. Grouping
test:
 #! /bin/bash  
 a  
 aaa  
 hijhijhij  
 Hello Hello Hello  
 world  
 world world  

terminal:
First command: (a|hij)+ means one or more consecutive occurrences of "a" or "hij".
Second command: (hij|aa+) means "hij" or at least 2 consecutive occurrences of "a".
Third command: (Hello|world)+ means at least one copy of "Hello" or "world".
Fourth command: (Hello|world){2,} means at least two copies of "Hello" or "world" directly in sequence. We get nothing because there is white space between the copies of "Hello" or "world" in our text file. The fifth and sixth commands account for that.
Fifth command: ((Hello|world)[[:space:]]){2,} means "Hello" or "world" followed by one white-space character, with this pattern repeated at least two times. Only the line containing three "Hello" is selected, not the last line. That's because the second "world" in the last line is followed by the end of the line rather than by white space (grep does not include the trailing newline in the text it matches).
Sixth command: the expression means "Hello" or "world" followed by 0 or more white-space characters, with this pattern repeated at least 2 times. Here {0,} is equivalent to "*".
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "(a|hij)+" ./test  
 #! /bin/bash  
 a  
 aaa  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "(hij|aa+)" ./test  
 aaa  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "(Hello|world)+" ./test  
 Hello Hello Hello  
 world  
 world world  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "(Hello|world){2,}" ./test  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "((Hello|world)[[:space:]]){2,}" ./test  
 Hello Hello Hello  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "((Hello|world)[[:space:]]{0,}){2,}" ./test  
 Hello Hello Hello  
 world world  
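One side effect of allowing 0 spaces is that copies glued together also match. The sketch below compares the sixth pattern, written with "*" instead of {0,}, against a variant that requires white space between copies. That stricter pattern and the extra 'HelloHello' input line are my own additions for illustration, not part of the original test file.

```shell
#!/bin/bash
# Last four lines of the test file plus a hypothetical 'HelloHello' line.
input=$(printf '%s\n' 'Hello Hello Hello' 'world' 'world world' 'HelloHello')

# Sixth command with "*" in place of "{0,}": white space is optional.
star=$(printf '%s\n' "$input" | grep -E '((Hello|world)[[:space:]]*){2,}')

# One word, then one or more repetitions of "white space + word".
separated=$(printf '%s\n' "$input" \
  | grep -E '(Hello|world)([[:space:]]+(Hello|world))+')

printf 'optional white space:\n%s\n' "$star"
printf 'required white space:\n%s\n' "$separated"
```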

7. Anchor the text

test file is the same as above

terminal:
First command: selects lines starting with "Hello" or ending with "world"
Second command: selects lines that consist of exactly "Hello" or exactly "world"
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "(^Hello|world$)" ./test  
 Hello Hello Hello  
 world  
 world world  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -E "^(Hello|world)$" ./test  
 world  
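For whole-line matches, grep's -x option is an alternative to writing "^(...)$" by hand. The following sketch runs both forms, plus the unanchored-group version from the first command, on the last three lines of the test file. The variable names are only for the example.

```shell
#!/bin/bash
input=$(printf '%s\n' 'Hello Hello Hello' 'world' 'world world')

# Anchors inside the alternatives: starts with Hello OR ends with world.
loose=$(printf '%s\n' "$input" | grep -E '^Hello|world$')

# Anchors around the group: the whole line is Hello or world.
anchored=$(printf '%s\n' "$input" | grep -E '^(Hello|world)$')

# -x anchors the whole pattern at both ends of the line.
whole_line=$(printf '%s\n' "$input" | grep -E -x 'Hello|world')

printf 'loose:\n%s\n' "$loose"
printf 'anchored: %s\n' "$anchored"
printf 'with -x:  %s\n' "$whole_line"
```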

8. Unix Applications and BRE, ERE
Tools using BRE: sed, ed, more, vi/vim, grep -G (the default for grep)
Tools using ERE: egrep (or grep -E), awk, lex
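To show the difference in practice, here is a short sketch of the same interval written for sed in BRE syntax and for sed -E in ERE syntax, followed by an ERE group in awk. It assumes a sed that accepts -E, which current GNU and BSD sed do. The input strings are made up for the example.

```shell
#!/bin/bash
# BRE in sed: braces need backslashes to act as an interval.
bre_sed=$(printf '%s\n' 'aaa' | sed 's/a\{2\}/X/')

# ERE in sed -E: the same interval without backslashes.
ere_sed=$(printf '%s\n' 'aaa' | sed -E 's/a{2}/X/')

# awk patterns are ERE: grouping and alternation work unescaped.
awk_out=$(printf '%s\n' 'Hello' 'world world' | awk '/^(Hello|world)$/')

echo "sed BRE: $bre_sed  sed ERE: $ere_sed  awk: $awk_out"
```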

