Friday, May 2, 2014

Unix Shell Basic Regular Expression(2)

1. Backreferences

test:
 abcabc  
 abcdefcdab  

 "Hello world"  

terminal:

In a basic regular expression (BRE), a plain "(" or ")" matches a literal parenthesis; to group a subexpression we have to write "\(" and "\)".
The following example captures "abc" as the first group, and \1 stands for whatever text that group matched. The pattern therefore selects any line containing "abcabc".
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep "\(abc\)\1" ./test  
 abcabc  

The following example captures "ab" and "cd" as the first and second groups, followed by any number of "e" or "f" characters, and ends with "cd" and "ab" again (\2, then \1).
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep "\(ab\)\(cd\)[ef]*\2\1" ./test  
 abcdefcdab  

The following example captures a double quote as the first group, followed by any number of arbitrary characters, and ends with another double quote (\1).
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep '\("\).*\1' ./test  
 "Hello world"  
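
The three backreference patterns above can be replayed in one script. This is a minimal sketch, assuming a temporary file created with mktemp stands in for ./test; the extra line "abcabd" and the variable names are made up for illustration, to show that \1 must repeat the captured text exactly.

```shell
# Recreate the sample file in a temporary location (stand-in for ./test).
test_file=$(mktemp)
# 'abcabd' is an added, hypothetical line that must NOT match '\(abc\)\1'.
printf '%s\n' 'abcabc' 'abcabd' 'abcdefcdab' '' '"Hello world"' > "$test_file"

# \1 repeats the text captured by the first \( \) group.
repeated=$(grep '\(abc\)\1' "$test_file")

# \2 and \1 repeat the second and first groups, in that order.
mirrored=$(grep '\(ab\)\(cd\)[ef]*\2\1' "$test_file")

# The captured double quote must appear again later on the line.
quoted=$(grep '\("\).*\1' "$test_file")

printf '%s\n' "$repeated" "$mirrored" "$quoted"
rm -f "$test_file"
```

Single quotes around the patterns keep the shell from touching the backslashes, which is why the third command in the transcript also uses them.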

2. Match multiple occurrences
test:
 #! /bin/bash  
 a  
 aaa  
 hijhijhij  

terminal:
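The four commands use interval expressions. The patterns are not anchored, so a line is selected as soon as any part of it matches:
1) "a\{1\}" matches exactly one 'a', so it selects every line containing at least one 'a'
2) "a\{1,\}" matches one or more consecutive 'a', which selects the same lines
3) "a\{1,3\}" matches a run of one to three 'a', again the same lines
4) "a\{3\}" matches exactly three consecutive 'a', so it selects only lines containing "aaa"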
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "a\{1\}" ./test  
 #! /bin/bash  
 a  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "a\{1,\}" ./test  
 #! /bin/bash  
 a  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "a\{1,3\}" ./test  
 #! /bin/bash  
 a  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "a\{3\}" ./test  
 aaa  
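
An interval expression only restricts the whole line once it is anchored. The sketch below is a minimal illustration, assuming a temporary copy of the sample file; the variable names are hypothetical, and the "^" and "$" anchors it borrows are the ones covered in section 3.

```shell
# Recreate the sample file from this section in a temporary location.
test_file=$(mktemp)
printf '%s\n' '#! /bin/bash' 'a' 'aaa' 'hijhijhij' > "$test_file"

# Unanchored: every line containing at least one 'a' is counted (3 lines).
contains_a=$(grep -c 'a\{1\}' "$test_file")

# Anchored: the whole line must be exactly one 'a', or exactly three.
only_one_a=$(grep '^a\{1\}$' "$test_file")
only_three_a=$(grep '^a\{3\}$' "$test_file")

printf '%s\n' "$contains_a" "$only_one_a" "$only_three_a"
rm -f "$test_file"
```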

Combination with backreferences:
1) The first command selects lines containing "hij" followed by exactly one more copy of "hij", so the output highlights the first 2 copies of "hij"
2) The second command selects lines containing "hij" followed by at least one more copy of "hij", so the output highlights all 3 copies of "hij"
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "\(hij\)\1\{1\}" ./test  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "\(hij\)\1\{1,\}" ./test  
 hijhijhij  
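
The highlighting does not survive in a plain-text transcript, so the matched part can be printed instead. This is a small sketch, assuming a grep that supports the -o option (available in GNU and BSD grep, but not required by POSIX); the variable names are made up for illustration.

```shell
line='hijhijhij'

# -o prints only the matched text instead of the whole line.
# One group plus exactly one repetition: two copies are matched.
two_copies=$(printf '%s\n' "$line" | grep -o '\(hij\)\1\{1\}')

# One group plus one or more repetitions: the match extends over all copies.
all_copies=$(printf '%s\n' "$line" | grep -o '\(hij\)\1\{1,\}')

printf '%s\n' "$two_copies" "$all_copies"
```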

3. Anchor the text

test:
 #! /bin/bash  
 a  
 aaa  
 hijhijhij  

terminal:
1) The first command selects all lines starting with "hij"
2) The second command selects all lines ending with "hij"
3) The third command selects lines consisting of exactly 3 alphabetic characters
4) The fourth command inverts (-v) the match for "^$", which matches empty lines, so it filters out all empty lines and prints the non-empty ones.
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G ^hij ./test  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G hij$ ./test  
 hijhijhij  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -G "^[[:alpha:]]\{3\}$" ./test  
 aaa  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ grep -Gv "^$" ./test  
 #! /bin/bash
 a  
 aaa  
 hijhijhij  
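
The empty-line filter is easier to see on input that actually contains empty lines. The sketch below is a minimal example, assuming a made-up four-line input rather than the test file above. It also shows that "^$" only matches truly empty lines: a line holding nothing but spaces needs "^[[:space:]]*$".

```shell
# Hypothetical input: 'a', an empty line, a line of two spaces, 'b'.
input=$(mktemp)
printf '%s\n' 'a' '' '  ' 'b' > "$input"

# -v inverts the match, -c counts the selected lines.
# Only the truly empty line is dropped, so three lines remain.
non_empty=$(grep -vc '^$' "$input")

# Whitespace-only lines are dropped as well, so two lines remain.
non_blank=$(grep -vc '^[[:space:]]*$' "$input")

printf '%s\n' "$non_empty" "$non_blank"
rm -f "$input"
```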

4. Take a look at the preprocessed c++ file
We run g++ on a cpp file, foo.cpp. The -E option stops after the preprocessing stage, so g++ prints the preprocessed source instead of compiling and linking it. We pipe that output to grep and use -v "^$" to remove all empty lines. The result is crazy: 12534 lines of code for a simple "Hello world" c++ program.

 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ g++ -E foo.cpp | grep -v "^$" > foo.text  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ cat foo.text | wc -l  
 12534  
 aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ less foo.text  
 # 1 "foo.cpp"  
 # 1 "<command-line>"  
 # 1 "/usr/include/stdc-predef.h" 1 3 4  
 # 1 "<command-line>" 2  
 # 1 "foo.cpp"  
 # 1 "/usr/include/c++/4.8/iostream" 1 3  
 # 36 "/usr/include/c++/4.8/iostream" 3  
 ......  
 namespace std __attribute__ ((__visibility__ ("default")))  
 {  
  template<class _CharT>  
   struct char_traits;  
  template<typename _CharT, typename _Traits = char_traits<_CharT>,  
       typename _Alloc = allocator<_CharT> >  
   class basic_string;  
  template<> struct char_traits<char>;  
  typedef basic_string<char> string;  
  template<> struct char_traits<wchar_t>;  
  typedef basic_string<wchar_t> wstring;  
 # 86 "/usr/include/c++/4.8/bits/stringfwd.h" 3  
 }  
 ......  
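
Many of the remaining lines are linemarkers of the form # <number> "<file>", which the preprocessor emits to record where each piece of code came from. As a rough sketch, assuming only a few lines copied from the output above (plus one blank line added for illustration), a second -e pattern can drop them together with the empty lines:

```shell
# A handful of lines taken from the preprocessor output shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# 1 "foo.cpp"
# 1 "/usr/include/c++/4.8/iostream" 1 3

namespace std __attribute__ ((__visibility__ ("default")))
{
  typedef basic_string<char> string;
# 86 "/usr/include/c++/4.8/bits/stringfwd.h" 3
}
EOF

# -v with two -e patterns keeps only lines that match neither pattern:
# '^$' is an empty line, '^# [0-9][0-9]* "' is a linemarker.
code_only=$(grep -v -e '^$' -e '^# [0-9][0-9]* "' "$sample")
code_lines=$(grep -vc -e '^$' -e '^# [0-9][0-9]* "' "$sample")

printf '%s\n' "$code_only"
rm -f "$sample"
```

GCC also has a -P option that suppresses linemarkers at the source, so "g++ -E -P foo.cpp" should give a similar result without the second pattern.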
