grep

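--global regular expression print: named after the ed command g/re/p (globally search for a regular expression and print the matching lines)
--pattern matching utility (regular expressions much like those in ed, sed, awk, emacs, C, Perl, JavaScript, Java; each has its own dialect)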
--searches file for lines containing string pattern (i.e. limited or full regular expression) and outputs each line that matches. Processes one line at a time.
--text manipulation is a Unix forte
--useful for database-like queries, but it cannot do numeric comparisons (e.g. field3 < 800), has no AND within a single pattern, and has no notion of fields

grep [options] pattern file(s)

$ ps ax | grep billybob    # only lines of ps that contain billybob
billybob    Tue  12:21   tty03     sh

Ex file: f1

hello, this is file 'stuff'
it has only two lines

$ grep th f1
hello, this is file 'stuff'
$ grep aardvark f1
$ 
#no lines match, thus no output

If the pattern contains spaces or characters that are special to the shell, quote it:

$ grep 'is is' f1    # Use quotes to escape shell parsing/interpretation as two args is and is
hello, this is file 'stuff'            # (2nd is would be filename)

-n option prints the line number of each matching line. Useful in a big file:

$ grep -n has f1
2:it has only two lines

-v option inverts sense, i.e. lines that don't match:

$ grep -v has f1
hello, this is file 'stuff'
$ grep -v 'a' rime.txt  #lines that don't contain an a
$ grep -v ' ' rime.txt  #lines that don't contain a space

-o option shows only the matched text, one match per output line
--color option highlights the matched text in color
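
A small sketch of -o, assuming a grep that supports it (GNU and BSD grep do; it is not in POSIX). It rebuilds the example file f1 in a scratch directory; the variable names are only for illustration.

```shell
# Rebuild the two-line example file f1 in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' "hello, this is file 'stuff'" "it has only two lines" > "$dir/f1"

# -o prints each match on its own line instead of the whole matching line.
# Line 1 contains "is" twice: once inside "this", once as the word "is".
matches=$(grep -o 'is' "$dir/f1")
echo "$matches"

# -n and -o combine: line number, then only the matched text.
numbered=$(grep -n -o 'two' "$dir/f1")
echo "$numbered"

rm -r "$dir"
```

Piping the output of -o into wc -l counts matches rather than matching lines.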

search in multiple files: i.e. find files that contain certain information, strings, patterns:

$ grep billybob *                # all lines in all files in current directory that contain billybob
...
$ grep 'rand()'  *.cpp            #quote shell's ()
main.cpp:      r1 = rand() % N;
main.cpp:   if (rand() % 2)
simul.cpp:       alpha[i] = rand() % limit;

$ grep wills /etc/* 2>/dev/null       #toss error messages

-l option list filename only:

$ grep -l 'rand()' *.cpp
main.cpp
simul.cpp
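
To try the multi-file forms without real sources, here is a sketch with two made-up files (main.cpp and util.cpp are hypothetical stand-ins, created in a scratch directory):

```shell
# Two hypothetical source files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'int main() {' '  r1 = rand() % N;' '}' > "$dir/main.cpp"
printf '%s\n' 'int helper() { return 0; }' > "$dir/util.cpp"

# With several files, each matching line is prefixed with its file name.
lines=$(cd "$dir" && grep 'rand()' *.cpp)
echo "$lines"

# -l prints each matching file name once, however many lines match in it.
files=$(cd "$dir" && grep -l 'rand()' *.cpp)
echo "$files"

rm -r "$dir"
```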

-i option ignore case:

$ grep -i oak  myfile                     # all 8 strings
$ grep -i mariner rime.txt

-c gives count of matched lines (or pipe into wc -l):

$ grep -ci mariner rime.txt
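
Note that -c counts matching lines, not matches. A sketch on a made-up three-line file (the -o variant assumes GNU or BSD grep):

```shell
# Hypothetical sample: line 1 holds two matches, line 3 holds one.
tmp=$(mktemp)
printf '%s\n' 'oak and Oak' 'pine' 'OAK' > "$tmp"

# -c counts LINES that match, so line 1 counts once.
count_c=$(grep -ci oak "$tmp")

# Same number by piping the matching lines into wc -l
# (tr strips the padding some wc implementations add).
count_wc=$(grep -i oak "$tmp" | wc -l | tr -d ' ')

# To count every occurrence, print each match on its own line first.
count_occ=$(grep -io oak "$tmp" | wc -l | tr -d ' ')

echo "$count_c $count_wc $count_occ"
rm "$tmp"
```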

Patterns: grep has its own set of metacharacters (operators). Regular expressions: a language for describing the patterns of strings. Different and more powerful than shell's filename generation patterns.

. any single char except newline:
Example pattern: sp.t matches spat, spot, spIt, spxt, sp3t, sp@t, sp t, sp.t but not sp\nt. Newline is never matched, because grep works line by line (by default) and the newline is not part of the line it searches.

$ grep 'h.s' f1
hello, this is file 'stuff'
it has only two lines

\ escape grep metacharacter: (grep's \, not the shell's)

$ grep '\.' f1   # quotes protect the \ from the shell, \ escapes grep's "."; grep's arg is \.
$                             # no lines in f1 that have "."
$ grep \. f1     # unquoted, the shell removes the \ so grep's arg is . (any char). Actual match is first (i.e. leftmost) char on line.
hello, this is file 'stuff'
it has only two lines
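
The two layers of quoting can be checked on a made-up file where only one line contains a period (f2 is a hypothetical name):

```shell
# Hypothetical sample: only the second line contains a literal dot.
dir=$(mktemp -d)
printf '%s\n' 'no dots here' 'ends with a dot.' > "$dir/f2"

# Quoted: grep receives \. and looks for a literal dot.
quoted=$(grep -c '\.' "$dir/f2")

# Unquoted: the shell eats the backslash, grep receives . (any char),
# so every nonempty line matches.
unquoted=$(grep -c \. "$dir/f2")

# -F turns off all metacharacters, so no backslash is needed.
fixed=$(grep -cF '.' "$dir/f2")

echo "$quoted $unquoted $fixed"
rm -r "$dir"
```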

[] char class, like shell, any single char enclosed:

$ grep 'h[ieo]s' f1           # his, hes, hos  (no others)
hello, this is file 'stuff'

$ grep '[oO][aA][kK]'  myfile          #all 8 strings

Range of chars:

$ grep 'f[a-z]' f1         # fa, fb, fc,...fz
hello, this is file 'stuff'

$ grep '[0-9]' rime.txt     #any digits?

.[a-z]      any char followed by a lowercase letter

Reverse set or range: ^ as first character in []
[^aeiou]       any char except lowercase vowel
[^a-zA-Z]      non-letter
$ grep 'h[^ei]' f1
it has only two lines

[A-Z][^A-Z]    an uppercase letter followed by any char except an uppercase letter,
                     e.g. D5, Dc, R , M$      26 * 229 such 2 char combos
[0-9][0-9][0-9][0-9][0-9]            # "zip code"

Anchors: ^ beginning of line, $ end of line

^The    --match any line starting with The
done$   --match any line ending with done
\.$     --match any line ending with period
.$      --match any char at end of line
^[^a-zA-Z]  --line not starting with letter

$ grep '^i' f1      # any line starting with i
it has only two lines

Ex. list directories:

$ ls -l | grep '^d'                      # lines starting with d

Ex. list executable files:

$ ls -l | grep '^-..x'           
# starting with -, followed by any 2 chars, followed by x

^[0-9][0-9][0-9][0-9][0-9]$        # lines consisting of "zip code" (and nothing else)
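
A sketch of what the anchors change for the "zip code" pattern, on made-up input:

```shell
# Hypothetical sample lines: exact zip, zip inside text, 6 digits, 4 digits.
tmp=$(mktemp)
printf '%s\n' '12345' 'zip 12345 here' '123456' '1234' > "$tmp"

# Unanchored: any line that contains 5 digits in a row somewhere.
anywhere=$(grep -c '[0-9][0-9][0-9][0-9][0-9]' "$tmp")

# Anchored at both ends: the line must be exactly 5 digits.
whole=$(grep -c '^[0-9][0-9][0-9][0-9][0-9]$' "$tmp")

echo "$anywhere $whole"
rm "$tmp"
```

The unanchored form also accepts 123456, because five of its six digits satisfy the pattern.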

Closure: * matches zero or more occurrences of the preceding pattern. Multiplier.

ho*t        #ht, hot, hoot, hooot,...
[0-9][0-9]* #one or more digits
.*          #zero or more chars, i.e. anything/everything up to newline.   grep '.*'  ==  cat
[a-zA-Z_][a-zA-Z0-9_]*      #C/Java identifier
<.*>      #HTML tag, but greedy: runs from the first < to the last > on the line. <[^>]*> matches one tag
-i '^[a-z][a-z]*$'  lines consisting of letters only

N.B. Newline is never matched.
Leftmost maximal match. Maximal matching i.e. longest possible match, "greedy match".

.*v              --match everything up to last v.  
                cf. sed editing:   hellov hiv there
[a-zA-Z]*        --match any alphabetic string, including null string
[a-zA-Z][a-zA-Z]*    --match any nonempty alphabetic string
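
The greedy rule can be seen directly. A sketch using the sed example string from above and a made-up HTML fragment (grep -o assumes GNU or BSD grep):

```shell
# Leftmost-longest: .*v runs to the LAST v on the line, not the first.
greedy=$(echo 'hellov hiv there' | sed 's/.*v/X/')
echo "$greedy"

# For the same reason <.*> swallows everything from the first < to the last >.
whole=$(echo '<b>bold</b>' | grep -o '<.*>')
echo "$whole"

# Excluding > inside the tag stops each match at the nearest >.
tags=$(echo '<b>bold</b>' | grep -o '<[^>]*>')
echo "$tags"
```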

^$      --match line with zero chars, i.e. newline only.  Empty line.
^ *$    --match empty line and line with spaces only
^<Tab>*$    --match empty line and line with tabs only (<Tab> stands for one literal tab character)
^[ <Tab>]*$     --match empty line and line with spaces and tabs only. bash: type ^V then Tab to enter a literal tab
^[ <Tab>][ <Tab>]*$ --lines with spaces and tabs only, but not empty lines
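
Literal tabs are awkward to type and easy to lose. One alternative, sketched here on made-up input, is the POSIX bracket class [[:blank:]] (space or tab); another is to put a tab in a shell variable:

```shell
# Hypothetical sample: text, empty, spaces only, tab only, mixed blanks, text.
tmp=$(mktemp)
printf 'text\n\n   \n\t\n \t \nmore text\n' > "$tmp"

empty=$(grep -c '^$' "$tmp")                            # empty lines only
blank_or_empty=$(grep -c '^[[:blank:]]*$' "$tmp")       # empty or blanks only
blank_only=$(grep -c '^[[:blank:]][[:blank:]]*$' "$tmp") # at least one blank

# Same as blank_or_empty, with a literal tab supplied through a variable.
tab=$(printf '\t')
literal=$(grep -c "^[ $tab]*\$" "$tmp")

echo "$empty $blank_or_empty $blank_only $literal"
rm "$tmp"
```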

$ grep 'a.*e.*i.*o.*u'  /usr/share/dict/words        
# all 5 vowels in order, ex. sacrilegious

# one of each vowel, in order, ex. facetious
$ grep '^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$'  /usr/share/dict/words

# all 5 vowels, ex. unidirectional. a pipeline of greps:  AND
$ grep a /usr/share/dict/words | grep e | grep i | grep o | grep u

$ grep '................' /usr/share/dict/words      
#lines longer than 15 chars.  (wc -L will tell length of longest)
$ grep '^.$' /usr/share/dict/words   #lines of one character

# words of 6 chars or more with letters in alphabetical order, ex. almost
$ grep '^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$'  /usr/share/dict/words |  grep '......'

pattern\{3,5\}      #match 3 to 5 of pattern
[0-9]\{5\}          #match 5 digits, e.g. zip code
$ grep '[a-zA-Z]\{10,\}' rime.txt       #lines with words of 10 or more characters
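
A sketch of the three interval forms, on made-up lines of digits:

```shell
# Hypothetical sample: 4, 5 and 6 digit lines plus one non-numeric line.
tmp=$(mktemp)
printf '%s\n' '1234' '12345' '123456' 'ab' > "$tmp"

exact5=$(grep -c '^[0-9]\{5\}$' "$tmp")      # exactly 5 digits
range45=$(grep -c '^[0-9]\{4,5\}$' "$tmp")   # 4 or 5 digits
atleast5=$(grep -c '^[0-9]\{5,\}$' "$tmp")   # 5 or more digits

echo "$exact5 $range45 $atleast5"
rm "$tmp"
```

Without the ^ and $ anchors all three patterns would also accept the 6-digit line, since a shorter run inside it satisfies the interval.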

$ grep -c '^[ <Tab>]*$' myfile             # number of empty and blank lines (<Tab> = one literal tab)
20

Ex. All lines with word cat:
$ grep cat myfile                   # but gives cat, cattle, scatter,...
$ grep ' cat ' myfile               # but misses cat., cat!, cat?, cat,, (cat,...
$ grep '[ ({]cat[)}.,?!]' myfile    # but is the list of delimiters complete?
$ grep '[^A-Za-z]cat[^A-Za-z]' myfile    # any punctuation ok, but misses cat at beginning and end of line

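-w matches whole words only, e.g. grep -w cat myfile
\<word\>    \< matches at the start of a word, \> at the end (an extension, not part of POSIX)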
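
The attempts at matching the word cat can be compared on a made-up file. This is a sketch that assumes a grep with -w (GNU and BSD grep have it; POSIX does not require it):

```shell
# Hypothetical sample covering the cases discussed above.
tmp=$(mktemp)
printf '%s\n' 'cat' 'the cat sat' 'cattle' 'scatter' '(cat)' 'cat.' > "$tmp"

plain=$(grep -c cat "$tmp")                         # also cattle, scatter
spaced=$(grep -c ' cat ' "$tmp")                    # only "the cat sat"
nonletter=$(grep -c '[^A-Za-z]cat[^A-Za-z]' "$tmp") # misses line start/end
word=$(grep -cw cat "$tmp")                         # whole word anywhere

echo "$plain $spaced $nonletter $word"
rm "$tmp"
```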

Ex. users without password, i.e. 2nd field of passwd empty
# start of line, any number of non-colon chars, followed by 2 colons:
$ grep '^[^:]*::' /etc/passwd
$ grep -v '^[^:]*:x:' /etc/passwd  # users whose password field is anything other than x (x = password kept in the shadow file)
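
Both patterns can be tried safely on a made-up passwd-style file instead of the real one. The three accounts below are hypothetical:

```shell
# Hypothetical passwd-style sample: shadowed, empty, and inline password fields.
tmp=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'guest::1001:1001::/home/guest:/bin/sh' \
  'old:Ab3dEfG:1002:1002::/home/old:/bin/sh' > "$tmp"

# Empty 2nd field; cut keeps only the user name (field 1).
nopass=$(grep '^[^:]*::' "$tmp" | cut -d: -f1)

# 2nd field is anything other than x.
notx=$(grep -v '^[^:]*:x:' "$tmp" | cut -d: -f1)

echo "$nopass"
echo "$notx"
rm "$tmp"
```

The ^ anchor matters: without it, the :: later in the third line (an empty comment field) would match as well.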

fgrep: fixed strings only, no patterns, but many searches in parallel (OR). Same as grep -F, which is the preferred spelling today.
f=fixed (often read as fast). Different algorithm than grep.

$ fgrep 'garp
> jones
> billybob' /etc/passwd                                    
# all lines with garp or jones or billybob or combo

$ fgrep -f names_file /etc/passwd                     
# names_file has the words to search for
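
The same search written with grep -F, as a sketch on made-up files (passwd.sample and its entries are hypothetical):

```shell
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'garp:x:1001:' 'jones:x:1002:' 'smith:x:1003:' \
              'a.b:x:1004:' 'aXb:x:1005:' > "$dir/passwd.sample"
printf '%s\n' 'garp' 'jones' 'billybob' > "$dir/names_file"

# One string per line in names_file; a line matches if it contains ANY of them.
hits=$(grep -F -f "$dir/names_file" "$dir/passwd.sample")
echo "$hits"

# With -F the dot is an ordinary character; without it, . matches any char.
fixed=$(grep -cF 'a.b' "$dir/passwd.sample")
regex=$(grep -c 'a.b' "$dir/passwd.sample")
echo "$fixed $regex"

rm -r "$dir"
```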

egrep: all that grep has plus full/extended regular expressions. Same as grep -E, which is the preferred spelling today.
| Or (alternation):

$ egrep 'garp|jones|billybob' /etc/passwd
# any line with garp or jones or billybob

Jack|Jill Jones                            
# matches Jack, or Jill Jones. | has the lowest precedence, so this is not (Jack or Jill) followed by Jones

() for grouping:

(Jack|Jill) Jones                        
 # matches Jack Jones or Jill Jones
compan(y|ies)

+ one or more occurrences of preceding pattern

[0-9]+      #one or more digits
^[ <Tab>]+$ #lines with spaces and/or tabs but not empty lines (<Tab> = one literal tab)

? zero or one occurrence of preceding pattern

ho*t                    # ht, hot, hoot, hooot,...
(ho)*t                  # t, hot, hohot, hohohot,...
ho+t                    # hot, hoot, hooot,...
ho?t                    # ht, hot
80[234]?86           #Intel processors
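
A sketch of the extended operators using grep -E, on made-up input:

```shell
# Hypothetical names to contrast grouped and ungrouped alternation.
tmp=$(mktemp)
printf '%s\n' 'Jack Jones' 'Jill Jones' 'Jack Smith' 'Jill' > "$tmp"

# Ungrouped: "Jack" OR "Jill Jones", so "Jack Smith" matches too.
ungrouped=$(grep -cE 'Jack|Jill Jones' "$tmp")

# Grouped: (Jack OR Jill) followed by " Jones".
grouped=$(grep -cE '(Jack|Jill) Jones' "$tmp")

# ? makes the middle digit optional: 8086, 80286, 80386, 80486.
cpus=$(printf '%s\n' 8086 80286 80386 80486 80586 8186 | grep -cE '^80[234]?86$')

# + needs at least one o; ? allows at most one.
plus=$(printf '%s\n' ht hot hoot | grep -cE '^ho+t$')
opt=$(printf '%s\n' ht hot hoot | grep -cE '^ho?t$')

echo "$ungrouped $grouped $cpus $plus $opt"
rm "$tmp"
```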

Practice:
integers: optional + or -, without leading zeros
reals:
reals with optional exponential part
# palindromes, ex. civic
# words with only one vowel, ex. dystrophy
# words with 4 of same vowel, ex. voodoo