BASH: Single character/range pattern matching

When filenames differ by only a single character, or by a character that falls within a range, use [square brackets] in the pattern to match them.

Sample list of different filenames

$ ls -l file*
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file1
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file10
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file2
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file3
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file4
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file5
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file6
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file7
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file8
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file9
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:56 fileA
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:55 file_for_james
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:56 fileb
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:55 files_for_ken

Single character substitution

With single character substitution, the shell tries each character listed in the brackets in that one position and keeps only the names that actually exist. Ex: file[abc] matches filea, fileb, and filec, provided those files exist.

Only fileA matched the given pattern

$ ls -l file[aAB]
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:56 fileA

Five candidate characters were listed but only 2 of the resulting names exist

$ ls -l file[aABbc]
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:56 fileA
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:56 fileb
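
To try this without touching real files, here is a minimal sketch that recreates three of the sample files in a scratch directory. The directory from mktemp and the variable names are assumptions for illustration only. It also shows what bash does by default when nothing matches: the pattern is passed through unchanged.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory (an assumption for this demo).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1 fileA fileb

# Each character in the brackets is tried in that single position.
matches=$(echo file[aABbc])
echo "$matches"

# With default bash settings, a pattern that matches nothing is left as-is.
no_match=$(echo file[xyz])
echo "$no_match"
```

That second behavior is why ls reports "No such file or directory" for a bracket pattern with no matches: it receives the literal pattern as a filename.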

Range Pattern Matching

Range pattern matching takes a range of characters instead of a list. Ex: file[a-c] will locate filea, fileb, and filec. This is the same as file[abc], but the hyphen defines a range so you don't have to type out every character. Ranges also work with digits, like file[1-3].

Search for a range of numbers

$ ls -l file[1-5]*
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file1
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file10
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file2
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file3
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file4
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file5
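
A small sketch to confirm that a range and the equivalent written-out list select the same files. The scratch directory and variable names are assumptions made for this demo. There is no trailing * here, so only names with exactly one character after "file" can match.

```shell
#!/usr/bin/env bash
# Throwaway directory so the demo does not depend on existing files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1 file2 file3 file4 file10

# A digit range and the same set spelled out character by character.
ranged=$(echo file[1-3])
listed=$(echo file[123])
echo "$ranged"
echo "$listed"
```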

Search for 2 number ranges

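Note: Both return the same files here. A comma is not a separator inside brackets. It is just one more literal character in the set, so the first form would also match a name such as file,backup if one existed.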

$ ls -l file[1-5,7-9]*
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file1
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file10
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file2
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file3
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file4
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file5
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file7
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file8
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file9

$ ls -l file[1-57-9]*
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file1
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file10
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file2
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file3
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file4
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file5
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file7
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file8
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:53 file9

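Why did it also find file10? The brackets only ever match a single character. The trailing * then matches any run of characters, including none, so file[1-5]* means "file", one character from 1 to 5, then anything. file10 fits the bill: the 1 matches the brackets and the 0 is absorbed by the *. Drop the * and file10 is no longer matched.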
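
Here is a hedged sketch of both details in one place: what the trailing * adds, and how a comma inside brackets is treated. The scratch directory, the oddly named file "file," and the variable names are all assumptions for this demo. LC_ALL=C is set only to pin the sort order of the results.

```shell
#!/usr/bin/env bash
export LC_ALL=C            # pin glob sort order for a predictable listing
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1 file10 file2 file6 'file,'

# Brackets match one character; the * then matches whatever follows.
with_star=$(echo file[1-5]*)
# Without the *, nothing may follow the single bracketed character.
without_star=$(echo file[1-5])
# The comma is a literal member of the set, not a separator.
with_comma=$(echo file[1-5,7-9])

echo "$with_star"
echo "$without_star"
echo "$with_comma"
```

In this run file10 shows up only in the first result, and the file named "file," shows up only in the third.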

Summary

These are great tricks to narrow down searches, locate/use similarly named files, make queries more concise, and just look like a pro in shell expansion. Also check out my article about substituting whole words.

BASH: Curly Brace Wizardry (Multiple Word Matching)

Single character/range pattern matching is great but we can do the same with strings. This way, there’s no need to repeat the entire length of the filename if a section has the same name. Check out the examples below using the {curly brace}.

Create files prefixed with names

$ touch {james,ken,ryu,vega,bison}_files.txt; ls -l
total 0
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:42 bison_files.txt
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:42 james_files.txt
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:42 ken_files.txt
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:42 ryu_files.txt
-rw-r--r-- 1 jvalero wheel 0 Jun  5 06:42 vega_files.txt

Grep through files

$ echo bad > bison_files.txt ; echo good > ken_files.txt ; echo good > ryu_files.txt

$ grep good {bison,ken,ryu}_files.txt
ken_files.txt:good
ryu_files.txt:good
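
One difference from square brackets is worth seeing in action: curly braces generate every listed name as plain text, whether or not the file exists, while a bracket pattern is checked against the directory. A minimal sketch, assuming a scratch directory and a made-up name (guile) that is never created:

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ken_files.txt ryu_files.txt

# Brace expansion: all three names come back, in the order written,
# even though guile_files.txt (hypothetical) does not exist.
braces=$(echo {ken,ryu,guile}_files.txt)
# Bracket pattern: only names that exist in the directory come back.
globbed=$(echo [kr]*_files.txt)

echo "$braces"
echo "$globbed"
```

This is why touch with braces can create files that are not there yet, and why grep with braces complains about any listed file that is missing.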

 

Grep: Find keyword with duplicate tags

Situation

Need to extract key tags within a file.

Issue

Where these key tags appear in a large file is not predictable, as they are nested within another parent tag. To make the problem even worse, they don’t always appear on the same line after the parent tag.

Solution

grep --include='*2015-10-06*' --include='*20151005*' -A15 'record product-id' * | grep -B5 'value>Out' | grep 'record product-id' | awk -F'"' '{print $2}' | sort -u

Short Version

Close in on the areas you know, and use -A (lines to capture after a match) and -B (lines to capture before a match) to inch closer to what you need. Consider unique patterns that relate to what you want to keep.
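
To see the two context flags in isolation, here is a tiny sketch on four made-up lines of input. The sample words and variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Four sample lines: one, two, three, four.
numbered=$(printf 'one\ntwo\nthree\nfour\n')

# -A1 keeps the matching line plus 1 line after it.
after=$(printf '%s\n' "$numbered" | grep -A1 'two')
# -B1 keeps the matching line plus 1 line before it.
before=$(printf '%s\n' "$numbered" | grep -B1 'two')

echo "$after"
echo "$before"
```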

Long Version

  1. -A15 = Get 15 lines after the parent1 record (parent1 = record product-id).
    1. Based on existing patterns, what we want is always somewhere within those 15 lines
  2. -B5 = Within each parent snippet, search for the parent2 record (parent2 = value>Out) and return up to 5 lines before it
    1. The duplicated tag is typically found somewhere between lines 1-10 after parent1, so we go up 5 lines from parent2 instead
  3. Finally, the desired tag should be somewhere in this hodgepodge, so grep for it and keep only the tag’s value (key tag = record product-id)
    1. The list is then sorted and de-duplicated
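
The steps above can be sketched on a small made-up file. Everything about the sample is an assumption: the real files are not shown in this article, so the tag layout, the IDs (A100, B200, C300), and the smaller context sizes (-A3, -B2) are invented to fit a four-line record.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the real file layout is not known.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<record product-id="A100">
  <name>alpha</name>
  <value>Out</value>
</record>
<record product-id="B200">
  <name>beta</name>
  <value>In</value>
</record>
<record product-id="C300">
  <name>gamma</name>
  <value>Out</value>
</record>
<record product-id="A100">
  <name>alpha</name>
  <value>Out</value>
</record>
EOF

# 1. Grab each parent1 line plus the lines after it.
# 2. Keep only the lines leading up to each parent2 match.
# 3. Pull the parent1 lines back out, cut the ID from between
#    the double quotes, then sort and drop duplicates.
# Quoting 'value>Out' keeps > literal and stops the shell from
# treating it as a redirect.
ids=$(grep -A3 'record product-id' "$sample" \
  | grep -B2 'value>Out' \
  | grep 'record product-id' \
  | awk -F'"' '{print $2}' \
  | sort -u)
echo "$ids"
```

B200 drops out because its value is In, and the repeated A100 collapses to one entry. The context sizes have to be tuned to the data: if -B reaches back past the start of a record, it can pull in the product-id line of a neighboring record that should have been excluded.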