Grep: Find keyword with duplicate tags

Situation

Need to extract key tags within a file.

Issue

These key tags appear in unpredictable places in a large file because they are nested within another parent tag. To make the problem even worse, they don't always appear on the same line after the parent tag.

Solution

grep --include='*2015-10-06*' --include='*20151005*' -A15 'record product-id' * | grep -B5 'value>Out' | grep 'record product-id' | awk -F'"' '{print $2}' | sort -u
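
To try the pipeline without real data, here is a minimal sketch that builds two hypothetical sample files in a temporary directory and runs the same command with every pattern and glob single-quoted. The file names, XML layout and product IDs are made up for illustration, and the sketch assumes GNU grep and a POSIX shell.

```shell
#!/bin/sh
# Hypothetical sample data: file names, XML layout and IDs are invented.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > orders-2015-10-06.xml <<'EOF'
<record product-id="A100">
  <status>
    <value>In</value>
  </status>
</record>
<record product-id="B200">
  <status>
    <value>Out</value>
  </status>
</record>
EOF

cat > orders-20151005.xml <<'EOF'
<record product-id="B200">
  <status>
    <value>Out</value>
  </status>
</record>
<record product-id="C300">
  <name>Gizmo</name>
  <status>
    <value>Out</value>
  </status>
</record>
EOF

# Same pipeline, with globs and patterns single-quoted so the shell
# passes them to grep unchanged.
result=$(
  grep --include='*2015-10-06*' --include='*20151005*' -A15 'record product-id' * |
    grep -B5 'value>Out' |
    grep 'record product-id' |
    awk -F'"' '{print $2}' |
    sort -u
)
printf '%s\n' "$result"
```

Only B200 and C300 are printed: A100 is never within 5 lines above a value>Out line, and sort -u collapses the B200 that appears in both files. Note the quoting: unquoted, value\>Out only works because the shell strips the backslash. Inside single quotes the pattern has to be plain value>Out, because GNU grep treats \> as an end-of-word anchor.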

Short Version

Narrow in on the areas you know, and use -A (lines to capture after the keyword) and -B (lines to capture before the keyword) effectively to inch closer to what you need. Look for unique patterns that relate to what you want to keep.
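
A quick way to see what -A and -B do, and how chaining them narrows a window, is to run them on numbered lines from seq. This is a small illustrative sketch, not part of the original solution.

```shell
#!/bin/sh
# -A2: the matching line plus the 2 lines after it.
after=$(seq 1 10 | grep -A2 '^5$')
# -B2: the 2 lines before the matching line plus the match itself.
before=$(seq 1 10 | grep -B2 '^5$')
# Chaining: take 6 lines after 5 (5..11), then only the 2 lines above 9
# inside that window.
narrowed=$(seq 1 20 | grep -A6 '^5$' | grep -B2 '^9$')

echo "after:" $after
echo "before:" $before
echo "narrowed:" $narrowed
```

The chained form is the same idea as the solution above: a wide -A window first, then a tighter -B window anchored on a second, more specific pattern.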

Long Version

  1. -A15 = Get 15 lines after the parent1 record (parent1 = record product-id).
    1. Based on existing patterns, what we want always appears somewhere within those 15 lines
  2. -B5 = Within each parent snippet, search for the parent2 record (parent2 = value>Out) and return up to 5 lines before it
    1. The duplicated tag is typically found somewhere between lines 1-10 after parent1, so we take 5 lines going up from parent2 instead
  3. Finally, the desired tag should be somewhere in this hodgepodge, so grep for it and keep only the tag's value, i.e. the text between the first pair of double quotes (key tag = record product-id)
    1. Sort the list and remove duplicates (sort -u)
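
To see each step of the list in isolation, here is a minimal sketch that saves the output of every stage for one hypothetical file (the layout and IDs are invented) in which the -A15 window from the first record also contains a second record product-id tag.

```shell
#!/bin/sh
# Hypothetical sample file: layout and product IDs are invented for illustration.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<record product-id="A100">
  <name>Widget</name>
  <status>
    <value>In</value>
  </status>
</record>
<record product-id="B200">
  <status>
    <value>Out</value>
  </status>
</record>
EOF

# Step 1: every parent1 line (record product-id) plus the 15 lines after it.
# The window that starts at A100 also contains B200, so two tags are captured.
stage1=$(grep -A15 'record product-id' "$sample")

# Step 2: every parent2 line (value>Out) plus up to 5 lines before it.
# Only B200 sits close enough above value>Out to survive.
stage2=$(printf '%s\n' "$stage1" | grep -B5 'value>Out')

# Step 3: keep the tag lines, print the text between the first pair of
# double quotes, then sort and drop duplicates.
ids=$(printf '%s\n' "$stage2" | grep 'record product-id' | awk -F'"' '{print $2}' | sort -u)

tags1=$(printf '%s\n' "$stage1" | grep -c 'record product-id')
tags2=$(printf '%s\n' "$stage2" | grep -c 'record product-id')
echo "step 1: $tags1 tag lines"
echo "step 2: $tags2 tag lines"
echo "step 3: $ids"
```

Spacing matters here: if the two records were only a few lines apart, A100 would also fall inside the 5-line window and be reported, which is why the -A and -B sizes have to be tuned to the real file.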
