Working with Regular Expressions on Linux

Regular expressions (regex or regexp) are powerful patterns used for matching and manipulating text strings. In the context of the shell, regular expressions are often used with commands like grep, sed, awk, and others for text processing and pattern matching. Here’s a detailed explanation of regular expressions in the shell:

Basics of Regular Expressions:

1. Literal Characters:

  • Ordinary characters (e.g., letters, digits) match themselves.
  • Example: The regex abc matches the string “abc” exactly.

2. Metacharacters:

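  • Special characters with a reserved meaning. Common metacharacters include . (dot), * (asterisk), + (plus), ? (question mark), | (pipe), () (parentheses), [] (square brackets), {} (curly braces), and \ (backslash).
  • grep uses basic regular expressions (BRE) by default, where +, ?, |, (), and {} are literal characters unless preceded by a backslash (for \+, \? and \| this is a GNU extension). With grep -E (extended regular expressions, ERE) they are special without a backslash.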

Character Classes:

1. Dot (.):

  • Matches any single character except a newline.
Bash
grep "a.b" filename

  • Matches “axb”, “aab”, “a@b”, etc.

2. Character Sets ([]):

  • Matches any one of the characters inside the brackets.
Bash
grep "[aeiou]" filename

  • Matches any line containing a vowel.

3. Negation (^ inside []):

  • Matches any character NOT listed.
Bash
grep "[^0-9]" filename

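  • Matches any line that contains at least one non-digit character. To select lines that contain no digits at all, invert a digit match instead: grep -v "[0-9]" filename.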
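
To see the three forms side by side, here is a small sketch that runs them against made-up sample lines. The variable names and sample strings are assumptions chosen for illustration.

```shell
# Sample input, one candidate string per line (made up for this demo).
sample=$(printf '%s\n' 'axb' 'ab' 'hello' 'rhythm' '12345' 'abc123')

# Dot: "a", then any one character, then "b".
# "ab" has nothing between the two letters, so it is not selected.
dot_matches=$(printf '%s\n' "$sample" | grep 'a.b')

# Character set: lines containing at least one lowercase vowel.
vowel_matches=$(printf '%s\n' "$sample" | grep '[aeiou]')

# Negated set: lines containing at least one non-digit character.
# "12345" is all digits, so it is the only line left out.
nondigit_matches=$(printf '%s\n' "$sample" | grep '[^0-9]')

# Lines with no digit at all need -v (invert the match) on a digit set.
no_digit_lines=$(printf '%s\n' "$sample" | grep -v '[0-9]')

printf 'dot:\n%s\n\nvowels:\n%s\n' "$dot_matches" "$vowel_matches"
printf '\nnon-digit:\n%s\n\nno digits:\n%s\n' "$nondigit_matches" "$no_digit_lines"
```

The last two results differ only in “abc123”: it contains a non-digit, so the negated set selects it, but it also contains digits, so grep -v drops it.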

Quantifiers:

1. Asterisk (*):

  • Matches zero or more occurrences of the preceding character or group.
Bash
grep "a*b" filename

  • Matches “b”, “ab”, “aab”, “aaab”, etc.

2. Plus (+):

  • Matches one or more occurrences of the preceding character or group.
Bash
grep -E "a+b" filename

  • Matches “ab”, “aab”, “aaab”, etc., but not “b”. The -E option is required because plain grep treats + as a literal character.

3. Question Mark (?):

  • Matches zero or one occurrence of the preceding character or group.
Bash
grep -E "ab?c" filename

  • Matches “abc” and “ac”. As with +, plain grep treats ? as a literal character, so -E is required.

4. Braces ({}):

  • Specifies an exact number (or a range, as in {1,3}) of occurrences.
Bash
grep -E "a{2}" filename

  • Matches “aa” but not a lone “a”. Without -E, the braces must be escaped: grep "a\{2\}" filename.
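
The difference between plain grep (BRE) and grep -E (ERE) is easy to check with a short sketch. The sample strings and variable names below are assumptions for illustration, and -x is used so that each pattern must match the whole line rather than a part of it.

```shell
# One candidate string per line (made up for this demo).
sample=$(printf '%s\n' 'b' 'ab' 'aab' 'ac' 'abc' 'abbc' 'a+b')

# Asterisk works the same in BRE and ERE: zero or more "a", then "b".
star=$(printf '%s\n' "$sample" | grep -x 'a*b')

# In BRE, "+" is an ordinary character, so only the literal line "a+b" is selected.
bre_plus=$(printf '%s\n' "$sample" | grep -x 'a+b')

# In ERE, "+" means one or more of the preceding item.
ere_plus=$(printf '%s\n' "$sample" | grep -xE 'a+b')

# Optional "b": selects "ac" and "abc", but not "abbc".
optional=$(printf '%s\n' "$sample" | grep -xE 'ab?c')

# Exactly two "a" characters followed by "b".
exact=$(printf '%s\n' "$sample" | grep -xE 'a{2}b')

printf 'star: %s\n' $star
printf 'BRE plus: %s\n' $bre_plus
printf 'ERE plus: %s\n' $ere_plus
printf 'optional: %s\n' $optional
printf 'exact: %s\n' $exact
```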

Anchors:

1. Caret (^):

  • Anchors the pattern to the beginning of the line.
Bash
grep "^start" filename

  • Matches lines that start with “start”.

2. Dollar ($):

  • Anchors the pattern to the end of the line.
Bash
grep "end$" filename

  • Matches lines that end with “end”.
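
Anchors can also be combined. The sketch below, using made-up sample lines and hypothetical variable names, shows each anchor alone, both together to match an entire line, and ^$ to count empty lines.

```shell
# Sample lines (made up for this demo); the second line is empty.
sample=$(printf '%s\n' 'start of line' '' 'a fresh start' 'the end' 'end of story' 'start to end')

# Lines that begin with "start". "a fresh start" does not qualify.
starts=$(printf '%s\n' "$sample" | grep '^start')

# Lines that finish with "end". "end of story" does not qualify.
ends=$(printf '%s\n' "$sample" | grep 'end$')

# Both anchors: the whole line must be exactly this text.
whole=$(printf '%s\n' "$sample" | grep '^start to end$')

# ^$ matches an empty line; -c prints the number of selected lines.
blank_count=$(printf '%s\n' "$sample" | grep -c '^$')

printf 'starts:\n%s\n\nends:\n%s\n\nwhole:\n%s\n\nblank lines: %s\n' \
  "$starts" "$ends" "$whole" "$blank_count"
```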

Escape Character (\):

1. Backslash (\):

  • Escapes a metacharacter, treating it as a literal character.
Bash
grep "a\.b" filename

  • Matches “a.b”.
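
As a rough illustration of why escaping matters, the sketch below compares an unescaped dot, an escaped dot, and grep -F (which treats the whole pattern as a fixed string). The sample data and variable names are assumptions; single quotes are used so the shell passes the backslash and the dollar sign to grep unchanged.

```shell
# Sample lines (made up for this demo).
sample=$(printf '%s\n' 'a.b' 'axb' 'price: $5')

# Unescaped dot matches any character, so both "a.b" and "axb" are selected.
unescaped=$(printf '%s\n' "$sample" | grep 'a.b')

# Escaped dot matches only a literal ".".
escaped=$(printf '%s\n' "$sample" | grep 'a\.b')

# -F disables regex interpretation entirely; no escaping is needed.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.b')

# The same idea applies to other metacharacters, such as "$".
dollar=$(printf '%s\n' "$sample" | grep '\$5')

printf 'unescaped:\n%s\n\nescaped: %s\nfixed: %s\ndollar: %s\n' \
  "$unescaped" "$escaped" "$fixed" "$dollar"
```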

Grouping (()):

1. Parentheses (()) for Grouping:

  • Groups characters together to apply a quantifier to the entire group.
Bash
grep "\(abc\)\{2\}" filename

  • Matches “abcabc”. The backslashes are BRE syntax; with grep -E the same pattern is written (abc){2}.
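
The following sketch, with made-up sample strings and hypothetical variable names, shows the grouped pattern in both BRE and ERE form, and what changes when the parentheses are left out.

```shell
# One candidate string per line (made up for this demo).
sample=$(printf '%s\n' 'abc' 'abcabc' 'abcc')

# BRE: group and interval are written with backslashes.
bre_group=$(printf '%s\n' "$sample" | grep '\(abc\)\{2\}')

# ERE: the same pattern without backslashes.
ere_group=$(printf '%s\n' "$sample" | grep -E '(abc){2}')

# Without a group, {2} applies only to the preceding "c",
# so the pattern means "ab" followed by "cc".
no_group=$(printf '%s\n' "$sample" | grep -E 'abc{2}')

printf 'BRE group: %s\nERE group: %s\nno group: %s\n' \
  "$bre_group" "$ere_group" "$no_group"
```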

Examples:

1. Matching IP Addresses:

Bash
grep "\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}" filename

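  • Matches strings shaped like IPv4 addresses: four dot-separated groups of one to three digits. It does not check that each group lies in the range 0 to 255, so “999.999.999.999” also matches.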
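
If the 0 to 255 range matters, one possible approach is to spell out the valid forms of a single octet with alternation. This is a sketch under assumptions: the octet variable, the sample addresses, and the use of -x (one candidate address per line) are choices made for this example.

```shell
# One candidate address per line (made up for this demo).
sample=$(printf '%s\n' '192.168.1.1' '10.0.0.255' '999.999.999.999' '256.1.1.1' '1.2.3')

# Loose BRE pattern from the text: only checks the digit-group shape.
loose=$(printf '%s\n' "$sample" | grep -x '\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}')

# One octet in ERE: 250-255, 200-249, 100-199, or 0-99.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Three "octet." groups followed by a final octet; -x requires a whole-line match.
strict=$(printf '%s\n' "$sample" | grep -xE "(${octet}\.){3}${octet}")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern selects the first four lines, while the strict one keeps only the two addresses whose groups are all within range.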

2. Extracting Email Addresses:

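Bash
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" filename

  • Matches lines containing typical email addresses. Inside square brackets | is a literal character, so the top-level domain is written [A-Za-z] rather than [A-Z|a-z]. \b (word boundary) is a GNU extension. Add -o to print only the matched addresses instead of the whole line.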
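
To actually extract the addresses rather than print whole lines, the -o option (supported by GNU and BSD grep, though not required by POSIX) prints each match on its own line. A minimal sketch, assuming made-up sample text and a simplified pattern without \b:

```shell
# Sample text (made up for this demo).
text=$(printf '%s\n' \
  'Contact alice@example.com or bob.smith@mail.example.org today.' \
  'No address on this line.' \
  'Missing domain suffix: carol@localhost')

# -o prints only the matched parts, one per output line.
# "carol@localhost" is skipped because the pattern requires a dot
# followed by at least two letters after the "@".
emails=$(printf '%s\n' "$text" | grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

printf '%s\n' "$emails"
```

This pattern is a practical approximation; it does not cover every address form that the email standards allow.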

3. Matching Numbers in a Range:

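Bash
grep -x "[1-9][0-9]\{0,2\}" filename

  • Matches lines consisting solely of a number from 1 to 999. The -x option requires the whole line to match; without it, the pattern would match any line containing a nonzero digit, such as “1000”.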
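
The effect of anchoring on this pattern can be checked with a short sketch. The sample values and variable names are assumptions for illustration.

```shell
# One candidate value per line (made up for this demo).
sample=$(printf '%s\n' '7' '42' '999' '0' '007' '1000' 'abc5')

# Unanchored: any line containing a nonzero digit is selected,
# including "007", "1000" and "abc5". Only "0" is left out.
unanchored=$(printf '%s\n' "$sample" | grep '[1-9][0-9]\{0,2\}')

# Whole-line match (-x): only 1 to 999 without leading zeros.
anchored=$(printf '%s\n' "$sample" | grep -x '[1-9][0-9]\{0,2\}')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```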

Using Regular Expressions in Commands:

1. grep Command:

Bash
grep "pattern" filename

2. sed Command:

Bash
sed 's/pattern/replacement/' filename

3. awk Command:

Bash
awk '/pattern/ {print $0}' filename
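
As a rough end-to-end illustration, the sketch below applies the same anchored pattern with all three commands. The log lines and variable names are made up for this example. Note that grep and sed use BRE by default (both accept -E for ERE in their GNU and BSD versions), while awk uses ERE.

```shell
# Sample log lines (made up for this demo).
log=$(printf '%s\n' 'error: disk full' 'info: backup done' 'error: network down')

# grep: print the lines that start with "error".
grep_out=$(printf '%s\n' "$log" | grep '^error')

# sed: replace a leading "error" with "ERROR"; other lines pass through unchanged.
sed_out=$(printf '%s\n' "$log" | sed 's/^error/ERROR/')

# awk: for lines starting with "error", print the second whitespace-separated field.
awk_out=$(printf '%s\n' "$log" | awk '/^error/ {print $2}')

printf 'grep:\n%s\n\nsed:\n%s\n\nawk:\n%s\n' "$grep_out" "$sed_out" "$awk_out"
```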

Regular expressions are a fundamental tool for text processing in the shell. While the basics covered here are common across many tools, there are some variations and extensions depending on the specific command or programming language being used. Practice and experimentation will help you become more comfortable and proficient with regular expressions.

OpenLib .

The Founder - OpenLib.io