Working with Regular Expressions on Linux
Regular expressions (regex or regexp) are powerful patterns used for matching and manipulating text strings. In the context of the shell, regular expressions are often used with commands like grep, sed, awk, and others for text processing and pattern matching. Here’s a detailed explanation of regular expressions in the shell:
Basics of Regular Expressions:
1. Literal Characters:
- Ordinary characters (e.g., letters, digits) match themselves.
- Example: The regex abc matches the string “abc” exactly.
2. Metacharacters:
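- Special characters with a reserved meaning. Common metacharacters include . (dot), * (asterisk), + (plus), ? (question mark), | (pipe), () (parentheses), [] (square brackets), {} (curly braces), and \ (backslash).
- In grep's default basic syntax (BRE), the characters +, ?, |, (), and {} are ordinary unless preceded by a backslash (\( \) and \{ \} are standard; \+, \?, and \| are GNU extensions). With grep -E (extended syntax, ERE) they act as metacharacters without the backslash.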
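The difference between literal characters and metacharacters depends on which syntax grep is using. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -E; the sample lines and variable names are made up for illustration.

```shell
# Hypothetical sample input: three short lines.
sample=$(printf '%s\n' 'ab' 'aab' 'a+b')

# Basic syntax (default): + is an ordinary character,
# so only the line that literally contains "a+b" is selected.
bre_matches=$(printf '%s\n' "$sample" | grep 'a+b')

# Extended syntax (-E): + means "one or more of the preceding item",
# so "ab" and "aab" are selected and the literal "a+b" is not.
ere_matches=$(printf '%s\n' "$sample" | grep -E 'a+b')

printf 'basic:\n%s\n' "$bre_matches"
printf 'extended:\n%s\n' "$ere_matches"
```

Single quotes around the pattern keep the shell itself from interpreting any of the characters before grep sees them.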
Character Classes:
1. Dot (.):
- Matches any single character except a newline.
grep "a.b" filename
- Matches “axb”, “aab”, “a@b”, etc.
2. Character Sets ([]):
- Matches any one of the characters inside the brackets.
grep "[aeiou]" filename
- Matches any line containing at least one lowercase vowel.
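3. Negation (^ inside []):
- Matches any character NOT listed.
grep "[^0-9]" filename
- Matches any line that contains at least one non-digit character. To select lines that contain no digits at all, invert the match instead: grep -v "[0-9]" filename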
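To see how the dot, a bracket set, and a negated set behave side by side, the sketch below counts matching lines with grep -c. It assumes a POSIX shell; the sample data and variable names are hypothetical.

```shell
# Hypothetical sample data, one entry per line.
sample=$(printf '%s\n' 'axb' 'a@b' 'ab' 'sky' '2024' 'room 101')

# Dot: exactly one character must sit between a and b, so "ab" is not counted.
dot_count=$(printf '%s\n' "$sample" | grep -c 'a.b')

# Set: lines containing at least one lowercase vowel.
vowel_count=$(printf '%s\n' "$sample" | grep -c '[aeiou]')

# Negated set: lines containing at least one character that is not a digit.
nondigit_count=$(printf '%s\n' "$sample" | grep -c '[^0-9]')

# Inverted match (-v): lines containing no digit anywhere.
nodigit_count=$(printf '%s\n' "$sample" | grep -vc '[0-9]')

echo "a.b=$dot_count vowel=$vowel_count non-digit=$nondigit_count no-digit=$nodigit_count"
```

The last two counts differ because “room 101” contains both digits and non-digits: the negated set selects it, while grep -v rejects it.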
Quantifiers:
1. Asterisk (*):
- Matches zero or more occurrences of the preceding character or group.
grep "a*b" filename
- Matches “b”, “ab”, “aab”, “aaab”, etc. Because a* can match zero characters, any line containing “b” is selected.
2. Plus (+):
- Matches one or more occurrences of the preceding character or group. In grep this requires extended syntax (-E); in basic syntax a plain + is a literal plus sign.
grep -E "a+b" filename
- Matches “ab”, “aab”, “aaab”, etc., but not “b”.
3. Question Mark (?):
- Matches zero or one occurrence of the preceding character or group. Like +, it requires grep -E.
grep -E "ab?c" filename
- Matches “abc” and “ac”.
4. Braces ({}):
- Specifies an exact number of occurrences; {n,} means at least n and {n,m} means between n and m.
grep -E "a{2}" filename
- Matches “aa” but not “a”. Without -E, write the braces as \{2\}.
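The quantifiers are easiest to compare when each pattern has to match a whole line. The sketch below uses grep -E so that +, ?, and {} need no backslashes, and -x so that only full-line matches count. The helper name show and the sample words are assumptions made for this illustration.

```shell
# Hypothetical helper: print the sample words that the given extended
# regex matches in full (-x), joined on one line by spaces.
show() {
    printf '%s\n' 'b' 'ab' 'aab' 'aaab' 'ac' 'abc' |
        grep -E -x "$1" | paste -s -d ' ' -
}

star=$(show 'a*b')        # zero or more a, then b
plus=$(show 'a+b')        # one or more a, then b
opt=$(show 'ab?c')        # a, an optional b, then c
exact=$(show 'a{2}b')     # exactly two a, then b
range=$(show 'a{1,2}b')   # one or two a, then b

printf '%s\n' "a*b: $star" "a+b: $plus" "ab?c: $opt" "a{2}b: $exact" "a{1,2}b: $range"
```

Without -x, grep selects a line as soon as the pattern matches anywhere in it, which is why an unanchored a{2} also selects lines containing “aaa”.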
Anchors:
1. Caret (^):
- Anchors the pattern to the beginning of the line.
grep "^start" filename
- Matches lines that start with “start”.
2. Dollar ($):
- Anchors the pattern to the end of the line.
grep "end$" filename
- Matches lines that end with “end”.
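Anchors can also be combined, for example to match a whole line or to find empty lines. The following sketch counts matches for a few anchored patterns; the sample lines and variable names are hypothetical.

```shell
# Hypothetical sample data; the third line is empty.
sample=$(printf '%s\n' 'start here' 'restart' '' 'the end' 'endless' 'start to end')

# Lines beginning with "start" ("restart" does not qualify).
starts=$(printf '%s\n' "$sample" | grep -c '^start')

# Lines finishing with "end" ("endless" does not qualify).
ends=$(printf '%s\n' "$sample" | grep -c 'end$')

# Both anchors: the line must begin with "start" and finish with "end".
both=$(printf '%s\n' "$sample" | grep -c '^start.*end$')

# ^ immediately followed by $ matches only empty lines.
blank=$(printf '%s\n' "$sample" | grep -c '^$')

echo "starts=$starts ends=$ends both=$both blank=$blank"
```

The patterns are in single quotes so that the shell never tries to treat $ as the start of a variable name.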
Escape Character (\):
1. Backslash (\):
- Escapes a metacharacter, treating it as a literal character.
grep "a\.b" filename
- Matches “a.b”.
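Forgetting to escape a dot is a common source of false matches. The sketch below compares an unescaped dot with three ways of matching it literally: a backslash, a one-character bracket set, and grep -F, which treats the whole pattern as a fixed string. The sample values are made up for illustration.

```shell
# Hypothetical sample data.
sample=$(printf '%s\n' '3.14' '3514' '3x14')

# Unescaped dot: any character fits, so all three lines are counted.
unescaped=$(printf '%s\n' "$sample" | grep -c '3.14')

# Backslash: the dot is literal, so only "3.14" is counted.
escaped=$(printf '%s\n' "$sample" | grep -c '3\.14')

# A bracket set containing only a dot is also literal.
bracket=$(printf '%s\n' "$sample" | grep -c '3[.]14')

# -F disables regex interpretation for the whole pattern.
fixed=$(printf '%s\n' "$sample" | grep -F -c '3.14')

echo "unescaped=$unescaped escaped=$escaped bracket=$bracket fixed=$fixed"
```

When the search text comes from user input or a variable, grep -F is usually the simplest way to avoid accidental metacharacters.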
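Grouping (()):
1. Parentheses (()) for Grouping:
- Groups characters together to apply a quantifier to the entire group. In basic syntax the parentheses and braces are escaped, as below; with grep -E the same pattern is written "(abc){2}".
grep "\(abc\)\{2\}" filename
- Matches “abcabc”.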
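Grouping changes what a quantifier applies to. The sketch below runs the same repetition in basic and extended syntax, then shows what happens without the group and with an alternation inside it. The sample lines and variable names are assumptions for this example.

```shell
# Hypothetical sample data.
sample=$(printf '%s\n' 'abc' 'abcabc' 'abcc' 'xyxy')

# Basic syntax: group and interval are written with backslashes.
bre_group=$(printf '%s\n' "$sample" | grep '\(abc\)\{2\}')

# Extended syntax: same meaning, no backslashes.
ere_group=$(printf '%s\n' "$sample" | grep -E '(abc){2}')

# No group: {2} applies only to the c, so this looks for "abcc".
no_group=$(printf '%s\n' "$sample" | grep -E 'abc{2}')

# Alternation inside a group: "ab" or "xy", twice in a row.
alt_group=$(printf '%s\n' "$sample" | grep -E '(ab|xy){2}')

printf '%s\n' "bre: $bre_group" "ere: $ere_group" "no group: $no_group" "alternation: $alt_group"
```

The alternation case selects “xyxy” but not “abcabc”, because in the latter the two occurrences of “ab” are separated by a “c”.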
Examples:
1. Matching IP Addresses:
grep "\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}" filename
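- Matches strings shaped like IPv4 addresses: four groups of one to three digits separated by dots. The pattern does not check that each group is at most 255, so “999.999.999.999” also matches.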
2. Extracting Email Addresses:
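grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" filename
- Matches most common email address forms. Note that \b (word boundary) is an extension that is not part of POSIX, although GNU grep supports it.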
3. Matching Numbers in a Range:
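grep "^[1-9][0-9]\{0,2\}$" filename
- Matches lines consisting solely of a number from 1 to 999. Without the ^ and $ anchors, the pattern would also match inside longer numbers such as 1000.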
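The patterns above can be wrapped in small shell functions for validation and extraction. This is a sketch under stated assumptions, not a definitive validator: the names octet, is_ipv4, and in_range are hypothetical, the octet alternation is a stricter variant that was swapped in to enforce the 0 to 255 limit, and the extraction step assumes a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX).

```shell
# One octet in the range 0..255, written as an extended-regex alternation.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Hypothetical helper: succeeds only if the whole argument is four
# dot-separated octets, each within 0..255 (-x = whole line, -q = quiet).
is_ipv4() {
    printf '%s\n' "$1" | grep -E -x -q "($octet\.){3}$octet"
}

# Hypothetical helper: succeeds only if the argument is a number from 1 to 999.
in_range() {
    printf '%s\n' "$1" | grep -E -x -q '[1-9][0-9]{0,2}'
}

# Extraction: -o prints each matched address on its own line
# instead of printing the whole input line.
emails=$(printf '%s\n' 'Contact alice@example.com or bob@test.org today.' |
    grep -E -o '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

is_ipv4 '192.168.1.10' && echo '192.168.1.10 looks valid'
is_ipv4 '256.1.1.1' || echo '256.1.1.1 rejected'
in_range '1000' || echo '1000 is out of range'
printf '%s\n' "$emails"
```

Using -x in the helpers plays the same role as wrapping the pattern in ^ and $: a match buried inside a longer string is not accepted.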
Using Regular Expressions in Commands:
1. grep Command:
grep "pattern" filename
2. sed Command:
sed 's/pattern/replacement/' filename
3. awk Command:
awk '/pattern/ {print $0}' filename
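To make the three command forms concrete, the sketch below applies the same anchored pattern with each tool to a small, made-up log. The log lines and variable names are hypothetical, and the awk call additionally assumes a field separator of a colon followed by a space.

```shell
# Hypothetical log excerpt.
log=$(printf '%s\n' 'error: disk full' 'info: ok' 'error: timeout')

# grep selects lines: count those beginning with "error".
grep_out=$(printf '%s\n' "$log" | grep -c '^error')

# sed rewrites text: replace a leading "error" with "ERROR",
# passing all other lines through unchanged.
sed_out=$(printf '%s\n' "$log" | sed 's/^error/ERROR/')

# awk selects lines and picks fields: with ": " as the separator,
# print the message part of each matching line.
awk_out=$(printf '%s\n' "$log" | awk -F': ' '/^error/ {print $2}')

printf '%s\n' "grep: $grep_out" "sed:" "$sed_out" "awk:" "$awk_out"
```

Each tool reads the regex between its own delimiters: quotes for grep, slashes inside the s command for sed, and slashes before the action block for awk.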
Regular expressions are a fundamental tool for text processing in the shell. While the basics covered here are common across many tools, there are some variations and extensions depending on the specific command or programming language being used. Practice and experimentation will help you become more comfortable and proficient with regular expressions.