Rabi Siddique
Regular Expressions
2024-06-24

Regular expressions, commonly known as regex, are sequences of characters that define search patterns. They are a powerful tool for string matching and manipulation, and they are essential for tasks like data validation, text parsing, string substitution, and log analysis. Regex lets developers handle complex text processing with concise, flexible patterns.

Matching if a String Starts with Certain Characters

To check if a string starts with specific characters, use the caret symbol ^.

^abc

The ^ at the start asserts the position at the beginning of the string. This pattern matches any string that starts with abc.

echo "abcdef" | grep -E "^abc"  # This will match
echo "xyzabc" | grep -E "^abc"  # This will not match

Matching if a String Ends with Certain Characters

To check if a string ends with specific characters, use the dollar sign $.

abc$

The $ at the end asserts the position at the end of the string. This pattern matches any string that ends with abc.

echo "xyzabc" | grep -E "abc$"  # This will match
echo "abcdef" | grep -E "abc$"  # This will not match
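
In a script, the exit status of grep usually matters more than the line it prints. The following is a minimal sketch, assuming a POSIX shell; the helper name matches_pattern is made up for illustration and is not part of grep:

```shell
# matches_pattern is a hypothetical helper, not a grep feature.
# It succeeds (exit status 0) when the string in $1 matches the pattern in $2.
matches_pattern() {
  # -q suppresses output, so only the exit status is used
  printf '%s\n' "$1" | grep -qE "$2"
}

if matches_pattern "abcdef" '^abc'; then
  echo "abcdef starts with abc"
fi

if ! matches_pattern "abcdef" 'abc$'; then
  echo "abcdef does not end with abc"
fi
```

The patterns are written in single quotes here so that the shell never tries to interpret the $ itself.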


Validating if it’s the Correct Wallet Address

Consider wallet addresses that look like this: agoric124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ngs. If you look closely, the wallet address starts with agoric1 and is followed by 38 characters. We can validate this address by using the following regex:

^agoric1[a-z0-9]{38}$

^ asserts the position at the start of the string, agoric1 matches the literal string agoric1, [a-z0-9]{38} matches exactly 38 characters that can be lowercase letters (a-z) or digits (0-9), and $ asserts the position at the end of the string.

echo agoric124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ngs | grep -E "^agoric1[a-z0-9]{38}$"

If we need to handle uppercase letters as well, we can modify the pattern to include both lowercase and uppercase letters:

echo agoric124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ngs | grep -E "^agoric1[a-zA-Z0-9]{38}$"

If we are not concerned with the specific characters (whether they are letters or digits), we can simplify our regex to match any character:

echo agoric124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ngs | grep -E "^agoric1.{38}$"
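
To see the strict pattern reject malformed input, it can be wrapped in a small function and run against a few candidates. This is only a sketch: is_valid_wallet is a hypothetical helper name, and the second and third addresses are made-up variations of the example above (one character short, and a different prefix).

```shell
# is_valid_wallet is a hypothetical helper name used only for this sketch.
# It succeeds when $1 is agoric1 followed by exactly 38 lowercase letters or digits.
is_valid_wallet() {
  printf '%s\n' "$1" | grep -qE '^agoric1[a-z0-9]{38}$'
}

# The first address is the one from the text; the other two are made-up bad inputs.
for address in \
  "agoric124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ngs" \
  "agoric124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ng" \
  "cosmos124u437uxnvtx4lyk4rxqz85mq3x0x0djjk6ngs"
do
  if is_valid_wallet "$address"; then
    echo "valid:   $address"
  else
    echo "invalid: $address"
  fi
done
```

Only the first address is reported as valid. The second fails because {38} demands exactly 38 characters after the prefix, and the third fails because ^agoric1 does not match its prefix.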

Email Verification Regex

To verify Gmail addresses using regex, you can use the following regular expression. It checks the overall shape of the address, a permitted local part followed by @gmail.com, rather than every rule Gmail applies to usernames:

^[a-zA-Z0-9._%+-]+@gmail\.com$

Breakdown of this regex pattern:

  • ^ asserts the position at the start of the string.
  • [a-zA-Z0-9._%+-]+ matches one or more characters that can be lowercase or uppercase letters (a-zA-Z), digits (0-9), dots (.), underscores (_), percent signs (%), plus signs (+), or hyphens (-).
  • @ matches the literal “@” character.
  • gmail matches the literal “gmail”.
  • \. matches the literal dot character.
  • com matches the literal “com”.
  • $ asserts the position at the end of the string.
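
The breakdown above can be tried out with grep in the same way as the earlier examples. This is a sketch under a few assumptions: is_gmail_address is a hypothetical helper name, and all four addresses are made up for illustration.

```shell
# is_gmail_address is a hypothetical helper name; the addresses below are made-up examples.
is_gmail_address() {
  printf '%s\n' "$1" | grep -qE '^[a-zA-Z0-9._%+-]+@gmail\.com$'
}

for email in \
  "jane.doe+news@gmail.com" \
  "jane.doe@gmailxcom" \
  "jane.doe@yahoo.com" \
  "@gmail.com"
do
  if is_gmail_address "$email"; then
    echo "match:    $email"
  else
    echo "no match: $email"
  fi
done
```

Only the first address matches. The second shows why the dot is escaped: \. accepts a literal dot only, so gmailxcom is rejected. The last one fails because + requires at least one character before the @.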

Finding the Number of Functions in Code

The regex pattern function( ?\w* ?)\( is used to match function declarations in code. Let’s break it down:

function: This part of the pattern matches the literal string “function”.

( ?\w* ?): This part is a capturing group, meaning it will capture whatever it matches for later reference. Within this group:

  • The first “ ?” (a space followed by ?): This matches zero or one space character.
  • \w*: This matches zero or more word characters (letters, digits, and underscores).
  • The second “ ?”: This matches zero or one space character.

\(: This part comes after the group and matches the literal opening parenthesis (.
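
To turn the pattern into an actual count, one option is to let grep print every match on its own line and count the lines. The sketch below assumes a grep that supports \w in -E mode (GNU grep does), and the JavaScript snippet stored in the code variable is made up for illustration.

```shell
# Made-up JavaScript sample: three uses of the function keyword and one arrow function.
code='function add(a, b) { return a + b; }
const double = function (x) { return x * 2; };
function greet (name) { console.log(name); }
const square = (x) => x * x;'

# -o prints each match on its own line, so wc -l counts matches rather than input lines.
# tr strips the padding that some wc implementations add around the number.
count=$(printf '%s\n' "$code" | grep -oE 'function( ?\w* ?)\(' | wc -l | tr -d ' ')

echo "functions found: $count"
```

This prints functions found: 3. The named functions and the anonymous function expression are counted, while the arrow function is not, because the pattern requires the literal word function.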

https://rabisiddique.com/posts/regex/