As a software developer, you've probably encountered regular expressions several times and been confused by a daunting set of characters grouped together like this:
/^\w+([.-]?\w)+@\w+([.]?\w)+(.[a-zA-Z]{2,3})+$/
And you may have wondered what this gibberish means… Regular expressions (Regex or Regexp) are extremely useful for stepping up your algorithm game and will make you a better problem solver. The structure of regular expressions can be intimidating at first, but it is rewarding once you grasp the patterns and apply them properly in your work.
A Regular Expression, commonly referred to as Regex, is a powerful tool used for searching, validating, and manipulating text strings. It is essentially a sequence of characters that defines a search pattern. Regex is supported in numerous programming languages, including scripting languages like Perl, Python, PHP, and JavaScript, as well as general-purpose languages such as Java. Even word processors, such as Microsoft Word, support regex-like wildcard patterns for advanced text searching. The true strength of regex lies in its ability to perform complex pattern matching and text manipulation tasks with concise expressions, often replacing dozens of lines of traditional programming code.
Regex syntax consists of a sequence of characters, metacharacters, and quantifiers.
Metacharacters are special characters that define specific operations or behaviors in regex. Some commonly used metacharacters include:
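A few widely used metacharacters are the dot, the ^ and $ anchors, the | alternation, and the \ escape. The sketch below is a minimal illustration in JavaScript; the sample strings and property names are made up for this example.

```javascript
// Each entry exercises one common metacharacter.
const samples = {
  dot: /c.t/.test("cat"),                // . matches any single character except a line break
  startAnchor: /^cat/.test("catalog"),   // ^ anchors the match to the start of the string
  endAnchor: /cat$/.test("bobcat"),      // $ anchors the match to the end of the string
  alternation: /cat|dog/.test("hotdog"), // | matches either alternative
  escaped: /3\.14/.test("3x14"),         // \. is a literal dot, so "3x14" does not match
};

console.log(samples);
// { dot: true, startAnchor: true, endAnchor: true, alternation: true, escaped: false }
```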
A character class is a set of characters that can be matched by a regex pattern. Character classes are defined using square brackets [] and can contain a list of characters, a range of characters, or a combination of both.
For example:
Character classes can also be negated by using the caret symbol ^ at the beginning of the class. For instance:
This flexibility allows you to create specific and targeted search patterns to suit various use cases.
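As a rough illustration of lists, ranges, and negation, the following sketch runs three character classes against a made-up string (the variable names are assumptions for this example):

```javascript
const text = "Room 42b";

// A list of characters: any one of the vowels.
const vowels = text.match(/[aeiou]/g);          // ["o", "o"]

// Two ranges combined: any lowercase letter or digit.
const lowerOrDigit = text.match(/[a-z0-9]/g);   // ["o", "o", "m", "4", "2", "b"]

// A negated class: any character that is NOT a digit.
const notDigits = text.match(/[^0-9]/g);        // ["R", "o", "o", "m", " ", "b"]

console.log(vowels, lowerOrDigit, notDigits);
```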
A word character in regex includes letters, digits, and underscores (_). These characters are matched using the shorthand \w. For example, \w+ matches one or more word characters, which can be helpful for finding words, variable names, or identifiers in text.
The opposite of a word character is a non-word character, which can be matched using the shorthand \W. For example, \W matches any character that is not a letter, digit, or underscore, such as punctuation or spaces.
This distinction between word and non-word characters is crucial for creating accurate and efficient regex patterns.
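To see the difference in practice, here is a small sketch that splits a hypothetical line of code into word and non-word characters:

```javascript
const line = "let user_id = 42;";

// \w+ grabs runs of letters, digits, and underscores.
const words = line.match(/\w+/g);    // ["let", "user_id", "42"]

// \W grabs every single character that is not a word character.
const nonWords = line.match(/\W/g);  // [" ", " ", "=", " ", ";"]

console.log(words, nonWords);
```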
Quantifiers are used to specify how many times a pattern should appear within the text being matched. The most common quantifiers are:
These quantifiers allow you to control the repetition of patterns in your regex, which is essential when you need to match repeating sequences or optional elements.
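The quantifiers usually meant here are *, +, ?, {n}, and {n,m}. The sketch below shows one case for each; the test strings and variable names are invented for illustration.

```javascript
// * : zero or more of the preceding item
const star = /^ab*c$/.test("ac");          // true, zero b's is allowed

// + : one or more of the preceding item
const plus = /^ab+c$/.test("ac");          // false, at least one b is required

// ? : zero or one of the preceding item
const optional = /^colou?r$/.test("color"); // true, the u is optional

// {n} : exactly n repetitions
const exact = /^\d{4}$/.test("2024");      // true, exactly four digits

// {n,m} : between n and m repetitions
const range = /^\d{2,3}$/.test("1234");    // false, four digits is too many

console.log(star, plus, optional, exact, range);
```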
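Groups are used to capture parts of a match for reuse or extraction. Groups are defined by enclosing a pattern in parentheses (). For example, (abc) captures the string 'abc', which can then be referenced in the same regex using \1 (this refers to the first captured group).
There are two types of groups: capturing groups, written (...), which store the text they match, and non-capturing groups, written (?:...), which group part of a pattern without storing it.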
This capability to group and reference parts of a match adds a powerful layer of flexibility to your regex patterns, especially in complex matching scenarios.
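A minimal sketch of capturing groups, a backreference, and a non-capturing group follows. The date string and variable names are assumptions chosen for the example.

```javascript
// \1 must repeat exactly what the first group captured.
const repeated = /(ab)\1/.test("abab");      // true
const notRepeated = /(ab)\1/.test("abba");   // false

// Each pair of parentheses stores its part of the match.
const date = "2024-05-17".match(/(\d{4})-(\d{2})-(\d{2})/);
const year = date[1];   // "2024"
const month = date[2];  // "05"
const day = date[3];    // "17"

// (?:...) groups without capturing, so the month becomes group 1.
const nonCapturing = "2024-05".match(/(?:\d{4})-(\d{2})/);

console.log(repeated, notRepeated, year, month, day, nonCapturing[1]);
```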
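Flags modify the behavior of a regular expression and can be added after the closing slash or as the second parameter in the RegExp constructor. The most commonly used flags are g (global), i (case-insensitive), and m (multiline).
Example of using flags:
const regex = /pattern/gim;
In this example, the regex will find all matches instead of stopping at the first one (g), ignore letter case (i), and let ^ and $ match at the start and end of each line (m).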
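To make the combined effect of g, i, and m concrete, here is a small sketch on a made-up three-line string:

```javascript
const text = "Cat\ncat\nCATalog";

// g: return every match, i: ignore case, m: ^ matches at the start of each line.
const withFlags = text.match(/^cat/gim);  // ["Cat", "cat", "CAT"]

// Without flags the pattern is case-sensitive and ^ only matches at the very
// start of the string, where the text is "Cat", so nothing matches.
const withoutFlags = text.match(/^cat/);  // null

console.log(withFlags, withoutFlags);
```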
There are two ways to create a regular expression: a regular expression literal and the RegExp constructor.
To create a regular expression literal, enclose the Regex pattern between forward slashes (/). Syntax:
/regex pattern/flags
The RegExp constructor is a function that builds the expression for you from a pattern and optional flags. Syntax:
new RegExp(regex pattern[, flags])
If your regular expression is constant and does not change, use the regex literal for better performance, since it is compiled when the script is loaded. When the pattern is dynamic (for example, built from a variable or user input), use the RegExp constructor, which compiles the pattern at runtime.
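The two approaches can be compared side by side. In this sketch, searchTerm stands in for a value that is only known at runtime; it and the other names are hypothetical.

```javascript
// Literal: the pattern is fixed in the source code.
const literal = /cat/i;

// Constructor: the pattern is built from a string at runtime.
const searchTerm = "cat"; // hypothetical runtime value
const dynamic = new RegExp(searchTerm, "i");

const literalResult = literal.test("Cat");  // true
const dynamicResult = dynamic.test("CAT");  // true

// Inside a string, backslashes must be doubled: "\\d+" is the pattern \d+.
const digits = new RegExp("\\d+");
const firstNumber = "abc123".match(digits)[0]; // "123"

console.log(literalResult, dynamicResult, firstNumber);
```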
There are three common Regex methods that you should be familiar with: test, match, and replace.
Let's look at an example of the test method.
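A minimal sketch, using a made-up pattern and strings:

```javascript
const regex = /hello/;

const hasHello = regex.test("hello world"); // true, the pattern occurs in the string
const hasNoHello = regex.test("goodbye");   // false, the pattern does not occur

console.log(hasHello, hasNoHello);
```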
In the example above, the .test method returns a boolean: true if the string contains a match for the search pattern, and false if it does not.
Instead of RegExp.test(String), which only returns a boolean, you can use the .match method when you need the matched text itself. Called on a string with a regex, it returns an array containing the matched string, or null if nothing matched. That array can be helpful information depending on your use case.
Here is a very basic example below. Later on, you will see how Regex match can be a powerful tool when combining the Regex with flags.
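One way such a basic example might look, with an invented sentence and pattern:

```javascript
const sentence = "The rain in Spain";

// Without the g flag, match returns the first match plus some extra details.
const result = sentence.match(/ain/);
const matchedText = result[0];     // "ain"
const matchedAt = result.index;    // 5, the position of the first "ain"

// When nothing matches, the result is null rather than an empty array.
const none = sentence.match(/xyz/); // null

console.log(matchedText, matchedAt, none);
```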
The .replace method searches a string for a specified value (or regular expression) and returns a new string where the specified value is replaced.
NOTE:
With a plain string as the search value, .replace only replaces the first occurrence. To replace every occurrence, use a Regex with the g flag. The example below is using a regular value.
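The sketch below contrasts a plain string value with a Regex on the same made-up phrase:

```javascript
const phrase = "red car, red bike";

// A plain string value only replaces the first occurrence.
const firstOnly = phrase.replace("red", "blue");  // "blue car, red bike"

// A regex with the g flag replaces every occurrence.
const all = phrase.replace(/red/g, "blue");       // "blue car, blue bike"

console.log(firstOnly);
console.log(all);
```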
Inside a bracket expression, you list the characters or ranges of characters that make up the character set.
For example, const regex = /[A-Z]/. Notice that A-Z is inside the square brackets. This will match any single uppercase letter in the alphabet. Here are some similar search patterns:
*At the start of a character set, the ^ character negates the set, so [^a-zA-Z] matches all the characters that are NOT in a-z or A-Z.
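A few similar bracket patterns, sketched against a short made-up string:

```javascript
const sample = "Hi5!";

const upper = sample.match(/[A-Z]/g);        // ["H"], uppercase letters
const lower = sample.match(/[a-z]/g);        // ["i"], lowercase letters
const digit = sample.match(/[0-9]/g);        // ["5"], digits
const nonLetter = sample.match(/[^a-zA-Z]/g); // ["5", "!"], anything that is not a letter

console.log(upper, lower, digit, nonLetter);
```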
After the closing slash, we can add one flag or combine several. Flags tell Regex more specifically how to find and match the pattern.
Before we go into the specific flags, keep in mind that flags are optional, as in the example below.
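A sketch of a flag-free match follows. The sentence is a stand-in chosen for this example, not necessarily the one originally used.

```javascript
const sentence = "The quick Brown fox"; // hypothetical sample sentence
const regex = /[A-Z]/;                  // no flags after the closing slash

// Without the g flag, match stops at the first uppercase letter.
const result = sentence.match(regex);
const firstUppercase = result[0];       // "T"

console.log(firstUppercase);
```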
Without flags, .match returns an array containing only the first match for the pattern within the slashes. So in this case, our code will return ['T'] because that is the first uppercase letter in the sentence.
The g flag stands for "global", which means the search continues through the entire string. In other words, it will not stop after the first match, but return ALL the occurrences that matched.
If we added the g flag after the closing slash, it would return every uppercase character in the string.
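Here is the same idea as a sketch, again with a hypothetical sentence:

```javascript
const sentence = "The quick Brown fox"; // hypothetical sample sentence

// With g, match returns every uppercase letter instead of just the first.
const allUppercase = sentence.match(/[A-Z]/g); // ["T", "B"]

console.log(allUppercase);
```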
Let's say we changed the regex to const regex = /[a-z]/m. The m flag stands for "multiline" and only changes how ^ and $ behave, so it has no effect on this pattern. Because there is no g flag, the match returns only the first lowercase letter from a-z: ['h'].
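As an additional side note, there are three shorthand character classes that can help when using multiple character sets for pattern matching: \d matches any digit, \w matches any word character (a letter, digit, or underscore), and \s matches any whitespace character.
The negations of \d, \w, and \s are \D, \W, and \S. They match any non-digit, any non-word character, and any non-whitespace character, respectively.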
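The shorthand classes and their negations can be tried on a short made-up string:

```javascript
const entry = "Order 66 shipped";

const digits = entry.match(/\d/g);       // ["6", "6"]
const spaces = entry.match(/\s/g);       // [" ", " "]
const nonDigits = entry.match(/\D+/g);   // ["Order ", " shipped"]
const nonSpaces = entry.match(/\S+/g);   // ["Order", "66", "shipped"]

console.log(digits, spaces, nonDigits, nonSpaces);
```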
Quantifiers are symbols in regular expressions that specify how many times the preceding item must appear.
Let’s go through this example to demonstrate our understanding of quantifiers.
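A sketch of such an example is shown here. The input string is an assumption, picked so that it produces the words discussed in this section.

```javascript
const str = "for if rof fi"; // assumed input string
const regex = /[a-z]+/g;     // one or more lowercase letters, all occurrences

const found = str.match(regex);
console.log(found); // ["for", "if", "rof", "fi"]
```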
You can see that the regular expression checks for lowercase letters from a-z and uses the + symbol to match one or more of them in a row. So when you console log found, it will return ['for', 'if', 'rof', 'fi'].
Let's say that + symbol was not there and the Regex was only /[a-z]/g.
Then it will return ['f', 'o', 'r', 'i', 'f', 'r', 'o', 'f', 'f', 'i'], because each letter is now matched on its own.
Remember this long string of characters we saw at the beginning of this article?
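For reference, here is that pattern reassembled from the pieces discussed in the breakdown, as a runnable sketch. The variable names and the extra test addresses are assumptions added for illustration.

```javascript
// Pattern reassembled from the pieces covered in the walkthrough.
const emailRegex = /^\w+([.-]?\w)+@\w+([.]?\w)+(.[a-zA-Z]{2,3})+$/;

const valid = emailRegex.test("student-id@alumni.school.edu");          // true
const doubleHyphen = emailRegex.test("student--id@alumni.school.edu");  // false, two hyphens in a row
const noAtSign = emailRegex.test("not-an-email");                       // false, no @ character

console.log(valid, doubleHyphen, noAtSign);
```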
Now that we have learned the basic methods and terminologies used in Regex, let’s break down this once daunting but now understandable string of characters one step at a time.
First, let's take a look at this Regex piece by piece. At the beginning of the pattern, we have ^\w+. The ^ character anchors the match to the start of the string, and the \w shorthand checks for a word character (a letter, digit, or underscore). The + quantifier matches one or more of the preceding item. In our example, this first piece matches the 'student' characters from the email: student-id@alumni.school.edu
Next, the second piece of the Regex is ([.-]?\w)+. The parentheses form the first capturing group, which contains a character set that matches either a "." or a "-" character. The ? quantifier matches zero or one of the preceding item, so each repetition of the group allows at most one "-" or "." and then requires a word character (\w). As a result, two of those separators cannot appear consecutively in a valid email. This second piece matches the '-id' characters from the email example. If the address were 'student--id@alumni.school.edu' with two hyphens, it would be rejected as invalid.
The third piece is @\w+, which matches the @ character followed by one or more word characters. This covers the '@alumni' piece of the email.
The following piece, ([.]?\w)+, is the same search pattern as our second piece except that its character set only allows the "." character, not the "-" symbol. This represents ".school" in the email.
The next chunk, (.[a-zA-Z]{2,3})+, is a crucial piece in checking an email format. It targets the top-level domain (TLD) of an email address, the part of a domain that comes after the last dot, such as com, org, or net. The piece is meant to match a "." character followed by letters from a character set covering lowercase and uppercase letters. Note that an unescaped . outside a character set matches any character, so a stricter pattern would use \. to require a literal dot. The {2,3} quantifier matches between 2 and 3 of the preceding item, where 2 is the minimum number of repetitions and 3 is the maximum, so the TLD can only be 2 or 3 letters long. In this case, it is '.edu'.
Finally, the $ character anchors the match to the end of the string.
And that's it! Now we know how to use Regex for basic email validation. Additionally, you can adjust the character sets, flags, and quantifiers in your Regex to accommodate other edge cases not considered in our Regex string.
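One such edge case is the dot in the TLD piece: written as a bare . it matches any character, so an address without any dot in its domain can slip through. The sketch below compares the pattern from this article with a variant that escapes the dot. The name strictEmailRegex and the sample address are assumptions made for this example, and neither pattern is a complete email validator.

```javascript
// Pattern from the article: the . in the last group matches ANY character.
const emailRegex = /^\w+([.-]?\w)+@\w+([.]?\w)+(.[a-zA-Z]{2,3})+$/;

// Hypothetical stricter variant: \. only matches a literal dot.
const strictEmailRegex = /^\w+([.-]?\w)+@\w+([.]?\w)+(\.[a-zA-Z]{2,3})+$/;

const noDotDomain = "a1@bcdef"; // no dot after the @ at all

const looseAccepts = emailRegex.test(noDotDomain);        // true, "d" is consumed by the bare .
const strictAccepts = strictEmailRegex.test(noDotDomain); // false, a literal dot is required

// The original example address is still accepted by the stricter pattern.
const stillValid = strictEmailRegex.test("student-id@alumni.school.edu"); // true

console.log(looseAccepts, strictAccepts, stillValid);
```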
*$ means zero or more of the preceding character at the end of a string.
Regex is a tool used to search, match, and manipulate text patterns in a string, like finding specific words or validating inputs.
Start by learning basic patterns (like \d for digits, \w for words) and practice common use cases, such as searching, matching, and replacing text using regex tools or code.
Break it down into parts: start by identifying literals, metacharacters, quantifiers, and groups. Learn what each piece matches to understand the overall pattern.
.+) matches one or more of any character, with a closing parenthesis often indicating a group.
$1 and $2 refer to the first and second captured groups in a regex match. They are typically used in replacement strings.
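As a small sketch of how this works in JavaScript's .replace method (the name string is made up):

```javascript
const name = "Lovelace, Ada";

// Group 1 captures the last name, group 2 the first name.
// In the replacement string, $2 and $1 insert those captures in swapped order.
const swapped = name.replace(/(\w+), (\w+)/, "$2 $1");

console.log(swapped); // "Ada Lovelace"
```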
The tilde ~ isn't a special character in regex, but some languages use it around regex patterns: Perl applies a pattern to a string with its =~ operator, and PHP patterns are often delimited with ~.
% is not special in regex unless used in specific languages. It generally matches itself.
\+ matches the literal + character since + is normally a quantifier for "one or more".
The underscore _ is treated as a literal character and matches itself.
\s matches any whitespace character (spaces, tabs, line breaks).
Regular expressions are an essential tool for developers, offering a powerful way to search, validate, and manipulate text efficiently. Whether you're performing input validation, searching for patterns in logs, or parsing complex data formats like dates and URLs, mastering regex will greatly enhance your problem-solving skills. As you continue to explore its capabilities, you'll find regex invaluable for automating repetitive tasks, filtering large datasets, and handling diverse text-processing challenges across various programming languages and environments. Regex isn't just about matching text—it's about streamlining tasks and writing cleaner, more efficient code.