What Is a Regex? A Beginner's Guide to Regular Expressions
If you have ever used "Find & Replace" in a text editor and wondered how to match any email address, not just one specific address, you were wishing for regular expressions.
A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Regex support is built into most programming languages, many text editors, and command-line tools such as grep and sed. Learning even the basics unlocks a superpower: the ability to find, validate, and transform text with surgical precision.
What Can You Do With Regex?
Regular expressions are used everywhere:
- Validation: Check that a form field contains a valid email, phone number, or ZIP code.
- Search: Find all occurrences of a pattern in a log file or source code.
- Extraction: Pull specific data (dates, URLs, prices) out of unstructured text.
- Substitution: Replace all instances of one pattern with another in a single operation.
- Parsing: Break a structured string (like a CSV row or HTTP header) into its component parts.
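As a rough illustration of these uses, the sketch below runs four of them in JavaScript. The log line, the CSV row, and all variable names are invented for the example, and the comma split is deliberately naive: it would not cope with quoted CSV fields. The pattern syntax itself is explained in the sections that follow.

```javascript
// Hypothetical sample text, made up for this illustration.
const logLine = "2024-03-15 ERROR disk /dev/sda1 at 91%, /dev/sdb1 at 47%";

// Validation: does the line begin with a date-shaped prefix?
const startsWithDate = /^\d{4}-\d{2}-\d{2}/.test(logLine);

// Search / extraction: collect every percentage in the line.
const percentages = logLine.match(/\d+%/g);

// Substitution: mask every device path in one operation.
const masked = logLine.replace(/\/dev\/\w+/g, "<device>");

// Parsing: split a simple comma-separated row, ignoring spaces after commas.
const fields = "alice, 42,London".split(/,\s*/);

console.log(startsWithDate); // true
console.log(percentages);    // [ '91%', '47%' ]
console.log(masked);         // 2024-03-15 ERROR disk <device> at 91%, <device> at 47%
console.log(fields);         // [ 'alice', '42', 'London' ]
```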
Core Regex Syntax
Regex patterns look cryptic at first, but they follow a small set of rules. Here are the building blocks:
Literal Characters
The simplest pattern is a literal match. The regex cat matches any string containing the sequence "cat": "concatenate", "category", or "cat" itself.
The Dot .
A dot matches any single character except a newline. The pattern c.t matches "cat", "cut", "c3t", or "c@t".
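A quick way to check this is to run c.t against a handful of strings. The sketch below is JavaScript, and the list of candidates is made up for the example.

```javascript
const pattern = /c.t/;

// The first four have one ordinary character between "c" and "t".
// The last two fail: "ct" has nothing between the letters, and in "c\nt"
// the character in between is a newline, which the dot does not match.
const candidates = ["cat", "cut", "c3t", "c@t", "ct", "c\nt"];
const results = candidates.map((s) => pattern.test(s));

console.log(results); // [ true, true, true, true, false, false ]
```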
Character Classes [...]
Square brackets define a set of allowed characters. [aeiou] matches any single vowel. [0-9] matches any digit. You can negate a class with a caret: [^0-9] matches anything that is not a digit.
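The three classes above can be tried out as follows. This is a small JavaScript sketch with invented sample strings; the g flag, which asks for every match, is covered later under Flags.

```javascript
// [aeiou]: every vowel, in order of appearance.
const vowels = "regular expression".match(/[aeiou]/g);

// [0-9]: every digit.
const digits = "Room 42B".match(/[0-9]/g);

// [^0-9]: delete everything that is not a digit.
const digitsOnly = "Room 42B".replace(/[^0-9]/g, "");

console.log(vowels.join("")); // euaeeio
console.log(digits);          // [ '4', '2' ]
console.log(digitsOnly);      // 42
```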
Shorthand Classes
These are abbreviations for common character classes:
- \d: any digit (equivalent to [0-9])
- \w: any word character (letters, digits, and underscore; equivalent to [a-zA-Z0-9_])
- \s: any whitespace (space, tab, newline)
- \D, \W, \S: the uppercase versions match the opposite
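To see the shorthands side by side, the JavaScript sketch below applies each one to the same made-up string. The + after each class means "one or more" and is explained under Quantifiers.

```javascript
// Invented sample: note the tab between "in" and "at".
const sample = "user_42 logged in\tat 09:15";

const digitRuns = sample.match(/\d+/g);      // runs of digits
const words = sample.match(/\w+/g);          // runs of word characters
const tokens = sample.split(/\s+/);          // split on any whitespace
const onlyDigits = sample.replace(/\D/g, ""); // drop every non-digit

console.log(digitRuns);  // [ '42', '09', '15' ]
console.log(words);      // [ 'user_42', 'logged', 'in', 'at', '09', '15' ]
console.log(tokens);     // [ 'user_42', 'logged', 'in', 'at', '09:15' ]
console.log(onlyDigits); // 420915
```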
Quantifiers
Quantifiers control how many times the preceding element must appear:
- *: zero or more times
- +: one or more times
- ?: zero or one time (makes the element optional)
- {3}: exactly 3 times
- {2,5}: between 2 and 5 times
So \d{4} matches exactly four digits, which is useful for matching a year.
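Each quantifier can be exercised with a tiny test, as in the JavaScript sketch below. The sample strings are invented, and most patterns are wrapped in ^ and $ (see Anchors) so that the whole string has to fit.

```javascript
// ?: the "u" is optional, so both spellings pass, but two "u"s do not.
const optional = ["color", "colour", "colouur"].map((s) => /^colou?r$/.test(s));

// *: zero "b"s is acceptable. +: at least one "b" is required.
const zeroOrMore = /^ab*c$/.test("ac");
const oneOrMore = /^ab+c$/.test("ac");

// {4}: pick out four-digit runs such as years.
const years = "Founded in 1998, rebuilt in 2015".match(/\d{4}/g);

// {2,5}: between two and five digits, inclusive.
const lengths = ["1", "12", "12345", "123456"].map((s) => /^\d{2,5}$/.test(s));

console.log(optional);   // [ true, true, false ]
console.log(zeroOrMore); // true
console.log(oneOrMore);  // false
console.log(years);      // [ '1998', '2015' ]
console.log(lengths);    // [ false, true, true, false ]
```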
Anchors
Anchors match a position rather than a character:
- ^: start of the string (or start of a line in multiline mode)
- $: end of the string (or end of a line in multiline mode)
The pattern ^\d{5}$ matches a string that consists of exactly five digits and nothing else: the basic US ZIP code format.
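The effect of the anchors is easiest to see by comparison with the unanchored pattern. The JavaScript sketch below uses invented inputs.

```javascript
const zip = /^\d{5}$/;

// Only the first input is five digits with nothing before or after.
const inputs = ["90210", "9021", "902101", "90210-1234", " 90210"];
const anchored = inputs.map((s) => zip.test(s));

// Without anchors, five digits anywhere in the text are enough.
const unanchored = /\d{5}/.test("order 902101 shipped");

console.log(anchored);   // [ true, false, false, false, false ]
console.log(unanchored); // true
```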
Groups and Alternation
Parentheses (...) create a capturing group, which lets you extract or reference part of a match. The pipe | means "or": (cat|dog) matches either "cat" or "dog".
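A short JavaScript sketch of both ideas, using invented sample strings: the first part reads a captured group out of a match, and the second refers back to groups in a replacement, where $1 and $2 stand for the first and second group.

```javascript
// Alternation: the leftmost alternative found in the text wins.
const pet = "I have a dog and a cat".match(/(cat|dog)/);
const wholeMatch = pet[0]; // the full match
const firstGroup = pet[1]; // what the parentheses captured

// Referencing groups: swap "Surname, Given" into "Given Surname".
const swapped = "Lovelace, Ada".replace(/(\w+), (\w+)/, "$2 $1");

console.log(wholeMatch, firstGroup); // dog dog
console.log(swapped);                // Ada Lovelace
```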
A Practical Example
Say you want to validate a simple email address. Here is a basic regex pattern:
^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$
Breaking it down:
- ^: must start at the beginning of the string
- [\w.+-]+: one or more word characters, dots, plus signs, or hyphens (the local part before @)
- @: a literal @
- [\w-]+: one or more word characters or hyphens (the domain name)
- \.: a literal dot (escaped because . alone means "any character")
- [a-zA-Z]{2,}: two or more letters (the TLD: .com, .org, .io, etc.)
- $: must end at the end of the string
This is a simplified pattern, and real-world email validation is far more complex, but it captures the logic well.
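To get a feel for what the pattern accepts and rejects, the JavaScript sketch below runs it against a few made-up addresses. The last case shows one of its limits: because the domain part allows no dots, an address on a subdomain is rejected even though it is perfectly valid.

```javascript
const emailPattern = /^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/;

// Invented addresses for illustration only.
const samples = [
  "jane.doe+news@example.com", // accepted
  "user@example",              // rejected: no dot and TLD
  "@example.com",              // rejected: empty local part
  "user@mail.example.com",     // rejected: the domain part cannot contain a dot
];
const verdicts = samples.map((s) => emailPattern.test(s));

console.log(verdicts); // [ true, false, false, false ]
```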
Flags (Modifiers)
Most regex engines support flags that change matching behaviour:
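- i (case-insensitive): cat matches "Cat", "CAT", and "cAt"
- g (global): find all matches, not just the first one
- m (multiline): ^ and $ match the start and end of each line, not just of the whole string
In JavaScript, for example, /pattern/gi uses both the global and case-insensitive flags. Other languages expose the same options differently: Python, for instance, takes re.IGNORECASE and re.MULTILINE as arguments and uses re.findall in place of a global flag.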
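The JavaScript sketch below applies different flag combinations to the same three-line string, which is invented for the example, so the effect of each flag can be compared directly.

```javascript
// Three lines, each starting with a different capitalisation of "cat".
const text = "Cat\ncat\nCAT scan";

const plain = text.match(/cat/)[0];       // case-sensitive, first match only
const insensitive = text.match(/cat/i)[0]; // first match, any case
const all = text.match(/cat/gi);           // every match, any case
const startOnly = text.match(/^cat/gi);    // ^ means start of the whole string
const eachLine = text.match(/^cat/gim);    // ^ means start of every line

console.log(plain);       // cat
console.log(insensitive); // Cat
console.log(all);         // [ 'Cat', 'cat', 'CAT' ]
console.log(startOnly);   // [ 'Cat' ]
console.log(eachLine);    // [ 'Cat', 'cat', 'CAT' ]
```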
Greedy vs. Lazy Matching
By default, quantifiers are greedy: they match as much text as possible. Given the string <b>bold</b> and <i>italic</i>, the pattern <.+> (greedy) matches the entire string from the first < to the last >.
Adding a ? after the quantifier makes it lazy (match as little as possible): <.+?> matches <b> alone, then </b>, <i>, and </i> as separate matches, which is usually what you want when picking tags out of HTML-like text.
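The difference can be reproduced with the same sample string. In the JavaScript sketch below, the g flag is added so that every match is returned.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ runs to the last ">" it can reach, so there is one big match.
const greedy = html.match(/<.+>/g);

// Lazy: .+? stops at the first ">" it can, so each tag is its own match.
const lazy = html.match(/<.+?>/g);

console.log(greedy); // [ '<b>bold</b> and <i>italic</i>' ]
console.log(lazy);   // [ '<b>', '</b>', '<i>', '</i>' ]
```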
Frequently Asked Questions
Is regex the same in every language?
The core syntax is very consistent, but there are dialect differences. Most modern languages (JavaScript, Python, Java, PHP) use Perl-style syntax: PHP uses the PCRE library itself, while the others ship their own engines with broadly similar syntax. Some older tools like grep default to POSIX regex, which lacks some features like \d. When in doubt, test your pattern in the specific language you are using.
Are regex patterns case-sensitive by default?
Yes. Cat does not match "cat" unless you use the i flag.
Can regex parse HTML or JSON?
Technically yes for simple cases, but it is a well-known trap. HTML and JSON are nested structures that regex cannot reliably handle in general. Use a proper parser for those formats; use regex for flat, line-oriented, or known-structure text.
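A small JavaScript sketch of the trap, using an invented JSON document with the same key at two nesting levels. The regex has no notion of nesting, so it returns whichever "name" appears first in the text, while the parser returns the one that was actually asked for.

```javascript
// Made-up document: "name" appears inside "user" and at the top level.
const json = '{"user": {"name": "Ada", "tags": ["a", "b"]}, "name": "outer"}';

// Regex: grabs the first "name" in the text, regardless of nesting level.
const viaRegex = json.match(/"name":\s*"([^"]*)"/)[1];

// Parser: understands the structure and returns the top-level "name".
const viaParser = JSON.parse(json).name;

console.log(viaRegex);  // Ada
console.log(viaParser); // outer
```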
Test Your Patterns Online
The fastest way to learn regex is to experiment. Our Regex Tester lets you write a pattern, paste in test text, and instantly see every match highlighted, with support for flags (g, i, m) and capture group display. No install, no sign-up, nothing leaves your browser.
The more patterns you write and test, the faster the syntax becomes second nature. Start with something simple, such as matching a date like \d{4}-\d{2}-\d{2}, and build from there.
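As one possible starting point, the JavaScript sketch below finds dates with that pattern and then builds on it with anchors and capturing groups. The sample text is invented. Note that the pattern only checks the shape of a date, so an impossible date such as 2024-99-99 still passes.

```javascript
// Step 1: find every date-shaped substring.
const found = "Shipped 2024-03-15, returned 2024-04-02.".match(/\d{4}-\d{2}-\d{2}/g);

// Step 2: anchor the pattern and add groups to pull out the parts.
const parts = /^(\d{4})-(\d{2})-(\d{2})$/.exec("2024-03-15");
const [, year, month, day] = parts;

// The pattern validates shape only, not calendar sense.
const shapeOnly = /^\d{4}-\d{2}-\d{2}$/.test("2024-99-99");

console.log(found);            // [ '2024-03-15', '2024-04-02' ]
console.log(year, month, day); // 2024 03 15
console.log(shapeOnly);        // true
```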