Apr 062013
 
Article MySQL

Regular expressions are a powerful tool available in many programming languages to check if a text string matches a pattern. In MySQL, regular expressions can be used as well in the WHERE clause of a SQL sentence.

This post comments on the possibilities, the syntax and the special considerations to be taken into account in using regular expression in MySQL, with practical usage examples.

REGEXP and RLIKE keywords

A text string can be compared against a regular expression by means of the REGEXP keyword. Alternatively, the RLIKE synonym can be used if desired.

For instance, the following sentence can be used to select all records in a table “users”, where the field “name” starts with a vowel:

In the regex used in the above example, the special character “^” signals the start of the string. The first character of the string can be any of the characters enclosed in square brackets.

Special characters in regular expressions

“^” y “$” – Start and end of the string

We have already seen that the special character “^” marks the start of the string. In the same way, “$” marks the end of the string. For instance, to select all names ending in a vowel in the “users” table, following sentence would be used:

“.” – Match any character

The special character “.” (dot) matches any possible single character in the string. for instance, to select all names of exactly three characters, the following sentence would be used:

In this example, the rexeg specifies that the start of the string (“^”) must be followed by a single character (“.”),  a single character (“.”),  a single character (“.”), and the end of the string (“$”).

“?”, “+” y “*” – Repetition indicators

The special character “?” in a regular expresion means that the preceding character may appear zero or one times in the string being matched.

The special character “+” means that the preceding character may appear one or more times in the string.

Finally, the character “*” means that the preceding character may appear zero, one or more times in the string.

For instance, to select all names starting with the letter “A” and ending in “o” or in “os”, the following sentence can be used:

The regex matches those values where the start of the string (“^”) is immediately followed by the letter “A” (“A”), followed by any character (“.”) zero or more times (“*”), followed by the letter “o”, optionally followed by the letter “s” (“s?”) at the end of the string (“$”).

If we want to select names where there is at least one character between the starting “A” and the ending “o(s)”, we should use instead the sentence:

{n}, {n,m} – Generic repetition indicator

The {n} construction in a regular expression means that the preceding character must appear exactly n consecutive times.

The {n,m} construction means that the preceding character must appear in the string a minimum of n consecutive times, and a maximum of m times. If m is not specified, there is no limit in the maximum number of appearances of the character.

The repetition indicators in the previous section can be expressed also in this generic notation as:

  • “a?” is equivalent to “a{0,1}”
  • “a+” is equivalent to “a{1,}”
  • “a*” is equivalent to “a{0,}”

The values n, m cannot be greater than the MySQL parameter RE_DUP_MAX (by default, 255)

“|” – Match one of several alternative character sequences

With the “|” (pipe) character, a regular expression can be written that matches if the string contains the character sequence to the left, or the character sequence to the right of the pipe. For instance, to select names that contain “John”, “Peter” o “Lewis”, the following sentence can be used:

“()” – Character sequence

A special character can be preceded by a character sequence enclosed in parentheses, to mean that the special character affects the whole sequence. For instance, we can write a regular expression that matches a string if the substring “Doe ” appears two or more consecutive times in it:

(Doe ){2,}

“[]”,”[^]” – Match/Does not match a single character in a set

A set of characters enclosed in square brackets in a regex matches if any of the characters in the set appears in the string. For instance, the expression “[arq]” matches if the string includes any of the characters “a”, “r” or “q”. (An equivalent of this example regex could have been written using the “|” special character, as: “a|r|q”)

A range of characters can also be specified using a hiphen “-“: For instance “[c-f3-7]” matches if any of the characters c, d, e, f, 3, 4, 5, 6 or 7 appear in the string.

Note: To include the “-” character as part of the set, it must be the first or last character inside the square brackets. For instance, the expression “[abc-]” matches if the string contains any of the characters “a”, “b”, “c” or “-“.

Also, to include the “]” character in the set, it must be placed immediately after the opening “[” character. For instance: “[]abc]” matches “]”, “a”, “b” or “c”.

If the character “^” appears immediately after the opening bracket, it negates the expression. In this case, the expression is matched if none of the characters in the set appear in the string. For instance, “[^a-z]” matches if there are no alphabetic characters “a”,”b”,…,”z” in the string.

 

[.character.] – Match a character by character name

The [.character.] construction may appear inside a “[]” construction, to represent a given character by its name. It is used mainly to represent control characters. For instance, to select records en the “users” table where the “name” column includes a line feed or a escape character:

The following table shows the character names that can be used in this construction:

[:character_class:]

[:character_class:] can be used inside a “[]” construction to represent any of the characters in a given class. The following table shows the classes that can be used in this expression:

[[:<:]], [[:>:]]

These markers mean the beginning and the end of a word.

For instance, to select  all names that include the word “peter”:

SELECT * from users WHERE name REGEXP ‘[[:<:]]peter[[:>:]]’;

Literals of special characters in a regex

A special character literal can be included in a regex by preceding it with two backshlashes (“\”).

For instance, to check if a string includes the sequence “1+2”: The “+” sign is a special character, and therefore we should write the regex as “1\+2”.

 

 Posted by at 12:01 pm

 Leave a Reply

(required)

(required)