“And” in regular expressions `&&`

There are a few ways to build a logical “and” operation using regular expressions. The first may seem obvious: regular expressions are a logical “and” by default. Every character in a regular expression is “and’ed” with the one before it, so if your conditions can be expressed in a fixed order, the work has already been done for you.
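As a small illustration of this implicit “and”, the sketch below requires “good” and then “deed”, in that order. The class and method names are made up for the example, and it assumes the standard java.util.regex package.

```java
import java.util.regex.Pattern;

class SequentialAnd {
    // "good" AND THEN "deed": the order is fixed by the pattern itself.
    private static final Pattern GOOD_THEN_DEED = Pattern.compile("good.*deed");

    static boolean containsInOrder(String input) {
        // find() searches anywhere in the input, unlike matches().
        return GOOD_THEN_DEED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsInOrder("a good word and a kind deed")); // true
        System.out.println(containsInOrder("a kind deed and a good word")); // false
    }
}
```

The second call fails only because the words appear in the other order, which is exactly the limitation the more advanced techniques address.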

However, since you’ve searched this far, we can assume that you’re looking for something a little more advanced. To facilitate this, we have two options: We can use “lookaheads”, or we can just perform a second match using a separate regular expression if supported by whatever tool or language you are using.

This tutorial is part of the course: “[[Regular Expressions]].”

Achieving logical “and” with Look-aheads

Look-ahead and look-behind operations are essentially extra constraints that you can place on a regular expression. You can specify additional patterns which must be satisfied for the match to succeed. The following is an example of three look-ahead expressions used together.

(?=.*word1)(?=.*word2)(?=.*word3)

Notice that each expression contains .*. This is because look-aheads are position-sensitive: each one begins matching from the point where it appears within the pattern. Consider, for instance, the pattern below:

^Start (?=.*kind)(?=.*good).* deed$

This pattern will match "Start with a good word and end with a kind deed" and "Start with a kind word and end with a good deed".
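One way to check this claim is a short Java sketch like the one below. The pattern is the one from the text; the class name, method name, and the third test string are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class LookaheadAnd {
    // Pattern taken from the text: both words must appear somewhere after "Start ".
    private static final Pattern BOTH_WORDS =
            Pattern.compile("^Start (?=.*kind)(?=.*good).* deed$");

    static boolean matches(String input) {
        return BOTH_WORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("Start with a good word and end with a kind deed")); // true
        System.out.println(matches("Start with a kind word and end with a good deed")); // true
        System.out.println(matches("Start with a good deed"));                          // false, no "kind"
    }
}
```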

In summary: when the first look-ahead begins to process, the current match position is saved; the .* in the first look-ahead matches as many characters as it needs to before it reaches “kind”; the match position is then reset, and the next look-ahead searches forward for “good”. Once both look-aheads have succeeded, pattern matching resumes as usual from the saved position. Matching continues with the expression’s plain .*, and runs through “ deed” to the end of the string.
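The position reset described above means a look-ahead consumes no characters. The following sketch, with a hypothetical class name and a -1 sentinel chosen only for this example, measures how much text two look-aheads consume on their own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthLookahead {
    // Returns the number of characters consumed by the two look-aheads alone.
    static int consumedLength(String input) {
        Matcher m = Pattern.compile("(?=.*kind)(?=.*good)").matcher(input);
        if (!m.find()) {
            return -1; // illustrative sentinel: at least one word is missing
        }
        // Both look-aheads succeeded, yet the match itself is empty.
        return m.end() - m.start();
    }

    public static void main(String[] args) {
        System.out.println(consumedLength("a good word and a kind deed")); // 0
        System.out.println(consumedLength("a good word"));                 // -1
    }
}
```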

This works because, as mentioned, the match position is reset after each look-around is evaluated. It also means that the order of our adjacent “and” look-ahead expressions is not important. However, if we moved one of our conditions just past .*, we would see different results:

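^Start (?=.*kind).*(?=.*good) deed$

Now that (?=.*good) comes after our catch-all, neither of our previous strings will match. The look-ahead is evaluated at the position where .* stops, and for the rest of the pattern to succeed that position must be immediately before “ deed”, so there is no text left in which “good” could be found.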
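Both points, that adjacent look-aheads can be reordered freely and that a look-ahead placed after the catch-all fails, can be sketched in Java as follows. The helper name found is an assumption made for this example.

```java
import java.util.regex.Pattern;

class LookaheadPlacement {
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "Start with a good word and end with a kind deed";
        // Adjacent look-aheads: order does not matter.
        System.out.println(found("^Start (?=.*kind)(?=.*good).* deed$", s)); // true
        System.out.println(found("^Start (?=.*good)(?=.*kind).* deed$", s)); // true
        // Look-ahead moved behind the catch-all: only " deed" is left to inspect.
        System.out.println(found("^Start (?=.*kind).*(?=.*good) deed$", s)); // false
    }
}
```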

Tip: if you want to match whole words and not fragments of longer words, add word boundaries (\b) to your look-aheads:

^Start (?=.*\bkind\b)(?=.*\bgood\b)(?=.*\bword\b).* deed$
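To see what the word boundaries change, the sketch below compares the pattern with and without \b. The test sentence containing “unkind”, “words” and “goodish” is invented for this example, as are the class and constant names. Note that each backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class WholeWordLookahead {
    static final String LOOSE = "^Start (?=.*kind)(?=.*good)(?=.*word).* deed$";
    static final String STRICT = "^Start (?=.*\\bkind\\b)(?=.*\\bgood\\b)(?=.*\\bword\\b).* deed$";

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String partial = "Start with unkind words and a goodish deed";
        String whole = "Start with a kind word and end with a good deed";
        System.out.println(found(LOOSE, partial));  // true: fragments of longer words count
        System.out.println(found(STRICT, partial)); // false: no whole-word "kind"
        System.out.println(found(STRICT, whole));   // true
    }
}
```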

Achieving logical “and” with your language

If all else fails, you should always feel comfortable simply performing another match and combining the results with the native logical “and” operator of your programming language of choice. This is often simpler and easier for others to maintain, with only a minimal performance impact on moderately sized data sets. In Java, for example:
Using Java conditionals to test string matches
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class RegexTest
{
   @Test
   public void testRegex()
   {
      assertTrue(stringMatches("Start with a good word and end with a kind deed."));
      assertTrue(stringMatches("Start with a kind word and end with a good deed."));
      assertFalse(stringMatches("Start with a deed."));
   }

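   private boolean stringMatches(String string)
   {
      // String.matches() must match the entire input, so ^ and $ are redundant here,
      // and each single-word check needs .* on both sides. The final period is
      // escaped so that it matches a literal "." rather than any character.
      return string.matches("^Start .* deed\\.$") && string.matches(".*good.*") && string.matches(".*kind.*");
   }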

}
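When the number of conditions grows, the same idea can be written as a list of patterns that must all be found. This is only a sketch under a few assumptions: it needs Java 9 or later for List.of, it uses Matcher.find() instead of String.matches() so the single-word patterns need no surrounding .*, and the class name is made up for the example.

```java
import java.util.List;
import java.util.regex.Pattern;

class AllPatternsMatch {
    // Each pattern is one independent condition; all must be found in the input.
    private static final List<Pattern> REQUIRED = List.of(
            Pattern.compile("^Start .* deed\\.$"),
            Pattern.compile("good"),
            Pattern.compile("kind"));

    static boolean stringMatches(String input) {
        // allMatch() is the logical "and" across every pattern in the list.
        return REQUIRED.stream().allMatch(p -> p.matcher(input).find());
    }

    public static void main(String[] args) {
        System.out.println(stringMatches("Start with a good word and end with a kind deed.")); // true
        System.out.println(stringMatches("Start with a deed."));                               // false
    }
}
```

Compiling the patterns once also avoids recompiling them on every call, which String.matches() does internally.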

Read more about regular expressions.
