“Or” in regular expressions `||`

When attempting to build a logical “or” operation using regular expressions, we have a few approaches to follow. Fortunately the grouping and alternation facilities provided by the regex engine are very capable, but when all else fails we can just perform a second match using a separate regular expression – supported by the tool or native language of your choice.

This functionality is simple enough, however, that it is not usually necessary to use higher programming features to achieve logical “or.”

This tutorial is part of the course: “[[Regular Expressions]].”

Achieving logical “or” with Grouping and Alternation

Grouping and alternation are core features of every modern regular expression library. You can provide as many terms as desired, as long as they are separated with the pipe character: |. This character separates terms contained within each (...) group. Take the following example, for instance:

^I like (dogs|penguins), but not (lions|tigers)\.$

This expression will match any of the following strings:

I like dogs, but not lions.
I like dogs, but not tigers.
I like penguins, but not lions.
I like penguins, but not tigers.
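
As a rough illustration, the pattern above could be exercised from Java roughly as follows. The class name AlternationDemo and the helper matches are hypothetical names chosen for this sketch, and the final period is written as \. so that it matches only a literal period. In a Java string literal each backslash is doubled.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical constant: the alternation pattern from the text,
    // with the final period escaped so it matches only a literal ".".
    static final Pattern LIKES =
        Pattern.compile("^I like (dogs|penguins), but not (lions|tigers)\\.$");

    // Matcher.matches() succeeds only if the whole input matches the pattern.
    static boolean matches(String input) {
        return LIKES.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "I like dogs, but not lions.",
            "I like penguins, but not tigers.",
            "I like cats, but not lions."   // "cats" is not one of the alternatives
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + matches(input));
        }
    }
}
```

Only inputs that use one term from each group are accepted; a string such as "I like cats, but not lions." is rejected because "cats" is not listed in the first group.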

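Note, however, that parentheses do more than group: by default, each (...) group is also a capturing group. This pattern will match any combination of the terms we’ve supplied, as expected, but it will also store whatever each group matched in a numbered match group for later inspection. That is defined and often useful behavior, but it shifts the numbering of any other groups in your expression.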

If you don’t want your grouping and alternation to interfere with other numbered groups in your expression, each “or” group must be prefixed with ?: – like so:

^I like (?:dogs|penguins), but not (?:lions|tigers)\.$
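
To make the effect on group numbering concrete, here is a minimal Java sketch. The class GroupNumberingDemo, its two constants, and the helper firstGroup are hypothetical names for illustration. The second pattern deliberately keeps one capturing group, to show that it becomes group 1 once the first “or” group is non-capturing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Both "or" groups capture: they are numbered 1 and 2.
    static final Pattern CAPTURING =
        Pattern.compile("^I like (dogs|penguins), but not (lions|tigers)\\.$");

    // The first "or" group is non-capturing, so the second group is number 1.
    static final Pattern MIXED =
        Pattern.compile("^I like (?:dogs|penguins), but not (lions|tigers)\\.$");

    // Returns the text captured by group 1, or null if the input does not match.
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "I like dogs, but not tigers.";
        // groupCount() reports the number of capturing groups in the pattern.
        System.out.println(CAPTURING.matcher(input).groupCount()); // 2
        System.out.println(MIXED.matcher(input).groupCount());     // 1
        System.out.println(firstGroup(CAPTURING, input));          // dogs
        System.out.println(firstGroup(MIXED, input));              // tigers
    }
}
```

The same input matches both patterns; only the number of stored groups and their numbering differ.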

Tip: you can also combine alternation with look-ahead and look-behind assertions. Notice that the following pattern uses both a positive and a negative look-ahead to restrict what the word character class \w+ is allowed to match:

^I like (?=dogs|penguins)\w+, but not (?!dogs|penguins)\w+$
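
The look-ahead pattern can be tried out with a short Java sketch like the one below. The class LookaheadDemo and the helper matches are hypothetical names. The pattern is used as written in the tip, so it has no trailing period and the sample inputs omit it too. In the Java string literal each backslash is doubled.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Positive look-ahead: the first word must start with "dogs" or "penguins".
    // Negative look-ahead: the second word must not start with either of them.
    static final Pattern LIKES =
        Pattern.compile("^I like (?=dogs|penguins)\\w+, but not (?!dogs|penguins)\\w+$");

    // Matcher.matches() succeeds only if the whole input matches the pattern.
    static boolean matches(String input) {
        return LIKES.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("I like dogs, but not lions"));     // true
        System.out.println(matches("I like dogs, but not penguins"));  // false
        System.out.println(matches("I like cats, but not lions"));     // false
        System.out.println(matches("I like dogsled, but not lions"));  // true
    }
}
```

The last line shows a subtlety: a look-ahead only checks that the text starts with one of the alternatives, so \w+ is free to consume a longer word such as "dogsled". Adding a word boundary inside the look-ahead, for example (?=(?:dogs|penguins)\b), would be one way to tighten this.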

Achieving logical “or” with your language

Logical “or” is more difficult to achieve using tools external to the regular expression engine itself, but it is achievable by combining results of multiple regular expressions using the native “or” logical feature of your programming language of choice. This can sometimes be simpler and easier for others to maintain in the future, with only minimal performance impacts over moderate data-sets. In Java, for example:
Using Java conditionals to test string matches
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class RegexTest
{
   @Test
   public void testRegex()
   {
      assertTrue(stringMatches("I like dogs, but not lions."));
      assertTrue(stringMatches("I like penguins, but not lions."));
      assertFalse(stringMatches("I like lions, but not penguins."));
   }

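   // String.matches() must match the entire string, so each pattern is
   // wrapped in .* to allow any text before and after the phrase.
   private boolean stringMatches(String string)
   {
      return (string.matches(".*like dogs.*") || string.matches(".*like penguins.*"))
          && (string.matches(".*not lions.*") || string.matches(".*not tigers.*"));
   }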

}
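
Because String.matches() only succeeds when the entire string matches, a possible alternative is Matcher.find(), which searches for the pattern anywhere in the input. The sketch below assumes hypothetical names (CombinedMatchDemo, found, and the four constants) and compiles each pattern once so it can be reused across calls.

```java
import java.util.regex.Pattern;

class CombinedMatchDemo {
    // Hypothetical constants: each small pattern is compiled once and reused.
    static final Pattern LIKES_DOGS = Pattern.compile("like dogs");
    static final Pattern LIKES_PENGUINS = Pattern.compile("like penguins");
    static final Pattern NOT_LIONS = Pattern.compile("not lions");
    static final Pattern NOT_TIGERS = Pattern.compile("not tigers");

    // find() reports whether the pattern occurs anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    // The logical "or" and "and" are expressed with Java's own operators.
    static boolean stringMatches(String input) {
        return (found(LIKES_DOGS, input) || found(LIKES_PENGUINS, input))
            && (found(NOT_LIONS, input) || found(NOT_TIGERS, input));
    }

    public static void main(String[] args) {
        System.out.println(stringMatches("I like dogs, but not lions."));     // true
        System.out.println(stringMatches("I like penguins, but not lions.")); // true
        System.out.println(stringMatches("I like lions, but not penguins.")); // false
    }
}
```

Unlike the single anchored expression shown earlier, this version does not check the order or position of the phrases, only that one phrase from each pair appears somewhere in the string.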

Read more about regular expressions.

7 Comments

  1. Lincoln A Baxter (Lincoln's Dad) says:

    The discussion regarding disabling grouping expressions so as to not interfere with other match groups is not clear. The reason is that you have not provided a context in which other groups are being captured or how they are referenced. The beginning of this discussion exists in your part 1 discussion, so at the very least, you need to provide a reference link directly to this discussion of groups. Second, that discussion really needs to be expanded to include disabling the capture of subgroups. Also note that grouping behavior is not a "side-effect"; it is defined behavior, and is very useful, as you point out in your main discussion.

    1. You are correct! I will clarify 🙂 Thanks, Dad.

  2. Gary says:

    Hi, thanks for the tutorial, but your stringMatches method does not have the right context as the testAssert. Your stringMatches method has "penguin….., etc.", but your testAsserts pass "Start…."

    1. Wow, strange. I don’t know how that got lost 🙂 Will fix asap.

  3. This and the other Regex post was really helpful.

    I was able to get a regex that worked for my needs, but I’m wondering if there is a more concise approach?

    Here’s my regex:

    ‘^(?!(?|foo|bar)$)(?!.*_)(.*)’;

    Basically I want to match any string that is NOT EQUAL TO "foo" nor "bar" nor that has an underscore. So "fool" and "baroom" should match as well as "bebar" but not "bar_oom" nor "be_bar" nor even "hello_goodbye." Said another way, only words without an underscore and that are not "foo" or "bar."

    So is there a more straightforward regex I could use? Negative matching for the entire words "foo" and "bar" were what made this difficult.

    Thanks in advance for your time.
