“Or” in regular expressions `||`
There are a few ways to build a logical “or” operation using regular expressions. Fortunately, the grouping and alternation facilities provided by the regex engine are very capable, and when all else fails we can perform a second match using a separate regular expression and combine the results in the tool or native language of your choice.
This functionality is simple enough, however, that it is not usually necessary to use higher-level programming features to achieve logical “or.”
Achieving logical “or” with Grouping and Alternation
Grouping and alternation are core features of every modern regular expression library. You can provide as many terms as desired, as long as they are separated with the pipe character: |. This character separates terms contained within each (...) group. Take the following example, for instance:
^I like (dogs|penguins), but not (lions|tigers).$
This expression will match any of the following strings:
I like dogs, but not lions.
I like dogs, but not tigers.
I like penguins, but not lions.
I like penguins, but not tigers.
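As a quick check, the pattern above can be run against a few strings in Java. This is only a minimal sketch: the class name AlternationDemo and the helper matchesSentence are made up for illustration, and the final period is escaped here (written \\. in a Java string) so that it matches only a literal period, which the pattern as written above does not require.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Assumption: the trailing period is escaped so it matches only a literal "."
    private static final Pattern SENTENCE =
            Pattern.compile("^I like (dogs|penguins), but not (lions|tigers)\\.$");

    // Hypothetical helper: true when the whole input matches the pattern.
    static boolean matchesSentence(String input) {
        return SENTENCE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "I like dogs, but not lions.",
            "I like penguins, but not tigers.",
            "I like cats, but not lions."
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + matchesSentence(sample));
        }
    }
}
```

The first two samples use one term from each group and should match; the third uses a term that is not listed in the first group and should not.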
However, there is an unintended side-effect of our grouping and alternation, as written above. This pattern will match any combination of the terms we’ve supplied, as expected, but it will also store those matches into match groups for later inspection.
If you don’t want your grouping and alternation to interfere with other numbered groups in your expression, each “or” group must begin with ?: (which makes it a non-capturing group), like so:
^I like (?:dogs|penguins), but not (?:lions|tigers).$
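To make the difference concrete, the sketch below compiles both versions of the pattern in Java and compares how many groups each one records. The class name GroupCountDemo and the helper likedAnimal are hypothetical names chosen for this illustration; the two patterns are copied from the text as written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Plain parentheses: each "or" group is stored as a numbered match group.
    static final Pattern CAPTURING =
            Pattern.compile("^I like (dogs|penguins), but not (lions|tigers).$");

    // Prefixed with ?: so the groups still alternate but store nothing.
    static final Pattern NON_CAPTURING =
            Pattern.compile("^I like (?:dogs|penguins), but not (?:lions|tigers).$");

    // Hypothetical helper: returns the term matched by the first group, or null.
    static String likedAnimal(String input) {
        Matcher matcher = CAPTURING.matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "I like penguins, but not tigers.";

        System.out.println(CAPTURING.matcher(input).groupCount());     // two numbered groups
        System.out.println(NON_CAPTURING.matcher(input).groupCount()); // no numbered groups
        System.out.println(likedAnimal(input));                        // term stored in group 1
    }
}
```

Both patterns accept exactly the same strings; only the numbering of match groups changes, which matters when the expression contains other groups that you refer to by number.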
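Alternation can also be used inside lookahead assertions. A lookahead only tests the text that follows without consuming it, so each assertion here is followed by \w+ to match the word itself:
^I like (?=dogs|penguins)\w+, but not (?!dogs|penguins)\w+$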
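The following sketch exercises the lookahead pattern above in Java, where each backslash is doubled inside the string literal. The class name LookaheadDemo and the helper accepts are hypothetical. Note that the pattern as written has no trailing period, so the sample strings omit it as well.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    private static final Pattern SENTENCE = Pattern.compile(
            "^I like (?=dogs|penguins)\\w+, but not (?!dogs|penguins)\\w+$");

    // Hypothetical helper: true when the pattern is found in the input.
    static boolean accepts(String input) {
        return SENTENCE.matcher(input).find();
    }

    public static void main(String[] args) {
        // Positive lookahead passes, negative lookahead passes.
        System.out.println(accepts("I like dogs, but not lions"));
        // Negative lookahead rejects "dogs" in the second position.
        System.out.println(accepts("I like penguins, but not dogs"));
        // Positive lookahead rejects "cats" in the first position.
        System.out.println(accepts("I like cats, but not lions"));
        // Lookahead only checks how the word starts, so "dogsled" is accepted.
        System.out.println(accepts("I like dogsled, but not lions"));
    }
}
```

Because the lookahead checks only the beginning of the word, longer words that start with one of the terms also pass; adding a word boundary (\b) after the alternation inside the lookahead is one way to tighten this.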
Achieving logical “or” with your language
Logical “or” can also be achieved outside the regular expression engine by combining the results of multiple regular expressions with the native “or” operator of your programming language of choice. This takes more code, but it can sometimes be simpler and easier for others to maintain in the future, with only minimal performance impact over moderate data sets. In Java, for example:

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.regex.Pattern;

import org.junit.Test;

public class RegexTest {
    @Test
    public void testRegex() {
        assertTrue(stringMatches("I like dogs, but not lions."));
        assertTrue(stringMatches("I like penguins, but not lions."));
        assertFalse(stringMatches("I like lions, but not penguins."));
    }

    // String.matches() only succeeds when the whole string matches,
    // so use find() to search for each phrase anywhere in the input.
    private boolean contains(String string, String regex) {
        return Pattern.compile(regex).matcher(string).find();
    }

    private boolean stringMatches(String string) {
        // Combine the individual results with Java's own || and && operators.
        return (contains(string, "like dogs") || contains(string, "like penguins"))
                && (contains(string, "not lions") || contains(string, "not tigers"));
    }
}
The discussion regarding disabling grouping expressions so as not to interfere with other match groups is not clear. The reason is that you have not provided a context in which other groups are being captured or how they are referenced. The beginning of this discussion exists in your part 1 discussion, so at the very least, you need to provide a reference link directly to this discussion of groups. Second, that discussion really needs to be expanded to include disabling the capture of subgroups. Also note that grouping behavior is not a "side-effect"; it is defined behavior, and is very useful, as you point out in your main discussion.
You are correct! I will clarify 🙂 Thanks, Dad.
Hi, thanks for the tutorial, but your stringMatches method does not have the same context as the test asserts. Your stringMatches method has "penguin….., etc.", but your test asserts pass "Start…."
Wow, strange. I don’t know how that got lost 🙂 Will fix asap.
Updated.
This and the other Regex post was really helpful.
I was able to get a regex that worked for my needs, but I’m wondering if there is a more concise approach?
Here’s my regex:
‘^(?!(?|foo|bar)$)(?!.*_)(.*)’;
Basically I want to match any string that is NOT EQUAL TO "foo" nor "bar" nor that has an underscore. So "fool" and "baroom" should match as well as "bebar" but not "bar_oom" nor "be_bar" nor even "hello_goodbye." Said another way, only words without an underscore and that are not "foo" or "bar."
So is there a more straightforward regex I could use? Negative matching for the entire words "foo" and "bar" was what made this difficult.
Thanks in advance for your time.
This might be close to what you want:
Click to see live example in visual regex tester…