
The Bash reference manual, in sec. 3.2.5.2 Conditional Constructs, says:

"It is sometimes difficult to specify a regular expression properly without using quotes, or to keep track of the quoting used by regular expressions while paying attention to shell quoting and the shell’s quote removal."

So I started playing with it.

[[ ' 2' =~ 1 | 2 ]] && echo YES || echo NO

gives the error

bash: syntax error in conditional expression: unexpected token `|'
bash: syntax error near `|'

Also

[[ ' 2' =~ 1 \| 2 ]] && echo YES || echo NO

gives the error

bash: syntax error in conditional expression
bash: syntax error near `|'

My question is: why don't the above examples work, given the following?

| is a Bash metacharacter and also a regular expression special character.

"The words between the [[ and ]] do not undergo word splitting and filename expansion. The shell performs tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal on those words (the expansions that would occur if the words were enclosed in double quotes)" (ref. Bash reference manual, sec. 3.2.5.2 Conditional Constructs).

So I expect Bash to consider ' 2' =~ 1 | 2 as a single word and, in particular, not to split it at the Bash metacharacter |.

When you use ‘=~’, the string to the right of the operator is considered a POSIX extended regular expression pattern and matched accordingly (ref. Bash reference manual, sec. 3.2.5.2 Conditional Constructs).

So I expect Bash to interpret 1 | 2 as a regex.

Even if Bash parses | as a metacharacter, I would expect quoting it (as in \|) to work, since Bash would then pass 1 | 2 to the regex engine.

I know how to make it work, for example:

[[ ' 2' =~ '1 '|' 2' ]] && echo YES || echo NO

prints 'YES'.

  • ' 2' =~ 1 | 2 is not a single word, it is five: ' 2', =~, 1, |, and 2. In particular, since there's a space between 1 and |, the | cannot be parsed as part of the regular expression, but must be a separate syntactic element... which is invalid syntax. When the manual says the words inside "do not undergo word splitting", that means that after a variable's value is expanded the result is not split into words on whitespace; it has nothing to do with how the words are parsed before variable expansion. Commented Jul 4 at 16:11
  • @GordonDavisson Thanks. I consider this the real answer (even though the answers below are also useful). This led me to read sec. "3.1.1 Shell Operation" of the Bash Reference Manual. Basically, the examples I gave fail at step 3 ("Parses the tokens into simple and compound commands") and not at step 4 ("Performs the various shell expansions").
    – the_eraser
    Commented Jul 5 at 10:26
  • May I ask why my question has been downvoted? My question was "why it fails", and not "how to make it work", and I think it's a legit question. I tried my best to make it clear, but if it wasn't, commenting before downvoting would have helped to clarify.
    – the_eraser
    Commented Jul 5 at 10:31
  • Question was very well framed with example code and references. It has been unfairly downvoted (my upvote was already there)
    – anubhava
    Commented Jul 5 at 10:49
  • See unix.stackexchange.com/questions/382054/…
    – Ed Morton
    Commented Jul 5 at 12:06
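The parse-versus-expand distinction described in the comments above can be checked directly. This is a minimal sketch, assuming a `bash` executable on PATH; `bash -n` only parses the command string and never executes it. The first check fails at parse time, before any expansion; the last one shows that the result of expanding an unquoted variable inside [[ ]] is not word-split.

```shell
#!/usr/bin/env bash
# Parsing (step 3) happens before expansion (step 4).
# bash -n parses the command string without executing it.

# The blank after "1" ends the regex word, so "|" becomes a stray token:
if ! bash -n -c "[[ ' 2' =~ 1 | 2 ]]" 2>/dev/null; then
  echo "parse error: 1 | 2"
fi

# With no unquoted blanks, the whole regex is one word and parses fine:
if bash -n -c "[[ ' 2' =~ '1 '|' 2' ]]"; then
  echo "parses: '1 '|' 2'"
fi

# Expansion: the *result* of expanding $v is not word-split inside [[ ]],
# so the unquoted $v still compares as the single string "1 | 2".
v='1 | 2'
if [[ $v == '1 | 2' ]]; then
  echo "not split: $v"
fi
```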

2 Answers

If you really want to go the escaping route, then you need to escape all the whitespace in your regex to tell the shell that it is a single word, like this:

[[ ' 2' =~ 1\ |\ 2 ]] && echo YES || echo NO
YES
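To see why the | itself has to stay unescaped, here is a small side-by-side comparison (a sketch; the helper names `alt` and `lit` are hypothetical, made up for illustration). Escaping only the blanks keeps the regex a single word while leaving | as alternation; escaping the | as well makes Bash match it as a literal character, because quoted parts of the pattern are matched literally.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.
# Blanks escaped, | unescaped: the regex is the alternation "1 " or " 2".
alt() { [[ $1 =~ 1\ |\ 2 ]] && echo YES || echo NO; }
# Blanks and | escaped: the regex matches the literal text "1 | 2".
lit() { [[ $1 =~ 1\ \|\ 2 ]] && echo YES || echo NO; }

alt ' 2'       # YES
lit ' 2'       # NO
lit 'x1 | 2x'  # YES
```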

However, a cleaner approach is to store the regex in a variable and then use it unquoted, like this:

rx='1 | 2'
[[ ' 2' =~ $rx ]] && echo YES || echo NO
YES
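A quick check of the variable approach on a few inputs (a sketch; the sample strings are arbitrary). BASH_REMATCH[0] holds the part of the string that matched. The expansion must stay unquoted: quoting "$rx" makes Bash treat its value as a literal string rather than a regex.

```shell
#!/usr/bin/env bash
rx='1 | 2'   # ERE alternation: "1 " or " 2" (the blanks are part of the regex)

for s in ' 2' '1 ' '12'; do
  if [[ $s =~ $rx ]]; then
    echo "'$s' matches '${BASH_REMATCH[0]}'"
  else
    echo "'$s' does not match"
  fi
done

# Quoting the expansion turns the whole value into a literal string:
[[ ' 2' =~ "$rx" ]] && echo YES || echo NO   # NO
```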

As a variation on the answer given by @anubhava:

I find escaping less pleasant than quoting, and I would write something like

[[ ' 2' =~ "1 "|" 2" ]]

Note that you cannot quote the whole expression, because everything between quotes is matched literally, not as a regex. Therefore the | must be outside the quotes.
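The effect of where the quotes go can be checked side by side (a sketch; the inputs are arbitrary). With the | outside the quotes it is regex alternation between two literal strings; with the whole pattern quoted, | is just one character of a literal string.

```shell
#!/usr/bin/env bash
# | outside the quotes: alternation between the literal strings "1 " and " 2".
[[ ' 2' =~ "1 "|" 2" ]] && echo YES || echo NO   # YES
# Whole pattern quoted: only the literal text "1 | 2" can match.
[[ ' 2' =~ "1 | 2" ]] && echo YES || echo NO     # NO
```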
