The Bash reference manual, in sec. 3.2.5.2 Conditional Constructs, says:
"It is sometimes difficult to specify a regular expression properly without using quotes, or to keep track of the quoting used by regular expressions while paying attention to shell quoting and the shell’s quote removal."
So I started playing with it.
[[ ' 2' =~ 1 | 2 ]] && echo YES || echo NO
gives the error
bash: syntax error in conditional expression: unexpected token `|'
bash: syntax error near `|'
Also
[[ ' 2' =~ 1 \| 2 ]] && echo YES || echo NO
gives the error
bash: syntax error in conditional expression
bash: syntax error near `|'
My question is: why do the above not work, considering the following?
| is a bash metacharacter and also a regular expression special character.
"The words between the [[ and ]] do not undergo word splitting and filename expansion. The shell performs tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal on those words (the expansions that would occur if the words were enclosed in double quotes)." (Ref. Bash reference manual, sec. 3.2.5.2 Conditional Constructs.)
So I expect Bash to consider ' 2' =~ 1 | 2 as a single word, in particular not splitting at the bash metacharacter |.
"When you use ‘=~’, the string to the right of the operator is considered a POSIX extended regular expression pattern and matched accordingly." (Ref. Bash reference manual, sec. 3.2.5.2 Conditional Constructs.)
So I expect Bash to interpret 1 | 2 as a regex.
Even if | is parsed as a metacharacter by bash, I expect that quoting it (as in \|) would work, since Bash would then pass 1 | 2 to the regex engine.
I know how to make it work, for example
[[ ' 2' =~ '1 '|' 2' ]] && echo YES || echo NO
prints 'YES'.
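Other spellings also seem to work. As a minimal sketch, the spaces below are quoted or backslash-escaped so that the pattern contains no unquoted whitespace, while | is left unquoted. The variable names r1 and r2 are only illustrative.

```bash
#!/usr/bin/env bash
# Both tests match the string ' 2' against the pattern "1 | 2",
# i.e. a literal "1 " or a literal " 2".

# Quote only the spaces; the | stays unquoted.
[[ ' 2' =~ 1' '|' '2 ]] && r1=YES || r1=NO

# Backslash-escape the spaces instead of quoting them.
[[ ' 2' =~ 1\ |\ 2 ]] && r2=YES || r2=NO

echo "$r1 $r2"
```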
' 2' =~ 1 | 2 is not a single word, it is five words: ' 2', =~, 1, |, and 2. In particular, since there's a space between 1 and |, the | cannot be parsed as part of the regular expression, but must be a separate syntactic element... which is invalid syntax. When the manual says the words inside "do not undergo word splitting", that means that after a variable's value is expanded the result is not split into words on whitespace; it has nothing to do with how the words are parsed before variable expansion.
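A minimal sketch of that distinction follows. The variable names re and s are illustrative only, and the contrast with [ ] is added here for comparison. A value containing spaces stays a single word when it is expanded inside [[ ]], which is why keeping the regex in a variable avoids the parsing problem.

```bash
#!/usr/bin/env bash
# The pattern contains spaces, but they come from expansion, not from
# the source text, so the parser sees a single word: $re
re='1 | 2'
[[ ' 2' =~ $re ]] && regex_match=YES || regex_match=NO

# Same idea with a plain string comparison: $s is not split inside [[ ]].
s='a b'
[[ $s == 'a b' ]] && inside=YES || inside=NO

# In [ ], the unquoted expansion IS split into "a" and "b", so the test
# receives too many arguments and fails.
[ $s = 'a b' ] 2>/dev/null && outside=YES || outside=NO

echo "$regex_match $inside $outside"
```

Leaving $re unquoted matters: quoting the expansion would make Bash match its contents as a literal string rather than as a regular expression.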