Patterns are used throughout TF, including triggers, hooks, /purge, /list, /limit, /recall, and expressions. There are four styles of pattern matching available: simple, substring, glob, and regexp.
Simple matching: The pattern is compared directly to the string. There are no special characters. Case is significant.
Substring matching: The string must contain the pattern. There are no special characters. Case is significant.
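As a rough illustration in Python (not TF code; the function names are hypothetical), direct comparison and containment reduce to an equality test and a substring test:

```python
def simple_match(pattern: str, text: str) -> bool:
    """Direct comparison: the whole string must equal the pattern, case included."""
    return pattern == text


def substr_match(pattern: str, text: str) -> bool:
    """Containment: the pattern must occur somewhere in the string, case included."""
    return pattern in text


print(simple_match("dog", "dog"))      # True
print(simple_match("dog", "Dog"))      # False: case is significant
print(substr_match("dog", "hot dog"))  # True
print(substr_match("d*g", "hot dog"))  # False: "*" is not special here
```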
Globbing is the default matching style, and was the only style available before version 3.2. It is similar to filename expansion ("globbing") used by many shells (but unlike shells, tf uses glob only for comparison, not expansion).
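There are several special sequences that can be used in tf globbing:
  - "*" matches any number of characters, including none.
  - "?" matches any one character.
  - "[...]" matches any one of the characters listed. A range can be given as its first and last characters separated by "-". If "^" is the first character, the sequence matches any character NOT listed.
  - "{...}" matches any one of a list of words separated by "|". Both ends of "{...}" match only at a space or at the beginning or end of the string.
  - Any other character matches itself, without regard to case. A special character preceded by "\" matches itself.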
Patterns containing "{...}" can easily be meaningless. A valid {...} pattern must: (a) contain no spaces, (b) follow a wildcard, space, or beginning of string, (c) be followed by a wildcard, space, or end of string.
The pattern "{}" will match the empty string.
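The globbing rules can be modelled, approximately, by translating a glob into a regular expression. The following Python sketch is not how TF implements globbing; it is a simplified model inferred from the description and examples in this section. The names glob_to_regex and glob_match are hypothetical, and the treatment of "\" as an escape and of wildcards inside "{...}" as staying within one word are assumptions.

```python
import re


def glob_to_regex(glob: str) -> str:
    """Translate a tf-style glob into a Python regex (simplified model)."""
    out = []
    in_braces = False
    i = 0
    while i < len(glob):
        c = glob[i]
        if c == "\\" and i + 1 < len(glob):
            # Assumed: a backslash makes the next character literal.
            out.append(re.escape(glob[i + 1]))
            i += 1
        elif c == "*":
            # Assumed: inside {...} a wildcard stays within one word.
            out.append("[^ ]*" if in_braces else ".*")
        elif c == "?":
            out.append("[^ ]" if in_braces else ".")
        elif c == "[" and "]" in glob[i + 1:]:
            # Copy a bracket class such as [rs] or [^a-z] through unchanged.
            end = glob.index("]", i + 1)
            out.append(glob[i:end + 1])
            i = end
        elif c == "{" and not in_braces and "}" in glob[i + 1:]:
            # A brace group must start at a space or the start of the string.
            out.append("(?<![^ ])(?:")
            in_braces = True
        elif c == "}" and in_braces:
            # ...and must end at a space or the end of the string.
            out.append(")(?![^ ])")
            in_braces = False
        elif c == "|" and in_braces:
            out.append("|")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def glob_match(glob: str, text: str) -> bool:
    """A glob must cover the whole string; letters compare caselessly."""
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(glob_to_regex(glob), text, flags) is not None


for glob, text in [
    ("d?g", "dug"),
    ("d*g", "dead slug"),
    ("{d*g}", "dead slug"),
    ("{storm|chup*}*", "Storm jiggles"),
    ("{storm|chup*}*", "stormette jiggles"),
]:
    print(f"{glob!r} vs {text!r}: {glob_match(glob, text)}")
```

The negative lookarounds express the rule that a "{...}" group may only touch a space or an end of the string, which is why "stormette jiggles" is rejected.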
Examples:
"d?g" matches "dog", "dig" and "dug" but not "dg" or "drug".
"d*g" matches "dg", "dog", "drug", "debug", "dead slug", etc.
"{d*g}" matches "dg", "dog", "drug", "debug", but not "dead slug".
"M[rs]." matches "Mr." and "Ms."
"M[a-z]" matches "Ma", "Mb", "Mc", etc.
"[^a-z]" matches any character that is not in the English alphabet.
"{storm|chup*}*" matches "chupchup fehs" and "Storm jiggles".
"{storm|chup*}*" does NOT match "stormette jiggles".
TF implements regular expressions with the package PCRE 2.08, Copyright (c) 1997-1999 University of Cambridge. The PCRE regexp syntax is documented on its own page under the topic "pcre".
The syntax and semantics of these regular expressions are nearly identical to those in Perl 5, and are roughly a superset of those used in versions of tf prior to 5.0. There is one incompatibility with old tf regexps: the "{" character is now special, and must be written "\{" to match a literal "{". To help with the transition to the new syntax, you will be warned if you use a regexp containing "{", unless you turn off the warn_curly_re variable.
If all letters in a regexp are lower case, the regexp will default to using caseless matching. If a regexp contains any upper case letters, it will default to case-sensitive matching. Of course, you can explicitly specify caseless matching by including "(?i)" at the beginning of the regexp, or case-sensitive by including "(?-i)".
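The default-case rule can be sketched in Python as follows. This is an illustration only, using Python's re module rather than PCRE. The helper names are hypothetical, and the sketch reads the rule literally: any upper case letter anywhere in the pattern text, including one inside an escape such as "\W", selects case-sensitive matching. Whether TF treats escapes that way is not stated here. Python's re does not accept a bare leading "(?-i)", so only the "(?i)" override is shown.

```python
import re


def tf_style_flags(pattern: str) -> int:
    """Caseless unless the pattern text contains an upper-case letter."""
    has_upper = any(ch.isupper() for ch in pattern)
    return 0 if has_upper else re.IGNORECASE


def tf_style_search(pattern: str, text: str) -> bool:
    """Unanchored search using the default-case rule above."""
    return re.search(pattern, text, tf_style_flags(pattern)) is not None


print(tf_style_search("hawkeye", "Hawkeye waves"))      # True: all lower case, so caseless
print(tf_style_search("Hawkeye", "hawkeye waves"))      # False: upper case present, so case-sensitive
print(tf_style_search("(?i)Hawkeye", "hawkeye waves"))  # True: explicit (?i) overrides the default
```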
Regexps will honor the locale that was set when the regexp was defined. Locale affects caseless matching, and determines whether characters are letters, digits, or whatever. So, for example, while the regexp "[A-Za-z]" will match only English letters, "[^\W\d_]" will match any letter defined by the locale.
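The difference between the two character classes can be seen in a small Python sketch. Python's re module classifies characters with Unicode tables rather than the C locale, so it is only a stand-in for the locale-dependent behavior described above, but the contrast between the classes is the same.

```python
import re

ascii_letter = re.compile(r"[A-Za-z]")
# A word character that is neither a digit nor an underscore, i.e. a letter.
any_letter = re.compile(r"[^\W\d_]")

# "\u00e9" is e with an acute accent.
for ch in ["a", "\u00e9", "7", "_"]:
    print(repr(ch),
          bool(ascii_letter.fullmatch(ch)),
          bool(any_letter.fullmatch(ch)))
```

Only the accented letter is treated differently by the two classes: "[A-Za-z]" rejects it and "[^\W\d_]" accepts it.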
After a regexp match, %Pn substitutions can be used to get the value of the string that matched various parts of the regexp. See %Pn.
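For readers more familiar with other regexp libraries, the idea resembles capture groups in Python, shown in the sketch below. The mapping in the comments (whole match to %P0, parenthesized groups to %P1, %P2, and so on) is an assumption made for illustration; see %Pn for the actual definition. The sample line is invented.

```python
import re

line = 'Hawkeye whispers, "meet me at noon"'
m = re.search(r'^([^ ]+) whispers, "(.*)"$', line)

if m:
    whole = m.group(0)    # assumed analogue of %P0: the entire matched text
    speaker = m.group(1)  # assumed analogue of %P1: first parenthesized part
    message = m.group(2)  # assumed analogue of %P2: second parenthesized part
    print(speaker)        # Hawkeye
    print(message)        # meet me at noon
```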
For those of you who care about code details: TF compiles PCRE regexps with the PCRE_DOLLAR_ENDONLY and PCRE_DOTALL options.
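The practical effect of these two options can be approximated in Python, as a sketch rather than a description of TF internals. Python's re.DOTALL corresponds to PCRE_DOTALL. Python has no flag equivalent to PCRE_DOLLAR_ENDONLY, so the sketch uses "\Z" to mimic a "$" that matches only at the very end of the string.

```python
import re

text = "first\nsecond\n"

# DOTALL: "." also matches a newline, so a pattern can span lines.
without_dotall = re.search(r"first.second", text) is not None
with_dotall = re.search(r"first.second", text, re.DOTALL) is not None
print(without_dotall)  # False
print(with_dotall)     # True

# By default "$" also matches just before a trailing newline.
# With DOLLAR_ENDONLY it would not; "\Z" shows that stricter behavior.
dollar_default = re.search(r"second$", text) is not None
end_only = re.search(r"second\Z", text) is not None
print(dollar_default)  # True
print(end_only)        # False
```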
See also: regmatch(), substitution.
regexp                  equivalent glob
------                  -----------------
"part of line"          "*part of line*"
"^entire line$"         "entire line"
"\bword\b"              "*{word}*"
"^(you|hawkeye) "       "{you|hawkeye} *"
"foo.*bar"              "*foo*bar*"
"f(oo|00)d"             "*{*food*|*f00d*}*"
"line\d"                "*line[0-9]*"
"^[^ ]+ whispers,"      "{*} whispers,*"
"foo(xy)?bar"           "*{*foobar*|*fooxybar*}*"
"zoo+m"                 none
"foo ?bar"              none
"(foo bar|frodo)"       none
Do not use ".*" or "^.*" at the beginning of a regexp. It is very inefficient, and not needed. Use %PL instead if you need to retrieve the substring to the left of the match.
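The reason a leading ".*" is unnecessary is that a regexp search is not anchored: it can match anywhere in the string, and the text before the match is available separately. The Python sketch below illustrates the idea; slicing up to the start of the match plays the role that %PL plays in TF. The sample line is invented.

```python
import re

line = "The old wizard whispers, hello"

# Unanchored search: no leading ".*" is needed to find the match.
m = re.search(r" whispers,", line)
left = line[:m.start()]  # text to the left of the match, like %PL
print(left)              # The old wizard

# The discouraged form captures the same text, but makes the matcher
# consume the line and then backtrack to find " whispers,".
slow = re.search(r"^(.*) whispers,", line)
print(slow.group(1) == left)  # True
```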