patterns

Patterns are used throughout TF, including triggers, hooks, /purge, /list, /limit, /recall, and expressions. There are four styles of pattern matching available:

simple: target string and pattern string must be identical
glob: similar to shell filename patterns
regexp: perl-compatible regular expressions
substr: target string must contain pattern string

The style used by a particular command is determined either by the use of the -m option or the setting of the global variable %{matching}.

Simple matching ("simple")

The pattern is compared directly to the string. There are no special characters. Case is significant.

Substring matching ("substr")

The string must contain the pattern. There are no special characters. Case is significant.
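
Both of these styles reduce to plain string operations. Here is a minimal sketch in Python (used only for illustration; the function names are hypothetical and not part of tf):

```python
def simple_match(pattern: str, text: str) -> bool:
    """'simple' style: the whole string must equal the pattern, case included."""
    return text == pattern


def substr_match(pattern: str, text: str) -> bool:
    """'substr' style: the pattern must occur somewhere in the string, case included."""
    return pattern in text
```

With these definitions, simple_match("dog", "Dog") is false because case is significant, while substr_match("og", "dog") is true.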

Globbing ("glob")

Globbing is the default matching style, and was the only style available before version 3.2. It is similar to filename expansion ("globbing") used by many shells (but unlike shells, tf uses glob only for comparison, not expansion).

There are several special sequences that can be used in tf globbing:

The '*' character matches any number of characters.
The '?' character matches any one character.
Square brackets ([...]) can be used to match any one of a sequence of characters. Ranges can be specified by giving the first and last characters with a '-' between them. If '^' is the first character, the sequence will match any character NOT specified.
Curly braces ({...}) can be used to match any one of a list of words. Different words can be matched by listing each within the braces, separated by a '|' (or) character. Both ends of {...} will only match a space or end of string. Therefore "{foo}*" and "{foo}p" do not match "foop", and "*{foo}" and "p{foo}" do not match "pfoo".
Patterns containing "{...}" can easily be meaningless. A valid {...} pattern must: (a) contain no spaces, (b) follow a wildcard, space, or beginning of string, (c) be followed by a wildcard, space, or end of string.
The pattern "{}" will match the empty string.
Any other character will match itself, ignoring case. A special character can be made to match itself by preceding it with '\' to remove its special meaning.

Examples:
"d?g" matches "dog", "dig" and "dug" but not "dg" or "drug".
"d*g" matches "dg", "dog", "drug", "debug", "dead slug", etc.
"{d*g}" matches "dg", "dog", "drug", "debug", but not "dead slug".
"M[rs]." matches "Mr." and "Ms."
"M[a-z]" matches "Ma", "Mb", "Mc", etc.
"[^a-z]" matches any single character that is not an English letter.
"{storm|chup*}*" matches "chupchup fehs" and "Storm jiggles".
"{storm|chup*}*" does NOT match "stormette jiggles".

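The glob rules above can be approximated by translating a glob into a regular expression. The following Python sketch is only an illustration under stated assumptions, not tf's actual implementation: glob_to_regex and glob_match are hypothetical names, the pattern is assumed to be well formed (balanced braces), and '?' inside {...} is assumed to exclude spaces just as '*' does.

```python
import re


def glob_to_regex(glob: str) -> str:
    """Translate a tf-style glob into Python regex source (approximation)."""
    out = []
    in_braces = False
    i = 0
    while i < len(glob):
        c = glob[i]
        if c == "\\" and i + 1 < len(glob):
            # A backslash removes the special meaning of the next character.
            out.append(re.escape(glob[i + 1]))
            i += 2
            continue
        if c == "[":
            end = glob.find("]", i + 1)
            body = glob[i + 1:end] if end != -1 else ""
            negate = body.startswith("^")
            members = body[1:] if negate else body
            if members:
                # Keep '-' ranges; neutralize characters special to Python's re.
                members = members.replace("\\", "\\\\").replace("[", "\\[")
                out.append("[" + ("^" if negate else "") + members + "]")
                i = end + 1
                continue
            # Unclosed or empty bracket: fall through and treat '[' literally.
        if c == "*":
            piece = "[^ ]*" if in_braces else r"[\s\S]*"
        elif c == "?":
            piece = "[^ ]" if in_braces else r"[\s\S]"
        elif c == "{" and not in_braces:
            # A word list must start at a space or at the beginning of the string.
            piece, in_braces = "(?<![^ ])(?:", True
        elif c == "}" and in_braces:
            # ...and must end at a space or at the end of the string.
            piece, in_braces = ")(?![^ ])", False
        elif c == "|" and in_braces:
            piece = "|"
        else:
            piece = re.escape(c)
        out.append(piece)
        i += 1
    return "".join(out)


def glob_match(glob: str, text: str) -> bool:
    """Globs are anchored (whole string) and ignore case."""
    return re.fullmatch(glob_to_regex(glob), text, re.IGNORECASE) is not None
```

Under these assumptions, glob_match("{storm|chup*}*", "Storm jiggles") is true and glob_match("{storm|chup*}*", "stormette jiggles") is false, in line with the examples above.
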
Regular expressions ("regexp")

The syntax and semantics of these regular expressions are nearly identical to those in perl 5, and are roughly a superset of those used in versions of tf prior to 5.0. There is one incompatibility with old tf regexps: the "{" character is now special, and must be written "\{" to match a literal "{". To help with the transition to the new syntax, you will be warned if you use a regexp containing "{", unless you turn off the warn_curly_re variable.

If all letters in a regexp are lower case, the regexp will default to using caseless matching. If a regexp contains any upper case letters, it will default to case-sensitive matching. Of course, you can explicitly specify caseless matching by including "(?i)" at the beginning of the regexp, or case-sensitive by including "(?-i)".
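
The default-case rule can be sketched in Python as follows. This is an illustration only: default_flags and tf_like_search are hypothetical names, and the sketch takes the rule literally, so an upper case letter inside an escape such as "\W" also counts (the text above does not say how tf treats that case).

```python
import re


def default_flags(regexp: str) -> int:
    """Caseless unless the regexp contains any upper case letter."""
    # Assumption: every upper case letter counts, including those in escapes.
    if any(ch.isupper() for ch in regexp):
        return 0
    return re.IGNORECASE


def tf_like_search(regexp: str, text: str):
    """Search with the default case behavior; an inline (?i) still overrides it."""
    return re.search(regexp, text, default_flags(regexp))
```

So tf_like_search("hawkeye", "Hawkeye waves") finds a match, tf_like_search("Hawkeye", "hawkeye waves") does not, and tf_like_search("(?i)Hawkeye", "hawkeye waves") does.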

Regexps will honor the locale that was set when the regexp was defined. Locale affects caseless matching, and determines which characters count as letters, digits, and so on. So, for example, while the regexp "[A-Za-z]" will match only English letters, "[^\W\d_]" will match any letter defined by the locale.
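
The difference between the two character classes can be seen in a short Python sketch. Note the assumption: Python's re module classifies characters in str patterns using Unicode tables rather than the C locale, so this only stands in for the locale-dependent behavior described above. The helper names are hypothetical.

```python
import re


def is_english_letter(ch: str) -> bool:
    """Matches only the 52 unaccented English letters."""
    return re.fullmatch(r"[A-Za-z]", ch) is not None


def is_any_letter(ch: str) -> bool:
    """A word character that is neither a digit nor an underscore, i.e. a letter."""
    return re.fullmatch(r"[^\W\d_]", ch) is not None
```

For an accented letter such as "é", is_english_letter returns false while is_any_letter returns true.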

After a regexp match, %Pn substitutions can be used to get the value of the string that matched various parts of the regexp. See %Pn.
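
As a rough analogy, the parts of a match correspond to numbered capture groups. The Python sketch below assumes that %P0 stands for the whole matched text and %P1, %P2, ... for the parenthesized subexpressions; check the %Pn topic for the authoritative description. The function name p_values is hypothetical.

```python
import re


def p_values(regexp: str, text: str) -> dict:
    """Return {'P0': whole match, 'P1': first group, ...}, or {} if no match."""
    m = re.search(regexp, text)
    if m is None:
        return {}
    # m.re.groups is the number of capture groups in the compiled regexp.
    return {f"P{i}": m.group(i) for i in range(m.re.groups + 1)}
```

For example, p_values(r"(\w+) whispers, (.*)", "Frodo whispers, hello there") yields "Frodo" for P1 and "hello there" for P2.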

For those of you who care about code details: TF compiles PCRE regexps with the PCRE_DOLLAR_ENDONLY and PCRE_DOTALL options.
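
The effect of those two options can be illustrated with Python's re module. This is a sketch by analogy: re.DOTALL behaves like PCRE_DOTALL, but Python has no flag equivalent to PCRE_DOLLAR_ENDONLY, so "\Z" (match only at the very end of the string) is used here to show the same behavior.

```python
import re

# PCRE_DOTALL: "." also matches a newline character.
dot_default = re.search(r"a.b", "a\nb")              # no match without DOTALL
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)       # matches with DOTALL

# PCRE_DOLLAR_ENDONLY: "$" matches only at the very end of the string,
# not before a trailing newline. Python's default "$" does accept a
# trailing newline, while "\Z" is strict.
dollar_default = re.search(r"end$", "end\n")         # matches (lenient "$")
end_only = re.search(r"end\Z", "end\n")              # no match (strict end)
```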

Comparison of globs and regexps

In a glob, '*' and '?' by themselves match text. In a regexp, '*' and '?' are only meaningful in combination with the pattern they follow. Regexps are not "anchored"; that is, the match may occur anywhere in the string, unless you explicitly use '^' and/or '$' to anchor it. Globs are anchored, and must match the entire string.

    regexp                  equivalent glob
    ------                  ---------------
    "part of line"          "*part of line*"
    "^entire line$"         "entire line"
    "\bword\b"              "*{word}*"
    "^(you|hawkeye) "       "{you|hawkeye} *"
    "foo.*bar"              "*foo*bar*"
    "f(oo|00)d"             "*{*food*|*f00d*}*"
    "line\d"                "*line[0-9]*"
    "^[^ ]+ whispers,"      "{*} whispers,*"
    "foo(xy)?bar"           "*{*foobar*|*fooxybar*}*"
    "zoo+m"                 none
    "foo ?bar"              none
    "(foo bar|frodo)"       none

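The anchoring difference can be checked with a few of the simpler rows from the table. In this Python sketch, re.search plays the role of an unanchored regexp and fnmatchcase from the standard fnmatch module stands in for an anchored glob. Python's fnmatch is not tf's glob (it has no {...} lists, for instance), so only rows using '*' and '[...]' are included; the helper names and sample strings are made up for the illustration.

```python
import re
from fnmatch import fnmatchcase

# (regexp, equivalent glob) pairs taken from the table above.
PAIRS = [
    (r"part of line", "*part of line*"),
    (r"^entire line$", "entire line"),
    (r"foo.*bar", "*foo*bar*"),
    (r"line\d", "*line[0-9]*"),
]

SAMPLES = [
    "this is part of line two",
    "entire line",
    "an entire line here",
    "foo and bar",
    "line7",
    "nothing",
]


def regexp_hit(regexp: str, text: str) -> bool:
    """Unanchored: the match may occur anywhere in the string."""
    return re.search(regexp, text, re.IGNORECASE | re.DOTALL) is not None


def glob_hit(glob: str, text: str) -> bool:
    """Anchored: the glob must cover the entire string; case is ignored."""
    return fnmatchcase(text.lower(), glob.lower())


def all_rows_agree() -> bool:
    """True if each regexp and its glob accept exactly the same samples."""
    return all(
        regexp_hit(regexp, text) == glob_hit(glob, text)
        for regexp, glob in PAIRS
        for text in SAMPLES
    )
```
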
Notes.

For best performance, make the beginning of your patterns as specific as possible.
Do not use ".*" or "^.*" at the beginning of a regexp. It is very inefficient, and not needed. Use %PL instead if you need to retrieve the substring to the left of the match.
If a glob and regexp can do the same job, the glob is usually slightly faster. But if using a glob instead of a regexp would mean you need some extra code, then that extra code will cost much more than the regexp would have. So if only a regexp can do what you need, don't hesitate to use it.
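
The advice about leading ".*" can be illustrated in Python: rather than capturing the text before the match with a leading wildcard group, search for the specific part and slice off what lies to its left, which is the role %PL plays in tf. The function names are hypothetical, and Python's re is only a stand-in for tf's regexp engine.

```python
import re
from typing import Optional


def left_with_leading_wildcard(text: str) -> Optional[str]:
    """Discouraged form: a leading wildcard group captures the left part."""
    m = re.search(r"^(.*?) whispers,", text, re.DOTALL)
    return None if m is None else m.group(1)


def left_of_match(text: str) -> Optional[str]:
    """Preferred form: match only the specific part, then take what precedes it."""
    m = re.search(r" whispers,", text)
    return None if m is None else text[:m.start()]
```

Both functions return "Frodo" for the line "Frodo whispers, hi", but the second keeps the regexp specific at its beginning.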
