patterns

Patterns are used throughout TF, including triggers, hooks, /purge, /list, /limit, /recall, and expressions. There are four styles of pattern matching available:

    simple   target string and pattern string must be identical
    glob     similar to shell filename patterns
    regexp   perl-compatible regular expressions
    substr   target string must contain pattern string
The style used by a particular command is determined either by the use of the -m option or the setting of the global variable %{matching}.

Simple matching ("simple")

The pattern is compared directly to the string. There are no special characters. Case is significant.

Substring matching ("substr")

The string must contain the pattern. There are no special characters. Case is significant.
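Both literal styles map directly onto ordinary string operations. The following Python sketch uses two hypothetical helper names (simple_match and substr_match are not part of tf) purely to make the difference concrete:

```python
def simple_match(pattern: str, text: str) -> bool:
    # "simple": no special characters, case is significant,
    # and the whole target string must be identical to the pattern.
    return text == pattern


def substr_match(pattern: str, text: str) -> bool:
    # "substr": no special characters, case is significant,
    # and the pattern may occur anywhere inside the target string.
    return pattern in text


print(simple_match("hello", "hello"))   # identical strings
print(simple_match("hell", "hello"))    # not identical
print(substr_match("ell", "hello"))     # contained
print(substr_match("ELL", "hello"))     # case differs
```

Note that a pattern such as "d*g" is treated literally by both helpers, since neither style has special characters.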

Globbing ("glob")

Globbing is the default matching style, and was the only style available before version 3.2. It is similar to filename expansion ("globbing") used by many shells (but unlike shells, tf uses glob only for comparison, not expansion).

There are several special sequences that can be used in tf globbing:

Examples:
"d?g" matches "dog", "dig" and "dug" but not "dg" or "drug".
"d*g" matches "dg", "dog", "drug", "debug", "dead slug", etc.
"{d*g}" matches "dg", "dog", "drug", "debug", but not "dead slug".
"M[rs]." matches "Mr." and "Ms."
"M[a-z]" matches "Ma", "Mb", "Mc", etc.
"[^a-z]" matches any character that is not in the English alphabet.
"{storm|chup*}*" matches "chupchup fehs" and "Storm jiggles".
"{storm|chup*}*" does NOT match "stormette jiggles".
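The sketch below approximates a subset of tf globbing in Python by translating a glob into a regular expression and requiring it to match the whole string, since globs are anchored. It is an illustration under stated assumptions, not tf's implementation: the helper names are hypothetical, "{...}" word sequences and backslash escapes are not handled (so the "{d*g}" and "{storm|chup*}*" examples above are out of its scope), and matching is made case-insensitive only because "{storm|chup*}*" is shown matching "Storm jiggles".

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate the *, ?, [...] and [^...] parts of a tf-style glob to a regex.

    Hypothetical helper: {...} word sequences and backslash escapes are
    deliberately NOT handled, so patterns using them will not behave like tf.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append(".*")                    # any number of any characters
        elif c == "?":
            out.append(".")                     # exactly one character
        elif c == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            negate = body.startswith("^")
            members = body[1:] if negate else body
            if not members:
                out.append(re.escape(c))        # unterminated or empty set: literal "["
            else:
                # keep "-" so ranges such as a-z survive; escape regex-special chars
                members = "".join("\\" + ch if ch in "\\[]^" else ch for ch in members)
                out.append("[" + ("^" if negate else "") + members + "]")
                i = end                         # continue after the closing "]"
        else:
            out.append(re.escape(c))            # everything else matches itself
        i += 1
    return "".join(out)


def glob_match(pattern: str, text: str) -> bool:
    # Globs are anchored, so the entire text must match. Case is ignored,
    # an assumption based on "{storm|chup*}*" matching "Storm jiggles".
    regex = glob_to_regex(pattern)
    return re.fullmatch(regex, text, re.IGNORECASE | re.DOTALL) is not None


for text in ["dog", "dig", "dug", "dg", "drug"]:
    print("d?g", text, glob_match("d?g", text))
```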

Regular expressions ("regexp")

TF implements regular expressions with the package PCRE 2.08, Copyright (c) 1997-1999 University of Cambridge. The PCRE regexp syntax is documented on its own page under the topic "pcre".

The syntax and semantics of these regular expressions are nearly identical to those in perl 5, and are roughly a superset of those used in versions of tf prior to 5.0. There is one incompatibility with old tf regexps: the "{" character is now special, and must be written "\{" to match a literal "{". To help with the transition to the new syntax, tf warns you whenever a regexp contains "{", unless you turn off the warn_curly_re variable.

If all letters in a regexp are lower case, the regexp will default to using caseless matching. If a regexp contains any upper case letters, it will default to case-sensitive matching. Of course, you can explicitly specify caseless matching by including "(?i)" at the beginning of the regexp, or case-sensitive by including "(?-i)".
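A minimal Python sketch of this case default follows, with caveats: Python's re module is not PCRE, compile_tf_style is a hypothetical name, and the uppercase scan looks at every character of the pattern, including letters inside escapes such as "\W"; whether tf counts those is not stated here. Python's re has no global "(?-i)" form, so the sketch strips a leading "(?-i)" and simply omits the caseless flag.

```python
import re


def compile_tf_style(pattern: str) -> re.Pattern:
    """Hypothetical helper mimicking tf's case default with Python's re module."""
    flags = re.DOTALL  # tf compiles its regexps with PCRE_DOTALL
    if pattern.startswith("(?-i)"):
        # Explicit case-sensitive request: drop the prefix, keep matching case-sensitive.
        return re.compile(pattern[len("(?-i)"):], flags)
    if not any(ch.isupper() for ch in pattern):
        # No upper case letters anywhere: default to caseless matching.
        flags |= re.IGNORECASE
    # A leading "(?i)" is understood by Python itself and forces caseless matching.
    return re.compile(pattern, flags)


print(bool(compile_tf_style("storm").search("Storm jiggles")))
print(bool(compile_tf_style("Storm").search("storm jiggles")))
```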

Regexps honor the locale that was set when the regexp was defined. The locale affects caseless matching, and determines whether a given character counts as a letter, a digit, and so on. So, for example, while the regexp "[A-Za-z]" matches only English letters, "[^\W\d_]" matches any letter defined by the locale.
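The same contrast can be seen in Python, as an analogue rather than a reproduction: Python's str patterns classify characters by Unicode properties, not by the C locale (Python only accepts re.LOCALE with bytes patterns), but the effect on "[A-Za-z]" versus "[^\W\d_]" is the same idea. "[^\W\d_]" reads as "a word character that is not a digit and not an underscore", which leaves only letters.

```python
import re

# Python 3 str patterns use Unicode character properties, playing the
# role that the locale plays in tf for \w, \d and caseless matching.
ascii_only = re.compile(r"[A-Za-z]")
any_letter = re.compile(r"[^\W\d_]")  # word char, minus digits and underscore

for ch in ["a", "Z", "é", "ß", "7", "_"]:
    print(repr(ch), bool(ascii_only.fullmatch(ch)), bool(any_letter.fullmatch(ch)))
```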

After a regexp match, %Pn substitutions can be used to get the value of the string that matched various parts of the regexp. See %Pn.
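For readers who know Python, the closest analogue of %Pn is match.group(n). The sketch below assumes the usual convention that %P0 stands for the whole match and %P1, %P2, and so on for the parenthesized subexpressions (see %Pn for tf's exact rules); the sample line is invented for illustration.

```python
import re

line = 'Hawkeye whispers, "meet me at the inn"'
m = re.search(r'^(\w+) whispers, "(.*)"', line)
if m:
    print(m.group(0))  # whole match, analogous to %P0
    print(m.group(1))  # first parenthesized subexpression, analogous to %P1
    print(m.group(2))  # second parenthesized subexpression, analogous to %P2
```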

For those of you who care about code details: TF compiles PCRE regexps with the PCRE_DOLLAR_ENDONLY and PCRE_DOTALL options.
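The effect of those two options can be demonstrated with Python's re module, keeping in mind that it is a different engine: re.DOTALL corresponds to PCRE_DOTALL, while Python has no flag equivalent to PCRE_DOLLAR_ENDONLY, so "\Z" is used to get the same end-only behavior.

```python
import re

# PCRE_DOTALL: "." also matches a newline. Python's equivalent is re.DOTALL.
print(bool(re.search(r"a.b", "a\nb")))             # False
print(bool(re.search(r"a.b", "a\nb", re.DOTALL)))  # True

# PCRE_DOLLAR_ENDONLY: "$" matches only at the very end of the subject.
# Python's "$" also matches just before a final newline, so \Z is the
# closest equivalent.
print(bool(re.search(r"abc$", "abc\n")))   # True
print(bool(re.search(r"abc\Z", "abc\n")))  # False
```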

See also: regmatch(), substitution.

Comparison of glob and regexps.

In a glob, '*' and '?' by themselves match text. In a regexp, '*' and '?' are only meaningful in combination with the pattern they follow. Regexps are not "anchored"; that is, the match may occur anywhere in the string, unless you explicitly use '^' and/or '$' to anchor it. Globs are anchored, and must match the entire string.
    regexp		equivalent glob
    ------		-----------------
    "part of line"	"*part of line*"
    "^entire line$"	"entire line"
    "\bword\b"		"*{word}*"
    "^(you|hawkeye) "	"{you|hawkeye} *"
    "foo.*bar"		"*foo*bar*"
    "f(oo|00)d"		"*{*food*|*f00d*}*"
    "line\d"		"*line[0-9]*"
    "^[^ ]+ whispers,"	"{*} whispers,*"
    "foo(xy)?bar"	"*{*foobar*|*fooxybar*}*"
    "zoo+m"		none
    "foo ?bar"		none
    "(foo bar|frodo)"	none
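The table's claims about anchoring can be spot-checked mechanically for the rows that need no braces. This Python sketch uses re.search, which is unanchored like a tf regexp, and fnmatch.fnmatchcase from the standard library as a stand-in for a tf glob. fnmatch is not tf's matcher (it has no "{...}" and spells negated sets "[!...]"), and the sample strings are lowercase so that case handling does not affect the comparison.

```python
import fnmatch
import re

# (regexp, glob) pairs from the table above, limited to rows without {...}.
PAIRS = [
    ("part of line", "*part of line*"),
    ("^entire line$", "entire line"),
    ("foo.*bar", "*foo*bar*"),
    (r"line\d", "*line[0-9]*"),
]

SAMPLES = ["entire line", "a part of line here", "foobar", "foo and bar",
           "line7", "no line", "barfoo"]


def agree(regexp: str, glob: str, text: str) -> bool:
    # re.search is unanchored like a tf regexp;
    # fnmatchcase is anchored like a tf glob.
    return bool(re.search(regexp, text)) == fnmatch.fnmatchcase(text, glob)


for regexp, glob in PAIRS:
    for text in SAMPLES:
        assert agree(regexp, glob, text), (regexp, glob, text)
print("all pairs agree on the samples")
```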

Notes.


Copyright © 1995, 1996, 1997, 1998, 1999, 2002, 2003, 2004, 2005, 2006-2007 Ken Keys