tango.text.Regex

Members

Aliases

Regex
alias Regex = RegExpT!(char)
Undocumented in source.

Classes

RegExpException
class RegExpException
Undocumented in source.
RegExpT
class RegExpT(char_t)

Regular expression compiler and interpreter.

UtfException
class UtfException
Undocumented in source.

Functions

_bug
void _bug(CharRange!(dchar) a)
Undocumented in source. Be warned that the author may not have intended to support it.
decode
dchar decode(const(char)[] s, size_t idx)
Undocumented in source. Be warned that the author may not have intended to support it.
decode
dchar decode(const(wchar)[] s, size_t idx)
Undocumented in source.
decode
dchar decode(const(dchar)[] s, size_t idx)
Undocumented in source.
encode
void encode(const(char)[] s, dchar c)
Undocumented in source.
encode
void encode(const(wchar)[] s, dchar c)
Undocumented in source.
encode
void encode(const(dchar)[] s, dchar c)
Undocumented in source.
isValidDchar
bool isValidDchar(dchar c)
Undocumented in source. Be warned that the author may not have intended to support it.
main
void main()

"*" quantifier bug. In "(☃+)" pattern test ((r.match(0) == "☃☃☃☃☃")&&(r.match(1) == "☃☃☃☃☃")) has passed.

quickSort
void quickSort(T[] a)
Undocumented in source. Be warned that the author may not have intended to support it.
quickSort
void quickSort(T[] a, size_t l, size_t r)
Undocumented in source. Be warned that the author may not have intended to support it.

Structs

CharClass
struct CharClass(char_t)
Undocumented in source.
CharRange
struct CharRange(char_t)
Undocumented in source.

Meta

License

BSD style: $(LICENSE)

Version

Initial release: Jan 2008

Authors

Jascha Wetzel

This is a regular expression compiler and interpreter based on the Tagged NFA/DFA method.

The Regex class is not thread-safe.

See <a href="http://en.wikipedia.org/wiki/Regular_expression">Wikipedia's article on regular expressions</a> for details on regular expressions in general.

The method used implies that the expressions are <i>regular</i> in the sense of formal language theory, as opposed to what &quot;regular expression&quot; means in most implementations (e.g. PCRE or the standard libraries of Perl, Java or Python). The advantage of this method is its performance; its disadvantage is the inability to support some features of Perl-like regular expressions (e.g. back-references). See <a href="http://swtch.com/~rsc/regexp/regexp1.html">&quot;Regular Expression Matching Can Be Simple And Fast&quot;</a> for details on the differences.
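To make the back-reference limitation concrete, the following sketch uses Python's `re` module (a Perl-like backtracking engine, not this library) to show what a back-reference does. The pattern and inputs are made up for illustration. A pattern such as `(a+)b\1` requires the text after `b` to repeat exactly what the group captured, which no regular language in the formal sense can express.

```python
import re

# Illustration only: Python's `re`, not tango.text.Regex.
# \1 must repeat exactly the text captured by the first group.
pattern = re.compile(r"(a+)b\1")

same_length = pattern.fullmatch("aaabaaa")   # three a's on both sides of b
diff_length = pattern.fullmatch("aaabaa")    # three a's before b, two after

print(same_length is not None)  # True: both runs of a's are identical
print(diff_length is not None)  # False: the runs differ in length
```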

Matching a regular expression against an input string of length N takes O(M*N) time, where M depends on the number of matching brackets and the complexity of the expression. Since M is constant with respect to the input, matching is a linear-time process.
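The linear bound can be illustrated with a minimal sketch of set-of-states NFA simulation. This is not the library's tagged NFA/DFA implementation. The automaton below is hand-built for the pattern `(a|b)*abb`, and all names (`TRANSITIONS`, `simulate`, and so on) are hypothetical. Each input character is processed once against at most `NUM_STATES` active states, so the work is bounded by the number of states times the input length.

```python
from typing import Dict, Set, Tuple

# Hypothetical hand-built NFA for the pattern (a|b)*abb (illustration only).
# Keys are (state, character); values are the sets of successor states.
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}
NUM_STATES = 4


def simulate(text: str) -> Tuple[bool, int]:
    """Return (matched, steps), where steps counts visited (state, char) pairs."""
    current = {START}
    steps = 0
    for ch in text:
        following: Set[int] = set()
        for state in current:
            steps += 1
            following |= TRANSITIONS.get((state, ch), set())
        current = following
    return bool(current & ACCEPTING), steps


matched, steps = simulate("abababb")
print(matched)                                  # True: the input ends in "abb"
print(steps <= NUM_STATES * len("abababb"))     # True: work is bounded by M*N
```

Because the active set never holds more than `NUM_STATES` entries, no input can trigger the exponential behaviour that backtracking matchers may show.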

The syntax of regular expressions is as follows. <i>X</i> and <i>Y</i> stand for arbitrary regular expressions.

<table border=1 cellspacing=0 cellpadding=5> <caption>Operators</caption>

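<tr><td>X|Y</td><td>alternation, i.e. X or Y</td></tr>
<tr><td>(X)</td><td>matching brackets - creates a sub-match</td></tr>
<tr><td>(?X)</td><td>non-matching brackets - only groups X, no sub-match is created</td></tr>
<tr><td>[Z]</td><td>character class specification, Z is a string of characters or character ranges, e.g. [a-zA-Z0-9_.\-]</td></tr>
<tr><td>[^Z]</td><td>negated character class specification</td></tr>
<tr><td>&lt;X</td><td>lookbehind, X may be a single character or a character class</td></tr>
<tr><td>&gt;X</td><td>lookahead, X may be a single character or a character class</td></tr>
<tr><td>^</td><td>start of input or start of line</td></tr>
<tr><td>$</td><td>end of input or end of line</td></tr>
<tr><td>\b</td><td>start or end of word, equals (?&lt;\s&gt;\S|&lt;\S&gt;\s)</td></tr>
<tr><td>\B</td><td>opposite of \b, equals (?&lt;\S&gt;\S|&lt;\s&gt;\s)</td></tr>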

</table>

<table border=1 cellspacing=0 cellpadding=5> <caption>Quantifiers</caption>

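<tr><td>X?</td><td>zero or one</td></tr>
<tr><td>X*</td><td>zero or more</td></tr>
<tr><td>X+</td><td>one or more</td></tr>
<tr><td>X{n,m}</td><td>at least n, at most m instances of X.<br>If n is missing, it is set to 0.<br>If m is missing, it is set to infinity.</td></tr>
<tr><td>X??</td><td>non-greedy version of the above operators</td></tr>
<tr><td>X*?</td><td>see above</td></tr>
<tr><td>X+?</td><td>see above</td></tr>
<tr><td>X{n,m}?</td><td>see above</td></tr>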

</table>
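The difference between greedy and non-greedy quantifiers is easiest to see on a small input. The sketch below uses Python's `re` module, whose quantifier notation happens to coincide with the table above. It is not this library, and it is an assumption (not verified here) that the library returns the same sub-matches.

```python
import re

# Illustration only: Python's `re`, not tango.text.Regex.
text = "aaaa"

greedy = re.match(r"a+", text).group(0)              # as many a's as possible
lazy = re.match(r"a+?", text).group(0)               # as few a's as possible
bounded = re.match(r"a{2,3}", text).group(0)         # between 2 and 3 a's, greedy
lazy_bounded = re.match(r"a{2,3}?", text).group(0)   # between 2 and 3 a's, non-greedy
no_lower = re.match(r"a{,2}", text).group(0)         # n missing: lower bound is 0
no_upper = re.match(r"a{2,}", text).group(0)         # m missing: no upper bound
optional = re.match(r"a?", "b").group(0)             # zero occurrences is allowed

print(greedy, lazy, bounded, lazy_bounded, no_lower, no_upper, repr(optional))
# prints: aaaa a aaa aa aa aaaa ''
```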

<table border=1 cellspacing=0 cellpadding=5> <caption>Pre-defined character classes</caption>

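<tr><td>.</td><td>any printable character</td></tr>
<tr><td>\s</td><td>whitespace</td></tr>
<tr><td>\S</td><td>non-whitespace</td></tr>
<tr><td>\w</td><td>alpha-numeric characters or underscore</td></tr>
<tr><td>\W</td><td>opposite of \w</td></tr>
<tr><td>\d</td><td>digits</td></tr>
<tr><td>\D</td><td>non-digit</td></tr>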

</table>

Note that "alphanumeric" only applies to Latin-1.