path: root/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
author: Timothy Pearson <kb9vqf@pearsoncomputing.net> 2011-12-03 11:05:10 -0600
committer: Timothy Pearson <kb9vqf@pearsoncomputing.net> 2011-12-03 11:05:10 -0600
commit: f7e7a923aca8be643f9ae6f7252f9fb27b3d2c3b
tree: 1f78ef53b206c6b4e4efc88c4849aa9f686a094d /tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
parent: 85ca18776aa487b06b9d5ab7459b8f837ba637f3
Second part of prior commit
Diffstat (limited to 'tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook')
-rw-r--r-- tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook 1219
1 files changed, 1219 insertions, 0 deletions
diff --git a/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook b/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
new file mode 100644
index 00000000000..c692da92cd5
--- /dev/null
+++ b/tde-i18n-en_GB/docs/tdebase/kate/regular-expressions.docbook
@@ -0,0 +1,1219 @@
+<appendix id="regular-expressions">
+<appendixinfo>
+<authorgroup>
+<author
+>&Anders.Lund; &Anders.Lund.mail;</author>
+<othercredit role="translator"
+><firstname
+>Malcolm</firstname
+><surname
+>Hunter</surname
+><affiliation
+><address
+><email
+>malcolm.hunter@gmx.co.uk</email
+></address
+></affiliation
+><contrib
+>Conversion to British English</contrib
+></othercredit
+>
+</authorgroup>
+</appendixinfo>
+
+<title
+>Regular Expressions</title>
+
+<synopsis
+>This Appendix contains a brief but hopefully sufficiently
+thorough introduction to the world of <emphasis
+>regular
+expressions</emphasis
+>. It documents regular expressions in the form
+available within &kate;, which is compatible neither with the regular
+expressions of perl nor with those of, for example,
+<command
+>grep</command
+>.</synopsis>
+
+<sect1>
+
+<title
+>Introduction</title>
+
+<para
+><emphasis
+>Regular Expressions</emphasis
+> provide us with a way to describe the possible contents of a text string in a form understood by a small piece of software, so that it can test whether a text matches and, in more advanced applications, save pieces of the matching text.</para>
+
+<para
+>An example: Say you want to search a text for paragraphs that start with either of the names <quote
+>Henrik</quote
+> or <quote
+>Pernille</quote
+> followed by some form of the verb <quote
+>say</quote
+>.</para>
+
+<para
+>With a normal search, you would start out searching for the first name, <quote
+>Henrik</quote
+> maybe followed by <quote
+>sa</quote
+> like this: <userinput
+>Henrik sa</userinput
+>, and while looking for matches, you would have to discard those that are not at the beginning of a paragraph, as well as those in which the word starting with the letters <quote
+>sa</quote
+> was not <quote
+>says</quote
+>, <quote
+>said</quote
+> or similar. And then of course repeat all of that with the next name...</para>
+
+<para
+>With Regular Expressions, that task could be accomplished with a single search, and with a greater degree of precision.</para>
+
+<para
+>To achieve this, Regular Expressions define rules for expressing in detail a generalisation of a string to match. Our example, which we might literally express like this: <quote
+>A line starting with either <quote
+>Henrik</quote
+> or <quote
+>Pernille</quote
+> (possibly following up to 4 blanks or tab characters) followed by a blank, followed by <quote
+>sa</quote
+> and then either <quote
+>ys</quote
+> or <quote
+>id</quote
+></quote
+> could be expressed with the following regular expression:</para
+> <para
+><userinput
+>^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)</userinput
+></para>
+
+<para
+>The above example demonstrates all four major concepts of modern Regular Expressions, namely:</para>
+
+<itemizedlist>
+<listitem
+><para
+>Patterns</para
+></listitem>
+<listitem
+><para
+>Assertions</para
+></listitem>
+<listitem
+><para
+>Quantifiers</para
+></listitem>
+<listitem
+><para
+>Back references</para
+></listitem>
+</itemizedlist>
+
+<para
+>The caret (<literal
+>^</literal
+>) starting the expression is an assertion that is true only if the match that follows is at the start of a line.</para>
+
+<para
+>The strings <literal
+>[ \t]</literal
+> and <literal
+>(Henrik|Pernille) sa(ys|id)</literal
+> are patterns. The first one is a <emphasis
+>character class</emphasis
+> that matches either a blank or a (horizontal) tab character; the other pattern contains first a subpattern matching either <literal
+>Henrik</literal
+> <emphasis
+>or</emphasis
+> <literal
+>Pernille</literal
+>, then a piece matching the exact string <literal
+> sa</literal
+> and finally a subpattern matching either <literal
+>ys</literal
+> <emphasis
+>or</emphasis
+> <literal
+>id</literal
+>.</para>
+
+<para
+>The string <literal
+>{0,4}</literal
+> is a quantifier saying <quote
+>anywhere from 0 up to 4 of the previous</quote
+>.</para>
+
+<para
+>Because regular expression software supporting the concept of <emphasis
+>back references</emphasis
+> saves the entire matching part of the string as well as sub-patterns enclosed in parentheses, given some means of access to those references, we could get our hands on the whole match (when searching a text document in an editor with a regular expression, that is often marked as selected), the name found, or the last part of the verb.</para>
+
+<para
+>All together, the expression will match where we wanted it to, and only there.</para>
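+
+<para
+>As a rough illustration, the sketch below runs the example expression through Python's <literal>re</literal> module. Python is an assumption made for this sketch only: &kate; uses its own regular expression engine, which differs from Python's in some details, and the names <literal>PATTERN</literal> and <literal>find_speaker</literal> are hypothetical.</para>
+

```python
import re

# The example expression from the text, compiled with Python's re module
# (an assumption for illustration; Kate uses its own engine).
PATTERN = re.compile(r"^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)")


def find_speaker(line):
    """Return (whole match, name, verb ending), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # group(0) is the entire match; group(1) and group(2) are the two sub-patterns.
    return match.group(0), match.group(1), match.group(2)


print(find_speaker("  Henrik says hello"))   # ('  Henrik says', 'Henrik', 'ys')
print(find_speaker("Pernille said no"))      # ('Pernille said', 'Pernille', 'id')
print(find_speaker("Then Henrik said yes"))  # None: the name is not at the start
```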
+
+<para
+>The following sections will describe in detail how to construct and use patterns, character classes, assertions, quantifiers and back references, and the final section will give a few useful examples.</para>
+
+</sect1>
+
+<sect1 id="regex-patterns">
+
+<title
+>Patterns</title>
+
+<para
+>Patterns consist of literal strings and character classes. Patterns may contain sub-patterns, which are patterns enclosed in parentheses.</para>
+
+<sect2>
+<title
+>Escaping characters</title>
+
+<para
+>In patterns as well as in character classes, some characters have a special meaning. To literally match any of those characters, they must be marked or <emphasis
+>escaped</emphasis
+> to let the regular expression software know that it should interpret such characters in their literal meaning.</para>
+
+<para
+>This is done by prepending a backslash (<literal
+>\</literal
+>) to the character.</para>
+
+
+<para
+>The regular expression software will silently ignore escaping a character that does not have any special meaning in the context, so escaping for example a <quote
+>j</quote
+> (<userinput
+>\j</userinput
+>) is safe. If you are in doubt whether a character could have a special meaning, you can therefore escape it safely.</para>
+
+<para
+>Escaping of course includes the backslash character itself: to match one literally, you would write <userinput
+>\\</userinput
+>.</para>
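+
+<para
+>The effect of escaping can be sketched as follows, using Python's <literal>re</literal> module as a stand-in (an assumption; it is not the engine used by &kate;). Note that Python, unlike the behaviour described above, rejects some unneeded escapes such as <literal>\j</literal>, so only characters with a special meaning are escaped here.</para>
+

```python
import re

# Python's re module stands in for Kate's engine here (an assumption).

# An unescaped dot matches any character; an escaped dot matches only a literal dot.
unescaped = re.findall(r"1.5", "1.5 and 125")
escaped = re.findall(r"1\.5", "1.5 and 125")

# A literal backslash is matched by an escaped backslash.
backslashes = re.findall(r"\\", r"C:\temp\file")

print(unescaped)         # ['1.5', '125']
print(escaped)           # ['1.5']
print(len(backslashes))  # 2
```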
+
+</sect2>
+
+<sect2>
+<title
+>Character Classes and abbreviations</title>
+
+<para
+>A <emphasis
+>character class</emphasis
+> is an expression that matches one of a defined set of characters. In Regular Expressions, character classes are defined by putting the legal characters for the class in square brackets, <literal
+>[]</literal
+>, or by using one of the abbreviated classes described below.</para>
+
+<para
+>Simple character classes just contain one or more literal characters, for example <userinput
+>[abc]</userinput
+> (matching either of the letters <quote
+>a</quote
+>, <quote
+>b</quote
+> or <quote
+>c</quote
+>) or <userinput
+>[0123456789]</userinput
+> (matching any digit).</para>
+
+<para
+>Because letters and digits have a logical order, you can abbreviate those by specifying ranges of them: <userinput
+>[a-c]</userinput
+> is equal to <userinput
+>[abc]</userinput
+> and <userinput
+>[0-9]</userinput
+> is equal to <userinput
+>[0123456789]</userinput
+>. Combining these constructs, for example <userinput
+>[a-fynot1-38]</userinput
+> is completely legal (the last one would match, of course, any of <quote
+>a</quote
+>,<quote
+>b</quote
+>,<quote
+>c</quote
+>,<quote
+>d</quote
+>, <quote
+>e</quote
+>,<quote
+>f</quote
+>,<quote
+>y</quote
+>,<quote
+>n</quote
+>,<quote
+>o</quote
+>,<quote
+>t</quote
+>, <quote
+>1</quote
+>,<quote
+>2</quote
+>,<quote
+>3</quote
+> or <quote
+>8</quote
+>).</para>
+
+<para
+>As capital letters are different characters from their non-capital equivalents, to create a caseless character class matching <quote
+>a</quote
+> or <quote
+>b</quote
+>, in any case, you need to write it <userinput
+>[aAbB]</userinput
+>.</para>
+
+<para
+>It is of course possible to create a <quote
+>negative</quote
+> class matching <quote
+>anything but</quote
+>. To do so, put a caret (<literal
+>^</literal
+>) at the beginning of the class: </para>
+
+<para
+><userinput
+>[^abc]</userinput
+> will match any character <emphasis
+>but</emphasis
+> <quote
+>a</quote
+>, <quote
+>b</quote
+> or <quote
+>c</quote
+>.</para>
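+
+<para
+>A minimal sketch of these character classes, again assuming Python's <literal>re</literal> module in place of &kate;'s own engine; the sample strings are made up for illustration.</para>
+

```python
import re

# Python's re module stands in for Kate's engine here (an assumption).
text = "a1 B2 c3 y8 z9"

combined = re.findall(r"[a-fynot1-38]", text)  # ranges and single characters mixed
caseless = re.findall(r"[aAbB]", text)         # "a" or "b" in either case
negated = re.findall(r"[^abc ]", "a b c d e")  # anything but a, b, c or a blank

print(combined)  # ['a', '1', '2', 'c', '3', 'y', '8']
print(caseless)  # ['a', 'B']
print(negated)   # ['d', 'e']
```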
+
+<para
+>In addition to literal characters, some abbreviations are defined, making life still a bit easier: <variablelist>
+
+<varlistentry>
+<term
+><userinput
+>\a</userinput
+></term>
+<listitem
+><para
+>This matches the <acronym
+>ASCII</acronym
+> bell character (BEL, 0x07).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\f</userinput
+></term>
+<listitem
+><para
+>This matches the <acronym
+>ASCII</acronym
+> form feed character (FF, 0x0C).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\n</userinput
+></term>
+<listitem
+><para
+>This matches the <acronym
+>ASCII</acronym
+> line feed character (LF, 0x0A, Unix newline).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\r</userinput
+></term>
+<listitem
+><para
+>This matches the <acronym
+>ASCII</acronym
+> carriage return character (CR, 0x0D).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\t</userinput
+></term>
+<listitem
+><para
+>This matches the <acronym
+>ASCII</acronym
+> horizontal tab character (HT, 0x09).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\v</userinput
+></term>
+<listitem
+><para
+>This matches the <acronym
+>ASCII</acronym
+> vertical tab character (VT, 0x0B).</para
+></listitem>
+</varlistentry>
+<varlistentry>
+<term
+><userinput
+>\xhhhh</userinput
+></term>
+
+<listitem
+><para
+>This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (&ie;, \zero ooo) matches the <acronym
+>ASCII</acronym
+>/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>.</userinput
+> (dot)</term>
+<listitem
+><para
+>This matches any character (including newline).</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\d</userinput
+></term>
+<listitem
+><para
+>This matches a digit. Equal to <literal
+>[0-9]</literal
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\D</userinput
+></term>
+<listitem
+><para
+>This matches a non-digit. Equal to <literal
+>[^0-9]</literal
+> or <literal
+>[^\d]</literal
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\s</userinput
+></term>
+<listitem
+><para
+>This matches a whitespace character. Practically equal to <literal
+>[ \t\n\r]</literal
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\S</userinput
+></term>
+<listitem
+><para
+>This matches a non-whitespace. Practically equal to <literal
+>[^ \t\r\n]</literal
+>, and equal to <literal
+>[^\s]</literal
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\w</userinput
+></term>
+<listitem
+><para
+>Matches any <quote
+>word character</quote
+> - in this case any letter or digit. Note that underscore (<literal
+>_</literal
+>) is not matched, unlike in perl regular expressions. Equal to <literal
+>[a-zA-Z0-9]</literal
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\W</userinput
+></term>
+<listitem
+><para
+>Matches any non-word character - anything but letters or numbers. Equal to <literal
+>[^a-zA-Z0-9]</literal
+> or <literal
+>[^\w]</literal
+></para
+></listitem>
+</varlistentry>
+
+
+</variablelist>
+
+</para>
+
+<para
+>The abbreviated classes can be put inside a custom class. For example, to match a word character, a blank or a dot, you could write <userinput
+>[\w \.]</userinput
+>.</para
+>
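+
+<para
+>The custom class above can be tried out as in the sketch below. It assumes Python's <literal>re</literal> module, whose <literal>\w</literal> also matches the underscore, unlike the behaviour documented here; the name <literal>CLASS</literal> and the sample sentence are hypothetical.</para>
+

```python
import re

# Python's re module is an assumption here; its \w also matches the underscore.
CLASS = re.compile(r"[\w \.]")

# Keep only word characters, blanks and dots; everything else is dropped.
kept = "".join(CLASS.findall("to be, or not.to be!"))

print(kept)  # to be or not.to be
```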
+
+<note
+> <para
+>The POSIX notation of classes, <userinput
+>[:&lt;class name&gt;:]</userinput
+>, is currently not supported.</para
+> </note>
+
+<sect3>
+<title
+>Characters with special meanings inside character classes</title>
+
+<para
+>The following characters have a special meaning inside the <quote
+>[]</quote
+> character class construct, and must be escaped to be literally included in a class:</para>
+
+<variablelist>
+<varlistentry>
+<term
+><userinput
+>]</userinput
+></term>
+<listitem
+><para
+>Ends the character class. Must be escaped unless it is the very first character in the class (it may follow an unescaped caret).</para
+></listitem>
+</varlistentry>
+<varlistentry>
+<term
+><userinput
+>^</userinput
+> (caret)</term>
+<listitem
+><para
+>Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.</para
+></listitem
+>
+</varlistentry>
+<varlistentry>
+<term
+><userinput
+>-</userinput
+> (dash)</term>
+<listitem
+><para
+>Denotes a logical range. Must always be escaped within a character class.</para
+></listitem>
+</varlistentry>
+<varlistentry>
+<term
+><userinput
+>\</userinput
+> (backslash)</term>
+<listitem
+><para
+>The escape character. Must always be escaped.</para
+></listitem>
+</varlistentry>
+
+</variablelist>
+
+</sect3>
+
+</sect2>
+
+<sect2>
+
+<title
+>Alternatives: matching <quote
+>one of</quote
+></title>
+
+<para
+>If you want to match one of a set of alternative patterns, you can separate those with <literal
+>|</literal
+> (vertical bar character).</para>
+
+<para
+>For example to find either <quote
+>John</quote
+> or <quote
+>Harry</quote
+> you would use an expression <userinput
+>John|Harry</userinput
+>.</para>
+
+</sect2>
+
+<sect2>
+
+<title
+>Sub Patterns</title>
+
+<para
+><emphasis
+>Sub patterns</emphasis
+> are patterns enclosed in parentheses, and they have several uses in the world of regular expressions.</para>
+
+<sect3>
+
+<title
+>Specifying alternatives</title>
+
+<para
+>You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character <quote
+>|</quote
+> (vertical bar).</para>
+
+<para
+>For example to match either of the words <quote
+>int</quote
+>, <quote
+>float</quote
+> or <quote
+>double</quote
+>, you could use the pattern <userinput
+>int|float|double</userinput
+>. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: <userinput
+>(int|float|double)\s+\w+</userinput
+>.</para>
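+
+<para
+>The difference between the bare alternatives and the grouped form can be sketched like this, assuming Python's <literal>re</literal> module rather than &kate;'s engine; the sample text is invented.</para>
+

```python
import re

# Python's re module stands in for Kate's engine here (an assumption).
source = "int count; float ratio; double; integer"

# Bare alternatives match wherever one of the words occurs, even inside "integer".
bare = re.findall(r"int|float|double", source)

# Grouped alternatives followed by whitespace and word characters.
grouped = [m.group(0) for m in re.finditer(r"(int|float|double)\s+\w+", source)]

print(bare)     # ['int', 'float', 'double', 'int']
print(grouped)  # ['int count', 'float ratio']
```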
+
+</sect3>
+
+<sect3>
+
+<title
+>Capturing matching text (back references)</title>
+
+<para
+>If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.</para>
+
+<para
+>For example, if you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write <userinput
+>(\w+),\s*\1</userinput
+>. The sub pattern <literal
+>\w+</literal
+> would find a chunk of word characters, and the entire expression would match if those were followed by a comma, zero or more whitespace characters and then an identical chunk of word characters. (The string <literal
+>\1</literal
+> references <emphasis
+>the first sub pattern enclosed in parentheses</emphasis
+>)</para>
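+
+<para
+>A small sketch of this back reference, assuming Python's <literal>re</literal> module (not &kate;'s engine); <literal>REPEATED</literal> and <literal>repeated_word</literal> are hypothetical names.</para>
+

```python
import re

# (\w+) captures a chunk of word characters; \1 demands the very same text again.
# Python's re module is an assumption here.
REPEATED = re.compile(r"(\w+),\s*\1")


def repeated_word(text):
    """Return the repeated word, or None if no repetition is found."""
    match = REPEATED.search(text)
    return match.group(1) if match else None


print(repeated_word("well, well, what now"))  # well
print(repeated_word("no, yes"))               # None
```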
+
+<!-- <para
+>See also <link linkend="backreferences"
+>Back references</link
+>.</para
+> -->
+
+</sect3>
+
+<sect3 id="lookahead-assertions">
+<title
+>Lookahead Assertions</title>
+
+<para
+>A lookahead assertion is a sub pattern, starting with either <literal
+>?=</literal
+> or <literal
+>?!</literal
+>.</para>
+
+<para
+>For example to match the literal string <quote
+>Bill</quote
+> but only if not followed by <quote
+> Gates</quote
+>, you could use this expression: <userinput
+>Bill(?! Gates)</userinput
+>. (This would match the <quote>Bill</quote> in <quote
+>Bill Clinton</quote
+> as well as in <quote
+>Billy the kid</quote
+>, but silently ignore any <quote>Bill</quote> followed by <quote> Gates</quote>.)</para>
+
+<para
+>Sub patterns used for assertions are not captured.</para>
+
+<para
+>See also <link linkend="assertions"
+>Assertions</link
+></para>
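+
+<para
+>The negative lookahead from the example can be sketched as follows, assuming Python's <literal>re</literal> module, which happens to use the same <literal>(?!...)</literal> notation; the list of names is taken from the example above.</para>
+

```python
import re

# Python's re module stands in for Kate's engine here (an assumption).
names = ["Bill Clinton", "Billy the kid", "Bill Gates"]

# Keep the names in which "Bill" is not directly followed by " Gates".
matching = [name for name in names if re.search(r"Bill(?! Gates)", name)]

# The text examined by the lookahead is not part of the match itself.
matched_text = re.search(r"Bill(?! Gates)", "Bill Clinton").group(0)

print(matching)      # ['Bill Clinton', 'Billy the kid']
print(matched_text)  # Bill
```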
+
+</sect3>
+
+</sect2>
+
+<sect2 id="special-characters-in-patterns">
+<title
+>Characters with a special meaning inside patterns</title>
+
+<para
+>The following characters have a special meaning inside a pattern, and must be escaped if you want to literally match them: <variablelist>
+
+<varlistentry>
+<term
+><userinput
+>\</userinput
+> (backslash)</term>
+<listitem
+><para
+>The escape character.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>^</userinput
+> (caret)</term>
+<listitem
+><para
+>Asserts the beginning of the string.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>$</userinput
+></term>
+<listitem
+><para
+>Asserts the end of string.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>()</userinput
+> (left and right parentheses)</term>
+<listitem
+><para
+>Denotes sub patterns.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>{}</userinput
+> (left and right curly braces)</term>
+<listitem
+><para
+>Denotes numeric quantifiers.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>[]</userinput
+> (left and right square brackets)</term>
+<listitem
+><para
+>Denotes character classes.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>|</userinput
+> (vertical bar)</term>
+<listitem
+><para
+>logical OR. Separates alternatives.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>+</userinput
+> (plus sign)</term>
+<listitem
+><para
+>Quantifier, 1 or more.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>*</userinput
+> (asterisk)</term>
+<listitem
+><para
+>Quantifier, 0 or more.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>?</userinput
+> (question mark)</term>
+<listitem
+><para
+>An optional character. Can be interpreted as a quantifier, 0 or 1.</para
+></listitem>
+</varlistentry>
+
+</variablelist>
+
+</para>
+
+</sect2>
+
+</sect1>
+
+<sect1 id="quantifiers">
+<title
+>Quantifiers</title>
+
+<para
+><emphasis
+>Quantifiers</emphasis
+> allow a regular expression to match a specified number, or range of numbers, of occurrences of a character, character class or sub pattern.</para>
+
+<para
+>Quantifiers are enclosed in curly brackets (<literal
+>{</literal
+> and <literal
+>}</literal
+>) and have the general form <literal
+>{[minimum-occurrences][,[maximum-occurrences]]}</literal
+> </para>
+
+<para
+>The usage is best explained by example: <variablelist>
+
+<varlistentry>
+<term
+><userinput
+>{1}</userinput
+></term>
+<listitem
+><para
+>Exactly 1 occurrence</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>{0,1}</userinput
+></term>
+<listitem
+><para
+>Zero or 1 occurrences</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>{,1}</userinput
+></term>
+<listitem
+><para
+>The same, with less work;)</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>{5,10}</userinput
+></term>
+<listitem
+><para
+>At least 5 but at most 10 occurrences.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>{5,}</userinput
+></term>
+<listitem
+><para
+>At least 5 occurrences, no maximum.</para
+></listitem>
+</varlistentry>
+
+</variablelist>
+
+</para>
+
+<para
+>Additionally, there are some abbreviations: <variablelist>
+
+<varlistentry>
+<term
+><userinput
+>*</userinput
+> (asterisk)</term>
+<listitem
+><para
+>similar to <literal
+>{0,}</literal
+>, find any number of occurrences.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>+</userinput
+> (plus sign)</term>
+<listitem
+><para
+>similar to <literal
+>{1,}</literal
+>, at least 1 occurrence.</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>?</userinput
+> (question mark)</term>
+<listitem
+><para
+>similar to <literal
+>{0,1}</literal
+>, zero or 1 occurrence.</para
+></listitem>
+</varlistentry>
+
+</variablelist>
+
+</para>
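+
+<para
+>The quantifier forms listed above can be checked with a sketch like the following. It assumes Python's <literal>re</literal> module, which accepts the same brace notation; the helper <literal>fits</literal> is a hypothetical name.</para>
+

```python
import re


def fits(quantified, text):
    """True if the whole text is matched by the quantified pattern."""
    # Python's re module is an assumption here; Kate uses its own engine.
    return re.fullmatch(quantified, text) is not None


print(fits(r"a{1}", "a"))         # True: exactly one occurrence
print(fits(r"a{0,1}", ""))        # True: zero occurrences are allowed
print(fits(r"a{,1}", "a"))        # True: same as {0,1}
print(fits(r"a{5,10}", "aaaa"))   # False: fewer than five
print(fits(r"a{5,}", "a" * 50))   # True: no maximum
print(fits(r"a*", ""))            # True: like {0,}
print(fits(r"a+", ""))            # False: like {1,}
print(fits(r"a?", "aa"))          # False: like {0,1}
```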
+
+<sect2>
+
+<title
+>Greed</title>
+
+<para
+>When using quantifiers with no maximum, regular expressions default to matching as much of the searched string as possible, which is commonly known as <emphasis
+>greedy</emphasis
+> behaviour.</para>
+
+<para
+>Modern regular expression software provides the means of <quote
+>turning off greediness</quote
+>, though in a graphical environment it is up to the interface to provide you with access to this feature. For example a search dialogue providing a regular expression search could have a check box labelled <quote
+>Minimal matching</quote
+>, and it ought to indicate whether greediness is the default behaviour.</para>
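+
+<para
+>The difference between greedy and minimal matching can be sketched as below. The sketch assumes Python's <literal>re</literal> module, where appending a question mark to a quantifier makes it minimal; that is Python's notation and not necessarily how a given search dialogue exposes the feature.</para>
+

```python
import re

# Python's re module is an assumption here; the trailing "?" is Python's way
# of requesting minimal matching.
text = 'say "yes" or "no"'

greedy = re.search(r'".*"', text).group(0)    # as much as possible
minimal = re.search(r'".*?"', text).group(0)  # as little as possible

print(greedy)   # "yes" or "no"
print(minimal)  # "yes"
```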
+
+</sect2>
+
+<sect2>
+<title
+>In context examples</title>
+
+<para
+>Here are a few examples of using quantifiers:</para>
+
+<variablelist>
+
+<varlistentry>
+<term
+><userinput
+>^\d{4,5}\s</userinput
+></term>
+<listitem
+><para
+>Matches the digits (together with the whitespace character that follows them) in <quote
+>1234 go</quote
+> and <quote
+>12345 now</quote
+>, but neither in <quote
+>567 eleven</quote
+> nor in <quote
+>223459 somewhere</quote
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\s+</userinput
+></term>
+<listitem
+><para
+>Matches one or more whitespace characters</para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>(bla){1,}</userinput
+></term>
+<listitem
+><para
+>Matches all of <quote
+>blablabla</quote
+> and the <quote
+>bla</quote
+> in <quote
+>blackbird</quote
+> or <quote
+>tabla</quote
+></para
+></listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>/?&gt;</userinput
+></term>
+<listitem
+><para
+>Matches <quote
+>/&gt;</quote
+> in <quote
+>&lt;closeditem/&gt;</quote
+> as well as <quote
+>&gt;</quote
+> in <quote
+>&lt;openitem&gt;</quote
+>.</para
+></listitem>
+</varlistentry>
+
+</variablelist>
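+
+<para
+>Two of the examples above can be reproduced with the following sketch, assuming Python's <literal>re</literal> module instead of &kate;'s engine; <literal>first_match</literal> is a hypothetical helper.</para>
+

```python
import re


def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    # Python's re module is an assumption here.
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Four or five digits at the start, followed by whitespace.
for text in ("1234 go", "12345 now", "567 eleven", "223459 somewhere"):
    print(repr(text), repr(first_match(r"^\d{4,5}\s", text)))

# One or more repetitions of "bla".
for text in ("blablabla", "blackbird", "tabla"):
    print(repr(text), repr(first_match(r"(bla){1,}", text)))
```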
+
+</sect2>
+
+</sect1>
+
+<sect1 id="assertions">
+<title
+>Assertions</title>
+
+<para
+><emphasis
+>Assertions</emphasis
+> allow a regular expression to match only under certain controlled conditions.</para>
+
+<para
+>An assertion does not need a character to match; rather, it investigates the surroundings of a possible match before acknowledging it. For example the <emphasis
+>word boundary</emphasis
+> assertion does not try to find a non-word character opposite a word character at its position; instead it makes sure that there is not a word character there. This means that the assertion can match where there is no character, &ie; at the ends of a searched string.</para>
+
+<para
+>Some assertions actually do have a pattern to match, but the part of the string matching it will not be part of the result of the match of the full expression.</para>
+
+<para
+>Regular Expressions as documented here support the following assertions: <variablelist>
+
+<varlistentry
+>
+<term
+><userinput
+>^</userinput
+> (caret: beginning of string)</term
+>
+<listitem
+><para
+>Matches the beginning of the searched string.</para
+> <para
+>The expression <userinput
+>^Peter</userinput
+> will match at <quote
+>Peter</quote
+> in the string <quote
+>Peter, hey!</quote
+> but not in <quote
+>Hey, Peter!</quote
+> </para
+> </listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>$</userinput
+> (end of string)</term>
+<listitem
+><para
+>Matches the end of the searched string.</para>
+
+<para
+>The expression <userinput
+>you\?$</userinput
+> will match at the last <quote>you</quote> in the string <quote
+>You didn't do that, did you?</quote
+> but nowhere in <quote
+>You didn't do that, right?</quote
+></para>
+
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\b</userinput
+> (word boundary)</term>
+<listitem
+><para
+>Matches if there is a word character at one side and not a word character at the other.</para>
+<para
+>This is useful for finding word ends, for example both ends in order to find a whole word. The expression <userinput
+>\bin\b</userinput
+> will match at the separate <quote
+>in</quote
+> in the string <quote
+>He came in through the window</quote
+>, but not at the <quote
+>in</quote
+> in <quote
+>window</quote
+>.</para
+></listitem>
+
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>\B</userinput
+> (non word boundary)</term>
+<listitem
+><para
+>Matches wherever <quote
+>\b</quote
+> does not.</para>
+<para
+>That means that it will match for example within words: The expression <userinput
+>\Bin\B</userinput
+> will match the <quote>in</quote> in <quote
+>window</quote
+> but not in <quote
+>integer</quote
+> or <quote
+>I'm in love</quote
+>.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>(?=PATTERN)</userinput
+> (Positive lookahead)</term>
+<listitem
+><para
+>A lookahead assertion looks at the part of the string following a possible match. The positive lookahead will prevent the string from matching if the text following the possible match does not match the <emphasis
+>PATTERN</emphasis
+> of the assertion, but the text matched by that will not be included in the result.</para>
+<para
+>The expression <userinput
+>handy(?=\w)</userinput
+> will match at <quote
+>handy</quote
+> in <quote
+>handyman</quote
+> but not in <quote
+>That came in handy!</quote
+></para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term
+><userinput
+>(?!PATTERN)</userinput
+> (Negative lookahead)</term>
+
+<listitem
+><para
+>The negative lookahead prevents a possible match from being acknowledged if the following part of the searched string does match its <emphasis
+>PATTERN</emphasis
+>.</para>
+<para
+>The expression <userinput
+>const \w+\b(?!\s*&amp;)</userinput
+> will match at <quote
+>const char</quote
+> in the string <quote
+>const char* foo</quote
+> while it cannot match <quote
+>const QString</quote
+> in <quote
+>const QString&amp; bar</quote
+> because the <quote
+>&amp;</quote
+> matches the negative lookahead assertion pattern.</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+
+</para>
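+
+<para
+>Several of the assertions above can be tried out with the sketch below. It assumes Python's <literal>re</literal> module, which uses the same notation for these assertions but is not &kate;'s engine; <literal>starts</literal> is a hypothetical helper that reports where matches begin.</para>
+

```python
import re


def starts(pattern, text):
    """Return the start offset of every match of pattern in text."""
    # Python's re module is an assumption here.
    return [m.start() for m in re.finditer(pattern, text)]


sentence = "He came in through the window"

print(starts(r"\bin\b", sentence))                    # [8]: the separate word
print(starts(r"\Bin\B", sentence))                    # [24]: inside "window"
print(starts(r"^Peter", "Peter, hey!"))               # [0]
print(starts(r"^Peter", "Hey, Peter!"))               # []
print(starts(r"you\?$", "You didn't do that, did you?"))
print(starts(r"handy(?=\w)", "handyman"))             # [0]
print(starts(r"handy(?=\w)", "That came in handy!"))  # []
```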
+
+</sect1>
+
+<!-- TODO sect1 id="backreferences">
+
+<title
+>Back References</title>
+
+<para
+></para>
+
+</sect1 -->
+
+</appendix>