Copy the KDE 3.5 branch to branches/trinity for new KDE 3.5 features.

BUG:215923 git-svn-id: svn://anonsvn.kde.org/home/kde/branches/trinity/kdebase@1054174 283d02a7-25f6-0310-bc7c-ecb5cbfe19da
author: toma <toma@283d02a7-25f6-0310-bc7c-ecb5cbfe19da> 2009-11-25 17:56:58 +0000
committer: toma <toma@283d02a7-25f6-0310-bc7c-ecb5cbfe19da> 2009-11-25 17:56:58 +0000
commit: 4aed2c8219774f5d797760606b8489a92ddc5163 (patch)
tree: 3f8c130f7d269626bf6a9447407ef6c35954426a /doc/kate/highlighting.docbook
1 files changed, 931 insertions, 0 deletions
diff --git a/doc/kate/highlighting.docbook b/doc/kate/highlighting.docbook
new file mode 100644
index 000000000..76952d26a
--- /dev/null
+++ b/doc/kate/highlighting.docbook
@@ -0,0 +1,931 @@
+<appendix id="highlight">
+<appendixinfo>
+<authorgroup>
+<author><personname><firstname></firstname></personname></author>
+<!-- TRANS:ROLES_OF_TRANSLATORS -->
+</authorgroup>
+</appendixinfo>
+<title>Working with Syntax Highlighting</title>
+
+<sect1 id="highlight-overview">
+
+<title>Overview</title>
+
+<para>Syntax Highlighting is what makes the editor automatically
+display text in different styles/colors, depending on the function of
+the string in relation to the purpose of the file.  In program source
+code for example, control statements may be rendered bold, while data
+types and comments get different colors from the rest of the
+text. This greatly enhances the readability of the text, and thus
+helps the author to be more efficient and productive.</para>
+
+<mediaobject>
+<imageobject><imagedata format="PNG" fileref="highlighted.png"/></imageobject>
+<textobject><phrase>A Perl function, rendered with syntax
+highlighting.</phrase></textobject>
+<caption><para>A Perl function, rendered with syntax highlighting.</para>
+</caption>
+</mediaobject>
+
+<mediaobject>
+<imageobject><imagedata format="PNG" fileref="unhighlighted.png"/></imageobject>
+<textobject><phrase>The same Perl function, without
+highlighting.</phrase></textobject>
+<caption><para>The same Perl function, without highlighting.</para></caption>
+</mediaobject>
+
+<para>Of the two examples, which is easier to read?</para>
+
+<para>&kate; comes with a flexible, configurable and capable system
+for doing syntax highlighting, and the standard distribution provides
+definitions for a wide range of programming, scripting and markup
+languages and other text file formats. In addition you can
+provide your own definitions in simple &XML; files.</para>
+
+<para>&kate; will automatically detect the right syntax rules when you
+open a file, based on the &MIME; Type of the file, determined by its
+extension, or, if it has none, the contents. Should you experience a
+bad choice, you can manually set the syntax to use from the
+<menuchoice><guimenu>Documents</guimenu><guisubmenu>Highlight
+Mode</guisubmenu></menuchoice> menu.</para>
+
+<para>The styles and colors used by each syntax highlight definition
+can be configured using the <link
+linkend="config-dialog-editor-appearance">Appearance</link> page of the
+<link linkend="config-dialog">Config Dialog</link>, while the &MIME; Types
+it should be used for are handled by the <link
+linkend="config-dialog-editor-highlighting">Highlight</link>
+page.</para>
+
+<note>
+<para>Syntax highlighting is there to enhance the readability of
+correct text, but you cannot trust it to validate your text. Marking
+text for syntax can be difficult, depending on the format you are using,
+and in some cases the authors of the syntax rules will be proud if 98%
+of the text is rendered correctly, though most often you need a rare
+style to see the incorrect 2%.</para>
+</note>
+
+<tip>
+<para>You can download updated or additional syntax highlight
+definitions from the &kate; website by clicking the
+<guibutton>Download</guibutton> button in the <link
+linkend="config-dialog-editor-highlighting">Highlight Page</link> of the <link
+linkend="config-dialog">Config Dialog</link>.</para>
+</tip>
+
+</sect1>
+
+<sect1 id="katehighlight-system">
+
+<title>The &kate; Syntax Highlight System</title>
+
+<para>This section will discuss the &kate; syntax highlighting
+mechanism in more detail. It is for you if you want to know about
+it, or if you want to change or create syntax definitions.</para>
+
+<sect2 id="katehighlight-howitworks">
+
+<title>How it Works</title>
+
+<para>Whenever you open a file, one of the first things the &kate;
+editor does is detect which syntax definition to use for the
+file. While reading the text of the file, and while you type away in
+it, the syntax highlighting system will analyze the text using the
+rules defined by the syntax definition and mark in it where different
+contexts and styles begin and end.</para>
+
+<para>When you type in the document, the new text is analyzed and marked on the
+fly, so that if you delete a character that is marked as the beginning or end
+of a context, the style of surrounding text changes accordingly.</para>
+
+<para>The syntax definitions used by the &kate; Syntax Highlighting System are
+&XML; files, containing
+<itemizedlist>
+<listitem><para>Rules for detecting the role of text, organized into context blocks</para></listitem>
+<listitem><para>Keyword lists</para></listitem>
+<listitem><para>Style Item definitions</para></listitem>
+</itemizedlist>
+</para>
+
+<para>When analyzing the text, the detection rules are evaluated in
+the order in which they are defined, and if the beginning of the
+current string matches a rule, the related context is used. The start
+point in the text is moved to the final point at which that rule
+matched and a new loop of the rules begins, starting in the context
+set by the matched rule.</para>
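+
+<para>As a rough illustration of why rule order matters, consider the
+following sketch. The context and style names are hypothetical and not taken
+from any shipped definition, and a context named <emphasis>Line Comment</emphasis>
+is assumed to be defined elsewhere. Because rules are tried in the order given,
+the two-character rule must come first; otherwise the single slash rule would
+always match before it.</para>
+
+<programlisting>
+&lt;context attribute=&quot;Normal Text&quot; lineEndContext=&quot;#stay&quot; name=&quot;Normal Text&quot; &gt;
+  &lt;!-- tried first: the two characters that open a line comment --&gt;
+  &lt;Detect2Chars attribute=&quot;Comment&quot; context=&quot;Line Comment&quot; char=&quot;/&quot; char1=&quot;/&quot; /&gt;
+  &lt;!-- tried only if the rule above did not match --&gt;
+  &lt;DetectChar attribute=&quot;Operator&quot; context=&quot;#stay&quot; char=&quot;/&quot; /&gt;
+&lt;/context&gt;
+</programlisting>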
+
+</sect2>
+
+<sect2 id="highlight-system-rules">
+<title>Rules</title>
+
+<para>The detection rules are the heart of the highlighting detection
+system. A rule is a string, character or <link
+linkend="regular-expressions">regular expression</link> against which
+to match the text being analyzed. It contains information about which
+style to use for the matching part of the text. It may switch the
+working context of the system either to an explicitly mentioned
+context or to the previous context used by the text.</para>
+
+<para>Rules are organized in context groups. A context group is used
+for main text concepts within the format, for example quoted text
+strings or comment blocks in program source code. This ensures that
+the highlighting system does not need to loop through all rules when
+it is not necessary, and that some character sequences in the text can
+be treated differently depending on the current context.
+</para>
+
+<para>Contexts may be generated dynamically to allow the usage of instance
+specific data in rules.</para>
+
+</sect2>
+
+<sect2 id="highlight-context-styles-keywords">
+<title>Context Styles and Keywords</title>
+
+<para>In some programming languages, integer numbers are treated
+differently than floating point ones by the compiler (the program that
+converts the source code to a binary executable), and there may be
+characters having a special meaning within a quoted string. In such
+cases, it makes sense to render them differently from the surroundings
+so that they are easy to identify while reading the text. So even if
+they do not represent special contexts, they may be seen as such by
+the syntax highlighting system, so that they can be marked for
+different rendering.</para>
+
+<para>A syntax definition may contain as many styles as required to
+cover the concepts of the format it is used for.</para>
+
+<para>In many formats, there are lists of words that represent a
+specific concept. For example, in programming languages, control
+statements are one concept, data type names another, and built-in
+functions of the language a third. The &kate; Syntax Highlighting
+System can use such lists to detect and mark words in the text to
+emphasize concepts of the text formats.</para>
+
+</sect2>
+
+<sect2 id="kate-highlight-system-default-styles">
+<title>Default Styles</title>
+
+<para>If you open a C++ source file, a &Java; source file and an
+<acronym>HTML</acronym> document in &kate;, you will see that even
+though the formats are different, and thus different words are chosen
+for special treatment, the colors used are the same. This is because
+&kate; has a predefined list of Default Styles which are employed by
+the individual syntax definitions.</para>
+
+<para>This makes it easy to recognize similar concepts in different
+text formats. For example comments are present in almost any
+programming, scripting or markup language, and when they are rendered
+using the same style in all languages, you do not have to stop and
+think to identify them within the text.</para>
+
+<tip>
+<para>All styles in a syntax definition use one of the default
+styles. A few syntax definitions use more styles than there are
+defaults, so if you use a format often, it may be worth launching the
+configuration dialog to see if some concepts are using the same
+style. For example there is only one default style for strings, but as
+the Perl programming language operates with two types of strings, you
+can enhance the highlighting by configuring those to be slightly
+different. All <link linkend="kate-highlight-default-styles">available default styles</link>
+will be explained later.</para>
+</tip>
+
+</sect2>
+
+</sect1>
+
+<sect1 id="katehighlight-xml-format">
+<title>The Highlight Definition &XML; Format</title>
+
+<sect2>
+<title>Overview</title>
+
+<para>This section is an overview of the Highlight Definition &XML;
+format. Based on a small example it will describe the main components
+and their meaning and usage. The next section will go into detail with
+the highlight detection rules.</para>
+
+<para>The formal definition, also known as the <acronym>DTD</acronym>, is stored
+in the file <filename>language.dtd</filename> which should be
+installed on your system in the folder
+<filename>$<envar>KDEDIR</envar>/share/apps/katepart/syntax</filename>.
+</para>
+
+<variablelist>
+<title>Main sections of &kate; Highlight Definition files</title>
+
+<varlistentry>
+<term>A highlighting file contains a header that sets the XML version and the doctype:</term>
+<listitem>
+<programlisting>
+&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
+&lt;!DOCTYPE language SYSTEM &quot;language.dtd&quot;&gt;
+</programlisting>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>The root of the definition file is the element <userinput>language</userinput>.
+Available attributes are:</term>
+
+<listitem>
+<para>Required attributes:</para>
+<para><userinput>name</userinput> sets the name of the language. It appears in the menus and dialogs afterwards.</para>
+<para><userinput>section</userinput> specifies the category.</para>
+<para><userinput>extensions</userinput> defines file extensions, like &quot;*.cpp;*.h&quot;</para>
+
+<para>Optional attributes:</para>
+<para><userinput>mimetype</userinput> associates files with this definition based on their &MIME; Type.</para>
+<para><userinput>version</userinput> specifies the current version of the definition file.</para>
+<para><userinput>kateversion</userinput> specifies the latest supported &kate; version.</para>
+<para><userinput>casesensitive</userinput> defines whether the keywords are case sensitive or not.</para>
+<para><userinput>priority</userinput> is necessary if another highlight definition file uses the same extensions. The definition with the higher priority wins.</para>
+<para><userinput>author</userinput> contains the name of the author and their e-mail address.</para>
+<para><userinput>license</userinput> contains the license, usually LGPL, Artistic, GPL or similar.</para>
+<para><userinput>hidden</userinput> defines whether the name should appear in &kate;'s menus.</para>
+<para>So the next line may look like this:</para>
+<programlisting>
+&lt;language name=&quot;C++&quot; version=&quot;1.00&quot; kateversion=&quot;2.4&quot; section=&quot;Sources&quot; extensions=&quot;*.cpp;*.h&quot; &gt;
+</programlisting>
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>Next comes the <userinput>highlighting</userinput> element, which
+contains the optional element <userinput>list</userinput> and the required
+elements <userinput>contexts</userinput> and <userinput>itemDatas</userinput>.</term>
+<listitem>
+<para><userinput>list</userinput> elements contain a list of keywords. In
+this case the keywords are <emphasis>class</emphasis> and <emphasis>const</emphasis>.
+You can add as many lists as you need.</para>
+<para>The <userinput>contexts</userinput> element contains all contexts.
+The first context is by default the start of the highlighting. There are
+two rules in the context <emphasis>Normal Text</emphasis>: one that matches
+the list of keywords with the name <emphasis>somename</emphasis>, and one
+that detects a quote and switches the context to <emphasis>string</emphasis>.
+To learn more about rules, see <link linkend="kate-highlight-rules-detailled">Highlight Detection Rules</link>.</para>
+<para>The third part is the <userinput>itemDatas</userinput> element. It
+contains all color and font styles needed by the contexts and rules.
+In this example, the <userinput>itemData</userinput> <emphasis>Normal Text</emphasis>,
+<emphasis>String</emphasis> and <emphasis>Keyword</emphasis> are used.
+</para>
+<programlisting>
+  &lt;highlighting&gt;
+    &lt;list name=&quot;somename&quot;&gt;
+      &lt;item&gt; class &lt;/item&gt;
+      &lt;item&gt; const &lt;/item&gt;
+    &lt;/list&gt;
+    &lt;contexts&gt;
+      &lt;context attribute=&quot;Normal Text&quot; lineEndContext=&quot;#pop&quot; name=&quot;Normal Text&quot; &gt;
+        &lt;keyword attribute=&quot;Keyword&quot; context=&quot;#stay&quot; String=&quot;somename&quot; /&gt;
+        &lt;DetectChar attribute=&quot;String&quot; context=&quot;string&quot; char=&quot;&amp;quot;&quot; /&gt;
+      &lt;/context&gt;
+      &lt;context attribute=&quot;String&quot; lineEndContext=&quot;#stay&quot; name=&quot;string&quot; &gt;
+        &lt;DetectChar attribute=&quot;String&quot; context=&quot;#pop&quot; char=&quot;&amp;quot;&quot; /&gt;
+      &lt;/context&gt;
+    &lt;/contexts&gt;
+    &lt;itemDatas&gt;
+      &lt;itemData name=&quot;Normal Text&quot; defStyleNum=&quot;dsNormal&quot; /&gt;
+      &lt;itemData name=&quot;Keyword&quot; defStyleNum=&quot;dsKeyword&quot; /&gt;
+      &lt;itemData name=&quot;String&quot; defStyleNum=&quot;dsString&quot; /&gt;
+    &lt;/itemDatas&gt;
+  &lt;/highlighting&gt;
+</programlisting>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>The last part of a highlight definition is the optional
+<userinput>general</userinput> section. It may contain information
+about keywords, code folding, comments and indentation.</term>
+
+<listitem>
+<para>The <userinput>comment</userinput> section defines the
+string that introduces a single line comment. You can also define
+multiline comments using <emphasis>multiLine</emphasis> with the
+additional attribute <emphasis>end</emphasis>. This is used if the
+user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasis>.</para>
+<para>The <userinput>keywords</userinput> section defines whether
+keyword lists are case sensitive or not. Other attributes will be
+explained later.</para>
+<programlisting>
+  &lt;general&gt;
+    &lt;comments&gt;
+      &lt;comment name="singleLine" start="#"/&gt;
+    &lt;/comments&gt;
+    &lt;keywords casesensitive="1"/&gt;
+  &lt;/general&gt;
+&lt;/language&gt;
+</programlisting>
+</listitem>
+</varlistentry>
+
+</variablelist>
+
+
+</sect2>
+
+<sect2 id="kate-highlight-sections">
+<title>The Sections in Detail</title>
+<para>This part will describe all available attributes for contexts,
+itemDatas, keywords, comments, code folding and indentation.</para>
+
+<variablelist>
+<varlistentry>
+<term>The element <userinput>context</userinput> belongs to the group
+<userinput>contexts</userinput>. A context itself defines context specific
+rules like what should happen if the highlight system reaches the end of a
+line. Available attributes are:</term>
+
+
+<listitem>
+<para><userinput>name</userinput> the context name. Rules will use this name
+to specify the context to switch to if the rule matches.</para>
+<para><userinput>lineEndContext</userinput> defines the context the highlight
+system switches to if it reaches the end of a line. This may either be the name
+of another context, <userinput>#stay</userinput> to not switch the context
+(i.e. do nothing) or <userinput>#pop</userinput>, which causes the system to leave
+this context. It is possible to use for example <userinput>#pop#pop#pop</userinput>
+to pop three times.</para>
+<para><userinput>lineBeginContext</userinput> defines the context to switch to when
+the beginning of a line is encountered. Default: #stay.</para>
+<para><userinput>fallthrough</userinput> defines if the highlight system switches
+to the context specified in fallthroughContext if no rule matches.
+Default: <emphasis>false</emphasis>.</para>
+<para><userinput>fallthroughContext</userinput> specifies the next context
+if no rule matches.</para>
+<para><userinput>dynamic</userinput> if <emphasis>true</emphasis>, the context
+remembers strings/placeholders saved by dynamic rules. This is needed for HERE
+documents for example. Default: <emphasis>false</emphasis>.</para>
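+<para>A minimal sketch of how these attributes combine, assuming hypothetical
+context and style names: the context below returns to the previous context at
+the end of a line, and also falls through to it as soon as no rule matches.</para>
+<programlisting>
+&lt;context name=&quot;Option&quot; attribute=&quot;Normal Text&quot; lineEndContext=&quot;#pop&quot;
+         fallthrough=&quot;true&quot; fallthroughContext=&quot;#pop&quot; &gt;
+  &lt;DetectChar attribute=&quot;Operator&quot; context=&quot;#stay&quot; char=&quot;=&quot; /&gt;
+&lt;/context&gt;
+</programlisting>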
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>The element <userinput>itemData</userinput> is in the group
+<userinput>itemDatas</userinput>. It defines the font style and colors.
+So it is possible to define your own styles and colors; however, we
+recommend sticking to the default styles if possible so that the user
+will always see the same colors used in different languages. Sometimes,
+though, there is no other way and it is necessary to change color
+and font attributes. The attributes name and defStyleNum are required,
+the others are optional. Available attributes are:</term>
+
+<listitem>
+<para><userinput>name</userinput> sets the name of the itemData.
+Contexts and rules will use this name in their attribute
+<emphasis>attribute</emphasis> to reference an itemData.</para>
+<para><userinput>defStyleNum</userinput> defines which default style to use.
+Available default styles are explained in detail later.</para>
+<para><userinput>color</userinput> defines a color. Valid formats are
+'#rrggbb' or '#rgb'.</para>
+<para><userinput>selColor</userinput> defines the selection color.</para>
+<para><userinput>italic</userinput> if <emphasis>true</emphasis>, the text will be italic.</para>
+<para><userinput>bold</userinput> if <emphasis>true</emphasis>, the text will be bold.</para>
+<para><userinput>underline</userinput> if <emphasis>true</emphasis>, the text will be underlined.</para>
+<para><userinput>strikeout</userinput> if <emphasis>true</emphasis>, the text will be struck out.</para>
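+<para>For illustration, an <userinput>itemData</userinput> that starts from
+a default style and overrides a few attributes might look like this. The
+name and the color values are arbitrary examples.</para>
+<programlisting>
+&lt;itemData name=&quot;Special Keyword&quot; defStyleNum=&quot;dsKeyword&quot; color=&quot;#0000ff&quot; selColor=&quot;#ffffff&quot; bold=&quot;true&quot; italic=&quot;false&quot; /&gt;
+</programlisting>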
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>The element <userinput>keywords</userinput> in the group
+<userinput>general</userinput> defines keyword properties. Available attributes are:</term>
+
+<listitem>
+<para><userinput>casesensitive</userinput> may be <emphasis>true</emphasis>
+or <emphasis>false</emphasis>. If <emphasis>true</emphasis>, all keywords
+are matched case sensitively.</para>
+<para><userinput>weakDeliminator</userinput> is a list of characters that
+do not act as word delimiters. For example, the dot <userinput>'.'</userinput>
+is a word delimiter by default. If a keyword in a <userinput>list</userinput>
+contains a dot, it will only match if you specify the dot as a weak delimiter.</para>
+<para><userinput>additionalDeliminator</userinput> defines additional delimiters.</para>
+<para><userinput>wordWrapDeliminator</userinput> defines characters after which a
+line wrap may occur.</para>
+<para>Default delimiters and word wrap delimiters are the characters
+<userinput>.():!+,-&lt;=&gt;%&amp;*/;?[]^{|}~\</userinput>, space (<userinput>' '</userinput>)
+and tabulator (<userinput>'\t'</userinput>).</para>
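+<para>As a sketch, a definition whose keyword lists contain entries with a dot,
+matched without regard to case, could declare the following. The chosen values
+are only an example.</para>
+<programlisting>
+&lt;keywords casesensitive=&quot;false&quot; weakDeliminator=&quot;.&quot; /&gt;
+</programlisting>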
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>The element <userinput>comment</userinput> in the group
+<userinput>comments</userinput> defines comment properties which are used
+for <menuchoice><guimenu>Tools</guimenu><guimenuitem>Comment</guimenuitem></menuchoice> and
+<menuchoice><guimenu>Tools</guimenu><guimenuitem>Uncomment</guimenuitem></menuchoice>.
+Available attributes are:</term>
+
+<listitem>
+<para><userinput>name</userinput> is either <emphasis>singleLine</emphasis>
+or <emphasis>multiLine</emphasis>. If you choose <emphasis>multiLine</emphasis>
+the attributes <emphasis>end</emphasis> and <emphasis>region</emphasis> are
+required.</para>
+<para><userinput>start</userinput> defines the string used to start a comment.
+In C++ this would be &quot;/*&quot;.</para>
+<para><userinput>end</userinput> defines the string used to close a comment.
+In C++ this would be &quot;*/&quot;.</para>
+<para><userinput>region</userinput> should be the name of the foldable
+multiline comment. If you have <emphasis>beginRegion="Comment"</emphasis>
+... <emphasis>endRegion="Comment"</emphasis> in your rules, you should use
+<emphasis>region="Comment"</emphasis>. This way uncomment works even if you
+do not select all the text of the multiline comment. The cursor only needs to be
+inside the multiline comment.</para>
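+<para>For example, a definition for a C++-like language might declare both
+comment types as follows. The region name <emphasis>Comment</emphasis> is
+assumed to match the <emphasis>beginRegion</emphasis> and
+<emphasis>endRegion</emphasis> values used in the rules.</para>
+<programlisting>
+&lt;comments&gt;
+  &lt;comment name=&quot;singleLine&quot; start=&quot;//&quot; /&gt;
+  &lt;comment name=&quot;multiLine&quot; start=&quot;/*&quot; end=&quot;*/&quot; region=&quot;Comment&quot; /&gt;
+&lt;/comments&gt;
+</programlisting>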
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>The element <userinput>folding</userinput> in the group
+<userinput>general</userinput> defines code folding properties.
+Available attributes are:</term>
+
+<listitem>
+<para><userinput>indentationsensitive</userinput> if <emphasis>true</emphasis>, the code folding markers
+are added based on indentation, as in the scripting language Python. Usually you
+do not need to set it, as it defaults to <emphasis>false</emphasis>.</para>
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>The element <userinput>indentation</userinput> in the group
+<userinput>general</userinput> defines which indenter will be used. However, we strongly
+recommend omitting this element, as the indenter will usually be set either by defining
+a File Type or by adding a mode line to the text file. If you specify an indenter anyway,
+you will force a specific indentation on users, which they might not like at all.
+Available attributes are:</term>
+
+<listitem>
+<para><userinput>mode</userinput> is the name of the indenter. Available indenters
+right now are: <emphasis>normal, cstyle, csands, xml, python</emphasis> and
+<emphasis>varindent</emphasis>.</para>
+</listitem>
+</varlistentry>
+
+
+</variablelist>
+
+
+</sect2>
+
+<sect2 id="kate-highlight-default-styles">
+<title>Available Default Styles</title>
+<para>Default Styles were <link linkend="kate-highlight-system-default-styles">already explained</link>.
+As a short summary: default styles are predefined font and color styles.</para>
+<variablelist>
+<varlistentry>
+<term>The available default styles are:</term>
+<listitem>
+<para><userinput>dsNormal</userinput>, used for normal text.</para>
+<para><userinput>dsKeyword</userinput>, used for keywords.</para>
+<para><userinput>dsDataType</userinput>, used for data types.</para>
+<para><userinput>dsDecVal</userinput>, used for decimal values.</para>
+<para><userinput>dsBaseN</userinput>, used for values with a base other than 10.</para>
+<para><userinput>dsFloat</userinput>, used for float values.</para>
+<para><userinput>dsChar</userinput>, used for a character.</para>
+<para><userinput>dsString</userinput>, used for strings.</para>
+<para><userinput>dsComment</userinput>, used for comments.</para>
+<para><userinput>dsOthers</userinput>, used for 'other' things.</para>
+<para><userinput>dsAlert</userinput>, used for warning messages.</para>
+<para><userinput>dsFunction</userinput>, used for function calls.</para>
+<para><userinput>dsRegionMarker</userinput>, used for region markers.</para>
+<para><userinput>dsError</userinput>, used for error highlighting and wrong syntax.</para>
+</listitem>
+</varlistentry>
+</variablelist>
+
+</sect2>
+
+</sect1>
+
+<sect1 id="kate-highlight-rules-detailled">
+<title>Highlight Detection Rules</title>
+
+<para>This section describes the syntax detection rules.</para>
+
+<para>Each rule can match zero or more characters at the beginning of
+the string it is tested against. If the rule matches, the matching
+characters are assigned the style or <emphasis>attribute</emphasis>
+defined by the rule, and a rule may ask that the current context is
+switched.</para>
+
+<para>A rule looks like this:</para>
+
+<programlisting>&lt;RuleName attribute=&quot;(identifier)&quot; context=&quot;(identifier)&quot; [rule specific attributes] /&gt;</programlisting>
+
+<para>The <emphasis>attribute</emphasis> identifies the style to use
+for matched characters by name, and the <emphasis>context</emphasis>
+identifies the context to use from here.</para>
+
+<para>The <emphasis>context</emphasis> can be identified by:</para>
+
+<itemizedlist>
+<listitem>
+<para>An <emphasis>identifier</emphasis>, which is the name of the other
+context.</para>
+</listitem>
+<listitem>
+<para>An <emphasis>order</emphasis> telling the engine to stay in the
+current context (<userinput>#stay</userinput>), or to pop back to a
+previous context used in the string (<userinput>#pop</userinput>).</para>
+<para>To go back more steps, the #pop keyword can be repeated:
+<userinput>#pop#pop#pop</userinput></para>
+</listitem>
+</itemizedlist>
+
+<para>Some rules can have <emphasis>child rules</emphasis> which are
+then evaluated only if the parent rule matched. The entire matched
+string will be given the attribute defined by the parent rule. A rule
+with child rules looks like this:</para>
+
+<programlisting>
+&lt;RuleName (attributes)&gt;
+  &lt;ChildRuleName (attributes) /&gt;
+  ...
+&lt;/RuleName&gt;
+</programlisting>
+
+
+<para>Rule specific attributes vary and are described in the
+following sections.</para>
+
+
+<itemizedlist>
+<title>Common attributes</title>
+<para>All rules have the following attributes in common, which are
+available whenever <userinput>(common attributes)</userinput> appears.
+<emphasis>attribute</emphasis> and <emphasis>context</emphasis>
+are required attributes; all others are optional.
+</para>
+
+<listitem>
+<para><emphasis>attribute</emphasis>: An attribute maps to a defined <emphasis>itemData</emphasis>.</para>
+</listitem>
+<listitem>
+<para><emphasis>context</emphasis>: Specify the context to which the highlighting system switches if the rule matches.</para>
+</listitem>
+<listitem>
+<para><emphasis>beginRegion</emphasis>: Start a code folding block. Default: unset.</para>
+</listitem>
+<listitem>
+<para><emphasis>endRegion</emphasis>: Close a code folding block. Default: unset.</para>
+</listitem>
+<listitem>
+<para><emphasis>lookAhead</emphasis>: If <emphasis>true</emphasis>, the
+highlighting system will not process the match's length, so the matched
+text is not consumed. Default: <emphasis>false</emphasis>.</para>
+</listitem>
+<listitem>
+<para><emphasis>firstNonSpace</emphasis>: Match only if the string is
+the first non-whitespace in the line. Default: <emphasis>false</emphasis>.</para>
+</listitem>
+<listitem>
+<para><emphasis>column</emphasis>: Match only if the column matches. Default: unset.</para>
+</listitem>
+</itemizedlist>
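+
+<para>A sketch of the folding attributes in use, with hypothetical style and
+region names: the first rule opens a folding region at an opening brace and
+the second closes it at the matching closing brace.</para>
+
+<programlisting>
+&lt;DetectChar attribute=&quot;Symbol&quot; context=&quot;#stay&quot; char=&quot;{&quot; beginRegion=&quot;Brace&quot; /&gt;
+&lt;DetectChar attribute=&quot;Symbol&quot; context=&quot;#stay&quot; char=&quot;}&quot; endRegion=&quot;Brace&quot; /&gt;
+</programlisting>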
+
+<itemizedlist>
+<title>Dynamic rules</title>
+<para>Some rules allow the optional attribute <userinput>dynamic</userinput>
+of type boolean that defaults to <emphasis>false</emphasis>. If dynamic is
+<emphasis>true</emphasis>, a rule can use placeholders representing the text
+matched by a <emphasis>regular expression</emphasis> rule that switched to the
+current context in its <userinput>string</userinput> or
+<userinput>char</userinput> attributes. In a <userinput>string</userinput>,
+the placeholder <replaceable>%N</replaceable> (where N is a number) will be
+replaced with the corresponding capture <replaceable>N</replaceable>
+from the calling regular expression. In a
+<userinput>char</userinput> the placeholder must be a number
+<replaceable>N</replaceable> and it will be replaced with the first character of
+the corresponding capture <replaceable>N</replaceable> from the calling regular
+expression. Whenever a rule allows this attribute it will contain a
+<emphasis>(dynamic)</emphasis>.</para>
+
+<listitem>
+<para><emphasis>dynamic</emphasis>: may be <emphasis>(true|false)</emphasis>.</para>
+</listitem>
+</itemizedlist>
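+
+<para>The following sketch shows how a dynamic rule might be used for HERE
+documents. The context and style names are hypothetical, and the regular
+expression is deliberately simplified. The first rule captures the terminator
+word; the dynamic context then uses <replaceable>%1</replaceable> to find the
+line that ends the document.</para>
+
+<programlisting>
+&lt;context name=&quot;Normal Text&quot; attribute=&quot;Normal Text&quot; lineEndContext=&quot;#stay&quot; &gt;
+  &lt;!-- capture 1 holds the terminator word that follows the two angle brackets --&gt;
+  &lt;RegExpr attribute=&quot;Keyword&quot; context=&quot;HereDoc&quot; String=&quot;&amp;lt;&amp;lt;(\w+)&quot; /&gt;
+&lt;/context&gt;
+&lt;context name=&quot;HereDoc&quot; attribute=&quot;String&quot; lineEndContext=&quot;#stay&quot; dynamic=&quot;true&quot; &gt;
+  &lt;!-- %1 is replaced with the captured terminator word --&gt;
+  &lt;StringDetect attribute=&quot;Keyword&quot; context=&quot;#pop&quot; String=&quot;%1&quot; dynamic=&quot;true&quot; firstNonSpace=&quot;true&quot; /&gt;
+&lt;/context&gt;
+</programlisting>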
+
+<sect2 id="highlighting-rules-in-detail">
+<title>The Rules in Detail</title>
+
+<variablelist>
+<varlistentry>
+<term>DetectChar</term>
+<listitem>
+<para>Detect a single specific character. Commonly used for example to
+find the ends of quoted strings.</para>
+<programlisting>&lt;DetectChar char=&quot;(character)&quot; (common attributes) (dynamic) /&gt;</programlisting>
+<para>The <userinput>char</userinput> attribute defines the character
+to match.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Detect2Chars</term>
+<listitem>
+<para>Detect two specific characters in a defined order.</para>
+<programlisting>&lt;Detect2Chars char=&quot;(character)&quot; char1=&quot;(character)&quot; (common attributes) (dynamic) /&gt;</programlisting>
+<para>The <userinput>char</userinput> attribute defines the first character to match,
+<userinput>char1</userinput> the second.</para>
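+<para>For example, the start of a C style multiline comment could be detected
+like this, assuming a style named <emphasis>Comment</emphasis> and a context
+named <emphasis>Multiline Comment</emphasis> exist:</para>
+<programlisting>&lt;Detect2Chars attribute=&quot;Comment&quot; context=&quot;Multiline Comment&quot; char=&quot;/&quot; char1=&quot;*&quot; beginRegion=&quot;Comment&quot; /&gt;</programlisting>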
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>AnyChar</term>
+<listitem>
+<para>Detect one character of a set of specified characters.</para>
+<programlisting>&lt;AnyChar String=&quot;(string)&quot; (common attributes) /&gt;</programlisting>
+<para>The <userinput>String</userinput> attribute defines the set of
+characters.</para>
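+<para>For instance, a rule marking common operator characters might look like
+this, with a hypothetical style name:</para>
+<programlisting>&lt;AnyChar attribute=&quot;Operator&quot; context=&quot;#stay&quot; String=&quot;+-*/=&quot; /&gt;</programlisting>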
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>StringDetect</term>
+<listitem>
+<para>Detect an exact string.</para>
+<programlisting>&lt;StringDetect String=&quot;(string)&quot; [insensitive=&quot;true|false&quot;] (common attributes) (dynamic) /&gt;</programlisting>
+<para>The <userinput>String</userinput> attribute defines the string
+to match. The <userinput>insensitive</userinput> attribute defaults to
+<emphasis>false</emphasis> and is passed to the string comparison
+function. If the value is <emphasis>true</emphasis>, case insensitive
+comparison is used.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>RegExpr</term>
+<listitem>
+<para>Matches against a regular expression.</para>
+<programlisting>&lt;RegExpr String=&quot;(string)&quot; [insensitive=&quot;true|false&quot;] [minimal=&quot;true|false&quot;] (common attributes) (dynamic) /&gt;</programlisting>
+<para>The <userinput>String</userinput> attribute defines the regular
+expression.</para>
+<para><userinput>insensitive</userinput> defaults to
+<emphasis>false</emphasis> and is passed to the regular expression
+engine.</para>
+<para><userinput>minimal</userinput> defaults to
+<emphasis>false</emphasis> and is passed to the regular expression
+engine.</para>
+<para>Because the rules are always matched against the beginning of
+the current string, a regular expression starting with a caret
+(<literal>^</literal>) indicates that the rule should only be
+matched against the start of a line.</para>
+<para>See <link linkend="regular-expressions">Regular Expressions</link>
+for more information on those.</para>
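+<para>As an illustration, the following rule would mark a preprocessor style
+directive beginning with <literal>#</literal>, but only at the start of a line.
+The style and context names are assumptions.</para>
+<programlisting>&lt;RegExpr attribute=&quot;Preprocessor&quot; context=&quot;Preprocessor&quot; String=&quot;^#\s*\w+&quot; /&gt;</programlisting>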
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>keyword</term>
+<listitem>
+<para>Detect a keyword from a specified list.</para>
+<programlisting>&lt;keyword String=&quot;(list name)&quot; (common attributes) /&gt;</programlisting>
+<para>The <userinput>String</userinput> attribute identifies the
+keyword list by name. A list with that name must exist.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int</term>
+<listitem>
+<para>Detect an integer number.</para>
+<para><programlisting>&lt;Int (common attributes) (dynamic) /&gt;</programlisting></para>
+<para>This rule has no specific attributes. Child rules are typically
+used to detect combinations of <userinput>L</userinput> and
+<userinput>U</userinput> after the number, indicating the integer type
+in program code. All rules are accepted as child rules, although
+the <acronym>DTD</acronym> only allows the child rule <userinput>StringDetect</userinput>.</para>
+<para>The following example matches integer numbers followed by the character 'L'.
+<programlisting>
+&lt;Int attribute="Decimal" context="#stay" &gt;
+  &lt;StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/&gt;
+&lt;/Int&gt;
+</programlisting></para>
+
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Float</term>
+<listitem>
+<para>Detect a floating point number.</para>
+<para><programlisting>&lt;Float (common attributes) /&gt;</programlisting></para>
+<para>This rule has no specific attributes. <userinput>AnyChar</userinput> is
+allowed as a child rule and is typically used to detect combinations; see the
+rule <userinput>Int</userinput> for reference.</para>
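+<para>As a sketch, the following rule matches a floating point number that
+may be followed by the suffix <userinput>f</userinput> or
+<userinput>F</userinput>. The attribute name <userinput>Float</userinput> is
+an assumption.
+<programlisting>
+&lt;Float attribute=&quot;Float&quot; context=&quot;#stay&quot;&gt;
+  &lt;AnyChar attribute=&quot;Float&quot; context=&quot;#stay&quot; String=&quot;fF&quot; /&gt;
+&lt;/Float&gt;
+</programlisting></para>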
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>HlCOct</term>
+<listitem>
+<para>Detect an octal number representation.</para>
+<para><programlisting>&lt;HlCOct (common attributes) /&gt;</programlisting></para>
+<para>This rule has no specific attributes.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>HlCHex</term>
+<listitem>
+<para>Detect a hexadecimal number representation.</para>
+<para><programlisting>&lt;HlCHex (common attributes) /&gt;</programlisting></para>
+<para>This rule has no specific attributes.</para>
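+<para>A sketch of how the number rules might be combined in one context,
+modelled on typical C-like definitions; the attribute names are assumptions.
+Rules are tried in the order in which they are listed, so
+<userinput>Int</userinput> is placed last here. Otherwise it could match the
+leading <userinput>0</userinput> of <userinput>0x1F</userinput> before
+<userinput>HlCHex</userinput> is tried.
+<programlisting>
+&lt;Float attribute=&quot;Float&quot; context=&quot;#stay&quot; /&gt;
+&lt;HlCOct attribute=&quot;Octal&quot; context=&quot;#stay&quot; /&gt;
+&lt;HlCHex attribute=&quot;Hex&quot; context=&quot;#stay&quot; /&gt;
+&lt;Int attribute=&quot;Decimal&quot; context=&quot;#stay&quot; /&gt;
+</programlisting></para>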
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>HlCStringChar</term>
+<listitem>
+<para>Detect an escaped character.</para>
+<para><programlisting>&lt;HlCStringChar (common attributes) /&gt;</programlisting></para>
+<para>This rule has no specific attributes.</para>
+
+<para>It matches literal representations of characters commonly used in
+program code, for example <userinput>\n</userinput>
+(newline) or <userinput>\t</userinput> (TAB).</para>
+
+<para>The following characters will match if they follow a backslash
+(<literal>\</literal>):
+<userinput>abefnrtv&quot;'?\</userinput>. Additionally, escaped
+hexadecimal numbers such as <userinput>\xff</userinput> and
+escaped octal numbers such as <userinput>\033</userinput> will
+match.</para>
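+<para>A minimal sketch of a string context that highlights escape sequences
+differently from the rest of the string. The context name
+<userinput>String</userinput> and the attribute names are hypothetical.
+<programlisting>
+&lt;context name=&quot;String&quot; attribute=&quot;String&quot; lineEndContext=&quot;#pop&quot;&gt;
+  &lt;HlCStringChar attribute=&quot;String Char&quot; context=&quot;#stay&quot; /&gt;
+  &lt;DetectChar attribute=&quot;String&quot; context=&quot;#pop&quot; char=&quot;&amp;quot;&quot; /&gt;
+&lt;/context&gt;
+</programlisting></para>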
+
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>HlCChar</term>
+<listitem>
+<para>Detect a C character.</para>
+<para><programlisting>&lt;HlCChar (common attributes) /&gt;</programlisting></para>
+<para>This rule has no specific attributes.</para>
+
+<para>It matches C characters enclosed in ticks (single quotes), for example <userinput>'c'</userinput>.
+The ticks may enclose a simple character or an escaped character.
+See HlCStringChar for the escaped character sequences that are matched.</para>
+
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>RangeDetect</term>
+<listitem>
+<para>Detect a string with defined start and end characters.</para>
+<programlisting>&lt;RangeDetect char=&quot;(character)&quot;  char1=&quot;(character)&quot; (common attributes) /&gt;</programlisting>
+<para><userinput>char</userinput> defines the character starting the range,
+<userinput>char1</userinput> the character ending the range.</para>
+<para>Useful for detecting, for example, small quoted strings and the like, but
+note that since the highlighting engine works on one line at a time, this
+will not find strings spanning over a line break.</para>
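+<para>As a sketch, the following rule matches a range enclosed in angle
+brackets, such as the file name in <userinput>#include &lt;stdio.h&gt;</userinput>.
+The attribute name <userinput>Prep. Lib</userinput> is an assumption. Note
+that the angle brackets must be written as entities in the XML file.
+<programlisting>&lt;RangeDetect attribute=&quot;Prep. Lib&quot; context=&quot;#stay&quot; char=&quot;&amp;lt;&quot; char1=&quot;&amp;gt;&quot; /&gt;</programlisting></para>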
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>LineContinue</term>
+<listitem>
+<para>Matches at end of line.</para>
+<programlisting>&lt;LineContinue (common attributes) /&gt;</programlisting>
+<para>This rule has no specific attributes.</para>
+<para>This rule is useful for switching context at end of line, if the last
+character is a backslash (<userinput>'\'</userinput>). This is needed for
+example in C/C++ to continue macros or strings.</para>
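+<para>A minimal sketch, assuming a hypothetical context named
+<userinput>Preprocessor</userinput> that normally ends at the end of the
+line. The <userinput>LineContinue</userinput> rule keeps the context active
+when the line ends with a backslash.
+<programlisting>
+&lt;context name=&quot;Preprocessor&quot; attribute=&quot;Preprocessor&quot; lineEndContext=&quot;#pop&quot;&gt;
+  &lt;LineContinue attribute=&quot;Preprocessor&quot; context=&quot;#stay&quot; /&gt;
+&lt;/context&gt;
+</programlisting></para>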
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>IncludeRules</term>
+<listitem>
+<para>Include rules from another context or language/file.</para>
+<programlisting>&lt;IncludeRules context=&quot;contextlink&quot; [includeAttrib=&quot;true|false&quot;] /&gt;</programlisting>
+
+<para>The <userinput>context</userinput> attribute defines which context to include.</para>
+<para>If it is a simple string, all rules defined in that context are included into the current context. Example:
+<programlisting>&lt;IncludeRules context=&quot;anotherContext&quot; /&gt;</programlisting></para>
+
+<para>
+If the string begins with <userinput>##</userinput> the highlight system
+will look for another language definition with the given name, example:
+<programlisting>&lt;IncludeRules context=&quot;##C++&quot; /&gt;</programlisting></para>
+<para>If the <userinput>includeAttrib</userinput> attribute is
+<emphasis>true</emphasis>, the destination attribute is changed to that of
+the source. This is required to make commenting work, for example, if the text
+matched by the included context uses a different highlight than the host
+context.
+</para>
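+<para>As a sketch, a context that embeds C++ code could include the rules of
+the C++ definition together with their attributes. This assumes that a
+definition named <userinput>C++</userinput> is installed.
+<programlisting>&lt;IncludeRules context=&quot;##C++&quot; includeAttrib=&quot;true&quot; /&gt;</programlisting></para>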
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>DetectSpaces</term>
+<listitem>
+<para>Detect whitespace.</para>
+<programlisting>&lt;DetectSpaces (common attributes) /&gt;</programlisting>
+
+<para>This rule has no specific attributes.</para>
+<para>Use this rule if you know that there can be several whitespace characters
+ahead, for example at the beginning of indented lines. This rule will skip all
+whitespace at once, instead of testing multiple rules and skipping one character
+at a time due to no match.</para>
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>DetectIdentifier</term>
+<listitem>
+<para>Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*).</para>
+<programlisting>&lt;DetectIdentifier (common attributes) /&gt;</programlisting>
+
+<para>This rule has no specific attributes.</para>
+<para>Use this rule to skip a string of word characters at once, rather than
+testing with multiple rules and skipping one character at a time due to no match.</para>
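+<para>A sketch of a context that uses <userinput>DetectSpaces</userinput>
+and <userinput>DetectIdentifier</userinput> to skip text quickly. The
+context, list and attribute names are hypothetical. The
+<userinput>keyword</userinput> rule is listed before
+<userinput>DetectIdentifier</userinput>, because otherwise keywords would be
+consumed as plain identifiers.
+<programlisting>
+&lt;context name=&quot;Normal&quot; attribute=&quot;Normal Text&quot; lineEndContext=&quot;#stay&quot;&gt;
+  &lt;DetectSpaces attribute=&quot;Normal Text&quot; context=&quot;#stay&quot; /&gt;
+  &lt;keyword attribute=&quot;Keyword&quot; context=&quot;#stay&quot; String=&quot;keywords&quot; /&gt;
+  &lt;DetectIdentifier attribute=&quot;Normal Text&quot; context=&quot;#stay&quot; /&gt;
+&lt;/context&gt;
+</programlisting></para>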
+</listitem>
+</varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2>
+<title>Tips &amp; Tricks</title>
+
+<itemizedlist>
+<para>Once you have understood how context switching works, it is
+easy to write highlight definitions. You should still check carefully which
+rule you choose in which situation. Regular expressions are very powerful, but
+they are slow compared to the other rules, so consider the following
+tips.
+</para>
+
+<listitem>
+<para>If you only match two characters, use <userinput>Detect2Chars</userinput>
+instead of <userinput>StringDetect</userinput>. Likewise, use
+<userinput>DetectChar</userinput> to match a single character.</para>
+</listitem>
+<listitem>
+<para>Regular expressions are easy to use but often there is another much
+faster way to achieve the same result. Consider you only want to match
+the character <userinput>'#'</userinput> if it is the first character in the
+line. A regular expression based solution would look like this:
+<programlisting>&lt;RegExpr attribute=&quot;Macro&quot; context=&quot;macro&quot; String=&quot;^\s*#&quot; /&gt;</programlisting>
+You can achieve the same result much faster by using:
+<programlisting>&lt;DetectChar attribute=&quot;Macro&quot; context=&quot;macro&quot; char=&quot;#&quot; firstNonSpace=&quot;true&quot; /&gt;</programlisting>
+If you want to match the regular expression <userinput>'^#'</userinput> you
+can still use <userinput>DetectChar</userinput> with the attribute <userinput>column=&quot;0&quot;</userinput>.
+The <userinput>column</userinput> attribute counts characters, so a tab still counts as only one character.
+</para>
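+<para>As a sketch, the <userinput>column</userinput> variant of the rule,
+reusing the same assumed attribute and context names, could look like this:
+<programlisting>&lt;DetectChar attribute=&quot;Macro&quot; context=&quot;macro&quot; char=&quot;#&quot; column=&quot;0&quot; /&gt;</programlisting></para>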
+</listitem>
+<listitem>
+<para>You can switch contexts without processing characters. Assume that you
+want to switch context when you meet the string <userinput>*/</userinput>, but
+need to process that string in the next context. The rule below will match, and
+the <userinput>lookAhead</userinput> attribute will cause the highlighter to
+keep the matched string for the next context.
+<programlisting>&lt;Detect2Chars attribute=&quot;Comment&quot; context=&quot;#pop&quot; char=&quot;*&quot; char1=&quot;/&quot; lookAhead=&quot;true&quot; /&gt;</programlisting>
+</para>
+</listitem>
+<listitem>
+<para>Use <userinput>DetectSpaces</userinput> if you know that a lot of whitespace occurs.</para>
+</listitem>
+<listitem>
+<para>Use <userinput>DetectIdentifier</userinput> instead of the regular expression <userinput>'[a-zA-Z_]\w*'</userinput>.</para>
+</listitem>
+<listitem>
+<para>Use default styles whenever you can. This way the user will find a familiar environment.</para>
+</listitem>
+<listitem>
+<para>Look at other XML files to see how other people implement tricky rules.</para>
+</listitem>
+<listitem>
+<para>You can validate every XML file by using the command
+<command>xmllint --dtdvalid language.dtd mySyntax.xml</command>.</para>
+</listitem>
+<listitem>
+<para>If you repeat a complex regular expression very often, you can use
+<emphasis>ENTITIES</emphasis>. Example:</para>
+<programlisting>
+&lt;?xml version="1.0" encoding="UTF-8"?&gt;
+&lt;!DOCTYPE language SYSTEM "language.dtd"
+[
+        &lt;!ENTITY myref    "[A-Za-z_:][\w.:_-]*"&gt;
+]&gt;
+</programlisting>
+<para>Now you can use <emphasis>&amp;myref;</emphasis> instead of the regular
+expression.</para>
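+<para>As a sketch, a rule that uses the entity could look like this. The
+attribute name <userinput>Identifier</userinput> is an assumption.
+<programlisting>&lt;RegExpr attribute=&quot;Identifier&quot; context=&quot;#stay&quot; String=&quot;&amp;myref;&quot; /&gt;</programlisting></para>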
+</listitem>
+</itemizedlist>
+</sect2>
+
+</sect1>
+
+</appendix>