Working with Syntax Highlighting

Overview

Syntax highlighting is what makes the editor automatically display text in different styles/colors, depending on the function of the string in relation to the purpose of the file. In program source code, for example, control statements may be rendered bold, while data types and comments get different colors from the rest of the text. This greatly enhances the readability of the text, and thus helps the author to be more efficient and productive.

[Figure: A Perl function, rendered with syntax highlighting.]

[Figure: The same Perl function, without highlighting.]
Of the two examples, which is easier to read?

Kate comes with a flexible, configurable and capable system for syntax highlighting, and the standard distribution provides definitions for a wide range of programming, markup and scripting languages and other text file formats. In addition, you can provide your own definitions in simple XML files.

Kate will automatically detect the right syntax rules when you open a file, based on the MIME type of the file, which is determined by its extension or, if it has none, by its contents. If Kate makes a bad choice, you can manually set the syntax to use from the Documents > Highlight Mode menu.

The styles and colors used by each syntax highlight definition,
as well as which MIME types it should be used for, can be configured on the Highlight page of the Config Dialog.

Syntax highlighting is there to enhance the readability of correct text, but you cannot trust it to validate your text. Marking text by syntax is difficult, depending on the format you are using, and in some cases the authors of the syntax rules will be proud if 98% of the text is rendered correctly, though most often you need a rare style to see the incorrectly rendered 2%.

You can download updated or additional syntax highlight definitions from the Kate website by clicking the Download button on the Highlight page of the Config Dialog.

The Kate Syntax Highlight System

This section discusses the Kate syntax highlighting mechanism in more detail. It is for you if you want to know about it, or if you want to change or create syntax definitions.

How it Works

Whenever you open a file, one of the first things the Kate
editor does is detect which syntax definition to use for the file. While reading the text of the file, and while you type in it, the syntax highlighting system analyzes the text using the rules defined by the syntax definition and marks where different contexts and styles begin and end.

When you type in the document, the new text is analyzed and marked on the fly, so that if you delete a character that is marked as the beginning or end of a context, the style of the surrounding text changes accordingly.

The syntax definitions used by the Kate syntax highlighting system are XML files containing rules for detecting the role of text (organized into context blocks), keyword lists, and style item definitions.

When analyzing the text, the detection rules are evaluated in the order in which they are defined, and if the beginning of the current string matches a rule, the related context is used. The start point in the text is moved to the end of the text that rule matched, and a new loop through the rules begins in the context set by the matched rule.

Rules

The detection rules are the heart of the highlighting detection system. A rule is a string, character or regular expression against which to match the text being analyzed. It contains information about which style to use for the matching part of the text. It may switch the working context of the system either to an explicitly named context or to the previous context used by the text.

Rules are organized in context groups. A context group is used for the main text concepts within the format, for example quoted text strings or comment blocks in program source code. This ensures that the highlighting system does not need to loop through all rules when it is not necessary, and that some character sequences in the text can be treated differently depending on the current context.
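To make the evaluation loop described above more concrete, here is a minimal Python sketch of it. It is not Kate's implementation (Kate is written in C++); the names Rule, Context and highlight_line, the representation of a rule as a function returning the number of matched characters, and the convention that characters no rule matches get the context's own style are all assumptions made for illustration. Contexts are looked up by name here for readability, although the rule syntax identifies them by zero-based index. They are kept on a stack so that a rule can either enter a named context or return to the previous one with #pop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical matcher type: given the line and a start position, return how
# many characters match at that position (0 means "no match").
Matcher = Callable[[str, int], int]
Span = Tuple[int, int, str]  # (start, length, style name)


@dataclass
class Rule:
    match: Matcher
    attribute: str           # style given to the matched characters
    context: str = "#stay"   # "#stay", "#pop", or the name of a context to enter


@dataclass
class Context:
    attribute: str                                   # style for unmatched text (assumed)
    rules: List[Rule] = field(default_factory=list)  # tried in definition order


def highlight_line(line: str, contexts: Dict[str, Context], stack: List[str]) -> List[Span]:
    """Mark one line; `stack` holds the active contexts and carries over to the next line."""
    spans: List[Span] = []

    def mark(start: int, length: int, attribute: str) -> None:
        # Extend the previous span if the style does not change.
        if spans and spans[-1][2] == attribute and spans[-1][0] + spans[-1][1] == start:
            prev_start, prev_len, _ = spans[-1]
            spans[-1] = (prev_start, prev_len + length, attribute)
        else:
            spans.append((start, length, attribute))

    pos = 0
    while pos < len(line):
        ctx = contexts[stack[-1]]
        for rule in ctx.rules:
            n = rule.match(line, pos)
            if n > 0:
                mark(pos, n, rule.attribute)
                pos += n                          # restart after the matched text
                if rule.context == "#pop":
                    if len(stack) > 1:
                        stack.pop()               # back to the previous context
                elif rule.context != "#stay":
                    stack.append(rule.context)    # enter the context the rule names
                break
        else:
            mark(pos, 1, ctx.attribute)           # no rule matched: one character, context style
            pos += 1
    return spans


def detect_char(char: str) -> Matcher:
    return lambda text, pos: 1 if text.startswith(char, pos) else 0


# A tiny two-context definition: plain text and double-quoted strings.
contexts = {
    "Normal": Context("Normal Text", [Rule(detect_char('"'), "String", "String")]),
    "String": Context("String", [Rule(detect_char('"'), "String", "#pop")]),
}
```

With a stack of ["Normal"], highlight_line('a "b" c', contexts, stack) returns [(0, 2, "Normal Text"), (2, 3, "String"), (5, 2, "Normal Text")]. If a line ends inside an unclosed string, the String context stays on the stack in this sketch, so the next line starts in it.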
Context Styles and Keywords

In some programming languages, integer numbers are treated differently from floating point numbers by the compiler (the program that converts the source code to a binary executable), and there may be characters with a special meaning within a quoted string. In such cases, it makes sense to render them differently from their surroundings so that they are easy to identify while reading the text. So even if they do not represent special contexts, they may be treated as such by the syntax highlighting system, so that they can be marked for different rendering.

A syntax definition may contain as many styles as required to cover the concepts of the format it is used for.

In many formats, there are lists of words that represent a specific concept. For example, in programming languages the control statements are one concept, data type names another, and built-in functions of the language a third. The Kate syntax highlighting system can use such lists to detect and mark words in the text, emphasizing these concepts of the text format.

Default Styles

If you open a C++ source file, a Java source file and an
HTML document in Kate, you will see that even though the formats are different, and thus different words are chosen for special treatment, the colors used are the same. This is because Kate has a predefined list of Default Styles, which are employed by the individual syntax definitions.

This makes it easy to recognize similar concepts in different text formats. For example, comments are present in almost every programming, scripting or markup language, and when they are rendered using the same style in all languages, you do not have to stop and think to identify them within the text.

All styles in a syntax definition use one of the default styles. A few syntax definitions use more styles than there are defaults, so if you use a format often, it may be worth launching the configuration dialog to see whether some concepts are using the same style. For example, there is only one default style for strings, but as the Perl programming language operates with two types of strings, you can enhance the highlighting by configuring those to be slightly different.

The Highlight Definition XML Format

Overview

This section is an overview of the Highlight Definition XML
format. It describes the main components and their meaning and usage, and goes into detail on the detection rules.

The formal definition, also known as the DTD, is stored in the file language.dtd, which should be installed on your system in the directory $KDEDIR/share/apps/kate/syntax.

Main components of Kate Highlight Definitions

The General Section

The General section contains information on the comment format of the described language, and defines whether keywords are case sensitive.

Highlighting

The Highlighting section contains all data required to analyze and render the text. This includes:

ItemDatas: contains ItemData elements, each defining a style.
Keyword lists: each list has a name, and may contain any number of items.
Contexts: contains contexts, which in turn contain the syntax detection rules.

Highlight Detection Rules

This section describes the syntax detection rules.

Each rule can match zero or more characters at the beginning of
the string it is asked to test. If the rule matches, the matching characters are assigned the style or attribute defined by the rule, and the rule may ask for the current context to be switched.

The attribute and context attributes are common to all rules.

A rule looks like this:

<RuleName attribute="(identifier)" context="(identifier|order)" [rule specific attributes] />

The attribute identifies the style to use for matched characters by name or index, and the context identifies the context to use from here on.

The attribute can be identified either by name, or by its zero-based index in the ItemDatas group.

The context can be identified by:

An identifier, currently only its zero-based index in the contexts group.
An order telling the engine to stay in the current context (#stay), or to pop back to a previous context used in the string (#pop).

To go back more steps, the #pop keyword can be repeated:

#pop#pop#pop

Some rules can have child rules, which are evaluated if and only if the parent rule matched. The entire matched string will be given the attribute defined by the parent rule. A rule with child rules looks like this:
<RuleName (attributes)>
<ChildRuleName (attributes) />
...
</RuleName>
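The context attribute and child rules can be illustrated with another small, hypothetical Python sketch (again, not Kate's own code). apply_context_switch updates a stack of zero-based context indexes according to a rule's context value, and with_children wraps a parent matcher so that its child rules are only tried once the parent has matched, extending the parent's match. Two behaviours here are assumptions rather than documented facts: popping never removes the outermost context, and the first child rule that matches directly after the parent's text wins. The Int-with-suffix example mirrors the typical use of child rules for L and U after a number.

```python
import re
from typing import Callable, List, Sequence

# A matcher returns how many characters it matched at text[pos:] (0 = no match).
Matcher = Callable[[str, int], int]


def apply_context_switch(stack: List[int], context: str) -> None:
    """Apply a rule's context value to a stack of zero-based context indexes."""
    if context == "#stay":
        return                                  # keep the current context
    if context.startswith("#pop"):
        steps = context.count("#pop")
        if context != "#pop" * steps:
            raise ValueError(f"malformed context order: {context!r}")
        for _ in range(steps):                  # "#pop#pop" goes back two steps
            if len(stack) > 1:                  # assumed: never pop the outermost context
                stack.pop()
        return
    stack.append(int(context))                  # explicit index into the contexts group


def with_children(parent: Matcher, children: Sequence[Matcher]) -> Matcher:
    """Try child rules only after `parent` matched; a matching child extends the match."""
    def match(text: str, pos: int) -> int:
        n = parent(text, pos)
        if n == 0:
            return 0                            # parent failed: children are not evaluated
        for child in children:                  # assumed: first matching child wins
            m = child(text, pos + n)
            if m:
                return n + m                    # whole span gets the parent's attribute
        return n
    return match


def string_detect(s: str) -> Matcher:
    return lambda text, pos: len(s) if text.startswith(s, pos) else 0


_DIGITS = re.compile(r"\d+")


def int_match(text: str, pos: int) -> int:
    m = _DIGITS.match(text, pos)
    return m.end() - pos if m else 0


# Int with L/U suffix children; longer suffixes are listed first.
int_with_suffix = with_children(
    int_match, [string_detect("UL"), string_detect("LU"), string_detect("L"), string_detect("U")]
)
```

Listing the two-character suffixes before the single ones matters here for the same reason rule order matters in a context: the first match wins.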
Rule specific attributes vary and are described in the following list.

The Rules in Detail

DetectChar

Detect a single specific character. Commonly used, for example, to find the ends of quoted strings.

<DetectChar char="(character)" (common attributes) />

The char attribute defines the character to match.

Detect2Chars

Detect two specific characters in a defined order.

<Detect2Chars char="(character)" char1="(character)" (common attributes) />

The char attribute defines the first character to match, char1 the second.

AnyChar

Detect one character of a set of specified characters.

<AnyChar String="(string)" (common attributes) />

The String attribute defines the set of characters.

StringDetect

Detect an exact string.

<StringDetect String="(string)" [insensitive="TRUE|FALSE"] (common attributes) />

The String attribute defines the string to match. The insensitive attribute defaults to FALSE and is fed to the string comparison function. If the value is TRUE, case-insensitive comparison is used.

RegExpr

Matches against a regular expression.

<RegExpr String="(string)" [insensitive="TRUE|FALSE"] [minimal="TRUE|FALSE"] (common attributes) />

The String attribute defines the regular
expression.

insensitive defaults to FALSE and is fed to the regular expression engine; if TRUE, matching is case-insensitive.

minimal defaults to FALSE and is fed to the regular expression engine; if TRUE, the expression matches as little text as possible (non-greedy matching).

Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (^) indicates that the rule should only be matched against the start of a line.

See Regular Expressions for more information on those.

Keyword

Detect a keyword from a specified list.

<keyword String="(list name)" (common attributes) />

The String attribute identifies the
keyword list by name. A list with that name must exist.

Int

Detect an integer number.

<Int (common attributes) />

This rule has no specific attributes. Child rules are typically used to detect combinations of L and U after the number, indicating the integer type in program code.

Float

Detect a floating point number.

<Float (common attributes) />

This rule has no specific attributes.

HlCOct

Detect an octal number representation.

<HlCOct (common attributes) />

This rule has no specific attributes.

HlCHex

Detect a hexadecimal number representation.

<HlCHex (common attributes) />

This rule has no specific attributes.

HlCStringChar

Detect an escaped character.

<HlCStringChar (common attributes) />

This rule has no specific attributes.

It matches literal representations of invisible characters commonly used in program code, for example \n (newline) or \t (TAB).

The following characters will match if they follow a backslash (\): abefnrtv"'?. Additionally, escaped hexadecimal numbers, for example \xff, and escaped octal numbers, for example \033, will match.

RangeDetect

Detect a string with defined start and end characters.

<RangeDetect char="(character)" char1="(character)" (common attributes) />

char defines the character starting the range, char1 the character ending the range.

Useful for detecting, for example, small quoted strings and the like; note, however, that since the highlighting engine works on one line at a time, this will not find strings spanning a line break.

LineContinue

Matches at end of line.

<LineContinue (common attributes) />

This rule has no specific attributes.

This rule is useful for switching context at end of line.
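As a rough illustration of what the rules above match, the following Python sketch expresses several of them as matcher functions that, like the rules, only look at the beginning of the remaining text and return the number of characters matched. The function names, the word-boundary check in keyword, the exact hexadecimal, octal and escape patterns, and treating a zero-length match as no match are assumptions for illustration, not Kate's definitions. Python's re module has no global counterpart of the minimal flag (non-greedy quantifiers such as *? are written per quantifier), so reg_expr leaves it out; Float and LineContinue are omitted as well.

```python
import re
from typing import Callable, Iterable

# A matcher returns how many characters it matched at text[pos:] (0 = no match).
Matcher = Callable[[str, int], int]


def _from_regex(rx: "re.Pattern[str]") -> Matcher:
    def match(text: str, pos: int) -> int:
        m = rx.match(text, pos)                # anchored at pos, like every rule
        return m.end() - pos if m else 0
    return match


def detect_char(char: str) -> Matcher:
    return lambda text, pos: 1 if text.startswith(char, pos) else 0


def detect_2chars(char: str, char1: str) -> Matcher:
    return lambda text, pos: 2 if text.startswith(char + char1, pos) else 0


def any_char(chars: str) -> Matcher:
    return lambda text, pos: 1 if pos < len(text) and text[pos] in chars else 0


def string_detect(string: str, insensitive: bool = False) -> Matcher:
    def match(text: str, pos: int) -> int:
        chunk = text[pos:pos + len(string)]
        if insensitive:
            return len(string) if chunk.lower() == string.lower() else 0
        return len(string) if chunk == string else 0
    return match


def reg_expr(pattern: str, insensitive: bool = False) -> Matcher:
    # Pattern.match(text, pos) does not let "^" match at pos > 0,
    # so a leading caret restricts the rule to the start of the line.
    return _from_regex(re.compile(pattern, re.IGNORECASE if insensitive else 0))


def keyword(words: Iterable[str], case_sensitive: bool = True) -> Matcher:
    fold = (lambda w: w) if case_sensitive else str.lower
    table = {fold(w) for w in words}
    word = re.compile(r"\w+")

    def match(text: str, pos: int) -> int:
        if pos > 0 and (text[pos - 1].isalnum() or text[pos - 1] == "_"):
            return 0                           # assumed: keywords start at a word boundary
        m = word.match(text, pos)
        return m.end() - pos if m and fold(m.group()) in table else 0
    return match


hl_c_hex = _from_regex(re.compile(r"0[xX][0-9A-Fa-f]+"))
hl_c_oct = _from_regex(re.compile(r"0[0-7]+"))
hl_c_string_char = _from_regex(re.compile(r"""\\(?:[abefnrtv"'?]|x[0-9A-Fa-f]+|[0-7]{1,3})"""))


def range_detect(char: str, char1: str) -> Matcher:
    def match(text: str, pos: int) -> int:
        if not text.startswith(char, pos):
            return 0
        end = text.find(char1, pos + 1)        # only searches the current line
        return end - pos + 1 if end != -1 else 0
    return match
```

Since rules are tried in definition order, the more specific number rules (HlCHex, HlCOct) would normally be listed before Int; otherwise an Int rule would claim the leading digits first, for example only the 0 of 0x1F.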