<appendix id="highlight">
<appendixinfo>
<authorgroup>
<author><personname><firstname></firstname></personname></author>
<!-- TRANS:ROLES_OF_TRANSLATORS -->
</authorgroup>
</appendixinfo>
<title>Working with Syntax Highlighting</title>
<sect1 id="highlight-overview">
<title>Overview</title>
<para>Syntax Highlighting is what makes the editor automatically
display text in different styles/colors, depending on the function of
the string in relation to the purpose of the file. In program source
code for example, control statements may be rendered bold, while data
types and comments get different colors from the rest of the
text. This greatly enhances the readability of the text, and thus
helps the author to be more efficient and productive.</para>
<mediaobject>
<imageobject><imagedata format="PNG" fileref="highlighted.png"/></imageobject>
<textobject><phrase>A Perl function, rendered with syntax
highlighting.</phrase></textobject>
<caption><para>A Perl function, rendered with syntax highlighting.</para>
</caption>
</mediaobject>
<mediaobject>
<imageobject><imagedata format="PNG" fileref="unhighlighted.png"/></imageobject>
<textobject><phrase>The same Perl function, without
highlighting.</phrase></textobject>
<caption><para>The same Perl function, without highlighting.</para></caption>
</mediaobject>
<para>Of the two examples, which is easier to read?</para>
<para>&kate; comes with a flexible, configurable and capable system
for doing syntax highlighting, and the standard distribution provides
definitions for a wide range of programming, scripting and markup
languages and other text file formats. In addition you can
provide your own definitions in simple &XML; files.</para>
<para>&kate; will automatically detect the right syntax rules when you
open a file, based on the &MIME; Type of the file, which is determined by its
extension or, if it has none, by its contents. If &kate; picks the wrong
syntax, you can set it manually from the
<menuchoice><guimenu>Documents</guimenu><guisubmenu>Highlight
Mode</guisubmenu></menuchoice> menu.</para>
<para>The styles and colors used by each syntax highlight definition
can be configured using the <link
linkend="config-dialog-editor-appearance">Appearance</link> page of the
<link linkend="config-dialog">Config Dialog</link>, while the &MIME; Types
it should be used for are handled by the <link
linkend="config-dialog-editor-highlighting">Highlight</link>
page.</para>
<note>
<para>Syntax highlighting is there to enhance the readability of
correct text, but you cannot trust it to validate your text. How hard it
is to mark up text correctly depends on the format you are using,
and in some cases the authors of the syntax rules will be proud if 98%
of the text is rendered correctly; usually, though, it takes an unusual
style of writing to expose the remaining 2%.</para>
</note>
<tip>
<para>You can download updated or additional syntax highlight
definitions from the &kate; website by clicking the
<guibutton>Download</guibutton> button in the <link
linkend="config-dialog-editor-highlighting">Highlight Page</link> of the <link
linkend="config-dialog">Config Dialog</link>.</para>
</tip>
</sect1>
<sect1 id="katehighlight-system">
<title>The &kate; Syntax Highlight System</title>
<para>This section will discuss the &kate; syntax highlighting
mechanism in more detail. It is for you if you want to know about
it, or if you want to change or create syntax definitions.</para>
<mediaobject>
<imageobject>
<imagedata format="PNG" fileref="configdialog02.png"/>
</imageobject>
</mediaobject>
<sect2 id="katehighlight-howitworks">
<title>How it Works</title>
<para>Whenever you open a file, one of the first things the &kate;
editor does is detect which syntax definition to use for the
file. While reading the text of the file, and while you type away in
it, the syntax highlighting system will analyze the text using the
rules defined by the syntax definition and mark in it where different
contexts and styles begin and end.</para>
<para>When you type in the document, the new text is analyzed and marked on the
fly, so that if you delete a character that is marked as the beginning or end
of a context, the style of surrounding text changes accordingly.</para>
<para>The syntax definitions used by the &kate; Syntax Highlighting System are
&XML; files, containing
<itemizedlist>
<listitem><para>Rules for detecting the role of text, organized into context blocks</para></listitem>
<listitem><para>Keyword lists</para></listitem>
<listitem><para>Style Item definitions</para></listitem>
</itemizedlist>
</para>
<para>When analyzing the text, the detection rules of the current context
are evaluated in the order in which they are defined, and the first rule
that matches at the beginning of the current string is applied. The start
point in the text is then moved to the final point at which that rule
matched, and a new loop of the rules begins, starting in the context
set by the matched rule.</para>
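<para>As a sketch of what rule order means in practice, consider a hypothetical context for a C-like language (the context and itemData names are assumptions, not taken from a shipped definition; both rules are described in detail later). Because the first matching rule is applied, the two-character comment opener must be listed before the single-character rule for <userinput>/</userinput>. In the reverse order, the <userinput>DetectChar</userinput> rule would always match first and the comment context would never be entered.</para>
<programlisting>
<!-- Hypothetical names: "Symbol" and "Comment" are assumed itemDatas -->
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <!-- Tried first: "//" starts a comment that runs to the end of the line -->
  <Detect2Chars attribute="Comment" context="Comment" char="/" char1="/" />
  <!-- Only reached if the rule above did not match at this position -->
  <DetectChar attribute="Symbol" context="#stay" char="/" />
</context>
<context attribute="Comment" lineEndContext="#pop" name="Comment" />
</programlisting>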
</sect2>
<sect2 id="highlight-system-rules">
<title>Rules</title>
<para>The detection rules are the heart of the highlighting detection
system. A rule is a string, character or <link
linkend="regular-expressions">regular expression</link> against which
to match the text being analyzed. It contains information about which
style to use for the matching part of the text. It may switch the
working context of the system either to an explicitly mentioned
context or to the previous context used by the text.</para>
<para>Rules are organized in context groups. A context group is used
for main text concepts within the format, for example quoted text
strings or comment blocks in program source code. This ensures that
the highlighting system does not need to loop through all rules when
it is not necessary, and that some character sequences in the text can
be treated differently depending on the current context.
</para>
<para>Contexts may be generated dynamically to allow the usage of instance
specific data in rules.</para>
</sect2>
<sect2 id="highlight-context-styles-keywords">
<title>Context Styles and Keywords</title>
<para>In some programming languages, integer numbers are treated
differently than floating point ones by the compiler (the program that
converts the source code to a binary executable), and there may be
characters having a special meaning within a quoted string. In such
cases, it makes sense to render them differently from the surroundings
so that they are easy to identify while reading the text. So even if
they do not represent special contexts, they may be seen as such by
the syntax highlighting system, so that they can be marked for
different rendering.</para>
<para>A syntax definition may contain as many styles as required to
cover the concepts of the format it is used for.</para>
<para>In many formats, there are lists of words that represent a
specific concept. For example, in programming languages, control
statements are one concept, data type names another, and built-in
functions of the language a third. The &kate; Syntax Highlighting
System can use such lists to detect and mark words in the text to
emphasize concepts of the text formats.</para>
</sect2>
<sect2 id="kate-highlight-system-default-styles">
<title>Default Styles</title>
<para>If you open a C++ source file, a &Java; source file and an
<acronym>HTML</acronym> document in &kate;, you will see that even
though the formats are different, and thus different words are chosen
for special treatment, the colors used are the same. This is because
&kate; has a predefined list of Default Styles which are employed by
the individual syntax definitions.</para>
<para>This makes it easy to recognize similar concepts in different
text formats. For example comments are present in almost any
programming, scripting or markup language, and when they are rendered
using the same style in all languages, you do not have to stop and
think to identify them within the text.</para>
<tip>
<para>All styles in a syntax definition use one of the default
styles. A few syntax definitions use more styles than there are
defaults, so if you use a format often, it may be worth launching the
configuration dialog to see if some concepts are using the same
style. For example there is only one default style for strings, but as
the Perl programming language operates with two types of strings, you
can enhance the highlighting by configuring those to be slightly
different. All <link linkend="kate-highlight-default-styles">available default styles</link>
will be explained later.</para>
</tip>
</sect2>
</sect1>
<sect1 id="katehighlight-xml-format">
<title>The Highlight Definition &XML; Format</title>
<sect2>
<title>Overview</title>
<para>This section is an overview of the Highlight Definition &XML;
format. Based on a small example, it describes the main components,
their meaning and their usage. The next section goes into detail about
the highlight detection rules.</para>
<para>The formal definition, also known as the <acronym>DTD</acronym>, is stored
in the file <filename>language.dtd</filename> which should be
installed on your system in the folder
<filename>$<envar>TDEDIR</envar>/share/apps/katepart/syntax</filename>.
</para>
<variablelist>
<title>Main sections of &kate; Highlight Definition files</title>
<varlistentry>
<term>A highlighting file contains a header that sets the XML version and the doctype:</term>
<listitem>
<programlisting>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>The root of the definition file is the element <userinput>language</userinput>.
Available attributes are:</term>
<listitem>
<para>Required attributes:</para>
<para><userinput>name</userinput> sets the name of the language. It appears in the menus and dialogs afterwards.</para>
<para><userinput>section</userinput> specifies the category.</para>
<para><userinput>extensions</userinput> defines file extensions, like "*.cpp;*.h"</para>
<para>Optional attributes:</para>
<para><userinput>mimetype</userinput> associates files based on their &MIME; Type.</para>
<para><userinput>version</userinput> specifies the current version of the definition file.</para>
<para><userinput>kateversion</userinput> specifies the latest supported &kate; version.</para>
<para><userinput>casesensitive</userinput> defines whether the keywords are case-sensitive.</para>
<para><userinput>priority</userinput> is necessary if another highlight definition file uses the same extensions. The definition with the higher priority wins.</para>
<para><userinput>author</userinput> contains the name of the author and their email address.</para>
<para><userinput>license</userinput> contains the license, usually LGPL, Artistic, GPL and others.</para>
<para><userinput>hidden</userinput> defines whether the name should appear in &kate;'s menus.</para>
<para>So the next line may look like this:</para>
<programlisting>
<language name="C++" version="1.00" kateversion="2.4" section="Sources" extensions="*.cpp;*.h" />
</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>Next comes the <userinput>highlighting</userinput> element, which
contains the optional element <userinput>list</userinput> and the required
elements <userinput>contexts</userinput> and <userinput>itemDatas</userinput>.</term>
<listitem>
<para><userinput>list</userinput> elements contain a list of keywords. In
this case the keywords are <emphasis>class</emphasis> and <emphasis>const</emphasis>.
You can add as many lists as you need.</para>
<para>The <userinput>contexts</userinput> element contains all contexts.
The first context is by default the start of the highlighting. The context
<emphasis>Normal Text</emphasis> contains two rules: one matches
the keywords of the list named <emphasis>somename</emphasis>, and the other
detects a quote and switches the context to <emphasis>string</emphasis>.
To learn more about rules, read the next chapter.</para>
<para>The third part is the <userinput>itemDatas</userinput> element. It
contains all color and font styles needed by the contexts and rules.
In this example, the <userinput>itemData</userinput> elements <emphasis>Normal Text</emphasis>,
<emphasis>String</emphasis> and <emphasis>Keyword</emphasis> are used.
</para>
<programlisting>
<highlighting>
<list name="somename">
<item> class </item>
<item> const </item>
</list>
<contexts>
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text" >
<keyword attribute="Keyword" context="#stay" String="somename" />
<DetectChar attribute="String" context="string" char="&quot;" />
</context>
<context attribute="String" lineEndContext="#stay" name="string" >
<DetectChar attribute="String" context="#pop" char="&quot;" />
</context>
</contexts>
<itemDatas>
<itemData name="Normal Text" defStyleNum="dsNormal" />
<itemData name="Keyword" defStyleNum="dsKeyword" />
<itemData name="String" defStyleNum="dsString" />
</itemDatas>
</highlighting>
</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term>The last part of a highlight definition is the optional
<userinput>general</userinput> section. It may contain information
about keywords, code folding, comments and indentation.</term>
<listitem>
<para>The <userinput>comment</userinput> section defines the string
that introduces a single-line comment. You can also define a
multiline comment using <emphasis>multiLine</emphasis> with the
additional attribute <emphasis>end</emphasis>. This information is used when the
user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasis>.</para>
<para>The <userinput>keywords</userinput> section defines whether
keyword lists are case-sensitive. Other attributes will be
explained later.</para>
<programlisting>
<general>
<comments>
<comment name="singleLine" start="#"/>
</comments>
<keywords casesensitive="1"/>
</general>
</language>
</programlisting>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2 id="kate-highlight-sections">
<title>The Sections in Detail</title>
<para>This part will describe all available attributes for contexts,
itemDatas, keywords, comments, code folding and indentation.</para>
<variablelist>
<varlistentry>
<term>The element <userinput>context</userinput> belongs in the group
<userinput>contexts</userinput>. A context itself defines context-specific
rules, such as what should happen when the highlight system reaches the end of a
line. Available attributes are:</term>
<listitem>
<para><userinput>name</userinput> the context name. Rules will use this name
to specify the context to switch to if the rule matches.</para>
<para><userinput>lineEndContext</userinput> defines the context the highlight
system switches to when it reaches the end of a line. This may either be the name
of another context, <userinput>#stay</userinput> to not switch the context
(i.e. do nothing), or <userinput>#pop</userinput>, which leaves the current
context. It is possible to use, for example, <userinput>#pop#pop#pop</userinput>
to pop three times.</para>
<para><userinput>lineBeginContext</userinput> defines the context to switch to when
the beginning of a line is encountered. Default: #stay.</para>
<para><userinput>fallthrough</userinput> defines if the highlight system switches
to the context specified in fallthroughContext if no rule matches.
Default: <emphasis>false</emphasis>.</para>
<para><userinput>fallthroughContext</userinput> specifies the next context
if no rule matches.</para>
<para><userinput>dynamic</userinput> if <emphasis>true</emphasis>, the context
remembers strings/placeholders saved by dynamic rules. This is needed for HERE
documents for example. Default: <emphasis>false</emphasis>.</para>
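<para>As a minimal sketch of these attributes, the fragment below defines two contexts; their names and the itemData names they refer to are assumptions made for illustration. <emphasis>Line Comment</emphasis> is left at the end of the line, while <emphasis>Number Suffix</emphasis> uses <userinput>fallthrough</userinput> to return to the previous context as soon as none of its rules match.</para>
<programlisting>
<!-- Hypothetical context and itemData names -->
<!-- Left automatically when the end of the line is reached -->
<context attribute="Comment" lineEndContext="#pop" name="Line Comment">
  <StringDetect attribute="Alert" context="#stay" String="TODO" />
</context>
<!-- Left as soon as the next character is not a suffix letter -->
<context attribute="Decimal" lineEndContext="#pop" name="Number Suffix"
         fallthrough="true" fallthroughContext="#pop">
  <AnyChar attribute="Decimal" context="#stay" String="uUlL" />
</context>
</programlisting>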
</listitem>
</varlistentry>
<varlistentry>
<term>The element <userinput>itemData</userinput> is in the group
<userinput>itemDatas</userinput>. It defines the font style and colors.
It is therefore possible to define your own styles and colors; however, we
recommend sticking to the default styles if possible, so that the user
will always see the same colors used in different languages. Sometimes,
though, there is no other way and it is necessary to change color
and font attributes. The attributes name and defStyleNum are required,
the others are optional. Available attributes are:</term>
<listitem>
<para><userinput>name</userinput> sets the name of the itemData.
Contexts and rules will use this name in their attribute
<emphasis>attribute</emphasis> to reference an itemData.</para>
<para><userinput>defStyleNum</userinput> defines which default style to use.
Available default styles are explained in detail later.</para>
<para><userinput>color</userinput> defines a color. Valid formats are
'#rrggbb' or '#rgb'.</para>
<para><userinput>selColor</userinput> defines the selection color.</para>
<para><userinput>italic</userinput> if <emphasis>true</emphasis>, the text will be italic.</para>
<para><userinput>bold</userinput> if <emphasis>true</emphasis>, the text will be bold.</para>
<para><userinput>underline</userinput> if <emphasis>true</emphasis>, the text will be underlined.</para>
<para><userinput>strikeout</userinput> if <emphasis>true</emphasis>, the text will be struck out.</para>
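<para>For example, a definition might map most items to default styles unchanged and override only a few attributes where needed. This is a sketch; the item names and the color value are made up for illustration.</para>
<programlisting>
<!-- Hypothetical item names and color -->
<itemDatas>
  <!-- Uses the default style unchanged, so the user's color settings apply -->
  <itemData name="Normal Text" defStyleNum="dsNormal" />
  <!-- Based on dsKeyword, but forced to bold and a fixed color -->
  <itemData name="Directive" defStyleNum="dsKeyword" color="#0057ae" bold="true" />
  <!-- Based on dsComment, rendered in italics -->
  <itemData name="Doc Comment" defStyleNum="dsComment" italic="true" />
</itemDatas>
</programlisting>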
</listitem>
</varlistentry>
<varlistentry>
<term>The element <userinput>keywords</userinput> in the group
<userinput>general</userinput> defines keyword properties. Available attributes are:</term>
<listitem>
<para><userinput>casesensitive</userinput> may be <emphasis>true</emphasis>
or <emphasis>false</emphasis>. If <emphasis>true</emphasis>, all keywords
are matched case-sensitively.</para>
<para><userinput>weakDeliminator</userinput> is a list of characters that
do not act as word delimiters. For example, the dot <userinput>'.'</userinput>
is a word delimiter by default, so a keyword in a <userinput>list</userinput> that
contains a dot will only match if you specify the dot as a weak delimiter.</para>
<para><userinput>additionalDeliminator</userinput> defines additional delimiters.</para>
<para><userinput>wordWrapDeliminator</userinput> defines characters after which a
line wrap may occur.</para>
<para>Default delimiters and word wrap delimiters are the characters
<userinput>.():!+,-<=>%&*/;?[]^{|}~\</userinput>, space (<userinput>' '</userinput>)
and tab (<userinput>'\t'</userinput>).</para>
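<para>As an illustration of <userinput>weakDeliminator</userinput>, the following sketch assumes a hypothetical language whose keyword list contains dotted names such as <emphasis>sys.exit</emphasis>. Declaring the dot as a weak delimiter keeps it from splitting the word, so the whole dotted name can be looked up in the list.</para>
<programlisting>
<general>
  <!-- Hypothetical: "." no longer ends a word, so "sys.exit" is one keyword -->
  <keywords casesensitive="true" weakDeliminator="." />
</general>
</programlisting>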
</listitem>
</varlistentry>
<varlistentry>
<term>The element <userinput>comment</userinput> in the group
<userinput>comments</userinput> defines comment properties which are used
for <menuchoice><guimenu>Tools</guimenu><guimenuitem>Comment</guimenuitem></menuchoice> and
<menuchoice><guimenu>Tools</guimenu><guimenuitem>Uncomment</guimenuitem></menuchoice>.
Available attributes are:</term>
<listitem>
<para><userinput>name</userinput> is either <emphasis>singleLine</emphasis>
or <emphasis>multiLine</emphasis>. If you choose <emphasis>multiLine</emphasis>
the attributes <emphasis>end</emphasis> and <emphasis>region</emphasis> are
required.</para>
<para><userinput>start</userinput> defines the string used to start a comment.
In C++ this would be "/*".</para>
<para><userinput>end</userinput> defines the string used to close a comment.
In C++ this would be "*/".</para>
<para><userinput>region</userinput> should be the name of the foldable
multiline comment. For example, if your rules contain <emphasis>beginRegion="Comment"</emphasis>
... <emphasis>endRegion="Comment"</emphasis>, you should use
<emphasis>region="Comment"</emphasis>. This way, uncommenting works even if you
do not select all the text of the multiline comment; the cursor only needs to be
inside the multiline comment.</para>
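<para>A sketch for a C-like language that ties the <userinput>region</userinput> attribute to the folding markers of the rules. The context, itemData and region names are assumptions; what matters is that <userinput>region</userinput> repeats the name used in <userinput>beginRegion</userinput> and <userinput>endRegion</userinput>.</para>
<programlisting>
<!-- Hypothetical contexts: "/*" opens and "*/" closes the region "Comment" -->
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <Detect2Chars attribute="Comment" context="Comment" char="/" char1="*" beginRegion="Comment" />
</context>
<context attribute="Comment" lineEndContext="#stay" name="Comment">
  <Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" endRegion="Comment" />
</context>

<!-- In the general section: the same region name as in the rules above -->
<comments>
  <comment name="singleLine" start="//" />
  <comment name="multiLine" start="/*" end="*/" region="Comment" />
</comments>
</programlisting>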
</listitem>
</varlistentry>
<varlistentry>
<term>The element <userinput>folding</userinput> in the group
<userinput>general</userinput> defines code folding properties.
Available attributes are:</term>
<listitem>
<para><userinput>indentationsensitive</userinput> if <emphasis>true</emphasis>, the code folding markers
will be added based on indentation, as in the scripting language Python. Usually you
do not need to set it, as it defaults to <emphasis>false</emphasis>.</para>
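<para>For an indentation-based language, the element could look like this (a minimal sketch):</para>
<programlisting>
<general>
  <!-- Fold blocks based on their indentation, as in Python -->
  <folding indentationsensitive="true" />
</general>
</programlisting>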
</listitem>
</varlistentry>
<varlistentry>
<term>The element <userinput>indentation</userinput> in the group
<userinput>general</userinput> defines which indenter will be used; however, we strongly
recommend omitting this element, as the indenter is usually set either by defining
a File Type or by adding a mode line to the text file. If you specify an indenter,
you will force a specific indentation on users, who might not like it at all.
Available attributes are:</term>
<listitem>
<para><userinput>mode</userinput> is the name of the indenter. Available indenters
right now are: <emphasis>normal, cstyle, csands, xml, python</emphasis> and
<emphasis>varindent</emphasis>.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2 id="kate-highlight-default-styles">
<title>Available Default Styles</title>
<para>Default Styles were <link linkend="kate-highlight-system-default-styles">already explained</link>;
in short, default styles are predefined font and color styles.</para>
<variablelist>
<varlistentry>
<term>The following default styles are available:</term>
<listitem>
<para><userinput>dsNormal</userinput>, used for normal text.</para>
<para><userinput>dsKeyword</userinput>, used for keywords.</para>
<para><userinput>dsDataType</userinput>, used for data types.</para>
<para><userinput>dsDecVal</userinput>, used for decimal values.</para>
<para><userinput>dsBaseN</userinput>, used for values with a base other than 10.</para>
<para><userinput>dsFloat</userinput>, used for float values.</para>
<para><userinput>dsChar</userinput>, used for a character.</para>
<para><userinput>dsString</userinput>, used for strings.</para>
<para><userinput>dsComment</userinput>, used for comments.</para>
<para><userinput>dsOthers</userinput>, used for 'other' things.</para>
<para><userinput>dsAlert</userinput>, used for warning messages.</para>
<para><userinput>dsFunction</userinput>, used for function calls.</para>
<para><userinput>dsRegionMarker</userinput>, used for region markers.</para>
<para><userinput>dsError</userinput>, used for error highlighting and wrong syntax.</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
</sect1>
<sect1 id="kate-highlight-rules-detailled">
<title>Highlight Detection Rules</title>
<para>This section describes the syntax detection rules.</para>
<para>Each rule can match zero or more characters at the beginning of
the string it is tested against. If the rule matches, the matching
characters are assigned the style or <emphasis>attribute</emphasis>
defined by the rule, and a rule may ask that the current context is
switched.</para>
<para>A rule looks like this:</para>
<programlisting><RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] /></programlisting>
<para>The <emphasis>attribute</emphasis> identifies the style to use
for matched characters by name, and the <emphasis>context</emphasis>
identifies the context to use from here.</para>
<para>The <emphasis>context</emphasis> can be identified by:</para>
<itemizedlist>
<listitem>
<para>An <emphasis>identifier</emphasis>, which is the name of the other
context.</para>
</listitem>
<listitem>
<para>An <emphasis>order</emphasis> telling the engine to stay in the
current context (<userinput>#stay</userinput>), or to pop back to a
previous context used in the string (<userinput>#pop</userinput>).</para>
<para>To go back more steps, the #pop keyword can be repeated:
<userinput>#pop#pop#pop</userinput></para>
</listitem>
</itemizedlist>
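<para>The following sketch shows why more than one <userinput>#pop</userinput> can be needed. All names are hypothetical: a string context is entered from <emphasis>Normal Text</emphasis>, an interpolation context is entered from the string, and a line end inside the interpolation leaves both at once.</para>
<programlisting>
<!-- Hypothetical contexts; the stack grows Normal Text, String, Interpolation -->
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <DetectChar attribute="String" context="String" char="&quot;" />
</context>
<context attribute="String" lineEndContext="#pop" name="String">
  <!-- "${" enters the interpolation context -->
  <Detect2Chars attribute="Operator" context="Interpolation" char="$" char1="{" />
  <DetectChar attribute="String" context="#pop" char="&quot;" />
</context>
<!-- At the end of the line, pop twice to get back to Normal Text -->
<context attribute="Normal Text" lineEndContext="#pop#pop" name="Interpolation">
  <!-- "}" returns to the string -->
  <DetectChar attribute="Operator" context="#pop" char="}" />
</context>
</programlisting>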
<para>Some rules can have <emphasis>child rules</emphasis> which are
then evaluated only if the parent rule matched. The entire matched
string will be given the attribute defined by the parent rule. A rule
with child rules looks like this:</para>
<programlisting>
<RuleName (attributes)>
<ChildRuleName (attributes) />
...
</RuleName>
</programlisting>
<para>Rule-specific attributes vary and are described in the
following sections.</para>
<itemizedlist>
<title>Common attributes</title>
<para>All rules have the following attributes in common and are
available whenever <userinput>(common attributes)</userinput> appears.
<emphasis>attribute</emphasis> and <emphasis>context</emphasis>
are required attributes, all others are optional.
</para>
<listitem>
<para><emphasis>attribute</emphasis>: An attribute maps to a defined <emphasis>itemData</emphasis>.</para>
</listitem>
<listitem>
<para><emphasis>context</emphasis>: Specify the context to which the highlighting system switches if the rule matches.</para>
</listitem>
<listitem>
<para><emphasis>beginRegion</emphasis>: Start a code folding block. Default: unset.</para>
</listitem>
<listitem>
<para><emphasis>endRegion</emphasis>: Close a code folding block. Default: unset.</para>
</listitem>
<listitem>
<para><emphasis>lookAhead</emphasis>: If <emphasis>true</emphasis>, the
highlighting system will not consume the matched text: the context switch
still happens, but the matched characters are examined again in the new context.
Default: <emphasis>false</emphasis>.</para>
</listitem>
<listitem>
<para><emphasis>firstNonSpace</emphasis>: Match only if the string is
the first non-whitespace text in the line. Default: <emphasis>false</emphasis>.</para>
</listitem>
<listitem>
<para><emphasis>column</emphasis>: Match only if the match starts at the given column. Default: unset.</para>
</listitem>
</itemizedlist>
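<para>A sketch combining several of the common attributes. The itemData names and the contexts <emphasis>Preprocessor</emphasis>, <emphasis>Comment</emphasis> and <emphasis>Number</emphasis> are assumptions and would have to be defined elsewhere in the file.</para>
<programlisting>
<!-- Hypothetical itemData and context names -->
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <!-- "#" counts as a directive only when it is the first non-blank character -->
  <DetectChar attribute="Preprocessor" context="Preprocessor" char="#" firstNonSpace="true" />
  <!-- "*" in the first column marks a comment line -->
  <DetectChar attribute="Comment" context="Comment" char="*" column="0" />
  <!-- "{" and "}" open and close a code folding block -->
  <DetectChar attribute="Symbol" context="#stay" char="{" beginRegion="Brace" />
  <DetectChar attribute="Symbol" context="#stay" char="}" endRegion="Brace" />
  <!-- Switch to "Number" without consuming the digit,
       so the rules of that context see it again -->
  <RegExpr attribute="Normal Text" context="Number" String="[0-9]" lookAhead="true" />
</context>
</programlisting>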
<itemizedlist>
<title>Dynamic rules</title>
<para>Some rules allow the optional attribute <userinput>dynamic</userinput>
of type boolean that defaults to <emphasis>false</emphasis>. If dynamic is
<emphasis>true</emphasis>, the rule's <userinput>string</userinput> or
<userinput>char</userinput> attribute can contain placeholders representing the text
matched by the <emphasis>regular expression</emphasis> rule that switched to the
current context. In a <userinput>string</userinput>,
the placeholder <replaceable>%N</replaceable> (where N is a number) will be
replaced with the corresponding capture <replaceable>N</replaceable>
from the calling regular expression. In a
<userinput>char</userinput>, the placeholder must be a number
<replaceable>N</replaceable>, and it will be replaced with the first character of
the corresponding capture <replaceable>N</replaceable> from the calling regular
expression. Rules that allow this attribute are marked with
<emphasis>(dynamic)</emphasis> in their syntax summaries.</para>
<listitem>
<para><emphasis>dynamic</emphasis>: may be <emphasis>(true|false)</emphasis>.</para>
</listitem>
</itemizedlist>
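<para>A minimal sketch of a dynamic context, loosely modeled on quote-like operators with a user-chosen delimiter. The syntax <userinput>q/.../</userinput> and all names here are assumptions for illustration. The <userinput>RegExpr</userinput> rule captures the delimiter, and the <userinput>DetectChar</userinput> rule in the dynamic context refers to it with <userinput>char="1"</userinput>.</para>
<programlisting>
<!-- Hypothetical syntax and names -->
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <!-- Capture group 1 holds the delimiter that follows "q" -->
  <RegExpr attribute="String" context="Quoted" String="q([/|!])" />
</context>
<!-- dynamic="true" lets the context remember the captured text -->
<context attribute="String" lineEndContext="#stay" name="Quoted" dynamic="true">
  <!-- char="1": the first character of capture 1, i.e. the same delimiter -->
  <DetectChar attribute="String" context="#pop" char="1" dynamic="true" />
</context>
</programlisting>
<para>In a <userinput>String</userinput> attribute the <replaceable>%N</replaceable> form is used instead; for example, a dynamic <userinput>RegExpr</userinput> with <userinput>String="^%1$"</userinput> could end a here document whose terminator word was captured by group 1 of the calling expression.</para>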
<sect2 id="highlighting-rules-in-detail">
<title>The Rules in Detail</title>
<variablelist>
<varlistentry>
<term>DetectChar</term>
<listitem>
<para>Detect a single specific character. Commonly used, for example, to
find the ends of quoted strings.</para>
<programlisting><DetectChar char="(character)" (common attributes) (dynamic) /></programlisting>
<para>The <userinput>char</userinput> attribute defines the character
to match.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Detect2Chars</term>
<listitem>
<para>Detect two specific characters in a defined order.</para>
<programlisting><Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) /></programlisting>
<para>The <userinput>char</userinput> attribute defines the first character to match,
<userinput>char1</userinput> the second.</para>
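<para>For example, two-character operators could be detected like this (a sketch; the itemData name <emphasis>Operator</emphasis> is an assumption). A single <userinput>:</userinput> or <userinput>!</userinput> on its own is not matched by these rules.</para>
<programlisting>
<!-- Hypothetical itemData "Operator" -->
<!-- Matches "::" -->
<Detect2Chars attribute="Operator" context="#stay" char=":" char1=":" />
<!-- Matches "!=" -->
<Detect2Chars attribute="Operator" context="#stay" char="!" char1="=" />
</programlisting>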
</listitem>
</varlistentry>
<varlistentry>
<term>AnyChar</term>
<listitem>
<para>Detect one character of a set of specified characters.</para>
<programlisting><AnyChar String="(string)" (common attributes) /></programlisting>
<para>The <userinput>String</userinput> attribute defines the set of
characters.</para>
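<para>A sketch that gives arithmetic operators their own style; the itemData name <emphasis>Operator</emphasis> is an assumption. Each match covers exactly one character, so a run such as <userinput>++</userinput> is matched once per character.</para>
<programlisting>
<!-- Hypothetical itemData; matches any one of: + - * / % -->
<AnyChar attribute="Operator" context="#stay" String="+-*/%" />
</programlisting>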
</listitem>
</varlistentry>
<varlistentry>
<term>StringDetect</term>
<listitem>
<para>Detect an exact string.</para>
<programlisting><StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) /></programlisting>
<para>The <userinput>String</userinput> attribute defines the string
to match. The <userinput>insensitive</userinput> attribute defaults to
<emphasis>false</emphasis> and is passed to the string comparison
function. If the value is <emphasis>true</emphasis>, a case-insensitive
comparison is used.</para>
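<para>For example (a sketch; the itemData names are assumptions), the following rules mark <emphasis>TODO</emphasis> only in exactly that spelling, and <emphasis>BEGIN</emphasis> in any letter case:</para>
<programlisting>
<!-- Hypothetical itemData names -->
<!-- Case-sensitive by default: matches "TODO" but not "todo" -->
<StringDetect attribute="Alert" context="#stay" String="TODO" />
<!-- insensitive="true": also matches "begin" and "Begin" -->
<StringDetect attribute="Keyword" context="#stay" String="BEGIN" insensitive="true" />
</programlisting>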
</listitem>
</varlistentry>
<varlistentry>
<term>RegExpr</term>
<listitem>
<para>Matches against a regular expression.</para>
<programlisting><RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) /></programlisting>
<para>The <userinput>String</userinput> attribute defines the regular
expression.</para>
<para><userinput>insensitive</userinput> defaults to
<emphasis>false</emphasis> and is passed to the regular expression
engine.</para>
<para><userinput>minimal</userinput> defaults to
<emphasis>false</emphasis> and is passed to the regular expression
engine.</para>
<para>Because the rules are always matched against the beginning of
the current string, a regular expression starting with a caret
(<literal>^</literal>) indicates that the rule should only be
matched against the start of a line.</para>
<para>See <link linkend="regular-expressions">Regular Expressions</link>
for more information on those.</para>
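<para>The sketch below, with hypothetical item data names, illustrates both points: the first rule only matches a label at the start of a line because of the caret, and the second uses <userinput>minimal="true"</userinput> so that <userinput>.*</userinput> stops at the first <userinput>*/</userinput> rather than the last one on the line.</para>
<programlisting>
<!-- caret: only matches at the start of a line, e.g. "start:" -->
<RegExpr attribute="Label" context="#stay" String="^[A-Za-z_]\w*:" />
<!-- minimal (non-greedy): "/* a */ x /* b */" yields two separate matches -->
<RegExpr attribute="Comment" context="#stay" String="/\*.*\*/" minimal="true" />
</programlisting>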
</listitem>
</varlistentry>
<varlistentry>
<term>keyword</term>
<listitem>
<para>Detect a keyword from a specified list.</para>
<programlisting><keyword String="(list name)" (common attributes) /></programlisting>
<para>The <userinput>String</userinput> attribute identifies the
keyword list by name. A list with that name must exist.</para>
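<para>As a sketch, a keyword list is declared inside the highlighting element and referenced by its name. The list name <userinput>keywords</userinput>, the context name and the item data names are illustrative only, and the itemDatas section is omitted for brevity:</para>
<programlisting>
<highlighting>
  <!-- hypothetical list; the keyword rule refers to it by name -->
  <list name="keywords">
    <item>if</item>
    <item>else</item>
    <item>while</item>
  </list>
  <contexts>
    <context name="Normal" attribute="Normal Text" lineEndContext="#stay">
      <keyword attribute="Keyword" context="#stay" String="keywords" />
    </context>
  </contexts>
</highlighting>
</programlisting>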
</listitem>
</varlistentry>
<varlistentry>
<term>Int</term>
<listitem>
<para>Detect an integer number.</para>
<para><programlisting><Int (common attributes) (dynamic) /></programlisting></para>
<para>This rule has no specific attributes. Child rules are typically
used to detect combinations of <userinput>L</userinput> and
<userinput>U</userinput> after the number, indicating the integer type
in program code. Actually, all rules are allowed as child rules, though
the <acronym>DTD</acronym> only allows the child rule <userinput>StringDetect</userinput>.</para>
<para>The following example matches integer numbers followed by the character 'L' (or 'l', since <userinput>insensitive</userinput> is set).
<programlisting>
<Int attribute="Decimal" context="#stay" >
<StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/>
</Int>
</programlisting></para>
</listitem>
</varlistentry>
<varlistentry>
<term>Float</term>
<listitem>
<para>Detect a floating point number.</para>
<para><programlisting><Float (common attributes) /></programlisting></para>
<para>This rule has no specific attributes. <userinput>AnyChar</userinput> is
allowed as a child rule and is typically used to detect combinations; see rule
<userinput>Int</userinput> for reference.</para>
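<para>Following the pattern of the <userinput>Int</userinput> example, a C-like suffix as in <userinput>1.5f</userinput> could be covered like this (the item data name <userinput>Float</userinput> is an assumption):</para>
<programlisting>
<!-- the child rule extends the match by an optional f/F suffix -->
<Float attribute="Float" context="#stay">
  <AnyChar attribute="Float" context="#stay" String="fF" />
</Float>
</programlisting>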
</listitem>
</varlistentry>
<varlistentry>
<term>HlCOct</term>
<listitem>
<para>Detect an octal number representation.</para>
<para><programlisting><HlCOct (common attributes) /></programlisting></para>
<para>This rule has no specific attributes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>HlCHex</term>
<listitem>
<para>Detect a hexadecimal number representation.</para>
<para><programlisting><HlCHex (common attributes) /></programlisting></para>
<para>This rule has no specific attributes.</para>
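<para>Since rules are tried in order and the first match wins, number rules are usually listed from the most to the least specific form. One plausible ordering for C-like numbers, with hypothetical item data names, is:</para>
<programlisting>
<!-- specific forms first; the plain Int rule comes last -->
<Float attribute="Float" context="#stay" />
<HlCOct attribute="Octal" context="#stay" />
<HlCHex attribute="Hex" context="#stay" />
<Int attribute="Decimal" context="#stay" />
</programlisting>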
</listitem>
</varlistentry>
<varlistentry>
<term>HlCStringChar</term>
<listitem>
<para>Detect an escaped character.</para>
<para><programlisting><HlCStringChar (common attributes) /></programlisting></para>
<para>This rule has no specific attributes.</para>
<para>It matches literal representations of characters commonly used in
program code, for example <userinput>\n</userinput>
(newline) or <userinput>\t</userinput> (TAB).</para>
<para>The following characters will match if they follow a backslash
(<literal>\</literal>):
<userinput>abefnrtv"'?\</userinput>. Additionally, escaped
hexadecimal numbers like for example <userinput>\xff</userinput> and
escaped octal numbers, for example <userinput>\033</userinput> will
match.</para>
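<para>A minimal sketch of a double-quoted string context, assuming hypothetical item data named <userinput>String</userinput> and <userinput>String Char</userinput>. The escape rule is listed before the closing quote, so <userinput>\"</userinput> does not end the string. The single-quoted attribute value <userinput>'"'</userinput> is plain XML and avoids an entity.</para>
<programlisting>
<context name="String" attribute="String" lineEndContext="#pop">
  <!-- escapes such as \n, \" or \x1b are highlighted separately -->
  <HlCStringChar attribute="String Char" context="#stay" />
  <!-- an unescaped double quote closes the string -->
  <DetectChar attribute="String" context="#pop" char='"' />
</context>
</programlisting>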
</listitem>
</varlistentry>
<varlistentry>
<term>HlCChar</term>
<listitem>
<para>Detect a C character.</para>
<para><programlisting><HlCChar (common attributes) /></programlisting></para>
<para>This rule has no specific attributes.</para>
<para>It matches C characters enclosed in single quotes (example: <userinput>'c'</userinput>).
The quotes may contain a simple character or an escaped character.
See HlCStringChar for the escaped character sequences that are matched.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>RangeDetect</term>
<listitem>
<para>Detect a string with defined start and end characters.</para>
<programlisting><RangeDetect char="(character)" char1="(character)" (common attributes) /></programlisting>
<para><userinput>char</userinput> defines the character starting the range,
<userinput>char1</userinput> the character ending the range.</para>
<para>Useful for detecting, for example, short quoted strings and the like, but
note that since the highlighting engine works on one line at a time, this
will not find strings spanning a line break.</para>
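<para>For instance, section headers of an INI-style file, such as <userinput>[General]</userinput>, fit on one line and can be matched with a single rule; the item data name <userinput>Section</userinput> is an assumption:</para>
<programlisting>
<!-- everything from [ up to the next ] on the same line -->
<RangeDetect attribute="Section" context="#stay" char="[" char1="]" />
</programlisting>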
</listitem>
</varlistentry>
<varlistentry>
<term>LineContinue</term>
<listitem>
<para>Matches at end of line.</para>
<programlisting><LineContinue (common attributes) /></programlisting>
<para>This rule has no specific attributes.</para>
<para>This rule is useful for switching context at the end of a line if the last
character is a backslash (<userinput>'\'</userinput>). This is needed, for
example, in C/C++ to continue macros or strings.</para>
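<para>The following sketch, modeled loosely on C preprocessor handling and using hypothetical context and item data names, is intended to keep a macro context active across lines ending in a backslash; without that match, <userinput>lineEndContext="#pop"</userinput> returns to the previous context at the end of every line:</para>
<programlisting>
<!-- in the main context: a # as first non-space character starts a directive -->
<DetectChar attribute="Preprocessor" context="Preprocessor" char="#" firstNonSpace="true" />

<context name="Preprocessor" attribute="Preprocessor" lineEndContext="#pop">
  <!-- a trailing backslash continues the directive on the next line -->
  <LineContinue attribute="Preprocessor" context="#stay" />
</context>
</programlisting>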
</listitem>
</varlistentry>
<varlistentry>
<term>IncludeRules</term>
<listitem>
<para>Include rules from another context or language/file.</para>
<programlisting><IncludeRules context="contextlink" [includeAttrib="true|false"] /></programlisting>
<para>The <userinput>context</userinput> attribute defines which context to include.</para>
<para>If it is a simple string, all rules defined in that context are included into the current context, for example:
<programlisting><IncludeRules context="anotherContext" /></programlisting></para>
<para>
If the string begins with <userinput>##</userinput>, the highlight system
will look for another language definition with the given name, for example:
<programlisting><IncludeRules context="##C++" /></programlisting></para>
<para>If the <userinput>includeAttrib</userinput> attribute is
<emphasis>true</emphasis>, the destination attribute is changed to that of
the source. This is required, for example, to make commenting work if text
matched by the included context has a different highlight than the host
context.
</para>
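<para>A common use of the simple form is sharing rules between contexts. In this sketch, with hypothetical context and item data names, the number rules are written once and included in two contexts:</para>
<programlisting>
<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
  <IncludeRules context="Numbers" />
</context>

<context name="Macro" attribute="Preprocessor" lineEndContext="#pop">
  <IncludeRules context="Numbers" />
</context>

<!-- shared rules, defined once -->
<context name="Numbers" attribute="Normal Text" lineEndContext="#stay">
  <Float attribute="Float" context="#stay" />
  <Int attribute="Decimal" context="#stay" />
</context>
</programlisting>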
</listitem>
</varlistentry>
<varlistentry>
<term>DetectSpaces</term>
<listitem>
<para>Detect whitespace.</para>
<programlisting><DetectSpaces (common attributes) /></programlisting>
<para>This rule has no specific attributes.</para>
<para>Use this rule if you know that there can be several whitespace characters ahead,
for example at the beginning of indented lines. This rule will skip all
whitespace at once, instead of testing multiple rules and skipping one character
at a time due to no match.</para>
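<para>As a sketch with assumed context and item data names, placing <userinput>DetectSpaces</userinput> first lets a run of indentation be consumed in one step:</para>
<programlisting>
<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
  <!-- consume all consecutive whitespace with a single match -->
  <DetectSpaces attribute="Normal Text" context="#stay" />
  <DetectChar attribute="String" context="String" char="'" />
</context>
</programlisting>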
</listitem>
</varlistentry>
<varlistentry>
<term>DetectIdentifier</term>
<listitem>
<para>Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*).</para>
<programlisting><DetectIdentifier (common attributes) /></programlisting>
<para>This rule has no specific attributes.</para>
<para>Use this rule to skip a string of word characters at once, rather than
testing with multiple rules and skipping one character at a time due to no match.</para>
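<para>Because the first matching rule wins, <userinput>DetectIdentifier</userinput> should come after any <userinput>keyword</userinput> rule; otherwise it would consume the keywords as well. The list name <userinput>keywords</userinput> and the item data names below are assumptions:</para>
<programlisting>
<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
  <!-- keywords first, then any remaining identifier -->
  <keyword attribute="Keyword" context="#stay" String="keywords" />
  <DetectIdentifier attribute="Normal Text" context="#stay" />
</context>
</programlisting>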
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Tips & Tricks</title>
<itemizedlist>
<para>Once you have understood how context switching works, it will be
easy to write highlight definitions. However, you should carefully check which
rule you choose in which situation. Regular expressions are very powerful, but
they are slow compared to the other rules, so consider the following
tips.
</para>
<listitem>
<para>If you only need to match two characters, use <userinput>Detect2Chars</userinput>
instead of <userinput>StringDetect</userinput>. The same applies to
<userinput>DetectChar</userinput> for a single character.</para>
</listitem>
<listitem>
<para>Regular expressions are easy to use, but often there is another, much
faster way to achieve the same result. Suppose you only want to match
the character <userinput>'#'</userinput> if it is the first non-whitespace
character in the line. A regular expression based solution would look like this:
<programlisting><RegExpr attribute="Macro" context="macro" String="^\s*#" /></programlisting>
You can achieve the same much faster using:
<programlisting><DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" /></programlisting>
If you want to match the regular expression <userinput>'^#'</userinput>, you
can still use <userinput>DetectChar</userinput> with the attribute <userinput>column="0"</userinput>.
The <userinput>column</userinput> attribute counts characters, so a tab still counts as only one character.
</para>
</listitem>
<listitem>
<para>You can switch contexts without processing characters. Assume that you
want to switch context when you encounter the string <userinput>*/</userinput>, but
need to process that string in the next context. The rule below will match, and
the <userinput>lookAhead</userinput> attribute will cause the highlighter to
keep the matched string for the next context.
<programlisting><Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" /></programlisting>
</para>
</listitem>
<listitem>
<para>Use <userinput>DetectSpaces</userinput> if you know that many whitespace characters occur.</para>
</listitem>
<listitem>
<para>Use <userinput>DetectIdentifier</userinput> instead of the regular expression <userinput>'[a-zA-Z_]\w*'</userinput>.</para>
</listitem>
<listitem>
<para>Use default styles whenever you can. This way the user will find a familiar environment.</para>
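<para>Default styles are assigned per item data with the <userinput>defStyleNum</userinput> attribute. A sketch with illustrative item data names:</para>
<programlisting>
<itemDatas>
  <!-- each rule's attribute refers to one of these names -->
  <itemData name="Normal Text" defStyleNum="dsNormal" />
  <itemData name="Keyword" defStyleNum="dsKeyword" />
  <itemData name="Decimal" defStyleNum="dsDecVal" />
  <itemData name="String" defStyleNum="dsString" />
  <itemData name="Comment" defStyleNum="dsComment" />
</itemDatas>
</programlisting>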
</listitem>
<listitem>
<para>Look at other XML files to see how other people implement tricky rules.</para>
</listitem>
<listitem>
<para>You can validate every XML file by using the command
<command>xmllint --dtdvalid language.dtd mySyntax.xml</command>.</para>
</listitem>
<listitem>
<para>If you repeat a complex regular expression very often, you can use
<emphasis>ENTITIES</emphasis>. Example:</para>
<programlisting>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd"
[
<!ENTITY myref "[A-Za-z_:][\w.:_-]*">
]>
</programlisting>
<para>Now you can use <emphasis>&myref;</emphasis> instead of the regular
expression.</para>
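<para>For example, a rule could then reference the entity inside its <userinput>String</userinput> attribute; the XML parser expands it before the rule is read (the item data name is hypothetical):</para>
<programlisting>
<!-- &myref; expands to the regular expression declared in the DOCTYPE -->
<RegExpr attribute="Name" context="#stay" String="&myref;" />
</programlisting>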
</listitem>
</itemizedlist>
</sect2>
</sect1>
</appendix>