author gregory guy <gregory-tde@laposte.net> 2021-10-07 15:17:57 +0200
committer Slávek Banko <slavek.banko@axis.cz> 2021-12-08 16:28:04 +0100
commit bf076238d30b3e8523b5362f5698ea2e3660c5bf
tree f9651b26933ef3054a482f07ae428fd5c0aec431 /doc/other/Lexer.txt
parent 8081a13a660196b987f3e99f899f502f9bc947c7
Conversion to the cmake building system.
Signed-off-by: gregory guy <gregory-tde@laposte.net> (cherry picked from commit a69b55c674b0528c00598bea54b7a661f4e49f27)
Diffstat (limited to 'doc/other/Lexer.txt')
-rw-r--r-- doc/other/Lexer.txt 226
1 files changed, 226 insertions, 0 deletions
diff --git a/doc/other/Lexer.txt b/doc/other/Lexer.txt
new file mode 100644
index 0000000..9d4ab50
--- /dev/null
+++ b/doc/other/Lexer.txt
@@ -0,0 +1,226 @@
+How to write a Scintilla lexer
+
+A lexer for a particular language determines how a specified range of
+text shall be colored. Writing a lexer is relatively straightforward
+because the lexer need only color given text. The harder job of
+determining how much text actually needs to be colored is handled by
+Scintilla itself, that is, the lexer's caller.
+
+
+Parameters
+
+The lexer for language LLL has the following prototype:
+
+ static void ColouriseLLLDoc (
+ unsigned int startPos, int length,
+ int initStyle,
+ WordList *keywordlists[],
+ Accessor &styler);
+
+The styler parameter is an Accessor object. The lexer must use this
+object to access the text to be colored. The lexer gets the character
+at position i using styler.SafeGetCharAt(i);
+
+The startPos and length parameters indicate the range of text to be
+recolored; the lexer must determine the proper color for all characters
+in positions startPos through startPos+length-1.
+
+The initStyle parameter indicates the initial state, that is, the state
+at the character before startPos. States also indicate the coloring to
+be used for a particular range of text.
+
+Note: the character at startPos is assumed to start a line, so if a
+newline terminates the initStyle state the lexer should enter its
+default state (or whatever state should follow initStyle).
+
+The keywordlists parameter specifies the keywords that the lexer must
+recognize. A WordList class object contains methods that simplify
+the recognition of keywords. Present lexers use a helper function
+called classifyWordLLL to recognize keywords. These functions show how
+to use the keywordlists parameter to recognize keywords. This
+documentation will not discuss keywords further.
+
+
+The lexer code
+
+The task of a lexer can be summarized briefly: for each range r of
+characters that are to be colored the same, the lexer should call
+
+ styler.ColourTo(i, state)
+
+where i is the position of the last character of the range r. The lexer
+should set the state variable to the coloring state of the character at
+position i and continue until the entire text has been colored.
+
+Note 1: the styler (Accessor) object remembers the i parameter in the
+previous calls to styler.ColourTo, so the single i parameter suffices to
+indicate a range of characters.
+
+Note 2: As a side effect of calling styler.ColourTo(i,state), the
+coloring states of all characters in the range are remembered so that
+Scintilla may set the initStyle parameter correctly on future calls to
+the lexer.
+
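A minimal sketch of the bookkeeping described in Notes 1 and 2. MockStyler is a hypothetical stand-in written for this illustration, not Scintilla's Accessor; it only records one style per character so the effect of successive ColourTo calls can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Scintilla's Accessor; illustration only.
class MockStyler {
public:
    explicit MockStyler(const std::string &text)
        : text_(text), styles_(text.size(), 0), segStart_(0) {}

    // Returns the character at pos, or ' ' when pos is out of range.
    char SafeGetCharAt(std::size_t pos) const {
        return pos < text_.size() ? text_[pos] : ' ';
    }

    // Declares where the first range to be colored begins.
    void StartSegment(std::size_t pos) { segStart_ = pos; }

    // Colors every character from the end of the previous range through
    // pos (inclusive), then remembers pos + 1 as the start of the next range.
    void ColourTo(std::size_t pos, int state) {
        assert(pos < text_.size());
        for (std::size_t i = segStart_; i <= pos; i++)
            styles_[i] = state;
        if (pos + 1 > segStart_)
            segStart_ = pos + 1;
    }

    // The remembered style of one character (what initStyle is derived from).
    int StyleAt(std::size_t pos) const { return styles_[pos]; }

private:
    std::string text_;
    std::vector<int> styles_;
    std::size_t segStart_;
};
```

With the text `ab "cd"`, calling ColourTo(2, 0) and then ColourTo(6, 1) leaves positions 0 to 2 in style 0 and positions 3 to 6 in style 1, although each call names only the last position of its range.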
+
+Lexer organization
+
+There are at least two ways to organize the code of each lexer. Present
+lexers use what might be called a "character-based" approach: the outer
+loop iterates over characters, like this:
+
+ lengthDoc = startPos + length;
+ // Prime chNext so that ch is defined on the first iteration.
+ chNext = styler.SafeGetCharAt(startPos);
+ for (unsigned int i = startPos; i < lengthDoc; i++) {
+   ch = chNext;
+   chNext = styler.SafeGetCharAt(i + 1);
+   << handle special cases >>
+   switch (state) {
+     // Handlers examine only ch and chNext.
+     // Handlers call styler.ColourTo(i,state) if the state changes.
+     case state_1: << handle ch in state 1 >>
+     case state_2: << handle ch in state 2 >>
+     ...
+     case state_n: << handle ch in state n >>
+   }
+   chPrev = ch;
+ }
+ styler.ColourTo(lengthDoc - 1, state);
+
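The character-based outline can be made concrete for a toy language in which '#' starts a comment that runs to the end of the line. This is a sketch under stated assumptions: MockStyler, ColouriseToyDoc and the TOY_* style numbers are hypothetical names invented here, and the mock only imitates the part of Accessor that the outline uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum { TOY_DEFAULT = 0, TOY_COMMENT = 1 };  // hypothetical style numbers

// Hypothetical stand-in for Accessor: records one style per character.
struct MockStyler {
    std::string text;
    std::vector<int> styles;
    std::size_t segStart = 0;
    explicit MockStyler(const std::string &t)
        : text(t), styles(t.size(), TOY_DEFAULT) {}
    char SafeGetCharAt(std::size_t pos) const {
        return pos < text.size() ? text[pos] : ' ';
    }
    void StartSegment(std::size_t pos) { segStart = pos; }
    void ColourTo(std::size_t pos, int state) {
        for (std::size_t i = segStart; i <= pos && i < styles.size(); i++)
            styles[i] = state;
        if (pos + 1 > segStart)
            segStart = pos + 1;
    }
};

// Character-based lexer: the outer loop visits each character once.
static void ColouriseToyDoc(unsigned int startPos, int length,
                            int initStyle, MockStyler &styler) {
    assert(length >= 0);
    int state = initStyle;
    unsigned int lengthDoc = startPos + length;
    styler.StartSegment(startPos);
    char chNext = styler.SafeGetCharAt(startPos);
    for (unsigned int i = startPos; i < lengthDoc; i++) {
        char ch = chNext;
        chNext = styler.SafeGetCharAt(i + 1);
        switch (state) {
        case TOY_DEFAULT:
            if (ch == '#') {
                // Close the default range just before the '#'.
                if (i > startPos)
                    styler.ColourTo(i - 1, state);
                state = TOY_COMMENT;
            }
            break;
        case TOY_COMMENT:
            if (ch == '\n') {
                // The newline still belongs to the comment.
                styler.ColourTo(i, state);
                state = TOY_DEFAULT;
            }
            break;
        }
    }
    // Color whatever remains in the final state.
    if (lengthDoc > startPos)
        styler.ColourTo(lengthDoc - 1, state);
}
```

The toy handlers never need chNext, but it is kept so the loop has the same shape as the outline; a lexer for a language with two-character markers such as "//" would test it.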
+
+An alternative would be to use a "state-based" approach. The outer loop
+would iterate over states, like this:
+
+ lengthDoc = startPos+length ;
+ for ( unsigned int i = startPos ;; ) {
+ char ch = styler.SafeGetCharAt(i);
+ int new_state = 0 ;
+ switch ( state ) {
+ // scanners set new_state if they set the next state.
+ case state_1: << scan to the end of state 1 >> break ;
+ case state_2: << scan to the end of state 2 >> break ;
+ case default_state:
+ << scan to the next non-default state and set new_state >>
+ }
+ styler.ColourTo(i, state);
+ if ( i >= lengthDoc ) break ;
+ if ( ! new_state ) {
+ ch = styler.SafeGetCharAt(i);
+ << set state based on ch in the default state >>
+ }
+ }
+ styler.ColourTo(lengthDoc - 1, state);
+
+This approach might seem more natural. State scanners are simpler
+than character scanners because each has less to do. For example,
+there is no need to test for the start of a C string inside the scanner
+for a C comment. This approach also makes it natural to define routines
+that could be used by more than one scanner; for example, a
+scanToEndOfLine routine.
+
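A sketch of the state-based organization for a toy language in which '#' starts a comment that runs to the end of the line. All names here (MockStyler, ScanToEndOfLine, ColouriseToyDocByState, the TOY_* style numbers) are hypothetical and exist only for this illustration; the loop is a simplified variant of the outline, not code taken from any Scintilla lexer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum { TOY_DEFAULT = 0, TOY_COMMENT = 1 };  // hypothetical style numbers

// Hypothetical stand-in for Accessor: records one style per character.
struct MockStyler {
    std::string text;
    std::vector<int> styles;
    std::size_t segStart = 0;
    explicit MockStyler(const std::string &t)
        : text(t), styles(t.size(), TOY_DEFAULT) {}
    char SafeGetCharAt(std::size_t pos) const {
        return pos < text.size() ? text[pos] : ' ';
    }
    void StartSegment(std::size_t pos) { segStart = pos; }
    void ColourTo(std::size_t pos, int state) {
        for (std::size_t i = segStart; i <= pos && i < styles.size(); i++)
            styles[i] = state;
        if (pos + 1 > segStart)
            segStart = pos + 1;
    }
};

// Shared scanner: position of the next '\n' at or after pos,
// or end - 1 when the range holds no further newline.
static unsigned int ScanToEndOfLine(const MockStyler &styler,
                                    unsigned int pos, unsigned int end) {
    while (pos + 1 < end && styler.SafeGetCharAt(pos) != '\n')
        pos++;
    return pos;
}

// State-based lexer: each pass of the loop consumes one whole range.
static void ColouriseToyDocByState(unsigned int startPos, int length,
                                   int initStyle, MockStyler &styler) {
    assert(length >= 0);
    unsigned int end = startPos + length;
    unsigned int i = startPos;
    int state = initStyle;
    styler.StartSegment(startPos);
    while (i < end) {
        // Choose the state from the first character of the range.
        if (state == TOY_DEFAULT && styler.SafeGetCharAt(i) == '#')
            state = TOY_COMMENT;
        unsigned int last = i;  // last position that belongs to this state
        if (state == TOY_COMMENT) {
            last = ScanToEndOfLine(styler, i, end);
        } else {
            // Default text extends up to, but not including, the next '#'.
            while (last + 1 < end && styler.SafeGetCharAt(last + 1) != '#')
                last++;
        }
        styler.ColourTo(last, state);
        i = last + 1;
        state = TOY_DEFAULT;  // re-examined at the top of the loop
    }
}
```

The comment scanner does not test for anything except the newline, which is the simplification the state-based approach offers; the cost is that each scanner must respect the end of the range itself.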
+However, the special cases handled in the main loop in the
+character-based approach would have to be handled by each state scanner,
+so both approaches have advantages. These special cases are discussed
+below.
+
+Special case: Lead characters
+
+Lead bytes are part of DBCS processing for languages such as Japanese
+using an encoding such as Shift-JIS. In these encodings, extended
+(16-bit) characters are encoded as a lead byte followed by a trail byte.
+
+Lead bytes are rarely of any lexical significance, normally only being
+allowed within strings and comments. In such contexts, lexers should
+ignore ch if styler.IsLeadByte(ch) returns TRUE.
+
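A sketch of why lead bytes are skipped inside strings. In Shift-JIS a trail byte may equal 0x5C, the ASCII backslash, so a scanner that ignores lead bytes can mistake half of a character for an escape. IsLeadByteSJIS and FindStringEnd are hypothetical helpers written for this example; the byte ranges used for lead bytes (0x81 to 0x9F and 0xE0 to 0xFC) are an assumption about Shift-JIS, and a real lexer would call styler.IsLeadByte(ch) instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed Shift-JIS lead-byte ranges; stands in for styler.IsLeadByte(ch).
static bool IsLeadByteSJIS(char ch) {
    unsigned char uch = static_cast<unsigned char>(ch);
    return (uch >= 0x81 && uch <= 0x9F) || (uch >= 0xE0 && uch <= 0xFC);
}

// Returns the index of the quote that closes the string opened at `open`,
// or std::string::npos when the string is unterminated.
static std::size_t FindStringEnd(const std::string &text, std::size_t open) {
    assert(open < text.size() && text[open] == '"');
    for (std::size_t i = open + 1; i < text.size(); i++) {
        char ch = text[i];
        if (IsLeadByteSJIS(ch)) {
            i++;        // skip the trail byte, whatever its value
            continue;
        }
        if (ch == '\\') {
            i++;        // skip the escaped character
            continue;
        }
        if (ch == '"')
            return i;
    }
    return std::string::npos;
}
```

For the bytes `"`, 0x83, 0x5C, `"` the closing quote is found at index 3; without the lead-byte test the 0x5C trail byte would be read as a backslash escaping that quote.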
+Note: UTF-8 is simpler than Shift-JIS, so no special handling is
+applied for it. All bytes of UTF-8 extended characters are >= 128, and
+none are lexically significant in programming languages which, so far,
+use only ASCII characters for operators, comment markers, etc.
+
+
+Special case: Folding
+
+Folding may be performed in the lexer function. It is better to use a
+separate folder function as that avoids some troublesome interaction
+between styling and folding. The folder function will be run after the
+lexer function if folding is enabled. The rest of this section explains
+how to perform folding within the lexer function.
+
+During initialization, lexers that support folding set
+
+ bool fold = styler.GetPropertyInt("fold");
+
+If folding is enabled in the editor, fold will be TRUE and the lexer
+should call:
+
+ styler.SetLevel(line, level);
+
+at the end of each line and just before exiting.
+
+The line parameter is simply the count of the number of newlines seen.
+Its initial value is styler.GetLine(startPos) and it is incremented
+(after calling styler.SetLevel) whenever a newline is seen.
+
+The level parameter is the desired indentation level in the low 12 bits,
+along with flag bits in the upper four bits. The indentation level
+depends on the language. For C++, it is incremented when the lexer sees
+a '{' and decremented when the lexer sees a '}' (outside of strings and
+comments, of course).
+
+The following flag bits, defined in Scintilla.h, may be set or cleared
+in the level parameter. The SC_FOLDLEVELWHITEFLAG flag is set if the
+lexer considers that the line contains nothing but whitespace. The
+SC_FOLDLEVELHEADERFLAG flag indicates that the line is a fold point.
+This normally means that the next line has a greater level than the
+present line. However, the lexer may have some other basis for
+determining a fold point. For example, a lexer might create a header
+line for the first line of a function definition rather than the last.
+
+The SC_FOLDLEVELNUMBERMASK mask denotes the level number in the low 12
+bits of the level parameter. This mask may be used to isolate either
+flags or level numbers.
+
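A small sketch of how the level value is packed and unpacked. The three constants are redefined locally so the fragment compiles on its own; their numeric values are assumed to match Scintilla.h and should be checked against the header in use. MakeLevel, LevelNumber and LevelFlags are hypothetical helper names.

```cpp
#include <cassert>

// Assumed to match the definitions in Scintilla.h.
const int SC_FOLDLEVELWHITEFLAG  = 0x1000;
const int SC_FOLDLEVELHEADERFLAG = 0x2000;
const int SC_FOLDLEVELNUMBERMASK = 0x0FFF;

// Combines a level number (low 12 bits) with the flag bits above it.
static int MakeLevel(int number, bool blankLine, bool header) {
    assert(number >= 0 && number <= SC_FOLDLEVELNUMBERMASK);
    int level = number;
    if (blankLine)
        level |= SC_FOLDLEVELWHITEFLAG;
    if (header)
        level |= SC_FOLDLEVELHEADERFLAG;
    return level;
}

// Isolates the level number by keeping only the masked bits.
static int LevelNumber(int level) {
    return level & SC_FOLDLEVELNUMBERMASK;
}

// Isolates the flags by clearing the masked bits.
static int LevelFlags(int level) {
    return level & ~SC_FOLDLEVELNUMBERMASK;
}
```

The same mask serves both purposes: applied directly it yields the number, and applied inverted it yields the flags.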
+For example, the C++ lexer contains the following code when a newline is
+seen:
+
+ if (fold) {
+ int lev = levelPrev;
+
+ // Set the "all whitespace" bit if the line is blank.
+ if (visChars == 0)
+ lev |= SC_FOLDLEVELWHITEFLAG;
+
+ // Set the "header" bit if needed.
+ if ((levelCurrent > levelPrev) && (visChars > 0))
+ lev |= SC_FOLDLEVELHEADERFLAG;
+ styler.SetLevel(lineCurrent, lev);
+
+ // reinitialize the folding vars describing the present line.
+ lineCurrent++;
+ visChars = 0; // Number of non-whitespace characters on the line.
+ levelPrev = levelCurrent;
+ }
+
+The following code appears in the C++ lexer just before exit:
+
+ // Fill in the real level of the next line, keeping the current flags
+ // as they will be filled in later.
+ if (fold) {
+ // Mask off the level number, leaving only the previous flags.
+ int flagsNext = styler.LevelAt(lineCurrent);
+ flagsNext &= ~SC_FOLDLEVELNUMBERMASK;
+ styler.SetLevel(lineCurrent, levelPrev | flagsNext);
+ }
+
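The two fragments above can be combined into a self-contained sketch that computes one level per line for brace-delimited text. ComputeFoldLevels is a hypothetical function written for this illustration: it ignores strings and comments, stores levels in a vector instead of calling styler.SetLevel, and simply writes levelPrev for the final line rather than merging existing flags. The flag values are assumed to match Scintilla.h.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed to match the definitions in Scintilla.h.
const int SC_FOLDLEVELWHITEFLAG  = 0x1000;
const int SC_FOLDLEVELHEADERFLAG = 0x2000;

// Returns the fold level of every line of text, plus one entry for the
// line that follows the last newline.
static std::vector<int> ComputeFoldLevels(const std::string &text,
                                          int startLevel) {
    assert(startLevel >= 0);
    std::vector<int> levels;
    int levelPrev = startLevel;    // level at the start of the present line
    int levelCurrent = levelPrev;  // level after the characters seen so far
    int visChars = 0;              // non-whitespace characters on the line
    for (std::size_t i = 0; i < text.size(); i++) {
        char ch = text[i];
        if (ch == '{')
            levelCurrent++;
        else if (ch == '}')
            levelCurrent--;
        if (ch == '\n') {
            int lev = levelPrev;
            if (visChars == 0)
                lev |= SC_FOLDLEVELWHITEFLAG;
            if (levelCurrent > levelPrev && visChars > 0)
                lev |= SC_FOLDLEVELHEADERFLAG;
            levels.push_back(lev);  // stands in for styler.SetLevel(line, lev)
            visChars = 0;
            levelPrev = levelCurrent;
        } else if (!std::isspace(static_cast<unsigned char>(ch))) {
            visChars++;
        }
    }
    // Level of the line after the last newline, set just before exit.
    levels.push_back(levelPrev);
    return levels;
}
```

For the text "void f() {", "  x;", "}" followed by a blank line, the first line becomes a header at level 0, the next two lines sit at level 1, and the blank line carries the whitespace flag at level 0.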
+
+Don't worry about performance
+
+The writer of a lexer may safely ignore performance considerations: the
+cost of redrawing the screen is several orders of magnitude greater than
+the cost of function calls, etc. Moreover, Scintilla performs all the
+important optimizations; Scintilla ensures that a lexer will be called
+only to recolor text that actually needs to be recolored. Finally, it
+is not necessary to avoid extra calls to styler.ColourTo: the styler
+object buffers calls to ColourTo to avoid multiple updates of the
+screen.
+
+Page contributed by Edward K. Ream
\ No newline at end of file