diff options
author | gregory guy <gregory-tde@laposte.net> | 2021-10-07 15:17:57 +0200 |
---|---|---|
committer | Slávek Banko <slavek.banko@axis.cz> | 2021-12-08 16:28:04 +0100 |
commit | bf076238d30b3e8523b5362f5698ea2e3660c5bf (patch) | |
tree | f9651b26933ef3054a482f07ae428fd5c0aec431 /doc/other/Lexer.txt | |
parent | 8081a13a660196b987f3e99f899f502f9bc947c7 (diff) | |
download | tqscintilla-bf076238d30b3e8523b5362f5698ea2e3660c5bf.tar.gz tqscintilla-bf076238d30b3e8523b5362f5698ea2e3660c5bf.zip |
Conversion to the cmake building system.
Signed-off-by: gregory guy <gregory-tde@laposte.net>
(cherry picked from commit a69b55c674b0528c00598bea54b7a661f4e49f27)
Diffstat (limited to 'doc/other/Lexer.txt')
-rw-r--r-- | doc/other/Lexer.txt | 226 |
1 files changed, 226 insertions, 0 deletions
diff --git a/doc/other/Lexer.txt b/doc/other/Lexer.txt new file mode 100644 index 0000000..9d4ab50 --- /dev/null +++ b/doc/other/Lexer.txt @@ -0,0 +1,226 @@ +How to write a scintilla lexer + +A lexer for a particular language determines how a specified range of +text shall be colored. Writing a lexer is relatively straightforward +because the lexer need only color given text. The harder job of +determining how much text actually needs to be colored is handled by +Scintilla itself, that is, the lexer's caller. + + +Parameters + +The lexer for language LLL has the following prototype: + + static void ColouriseLLLDoc ( + unsigned int startPos, int length, + int initStyle, + WordList *keywordlists[], + Accessor &styler); + +The styler parameter is an Accessor object. The lexer must use this +object to access the text to be colored. The lexer gets the character +at position i using styler.SafeGetCharAt(i); + +The startPos and length parameters indicate the range of text to be +recolored; the lexer must determine the proper color for all characters +in positions startPos through startPos+length. + +The initStyle parameter indicates the initial state, that is, the state +at the character before startPos. States also indicate the coloring to +be used for a particular range of text. + +Note: the character at StartPos is assumed to start a line, so if a +newline terminates the initStyle state the lexer should enter its +default state (or whatever state should follow initStyle). + +The keywordlists parameter specifies the keywords that the lexer must +recognize. A WordList class object contains methods that make simplify +the recognition of keywords. Present lexers use a helper function +called classifyWordLLL to recognize keywords. These functions show how +to use the keywordlists parameter to recognize keywords. This +documentation will not discuss keywords further. + + +The lexer code + +The task of a lexer can be summarized briefly: for each range r of +characters that are to be colored the same, the lexer should call + + styler.ColourTo(i, state) + +where i is the position of the last character of the range r. The lexer +should set the state variable to the coloring state of the character at +position i and continue until the entire text has been colored. + +Note 1: the styler (Accessor) object remembers the i parameter in the +previous calls to styler.ColourTo, so the single i parameter suffices to +indicate a range of characters. + +Note 2: As a side effect of calling styler.ColourTo(i,state), the +coloring states of all characters in the range are remembered so that +Scintilla may set the initStyle parameter correctly on future calls to +the +lexer. + + +Lexer organization + +There are at least two ways to organize the code of each lexer. Present +lexers use what might be called a "character-based" approach: the outer +loop iterates over characters, like this: + + lengthDoc = startPos + length ; + for (unsigned int i = startPos; i < lengthDoc; i++) { + chNext = styler.SafeGetCharAt(i + 1); + << handle special cases >> + switch(state) { + // Handlers examine only ch and chNext. + // Handlers call styler.ColorTo(i,state) if the state changes. + case state_1: << handle ch in state 1 >> + case state_2: << handle ch in state 2 >> + ... + case state_n: << handle ch in state n >> + } + chPrev = ch; + } + styler.ColourTo(lengthDoc - 1, state); + + +An alternative would be to use a "state-based" approach. The outer loop +would iterate over states, like this: + + lengthDoc = startPos+lenth ; + for ( unsigned int i = startPos ;; ) { + char ch = styler.SafeGetCharAt(i); + int new_state = 0 ; + switch ( state ) { + // scanners set new_state if they set the next state. + case state_1: << scan to the end of state 1 >> break ; + case state_2: << scan to the end of state 2 >> break ; + case default_state: + << scan to the next non-default state and set new_state >> + } + styler.ColourTo(i, state); + if ( i >= lengthDoc ) break ; + if ( ! new_state ) { + ch = styler.SafeGetCharAt(i); + << set state based on ch in the default state >> + } + } + styler.ColourTo(lengthDoc - 1, state); + +This approach might seem to be more natural. State scanners are simpler +than character scanners because less needs to be done. For example, +there is no need to test for the start of a C string inside the scanner +for a C comment. Also this way makes it natural to define routines that +could be used by more than one scanner; for example, a scanToEndOfLine +routine. + +However, the special cases handled in the main loop in the +character-based approach would have to be handled by each state scanner, +so both approaches have advantages. These special cases are discussed +below. + +Special case: Lead characters + +Lead bytes are part of DBCS processing for languages such as Japanese +using an encoding such as Shift-JIS. In these encodings, extended +(16-bit) characters are encoded as a lead byte followed by a trail byte. + +Lead bytes are rarely of any lexical significance, normally only being +allowed within strings and comments. In such contexts, lexers should +ignore ch if styler.IsLeadByte(ch) returns TRUE. + +Note: UTF-8 is simpler than Shift-JIS, so no special handling is +applied for it. All UTF-8 extended characters are >= 128 and none are +lexically significant in programming languages which, so far, use only +characters in ASCII for operators, comment markers, etc. + + +Special case: Folding + +Folding may be performed in the lexer function. It is better to use a +separate folder function as that avoids some troublesome interaction +between styling and folding. The folder function will be run after the +lexer function if folding is enabled. The rest of this section explains +how to perform folding within the lexer function. + +During initialization, lexers that support folding set + + bool fold = styler.GetPropertyInt("fold"); + +If folding is enabled in the editor, fold will be TRUE and the lexer +should call: + + styler.SetLevel(line, level); + +at the end of each line and just before exiting. + +The line parameter is simply the count of the number of newlines seen. +It's initial value is styler.GetLine(startPos) and it is incremented +(after calling styler.SetLevel) whenever a newline is seen. + +The level parameter is the desired indentation level in the low 12 bits, +along with flag bits in the upper four bits. The indentation level +depends on the language. For C++, it is incremented when the lexer sees +a '{' and decremented when the lexer sees a '}' (outside of strings and +comments, of course). + +The following flag bits, defined in Scintilla.h, may be set or cleared +in the flags parameter. The SC_FOLDLEVELWHITEFLAG flag is set if the +lexer considers that the line contains nothing but whitespace. The +SC_FOLDLEVELHEADERFLAG flag indicates that the line is a fold point. +This normally means that the next line has a greater level than present +line. However, the lexer may have some other basis for determining a +fold point. For example, a lexer might create a header line for the +first line of a function definition rather than the last. + +The SC_FOLDLEVELNUMBERMASK mask denotes the level number in the low 12 +bits of the level param. This mask may be used to isolate either flags +or level numbers. + +For example, the C++ lexer contains the following code when a newline is +seen: + + if (fold) { + int lev = levelPrev; + + // Set the "all whitespace" bit if the line is blank. + if (visChars == 0) + lev |= SC_FOLDLEVELWHITEFLAG; + + // Set the "header" bit if needed. + if ((levelCurrent > levelPrev) && (visChars > 0)) + lev |= SC_FOLDLEVELHEADERFLAG; + styler.SetLevel(lineCurrent, lev); + + // reinitialize the folding vars describing the present line. + lineCurrent++; + visChars = 0; // Number of non-whitespace characters on the line. + levelPrev = levelCurrent; + } + +The following code appears in the C++ lexer just before exit: + + // Fill in the real level of the next line, keeping the current flags + // as they will be filled in later. + if (fold) { + // Mask off the level number, leaving only the previous flags. + int flagsNext = styler.LevelAt(lineCurrent); + flagsNext &= ~SC_FOLDLEVELNUMBERMASK; + styler.SetLevel(lineCurrent, levelPrev | flagsNext); + } + + +Don't worry about performance + +The writer of a lexer may safely ignore performance considerations: the +cost of redrawing the screen is several orders of magnitude greater than +the cost of function calls, etc. Moreover, Scintilla performs all the +important optimizations; Scintilla ensures that a lexer will be called +only to recolor text that actually needs to be recolored. Finally, it +is not necessary to avoid extra calls to styler.ColourTo: the sytler +object buffers calls to ColourTo to avoid multiple updates of the +screen. + +Page contributed by Edward K. Ream
\ No newline at end of file |