diff options
Diffstat (limited to 'debian/uncrustify-trinity/uncrustify-trinity-0.78.1/documentation/theory.txt')
-rw-r--r-- | debian/uncrustify-trinity/uncrustify-trinity-0.78.1/documentation/theory.txt | 129 |
1 files changed, 129 insertions, 0 deletions
diff --git a/debian/uncrustify-trinity/uncrustify-trinity-0.78.1/documentation/theory.txt b/debian/uncrustify-trinity/uncrustify-trinity-0.78.1/documentation/theory.txt new file mode 100644 index 00000000..37ba9da1 --- /dev/null +++ b/debian/uncrustify-trinity/uncrustify-trinity-0.78.1/documentation/theory.txt @@ -0,0 +1,129 @@ + +------------------------------------------------------------------------------- +This document is too incomplete to be of much use. +Patches are welcome! + + +Theory of operation +------------------- + +Uncrustify goes through several steps to reformat code. +The first step, parsing, is the most complex and important. + + +Step 1 - Tokenize +----------------- +C code must be understood to some degree to be able to be properly indented. +The parsing step reads in a text buffer and breaks it into chunks and puts +those chunks in a list. + +When a chunk is parsed, the original column and line are recorded. + +These are the chunks that are parsed: + - punctuators + - numbers + - words (keywords, variables, etc) + - comments + - strings + - whitespace + - preprocessors + +See token_enum.h for a complete list. +See punctuators.cpp and keywords.cpp for examples of how they are used. + +In the code, chunk types are prefixed with 'CT_'. +The CT_WORD token is changed into a more specific token using the lookup table +in keywords.cpp + + +Step 2 - Tokenize Cleanup +------------------------- + +The second step is to change the token type for certain constructs that need +to be adjusted early on. +For example, the '<' token can be either a CT_COMPARE or CT_ANGLE_OPEN. +Both are handled very differently. +If a CT_WORD follows CT_ENUM/CT_STRUCT/CT_UNION, then it is marked as a CT_TYPE. +Basically, anything that doesn't depend on the nesting level can be done at this +stage. + + +Step 3 - Brace Cleanup +------------------------- + +This is possibly the most difficult step. +do/if/else/for/switch/while bodies are examined and virtual braces are added. +Brace parent types are set. +Statement start and expression starts are labeled. +And #ifdef constructs are handled. + +This step determines the levels (m_braceLevel, level and m_ppLevel). + +REVISIT: + The code in brace_cleanup.cpp needs to be reworked to take advantage of being + able to scan forward and backward. The original code was going to be merged + into tokenize.cpp, but that was WAY too complex. + + +Step 4 - Fix Symbols (combine.cpp) +---------------------------------- + +This step is no longer properly named. +In the original design, neighboring chunks were to be combined into longer +chunks. This proved to be a silly idea. But the name of the file stuck. + +This is where most of the interesting identification stuff goes on. +Colons type are detected, variables are marked, functions are labeled, etc. +Also, all the punctuators are classified. Ie, CT_MINUS become CT_NEG or CT_ARITH. + + - Types are marked. + - Functions are marked. + - Parenthesis and braces are marked where appropriate. + - finds and marks casts + - finds and marks variable definitions (for aligning) + - finds and marks assignments that may be aligned + - changes CT_INCDEC_AFTER to CT_INCDEC_BEFORE + - changes CT_STAR to either CT_PTR_TYPE, CT_DEREF or CT_ARITH + - changes CT_MINUS to either CT_NEG or CT_ARITH + - changes CT_PLUS and CT_ADDR to CT_ARITH, if needed + - other stuff? + + +Casts +----- +Casts are detected as follows: + - paren pair not part of if/for/etc nor part of a function + - contains only CT_QUALIFIER, CT_TYPE, '*', and no more than one CT_WORD + - is not followed by CT_ARITH + +Tough cases: +(foo) * bar; + +If uncertain about a cast like this: (foo_t), some simple rules are applied. +If the word ends in '_t', it is a cast, unless followed by '+'. +If the word is all caps (FOO), it is a cast. +If you use custom types (very likely) that aren't detected properly (unlikely), +the add them to the config file like so: (example Using C-Sharp types) +type UInt32 UInt16 UInt8 Byte +type Int32 Int16 Int8 + + +Step 6+ Everything else +------------------------- + +From this point on, many filters are run on the chunk list to change the +token columns. + +indent.cpp sets the left-most column. +align.cpp set the column for individual chunks. +space.cpp sets the spacing between chunks. +Others insert newlines, change token position, etc. + + +Last Step - Output +------------------------- + +At the final step the list is printed to the output. +Everything except comments are printed as-is. +Comments are reformatted in the output stage. + |