=-----------------------------------------------------------------------------= * * * Paren Stack * * * =-----------------------------------------------------------------------------= At the heart of this algorithm are two stacks. There is the Paren Stack (PS) and the Frame stack. The PS (pse in the code) keeps track of braces, parens, if/else/switch/do/while/etc items -- anything that is nestable. Complex statements go through some of these BS_ stages: BS_PAREN1 - paren on if/for/switch/while, etc BS_PAREN2 - paren on do{}while() BS_BRACE_DO - brace set on do{} BS_BRACE2 - brace on if/else/for/switch/while BS_ELSE - expecting 'else' after 'if' BS_ELSEIF - expecting 'if' after 'else' BS_WHILE - expecting 'while' after 'do' The file is processed one token at a time to support #if/#else/#endif preprocessors at any point. Take this simple if statement as an example: if ( x ) { x--; } The stack would look like so: The format is first the token processed and then the PSE stack as it appears AFTER the token is processed. 'if' [IF - PAREN1] '(' [IF - PAREN1] [SPAREN OPEN] 'x' [IF - PAREN1] [SPAREN OPEN] ')' [IF - BRACE2] <- note that the stage was changed on SPAREN_CLOSE '{' [IF - BRACE2] [BRACE OPEN] 'x' [IF - BRACE2] [BRACE OPEN] '--' [IF - BRACE2] [BRACE OPEN] ';' [IF - BRACE2] [BRACE OPEN] '}' [IF - ELSE] <- lack of else kills the ELSE, closes statement Virtual brace example: if ( x ) x--; else if (y) y--; else z++; 'if' [IF - PAREN1] '(' [IF - PAREN1] [SPAREN OPEN] 'x' [IF - PAREN1] [SPAREN OPEN] ')' [IF - BRACE2] 'x' [IF - BRACE2] [VBRACE OPEN] <- VBrace open inserted before because the token was not '{' '--' [IF - BRACE2] [VBRACE OPEN] ';' [IF - ELSE] <- Semicolon causes a VBrace close to be inserted after the semicolon 'else' [ELSE - ELSEIF] <- IF changed into ELSE, expect IF or BRACE 'x' [ELSE - BRACE2] [VBRACE OPEN] <- lack of '{' -> VBrace '++' [ELSE - BRACE2] [VBRACE OPEN] ';' <- VBrace close inserted after semicolon ELSE removed after statement close Nested virtual brace example: (EOF represents the end of the file) if ( x ) if (y) y--; else z++; EOF 'if' [IF - PAREN1] '(' [IF - PAREN1] [PAREN OPEN] 'x' [IF - PAREN1] [PAREN OPEN] ')' [IF - BRACE2] 'if' [IF - BRACE2] [VBRACE OPEN] [IF - PAREN1] <- VBrace on BRACE2, IF opened '(' [IF - BRACE2] [VBRACE OPEN] [IF - PAREN1] [SPAREN OPEN] 'y' [IF - BRACE2] [VBRACE OPEN] [IF - PAREN1] [SPAREN OPEN] ')' [IF - BRACE2] [VBRACE OPEN] [IF - BRACE2] 'y' [IF - BRACE2] [VBRACE OPEN] [IF - BRACE2] [VBRACE OPEN] '--' [IF - BRACE2] [VBRACE OPEN] [IF - BRACE2] [VBRACE OPEN] ';' [IF - BRACE2] [VBRACE OPEN] [IF - ELSE] 'else' [IF - BRACE2] [VBRACE OPEN] [ELSE - ELSEIF] 'z' [IF - BRACE2] [VBRACE OPEN] [ELSE - BRACE2] [VBRACE OPEN] '++' [IF - BRACE2] [VBRACE OPEN] [ELSE - BRACE2] [VBRACE OPEN] ';' [IF - BRACE2] [VBRACE OPEN] [ELSE - BRACE2] - step1 [IF - BRACE2] [VBRACE OPEN] - step2 [IF - ELSE] - step3 EOF -- this last semi is more complicated - first it terminates the VBRACE and then the else, which then, since it is the end of a statement, terminates the VBRACE. That bumps the IF stage to ELSE. The EOF kills that off (since it is not an else) Order of operation: 1) if TOS=VBRACE && PC=SEMI, insert VBRACE close, PC=>VBRACE close 2) if PC=VBRACE close or PC=BRACE close, and TOS is complex (if/else/etc) then advance complex stage. If statement ends, pop and advance Stages for each complex statement: if IF-PAREN1, IF-BRACE2, IF-ELSE if/else IF-PAREN1, IF-BRACE2, IF-ELSE, ELSE-ELSEIF, ELSE-BRACE2 if/else if/else IF-PAREN1, IF-BRACE2, IF-ELSE, ELSE-ELSEIF, IF-PAREN1, IF-BRACE2, IF-ELSE, ELSE-ELSEIF, ELSE-BRACE2 for FOR-PAREN1, FOR-BRACE2 while WHILE-PAREN1, WHILE-BRACE2 switch SWITCH-PAREN1, SWITCH-BRACE2 synchronized SYNCHRONIZED-PAREN1 do/while DO-BRACE_DO, DO-WHILE, WHILE-PAREN2 Another less-interesting example: { if (x) volatile { y++; } return y; } '{' [BRACE OPEN] 'if' [BRACE OPEN] [IF - PAREN1] '(' [BRACE OPEN] [IF - PAREN1] [PAREN OPEN] 'x' [BRACE OPEN] [IF - PAREN1] [PAREN OPEN] ')' [BRACE OPEN] [IF - BRACE2] 'volatile' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] '{' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN] 'y' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN] '++' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN] ';' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN] '}' [BRACE OPEN] [IF - ELSE] <- the brace close ends brace-open, volatile-brace2 and vbrace-open 'return' [BRACE OPEN] <- not else 'y' [BRACE OPEN] ';' [BRACE OPEN] '}' <- empties the stack =-----------------------------------------------------------------------------= * * * Parse Frames * * * =-----------------------------------------------------------------------------= The pse stack is kept on a frame stack. The frame stack is need for languages that support preprocessors (C, C++, C#) that can arbitrarily change code flow. It also isolates #define macros so that they are indented independently and do not affect the rest of the program. When an #if is hit, a copy of the current frame is push on the frame stack. When an #else/#elif is hit, a copy of the current stack is pushed under the #if frame and the original (pre-#if) frame is copied to the current frame. When #endif is hit, the top frame is popped. This has the following effects: - a simple #if / #endif does not affect program flow - #if / #else /#endif - continues from the #if clause When a #define is entered, the current frame is pushed and cleared. When a #define is exited, the frame is popped. Take this example, which isn't very exciting, as both the #if and #else parts end up with the same paren stack. This is the usual case. { foo(param1, #ifdef DEBUG "debug"); #else "release"); #endif } Right before the #ifdef, we have this for the paren stack: Top> [BRACE OPEN] [PAREN OPEN] The #ifdef pushes a copy of the current stack, so we have this: Top> [BRACE OPEN] [PAREN OPEN] [BRACE OPEN] [PAREN OPEN] The close paren after "debug" closes out the PAREN-OPEN on the top of the stack. Top> [BRACE OPEN] [BRACE OPEN] [PAREN OPEN] The #else swaps the top two frames. Top> [BRACE OPEN] [PAREN OPEN] [BRACE OPEN] Right after the #else, we hit another close paren after the "release". Top> [BRACE OPEN] [BRACE OPEN] At the #endif, the top of stack is thrown out, which restores us to the #if path. Top> [BRACE OPEN]