I have the following grammar:
SPACE : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
NAME_TAG : 'name';
IS_TAG : 'is';
START : 'START';
END : ('END START') => 'END START' ;
WORD : 'A'..'Z'+;
rule : START NAME_TAG IS_TAG WORD END;
and want to parse input like "START name is END END START". The problem here is the END token: the text 'END ' (WORD + SPACE) is misinterpreted. I thought the correct approach would be a syntactic predicate (the END token), but maybe I am wrong.
I'd not create tokens that consist of two (or more) WORDs separated by spaces. Why not tokenize 'END' as an END token and then do something like this:
rule : START NAME_TAG IS_TAG word END START;
word : WORD | END; // expand this rule, as you see fit
NAME_TAG : 'name';
IS_TAG : 'is';
START : 'START';
END : 'END';
WORD : 'A'..'Z'+;
SPACE : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
which would parse "START name is END END START" into a parse tree in which the first END is matched by the word rule.
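To see the effect of this approach, here is a plain-Java sketch (hand-rolled for illustration, not ANTLR-generated code) of the token stream this lexer produces. The point is that every 'END' becomes a plain END token, and the ambiguity is resolved in the parser by the word rule instead of in the lexer:

```java
import java.util.*;

class SeparateEndTokenSketch {
    // Classifies each whitespace-separated word the way the lexer rules above would.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String w : input.trim().split("\\s+")) {     // SPACE -> hidden channel
            switch (w) {
                case "name":  tokens.add("NAME_TAG"); break;
                case "is":    tokens.add("IS_TAG");   break;
                case "START": tokens.add("START");    break;
                case "END":   tokens.add("END");      break; // plain END token
                default:      tokens.add("WORD");            // 'A'..'Z'+
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The parser rule `word : WORD | END;` then accepts the first END as the word.
        System.out.println(tokenize("START name is END END START"));
        // [START, NAME_TAG, IS_TAG, END, END, START]
    }
}
```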
What you did wrong was not giving the lexer rule a way to recover when the predicate fails. Here's a proper use of a predicate:
rule : START NAME_TAG IS_TAG WORD END;
SPACE : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
NAME_TAG : 'name';
IS_TAG : 'is';
START : 'START';
WORD : ('END START') => 'END START' {$type = END;} // re-type the combined match as END
     | 'A'..'Z'+                                   // predicate failed: plain WORD
     ;

fragment END : ; // never matched on its own; only declares the END token type
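To make the recovery explicit, here is a plain-Java sketch (hypothetical names, not ANTLR-generated code) of what this lexer rule effectively does: first try the 'END START' lookahead, and when that predicate fails, fall back to consuming uppercase letters as a WORD:

```java
import java.util.*;

class PredicateLexerSketch {
    // Assumes well-formed input: keywords and uppercase words separated by whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isWhitespace(input.charAt(i))) { i++; continue; } // SPACE -> hidden
            if (input.startsWith("END START", i)) {        // predicate succeeds:
                tokens.add("END");                         // one token, $type = END
                i += "END START".length();
            } else if (input.startsWith("name", i)) {
                tokens.add("NAME_TAG"); i += 4;
            } else if (input.startsWith("is", i)) {
                tokens.add("IS_TAG");   i += 2;
            } else {                                       // predicate failed: recover
                int j = i;
                while (j < input.length()
                        && input.charAt(j) >= 'A' && input.charAt(j) <= 'Z') j++;
                String word = input.substring(i, j);       // 'A'..'Z'+
                tokens.add(word.equals("START") ? "START" : "WORD");
                i = j;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The first END is not followed by START, so it falls through to WORD;
        // the second END plus START is consumed as a single END token.
        System.out.println(tokenize("START name is END END START"));
        // [START, NAME_TAG, IS_TAG, WORD, END]
    }
}
```

That token stream is exactly what `rule : START NAME_TAG IS_TAG WORD END;` expects.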