Antlr greedy-option

(I edited my question based on the first comment of @Bart Kiers - thank you!)

I have the following grammar:

SPACE : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
START : 'START:';
STRING_LITERAL  : ('"' .* '"')+;
rule    :  START STRING_LITERAL;

and I want to parse input like: 'START: "abcd" START: "img src="test.jpg""' (string literals can appear inside string literals).
The grammar above does not work when a string literal contains another string literal: for the input 'START: "img src="test.jpg""' the lexer produces the tokens START('START:') STRING_LITERAL("img src="), followed by test.jpg.
Is there any way to define a grammar that handles this?

antlr
antlr3

2012-04-03 22:09
by user1286372

There are a couple of things wrong here:

you cannot use fragment rules inside parser rules. Your grammar will never create a START token;
a . char (DOT-char) inside a parser rule matches any token, while inside a lexer rule, it matches any character;
if you let .* match greedily (and you had defined a proper lexer rule that matches a string literal), the input START: "abcd" START: "img src="test.jpg"" would then have one large string in it: "abcd" START: "img src="test.jpg"" (the first and the last quote would be matched).
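
The greedy and non-greedy behaviour described in the last point can be reproduced outside ANTLR. Below is a minimal Python sketch that uses regular expressions as a stand-in for the string-literal lexer rule. It is only an analogy, not ANTLR's actual lexing algorithm, and the variable names are made up for illustration.

```python
import re

# The input from the question.
text = 'START: "abcd" START: "img src="test.jpg""'

# Greedy: .* runs on to the last quote in the input,
# so the first and the last quote are paired up.
greedy = re.findall(r'".*"', text)

# Non-greedy: .*? stops at the first closing quote it can find,
# so the second string is cut off at its first inner quote.
non_greedy = re.findall(r'".*?"', text)

print(greedy)
print(non_greedy)
```

Neither variant yields the two intended strings: the greedy match swallows the second START:, while the non-greedy match ends the second string at "img src=", which is the tokenization reported in the question.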

So, you cannot embed string literals inside string literals using the same quotes. The lexer is not able to determine if a quote is meant to close the string, or if it's the start of a (new) embedded string. You will need to change that in your grammar.

2012-04-04 06:57
by Bart Kiers

Thank you! I updated my original question to the one above. Unfortunately I have no idea how to fix the problem of string literals inside a string literal. How could I change the grammar? - Thank you in advance - user1286372 2012-04-04 09:07

@user1286372, my point was that you need to change your grammar so that it is not possible to have strings inside strings (at least, not without escaping the nested strings). In other words: it's not possible to support it. Why not define your outer quotes as single quotes? - Bart Kiers 2012-04-04 09:17
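
The two options mentioned in this comment (single quotes as the outer delimiter, or escaping the nested quotes) can be sketched as follows. This is a hedged illustration using Python regexes rather than ANTLR lexer rules. The pattern names, the helper function, and the choice of backslash as the escape character are assumptions that do not come from the thread.

```python
import re

# Option 1 (assumed syntax): outer quotes are single quotes,
# so double quotes inside the string need no special treatment.
SINGLE_QUOTED = re.compile(r"'[^']*'")

# Option 2 (assumed syntax): nested double quotes are escaped as \" ;
# a backslash followed by any character is consumed as one unit.
ESCAPED_DOUBLE = re.compile(r'"(?:\\.|[^"\\])*"')


def string_literals(pattern, text):
    """Return every string literal that pattern finds in text."""
    return pattern.findall(text)


single = string_literals(
    SINGLE_QUOTED, "START: 'abcd' START: 'img src=\"test.jpg\"'"
)
escaped = string_literals(
    ESCAPED_DOUBLE, r'START: "abcd" START: "img src=\"test.jpg\""'
)

print(single)
print(escaped)
```

In both cases the closing delimiter is unambiguous, so each START: is followed by exactly one complete string. In an ANTLR 3 lexer, the second option would presumably look something like STRING_LITERAL : '"' ('\\' . | ~('\\' | '"'))* '"'; but that rule is an untested sketch, not something taken from the thread.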

Yes, that would be a possible solution, but I had hoped it would be possible without any changes to the language. But thank you very much anyway! The information that it is not possible to support it helped me - user1286372 2012-04-04 10:15

You're welcome @user1286372 - Bart Kiers 2012-04-04 10:17

I have another question based on the solution (replacing the inner quotes with single quotes). My solution is to replace the inner quotes with single quotes before parsing starts. Is there an easy way to replace a string inside the ANTLRInputStream (or before it) with another string, using a regex? Or is there any other simple solution? - user1286372 2012-04-04 11:37

@user1286372, I don't quite understand. Perhaps you could create a new question, since it is not really related to this one - Bart Kiers 2012-04-04 11:44

Sorry for my short problem description. I started a new question -> http://stackoverflow.com/questions/10013170/efficiently-replacing-a-string-or-character-from-file-input-for-the-antlrinputs - user1286372 2012-04-04 14:28