where:
- a finite set N of nonterminal symbols;
- a finite set Σ of terminal symbols that is disjoint from N;
- a finite set P of production rules;
- a distinguished symbol S from set N that is the start symbol.
- type 0 - unrestricted grammars - include all formal grammars;
- type 1 - context-sensitive grammars - generate the context-sensitive languages;
- type 2 - context-free grammars - generate the context-free languages. Context free languages are the theoretical basis for the syntax of most programming languages;
- type 3 - regular grammars - generate the regular languages. Regular languages are commonly used to define search patterns and the lexical structure of programming languages.
In computer science, Extended Backus–Naur Form (EBNF) is a metasyntax notation used to express context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. It is an extension of the basic Backus–Naur Form (BNF) metasyntax notation.
Compiler of computer programming languages consists of:
- parsing:
- lexical analysis - produces tokens / lexemes;
- syntactic analysis - produces parse tree or abstract syntax tree (AST);
- optimizer (optional);
- code generator.
Sample:
- grammar:
S ::= Ax
A ::= a
A ::= b - input sequence:
ax - Top-down parsing:
S → Ax → ax - Bottom-up parsing:
ax → Ax → S
- LL parser (Left-to-right, Leftmost derivation)
- LR parser (Left-to-right, Rightmost derivation)
- SLR parser (Simple LR parser)
- LALR parser (lookahead LR parser)
I think that ANTLR is one of the best tools because:
- it has a lot of features
- it has nice GUI - ANTLRWorks
- there are a lot of sample grammars
- there is a book - The Definitive ANTLR Reference: Building Domain-Specific Languages (DSL)
Screens of ANTLR GUI: