Boson - grammar analyzer generator


Boson generates grammar analyzer code from productions written by the user.

Only Python 3 is supported.

How to install boson?

Use pip: pip install boson

This installs the latest release version.

Install from source code

Download the ZIP file, unzip it, and run the command: python3 install or sudo python3 install

Or clone it from GitHub:

git clone


Boson reads a file containing a set of productions. Each production has the form:

name : [derivation] = [grammar name](grammar tuple) | [derivation] = [grammar name](grammar tuple) | ... ;

By convention, the grammar must contain a non-terminal named "start".

The derivation looks like EBNF, with a few differences.

(grammar tuple) describes the AST node structure of the production and is optional.


name matches [_a-zA-Z][_a-zA-Z0-9]*

(grammar tuple) has the form ($$, $0, $1, ...)

$$, $0, $1, ... have special meanings:

  • $$: the number of this production.
  • $0, $1, $2, ...: the element of this production at that index (0-based).
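The naming rule and the reference syntax can be checked with regular expressions. The sketch below is only illustrative and is not taken from boson's source: it assumes the digit range in the name pattern is 0-9, and the helper names is_valid_name and is_reference are hypothetical.

```python
import re

# Pattern for production names; the digit range 0-9 is an assumption.
NAME_PATTERN = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

# Pattern for grammar tuple references: "$$", or "$" followed by an index.
REFERENCE_PATTERN = re.compile(r"\$(?:\$|[0-9]+)")


def is_valid_name(text):
    """Return True if the whole string is a valid production name."""
    return NAME_PATTERN.fullmatch(text) is not None


def is_reference(text):
    """Return True if the whole string is "$$" or "$<index>"."""
    return REFERENCE_PATTERN.fullmatch(text) is not None
```

fullmatch is used instead of match so that trailing characters such as "-" make the whole string invalid.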

You can unpack a non-terminal with *. Example:

list : packed text = ($0, $1);
packed : text text = ($0, $1);

This will parse '1' '2' '3' into an AST like (('1', '2'), '3')

list : packed text = (*$0, $1);
packed : text text = ($0, $1);

This will parse '1' '2' '3' into an AST like ('1', '2', '3')
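To make the difference concrete, here is a minimal Python sketch of how a grammar tuple could be evaluated against the children of a reduced production. It is not boson's generated code: build_node and the (index, unpack) template encoding are hypothetical and exist only for illustration.

```python
def build_node(template, children):
    """Build an AST tuple from a grammar tuple template.

    template is a list of (index, unpack) pairs: ($0, $1) becomes
    [(0, False), (1, False)] and (*$0, $1) becomes [(0, True), (1, False)].
    """
    node = []
    for index, unpack in template:
        child = children[index]
        if unpack:
            node.extend(child)   # "*$n": splice the child's elements in place
        else:
            node.append(child)   # "$n": keep the child as a single element
    return tuple(node)


# packed : text text = ($0, $1);
packed = build_node([(0, False), (1, False)], ["1", "2"])

# list : packed text = ($0, $1);
nested = build_node([(0, False), (1, False)], [packed, "3"])

# list : packed text = (*$0, $1);
flat = build_node([(0, True), (1, False)], [packed, "3"])
```

Here nested is (('1', '2'), '3') and flat is ('1', '2', '3'), matching the two results described above.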

You can write more complex grammar tuples:


function: func_name '(' arg (',' arg)* ')' = func_sign($0, $2, *$3($1));

This gives you arg (',' arg)* as a list like [arg, arg, arg, ...].

The func_sign before the grammar tuple is the grammar name, which is used for semantic analysis.
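One way to read func_sign($0, $2, *$3($1)) is sketched below in Python. This is an interpretation, not boson's generated code: it assumes that $3 holds one (',', arg) pair per repetition of the (',' arg)* group, that ($1) selects element 1 of each pair, and that the grammar name is stored as the first element of the node. The function build_func_sign is hypothetical.

```python
def build_func_sign(children):
    """Evaluate func_sign($0, $2, *$3($1)) for one reduced production."""
    func_name = children[0]   # $0
    first_arg = children[2]   # $2
    # $3($1): take element 1 (the arg) of every (',', arg) repetition.
    rest_args = [repetition[1] for repetition in children[3]]
    # The leading "*" splices the selected args into the node.
    return ("func_sign", func_name, first_arg, *rest_args)


# Children for the input f(a, b, c):
node = build_func_sign(["f", "(", "a", [(",", "b"), (",", "c")], ")"])
```

With these assumptions node is ('func_sign', 'f', 'a', 'b', 'c'), so the separators are dropped and all arguments sit at the same level.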

Simple example

An example of an arithmetic grammar:

start : E ;
E : E plus T | E minus T | T ;
T : T times F | T div F | F ;
F : F power D | D ;
D : bl E br | N ;
N : int | float ;

You can also write literal terminals like this:

start : E ;
E : E '+' T | E '-' T | T ;
T : T '*' F | T '/' F | F ;
F : F '^' D | D ;
D : '(' E ')' | N ;
N : int | float ;

NOTICE: An empty (null) production is written as "~" (without the quotation marks) or by writing nothing.

Commands can be added to the grammar file. They start with "%" and have the form:

%command argument1 argument2 ...

Available commands include start_symbol, grammar_analyzer_class_name, grammar_class_name, etc.

  • start_symbol: sets the start non-terminal.
  • grammar_analyzer_class_name: sets the grammar analyzer name.
  • grammar_class_name: sets the lexical analyzer name.
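The command form can be illustrated with a small Python helper that splits one command line into its name and arguments. This is a sketch under the assumption that arguments are separated by whitespace; parse_command is a hypothetical name and is not part of boson.

```python
def parse_command(line):
    """Split a "%command arg1 arg2 ..." line into (command, arguments)."""
    text = line.strip()
    if not text.startswith("%"):
        raise ValueError("not a command line: %r" % line)
    parts = text[1:].split()
    if not parts:
        raise ValueError("missing command name: %r" % line)
    return parts[0], parts[1:]


command, arguments = parse_command("%start_symbol E")
```

For the line above, command is 'start_symbol' and arguments is ['E'].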

See the example at "example/arithmetic_grammar.txt". "boson_grammar.boson" is the grammar file of boson itself.

Easy to use

Use the console command boson to generate code.

Type "boson" at the console to run it and see its usage.

The usage of boson is:

usage: boson [-h] [-o OUTPUT] [-a {slr,lr,lalr}] [-l {python3}] [-r] [-f]

Boson v0.9 - Grammar analyzer generator

positional arguments:
  grammar_file          Inpute grammar description file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output grammar analyzer code.
  -a {slr,lr,lalr}, --analyzer {slr,lr,lalr}
                        Analyzer type (default is LALR).
                          slr  - SLR (Simple LR)
                          lr   - LR (Canonical LR)
                          lalr - LALR (Look-Ahead LR)
  -l {python3}, --language {python3}
                        Generate code language (default is Python3).
                          python3 - Python3 code.
  -r, --report          Report conflict when create grammar analyzer.
  -f, --force           Force generate code when exist conflict.

Run boson -h to print this information.
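For readers who want to see how such an interface can be declared, the following sketch rebuilds the options with Python's argparse module. It is reconstructed from the help text only and is not boson's actual source; build_parser is a hypothetical name and the help strings are abbreviated.

```python
import argparse


def build_parser():
    """Declare a command line that mirrors the help text shown above."""
    parser = argparse.ArgumentParser(
        prog="boson",
        description="Boson v0.9 - Grammar analyzer generator",
    )
    parser.add_argument("grammar_file", help="Input grammar description file.")
    parser.add_argument("-o", "--output", help="Output grammar analyzer code.")
    parser.add_argument("-a", "--analyzer", choices=["slr", "lr", "lalr"],
                        default="lalr", help="Analyzer type (default is LALR).")
    parser.add_argument("-l", "--language", choices=["python3"],
                        default="python3", help="Generate code language.")
    parser.add_argument("-r", "--report", action="store_true",
                        help="Report conflict when create grammar analyzer.")
    parser.add_argument("-f", "--force", action="store_true",
                        help="Force generate code when exist conflict.")
    return parser


args = build_parser().parse_args(["example/arithmetic_grammar.txt", "-a", "slr"])
```

After parsing, args.analyzer is 'slr', while args.language falls back to its default 'python3' and both flags are False.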


Use "example/arithmetic_grammar.txt" to generate "example/":

> boson example/arithmetic_grammar.txt -a slr -l python3 -o example/

Use boson as library

You can use boson as a library: just "import boson".

To do so, install the Python package and import the boson module.


    >>> from boson.bs_grammar_analysis import bs_token_list, bs_grammar_analysis
    >>> from import bs_lalr_generate_table
    >>> from boson.bs_code_generator import bs_generate_python3_code
    >>> token_list = bs_token_list("example/arithmetic_grammar.txt")
    >>> grammar_package = bs_grammar_analysis(token_list)
    >>> lalr_analyzer_table = bs_lalr_generate_table(grammar_package.sentence_set)
    >>> code = bs_generate_python3_code(lalr_analyzer_table, grammar_package)
    >>> print(code)

Parse grammar file

The boson.bs_grammar_analysis module contains the functions used to parse a grammar file.

The bs_token_list() function reads a grammar file and returns a token list.

token_list = bs_token_list(grammar_file)

The bs_grammar_analysis() function uses the token list to generate the production sentence set, the reduce code and other content, returned as a GrammarPackage instance.

grammar_package = bs_grammar_analysis(token_list)

The GrammarPackage class looks like this:

class GrammarPackage:
    def __init__(self):
        self.command_list = None
        self.sentence_set = None
        self.grammar_tuple_map = None
        self.none_grammar_tuple_set = None
        self.literal_map = None
        self.literal_reverse_map = None
        self.sentence_grammar_map = None
        self.naive_sentence = None

The final result of this step is the grammar_package, which holds the data used by the generators and the analyzer.

Generate analyzer table

The analyzer table is generated from the sentence set. Import the generator module for the analyzer type you want.

For example, to generate an SLR analyzer, import the SLR generator module.

The bs_slr_generate_table() function uses the sentence set to create all the tables used by the analyzer (DFA). It returns an AnalyzerTable, which contains these tables.

analyzer_table = bs_slr_generate_table(sentence_set)

Generate analyzer code

All code generators use the analyzer table to generate the grammar analyzer code.

Import the boson.bs_code_generator module, which includes the code generators for all languages.

Each generator is a function. So far only the Python3 code generator is finished; its definition is:

Python3 code generator definition

bs_generate_python3_code(analyzer_table: AnalyzerTable, grammar_package: GrammarPackage)

analyzer_table holds the grammar analyzer tables output by the SLR, LR, or LALR generator.

grammar_package is a GrammarPackage instance containing the reduce code, literals and other data.


Boson can generate SLR, LR and LALR analyzers.

All analyzers take sentence_set as input, which is created by bs_grammar_analysis().

Each has a DFA generate function and a table generate function.

The DFA generate function returns state_list and state_transfer.

The table generate function returns an AnalyzerTable:

class AnalyzerTable:
    def __init__(self):
        self.terminal_index = None
        self.action_table = None
        self.goto_table = None
        self.reduce_symbol_sum = None
        self.reduce_to_non_terminal_index = None
        self.sentence_list = None
        self.conflict_list = None
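The sketch below shows how a table-driven LR loop can consume fields with these names. It is not boson's generated analyzer: the table is hand-written for a toy grammar, the ("shift"/"reduce"/"accept", number) action encoding and the list-of-rows layout are assumptions, and parse is a hypothetical function.

```python
from types import SimpleNamespace

# Hand-written SLR tables for a toy grammar (hypothetical encoding):
#   0: start -> E
#   1: E -> E '+' 'n'
#   2: E -> 'n'
S, R, ACC = "shift", "reduce", "accept"
table = SimpleNamespace(
    terminal_index={"n": 0, "+": 1, "$": 2},
    action_table=[
        [(S, 2), None, None],         # state 0
        [None, (S, 3), (ACC, None)],  # state 1
        [None, (R, 2), (R, 2)],       # state 2
        [(S, 4), None, None],         # state 3
        [None, (R, 1), (R, 1)],       # state 4
    ],
    # Columns: 0 = start, 1 = E.
    goto_table=[[None, 1], [None, None], [None, None], [None, None], [None, None]],
    reduce_symbol_sum=[1, 3, 1],             # right-hand side length per production
    reduce_to_non_terminal_index=[0, 1, 1],  # left-hand side column per production
)


def parse(tokens, table):
    """Run the LR loop and return the productions reduced, in order."""
    stack = [0]                      # state stack, starting in state 0
    stream = list(tokens) + ["$"]    # "$" marks the end of input
    position = 0
    reductions = []
    while True:
        column = table.terminal_index.get(stream[position])
        action = None if column is None else table.action_table[stack[-1]][column]
        if action is None:
            raise SyntaxError("unexpected token %r at position %d"
                              % (stream[position], position))
        kind, argument = action
        if kind == S:
            stack.append(argument)
            position += 1
        elif kind == R:
            count = table.reduce_symbol_sum[argument]
            del stack[len(stack) - count:]   # pop one state per reduced symbol
            non_terminal = table.reduce_to_non_terminal_index[argument]
            stack.append(table.goto_table[stack[-1]][non_terminal])
            reductions.append(argument)
        else:
            return reductions


order = parse(["n", "+", "n"], table)
```

For the input n + n the loop reduces production 2 first and then production 1, so order is [2, 1]. A real analyzer would build AST nodes at each reduction instead of recording production numbers.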

SLR - Simple LR

SLR in boson means SLR(1). It is fast and simple, but it recognizes only a limited class of grammars.

The bs_slr_generate_dfa(sentence_set) function generates the SLR DFA.

The bs_slr_generate_table(sentence_set) function generates the SLR table.
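SLR(1) decides where a reduction is allowed by looking at the FOLLOW set of the production's left-hand side. As an illustration, the sketch below computes FOLLOW sets for the arithmetic grammar from the simple example. It is a textbook-style computation, not boson's implementation; it assumes the grammar has no empty productions, and the dictionary encoding and function names are hypothetical.

```python
GRAMMAR = {
    "start": [["E"]],
    "E": [["E", "plus", "T"], ["E", "minus", "T"], ["T"]],
    "T": [["T", "times", "F"], ["T", "div", "F"], ["F"]],
    "F": [["F", "power", "D"], ["D"]],
    "D": [["bl", "E", "br"], ["N"]],
    "N": [["int"], ["float"]],
}


def first_sets(grammar):
    """FIRST sets for a grammar without empty productions."""
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            for symbols in alternatives:
                head = symbols[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[name]:
                    first[name] |= new
                    changed = True
    return first


def follow_sets(grammar, start="start"):
    """FOLLOW sets; "$" stands for the end of input."""
    first = first_sets(grammar)
    follow = {name: set() for name in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            for symbols in alternatives:
                for i, symbol in enumerate(symbols):
                    if symbol not in grammar:
                        continue  # terminals have no FOLLOW set
                    if i + 1 < len(symbols):
                        nxt = symbols[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[name]  # symbol ends the production
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR)
```

For this grammar FOLLOW["E"] contains "$", "plus", "minus" and "br", so an SLR table reduces to E only when one of those terminals comes next.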

LR - canonical LR

LR in boson means LR(1). It is slower than SLR but recognizes more grammars, and its table is bigger than the SLR table.


bs_lr_generate_dfa(sentence_set) function can generate LR DFA.

bs_lr_generate_table(sentence_set) function can generate LR table.

LALR - Look-Ahead LR

LALR in boson means LALR(1). It is as slow as LR and recognizes more grammars than SLR, but fewer than LR. Its distinguishing feature is that its table is the same size as the SLR table.

The bs_lalr_generate_dfa(sentence_set) function generates the LALR DFA.

The bs_lalr_generate_table(sentence_set) function generates the LALR table.
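A common textbook way to obtain LALR(1) states is to build the LR(1) states and merge those that share the same core, that is, the same items once lookaheads are ignored. The sketch below shows only that merging step. Whether boson builds its LALR DFA this way is not stated here, and the (production, dot, lookahead) item encoding and the name merge_by_core are hypothetical.

```python
def merge_by_core(lr1_states):
    """Merge LR(1) item sets whose cores (production, dot) are identical."""
    merged = {}
    for state in lr1_states:
        core = frozenset((production, dot) for production, dot, _ in state)
        # States with the same core pool their lookaheads.
        merged.setdefault(core, set()).update(state)
    return [frozenset(items) for items in merged.values()]


# Items are (production number, dot position, lookahead terminal).
lr1_states = [
    frozenset({(3, 1, "c"), (3, 1, "d")}),  # production 3 complete, lookahead c or d
    frozenset({(3, 1, "$")}),               # same core, different lookahead
    frozenset({(1, 1, "$")}),               # different core, stays separate
]
lalr_states = merge_by_core(lr1_states)
```

The first two states collapse into one state carrying the lookaheads c, d and $, which is why the merged automaton has as many states as the SLR one. A complete construction would also remap the transitions to the merged states.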


Author: ict


