https://www.gnu.org/software/bison/manual/bison.html
参考:
https://blog.csdn.net/weixin_44705391/article/details/115555161
https://zhuanlan.zhihu.com/p/52326306
https://zhuanlan.zhihu.com/p/120812270
https://zhuanlan.zhihu.com/p/89479111
说明:
Unix Lex/YACC 发展为 Linux FLex/Bison
Lex是1975年由Mike Lesk和当时尚在AT&T实习的Eric Schmidt共同完成的(Schmidt做的更多),是一个词法分析器的生成程序,可以单独使用也可以与Johnson的yacc协同工作。lex很有名气,但是无奈效率太低加上有bug。大概在1987年,Lawrence Berkeley实验室的Vern Paxson用C重新写了Lex,并命名为FLex(the Fast Lexical Analyzer Generator),基于伯克利许可证。flex现在是SourceForge的一个项目,依然基于伯克利许可,
[Flex](https://github.com/westes/flex "Flex") 是起初unix版lex的free (but non-GNU) implementation,用于c/c ++ 的词法扫描生成器。(注意:Schmidt曾是google的CEO)
bison的前身是yacc。yacc是由贝尔实验室的S.C.Johnson基于Knuth大神的LR语法分析理论,于1975~1978年写成。大约1985年,UC Berkeley 的研究生Bob Corbett使用改进的内部算法实现了伯克利yacc,来自FSF的Richard Stallman改写了伯克利yacc并将其用于GNU项目,添加了很多特性,形成了今天的GNU Bison。bison现在作为FSF的项目被维护,基于GNU公共许可证发布,[Bison](http://www.gnu.org/software/bison/manual/)是兼容yacc的free的语法生成器。
早期Unix的Lex/YACC,发展为FLex/Bison,新版本的程序是向上兼容的(即兼容老版本),现chang用Flex和Bison。
flex-bison工作原理
使用的角度,Flex和Bison是Linux下用来生成词法分析器和语法分析器两个程序的工具,可以处理结构化输入,一般结合使用来处理复杂的文件解析工作。
flex文件是定义pattern(哪是黄豆,哪是绿豆...),通过flex处理(词法分析)将输出切分成一段一段的token(将输入的豆子一个个摘出来),从而执行不同的action(黄豆就磨豆浆(action),绿豆去做绿豆糕(action))...
flex 生成的tokens可以喂给Bison处理(更简便易调试),当然也可以不喂给bison而直接自己处理就得了(如喜下例)。但是使用bison可以更方便的处理复杂的逻辑,编写简单,调试方便。编码实战:字符统计器
//本例中仅仅使用flex以及少量手写代码(main中),来完成字符串统计功能。 Yolandas-MacBook-Pro:flex-bison liuyuanyuan$ cat fb1-1.l /* 统计输入字符串*/ %{ int chars = 0; int words = 0; int lines =0; %} %% [a-zA-Z]+ { words++; chars += strlen(yytext); } \n { chars++; lines++;} . { chars++; } %% int main(int args, char **argv) { yylex(); printf("lines=%6d words=%6d chars=%6d\n", lines, words, chars); return 0; } //Linux 系统上用 -lfl 选项编译, Mac 的编译选项是 -ll Yolandas-MacBook-Pro:flex-bison liuyuanyuan$ gcc -ll lex.yy.c -o fb1-1 Yolandas-MacBook-Pro:flex-bison liuyuanyuan$ ./fb1-1 hello this is yolanda bye. lines= 3 words= 5 chars= 28
1 flex(fast lex, scanner)文件内容结构(*.l, 分3部分)
/* P1: declarations(定义段) */ %{ %} %% /* P2: translation rules(规则段) */ %% /* P3: auxiliary functions(用户辅助程序段,c函数)*/
定义段 包括文字块、定义、内部声明等。
C语言的头文件、函数和变量的声明等一般就放在%{…%}之间,这一部分的内容会被直接复制到生成.c文件的开头部分。包含%option选项
%option noyywrap /* 定义段中包含了option选项*/ %{ #include "cal.tab.h" extern int yylval; %}
规则段 %%...%%之间部分,为一系列匹配模式(正则表达式)和动作(C代码)。
当flex扫描程序运行时,它把输入与规则段的模式进行匹配,每次发现一个匹配(被匹配的输入称为标记(token))时就执行与那种模式相关的C代码。
pattern(正则表达式) { action(c代码) } example: [0-9]+ {yylval = atoi(yytest); return NUMBER;}
用户辅助程序段 为C代码,会被原样复制到c文件中,一般这里定义一些辅助函数等。
int terror(chr *s) { printf("%s\n", s); return 0; }
2 bison(yacc, parser)文件内容结构(*.y , 分3部分,%%分隔)
/*P1: declarations 定义段*/ %{ %} %% /*P2: grammar rules 规则段(rule-action)*/ A: a1 { 语义动作1 } | a2 { 语义动作2 } | … | an { 语义动作n } | b //没有{…},则使用缺省的语义动作 ; //产生式结束标记 //语义动作一般是在产生式右部分析完,归约动作进行前执行。 A→ a1 | a2 | … | an | b %% /* P3: supporting C routines 用户辅助程序段(C函数) */
定义段 可以分为两部分:
1. %{ 和%}之间的部分,C语言编写的,包括头文件include、宏定义、全局变量定义、函数声明等;
2. 对文法的终结符和非终结符做一些相关声明。
常用的Bison标记声明有:%token %union %start %type %left %right %nonassoc等。
%token 定义文法中使用了哪些终结符。定义形式: %token TOKEN1 TOKEN2
终结符一般全大写;(如 TOKEN1 TOKEN2)
一行可定义多个终结符,空格分隔;
%left、%right、%nonassoc 也是定义文法中使用了哪些终结符。定义形式与%token类似。
先定义的优先级低,最后定义的优先级最高,同时定义的优先级想通过。
%left表示左结合,%right表示右结合;
%nonassoc 表示不可结合(即它定义的终结符不能连续出现。例如<,如果文法中不允许出现形如a<b<c的句子,则<就是不可结合的)
%left AA BB
%nonassoc CC
%right DD
表示优先级关系为:AA=BB<CC<DD,表示结核性为:AA\BB左结合, DD右结合, CC不可结合。
注意:也可以于%prec来结合使用来改变token的优先级和结合性 例如: :'-' expr %prec NEG { $$ = -$2; };
%union 声明语法符号的语义值类型的集合,
bison中每个符号包括记号和非终结符都有一个不同数据类型的语义值,并使用yylval变量在移进和归约中传递这些值,YYSTYPE(宏定义)为yylval的类型,默认为int;
我们使用%union来重新定义符号的类型
%union
{
int iValue; /* integer value */
char sIndex; /* symbol table index */
nodeType *nPtr; /* node pointer */
};
%type 定义非终结符其属性值的类型。
%type sym1 sym2
%type <sindex>sym3
%start 指定文法的开始的文法符号(非终结符),是最终需要规约而成的符号。
定义形式为:%start startsym , 其中startsym为文法的开始符号。
如果不使用%start定义文法开始符号,则默认在第二部分规则段中定义的第一条生产式规则的左部非终结符为开始符号。规则段 由rule(语法规则)和action(包括C代码)组成。
bison规则基本是BNF,做了一点简化以易于输入。
规则中目标或非终端符放在左边,后跟一个冒号:然后是产生式的右边,之后是对应的动作(用{}包含)。
%% program: program expr '\n' { printf("%d\n", $2); } ; expr: INTEGER {$$ = $1; } | expr '+' expr { $$ = $1 + $3; } | expr '-' expr { $$ = $1 - $3} ; %%
注意:$1表示右边的第一个标记的值,$2表示右边的第二个标记的值,依次类推。$$表示归约后的值。
用户辅助程序段 为C代码,会被原样复制到c文件中,这里一般自定义一些函数。
3 flex-bison 代码协作方式
flex/bison编码实践:编写简单计算器
1 macOS下flex/bison安装
brew install flex brew install bison # macos下flex/bison安装简单方便,另需安装gcc用于编译c语言程序。 brew install gcc
2 flex词法文件:calc.l
%option noyywrap %{ /* * 一个简单计算器的Lex词法文件 */ void yyerror(char*); #include "calc.tab.h" %} %% /* a-z为变量 */ [a-z] { yylval = *yytext - 'a'; return VARIABLE; } /* 整数 */ [0-9]+ { yylval = atoi(yytext); return INTEGER; } /* 运算符 */ [-+()=/*\n] {return *yytext;} /* 空白被忽略 */ [ \t] ; /* 其他字符都是非法的 */ . yyerror("invalid input"); %%
3 bison语法文件:calc.y
%token INTEGER VARIABLE %left '+' '-' %left '*' '/' %{ #include <stdio.h> void yyerror(char*); int yylex(void); int sym[26]; %} %% program: program statement '\n' | ; statement: expr {printf("%d\n", $1);} |VARIABLE '=' expr {sym[$1] = $3;} ; expr: INTEGER |VARIABLE{$$ = sym[$1];} |expr '+' expr {$$ = $1 + $3;} |expr '-' expr {$$ = $1 - $3;} |expr '*' expr {$$ = $1 * $3;} |expr '/' expr {$$ = $1 / $3;} |'('expr')' {$$ = $2;} ; %% void yyerror(char* s) { fprintf(stderr, "%s\n", s); } int main(void) { printf("A simple calculator.\n"); yyparse(); return 0; }
4 Makefile 文件:
all: clean calc calc: calc.l calc.y bison -d calc.y flex calc.l cc -o $@ calc.tab.c lex.yy.c -lm clean: rm -f calc \ lex.yy.c calc.tab.c calc.tab.h
5 执行
(可以直接执行make也就是执行Makefile中定义的内容,也可以单步执行)
Yolandas-MBP:calcSimple liuyuanyuan$ ls -l total 32 -rw-r--r--@ 1 liuyuanyuan staff 157 8 13 09:53 Makefile -rw-r--r--@ 1 liuyuanyuan staff 507 8 13 10:01 calc.l -rw-r--r--@ 1 liuyuanyuan staff 731 8 13 23:04 calc.y Yolandas-MBP:calcSimple liuyuanyuan$ make rm -f calc \ lex.yy.c calc.tab.c calc.tab.h bison -d calc.y flex calc.l cc -o calc calc.tab.c lex.yy.c -lm Yolandas-MBP:calcSimple liuyuanyuan$ ls -l total 272 -rw-r--r--@ 1 liuyuanyuan staff 157 8 13 09:53 Makefile -rwxr-xr-x 1 liuyuanyuan staff 24600 8 14 12:00 calc -rw-r--r--@ 1 liuyuanyuan staff 507 8 13 10:01 calc.l -rw-r--r-- 1 liuyuanyuan staff 42011 8 14 12:00 calc.tab.c -rw-r--r-- 1 liuyuanyuan staff 2143 8 14 12:00 calc.tab.h -rw-r--r--@ 1 liuyuanyuan staff 731 8 13 23:04 calc.y -rw-r--r-- 1 liuyuanyuan staff 44590 8 14 12:00 lex.yy.c Yolandas-MBP:calcSimple liuyuanyuan$ ./calc A simple calculator. 1+2 3
Bison
This manual (10 September 2021) is for GNU Bison (version 3.8.1), the GNU parser generator.
Copyright © 1988–1993, 1995, 1998–2015, 2018–2021 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover texts being “A GNU Manual,” and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License.”
(a) The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual. Buying copies from the FSF supports it in developing GNU and promoting software freedom.”
Table of Contents
- Introduction
- Conditions for Using Bison
- GNU GENERAL PUBLIC LICENSE
- 1 The Concepts of Bison
- 1.1 Languages and Context-Free Grammars
- 1.2 From Formal Rules to Bison Input
- 1.3 Semantic Values
- 1.4 Semantic Actions
- 1.5 Writing GLR Parsers
- 1.5.1 Using GLR on Unambiguous Grammars
- 1.5.2 Using GLR to Resolve Ambiguities
- 1.5.3 GLR Semantic Actions
- 1.5.3.1 Deferred semantic actions
- 1.5.3.2 YYERROR
- 1.5.3.3 Restrictions on semantic values and locations
- 1.5.4 Controlling a Parse with Arbitrary Predicates
- 1.6 Locations
- 1.7 Bison Output: the Parser Implementation File
- 1.8 Stages in Using Bison
- 1.9 The Overall Layout of a Bison Grammar
- 2 Examples
- 2.1 Reverse Polish Notation Calculator
- 2.1.1 Declarations for rpcalc
- 2.1.2 Grammar Rules for rpcalc
- 2.1.2.1 Explanation of input
- 2.1.2.2 Explanation of line
- 2.1.2.3 Explanation of exp
- 2.1.3 The rpcalc Lexical Analyzer
- 2.1.4 The Controlling Function
- 2.1.5 The Error Reporting Routine
- 2.1.6 Running Bison to Make the Parser
- 2.1.7 Compiling the Parser Implementation File
- 2.2 Infix Notation Calculator: calc
- 2.3 Simple Error Recovery
- 2.4 Location Tracking Calculator: ltcalc
- 2.4.1 Declarations for ltcalc
- 2.4.2 Grammar Rules for ltcalc
- 2.4.3 The ltcalc Lexical Analyzer.
- 2.5 Multi-Function Calculator: mfcalc
- 2.5.1 Declarations for mfcalc
- 2.5.2 Grammar Rules for mfcalc
- 2.5.3 The mfcalc Symbol Table
- 2.5.4 The mfcalc Lexer
- 2.5.5 The mfcalc Main
- 2.6 Exercises
- 2.1 Reverse Polish Notation Calculator
- 3 Bison Grammar Files
- 3.1 Outline of a Bison Grammar
- 3.1.1 The prologue
- 3.1.2 Prologue Alternatives
- 3.1.3 The Bison Declarations Section
- 3.1.4 The Grammar Rules Section
- 3.1.5 The epilogue
- 3.2 Symbols, Terminal and Nonterminal
- 3.3 Grammar Rules
- 3.3.1 Syntax of Grammar Rules
- 3.3.2 Empty Rules
- 3.3.3 Recursive Rules
- 3.4 Defining Language Semantics
- 3.4.1 Data Types of Semantic Values
- 3.4.2 More Than One Value Type
- 3.4.3 Generating the Semantic Value Type
- 3.4.4 The Union Declaration
- 3.4.5 Providing a Structured Semantic Value Type
- 3.4.6 Actions
- 3.4.7 Data Types of Values in Actions
- 3.4.8 Actions in Midrule
- 3.4.8.1 Using Midrule Actions
- 3.4.8.2 Typed Midrule Actions
- 3.4.8.3 Midrule Action Translation
- 3.4.8.4 Conflicts due to Midrule Actions
- 3.5 Tracking Locations
- 3.5.1 Data Type of Locations
- 3.5.2 Actions and Locations
- 3.5.3 Printing Locations
- 3.5.4 Default Action for Locations
- 3.6 Named References
- 3.7 Bison Declarations
- 3.7.1 Require a Version of Bison
- 3.7.2 Token Kind Names
- 3.7.3 Operator Precedence
- 3.7.4 Nonterminal Symbols
- 3.7.5 Syntax of Symbol Declarations
- 3.7.6 Performing Actions before Parsing
- 3.7.7 Freeing Discarded Symbols
- 3.7.8 Printing Semantic Values
- 3.7.9 Suppressing Conflict Warnings
- 3.7.10 The Start-Symbol
- 3.7.11 A Pure (Reentrant) Parser
- 3.7.12 A Push Parser
- 3.7.13 Bison Declaration Summary
- 3.7.14 %define Summary
- 3.7.15 %code Summary
- 3.8 Multiple Parsers in the Same Program
- 3.1 Outline of a Bison Grammar
- 4 Parser C-Language Interface
- 4.1 The Parser Function yyparse
- 4.2 Push Parser Interface
- 4.3 The Lexical Analyzer Function yylex
- 4.3.1 Calling Convention for yylex
- 4.3.2 Special Tokens
- 4.3.3 Finding Tokens by String Literals
- 4.3.4 Semantic Values of Tokens
- 4.3.5 Textual Locations of Tokens
- 4.3.6 Calling Conventions for Pure Parsers
- 4.4 Error Reporting
- 4.4.1 The Error Reporting Function yyerror
- 4.4.2 The Syntax Error Reporting Function yyreport_syntax_error
- 4.5 Special Features for Use in Actions
- 4.6 Parser Internationalization
- 4.6.1 Enabling Internationalization
- 4.6.2 Token Internationalization
- 5 The Bison Parser Algorithm
- 5.1 Lookahead Tokens
- 5.2 Shift/Reduce Conflicts
- 5.3 Operator Precedence
- 5.3.1 When Precedence is Needed
- 5.3.2 Specifying Operator Precedence
- 5.3.3 Specifying Precedence Only
- 5.3.4 Precedence Examples
- 5.3.5 How Precedence Works
- 5.3.6 Using Precedence For Non Operators
- 5.4 Context-Dependent Precedence
- 5.5 Parser States
- 5.6 Reduce/Reduce Conflicts
- 5.7 Mysterious Conflicts
- 5.8 Tuning LR
- 5.8.1 LR Table Construction
- 5.8.2 Default Reductions
- 5.8.3 LAC
- 5.8.4 Unreachable States
- 5.9 Generalized LR (GLR) Parsing
- 5.10 Memory Management, and How to Avoid Memory Exhaustion
- 6 Error Recovery
- 7 Handling Context Dependencies
- 7.1 Semantic Info in Token Kinds
- 7.2 Lexical Tie-ins
- 7.3 Lexical Tie-ins and Error Recovery
- 8 Debugging Your Parser
- 8.1 Generation of Counterexamples
- 8.2 Understanding Your Parser
- 8.3 Visualizing Your Parser
- 8.4 Visualizing your parser in multiple formats
- 8.5 Tracing Your Parser
- 8.5.1 Enabling Traces
- 8.5.2 Enabling Debug Traces for mfcalc
- 9 Invoking Bison
- 9.1 Bison Options
- 9.1.1 Operation Modes
- 9.1.2 Diagnostics
- 9.1.3 Tuning the Parser
- 9.1.4 Output Files
- 9.2 Option Cross Key
- 9.3 Yacc Library
- 9.1 Bison Options
- 10 Parsers Written In Other Languages
- 10.1 C++ Parsers
- 10.1.1 A Simple C++ Example
- 10.1.2 C++ Bison Interface
- 10.1.3 C++ Parser Interface
- 10.1.4 C++ Semantic Values
- 10.1.4.1 C++ Unions
- 10.1.4.2 C++ Variants
- 10.1.5 C++ Location Values
- 10.1.5.1 C++ position
- 10.1.5.2 C++ location
- 10.1.5.3 Exposing the Location Classes
- 10.1.5.4 User Defined Location Type
- 10.1.6 C++ Parser Context
- 10.1.7 C++ Scanner Interface
- 10.1.7.1 Split Symbols
- 10.1.7.2 Complete Symbols
- 10.1.8 A Complete C++ Example
- 10.1.8.1 Calc++ — C++ Calculator
- 10.1.8.2 Calc++ Parsing Driver
- 10.1.8.3 Calc++ Parser
- 10.1.8.4 Calc++ Scanner
- 10.1.8.5 Calc++ Top Level
- 10.2 D Parsers
- 10.2.1 D Bison Interface
- 10.2.2 D Semantic Values
- 10.2.3 D Location Values
- 10.2.4 D Parser Interface
- 10.2.5 D Parser Context Interface
- 10.2.6 D Scanner Interface
- 10.2.7 Special Features for Use in D Actions
- 10.2.8 D Push Parser Interface
- 10.2.9 D Complete Symbols
- 10.3 Java Parsers
- 10.3.1 Java Bison Interface
- 10.3.2 Java Semantic Values
- 10.3.3 Java Location Values
- 10.3.4 Java Parser Interface
- 10.3.5 Java Parser Context Interface
- 10.3.6 Java Scanner Interface
- 10.3.7 Special Features for Use in Java Actions
- 10.3.8 Java Push Parser Interface
- 10.3.9 Differences between C/C++ and Java Grammars
- 10.3.10 Java Declarations Summary
- 10.1 C++ Parsers
- 11 A Brief History of the Greater Ungulates
- 11.1 The ancestral Yacc
- 11.2 yacchack
- 11.3 Berkeley Yacc
- 11.4 Bison
- 11.5 Other Ungulates
- 12 Bison Version Compatibility: Best Practices
- 13 Frequently Asked Questions
- 13.1 Memory Exhausted
- 13.2 How Can I Reset the Parser
- 13.3 Strings are Destroyed
- 13.4 Implementing Gotos/Loops
- 13.5 Multiple start-symbols
- 13.6 Secure? Conform?
- 13.7 Enabling Relocatability
- 13.8 I can’t build Bison
- 13.9 Where can I find help?
- 13.10 Bug Reports
- 13.11 More Languages
- 13.12 Beta Testing
- 13.13 Mailing Lists
- Appendix A Bison Symbols
- Appendix B Glossary
- Appendix C GNU Free Documentation License
- Bibliography
- Index of Terms
Next: Conditions for Using Bison, Previous: Bison, Up: Bison [Contents][Index]