Python實作SLR(1)語法分析器
實驗課前一天晚上肝了個SLR語法分析器,當時還發朋友圈語法分析器和我晚上總得走一個,從第二天狀態來看,應該是我們倆一起走了(笑
- 撰寫的時間比較倉促,所以代碼有些地方實作不是很好,存在一些問題,以后有時間的話再來修補一下,比如在對兩個專案規范族進行比較時效率比較低,first集和follow集中對連續多個非終結符推到ε的情況下可能會有bug,但在我的文法定義中特意繞開了ε,勉強能用,
- 為了方便代碼閱讀,加入了許多注釋后的列印陳述句,將這些列印陳述句取消注釋后運行,能夠告訴你當前這段代碼所做的事,
- 在撰寫程序中,盡量對模塊進行封裝,把邏輯和文法定義分開,做到文法可以方便地修改,但時間過于倉促可能還是有點小問題,當文法修改時,修改
getCol函式(該函式將終結符和非終結符映射到action和goto表中相應的列),initProduction函式(該函式定義了文法產生式(拓廣文法),在本文中有28個產生式),source(輸入單詞序列),varset(非終結符集合),terminalset(終結符集合)
SLR(1)分析流程
- 輸入文法
- 求first集
- 求follow集
- 構造LR(0)專案集DFA
- 構造Action和Goto
- 按照Action和Goto進行分析
1.主要資料結構定義和基礎函式:
基礎函式
-
isVariable函式判斷是不是非終結符
-
isTerminal函式判斷是不是終結
-
transf(production_set, var)函式 production_set為一個LR(0)專案,嘗試通過var(終結符或非終結符)進行轉移
-
isSameStatus(status1, status2)函式:判斷status1和status2是不是兩個相同的LR(0)專案
-
isInPointset(production_set, pointset):#用來檢驗production_set是不是已經存在的point ,如果存在就把point回傳(生成DFA時用到)
資料結構
- 產生式采用類來存盤,left和right分別為list,number‘為產生式編號
- GraphPoint存盤DFA轉移,transfer為有向邊集合,集合中的一個元素由var(終結符或非終結符),和另一個GraphPoint組成
class Production:
def __init__(self, left, right, number):
self.left = left
self.right = right
self.number = number
class GraphPoint:
def __init__(self, begin_production, id):
self.status = begin_production
self.transfer = []
self.id = id
def add_transfer(self, var, graphPoint):
self.transfer.append([var, graphPoint])
2.文法定義
1.分析目標代碼:int lexicalanalysis(){ float a; int b; a=1.1; b=2; while(b<100){ b=b+1; a=a+3;}; if(a>5) {b=b-1;} else {b=b+1;}}
2.語法分析器輸入為目標代碼的詞法分析器輸出的單詞序列
source = [[5, "int", " 關鍵字"], [1, "lexicalanalysis", " 識別符號"], [13, "(", " 左括號"], [14, ")", " 右括號"], [20, "{", " 左大括號"],
[4, "float", " 關鍵字"], [1, "a", " 識別符號"], [15, ";", " 分號"], [5, "int", " 關鍵字"], [1, "b", " 識別符號"],
[15, ";", " 分號"], [1, "a", " 識別符號"], [12, "=", " 賦值號"], [3, "1.1", " 浮點數"], [15, ";", " 分號"], [1, "b", " 識別符號"],
[12, "=", " 賦值號"], [2, "2", " 整數"], [15, ";", " 分號"], [8, "while", " 關鍵字"], [13, "(", " 左括號"],
[1, "b", " 識別符號"], [17, "<", " 小于號"], [2, "100", " 整數"], [14, ")", " 右括號"], [20, "{", " 左大括號"],
[1, "b", " 識別符號"], [12, "=", " 賦值號"], [1, "b", " 識別符號"], [9, "+", " 加 號"], [2, "1", " 整數"], [15, ";", " 分號"],
[1, "a", " 識別符號"], [12, "=", " 賦值號"], [1, "a", " 識別符號"], [9, "+", " 加號"], [2, "3", " 整數"], [15, ";", " 分號"],
[21, "}", " 右大括號"], [15, ";", " 分號"], [6, "if", " 關鍵字"], [13, "(", " 左括號"], [1, "a", " 識別符號"],
[16, ">", " 大于號"], [2, "5", " 整數"], [14, ")", " 右括號"], [20, "{", " 左大括號"], [1, "b", " 識別符號"],
[12, "=", " 賦值號"], [1, "b", " 識別符號"], [10, "-", " 減號"], [2, "1", " 整數"], [15, ";", " 分號"], [21, "}", " 右大括號"],
[7, "else", " 關鍵字"], [20, "{", " 左大括號"], [1, "b", " 識別符號"], [12, "=", " 賦值號"], [1, "b", " 識別符號"],
[9, "+", " 加號"], [2, "1", " 整數"], [15, ";", " 分號"], [21, "}", " 右大括號"], [21, "}", " 右大括號"]]
3.文法定義:拓廣文法共有28個產生式,0號產生式為保證分析器只有一個接受狀態,而拓廣的產生式,
def initProduction():
production_list = []
production = Production(["A1"], ["A"], 0)
production_list.append(production)
production = Production(["A"], ["E", "I", "(", ")", "{", "D", "}"], 1)
production_list.append(production)
production = Production(["E"], ["int"], 2)
production_list.append(production)
production = Production(["E"], ["float"], 3)
production_list.append(production)
production = Production(["D"], ["D", ";", "B"], 4)
production_list.append(production)
production = Production(["B"], ["F"], 5)
production_list.append(production)
production = Production(["B"], ["G"], 6)
production_list.append(production)
production = Production(["B"], ["M"], 7)
production_list.append(production)
production = Production(["F"], ["E", "I"], 8)
production_list.append(production)
production = Production(["G"], ["I", "=", "P"], 9)
production_list.append(production)
production = Production(["P"], ["K"], 10)
production_list.append(production)
production = Production(["P"], ["K", "+", "P"], 11)
production_list.append(production)
production = Production(["P"], ["K", "-", "P"], 12)
production_list.append(production)
production = Production(["I"], ["id"], 13)
production_list.append(production)
production = Production(["K"], ["I"], 14)
production_list.append(production)
production = Production(["K"], ["number"], 15)
production_list.append(production)
production = Production(["K"], ["floating"], 16)
production_list.append(production)
production = Production(["M"], ["while", "(", "T", ")", "{", "D", ";", "}"], 18)
production_list.append(production)
production = Production(["N"], ["if", "(", "T", ")", "{", "D",";", "}", "else", "{", "D", ";","}"], 19)
production_list.append(production)
production = Production(["T"], ["K", "L", "K"], 20)
production_list.append(production)
production = Production(["L"], [">"], 21)
production_list.append(production)
production = Production(["L"], ["<"], 22)
production_list.append(production)
production = Production(["L"], [">="], 23)
production_list.append(production)
production = Production(["L"], ["<="], 24)
production_list.append(production)
production = Production(["L"], ["=="], 25)
production_list.append(production)
production = Production(["D"], ["B"], 26)
production_list.append(production)
production = Production(["B"], ["N"], 27)
production_list.append(production)
return production_list
3.求First集

根據此演算法即可求解first集,第8,9步可以采用遞回的方式進行求解,
def getFirst(production_list, varset, terminalset):
first_dic = {}
# 用來標記first集是否計算完畢,防止重復計算浪費時間
done = {}
for var in varset:
first_dic[var] = set()
done[var] = 0
# 所有終結符的first集是他自身
for var in terminalset:
first_dic[var] = {var}
done[var] = 1
# print("初始化后的done",done)
# print("初始化的first_dic",first_dic)
for var in varset:
if done[var] == 0:
# print("計算",var)
getFirstForVar(var, first_dic, varset, terminalset, done)
# print("計算完畢",var)
# print("此時的done", done)
# print("此時的first_dic", first_dic)
else:
pass
return first_dic
def getFirstForVar(var, first_dic, varset, terminalset, done):
# 已經推導過直接結束
if done[var] == 1:
# print("我已經推導過了吼")
return
# 對非終結符求first集合,先看右邊第一個元素為終結符
for production in production_list:
if var in production.left:
if isTerminal(production.right[0], terminalset):
first_dic[var].add(production.right[0])
# 用null表示空字符
if production.right[0] == "null":
# print("出現右側為空")
first_dic[var].add("null")
# 右邊第一個元素為非終結符
for production in production_list:
if var in production.left:
if isVariable(production.right[0], varset):
if var == production.right[0]:
continue
if done[production.right[0]] == 0:
getFirstForVar(production.right[0], first_dic, varset, terminalset, done)
if "null" in first_dic[production.right[0]]:
first_dic[production.right[0]].remove("null")
first_dic[var] = first_dic[var] | first_dic[production.right[0]]
# print("將 ",production.right[0],"的集合 ",first_dic[production.right[0]],"并入",var,"的集合中",first_dic[var],"中","得到",)
if isVariable(production.right[0], varset) and len(production.right) > 1:
index = 1
count = 1
while isVariable(production.right[index], varset):
index = index + 1
count = count + 1
if index >= len(production.right):
break
i = 0
while i < count:
getFirstForVar(production.right[i], first_dic, varset, terminalset, done)
if "null" in first_dic[production.right[i]]:
getFirstForVar(production.right[i + 1], first_dic, varset, terminalset, done)
first_dic[var] = first_dic[var] | first_dic[production.right[i + 1]]
else:
break
i = i + 1
# 完成后置為1
done[var] = 1
4.求解follow集
通過使用非終結符的follow集,提高識別能力,是SLR(1)分析的核心思想,
只有當專案集包含 A→α· ,則action[i,x]= rj,x屬于FOLLOW(A),j為產生式 A→α的編號,通過這種方式可以解決一部分的移進和歸約沖突,
ps:代碼中有坑,如果文法中出現了刁鉆ε,比如幾個非終結符連續推為空,會產生bug,時間原因以及我的文法定義中并不存在ε就沒有解決這個問題,
def getFollow(varset, terminalset, first_dic, production_list):
follow_dic = {}
done = {}
for var in varset:
follow_dic[var] = set()
done[var] = 0
follow_dic["A1"].add("#")
# for var in terminalset:
# follow_dic[var]=set()
# done[var] = 0
for var in follow_dic:
getFollowForVar(var, varset, terminalset, first_dic, production_list, follow_dic, done)
return follow_dic
def getFollowForVar(var, varset, terminalset, first_dic, production_list, follow_dic, done):
if done[var] == 1:
return
for production in production_list:
if var in production.right:
##index這里在某些極端情況下有bug,比如多次出現var,index只會回傳最左側的
if production.right.index(var) != len(production.right) - 1:
follow_dic[var] = first_dic[production.right[production.right.index(var) + 1]] | follow_dic[var]
# 沒有考慮右邊有非終結符但是為null的情況
if production.right[len(production.right) - 1] == var:
if var != production.left[0]:
# print(var, "吸納", production.left[0])
getFollowForVar(production.left[0], varset, terminalset, first_dic, production_list, follow_dic,
done)
follow_dic[var] = follow_dic[var] | follow_dic[production.left[0]]
done[var] = 1
5.構建LR(0)專案集DFA
1.首先先定義一個CLOSURE函式,它將會對集合中的產生式狀態進行不斷地擴充,并最終形成一個專案集閉包
def CLOSURE(varset, terminalset, production_set, production_list):
演算法:

2.構建DFA,函式定義
def generatingGraph(begin_production_set, varset, terminalset, production_list):
我們首先使用0號產生式來形成初始LR(0)專案集,產生初始節點(即開頭資料結構中的類),并將它放到一個集合中,每次從集合中取一個節點,來用 每一個var屬于(V | T)嘗試進行轉移,轉移成功后將這條有向邊存入該節點地transfer中,每次轉移后的專案集判斷是否是新專案集,如果是新專案集,則將新專案集放入集合中,當集合為空演算法停止,
# 生成狀態轉移圖
def generatingGraph(begin_production_set, varset, terminalset, production_list):
global id
CLOSURE(varset, terminalset, begin_production_set, production_list)
beginPoint = GraphPoint(begin_production_set, id)
id = id + 1
# print("從這個狀態開始!")
# print(beginPoint.id)
# for onepro in beginPoint.status:
# print(onepro.number, " ", onepro.left, "->", onepro.right, " ")
pointset = [beginPoint]
set = varset | terminalset
stack = [beginPoint]
while len(stack) != 0:
currentPoint = stack.pop()
######
# print("該點被彈出,進行轉移!")
# print(currentPoint.id)
# for onepro in currentPoint.status:
# print(onepro.number, " ", onepro.left, "->", onepro.right, " ")
#####
for var in set:
# print("嘗試用",var,"進行轉移")
result = transf(currentPoint.status, var)
if len(result) == 0:
# print(var,"轉移失敗!")
continue
else:
# print(var,"可轉移!")
# print("將使用result進行轉移!")
# for onepro in result:
# print(onepro.number, " ", onepro.left, "->", onepro.right, " ")
# 求出轉移后的閉包
CLOSURE(varset, terminalset, result, production_list)
nextpoint = isInPointset(result, pointset)
if nextpoint is None:
# print(var,"轉移為新狀態:")
# 新節點壓入尋找堆疊和點集合中,舊節點不能壓入
nextpoint = GraphPoint(result, id)
id = id + 1
pointset.append(nextpoint)
stack.append(nextpoint)
# print(nextpoint.id)
# for onepro in nextpoint.status:
# print(onepro.number, " ", onepro.left, "->", onepro.right, " ")
currentPoint.add_transfer(var, nextpoint)
# print("生成一個新狀態")
# for onepro in result:
# print(onepro.number," ",onepro.left,"->",onepro.right," ")
return pointset
# 形成閉包
def CLOSURE(varset, terminalset, production_set=[], production_list=[]):
sizebefore = len(production_list)
sizeafter = -1
# 用來測驗是不是已經形成閉包,避免進入死回圈
flag = 0
for production_in_set in production_set:
if production_in_set.right.index(".") != len(production_in_set.right) - 1:
if isVariable(production_in_set.right[production_in_set.right.index(".") + 1], varset):
flag = 1
if flag == 0:
return
while sizeafter != sizebefore:
for production_in_set in production_set:
# 點在最右側就不可能轉移
if (production_in_set.right.index(".") == len(production_in_set.right) - 1):
continue
i = production_in_set.right.index(".") + 1;
# print(i," length",len(production_in_set.right))
if isTerminal(production_in_set.right[i], terminalset):
continue;
templist = []
for x in production_list:
# print(i,len(production_in_set.right))
if x.left[0] == production_in_set.right[i]:
y = copy.deepcopy(x)
y.right.insert(0, ".")
flag = 0
for one in production_set:
rightflag = 0;
if len(one.right) != len(y.right):
rightflag = 1
else:
for j in range(0, len(y.right)):
if one.right[j] != y.right[j]:
rightflag = 1
if one.left[0] == y.left[0] and rightflag == 0:
flag = 1
if flag == 0:
templist.append(y)
sizebefore = len(production_set)
production_set.extend(templist)
sizeafter = len(production_set)
6.構造Action和Goto表
演算法:

演算法中的(1)(2)思想類似于計算機網路中的洪泛控制,將初始節點放入一個集合中,從集合中取一個節點,從一個節點走出它的所有有向邊,并將這個節點標記為已經走過,將到達所有的之前沒有走過的節點放入集合中,如此以往,直到集合為空,代碼中的一些列印出錯的陳述句為檢測是否存在沖突的陳述句,由于撰寫時間限制原因,大多數的沖突可被測出,但少部分沖突仍然不可見(天坑),
演算法(3)(4)通過遍歷專案集中的產生式狀態即可判斷,
#Cell為Action中的一個元素,do表示動作,which表示數字,如轉移的狀態或采用歸約的產生式序號,done為是否已經走過,類似于洪泛控制的作用
class Cell:
def __init__(self):
self.do = -1
self.which = -1
self.done = 0
def initActionAndGoto(pointset, varset, terminalset, begin, follow_dic):
Action = [[Cell() for i in range(len(terminalset))] for j in range(len(pointset))]
Goto = [[-1 for i in range(len(varset))] for j in range(len(pointset))]
for point in pointset:
# 轉移狀態
for tran in point.transfer:
if isVariable(tran[0], varset):
if Goto[point.id][getCol(tran[0])] != -1:
print("出錯404")
Goto[point.id][getCol(tran[0])] = tran[1].id
else:
if Action[point.id][getCol(tran[0])].done == 1:
print("出錯403")
Action[point.id][getCol(tran[0])].done = 1
Action[point.id][getCol(tran[0])].do = "S"
Action[point.id][getCol(tran[0])].which = tran[1].id
for production in point.status:
if production.right.index(".") == len(production.right) - 1 and production.left[0] == begin:
if Action[point.id][getCol("#")].done == 1:
print("出錯415")
Action[point.id][getCol("#")].do = "acc"
Action[point.id][getCol("#")].done = 1
if production.right.index(".") == len(production.right) - 1 and production.left[0] != begin:
# 在follow集中才可歸約
for terminal in terminalset:
if terminal in follow_dic[production.left[0]]:
# 沖突檢測
if Action[point.id][getCol(terminal)].done == 1:
for xx in point.status:
print(xx.number, " ", xx.left, "->", xx.right)
print("Action表", point.id, "行", getCol(terminal), "列沖突")
print("原本", Action[point.id][getCol(terminal)].do, Action[point.id][getCol(terminal)].which)
print("現在", "R", production.number)
print("出錯416")
Action[point.id][getCol(terminal)].do = "R"
Action[point.id][getCol(terminal)].done = 1
# 采用該產生式歸約
Action[point.id][getCol(terminal)].which = production.number
return Action, Goto
7.根據Action和Goto進行語法分析
演算法思想:
開始時句型前綴堆疊和狀態站分別壓入#和0狀態,
回圈:
如果表中為si,則將緩沖區第一個元素壓入句型前綴堆疊,并將 i(狀態)壓入狀態堆疊
如果表中為ri , 則采用第i個運算式進行歸約,彈出的元素個數為i個運算式的右側的元素個數,之后根據堆疊頂狀態和歸約得到的非終結符查看GOTO表,查找當前狀態,并將當前狀態和規約得到的非終結符分別入堆疊,
如果表中為error!,恭喜出錯,去找bug吧(也有可能是你的輸入不符合當前文法,文法沖突也會導致這種情況),
如果表中為acc,恭喜成功,
# SLR分析開始
def SLR(Action, Goto, source, production_list):
source.append([0, "#", "結束符"])
statusstack = [0]
sentence_stack = ["#"]
print(source)
while 1:
print("*****************************************")
print("緩沖區剩余元素", source)
terminal = source.pop(0)
print("狀態堆疊", statusstack)
print("句型堆疊", sentence_stack)
# 移進
if Action[statusstack[len(statusstack) - 1]][terminal[0]].do == "S":
print("動作: 移入操作,從緩沖區中讀取",terminal[1],"元素進行移入,并根據Action壓入",Action[statusstack[len(statusstack) - 1]][terminal[0]].which,"狀態")
statusstack.append(Action[statusstack[len(statusstack) - 1]][terminal[0]].which)
sentence_stack.append(terminal[1])
elif Action[statusstack[len(statusstack) - 1]][terminal[0]].do == "R":
# 歸約
# 記錄歸約產生式
r_production = 0
for production in production_list:
if production.number == Action[statusstack[len(statusstack) - 1]][terminal[0]].which:
r_production = production
for i in range(len(r_production.right)):
statusstack.pop()
sentence_stack.pop()
statusstack.append(Goto[statusstack[len(statusstack) - 1]][getCol(r_production.left[0])])
print("動作: 歸約操作,根據Action表利用第",r_production.number,"個產生式歸約")
sentence_stack.append(r_production.left[0])
source.insert(0, terminal)
elif Action[statusstack[len(statusstack) - 1]][terminal[0]].do == "acc":
print("!!!!!!!!!!語意分析完成!!!!!!!!!!!!!!")
break;
else:
print("error 462!");
8.運行與測驗
source = [[5, "int", " 關鍵字"], [1, "lexicalanalysis", " 識別符號"], [13, "(", " 左括號"], [14, ")", " 右括號"], [20, "{", " 左大括號"],
[4, "float", " 關鍵字"], [1, "a", " 識別符號"], [15, ";", " 分號"], [5, "int", " 關鍵字"], [1, "b", " 識別符號"],
[15, ";", " 分號"], [1, "a", " 識別符號"], [12, "=", " 賦值號"], [3, "1.1", " 浮點數"], [15, ";", " 分號"], [1, "b", " 識別符號"],
[12, "=", " 賦值號"], [2, "2", " 整數"], [15, ";", " 分號"], [8, "while", " 關鍵字"], [13, "(", " 左括號"],
[1, "b", " 識別符號"], [17, "<", " 小于號"], [2, "100", " 整數"], [14, ")", " 右括號"], [20, "{", " 左大括號"],
[1, "b", " 識別符號"], [12, "=", " 賦值號"], [1, "b", " 識別符號"], [9, "+", " 加 號"], [2, "1", " 整數"], [15, ";", " 分號"],
[1, "a", " 識別符號"], [12, "=", " 賦值號"], [1, "a", " 識別符號"], [9, "+", " 加號"], [2, "3", " 整數"], [15, ";", " 分號"],
[21, "}", " 右大括號"], [15, ";", " 分號"], [6, "if", " 關鍵字"], [13, "(", " 左括號"], [1, "a", " 識別符號"],
[16, ">", " 大于號"], [2, "5", " 整數"], [14, ")", " 右括號"], [20, "{", " 左大括號"], [1, "b", " 識別符號"],
[12, "=", " 賦值號"], [1, "b", " 識別符號"], [10, "-", " 減號"], [2, "1", " 整數"], [15, ";", " 分號"], [21, "}", " 右大括號"],
[7, "else", " 關鍵字"], [20, "{", " 左大括號"], [1, "b", " 識別符號"], [12, "=", " 賦值號"], [1, "b", " 識別符號"],
[9, "+", " 加號"], [2, "1", " 整數"], [15, ";", " 分號"], [21, "}", " 右大括號"], [21, "}", " 右大括號"]]
id = 0
varset = {"A1", "A", "E", "I", "D", "F", "G", "M", "P", "K", "T", "L", "B","N"}
terminalset = {"(", ")", "{", "}", ";", "int", "float", "number", "floating", "while", "if", "else", ">", "<", ">=",
"<=", "==", "=", "#", "+", "-", "id"}
production_list = initProduction()
first_dic = getFirst(production_list, varset, terminalset)
# for x in first_dic:
# print(x," : ",first_dic[x])
follow_dic = getFollow(varset, terminalset, first_dic, production_list)
# print("follow:")
# for x in follow_dic:
# print(x, ":", follow_dic[x])
production = Production(["A1"], [".", "A"], 0)
production_set = [production]
# for x in production_set:
# print(x.number," ",x.left,"->",x.right," ")
pointset = generatingGraph(production_set, varset, terminalset, production_list)
begin = "A1"
Action, Goto = initActionAndGoto(pointset, varset, terminalset, begin, follow_dic)
print("**********************GOTO***********************************")
for i in range(len(Goto)):
print(i, end=" ")
for j in range(len(Goto[i])):
print("%3d" % Goto[i][j], end=" ")
print("")
print("**********************Action***********************************")
for i in range(len(Action)):
print("%2d" % i, end=": ")
for j in range(len(Action[i])):
if (Action[i][j].done == 0):
print("error!", end=" ")
else:
print("%3s" % Action[i][j].do, "%2d" % Action[i][j].which, end=" ")
print("")
SLR(Action, Goto, source, production_list)
結果:
GOTO表(區域):(60*14)

Action表(區域):60*22

規約程序(區域):共142次
開始:

結束:

完整代碼編譯原理結課會放在博客里
2020.11.12
轉載請註明出處,本文鏈接:https://www.uj5u.com/qianduan/214228.html
標籤:其他
