This question already has answers here: Coursera course - Introduction to Data Science in Python, Assignment 1 (2 answers). Closed yesterday.
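I wrote a function that uses a regular expression to extract specific parts of a txt file. The code works, but I want to get dictionaries as the output instead, and the result should have a length of 979: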
import re

def logs():
    with open("C:/Users/ASUS/Desktop/logdata.txt", "r") as file:
        logdata = file.read()
    pattern = r'''
    (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})  # host name
    \s \S \s
    (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s \[  # user_name
    (?P<time>([^[]+))\]\s "  # time
    (?P<request>[^"]+)"  # request
    '''
    for item in re.finditer(pattern, logdata, re.VERBOSE):
        print(item.groupdict())
The function should take text like this:
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
and capture host, user_name, and so on, like this:
{"host":"146.204.224.152",
"user_name":"feest6811",
"time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"}
How can I do this?
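For reference, `re.VERBOSE` makes the regex engine ignore unescaped whitespace in the pattern and treat an unescaped `#` as the start of a comment, which is what lets the pattern above be spread over several commented lines. It also means every quantifier has to be written out explicitly: for example, `\w |-` is read as `\w|-`, which matches just one character. A small sketch comparing a verbose pattern with its compact equivalent; the pattern here is a simplified illustration (not the question's exact regex), and the sample line is the one from the question:

```python
import re

LINE = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# Verbose form: whitespace is ignored and "#" starts a comment,
# so each piece of the pattern can sit on its own line.
VERBOSE = r'''
    (?P<host>\d+(?:\.\d+){3})   # dotted IPv4 address
    \s+\S+\s+                   # " - " separator
    (?P<user_name>[\w-]+)       # user name, or "-" when absent
'''

# Compact form: the same regex with the whitespace and comments removed.
COMPACT = r'(?P<host>\d+(?:\.\d+){3})\s+\S+\s+(?P<user_name>[\w-]+)'

verbose_match = re.match(VERBOSE, LINE, flags=re.VERBOSE)
compact_match = re.match(COMPACT, LINE)

print(verbose_match.groupdict())  # {'host': '146.204.224.152', 'user_name': 'feest6811'}
print(verbose_match.groupdict() == compact_match.groupdict())  # True
```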
Answer:
Just use groupdict() directly (the := operator requires Python 3.8 or later):
import re

def rtr_dict(txt):
    pattern = r'''
    (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})  # host name
    \s \S \s
    (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s \[  # user_name
    (?P<time>([^[]+))\]\s "  # time
    (?P<request>[^"]+)"  # request
    '''
    # re.match anchors at the start of txt; the function returns None if there is no match
    if m := re.match(pattern, txt, flags=re.VERBOSE):
        return m.groupdict()
tgt='146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
>>> rtr_dict(tgt)
{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
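`Match.groupdict()` returns a dict keyed by group name. A named group that did not take part in the match maps to `None`, unless a replacement value is passed as `default=`. A small illustration with a made-up host/port pattern (it is not part of the log format in the question):

```python
import re

# The port group sits inside an optional part of the pattern,
# so it may not participate in a match.
pattern = re.compile(r'(?P<host>[\d.]+)(?: (?P<port>\d+))?')

full = pattern.match('10.0.0.1 8080')
partial = pattern.match('10.0.0.1')

print(full.groupdict())                # {'host': '10.0.0.1', 'port': '8080'}
print(partial.groupdict())             # {'host': '10.0.0.1', 'port': None}
print(partial.groupdict(default='-'))  # {'host': '10.0.0.1', 'port': '-'}
```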
Could you tell me how to do this for more than one line, the way I did with the for loop?
鑒于:
tgt='''146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
146.204.224.153 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4623
146.204.224.154 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4624'''
If you have multiple matches, you can return a list of dicts:
def rtr_dict(txt):
    pattern = r'''
    (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})  # host name
    \s \S \s
    (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s \[  # user_name
    (?P<time>([^[]+))\]\s "  # time
    (?P<request>[^"]+)"  # request
    '''
    # one dict per match, in the order the matches appear in txt
    return [m.groupdict() for m in re.finditer(pattern, txt, flags=re.VERBOSE)]
>>> rtr_dict(tgt)
[{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}, {'host': '146.204.224.153', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}, {'host': '146.204.224.154', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}]
Or use a generator:
def rtr_dict(txt):
    pattern = r'''
    (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})  # host name
    \s \S \s
    (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s \[  # user_name
    (?P<time>([^[]+))\]\s "  # time
    (?P<request>[^"]+)"  # request
    '''
    # yield dicts lazily, one per match
    for m in re.finditer(pattern, txt, flags=re.VERBOSE):
        yield m.groupdict()
>>> list(rtr_dict(tgt))
# same list of dicts...
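The generator approach can go one step further: instead of reading the whole file with `file.read()`, iterate over the file object line by line and yield a dict for each matching line, which keeps memory use flat for a large log. A sketch assuming one log entry per line; `LOG_RE` and `iter_entries` are hypothetical names, the pattern is a compact rewrite rather than the exact one above, and `io.StringIO` stands in for a file opened with `open(...)`:

```python
import io
import re

# Hypothetical compact pattern: host, " - ", user (or "-"), [time], "request".
LOG_RE = re.compile(
    r'(?P<host>\d+(?:\.\d+){3})\s+-\s+(?P<user_name>[\w-]+)'
    r'\s+\[(?P<time>[^\]]+)\]\s+"(?P<request>[^"]*)"'
)


def iter_entries(lines):
    """Yield a dict for each line that matches; skip lines that do not."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            yield m.groupdict()


# io.StringIO behaves like an open text file: iterating it yields lines.
fake_file = io.StringIO(
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n'
    'this line is not a log entry\n'
)

for entry in iter_entries(fake_file):
    print(entry['host'], entry['user_name'])
```

With a real file, the same generator would be used as `with open(path) as file: entries = list(iter_entries(file))`.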
Answer:
This is late, but this verbose regex also works (it returns a list of dicts):
import re

def logs():
    with open("C:/Users/ASUS/Desktop/logdata.txt", "r") as file:
        logdata = file.read()
    # raw string so the backslashes reach the regex engine unchanged
    pattern = r"""
    (?P<host>[\d\.]*)      #IP host
    (\ -\ )                #followed by
    (?P<user_name>[\w-]*)  #user name
    (\ *\[)                #followed by
    (?P<time>[^\]]*)       #time
    (\]\ *")               #followed by
    (?P<request>[^\"]*)    #request"""
    return [item.groupdict() for item in re.finditer(pattern, logdata, re.VERBOSE)]
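The question expects 979 dicts for its log file. Assuming one entry per line, one hedged sanity check is to compare the number of matches with the number of non-empty lines; any line the regex fails on then shows up as a mismatch. The `[\w-]*` class in this answer also accepts a bare `-` user name, which is how this kind of log marks a missing user. A sketch using this answer's pattern on an in-memory sample in place of the file (the second line is made up for illustration):

```python
import re

# Pattern from the answer above, as a raw string.
pattern = r"""
(?P<host>[\d\.]*)       #IP host
(\ -\ )                 #followed by
(?P<user_name>[\w-]*)   #user name
(\ *\[)                 #followed by
(?P<time>[^\]]*)        #time
(\]\ *")                #followed by
(?P<request>[^\"]*)     #request"""

# Stand-in for file.read(); the second line has no user name ("-").
logdata = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n'
    '197.109.77.178 - - [21/Jun/2019:15:45:25 -0700] "GET /index.html HTTP/1.1" 200 1024\n'
)

entries = [item.groupdict() for item in re.finditer(pattern, logdata, re.VERBOSE)]
# Number of non-empty lines, i.e. the number of entries we expect to find.
expected = sum(1 for line in logdata.splitlines() if line.strip())

for entry in entries:
    print(entry["host"], entry["user_name"])
print(len(entries) == expected)
```

Only named groups appear in `groupdict()`, so the unnamed separator groups such as `(\ -\ )` do not show up in the resulting dicts.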