我有 N 個 .txt 檔案,每個檔案在一個目錄中都有數千個 IP 地址(例如 /var/logs)。檔案名以“log”開頭,例如 log01.test.co.net、log02.hello.co.net、log01.domain.co.net 等。我需要將每個 .txt 檔案中的 IP 添加為將一列轉換為 excel 檔案并比較所有列以找到常見的 IP。然后將公共 IP 列印為名為“公共 IP”的另一列。
如何使用 Python Pandas 庫實作要求?有人可以幫忙嗎?
我是 Python 的新手,并且能夠從網上找到我們可以使用 Python Pandas 庫將 txt 檔案轉換為 excel 格式。
#!/usr/bin/env python3
import pandas as pd
df = pd.read_table('log01.txt')
df.to_excel('output.xlsx', 'Sheet2')
uj5u.com熱心網友回復:
使用pathlib來獲取所有的日志檔案,并加載它們pd.read_csv。使用提取常見 IP 地址,value_counts然后將它們添加到common_ips列中:
import pandas as pd
import pathlib
# Parse all log files
data = {}
for logfile in pathlib.Path('/var/logs').glob('log*'):
df = pd.read_csv(logfile, squeeze=True).drop_duplicates() \
.sort_values().reset_index(drop=True)
data[logfile.name] = df
# Merge all data from logs
df = pd.concat(data)
# Find common IP addresses through all files
common_ips = df.value_counts().eq(df.index.levels[0].nunique()) \
.loc[lambda x: x].index.to_frame(index=False, name='common_ips')
# Create final dataframe and export it to Excel
df = df.unstack(level=0).join(common_ips)
df.to_excel('report.xlsx', index=False)
示例輸出:
log02.hello.co.net log01.domain.co.net log01.test.co.net common_ips
0 1.114.104.134 1.156.243.176 1.131.142.79 99.19.144.176
1 1.124.207.151 1.243.206.110 1.207.189.107 NaN
2 1.13.108.19 1.62.221.173 1.249.176.100 NaN
3 1.147.63.14 1.63.85.25 1.252.71.6 NaN
4 1.90.129.211 1.147.154.71 1.27.141.156 NaN
.. ... ... ... ...
957 98.162.171.47 NaN NaN NaN
958 99.19.144.176 NaN NaN NaN
959 99.20.58.108 NaN NaN NaN
960 99.220.179.58 NaN NaN NaN
961 99.97.17.252 NaN NaN NaN
[962 rows x 4 columns]
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/360289.html
