The source data is as follows:
Name : ValueA
Age: 23
Height: 178cm
Friend : A
Friend : B
Name : ValueB
Age: 22
Height: 168cm
Weight: 80Kg
Friend : A
Friend : C
Name : ValueC
Age: 40
Height: 188cm
IQ: 150
Friend : D
Desired output:
| Name | Age | Height | Weight | IQ | Friend1 | Friend2 |
|---|---|---|---|---|---|---|
| ValueA | 23 | 178cm | N/A | N/A | A | B |
| ValueB | 22 | 168cm | 80Kg | N/A | A | C |
| ValueC | 40 | 188cm | N/A | 150 | D | N/A |
The code I am using is:
import pandas as pd
with open("test.txt", "r") as f:
    t = [line.strip() for line in f.readlines()]
# Organise data
stuff = {}
index = 0
for line in t:
    key, value = line.split(": ")
    if "Name" in key:
        # A "Name" line starts a new record
        stuff[index] = {"Name": value}
        current_key = index
        index += 1
    else:
        stuff[current_key][key] = value
# create dataframe
result_df = pd.DataFrame(stuff).T
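The problem is that, because of the duplicate keys, only the last value of "Friend" is kept. I get the following output:
| Name | Age | Height | Weight | IQ | Friend |
|---|---|---|---|---|---|
| ValueA | 23 | 178cm | N/A | N/A | B |
| ValueB | 22 | 168cm | 80Kg | N/A | C |
| ValueC | 40 | 188cm | N/A | 150 | D |
Note that other keys may be duplicated as well. Name is the primary key; all the other keys should be able to handle duplicates.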
uj5u.com user reply:
To generalise my previous answer to any number of friends, we can store all the "Friend" values in a list and, before creating the DataFrame, assign each element of that list to its own column.
Add this to your for loop to store the "Friend" values in a list:
elif "Friend" in key:
    if "Friend" not in stuff[current_key]:
        stuff[current_key]["Friend"] = list()
    stuff[current_key]["Friend"].append(value)
After your for loop, create a new key for each "Friend" element:
for key in stuff:
    for i, friend in enumerate(stuff[key]["Friend"], start=1):
        stuff[key][f"Friend{i}"] = friend
    del stuff[key]["Friend"]
Resulting code:
# Organise data
stuff = {}
index = 0
for line in t:
    key, value = line.split(": ")
    if "Name" in key:
        stuff[index] = {"Name": value}
        current_key = index
        index += 1
    elif "Friend" in key:
        # Collect every "Friend" value in a list
        if "Friend" not in stuff[current_key]:
            stuff[current_key]["Friend"] = list()
        stuff[current_key]["Friend"].append(value)
    else:
        stuff[current_key][key] = value
# Give each friend its own key: Friend1, Friend2, ...
for key in stuff:
    for i, friend in enumerate(stuff[key]["Friend"], start=1):
        stuff[key][f"Friend{i}"] = friend
    del stuff[key]["Friend"]
# create dataframe
result_df = pd.DataFrame(stuff).T
print(result_df)
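To try the idea without `test.txt`, the sample data can be parsed inline. This is only a sketch: `SAMPLE` and `parse_people` are names made up for this example, and it splits on the first colon and strips whitespace instead of splitting on `": "`. It also numbers the friends while reading, so no second pass is needed.

```python
SAMPLE = """\
Name : ValueA
Age: 23
Height: 178cm
Friend : A
Friend : B
Name : ValueB
Age: 22
Height: 168cm
Weight: 80Kg
Friend : A
Friend : C
Name : ValueC
Age: 40
Height: 188cm
IQ: 150
Friend : D
"""


def parse_people(text):
    """Return one dict per person, with friends stored as Friend1, Friend2, ..."""
    rows = []
    friend_count = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Name":
            rows.append({"Name": value})  # a Name line starts a new record
            friend_count = 0
        elif key == "Friend":
            friend_count += 1
            rows[-1][f"Friend{friend_count}"] = value
        else:
            rows[-1][key] = value
    return rows


rows = parse_people(SAMPLE)
print(rows[0])
# {'Name': 'ValueA', 'Age': '23', 'Height': '178cm', 'Friend1': 'A', 'Friend2': 'B'}
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per person, with NaN wherever a person lacks a key.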
Part 2: Generalising to arbitrary keys
stuff = {}
index = 0
for line in t:
    key, value = line.split(": ")
    if "Name" in key:
        stuff[index] = {"Name": [value]}  # Always use a list as value
        current_key = index
        index += 1
    elif key not in stuff[current_key]:  # If key does not exist
        stuff[current_key][key] = [value]  # Create key with value in a list
    else:
        stuff[current_key][key].append(value)  # If key exists, append the value
for key in stuff:
    # Copy the keys first, because the dict is modified inside the loop
    for subkey in list(stuff[key].keys()):
        for i, element in enumerate(stuff[key][subkey], start=1):
            stuff[key][f"{subkey}{i}"] = element
        del stuff[key][subkey]
# create dataframe
result_df = pd.DataFrame(stuff).T
print(result_df)
Output:
    Name1 Age1 Height1 Friend 1 Friend 2 Weight1  IQ1
0  ValueA   23   178cm        A        B     NaN  NaN
1  ValueB   22   168cm        A        C    80Kg  NaN
2  ValueC   40   188cm        D      NaN     NaN  150
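The column names `Friend 1` and `Friend 2` contain a space because `line.split(": ")` leaves the trailing space of `Friend ` in the key. One possible way to normalise the keys is sketched below; `split_line`, `group_values` and `flatten` are hypothetical helper names, not part of the answer above.

```python
def split_line(line):
    """Split 'Key : Value' on the first colon and trim both parts."""
    key, value = line.split(":", 1)
    return key.strip(), value.strip()


def group_values(lines):
    """Collect every value of every key into a list, one dict per Name."""
    people = []
    for line in lines:
        key, value = split_line(line)
        if key == "Name":
            people.append({})  # a Name line starts a new record
        people[-1].setdefault(key, []).append(value)
    return people


def flatten(person):
    """Turn {'Friend': ['A', 'B']} into {'Friend1': 'A', 'Friend2': 'B'}."""
    return {
        f"{key}{i}": value
        for key, values in person.items()
        for i, value in enumerate(values, start=1)
    }


lines = ["Name : ValueA", "Age: 23", "Friend : A", "Friend : B"]
flat = [flatten(p) for p in group_values(lines)]
print(flat)
# [{'Name1': 'ValueA', 'Age1': '23', 'Friend1': 'A', 'Friend2': 'B'}]
```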
Old answer (not scalable)
You could add this elif inside your loop. The idea is to identify the "Friend" key but store two possible keys ("Friend1" and "Friend2"). We only use "Friend2" if "Friend1" already exists:
elif "Friend" in key:
    if "Friend1" not in stuff[current_key]:
        stuff[current_key]["Friend1"] = value
    elif "Friend2" not in stuff[current_key]:
        stuff[current_key]["Friend2"] = value
Notice that, if a row only has "Friend1", pandas should fill "Friend2" with a NaN value automatically.
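uj5u.com user reply:
To handle duplicate keys in Python, use a data structure that allows multiple values per key, such as a dictionary of lists. For example: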
# Organise data
stuff = {}
index = 0
for line in t:
    key, value = line.split(": ")
    if "Name" in key:
        stuff[index] = {"Name": value}
        current_key = index
        index += 1
    else:
        # Create the list on first use, then append to it
        stuff[current_key].setdefault(key, []).append(value)
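In this code, setdefault sets the value for key to an empty list if it has not been set yet, and you then immediately append to that list. This means every key except Name will hold a list of values rather than a single value.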
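The behaviour of `dict.setdefault` can be seen on a single record. The snippet below is a small illustration with made-up values; joining each list into one string at the end is just one possible way to get a flat row, not something the answer prescribes.

```python
record = {"Name": "ValueA"}

# First "Friend": the key is missing, so setdefault stores [] and returns it.
record.setdefault("Friend", []).append("A")
# Second "Friend": the key exists, so the same list is returned and extended.
record.setdefault("Friend", []).append("B")
record.setdefault("Age", []).append("23")

print(record)
# {'Name': 'ValueA', 'Friend': ['A', 'B'], 'Age': ['23']}

# Optional: collapse each list into a single comma-separated string.
row = {k: ", ".join(v) if isinstance(v, list) else v for k, v in record.items()}
print(row)
# {'Name': 'ValueA', 'Friend': 'A, B', 'Age': '23'}
```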
Another approach, which I prefer, is to not use a dict at all. You have a schema, so you can use a dataclass.
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
    height: str
    weight: str
    friends: list
    iq: int
people = []
cur_person = None
for line in t:
    key, value = [i.strip() for i in line.split(":", 1)]
    if key == 'Name':
        cur_person = Person(value, age=None, height=None, weight=None,
                            friends=[], iq=None)
        people.append(cur_person)
    elif key in {'Age', 'IQ'}:
        setattr(cur_person, key.lower(), int(value))
    elif key == 'Friend':
        cur_person.friends.append(value)
    else:
        setattr(cur_person, key.lower(), value)
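To get from a list of dataclass instances to table rows, `dataclasses.asdict` can be combined with the same friend-numbering step used earlier. The sketch below is an assumption-laden variant: this `Person` adds default values so that missing fields need not be passed explicitly, and `to_row` is a hypothetical helper.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # Variant of the class above with defaults (an assumption of this sketch)
    name: str
    age: Optional[int] = None
    height: Optional[str] = None
    weight: Optional[str] = None
    friends: List[str] = field(default_factory=list)
    iq: Optional[int] = None


def to_row(person):
    """Flatten one Person into a dict with friend1, friend2, ... keys."""
    row = asdict(person)
    friends = row.pop("friends")
    for i, friend in enumerate(friends, start=1):
        row[f"friend{i}"] = friend
    return row


people = [
    Person("ValueA", age=23, height="178cm", friends=["A", "B"]),
    Person("ValueC", age=40, height="188cm", friends=["D"], iq=150),
]
rows = [to_row(p) for p in people]
print(rows[1])
# {'name': 'ValueC', 'age': 40, 'height': '188cm', 'weight': None, 'iq': 150, 'friend1': 'D'}
```

`pd.DataFrame(rows)` should then produce one column per field plus one column per friend position.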
uj5u.com user reply:
You can set a key's value to a list, and if a key has no value you can set it to None.
stuff = {}
index = 0
for line in t:
    key, value = line.split(": ")
    if "Name" in key:
        stuff[index] = {"Name": value}
        current_key = index
        index += 1
    else:
        if key in stuff[current_key]:  # Check existing value
            new_value = []
            if isinstance(stuff[current_key][key], list):
                # Already a list: copy its elements
                for x in stuff[current_key][key]:
                    new_value.append(x)
            else:
                # Single value so far: start a list with it
                new_value.append(stuff[current_key][key])
            new_value.append(value)
            stuff[current_key][key] = new_value
        else:  # No existing value
            stuff[current_key][key] = value
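With this approach a key holds a plain value until it repeats, and only then becomes a list. The same rule can be written more compactly; `add_value` below is a hypothetical helper used only to illustrate the behaviour on one record.

```python
def add_value(record, key, value):
    """Store value under key; promote to a list once a key repeats."""
    if key not in record:
        record[key] = value                 # first occurrence: plain value
    elif isinstance(record[key], list):
        record[key].append(value)           # third or later: extend the list
    else:
        record[key] = [record[key], value]  # second occurrence: make a list


record = {"Name": "ValueB"}
for key, value in [("Age", "22"), ("Friend", "A"), ("Friend", "C")]:
    add_value(record, key, value)

print(record)
# {'Name': 'ValueB', 'Age': '22', 'Friend': ['A', 'C']}
```

Note that the resulting column then mixes plain values and lists, so a later step that expects one type has to handle both.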
