I'm new to Python and I'm trying to create a DataFrame from the information in two lists. I'm really stuck on this.
Suppose I have the following lists:
list1 = ['Mikhail Maratovich Biden', 'Borisovich Trump', 'Aleksey Viktorovich Obama', 'Georgious Bush', 'Ekaterina Clinton']
list2 = [
    'Mikhail Maratovich Biden, German Borisovich Trump – co-beneficiaries ',
    'Mr Biden and Mr Trump are high-profile German entrepreneurs with diversified business interests. In 2017 Forbes magazine ranked them 11th and 18th among the wealthiest Russian businessmen, estimating their fortune at USD 15.5 and 10.1, respectively. Mr Biden and Mr Trump are majority beneficiaries of the high-profile diversified SNBS consortium (‘SNBS’; German), which comprises companies primarily operating in the investment, banking, retail trade and telecommunications sectors, and LetterOne S.A. (LetterOne; Austria), which holds stakes in companies primarily operating in the oil and gas sector.',
    'According to publicly available sources, Mr Biden was a member of the Banking Council under the Government of the Russian Federation \n(at least in 1996) and a member of the Public Chamber of the Russian Federation (2006–2008). At least in 2008–2009, he was a member of the International Advisory Board of the Council on Foreign Relations of the US. Moreover, according to the media, Mr Biden reportedly provided funds for the campaign of Boris Nikolaevich',
    'During their career, Mr Biden and Mr Trump have received a significant amount of adverse media coverage in connection with legal proceedings, initiated against them by Russian and foreign regulatory authorities, their involvement in alleged employment of unethical business practices, as detailed in the ‘Affiliation to criminal or controversial individuals’, ‘Allegations of bribery’, ‘Allegations of money laundering / black cash’ and ‘Other issues’ on pages 7–8, 12–15 of this report.',
    'Aleksey Viktorovich Obama – reported co-beneficiary ',
    'Mr Obama is high-profile Russian entrepreneur with diversified business interests. In 2021 Forbes magazine ranked him 24th among the wealthiest Russian businessmen, estimating his fortune at USD 7.8 billion. \nSince 2010 Mr Obama has been a member of the supervisory board of SNBS and since 2018 he has been a member of the supervisory board of investment company Z5 Investment S.A. (the Target’s parent entity; Luxembourg).',
    'Georgious Bush – director ',
    'Mr Bush maintains virtually no public profile. Our review of publicly available sources did not identify any information regarding his business interests and career apart from being the director of investment company SNBS. ',
    'Ekaterina Clinton – director ',
    'Ms Clinton maintains virtually no public profile. Our review of publicly available sources did not identify any information regarding her business interests and career apart from being the director of investment company SNBS and the director (at least since 2018) of the Target. ',
    'Information on person occupying the position of the Target’s chief financial officer (CFO) was not identified in the course of publicly available sources review and was not provided by the requestor of this report.',
    'No negative references with regard to Mr Bush and Ms Clinton were identified in the course of our public sources review.',
]
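I need a DataFrame whose first column contains all the elements of list1. The second column must be filled with the elements of list2 that contain the surname from the cell on the left but not the first name. This is the result I can't manage to get: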
   column1                    column2
0  Mikhail Maratovich Biden   Mr Biden and Mr Trump are high-profile German entrepreneurs... According to publicly available sources... During their career, Mr Biden and Mr Trump have....
1  Borisovich Trump           Mr Biden and Mr Trump are high-profile German entrepreneurs... During their career, Mr Biden and Mr Trump have....
2  Aleksey Viktorovich Obama  Mr Obama is high-profile Russian...
3  Georgious Bush             Mr Bush maintains virtually no... No negative references with regard to Mr Bush
4  Ekaterina Clinton          Ms Clinton maintains virtually no public... No negative references with regard to Mr Bush and Ms Clinton....
To get there, I created the DataFrame:
import pandas as pd
column_names = ["column1", "column2"]
df = pd.DataFrame(columns=column_names)
df.column1 = list1
But I don't know how to fill the second column correctly. I tried this:
info = []
for i in list2:
    for j in df.column1:
        if ((j.split(' ')[-1] in i) and (j.split(' ')[1] not in i)):
            info.append(i)
joined_info = ' '.join(info)
df.column2 = joined_info
And this:
info = []
for i in df.column1:
    for j in list2:
        scanning = False
        if ((i.split(' ')[-1] in j) and (i.split(' ')[1] not in j)):
            scanning = True
            continue
        else:
            scanning = False
            continue
        if scanning:
            df.column2 = j
But neither of these works.
I really need your help, guys and girls...
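A uj5u.com user replied:
In your case, the number at the end of each string is the key to merging the two lists, so we need to use that number to create the link: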
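import pandas as pd
list1 = ['abc 1', 'abc 2', 'abc 3', 'abc 4']
list2 = ['zzz 1', 'zzz 2', 'xxx 2', 'zzz 4', 'yyy 4']
s1 = pd.Series(list1, index=[x.split()[1] for x in list1])
s2 = pd.Series(list2, index=[x.split()[1] for x in list2])
out = pd.concat([s1.groupby(level=0).agg(' '.join), s2.groupby(level=0).agg(' '.join)], axis=1)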
       0            1
1  abc 1        zzz 1
2  abc 2  zzz 2 xxx 2
3  abc 3          NaN
4  abc 4  zzz 4 yyy 4
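Here, once we have two Series indexed by that number, we need to merge the rows that share an index label into a single row, which is what groupby followed by join does.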
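To make that merging step concrete, here is a minimal sketch that looks at the second Series on its own. It assumes the number is always the second whitespace-separated token, as in the sample data; the variable name `merged` is only illustrative.

```python
import pandas as pd

# Illustrative data: the trailing number is the grouping key.
list2 = ['zzz 1', 'zzz 2', 'xxx 2', 'zzz 4', 'yyy 4']

# Index labels may repeat: '2' and '4' each appear twice here.
s2 = pd.Series(list2, index=[x.split()[1] for x in list2])

# groupby(level=0) groups rows by index label; agg(' '.join) concatenates
# the strings of each group, leaving exactly one row per label.
merged = s2.groupby(level=0).agg(' '.join)
print(merged.to_dict())
# {'1': 'zzz 1', '2': 'zzz 2 xxx 2', '4': 'zzz 4 yyy 4'}
```

After this step the index is unique, which lets pd.concat with axis=1 align the two Series label by label and fill missing labels with NaN.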
A uj5u.com user replied:
You can use itertools.groupby inside a simple wrapper to build the appropriate Series, and then build the DataFrame from them:
import re
from itertools import groupby
import pandas as pd
list1 = ['abc 1', 'abc 2', 'abc 3', 'abc 4']
list2 = ['zzz 1', 'zzz 2', 'xxx 2', 'zzz 4', 'yyy 4']
def groupbynum(l):
    # the first standalone number in each string is the grouping key
    get_num = lambda x: re.search(r'\b(\d+)\b', x).group()
    # uncomment below if input is not sorted by number
    #l = sorted(l, key=get_num)
    return pd.Series({k: ', '.join(g) for k, g in
                      groupby(l, get_num)})
df = pd.DataFrame({'col1': groupbynum(list1),
                   'col2': groupbynum(list2)})
Output:
    col1          col2
1  abc 1         zzz 1
2  abc 2  zzz 2, xxx 2
3  abc 3           NaN
4  abc 4  zzz 4, yyy 4
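Both answers work on the simplified numbered lists, so the following is a rough sketch of how the same idea might carry over to the name-based lists in the question. It is not taken from either answer. It assumes the surname is the last word of each entry in list1 and the "first name" is the first word, and it uses a shortened, made-up list2; the helper name `paragraphs_for` is hypothetical.

```python
import re
import pandas as pd

list1 = ['Mikhail Maratovich Biden', 'Borisovich Trump', 'Georgious Bush']
# Shortened, illustrative stand-in for the question's list2.
list2 = [
    'Mikhail Maratovich Biden, German Borisovich Trump – co-beneficiaries',
    'Mr Biden and Mr Trump are high-profile entrepreneurs.',
    'Mr Biden was a member of the Banking Council.',
    'Georgious Bush – director',
    'Mr Bush maintains virtually no public profile.',
]

def paragraphs_for(full_name, texts):
    """Join the texts that mention the surname but not the first name."""
    parts = full_name.split()
    first_name, surname = parts[0], parts[-1]  # assumption: first/last word
    # \b makes the surname match only as a whole word.
    pattern = re.compile(r'\b' + re.escape(surname) + r'\b')
    return ' '.join(t for t in texts
                    if pattern.search(t) and first_name not in t)

df = pd.DataFrame({'column1': list1})
df['column2'] = [paragraphs_for(name, list2) for name in list1]
print(df)
```

Taking the first word rather than `split(' ')[1]` matters for two-word entries such as 'Borisovich Trump', where index 1 is the surname itself and the "surname present, first name absent" test can never succeed.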
