Iterate over each row in a dataframe, search for its value in a second dataframe, and on a match take one value from df1 and another from df2

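I have two dataframes:

researchers: a list of all researchers and their id_number
samples: a list of samples and all the researchers associated with each one; a single cell may contain several researchers.

I want to go through each row of the researchers table and check whether that researcher appears in any row of the samples table. If so, I want to get their id from the researchers table and the sample number from the samples table.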

Researchers table

   id_researcher             full_name
0               1         Jack Sparrow
1               2           Demi moore
2               3              Bickman
3               4       Charles Darwin
4               5            H. Haffer

Samples table

     sample_number                            collector
230  INPA A 231                                  Haffer
231  INPA A 232                          Charles Darwin
232  INPA A 233                                     NaN
233  INPA A 234                                     NaN
234  INPA A 235      Jack Sparrow; Demi Moore ; Bickman

Desired output:

            id_researcher     num_samples
0               5             INPA A 231
1               4             INPA A 232
2               1             INPA A 235
3               2             INPA A 235
4               3             INPA A 235

I can do this in plain Python with loops, using the code below, but it is very slow and long. Does anyone know a faster, simpler way? Perhaps with pandas apply?

import pandas as pd

# Collect one (researcher id, sample number) pair per match
id_researcher = []
id_num_sample = []
for c in range(len(data_researcher)):
    for a in range(len(data_samples)):
        collector = data_samples['collector'].iloc[a]
        # Skip missing collectors; `in` is a case-sensitive substring test
        if not pd.isna(collector) and data_researcher['full_name'].iloc[c] in collector:
            id_researcher.append(data_researcher['id_researcher'].iloc[c])
            id_num_sample.append(data_samples['sample_number'].iloc[a])

data_researcher_sample = pd.DataFrame({'id_researcher': id_researcher, 'num_samples': id_num_sample}).sort_values(by='num_samples')

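Answer:

You have some data cleaning to do first: for example, "moore" is lowercase in one table, and "Haffer" has a first-name initial in one table but not in the other. After standardizing both dataframes, you can split and explode the collector column and then use merge: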

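# Split multi-collector cells into lists, then give each collector its own row
samples['collector'] = samples['collector'].str.split(';')
samples = samples.explode('collector')
samples['collector'] = samples['collector'].str.strip()  # drop stray spaces around names
# how='left' keeps researchers without samples (as NaN); how='inner' would drop them
out = (researchers.merge(samples, left_on='full_name', right_on='collector', how='left')
       [['id_researcher', 'sample_number']].sort_values(by='sample_number').reset_index(drop=True))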

Output:

   id_researcher sample_number
0              5    INPA A 231
1              4    INPA A 232
2              1    INPA A 235
3              2    INPA A 235
4              3    INPA A 235
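
The answer does not show the standardization step, so here is a minimal end-to-end sketch of one way it might look on the sample data from the question. The `key` column, the `clean` helper, and the `ALIASES` table are hypothetical names introduced for illustration; the alias table is a hand-written assumption for the "Haffer" / "H. Haffer" mismatch, and real data would likely need a more careful matching rule. It also uses `how='inner'` rather than `how='left'`, on the assumption that researchers without any sample should be left out.

```python
import pandas as pd

# Sample data copied from the question
researchers = pd.DataFrame({
    "id_researcher": [1, 2, 3, 4, 5],
    "full_name": ["Jack Sparrow", "Demi moore", "Bickman",
                  "Charles Darwin", "H. Haffer"],
})
samples = pd.DataFrame(
    {
        "sample_number": ["INPA A 231", "INPA A 232", "INPA A 233",
                          "INPA A 234", "INPA A 235"],
        "collector": ["Haffer", "Charles Darwin", None, None,
                      "Jack Sparrow; Demi Moore ; Bickman"],
    },
    index=range(230, 235),
)

# Hypothetical alias table: cleaned collector spelling -> cleaned full_name
ALIASES = {"haffer": "h. haffer"}


def clean(names: pd.Series) -> pd.Series:
    """Trim whitespace, lowercase, and apply the alias table (exact matches only)."""
    return names.str.strip().str.lower().replace(ALIASES)


researchers["key"] = clean(researchers["full_name"])

# Drop samples without a collector, then put each collector on its own row
exploded = samples.dropna(subset=["collector"]).copy()
exploded["collector"] = exploded["collector"].str.split(";")
exploded = exploded.explode("collector", ignore_index=True)
exploded["key"] = clean(exploded["collector"])

# Inner join on the cleaned key keeps only researchers who collected something
out = (
    researchers.merge(exploded, on="key", how="inner")
    [["id_researcher", "sample_number"]]
    .sort_values(by=["sample_number", "id_researcher"])
    .reset_index(drop=True)
)
print(out)
```

Matching on a separate cleaned `key` column leaves the original `full_name` and `collector` values untouched, which makes it easier to inspect any rows that fail to match.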

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/398829.html
