How to assign multiple comma-separated values from a single Pandas DataFrame column to separate but related columns

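I have a simple Pandas DataFrame in which the 'SM_Platform' column contains multiple comma-separated values such as 1, 2, 7. I want to assign these values to separate, related columns in the DataFrame. For example, 1 should go under a column named FB, 2 under Twitter, 3 under Youtube, and so on. Please suggest how to do this.
Thanks for your help.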

   Age  SM_Platform
0   3   1, 2, 3, 7
1   3   1, 2, 3, 5, 7
2   1   1, 2, 3, 4
3   2   1, 2, 3, 4
4   1   1, 2

Update

Using @Corralien's answer, I get the following error:

  KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'SM_Platform'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-b1a706d03e0a> in <module>
      4 
      5 out = df.join(
----> 6     df.pop('SM_Platform').str.split(', ').explode().astype(int)
      7       .replace(m).reset_index().assign(dummy=1)
      8       .pivot_table('dummy', 'index', 'SM_Platform', fill_value=0))

~\Anaconda3\lib\site-packages\pandas\core\frame.py in pop(self, item)
   5224         3  monkey        NaN
   5225         """
-> 5226         return super().pop(item=item)
   5227 
   5228     @doc(NDFrame.replace, **_shared_doc_kwargs)

~\Anaconda3\lib\site-packages\pandas\core\generic.py in pop(self, item)
    868 
    869     def pop(self, item: Hashable) -> Series | Any:
--> 870         result = self[item]
    871         del self[item]
    872 

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'SM_Platform'

Any help would be appreciated.
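
The traceback shows that the DataFrame had no column labelled exactly 'SM_Platform' at the moment `df.pop` ran. The question does not show `df.columns`, so the cause is a guess: the label may differ in case or carry hidden whitespace, or the cell may have been run twice (`pop` deletes the column on the first run). A minimal sketch for checking this, using a hypothetical frame with a trailing space in the label:

```python
import pandas as pd

# Hypothetical frame: the label has a stray trailing space, one way to get
# KeyError: 'SM_Platform' even though the column looks right when printed.
df = pd.DataFrame({
    "Age": [3, 3, 1, 2, 1],
    "SM_Platform ": ["1, 2, 3, 7", "1, 2, 3, 5, 7", "1, 2, 3, 4", "1, 2, 3, 4", "1, 2"],
})

# repr() exposes hidden whitespace or case differences in the labels.
print([repr(c) for c in df.columns])

# Normalise the labels, then confirm the column exists before using it.
df.columns = df.columns.str.strip()
has_column = "SM_Platform" in df.columns

# df.pop() removes the column in place, so re-running the same cell raises
# KeyError on the second run; plain indexing leaves the column where it is.
platforms = df["SM_Platform"]
```

If the label turns out to be correct, reloading the data before re-running the cell rules out the repeated `pop` case.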

Reply from a uj5u.com user:

Use str.get_dummies, borrowing the labels from Corralien's answer:

labels = {'1': 'Facebook',
          '2': 'Twitter',
          '3': 'Youtube',
          '4': 'Linkedin',
          '5': 'Instagram',
          '6': 'Pinterest',
          '7': 'TikTok'}

df = pd.concat([df, df['SM_Platform'].str.get_dummies(', ').rename(columns=labels)], axis=1)

   Age    SM_Platform  Facebook  Twitter  Youtube  Linkedin  Instagram  TikTok
0    3     1, 2, 3, 7         1        1        1         0          0       1
1    3  1, 2, 3, 5, 7         1        1        1         0          1       1
2    1     1, 2, 3, 4         1        1        1         1          0       0
3    2     1, 2, 3, 4         1        1        1         1          0       0
4    1           1, 2         1        1        0         0          0       0
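
The one-liner above depends on every separator being exactly a comma followed by one ordinary space. The following is a self-contained sketch of the same `str.get_dummies` idea that strips all whitespace first and also keeps a column for platforms nobody selected. The sample frame is rebuilt from the question; the whitespace stripping and the `reindex` step are additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [3, 3, 1, 2, 1],
    "SM_Platform": ["1, 2, 3, 7", "1, 2, 3, 5, 7", "1, 2, 3, 4", "1, 2, 3, 4", "1, 2"],
})

labels = {"1": "Facebook", "2": "Twitter", "3": "Youtube", "4": "Linkedin",
          "5": "Instagram", "6": "Pinterest", "7": "TikTok"}

# Drop every whitespace character so "1, 2" and "1,2" split the same way.
codes = df["SM_Platform"].str.replace(r"\s+", "", regex=True)

# One indicator column per distinct code, renamed to the platform name.
dummies = codes.str.get_dummies(",").rename(columns=labels)

# reindex adds unselected platforms (Pinterest here) as all-zero columns
# and puts the columns in the order of the labels dict.
dummies = dummies.reindex(columns=list(labels.values()), fill_value=0)

result = pd.concat([df, dummies], axis=1)
print(result)
```

Keeping the all-zero column is a design choice: it gives every run the same set of columns regardless of which codes happen to appear in the data.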

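Reply from a uj5u.com user:

Create a mapping for your platforms (1 -> Facebook, 2 -> Twitter, etc.), then explode your SM_Platform column so that each value gets its own row, and replace the numeric values with the corresponding names. Add a dummy column and pivot your DataFrame: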

l = ['Facebook', 'Twitter', 'Youtube', 'Linkedin',
     'Instagram', 'Pinterest', 'TikTok']
m = dict(enumerate(l, 1))

out = df.join(
    df['SM_Platform'].str.findall(r'\d+').explode().astype(int)
        .replace(m).reset_index().assign(dummy=1)
        .pivot_table('dummy', 'index', 'SM_Platform', fill_value=0)
)

Output:

>>> out
   Age    SM_Platform  Facebook  Instagram  Linkedin  TikTok  Twitter  Youtube
0    3     1, 2, 3, 7         1          0         0       1        1        1
1    3  1, 2, 3, 5, 7         1          1         0       1        1        1
2    1     1, 2, 3, 4         1          0         1       0        1        1
3    2     1, 2, 3, 4         1          0         1       0        1        1
4    1           1, 2         1          0         0       0        1        0

>>> m
{1: 'Facebook',
 2: 'Twitter',
 3: 'Youtube',
 4: 'Linkedin',
 5: 'Instagram',
 6: 'Pinterest',
 7: 'TikTok'}

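Update

"ValueError: invalid literal for int() with base 10: '2,\xa03'"

It seems you have an invisible whitespace character, '\xa0' (a non-breaking space), in your data, so splitting on ', ' leaves some values joined together. Extracting all the digits with a regular expression is more robust.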
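
A small sketch of the difference, using a hypothetical Series in which one row contains '\xa0' after a comma:

```python
import pandas as pd

# Hypothetical data: the second row has a non-breaking space after a comma.
s = pd.Series(["1, 2, 3, 7", "1, 2,\xa03", "1, 2"], name="SM_Platform")

# Splitting on ", " leaves "2,\xa03" in one piece, which int() cannot parse.
pieces = s.str.split(", ").explode()
try:
    pieces.astype(int)
    split_failed = False
except ValueError:
    split_failed = True

# Extracting runs of digits ignores whatever characters separate them.
numbers = s.str.findall(r"\d+").explode().astype(int)
print(numbers.tolist())  # [1, 2, 3, 7, 1, 2, 3, 1, 2]
```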
