Creating a dataframe of columns and their unique values in pandas

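I am looking for a way to create a dataframe of columns and their unique values. I know this is an uncommon use case, but it would be a good way to get a first look at the unique values. A dataframe that looks like this...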

State	County	City
Colorado	Denver	Denver
Colorado	El Paso	Colorado Springs
Colorado	Larimar	Fort Collins
Colorado	Larimar	Loveland

becomes this...

State	County	City
Colorado	Denver	Denver
	El Paso	Colorado Springs
	Larimar	Fort Collins
		Loveland

A uj5u.com user replied:

I would use mask with a lambda:

df.mask(df.apply(lambda x : x.duplicated(keep='first'))).fillna('')

      State   County              City
0  Colorado   Denver            Denver
1            El Paso  Colorado Springs
2            Larimar      Fort Collins
3                             Loveland
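To see why this works, it can help to look at the intermediate boolean frame. The following is a minimal sketch that rebuilds the sample data from the question; the variable names `dup_mask` and `result` are my own and not part of the answer above.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# True wherever a value has already appeared earlier in the same column
dup_mask = df.apply(lambda col: col.duplicated(keep='first'))

# mask() replaces the True positions with NaN; fillna('') then blanks them out
result = df.mask(dup_mask).fillna('')
print(result)
```

Note that the surviving values stay on the row where they first appeared, so the blanks are not necessarily pushed to the bottom of a column.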

A uj5u.com user replied:

This is the best solution I came up with; I hope it helps others looking for something similar!

import pandas as pd

def create_unique_df(df) -> pd.DataFrame:
    """Take a dataframe and create a new one containing the unique values of each column.
    Note: it only works for two or more columns.

    :param df: dataframe you want to see unique values for
    :type df: pandas.DataFrame
    :return: dataframe of columns with unique values
    """
    # using list() allows us to combine lists down the line
    data_series = df.apply(lambda x: list( x.unique() ) )

    list_df = data_series.to_frame()

    # To create a df from lists they all need to be the same length, so we append null
    # values to the shorter lists. First, find the difference in length between the
    # longest list and the rest.
    list_df['needed_nulls'] = list_df[0].str.len().max() - list_df[0].str.len()

    # Second create a column of lists with one None value
    list_df['null_list_placeholder'] = [[None] for _ in range(list_df.shape[0])]

    # Third, multiply the null list by the difference to get a list we can add to the list of
    # unique values, making all the lists the same length. Example: [None] * 3 == [None, None,
    # None]
    list_df['null_list_needed'] = list_df.null_list_placeholder * list_df.needed_nulls
    list_df['full_list'] = list_df[0] + list_df.null_list_needed

    unique_df = pd.DataFrame(
        list_df['full_list'].to_dict()
    )

    return unique_df
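The padding steps above can probably be left to pandas itself. As a rough sketch of a different technique (not the function above): wrapping each column's unique values in its own `pd.Series` lets the `DataFrame` constructor align them on their default integer index and fill the shorter columns with NaN. The name `unique_df` is reused here only for illustration, and the sample data is the one from the question.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# Each Series gets its own 0..n-1 index; the constructor aligns the columns
# on that index and pads the shorter ones with NaN.
unique_df = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})

print(unique_df.fillna(''))
```

Unlike the mask-based approach, this packs the unique values at the top of each column, so a value no longer sits on the row where it first appeared.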

A uj5u.com user replied:

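The original dataframe. Next time you ask a question, please provide reproducible code that we can use to help answer it. I have done that for you here, but keep it in mind for next time.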

import pandas as pd

df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'], 
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland']
})

df

    State     County   City
0   Colorado  Denver   Denver
1   Colorado  El Paso  Colorado Springs
2   Colorado  Larimar  Fort Collins
3   Colorado  Larimar  Loveland

Drop the duplicates from each column separately, then concatenate. Fill the NaN values with empty strings.

pd.concat([
    df.State.drop_duplicates(),
    df.County.drop_duplicates(),
    df.City.drop_duplicates()
], axis=1).fillna('')

Output:

    State     County   City
0   Colorado  Denver   Denver
1             El Paso  Colorado Springs
2             Larimar  Fort Collins
3                      Loveland
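If the dataframe has many columns, listing each one by hand gets tedious. A minimal sketch of the same idea using a list comprehension over `df.columns` (the sample data is the one from the question; `result` is just an illustrative name):

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    'State': ['Colorado', 'Colorado', 'Colorado', 'Colorado'],
    'County': ['Denver', 'El Paso', 'Larimar', 'Larimar'],
    'City': ['Denver', 'Colorado Springs', 'Fort Collins', 'Loveland'],
})

# Drop duplicates column by column, then let concat align the pieces on the
# original row labels; rows missing from a column come back as NaN.
result = pd.concat(
    [df[col].drop_duplicates() for col in df.columns],
    axis=1,
).fillna('')

print(result)
```

Because `drop_duplicates()` keeps the original row labels, each surviving value stays on the row where it first appeared.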
