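I am currently trying to solve a programming puzzle that involves a very simple dataframe, host, with 2 columns named city and amenities (both of object dtype). Entries in both columns can repeat multiple times. Below are the first few entries of host: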
City  Amenities  Price($)
NYC  {TV,"Wireless Internet", "Air conditioning","Smoke detector",Essentials,"Lock on bedroom door"}  8
LA  {"Wireless Internet",Kitchen,Washer,Dryer,"First aid kit",Essentials,"Hair dryer","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}  10
SF  {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,"Free parking on premises","Pets live on this property",Dog(s),"Indoor fireplace","Buzzer/wireless intercom",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50","Self Check-In",Lockbox}  15
NYC  {"Wireless Internet","Air conditioning",Kitchen,Heating,"Suitable for events","Smoke detector","Carbon monoxide detector","First aid kit","Fire extinguisher",Essentials,Shampoo,"Lock on bedroom door",Hangers,"translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}  20
LA  {TV,Internet,"Wireless Internet","Air conditioning",Kitchen,"Free parking on premises",Essentials,Shampoo,"translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}
LA  {TV,"Cable TV",Internet,"Wireless Internet",Pool,Kitchen,"Free parking on premises",Gym,Breakfast,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector",Essentials,Shampoo,"Lock on bedroom door",Hangers,"Private entrance"}  28
.....
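Question: output the city with the largest number of amenities.
My attempt: I tried the groupby() function, grouping on the city column with host.groupby('city'). Next, I need to count the number of elements in each set of amenities. Because of the data type, len() does not work as I expected: there appears to be a \ between the elements of each set. For example, host['amenities'][0] outputs "{TV,\"Wireless Internet\",\"Air conditioning\",\"Smoke detector\",\"Carbon monoxide detector\",Essentials,\"Lock on bedroom door\",Hangers,Iron}", and applying len() to this output gives 134, which is clearly not the number of amenities. I tried host['amenities'][0].strip('\n'), which removed the \, but len() still returns 134.
Can anyone help me solve this?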
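The 134 reported above is consistent with len() counting the characters of a string rather than the items of a collection: each cell of an object column like this holds a single str, and the backslashes appear only in the escaped display of the embedded double quotes. A minimal sketch in plain Python, using the string quoted in the question; the helper name count_amenities is hypothetical, and it assumes no amenity name itself contains a comma:

```python
# The first cell as quoted in the question: one str, not a set.
raw = '{TV,"Wireless Internet","Air conditioning","Smoke detector","Carbon monoxide detector",Essentials,"Lock on bedroom door",Hangers,Iron}'


def count_amenities(cell: str) -> int:
    """Count distinct comma-separated items in a '{a,"b c",...}' string.

    Hypothetical helper; assumes amenity names never contain commas.
    """
    # Drop the surrounding braces, then split on commas.
    items = cell.strip().strip("{}").split(",")
    # Remove quotes and stray whitespace around each item.
    names = {item.strip().strip('"') for item in items}
    # An empty cell such as "{}" yields one empty item; ignore it.
    names.discard("")
    return len(names)


print(len(raw))                 # 134: characters in the string
print(raw.strip("\n") == raw)   # True: there are no newlines to strip
print(count_amenities(raw))     # 9: distinct amenities
```

So strip('\n') leaves the string unchanged, which is why len() kept returning 134; the string has to be split into items before counting.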
At ddejohn's request:
Output of host["Amenities"] = host["Amenities"].str.replace ...

Output of the second line of code

Original dataset travel

Output after travel.groupby('city')

Reply from a uj5u.com user:
Solution
import functools
# Process the Amenities strings into sets of strings
host["amenities"] = host["amenities"].str.replace('["{}]', "", regex=True).str.split(",").apply(set)
# Groupby city, perform the set union to remove duplicates, and get count of unique amenities
amenities_by_city = host.groupby("city")["amenities"].apply(lambda x: len(functools.reduce(set.union, x))).reset_index()
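To try the two steps above without the full dataset, here is a self-contained sketch on a tiny made-up frame (the three rows are hypothetical, not the question's data). It also strips whitespace around each item, which the answer's code does not do: the sample in the question has a space after one comma ("Wireless Internet", "Air conditioning"), and without stripping, " Air conditioning" would count as a separate amenity. The helper name to_set is an assumption for illustration.

```python
import functools

import pandas as pd

# Hypothetical miniature of the `host` frame; not the real data.
host = pd.DataFrame({
    "city": ["NYC", "LA", "NYC"],
    "amenities": [
        '{TV,"Wireless Internet", "Air conditioning"}',
        '{"Wireless Internet",Kitchen,Washer,Dryer,Essentials}',
        '{"Air conditioning",Kitchen}',
    ],
})


def to_set(cell: str) -> set:
    """Turn '{a,"b c"}' into {'a', 'b c'} (hypothetical helper)."""
    items = (item.strip().strip('"') for item in cell.strip("{}").split(","))
    # Skip empty items so that "{}" becomes an empty set.
    return {item for item in items if item}


host["amenities"] = host["amenities"].apply(to_set)

# Union the sets within each city, then count the distinct amenities.
amenities_by_city = (
    host.groupby("city")["amenities"]
    .apply(lambda sets: len(functools.reduce(set.union, sets)))
    .reset_index()
)
print(amenities_by_city)
```

On this sample, LA ends up with 5 distinct amenities and NYC with 4, since "Air conditioning" appears in both NYC rows and is counted once.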
Output:
  city  amenities
0   LA         27
1  NYC         17
2   SF         29
Get the city with the most amenities as follows:
city_with_most_amenities = amenities_by_city.query("amenities == amenities.max()")
Output:
  city  amenities
2   SF         29
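If only the city name is needed rather than a one-row frame (an assumption about what the puzzle expects), one alternative is idxmax on a Series indexed by city. The tie handling differs: the query form above returns every city that shares the maximum, while idxmax returns only the first one. The sketch below reuses the counts from the output shown earlier.

```python
import pandas as pd

# Counts copied from the output shown earlier in the answer.
amenities_by_city = pd.DataFrame(
    {"city": ["LA", "NYC", "SF"], "amenities": [27, 17, 29]}
)

# Index by city so that idxmax returns the city label directly.
counts = amenities_by_city.set_index("city")["amenities"]
top_city = counts.idxmax()
print(top_city, counts.max())  # SF 29

# Boolean-mask form of the same selection as the query; keeps ties.
all_top = amenities_by_city[amenities_by_city["amenities"] == counts.max()]
print(all_top)
```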
When reposting, please credit the source. Link to this article: https://www.uj5u.com/qiye/347387.html
