I'm trying to manipulate some data frames, and I wrote a function to compute the distance between two cities.
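from geopy.distance import geodesic
from opencage.geocoder import OpenCageGeocode

def find_distance(A, B):
    key = '0377f0e6b42a47fe9d30a4e9a2b3bb63'  # get api key from: https://opencagedata.com
    geocoder = OpenCageGeocode(key)
    # Geocode each city and keep the coordinates of the first match
    result_A = geocoder.geocode(A)
    lat_A = result_A[0]['geometry']['lat']
    lng_A = result_A[0]['geometry']['lng']
    result_B = geocoder.geocode(B)
    lat_B = result_B[0]['geometry']['lat']
    lng_B = result_B[0]['geometry']['lng']
    # Geodesic distance between the two points, truncated to whole kilometers
    return int(geodesic((lat_A, lng_A), (lat_B, lng_B)).kilometers)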
Here is my data frame:
2 32 Mulhouse 1874.0 2 797 16.8 16,3 € 10.012786
13 13 Saint-étienne 1994.0 3 005 14.3 13,5 € 8.009882
39 39 Roubaix 2845.0 2 591 17.4 15,0 € 6.830968
27 27 Perpignan 2507.0 3 119 15.1 13,3 € 6.727255
40 40 Tourcoing 3089.0 2 901 17.5 15,3 € 6.327547
25 25 Limoges 2630.0 2 807 14.2 12,5 € 6.030424
20 20 Le Mans 2778.0 3 202 14.4 12,3 € 5.789559
And here is my code:
import pandas as pd

def clean_text(row):
    # return the list of decoded cells in the Series instead
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]

def main():
    inFile = "prix_m2_france.xlsx"  # open the Excel file
    inSheetName = "Sheet1"  # name of the sheet
    cols = ['Ville', 'Prix_moyen', 'Loyer_moyen']  # the columns to clean
    df = pd.read_excel(inFile, sheet_name=inSheetName)
    df[cols] = df[cols].replace({'€': '', ",": ".", " ": "", "\u202f": ""}, regex=True)
    # df['Prix_moyen'] = df.apply(clean_text)
    # df['Loyer_moyen'] = df.apply(clean_text)
    df['Prix_moyen'] = df['Prix_moyen'].astype(float)
    df['Loyer_moyen'] = df['Loyer_moyen'].astype(float)
    # df["Prix_moyen"] = 1
    df["revenu"] = (df['Loyer_moyen'] * 12) / (df["Prix_moyen"] * 1.0744) * 100
    # df['Ville'].replace({'Le-Havre': 'Le Havre', 'Le-Mans': 'Le Mans'})
    df["Ville"] = df['Ville'].replace(['Le-Havre', 'Le-Mans'], ['Le Havre', 'Le Mans'])
    df["distance"] = find_distance("Paris", df["Ville"])
    df2 = df.sort_values(by='revenu', ascending=False)
    print(df2.head(90))

main()
The line df["distance"] = find_distance("Paris", df["Ville"]) fails with this error:
opencage.geocoder.InvalidInputError: Input must be an unicode string, not 0 Paris 1 Marseille 2 Lyon 3 T
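I imagined it working like a loop that fills in the distance between Paris and each city, but I guess it passes the whole data frame column as the first value instead.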
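That guess can be checked without calling the geocoding API. The snippet below is only a minimal sketch, assuming pandas is installed and using a made-up three-city column instead of the real spreadsheet: df["Ville"] is a whole pandas Series, not a single string, so the function receives every city at once.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({"Ville": ["Paris", "Marseille", "Lyon"]})

# This is what find_distance("Paris", df["Ville"]) receives as its second argument.
arg = df["Ville"]
print(type(arg).__name__)    # the whole column object
print(isinstance(arg, str))  # the geocoder expects a single str

# A single cell of the column, by contrast, is a plain string.
first = df["Ville"].iloc[0]
print(type(first).__name__)
```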
Thanks for your help.
(Edit: I have just pasted part of my data frame.)
A uj5u.com user replied:
You can try the following:
df["distance"] = [find_distance("Paris", city) for city in df["Ville"]]
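This works because the comprehension calls the function once per city, so each call gets a single string. Here is a rough sketch of the same pattern that runs offline. Everything in it is an assumption made for illustration: COORDS holds approximate hand-entered coordinates, and offline_distance is a hypothetical stand-in that uses the haversine formula rather than the OpenCage lookup and geopy's geodesic from the question.

```python
import math
import pandas as pd

# Hypothetical, approximate coordinates so the sketch runs without an API key.
COORDS = {
    "Paris": (48.8566, 2.3522),
    "Lyon": (45.7640, 4.8357),
    "Marseille": (43.2965, 5.3698),
}

def offline_distance(a, b):
    """Stand-in for find_distance: great-circle (haversine) distance in km."""
    if not isinstance(a, str) or not isinstance(b, str):
        raise TypeError("each argument must be a single city name")
    lat1, lon1 = map(math.radians, COORDS[a])
    lat2, lon2 = map(math.radians, COORDS[b])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return int(2 * 6371 * math.asin(math.sqrt(h)))  # 6371 km: mean Earth radius

df = pd.DataFrame({"Ville": ["Paris", "Lyon", "Marseille"]})

# One call per city, as in the list comprehension above.
df["distance"] = [offline_distance("Paris", city) for city in df["Ville"]]

# Equivalent spelling with Series.apply.
via_apply = df["Ville"].apply(lambda city: offline_distance("Paris", city))

print(df)
```

With the real find_distance, every call geocodes "Paris" again, so for a long list of cities it may be worth looking Paris up once and reusing its coordinates.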
Tags: Python, python-3.x, pandas, dataframe
