I have the following CSV file, which I am reading into a pandas DataFrame:
Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15`, a43, "55, 10", 330
I am extracting the data as follows:
import pandas as pd
import matplotlib.pyplot as plt
fname = "data.csv"
data = pd.read_csv(fname,sep=",", header=None, skiprows=1)
data.columns = ["Timestamp", "UTC", "Callsign", "Position", "Speed", "Direction"]
t = data["Timestamp"]
utc = data["UTC"]
acid = data["Callsign"]
pos = data["Position"]
spd = data["Speed"]
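However, for the Position column, this leaves 2 values in every row of that column. I would like the first position value in one list and the second value in a separate list, like this: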
latitude = [52,53,54,55]
longitude = [13,12,11,10]
How can I select this using a pandas DataFrame?
A uj5u.com user replied:
If you need 2 new columns, use Series.str.strip and Series.str.split, then convert to float:
data[['lat','lon']] = (data["Position"].str.strip('"')
.str.split(r',\s+', expand=True)
.astype(float))
print (data)
Timestamp UTC Callsign Position Speed lat lon
0 1 12z q20 "52, 13" 320 52.0 13.0
1 2 13z a32 "53, 12" 321 53.0 12.0
2 3 14z q32 "54, 11" 321 54.0 11.0
3 4 15` a43 "55, 10" 330 55.0 10.0
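For readers who want to try this without the CSV file, here is a minimal self-contained sketch of the same strip/split/cast chain. The inline DataFrame is a hypothetical stand-in for the data above, not the asker's actual file, and it assumes the quotes are literally part of each Position string.

```python
import pandas as pd

# Hypothetical inline sample mirroring the Position column shown above;
# the double quotes are part of each string value.
data = pd.DataFrame({
    "Callsign": ["q20", "a32", "q32", "a43"],
    "Position": ['"52, 13"', '"53, 12"', '"54, 11"', '"55, 10"'],
})

# Strip the literal quotes, split on a comma followed by whitespace,
# then cast both resulting columns to float.
data[["lat", "lon"]] = (
    data["Position"]
    .str.strip('"')
    .str.split(r",\s+", expand=True)
    .astype(float)
)

latitude = data["lat"].tolist()
longitude = data["lon"].tolist()
print(latitude, longitude)
```

The pattern requires at least one whitespace character after the comma; if some rows might have none, `r",\s*"` would be the more forgiving choice.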
If you need 2 lists:
lat, lon = (data["Position"].str.strip('"')
.str.split(r',\s+', expand=True)
.astype(float)
.to_numpy()
.T.tolist())
print (lat, lon)
[52.0, 53.0, 54.0, 55.0] [13.0, 12.0, 11.0, 10.0]
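A possible alternative is to handle the quoting at read time. The sketch below assumes the file looks exactly like the sample in the question (a space after every comma) and relies on the `skipinitialspace=True` option of `read_csv`, which lets pandas recognise the quoted field so the position arrives as one unquoted string. The `csv_text` variable and the use of `io.StringIO` are illustrative stand-ins for the real file.

```python
import io
import pandas as pd

# Illustrative copy of the sample file from the question.
csv_text = '''Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15`, a43, "55, 10", 330
'''

# skipinitialspace=True drops the blank after each delimiter, so the
# double quotes are treated as quoting characters and "52, 13" stays
# together as a single field.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Split the position string on the comma (plus optional whitespace)
# and convert both halves to float.
parts = df["loc"].str.split(r",\s*", expand=True).astype(float)
latitude = parts[0].tolist()
longitude = parts[1].tolist()
print(latitude, longitude)
```

With this approach the `str.strip('"')` step is no longer needed, because the quotes never reach the DataFrame.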
A uj5u.com user replied:
We can use str.extract here, followed by a cast:
data[["lat", "lng"]] = data["Position"].str.extract(r'(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)')
data["lat"] = data["lat"].astype(float)
data["lng"] = data["lng"].astype(float)
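To get the two plain lists the question asks for, the extracted columns can be converted with `tolist()`. The following is a small sketch on a hypothetical inline Series rather than the asker's file; the names `positions`, `pattern`, and `coords` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of the Position values, quotes included.
positions = pd.Series(
    ['"52, 13"', '"53, 12"', '"54, 11"', '"55, 10"'], name="Position"
)

# Two capture groups: an optionally signed number with an optional
# decimal part, a comma, optional whitespace, then a second number.
pattern = r"(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)"

# With two groups, str.extract returns a two-column DataFrame.
coords = positions.str.extract(pattern).astype(float)
coords.columns = ["lat", "lng"]

latitude = coords["lat"].tolist()
longitude = coords["lng"].tolist()
print(latitude, longitude)
```

Because the pattern matches only the numbers, the surrounding quotes do not need to be stripped first, and negative or decimal coordinates are matched as well.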
