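I hope you are doing well. I have a question I need help with. I am working with some data in pandas.

I want to get the 2 points whose distance is closest to a value entered by the user.
For example, if I enter d=4, I want to output the result ((C:18 and F:14) or (B:3 and D:7)) as fast as possible. The approach I implemented is grade-school level, so I am too embarrassed to post it.
A solution in pandas or pyspark would help me. Thank you very much.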
uj5u.com user reply:
We can try merge:
d = 4  # user input; named d rather than input to avoid shadowing the built-in
out = df.assign(key=df['Value'] - d).merge(df.assign(key=df['Value']), on='key')
out
Out[59]:
  Name_x  Value_x  key Name_y  Value_y
0      C       18   14      F       14
1      D        7    3      B        3
2      E       11    7      D        7
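The answer does not show how df is built, so here is a minimal runnable sketch of the same merge. The sample data is an assumption: the column names Name and Value come from the output above, and the values are borrowed from the other answer on this page. Note that this join only matches pairs whose difference is exactly d; it does not rank near misses.

```python
import pandas as pd

# Assumed sample data: column names taken from the merge output,
# values borrowed from the other answer on this page.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Value": [12, 3, 18, 7, 11, 14, 5],
})

d = 4  # user input

# Left side: key = Value - d. Right side: key = Value.
# Rows join when Value_x - d == Value_y, i.e. the pair is exactly d apart.
pairs = df.assign(key=df["Value"] - d).merge(df.assign(key=df["Value"]), on="key")

# Show the first two matching pairs.
print(pairs[["Name_x", "Value_x", "Name_y", "Value_y"]].head(2))
```

If no pair is exactly d apart, pairs is empty, so a fallback that ranks pairs by how far their difference is from d would still be needed.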
uj5u.com user reply:
This is a little involved, so I built a class to hold all the methods. Each method should be self-explanatory. It uses heapq and itertools.
import heapq
from itertools import combinations

import pandas as pd


class ClosestDistances:
    """
    :arg data: pd.DataFrame
    :arg user_selection: int
    :arg points: int
    :return list[tuple(dict, dict)]
    """

    def __init__(self, **kwargs):
        df = kwargs.get("data")
        self.user_selection = kwargs.get("user_selection")
        self.points = kwargs.get("points")
        # Map each letter to its number, e.g. {"A": 12, "B": 3, ...}
        self.df_mapping = dict(zip(df["letter"], df["number"]))

    def main(self) -> list:
        possible_combinations = self.possible_combinations()
        closest_points = self.nearest_difference(possible_combinations)
        return self.map_nearest(closest_points)

    def nearest_difference(self, combos: list) -> list:
        # Keep the combinations whose difference is nearest to the user's value.
        return heapq.nsmallest(self.points, combos, key=lambda x: abs((x[0] - x[1]) - self.user_selection))

    def possible_combinations(self) -> list:
        # Every combination of values, each sorted in descending order.
        return [sorted(x, reverse=True) for x in combinations(self.df_mapping.values(), self.points)]

    def get_keys(self) -> dict:
        # Reverse lookup from number to letter (assumes the numbers are unique).
        return {v: k for k, v in self.df_mapping.items()}

    def map_nearest(self, closest_points: list) -> list:
        keys = self.get_keys()
        iterator = iter([{keys.get(x): x} for i in closest_points for x in i])
        # Zipping the iterator with itself groups consecutive items into pairs.
        return list(zip(iterator, iterator))


data = pd.DataFrame({
    "letter": ["A", "B", "C", "D", "E", "F", "G"],
    "number": [12, 3, 18, 7, 11, 14, 5]
})
closest = ClosestDistances(data=data, user_selection=4, points=2).main()
print(closest)
[({'D': 7}, {'B': 3}), ({'C': 18}, {'F': 14})]
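Enumerating every combination costs O(n²) pairs, which may matter since the question asks for speed. A possible alternative, sketched here in plain Python, sorts the values once and uses bisect to look only at the few neighbours of value + d for each point. The function name closest_pairs and its parameters are hypothetical, and the sketch assumes d >= 0.

```python
import bisect
import heapq


def closest_pairs(values_by_name, d, k=2):
    """Return up to k ((name, value), (name, value)) pairs, larger value first,
    whose difference is nearest to d. Assumes d >= 0."""
    items = sorted(values_by_name.items(), key=lambda item: item[1])
    values = [value for _, value in items]
    n = len(items)

    candidates = []
    for i, (low_name, low_value) in enumerate(items):
        # Position where low_value + d would sit among the larger values.
        pos = bisect.bisect_left(values, low_value + d, i + 1)
        # The k best partners for this point lie within k slots of pos.
        for j in range(max(i + 1, pos - k), min(n, pos + k)):
            high_name, high_value = items[j]
            error = abs((high_value - low_value) - d)
            candidates.append((error, (high_name, high_value), (low_name, low_value)))

    # Pick the k candidates whose difference is closest to d.
    best = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    return [(high, low) for _, high, low in best]


data = {"A": 12, "B": 3, "C": 18, "D": 7, "E": 11, "F": 14, "G": 5}
print(closest_pairs(data, d=4))
```

With the sample data, three pairs are exactly 4 apart, so which two are returned depends on tie order and may differ from the class above. The sort dominates the cost at O(n log n), with at most 2k candidates examined per point.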
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qukuanlian/529177.html
