I have a pandas DataFrame like this:
api                          region  base_path
https://apis.us/image/       us      /image
https://apis.emea/video/     emea    /video
https://apis.asia/docs/      asia    /docs
https://apis.emea/image/     emea    /image
https://apis.us/video/       us      /video
https://apis.us/docs/        us      /docs
https://apis.asia/location/  asia    /location
In the api list, a few APIs are common to more than one region. For example, /image is available in both us and emea. The output DataFrame I want looks like this:
api_us_emea             api_asia_us            api_asia_emea  api_us_emea_asia  api_usa  api_emea  api_asia
https://apis.us/image/  https://apis.us/docs/  No Common api  No Common api     N/A      N/A       https://apis.asia/location/
https://apis.us/video/
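Here, for the common APIs, I always want the us API to appear as the column value. For example, the api_us_emea column should contain only the us API, api_asia_emea the asia API, and api_us_emea_asia the us API. How can I achieve this?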
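Answer from a uj5u.com user:
I think this snippet should give you what you want, or at least a reasonable direction for solving your problem. It iterates over every possible subset of regions and collects all base_paths that are available in every region of that subset. It then removes the paths already used by a larger subset that contains the subset currently being examined. Hope this helps.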
from collections import defaultdict
from itertools import chain, combinations

import pandas as pd

data = [['https://apis.us/image/', 'us', '/image'],
        ['https://apis.emea/video/', 'emea', '/video'],
        ['https://apis.asia/docs/', 'asia', '/docs'],
        ['https://apis.emea/image/', 'emea', '/image'],
        ['https://apis.us/video/', 'us', '/video'],
        ['https://apis.us/docs/', 'us', '/docs'],
        ['https://apis.asia/location/', 'asia', '/location']]
df = pd.DataFrame(data, columns=['api', 'region', 'base_path'])

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def flatten(t):
    return [item for sublist in t for item in sublist]

new_dict = defaultdict(list)
# Walk the region subsets from largest to smallest.
for subset in reversed(list(powerset(pd.unique(df['region'])))):
    if len(subset) > 0:
        # Collect every base_path served by all regions in this subset.
        for api_path in pd.unique(df['base_path']):
            df_path = df[df['base_path'] == api_path]
            if set(subset).issubset(set(pd.unique(df_path['region']))):
                new_dict[subset].append(api_path)
        # Drop paths already claimed by a larger subset that contains this one.
        curr_keys = list(new_dict.keys())
        for key in curr_keys:
            if set(subset).issubset(set(key)) and len(key) > len(subset):
                for remove_path in [x for x in new_dict[subset] if x in new_dict[key]]:
                    new_dict[subset].remove(remove_path)

# One column per region subset; pd.Series pads the shorter columns with NaN.
new_df = pd.DataFrame({k: pd.Series(v) for k, v in new_dict.items()})
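new_df then has one column per region subset that matched at least one base_path, with any path already claimed by a larger subset removed.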
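The same "exact set of regions" grouping can probably be reached more directly with a groupby on base_path. The following is only a minimal sketch, not the answerer's method: the PRIORITY list (us, then asia, then emea) is an assumption read off the examples in the question, and group_by_region_set is a hypothetical helper name. Column names are built in PRIORITY order, so the question's api_asia_us shows up here as api_us_asia.

```python
import pandas as pd

data = [
    ["https://apis.us/image/", "us", "/image"],
    ["https://apis.emea/video/", "emea", "/video"],
    ["https://apis.asia/docs/", "asia", "/docs"],
    ["https://apis.emea/image/", "emea", "/image"],
    ["https://apis.us/video/", "us", "/video"],
    ["https://apis.us/docs/", "us", "/docs"],
    ["https://apis.asia/location/", "asia", "/location"],
]
df = pd.DataFrame(data, columns=["api", "region", "base_path"])

# Assumed priority, inferred from the question: prefer us, then asia, then emea.
# Every region in the data must appear in this list.
PRIORITY = ["us", "asia", "emea"]


def group_by_region_set(frame):
    """Map each exact set of regions to one representative URL per shared path."""
    groups = {}
    # sort=False keeps the base_paths in order of first appearance.
    for _, rows in frame.groupby("base_path", sort=False):
        regions = set(rows["region"])
        # Pick the highest-priority region that serves this path.
        chosen = next(r for r in PRIORITY if r in regions)
        url = rows.loc[rows["region"] == chosen, "api"].iloc[0]
        # Column name lists the regions in PRIORITY order, e.g. api_us_emea.
        column = "api_" + "_".join(r for r in PRIORITY if r in regions)
        groups.setdefault(column, []).append(url)
    return groups


groups = group_by_region_set(df)
# Wrapping each list in pd.Series pads the shorter columns with NaN.
wide = pd.DataFrame({name: pd.Series(urls) for name, urls in groups.items()})
```

Only region combinations that actually share a path get a column in this sketch. If "No Common api" placeholders are wanted for the remaining combinations, they would have to be added separately.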

Answer from a uj5u.com user:
Try this:
import functools
import itertools
import operator

import pandas as pd

# Assumes df is the DataFrame from the question (columns: api, region, base_path).

def find_common_elements(p):
    # Elements present in every list of p.
    return list(set.intersection(*[set(li) for li in p]))

def find_unique_elements(p, l):
    # Elements of l that occur exactly once across all lists of p.
    merged_p = functools.reduce(operator.iconcat, p, [])
    return [x for x in l if merged_p.count(x) == 1]

# Turn each URL into a (region, path) pair, e.g. ("us", "image").
strings_array = df["api"].str[:-1].str.split("/").str[-2:].apply(lambda x: (x[0][5:], x[1])).values

# Map each region to the list of paths it serves.
d = dict()
for region, path in strings_array:
    d.setdefault(region, []).append(path)

# A set has no guaranteed order, so the order of the regions (and of the rows below) may vary.
se = set([x[0] for x in strings_array])
combs = [list(itertools.combinations(se, i)) for i in range(1, len(se) + 1)]

col1, col2 = [], []
# Single regions: keep only the paths that no other region serves.
for item in combs[0]:
    col1.append("_".join(["api"] + list(item)))
    col2.append(["https://apis." + item[0] + "/" + s for s in find_unique_elements([d[c] for c in d.keys()], d[item[0]])])

# Combinations of two or more regions: keep the paths they all share.
for i in range(1, len(combs)):
    for item in combs[i]:
        common = find_common_elements([d[c] for c in item])
        col1.append("_".join(["api"] + list(item)))
        if len(common) > 0:
            col2.append(["https://apis." + item[0] + "/" + s for s in common])
        else:
            col2.append("No Common api")

output_df = pd.DataFrame({"col1": col1, "col2": col2})
output_df
Output:
   col1              col2
0  api_us            []
1  api_asia          [https://apis.asia/location]
2  api_emea          []
3  api_us_asia       [https://apis.us/docs]
4  api_us_emea       [https://apis.us/image, https://apis.us/video]
5  api_asia_emea     No Common api
6  api_us_asia_emea  No Common api
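The result above is in long form (one row per combination), while the question asked for one column per combination. A possible way to reshape it is sketched below. The col1/col2 values are copied from the output shown above, and to_wide is a hypothetical helper name, not part of the answer.

```python
import pandas as pd

# Values copied from the output shown above.
col1 = ["api_us", "api_asia", "api_emea", "api_us_asia",
        "api_us_emea", "api_asia_emea", "api_us_asia_emea"]
col2 = [[],
        ["https://apis.asia/location"],
        [],
        ["https://apis.us/docs"],
        ["https://apis.us/image", "https://apis.us/video"],
        "No Common api",
        "No Common api"]
output_df = pd.DataFrame({"col1": col1, "col2": col2})


def to_wide(frame):
    """Turn the (col1, col2) rows into one column per region combination."""
    columns = {}
    for name, value in zip(frame["col1"], frame["col2"]):
        # Wrap the "No Common api" marker so every entry is a list.
        items = value if isinstance(value, list) else [value]
        # An explicit dtype keeps empty lists from producing a float column.
        columns[name] = pd.Series(items, dtype="object")
    # Series of different lengths are aligned on their index and padded with NaN.
    return pd.DataFrame(columns)


wide_df = to_wide(output_df)
```

Combinations with an empty list end up as all-NaN columns here. Replacing those with "N/A", as in the question's desired output, could be done afterwards with fillna if that is what is wanted.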
