Remove the first occurrence of each repeated value in an array (Python, NumPy, Pandas, arrays)

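I have this NumPy array as my (final) result and I want to reduce it: whenever a value is repeated, I want to delete its first occurrence and keep the second, third, and any later occurrences.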

import base64
import hmac

import numpy as np
import pandas as pd


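key = bytes("800070FF00FF08012", "utf-8")
collision = []
for x in range(1, 1000001):
    msg = bytes(f"{x}", "utf-8")
    digest = hmac.new(key, msg, "sha256").digest()
    # Keep only the first 6 characters of the base64-encoded digest.
    code = base64.b64encode(digest).decode("utf-8")[:6]
    # Chain the HMAC: the next iteration is keyed with this digest.
    key = digest
    collision.append(code)

df = pd.DataFrame(collision)
# Keep every row whose code occurs more than once.
df = df[df.duplicated(keep=False)]
df_index = df.index.to_numpy()
df = df.values.flatten()
# Pair each original index with its code: shape (n, 2).
final = np.stack((df_index, df), axis=1)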

Results of the variable "final":

I HAVE:
[[14093 'JRp1kX']
 [43985 'KGlW7X']
 [59212 'pU97Tr']
 [90668 'ecTjTB']
 [140615 'JRp1kX']
 [218480 '25gtjT']
 [344174 'dtXg6E']
 [380467 'DdHQ3M']
 [395699 'vnFw/c']
 [503504 'dtXg6E']
 [531073 'KGlW7X']
 [633091 'ecTjTB']
 [671091 'vnFw/c']
 [672111 '25gtjT']
 [785568 'pU97Tr']
 [991540 'DdHQ3M']
 [991548 'JRp1kX']]


And I WANT TO HAVE:
[[140615 'JRp1kX']
 [503504 'dtXg6E']
 [531073 'KGlW7X']
 [633091 'ecTjTB']
 [671091 'vnFw/c']
 [672111 '25gtjT']
 [785568 'pU97Tr']
 [991540 'DdHQ3M']
 [991548 'JRp1kX']]

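I want to eliminate the first occurrence of each repeated value in the array. Does anyone have code that fits my case?

Put more simply: if you have the list [1,2,3,4,5,1,3,5,5], I want to get [2,4,1,3,5,5].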

A uj5u.com user replied:

import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, 1, 3, 5, 5])

# keep the unique rows
unique_mask = ~df.duplicated(keep=False)

# keep the repeated rows (skipping the first for each non-unique)
repeated_mask = df.duplicated()

df.loc[unique_mask | repeated_mask]

   0
1  2
3  4
5  1
6  3
7  5
8  5
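The same two masks can be applied to a frame that keeps the original positions as its index, which is closer to the situation in the question. The following is only a sketch: the helper name drop_first_of_repeated and the column name "code" are made up for illustration, and the rows are a small subset of the ones printed in the question.

```python
import pandas as pd


def drop_first_of_repeated(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Drop the first row of every value that repeats in `column`.

    Rows whose value occurs only once are kept unchanged.
    (Hypothetical helper, not part of pandas.)
    """
    # True for rows whose value occurs exactly once.
    unique_mask = ~df.duplicated(subset=[column], keep=False)
    # True for every occurrence except the first (keep="first" is the default).
    repeated_mask = df.duplicated(subset=[column])
    return df[unique_mask | repeated_mask]


# A subset of the rows shown in the question, indexed by their original position.
codes = pd.DataFrame(
    {"code": ["JRp1kX", "KGlW7X", "JRp1kX", "25gtjT", "KGlW7X", "JRp1kX"]},
    index=[14093, 43985, 140615, 218480, 531073, 991548],
)

result = drop_first_of_repeated(codes, "code")
print(result)
```

In this subset '25gtjT' occurs only once, so its row survives; the rows at 14093 and 43985 are dropped because they are the first occurrences of codes that appear again later.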

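A uj5u.com user replied:

final is a NumPy array, so you can use np.unique on the second column to get the index of each value's first occurrence together with its count. The counts let you avoid deleting values that occur only once.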

_, idx, counts = np.unique(final[:, 1], return_index=True, return_counts=True)
idx = idx[counts > 1]
final = np.delete(final, idx, axis=0)

This works for the two-dimensional ndarray final. For your second example, a one-dimensional array, use:

_, idx, counts = np.unique(final, return_index=True, return_counts=True)
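Put together, the two variants might look like the sketch below. The helper name first_indices_of_repeated is an assumption made for this example, the 2-D rows are a subset of those printed in the question, and the .astype(str) conversion is an extra precaution that the answer itself does not use.

```python
import numpy as np


def first_indices_of_repeated(keys: np.ndarray) -> np.ndarray:
    """Return the index of the first occurrence of every value that repeats.

    (Hypothetical helper built on np.unique.)
    """
    # return_index gives the first occurrence of each unique value,
    # return_counts tells us which values occur more than once.
    _, idx, counts = np.unique(keys, return_index=True, return_counts=True)
    return idx[counts > 1]


# One-dimensional example from the question.
values = np.array([1, 2, 3, 4, 5, 1, 3, 5, 5])
reduced = np.delete(values, first_indices_of_repeated(values))
print(reduced)

# Two-dimensional example: (original index, code) rows, object dtype as in the question.
final = np.array(
    [
        [14093, "JRp1kX"],
        [43985, "KGlW7X"],
        [140615, "JRp1kX"],
        [218480, "25gtjT"],
        [531073, "KGlW7X"],
        [991548, "JRp1kX"],
    ],
    dtype=object,
)
# Assumption: compare the codes as a plain string array rather than as objects.
codes = final[:, 1].astype(str)
reduced_final = np.delete(final, first_indices_of_repeated(codes), axis=0)
print(reduced_final)
```

Note that np.delete returns a new array in both cases, and the surviving rows keep their original order.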

A uj5u.com user replied:

Maybe you could write a for loop.

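to_remove = []

for i in range(len(your_list)):
    # Mark only the first occurrence of a value that appears again later.
    if your_list[i] not in your_list[:i] and your_list[i] in your_list[i + 1:]:
        to_remove.append(i)

removed_count = 0
for i in to_remove:
    # Each earlier deletion shifted the remaining items down by one.
    del your_list[i - removed_count]
    removed_count += 1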

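You cannot call del right away in the first loop: i moves on to the next index regardless, so every deletion would cause the item that slides into the freed position to be skipped.

[i - removed_count] is needed because every time an item at a lower index is deleted, all higher indices immediately drop by one.

I think this could be written more efficiently, but it should work, perhaps with a few small changes.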
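Both effects can be seen on a tiny list. This is only an illustrative sketch: the function names are made up for the demonstration, and the first one is buggy on purpose.

```python
def delete_while_iterating(items, targets):
    """Buggy on purpose: deletes from the list that the loop is walking."""
    for i, item in enumerate(items):
        if item in targets:
            # After this deletion the next item slides into position i,
            # but the loop continues at i + 1 and never looks at it.
            del items[i]
    return items


def delete_with_offset(items, indices):
    """Delete the given ascending indices, compensating for the shift."""
    removed_count = 0
    for i in indices:
        del items[i - removed_count]
        removed_count += 1
    return items


naive = delete_while_iterating([1, 1, 2, 3], {1})
print(naive)    # the second 1 was skipped and is still in the list

correct = delete_with_offset([1, 1, 2, 3], [0, 1])
print(correct)  # both leading items were removed
```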
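One possible way to make it more efficient is a single counting pass followed by a single filtering pass, instead of slicing the list for every element. This is a sketch rather than the answerer's method: the function name is hypothetical, and it assumes the values are hashable (true for the numbers and strings in the question).

```python
from collections import Counter


def without_first_of_repeated(values):
    """Return a new list without the first occurrence of each repeated value.

    Values that occur only once are kept. (Hypothetical helper.)
    """
    counts = Counter(values)  # how often each value occurs
    already_skipped = set()   # repeated values whose first occurrence was dropped
    result = []
    for value in values:
        if counts[value] > 1 and value not in already_skipped:
            # First occurrence of a repeated value: skip it once.
            already_skipped.add(value)
            continue
        result.append(value)
    return result


your_list = [1, 2, 3, 4, 5, 1, 3, 5, 5]
print(without_first_of_repeated(your_list))
```

Because it builds a new list instead of deleting in place, no index bookkeeping is needed and the input list is left untouched.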

A uj5u.com user replied:

After creating df, add the following lines:

df = pd.DataFrame(collision)
# ... your code ends here
removed_already = set()  # values whose first occurrence was already dropped
for idx in df[df.duplicated(keep=False)].index:
    value = df.at[idx, 0]
    if value not in removed_already:
        # First time this repeated value is seen: drop that row only.
        removed_already.add(value)
        df.drop(index=idx, inplace=True)
# your code continues
df_index = df.index.to_numpy()
df = df.values.flatten()
final = np.stack((df_index, df), axis=1)

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qukuanlian/530651.html