I have a very large JSON file with thousands of lines that look like this (scraped):
[
{"result": ["/results/1138/dundalk-aw/2022-03-11/806744", "/results/1138/dundalk-aw/2022-03-11/806745", "/results/1138/dundalk-aw/2022-03-11/806746", "/results/1138/dundalk-aw/2022-03-11/806747", "/results/1138/dundalk-aw/2022-03-11/806748", "/results/1138/dundalk-aw/2022-03-11/806749", "/results/1138/dundalk-aw/2022-03-11/806750", "/results/1138/dundalk-aw/2022-03-11/806751", "/results/14/exeter/2022-03-11/804190", "/results/14/exeter/2022-03-11/804193", "/results/14/exeter/2022-03-11/804194", "/results/14/exeter/2022-03-11/804192", "/results/14/exeter/2022-03-11/804196", "/results/14/exeter/2022-03-11/804191", "/results/14/exeter/2022-03-11/804195", "/results/30/leicester/2022-03-11/804201", "/results/30/leicester/2022-03-11/804200", "/results/30/leicester/2022-03-11/804198", "/results/30/leicester/2022-03-11/804197", "/results/30/leicester/2022-03-11/804199", "/results/30/leicester/2022-03-11/804202", "/results/37/newcastle/2022-03-11/804181", "/results/37/newcastle/2022-03-11/804179", "/results/37/newcastle/2022-03-11/804182", "/results/37/newcastle/2022-03-11/804180", "/results/37/newcastle/2022-03-11/804177", "/results/37/newcastle/2022-03-11/804176", "/results/37/newcastle/2022-03-11/804178", "/results/513/wolverhampton-aw/2022-03-11/804352", "/results/513/wolverhampton-aw/2022-03-11/804353", "/results/513/wolverhampton-aw/2022-03-11/806925", "/results/513/wolverhampton-aw/2022-03-11/804350", "/results/513/wolverhampton-aw/2022-03-11/804354", "/results/513/wolverhampton-aw/2022-03-11/804349", "/results/513/wolverhampton-aw/2022-03-11/804351", "/results/1303/al-ain/2022-03-11/806926", "/results/1244/goulburn/2022-03-11/807045", "/results/869/sakhir/2022-03-11/806948", "/results/1244/goulburn/2022-03-11/807045", "/results/869/sakhir/2022-03-11/806948"]},
{"result": ["/results/8/carlisle/2022-03-10/804174", "/results/8/carlisle/2022-03-10/804172", "/results/8/carlisle/2022-03-10/804170", "/results/8/carlisle/2022-03-10/804175", "/results/8/carlisle/2022-03-10/804171", "/results/8/carlisle/2022-03-10/804173", "/results/8/carlisle/2022-03-10/805620", "/results/1353/newcastle-aw/2022-03-10/804340", "/results/1353/newcastle-aw/2022-03-10/804341", "/results/1353/newcastle-aw/2022-03-10/804338", "/results/1353/newcastle-aw/2022-03-10/804342", "/results/1353/newcastle-aw/2022-03-10/804337", "/results/1353/newcastle-aw/2022-03-10/804339", "/results/394/southwell-aw/2022-03-10/804346", "/results/394/southwell-aw/2022-03-10/804344", "/results/394/southwell-aw/2022-03-10/804345", "/results/394/southwell-aw/2022-03-10/804348", "/results/394/southwell-aw/2022-03-10/806779", "/results/394/southwell-aw/2022-03-10/804343", "/results/394/southwell-aw/2022-03-10/804347", "/results/394/southwell-aw/2022-03-10/806778", "/results/198/thurles/2022-03-10/806623", "/results/198/thurles/2022-03-10/806624", "/results/198/thurles/2022-03-10/806625", "/results/198/thurles/2022-03-10/806626", "/results/198/thurles/2022-03-10/806627", "/results/198/thurles/2022-03-10/806628", "/results/198/thurles/2022-03-10/806629", "/results/90/wincanton/2022-03-10/804183", "/results/90/wincanton/2022-03-10/804186", "/results/90/wincanton/2022-03-10/804188", "/results/90/wincanton/2022-03-10/804185", "/results/90/wincanton/2022-03-10/804187", "/results/90/wincanton/2022-03-10/804184", "/results/90/wincanton/2022-03-10/804189", "/results/219/saint-cloud/2022-03-10/807032", "/results/219/saint-cloud/2022-03-10/806812", "/results/219/saint-cloud/2022-03-10/806837", "/results/219/saint-cloud/2022-03-10/807033", "/results/219/saint-cloud/2022-03-10/807037", "/results/219/saint-cloud/2022-03-10/807041", "/results/219/saint-cloud/2022-03-10/807042", "/results/219/saint-cloud/2022-03-10/807043", "/results/219/saint-cloud/2022-03-10/807044", 
"/results/219/saint-cloud/2022-03-10/806837", "/results/219/saint-cloud/2022-03-10/807033"]}
]
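Now, there are some duplicates in the "result" arrays. In this case, for example, /results/1244/goulburn/2022-03-11/807045 appears twice.
How can I filter out these duplicates? The solutions I found on Stackoverflow check for duplicate "result" arrays, not for duplicated items inside an array. At least, nothing I tried worked, though I suspect I messed something up. I have been trying for two days and cannot figure it out myself, or I am too stupid to find the answer in similar questions on Stackoverflow, and my Java knowledge is very limited.
I considered converting the JSON to a list and then filtering out the duplicates, but that seems clunky for a large file?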
A uj5u.com user replied:
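Once all the JSON data is loaded, you can map over the records, remove the duplicates from each "result" list with set, and convert back to list to preserve the original structure: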
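data = [{...}]  # placeholder: the list of records loaded from the large JSON file
# Rebuild each record with a de-duplicated 'result' list (set() does not preserve order)
data = list(map(lambda x: {'result': list(set(x['result']))}, data))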
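If the original order of the paths matters, the set-based approach above will scramble it. A minimal sketch of an order-preserving variant follows, assuming Python 3.7+ (where dict keeps insertion order). The helper name dedupe_results and the shortened sample data are illustrative and not part of the original answer.

```python
import json

def dedupe_results(records):
    """Return new records whose 'result' lists keep only the first
    occurrence of each path, in the original order."""
    # dict.fromkeys() keeps one key per distinct value, in first-seen order.
    return [{**rec, 'result': list(dict.fromkeys(rec['result']))}
            for rec in records]

# Shortened sample in the same shape as the file from the question.
sample = json.loads("""
[
  {"result": ["/results/1244/goulburn/2022-03-11/807045",
              "/results/869/sakhir/2022-03-11/806948",
              "/results/1244/goulburn/2022-03-11/807045",
              "/results/869/sakhir/2022-03-11/806948"]}
]
""")

cleaned = dedupe_results(sample)
print(json.dumps(cleaned, indent=2))
```

For a real file, the sample would be replaced by the result of json.load() on the opened file; that still reads the whole file into memory, which is usually acceptable for a few thousand records.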
A uj5u.com user replied:
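You can easily use Python's set data type: loop over the dicts in the file one by one and add their items (using update()) to a set variable. Sadly, I am too lazy to write the code that would solve your problem, but you can read more about sets and write it yourself (W3-School: How to add items to a set).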
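The loop described in this answer could look roughly like the sketch below. The function name dedupe_with_set and the sample records are hypothetical, and sorting the final list is an added choice (so that the output is deterministic), not something the answer asks for.

```python
def dedupe_with_set(records):
    """Remove duplicate paths from each record's 'result' list in place."""
    for rec in records:
        unique = set()
        # set.update() adds every item of the list; repeats are ignored.
        unique.update(rec['result'])
        # A set has no defined order, so sort to get a stable list back.
        rec['result'] = sorted(unique)
    return records

# Hypothetical records in the same shape as the question's data.
records = [
    {"result": ["/results/14/exeter/2022-03-11/804193",
                "/results/14/exeter/2022-03-11/804190",
                "/results/14/exeter/2022-03-11/804193"]},
]

deduped = dedupe_with_set(records)
print(deduped)
```

This removes duplicates within each "result" list separately, as the question asks. To drop paths repeated across different records as well, the set would instead be created once before the loop.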
