Pytest: comparing two JSON files

I have an API that creates a JSON file like the following:

"tesla_2.0": {
        "kind": "Auto",
        "tar_path": "/home/scripts/project_2/tesla_2.0.zip",
        "version": "2.0",
        "yaml_path": "/home/scripts/project_2/test.yaml",
        "name": "tesla"
    }

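Because I read it from a file with json.load(), the order in which the objects were saved is lost unless I tell it to load into an OrderedDict().

Is there a simple and efficient way to compare the files?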
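
To illustrate the ordering point, here is a minimal sketch. The two sample strings are made up for the example. `json.loads` (like `json.load`) accepts an `object_pairs_hook`, and passing `OrderedDict` keeps the keys in file order. Note that two plain dicts compare equal regardless of key order, whereas two `OrderedDict` objects do not:

```python
import json
from collections import OrderedDict

# Two hypothetical documents with the same content but different key order.
text_a = '{"name": "tesla", "version": "2.0"}'
text_b = '{"version": "2.0", "name": "tesla"}'

plain_a, plain_b = json.loads(text_a), json.loads(text_b)
ordered_a = json.loads(text_a, object_pairs_hook=OrderedDict)
ordered_b = json.loads(text_b, object_pairs_hook=OrderedDict)

plain_equal = plain_a == plain_b        # dict equality ignores key order
ordered_equal = ordered_a == ordered_b  # OrderedDict equality is order-sensitive
keys_in_file_order = list(ordered_b.keys())
```

So if only the content matters, loading into plain dicts and comparing them is usually enough; an `OrderedDict` is only needed when the order itself is part of what should be checked.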

import json
import os

def compare_json_files(file_1, file_2):
    if not os.path.isfile(file_1):
        raise FileNotFoundError("File not found: {}".format(file_1))
    if not os.path.isfile(file_2):
        raise FileNotFoundError("File not found: {}".format(file_2))
    with open(file_1, 'r') as f1:
        data_1 = json.load(f1)  # json.load reads from a file object
    with open(file_2, 'r') as f2:
        data_2 = json.load(f2)
    # comparison operation goes here

Python version: 3.5.2

Answer:

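I believe you can check every key and value. You should first check that the key sets on both sides are equal; only then does a key-by-key comparison make sense.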

# Both files must contain exactly the same set of keys
assert data_1.keys() == data_2.keys()
err_log = ['Err log:']
for k, v in data_1.items():
    try:
        assert v == data_2[k]
    except AssertionError:
        # Record the mismatch instead of stopping at the first one
        err_log.append('Error caught for key={}, data_1 value={}, data_2 value={}'.format(k, v, data_2[k]))
for e in err_log:
    print(e)
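
The same idea can be wrapped into a helper that also reports keys present on only one side, rather than failing on the first key-set assertion. This is only a sketch: `diff_dicts` and the sample data are hypothetical names, not part of the original answer, and it assumes the keys are strings (as they always are for JSON objects) so that they can be sorted.

```python
def diff_dicts(data_1, data_2):
    """Return a list of human-readable differences between two dicts."""
    errors = []
    # Keys present on only one side.
    for k in sorted(data_1.keys() - data_2.keys()):
        errors.append("key {!r} only in data_1".format(k))
    for k in sorted(data_2.keys() - data_1.keys()):
        errors.append("key {!r} only in data_2".format(k))
    # Keys on both sides whose values differ.
    for k in sorted(data_1.keys() & data_2.keys()):
        if data_1[k] != data_2[k]:
            errors.append("key {!r}: {!r} != {!r}".format(k, data_1[k], data_2[k]))
    return errors


# Hypothetical sample data.
left = {"name": "tesla", "version": "2.0", "kind": "Auto"}
right = {"name": "tesla", "version": "2.1", "extra": True}
differences = diff_dicts(left, right)
```

In a pytest test, something like `assert not diff_dicts(data_1, data_2)` should then show the collected differences in the failure report.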

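Edit 3: Tested on very large dictionaries.

For very large dictionaries, itemgetter with a sorted list of keys gives the best results.

Iterating over all the keys of the dictionary is the slowest. Iterating with a sorted list of keys seems to perform slightly better.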

Results:

n = 100:
- 4e-3 s
- 3e-3 s
- 1.5e-3 s
- 1.9e-3 s
n = 1,000,000:
- 8.9 s
- 9.2 s
- 7.2 s
- 6.9 s
n = 10,000,000:
- 143 s
- 130 s
- 115 s
- 99 s

from copy import deepcopy
from time import time
from operator import itemgetter

n = 10000000
v = {"stuff": "here", "and": "there"}
# Same keys and values, inserted in opposite order
data_1 = {str(k): deepcopy(v) for k in range(0, n)}
data_2 = {str(k): deepcopy(v) for k in range(n-1, -1, -1)}

def get_time(f):
    # Decorator: the wrapped function returns the total time of 10 calls
    def _(*args, **kwargs):
        t_0 = time()
        for x in range(10):
            f(*args, **kwargs)
        return time() - t_0
    return _

def with_dict_keys(d):
    return d.keys()

def with_sorted_dict_keys(d):
    return sorted(d.keys())

@get_time
def order_n_compare(key_func, d, d_):
    # Compare the key sets first, then the values key by key
    k_d, k_d_ = key_func(d), key_func(d_)
    assert k_d == k_d_
    for k in k_d:
        assert d[k] == d_[k]

@get_time
def itemgetter_compare(key_func, d, d_):
    # Compare the key sets first, then all values in one tuple comparison
    k_d, k_d_ = key_func(d), key_func(d_)
    assert k_d == k_d_
    assert itemgetter(*k_d)(d) == itemgetter(*k_d)(d_)
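
For readers unfamiliar with `itemgetter`: called with several keys, it builds a callable that returns a tuple of the corresponding values, so a single tuple comparison replaces the Python-level loop. A minimal sketch with made-up data:

```python
from operator import itemgetter

d = {"a": 1, "b": 2, "c": 3}
d_ = {"c": 3, "b": 2, "a": 1}   # same content, different insertion order

keys = sorted(d.keys())
getter = itemgetter(*keys)      # fetches the values for all keys at once

values_d = getter(d)            # tuple of values in the order of `keys`
values_d_ = getter(d_)
same = values_d == values_d_
```

One caveat: with a single key, `itemgetter` returns the bare value rather than a one-element tuple, which does not affect the equality check but matters if the result is used as a tuple.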

Edit 0: Added a try & except block to print where the assertions fail.

Edit 1: Fixed a small bug.

Edit 2: Checked the computation time: the cost of the `dictionary.keys()` operation is negligible next to iterating through all keys in `data_1.items()`, which grows as O(n). So it is not really necessary to optimize it.

Note: Sorting dict.keys() is O(n log n), while in Python 3 dict.keys() itself only returns a view and does not copy the keys, so the sorting step dominates the cost of building the sorted key list.
