How to efficiently fix a JSON file converted from a pandas DataFrame

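I have a JSON file that I read into pandas and convert to a DataFrame. I then export it as a CSV so that I can edit it more easily. When I'm done, I read the CSV file back into a DataFrame and want to convert it back to a JSON file. However, in that process a lot of extra data is automatically added to my original list of dictionaries (the JSON file).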

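I'm sure I could hack together a fix, but does anyone know an efficient way to handle this process so that no new data or columns are added to my original JSON data?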

Original JSON (snippet):

  [
    {
        "tag": "!= (not-equal-to operator)",
        "definition": "",
        "source": [
            {
                "title": "Compare Dictionaries",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch08.html#idm45795007002280"
            }
        ]
    },
    {
        "tag": "\"intelligent\" applications",
        "definition": "",
        "source": [
            {
                "title": "Why Machine Learning?",
                "URL": "https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/https://learning.oreilly.com/library/view/introduction-to-machine/9781449369880/ch01.html#idm45613685872600"
            }
        ]
    },
    {
        "tag": "# (pound sign)",
        "definition": "",
        "source": [
            {
                "title": "Comment with #",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch04.html#idm45795038172984"
            }
        ]
    },

The CSV as a DataFrame (an index is added automatically):

    tag definition  source
0   != (not-equal-to operator)      [{'title': 'Compare Dictionaries', 'URL': 'htt...
1   "intelligent" applications      [{'title': 'Why Machine Learning?', 'URL': 'ht...
2   # (pound sign)      [{'title': 'Comment with #', 'URL': 'https://l...
3   $ (Mac/Linux prompt)        [{'title': 'Test Driving Python', 'URL': 'http...
4   $ (anchor)      [{'title': 'Patterns: Using Specifiers', 'URL'...
... ... ... ...
11375   { } (curly brackets)        []
11376   | (vertical bar)        [{'title': 'Combinations and Operators', 'URL'...
11377   || (concatenation) function (DB2/Oracle/Postgr...       [{'title': 'Discussion', 'URL': 'https://learn...
11378   || (for Oracle Database)        [{'title': 'Including special characters', 'UR...
11379   || (vertical bar, double), concatenation opera...       [{'title': 'Including special characters', 'UR...
7009 rows × 3 columns

The JSON file after converting from the CSV (bad in all kinds of ways):

{
  "0":{
    "Unnamed: 0":0,
    "tag":"!= (not-equal-to operator)",
    "definition":null,
    "source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
  },
  "1":{
    "Unnamed: 0":1,
    "tag":"\"intelligent\" applications",
    "definition":null,
    "source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
  },
  "2":{
    "Unnamed: 0":2,
    "tag":"# (pound sign)",
    "definition":null,
    "source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
  },

Here is my code:

import pandas as pd
import json

# to dataframe
tags_df = pd.read_json('dsa_tags_flat.json')

# csv file was manually cleaned then reloaded here
cleaned_csv_df = pd.read_csv('dsa-parser-flat.csv')

# write to JSON
cleaned_csv_df.to_json(r'dsa-tags.json', orient='index', indent=2)

EDIT: I added index=False to the code when going from the DataFrame to CSV, which helped, but the outer index keys that were not in the original JSON are still there. Is there a library function somewhere that would prevent this, or do I just have to write some loops and remove them myself?

Also, as you can see, the forward slashes in the URLs were escaped, which is not what I wanted.

{
    "0":{
        "tag":"!= (not-equal-to operator)",
        "definition":null,
        "source":"[{'title': 'Compare Dictionaries', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch08.html#idm45795007002280'}]"
    },
    "1":{
        "tag":"\"intelligent\" applications",
        "definition":null,
        "source":"[{'title': 'Why Machine Learning?', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/https:\/\/learning.oreilly.com\/library\/view\/introduction-to-machine\/9781449369880\/ch01.html#idm45613685872600'}]"
    },
    "2":{
        "tag":"# (pound sign)",
        "definition":null,
        "source":"[{'title': 'Comment with #', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/introducing-python-2nd\/9781492051374\/ch04.html#idm45795038172984'}]"
    },
    "3":{
        "tag":"$ (Mac\/Linux prompt)",
        "definition":null,
        "source":"[{'title': 'Test Driving Python', 'URL': 'https:\/\/learning.oreilly.com\/library\/view\/data-wrangling-with\/9781491948804\/ch01.html#idm140080973230480'}]"
    },

Answer:

The problem is that you are adding an index in two places.

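The first is when you write the file to CSV. This is what adds the "Unnamed: 0" field to the final JSON file. You can either pass index=False to the to_csv method when writing the CSV to disk, or specify the index_col argument of read_csv (for example index_col=0) when reading the saved CSV back in.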
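Both options can be checked in isolation with a minimal sketch. It assumes a hypothetical two-row sample instead of the real file, and uses an in-memory io.StringIO buffer (through a hypothetical helper, csv_round_trip) in place of the CSV on disk:

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real tag data
df = pd.DataFrame({
    "tag": ["!= (not-equal-to operator)", "# (pound sign)"],
    "definition": ["", ""],
})


def csv_round_trip(frame, write_kwargs, read_kwargs):
    """Hypothetical helper: write frame to an in-memory CSV and read it back."""
    buf = io.StringIO()
    frame.to_csv(buf, **write_kwargs)
    buf.seek(0)
    return pd.read_csv(buf, **read_kwargs)


# Default: the RangeIndex is written as an unnamed first column -> "Unnamed: 0"
broken = csv_round_trip(df, {}, {})

# Fix 1: don't write the index to the CSV at all
fixed_on_write = csv_round_trip(df, {"index": False}, {})

# Fix 2: write it, but tell read_csv that column 0 is the index
fixed_on_read = csv_round_trip(df, {}, {"index_col": 0})

print(broken.columns.tolist())          # ['Unnamed: 0', 'tag', 'definition']
print(fixed_on_write.columns.tolist())  # ['tag', 'definition']
print(fixed_on_read.columns.tolist())   # ['tag', 'definition']
```

index=False is the simpler choice when the row numbers carry no meaning, as here; index_col is useful when you cannot control how the CSV was written.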

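Second, you add an index when writing the DataFrame to JSON with orient="index". That is what adds the outermost keys such as "0" and "1" to the final JSON file. If you want to save the JSON in the same shape you loaded it in (a list of objects), use orient="records" instead.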
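The difference between the two orient values is easiest to see side by side. This is a minimal sketch on a hypothetical two-row sample (the real data also has a source column); to_json is called without a path so it returns the JSON text, which is then parsed with json.loads for inspection:

```python
import json

import pandas as pd

# Hypothetical two-row sample; the real data has a "source" column as well
df = pd.DataFrame({
    "tag": ["!= (not-equal-to operator)", "# (pound sign)"],
    "definition": ["", ""],
})

# With no path argument, to_json returns the JSON text as a string
as_index = json.loads(df.to_json(orient="index"))
as_records = json.loads(df.to_json(orient="records"))

# orient="index": a dict keyed by the row labels, i.e. the unwanted "0", "1", ...
print(list(as_index))  # ['0', '1']
# orient="records": a list of row dicts, the same shape as the original file
print(as_records[0])   # {'tag': '!= (not-equal-to operator)', 'definition': ''}
```

Applied to the question's code, the last line would become cleaned_csv_df.to_json('dsa-tags.json', orient='records', indent=2).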

For details on how the orient parameter works, see the pandas.DataFrame.to_json documentation.
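Three other symptoms in the question are not covered by the two fixes: the forward slashes in the URLs come out escaped (the question's output shows pandas writing / as \/), the source column comes back as a quoted string because the CSV stores each list as text, and the empty definition cells come back as null. One possible workaround, not part of the original answer, is sketched below. It assumes every source cell holds a valid Python list literal (such as []), parses that column with ast.literal_eval, and serializes with the standard library's json module, which does not escape forward slashes. An in-memory buffer stands in for the edited CSV file:

```python
import ast
import io
import json

import pandas as pd

# One record copied from the question's original JSON
original = [
    {
        "tag": "!= (not-equal-to operator)",
        "definition": "",
        "source": [
            {
                "title": "Compare Dictionaries",
                "URL": "https://learning.oreilly.com/library/view/introducing-python-2nd/9781492051374/ch08.html#idm45795007002280",
            }
        ],
    }
]

df = pd.DataFrame(original)

# In-memory stand-in for writing the CSV, editing it, and reading it back
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# keep_default_na=False keeps empty cells as "" instead of NaN (which becomes null)
cleaned = pd.read_csv(buf, keep_default_na=False)

# The CSV holds the Python repr of each list; parse it back into lists of dicts
cleaned["source"] = cleaned["source"].apply(ast.literal_eval)

# The stdlib json module does not escape "/", so the URLs stay as written
out = json.dumps(cleaned.to_dict(orient="records"), indent=4)
print(out)
```

To write a file instead, pass the same list to json.dump together with a handle opened with open("dsa-tags.json", "w", encoding="utf-8"). Note that ast.literal_eval raises an error on a completely empty cell, so rows whose source was blanked out during editing would need a fallback such as "[]".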
