Data Import and Preprocessing, Lab 2: Converting JSON-Format Files

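1. Lab Overview
[Objectives]

Gain a basic command of data collection methods;
Gain a basic command of scraping web data with a crawler;
Master conversion between different data formats.

[Environment] (materials, equipment, and software used) Linux or Windows, a MySQL database, and Python or another high-level language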

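2. Lab Content
Problem 1: Scraping Web Data
[Requirements]

Scrape the top 500 songs on the KuGou Music chart (https://www.kugou.com/), collecting each song's title, artist, and duration;
Save the scraped data as a JSON file;
Read back any data from the JSON file to verify that the file format is correct.

[Procedure] (steps, records, data, code, etc.)
Provide the steps taken and screenshots as evidence.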

from bs4 import BeautifulSoup
import requests
import time
import re
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

songs = []  # one dict per song, in the order the songs appear on the chart
keys = ['songName', 'singer', 'time']

def get_info(url, encoding):
    """Fetch one chart page and append its songs to the songs list."""
    res = requests.get(url, headers=headers)
    res.encoding = encoding  # decode the page with the site's own encoding
    soup = BeautifulSoup(res.text, 'lxml')
    titles = soup.select('a.pc_temp_songname')
    durations = soup.select('span.pc_temp_time')
    for title, duration in zip(titles, durations):
        # Titles look like "singer - song name"; split on the first ' - ' only,
        # so a song name that itself contains ' - ' does not raise an error
        singer, _, song_name = title.get_text().strip().partition(' - ')
        songs.append(dict(zip(keys, [song_name, singer, duration.get_text().strip()])))

def output(file):
    """Write all collected songs as one JSON document: {"songInfo": [...]}."""
    json.dump({'songInfo': songs}, file, ensure_ascii=False, indent=4)

def get_website_encoding(url):
    # A site normally uses one encoding for all its pages, so checking the home page once is enough
    res = requests.get(url, headers=headers)
    charset = re.search("charset=(.*?)>", res.text)
    if charset is not None:
        blocked = ['\'', ' ', '\"', '/']
        cleaned = [c for c in charset.group(1) if c not in blocked]
        return ''.join(cleaned)  # the encoding declared by the page, used to avoid garbled text
    else:
        return res.encoding  # no charset declaration found; fall back to the response's default

if __name__ == '__main__':
    encoding = get_website_encoding('http://www.kugou.com')
    # Chart pages 1 through 22
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(str(i)) for i in range(1, 23)]
    # Write UTF-8, the encoding used when the file is read back
    with open('kugou_500.json', 'w', encoding='utf-8') as f:
        for url in urls:
            get_info(url, encoding)
            time.sleep(1)  # wait one second between requests to avoid hitting the site too fast
        output(f)
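
The title-splitting and record-building steps can be exercised without touching the network. The sketch below is a minimal illustration rather than the script itself: `split_title` is a hypothetical helper, and the titles and durations are made-up sample values standing in for scraped text. It assumes titles follow the "singer - song name" pattern.

```python
import json

def split_title(title):
    """Split a 'singer - song name' string on the first ' - ' only."""
    singer, _, song_name = title.partition(' - ')
    return singer.strip(), song_name.strip()

# Made-up sample values standing in for text scraped from the chart page
raw_rows = [
    ("Singer A - Song One", "4:05"),
    ("Singer B - Song Two - Live", "3:30"),  # song name contains ' - '
]

songs = []
for title, duration in raw_rows:
    singer, song_name = split_title(title)
    songs.append({"songName": song_name, "singer": singer, "time": duration})

# Serialize the whole structure in one call instead of writing commas by hand
document = json.dumps({"songInfo": songs}, ensure_ascii=False, indent=4)
print(document)
```

Serializing the complete dictionary in one call leaves the brackets and commas to the json module, so the output stays valid however many songs were collected.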

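Running the script produces the JSON file kugou_500.json.
To check it, open the file and load it with json.load; if the contents print without an error, the file is valid JSON.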

import json

with open("kugou_500.json",'r',encoding='UTF-8') as f:
    new_dict = json.load(f)
    print(new_dict)
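
A successful json.load only shows that the file is syntactically valid. As a slightly stricter check, the sketch below also confirms the expected layout: a top-level "songInfo" list whose records carry songName, singer, and time. `check_song_json` and `REQUIRED_KEYS` are hypothetical names introduced for illustration, and the two sample strings are made up.

```python
import json

REQUIRED_KEYS = {"songName", "singer", "time"}

def check_song_json(text):
    """Return (ok, message) for a JSON string expected to hold {"songInfo": [...]}."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"not valid JSON: {err.msg} at line {err.lineno}"
    records = data.get("songInfo") if isinstance(data, dict) else None
    if not isinstance(records, list):
        return False, 'missing top-level "songInfo" list'
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            return False, f"record {index} is not an object"
        missing = REQUIRED_KEYS - set(record)
        if missing:
            return False, f"record {index} is missing {sorted(missing)}"
    return True, f"{len(records)} records OK"

good = '{"songInfo": [{"songName": "Song One", "singer": "Singer A", "time": "4:05"}]}'
bad = '{"songInfo": [{"songName": "Song One"} {"songName": "Song Two"}]}'  # comma missing

print(check_song_json(good))
print(check_song_json(bad))
```

To check the real file, the same function could be called on the text read from kugou_500.json.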


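Problem 2: Generating a CSV File and Converting It to JSON
[Requirements]

Write a program that generates a CSV file with the following content (the columns are name, gender, hometown, and department):
姓名，性別，籍貫，系別
張迪，男，重慶，計算機系
蘭博，男，江蘇，通信工程系
黃飛，男，四川，物聯網系
鄧玉春，女，陝西，計算機系
周麗，女，天津，藝術系
李云，女，上海，外語系
Convert the CSV file above to JSON, then query the file for the records of all female students.

[Procedure] (steps, records, data, code, etc.)
Provide the steps taken and screenshots as evidence.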

import csv

# Create the CSV file; newline="" keeps the csv module from writing blank lines on Windows
with open("question02.csv", "w", encoding="utf-8", newline="") as f:
    csv_writer = csv.writer(f)
    # Header row
    csv_writer.writerow(["姓名","性別","籍貫","系別"])
    # Data rows
    csv_writer.writerow(["張迪","男","重慶","計算機系"])
    csv_writer.writerow(["蘭博","男","江蘇","通信工程系"])
    csv_writer.writerow(["黃飛","男","四川","物聯網系"])
    csv_writer.writerow(["鄧玉春","女","陝西","計算機系"])
    csv_writer.writerow(["周麗","女","天津","藝術系"])
    csv_writer.writerow(["李云","女","上海","外語系"])

Convert the CSV file to JSON:

import csv
import json

with open("question02.csv", "r", encoding="utf-8", newline="") as csv_file:
    # DictReader takes the field names from the header row
    rows = list(csv.DictReader(csv_file))

# Write every row under one key, whatever the number of rows
with open("question02.json", "w", encoding="utf-8") as json_file:
    json.dump({"personInfo": rows}, json_file, ensure_ascii=False, indent=4)

Query the records of all female students:

import json

with open("question02.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Print each record whose gender field is 女 (female)
for person in data["personInfo"]:
    if person["性別"] == "女":
        print(person)
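
The same CSV-to-JSON conversion and query can be tried in memory, which is convenient for checking the logic before any files are written. This is a minimal sketch that assumes the CSV has a header row; it uses io.StringIO in place of question02.csv and only three of the rows from the assignment.

```python
import csv
import io
import json

# A three-row subset of the assignment data, held in memory instead of a file
csv_text = (
    "姓名,性別,籍貫,系別\n"
    "張迪,男,重慶,計算機系\n"
    "鄧玉春,女,陝西,計算機系\n"
    "周麗,女,天津,藝術系\n"
)

# CSV -> list of dicts keyed by the header row
rows = list(csv.DictReader(io.StringIO(csv_text)))

# list of dicts -> JSON text
json_text = json.dumps({"personInfo": rows}, ensure_ascii=False, indent=4)

# JSON text -> Python objects, then keep only the female students
people = json.loads(json_text)["personInfo"]
women = [person for person in people if person["性別"] == "女"]
for person in women:
    print(person)
```

Because the filter iterates over whatever records are present, it does not depend on knowing the number of rows in advance.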


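Problem 3: Converting Between XML and JSON
[Content and Requirements]
(1) Read an XML file with the following content:
<?xml version="1.0" encoding="gb2312"?>
<圖書>
<書名>紅樓夢</書名>
<作者>曹雪芹</作者>
<主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>
<出版社>人民文學出版社</出版社>
</圖書>
(2) Convert the XML file above to JSON.

[Procedure] (steps, records, data, code, etc.)
Provide the code and screenshots of the program running.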

First, create the XML file question_03.xml.

import json
import xmltodict

# Read the XML file as text
with open("question_03.xml", "r", encoding="utf-8") as xml_file:
    xml_str = xml_file.read()

# xmltodict.parse returns a dictionary that mirrors the XML tree
data = xmltodict.parse(xml_str)

# Serialize the dictionary as JSON, keeping the Chinese text unescaped
with open("question03JSON.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(data, ensure_ascii=False, indent=4, separators=(',', ': ')))
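
If xmltodict is not available, a flat document like this one can also be converted with the standard library. The sketch below swaps in xml.etree.ElementTree instead of xmltodict; `element_to_dict` is a hypothetical helper that assumes every child of the root is a simple text element (no attributes, no nesting), and the XML declaration is left out of the sample string for brevity.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Map each child tag to its text; assumes flat children with no attributes."""
    return {child.tag: (child.text or "").strip() for child in element}

xml_text = (
    "<圖書>"
    "<書名>紅樓夢</書名>"
    "<作者>曹雪芹</作者>"
    "<主要內容>描述賈寶玉和林黛玉的愛情故事</主要內容>"
    "<出版社>人民文學出版社</出版社>"
    "</圖書>"
)

root = ET.fromstring(xml_text)
# Keep the root tag as the single top-level key
book = {root.tag: element_to_dict(root)}
json_text = json.dumps(book, ensure_ascii=False, indent=4)
print(json_text)
```

For documents with attributes, repeated tags, or deeper nesting, this helper would lose information, which is where a dedicated converter such as xmltodict is the more practical choice.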


