1. Introduction to Jupyter Notebook

Jupyter Notebook (formerly known as IPython Notebook) is an interactive notebook that supports running more than 40 programming languages.

Before you can use a notebook, it has to be installed. There are two options: (1) run pip install jupyter on the command line; (2) install Anaconda, which ships with Jupyter Notebook.

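Running jupyter notebook on the command line starts the Jupyter server in the current directory and opens the page in the default browser. You can also copy the link and open it in another browser, as shown below: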

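As you can see, the notebook interface consists of the following parts: (1) the notebook name; (2) the main toolbar, which offers options to save, export, and reload the notebook and to restart the kernel; (3) the main notebook area, which contains the content editing area.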

2. Using Jupyter Notebook

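The main area in the lower part of the Jupyter page is made up of cells. Each notebook consists of multiple cells, and each cell can serve a different purpose. The one shown in the screenshot above is a code cell, which begins with [ ]. In this type of cell you can enter any code and execute it. For example, type 1 + 2 and press Shift + Enter: the code in the cell is evaluated and the cursor moves to a new cell.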

To create a new notebook, click New and choose the type of notebook you want to start.

A simple usage example is shown below:

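As you can see, a notebook lets you go back to an earlier cell, modify it, and re-run it to update its result (cells that depend on it need to be re-run as well to pick up the change). This is especially powerful when you do not want to re-run the whole script and only want to test a piece of code with different parameters. You can still re-run the entire notebook at any time by clicking Cell -> Run All.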
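A minimal sketch of that workflow, assuming two cells: the first holds a parameter and the second a computation that uses it. The names rate and compound are illustrative and do not come from the original notebook.

```python
# Cell 1: a parameter you may want to vary between runs
rate = 0.05

# Cell 2: a computation that depends on the parameter
def compound(principal, rate, years):
    """Return the value of principal after compounding yearly at the given rate."""
    return principal * (1 + rate) ** years

result = compound(1000, rate, 10)
print(round(result, 2))
```

After editing rate in the first cell, re-running the first and second cells is enough to see the new result; the rest of the notebook does not have to be executed again.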

Next, a test with a heading and some more code:

As you can see, a title has been added at the top of the notebook, and statements such as for loops can be executed as well.
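For illustration, a title is typically typed into a Markdown cell (for example a line starting with # followed by the title text), while the loop goes into a code cell. The following is a small sketch of such a code cell; the variable name squares is an assumption made for this example.

```python
# A code cell: run a for loop and collect the squares of 0..4
squares = []
for i in range(5):
    squares.append(i * i)
    print(i, i * i)
```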

3. Using Python in Jupyter

Testing Python variables and data types in Jupyter:
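As a minimal sketch of this kind of test (the variable names and values are illustrative, not taken from the original notebook), a cell might contain:

```python
count = 3                                   # int
price = 9.99                                # float
name = "notebook"                           # str
tags = ["a", "b"]                           # list
info = {"lang": "Python", "cells": count}   # dict
flag = count > 2                            # bool

# Print the type name next to each value
for value in (count, price, name, tags, info, flag):
    print(type(value).__name__, value)
```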

Testing Python functions:
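A small sketch of defining and calling functions in a cell; mean and describe are hypothetical helpers written for this example.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def describe(values, digits=2):
    """Return the minimum, maximum and rounded mean of values as a dict."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), digits),
    }


print(describe([1, 2, 3, 4]))
```

Once the cell defining the functions has been run, they stay available to every later cell until the kernel is restarted.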

Testing Python modules:

As you can see, when execution fails, an exception is raised in the notebook just as it would be elsewhere.
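The following sketch uses the standard math module as an example; the choice of module is an assumption, since the original screenshot is not available. An uncaught exception would be shown as a traceback below the cell; here it is caught so that the cell finishes normally.

```python
import math

print(math.sqrt(16))  # a valid call

try:
    math.sqrt(-1)  # math.sqrt rejects negative input
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```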

Testing reading and writing data:

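Reading and writing data matters because data analysis always starts with loading data, and the results have to be saved once processing is done.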
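A minimal round-trip sketch using the standard csv module. The file name scores.csv and the rows are made up for illustration, and the file is written to a temporary directory so that nothing in the working directory is touched.

```python
import csv
import os
import tempfile

rows = [["name", "score"], ["alice", "90"], ["bob", "85"]]
path = os.path.join(tempfile.mkdtemp(), "scores.csv")

# Write the rows; newline="" lets the csv module control line endings
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read them back; every field comes back as a string
with open(path, "r", encoding="utf-8", newline="") as f:
    loaded = list(csv.reader(f))

print(loaded)
```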

4. Data interaction examples

Loading CSV data, processing it, and saving it to MongoDB

There are two CSV files, shopproducts.csv and userratings.csv, which hold product data and user rating data respectively, as shown below:

The task is to read them with Python and save the selected fields to MongoDB. This requires pymongo, which can be installed in Anaconda with the command conda install pymongo.

The Python code is as follows:

import pymongo


class Product:
    """One product record; attribute names become the MongoDB document keys."""

    def __init__(self, productId: int, name, imageUrl, categories, tags):
        self.productId = productId
        self.name = name
        self.imageUrl = imageUrl
        self.categories = categories
        self.tags = tags

    def __str__(self) -> str:
        # productId is an int, so format the fields instead of concatenating with '+'
        return f"{self.productId}^{self.name}^{self.imageUrl}^{self.categories}^{self.tags}"


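class Rating:
    """One user rating record; attribute names become the MongoDB document keys."""

    def __init__(self, userId: int, productId: int, score: float, timestamp: int):
        self.userId = userId
        self.productId = productId
        self.score = score
        self.timestamp = timestamp

    def __str__(self) -> str:
        # All fields are numeric, so format them instead of concatenating with '+'
        return f"{self.userId}^{self.productId}^{self.score}^{self.timestamp}"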


if __name__ == '__main__':
    myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
    mydb = myclient["goods-users"]
    # val attr = item.split("\\^")
    # // convert to Product
    # Product(attr(0).toInt, attr(1).trim, attr(4).trim, attr(5).trim, attr(6).trim)

    shopproducts = mydb['shopproducts']
    with open('shopproducts.csv', 'r', encoding='UTF-8') as f:
        for item in f:  # iterate over the file line by line
            if not item.strip():
                continue  # skip blank lines, which would otherwise fail in int()
            attr = item.split('^')
            # Keep only the fields at positions 0, 1, 4, 5 and 6
            product = Product(int(attr[0]), attr[1].strip(), attr[4].strip(), attr[5].strip(), attr[6].strip())
            shopproducts.insert_one(product.__dict__)
            # print(product)
            # print(json.dumps(obj=product.__dict__, ensure_ascii=False))

    # val attr = item.split(",")
    # Rating(attr(0).toInt, attr(1).toInt, attr(2).toDouble, attr(3).toInt)
    userratings = mydb['userratings']
    with open('userratings.csv', 'r', encoding='UTF-8') as f:
        for item in f:  # iterate over the file line by line
            if not item.strip():
                continue  # skip blank lines, which would otherwise fail in int()
            attr = item.split(',')
            rating = Rating(int(attr[0]), int(attr[1].strip()), float(attr[2].strip()), int(attr[3].strip()))
            userratings.insert_one(rating.__dict__)
            # print(rating)

Start the MongoDB service, then run the Python code. Once it has finished, the database can be inspected with Robo 3T:

The data has clearly been saved successfully.
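One possible variation, sketched under assumptions: parse each line into a plain dict first and collect the results, so that parsing can be checked without a database. The helper parse_product_line and the sample line are hypothetical; only the '^' delimiter and the field positions 0, 1, 4, 5 and 6 come from the code above.

```python
def parse_product_line(line):
    """Turn one '^'-delimited line into a dict, or None if it is blank or too short."""
    attr = line.rstrip("\n").split("^")
    if len(attr) < 7:
        return None
    return {
        "productId": int(attr[0]),
        "name": attr[1].strip(),
        "imageUrl": attr[4].strip(),
        "categories": attr[5].strip(),
        "tags": attr[6].strip(),
    }


# Hypothetical sample line, made up for illustration only
sample = "1^Demo product^x^y^http://example.com/a.jpg^cat1|cat2^tag1|tag2\n"
docs = [d for d in (parse_product_line(l) for l in [sample, "\n"]) if d is not None]
print(docs)

# With a running MongoDB and the shopproducts collection from above,
# the whole batch could then be written in one call:
# shopproducts.insert_many(docs)
```

Collecting the documents first also makes it possible to write them with a single insert_many call instead of one insert_one per line.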

Processing shop data with Jupyter

The data to be processed is shop data, as shown below:

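It includes the name, number of reviews, price, address, and a list of ratings. The review count, price, and ratings are irregular and need to be cleaned.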

The processing in Jupyter looks like this:

As you can see, the end result is regular, cleaned data.
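To make the idea of cleaning concrete, here is a small sketch of two cleaning helpers applied to sample strings. The sample strings are made up for illustration, since the raw file is not reproduced here, and the helpers clean_comment and clean_price are hypothetical variants that return None for missing values instead of a marker string.

```python
def clean_comment(s):
    """Return the leading review count as an int, or None if the marker is absent."""
    return int(s.split(' ')[0]) if '條' in s else None


def clean_price(s):
    """Return the number after the currency sign as a float, or None if it is absent."""
    return float(s.split('￥')[-1]) if '￥' in s else None


# Made-up raw values: one well-formed and one irregular value per field
print(clean_comment("128 條點評"), clean_comment("我要點評"))
print(clean_price("人均 ￥56"), clean_price("-"))
```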

The complete Python code is as follows:

# Read the data and preview the first rows after the header
f = open('商鋪資料.csv', 'r', encoding='utf8')
for i in f.readlines()[1:15]:
    print(i.split(','))


# Cleaning functions for comment, price and commentlist
def fcomment(s):
    '''Clean comment: split on spaces, take the first item as the review count, and convert it to int'''
    if '條' in s:
        return int(s.split(' ')[0])
    else:
        return '缺失資料'


def fprice(s):
    '''Clean price: split on ￥, take the last item as the average price per person, and convert it to float'''
    if '￥' in s:
        return float(s.split('￥')[-1])
    else:
        return '缺失資料'


def fcommentl(s):
    '''Clean commentlist: split on runs of spaces, extract the quality, environment and service scores, and convert them to float'''
    if ' ' in s:
        quality = float(s.split('                                ')[0][2:])
        environment = float(s.split('                                ')[1][2:])
        service = float(s.split('                                ')[2][2:-1])
        return [quality, environment, service]
    else:
        return '缺失資料'


# Process and clean the data
datalist = []  # list of cleaned records

f.seek(0)  # rewind, because the file was already read once above
n = 0  # number of records loaded
for i in f.readlines():
    data = i.split(',')
    # print(data)
    classify = data[0]  # category
    name = data[1]  # shop name
    comment_count = fcomment(data[2])  # number of reviews
    star = data[3]  # star rating
    price = fprice(data[4])  # average price per person
    address = data[5]  # address
    scores = fcommentl(data[6])  # [quality, environment, service] or the missing-data marker
    # Skip the row if any cleaned field is missing. The marker is compared with
    # scores itself: indexing the marker string would only yield one character.
    if '缺失資料' not in [comment_count, price, scores]:
        quality, env, service = scores
        n += 1
        datalist.append({'classify': classify,
                         'name': name,
                         'comment_count': comment_count,
                         'star': star,
                         'price': price,
                         'address': address,
                         'quality': quality,
                         'environment': env,
                         'service': service})  # build the record and store it in datalist
        print('Loaded %i records so far' % n)

print(datalist)
print('Loaded %i records in total' % n)

f.close()

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/175980.html
