Dynamically creating list names from a for loop

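I am trying to scrape the descriptions of the books listed in the Goodreads Choice Awards. I use the following function to collect the individual book URLs listed for a given genre: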

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

def get_genre_url(genre):
    all_links = []

    # One Choice Awards page per year, 2011 through 2021
    for year in range(2011, 2022):
        url = 'https://www.goodreads.com/choiceawards/best-' + genre + '-books-' + str(year)
        page = requests.get(url)
        soup = bs(page.content, 'html.parser')
        for link in soup.find_all('a', {'class': 'pollAnswer__bookLink'}):
            all_links.append('https://www.goodreads.com' + link.get('href'))

    return all_links
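
As a side note, the URL pattern used by the function above can be checked without touching the network. The following is only a sketch: `build_award_urls` and `BASE_URL` are hypothetical names that do not appear in the original post, and the year range is copied from `range(2011, 2022)`.

```python
# Hypothetical helper (not part of the original post): builds the yearly
# Choice Awards URLs for one genre without making any HTTP requests.
BASE_URL = 'https://www.goodreads.com/choiceawards/best-'

def build_award_urls(genre, first_year=2011, last_year=2021):
    """Return one award-page URL per year, first_year through last_year inclusive."""
    return [f'{BASE_URL}{genre}-books-{year}'
            for year in range(first_year, last_year + 1)]

award_urls = build_award_urls('mystery-thriller')
print(len(award_urls))   # 11 award pages, one per year
print(award_urls[0])     # the 2011 page
```

Eleven award pages per genre is consistent with the 220 books per genre mentioned in the question if each page lists 20 nominees, which is an inference rather than something the post states.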

Once I have the book URLs, I scrape each of them to get the book descriptions.

def get_description(genre_list):
    
    urls = []
    authors = []
    titles = []
    index = 0
    
    for url in genre_list:
        #print(index,url)

        page = requests.get(url)    
        soup = bs(page.content, 'html.parser')    

        authors.append(soup.find('title').get_text().split(' by ')[1])
        #print(index,authors)
        description_df = pd.DataFrame (authors, columns = ['author'])    

        titles.append(soup.find('title').get_text().split(' by ')[0])

        description_df['title'] = titles

        if soup.find('div',{'class':'readable stacked'}) is None:
            #print('This is a NoneType page:', url)
            description = soup.find('div',{'class':'TruncatedText__text TruncatedText__text--5'})
        else:
            description = soup.find('div',{'class':'readable stacked'}).get_text()
        urls.append(description)
        index += 1

        description_df['description'] = urls
        
    return(description_df)

To get the final DataFrame I call, for example:

mystery_thriller_list = get_genre_url('mystery-thriller')
description_myster_thriller = get_description(mystery_thriller_list)

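However, I would like to pass a list of genres (e.g. genres = ['fiction', 'mystery-thriller']) and create a final DataFrame for each genre, where each DataFrame's name follows the naming convention description_'selected genre'. So far I have not been able to figure this out, and the for loop takes a while because it loads the information for 220 books per genre.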

Reply from a uj5u.com user:

You can store all the DataFrames in a dictionary, with keys built from the genre names.

all_genres_descriptions = {}    
genres = ['fiction', 'mystery-thriller']
for genre in genres:
    genre_list = get_genre_url(genre)
    description_genre = get_description(genre_list)
    all_genres_descriptions[f'description_{genre}'] = description_genre
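
To illustrate how such a dictionary would be used afterwards, here is a minimal sketch. It assumes a stand-in function, `fake_get_description`, which is hypothetical and simply returns a tiny DataFrame so the example runs without any scraping; in the real workflow it would be replaced by `get_description(get_genre_url(genre))`.

```python
import pandas as pd

def fake_get_description(genre):
    # Hypothetical stand-in for get_description(get_genre_url(genre));
    # returns a one-row frame so the example needs no network access.
    return pd.DataFrame({
        'author': [f'{genre} author'],
        'title': [f'{genre} title'],
        'description': ['...'],
    })

genres = ['fiction', 'mystery-thriller']

# Build the dictionary with the description_<genre> naming convention as keys
all_genres_descriptions = {
    f'description_{genre}': fake_get_description(genre) for genre in genres
}

# Look up a single genre's DataFrame by its key
fiction_df = all_genres_descriptions['description_fiction']

# Or iterate over all of them
for name, df in all_genres_descriptions.items():
    print(name, df.shape)
```

Because the keys are ordinary strings, names such as 'description_mystery-thriller' work even though they would not be valid Python variable names.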

Reply from a uj5u.com user:

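A couple of things. For testing, you don't need to go through every year and every book; I only looked at one year and the first two books. To do exactly what you are asking for, you can use globals(). You may also want to build just one DataFrame instead, adding a "genre" column on each iteration and concatenating the results. In the long run it will probably be easier to have all the data in a single DataFrame.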

genres = ['fiction', 'mystery-thriller']
for genre in genres:
    genre_list = get_genre_url(genre)
    # Hyphens are not allowed in variable names, so replace them with underscores
    globals()[f"{genre.replace('-', '_')}_selected_genre"] = get_description(genre_list)

print(fiction_selected_genre)


   author               title             description
0  Haruki Murakami      1Q84 (1Q84 #1-3)  \nThe year is 1984 and the city is Tokyo.A you...
1  Sarah Addison Allen  The Peach Keeper  \nThe New York Times bestselling author of The...

print(mystery_thriller_selected_genre)


   author                       title                                    description
0  Janet Evanovich | Goodreads  Smokin' Seventeen (Stephanie Plum, #17)  [[[<p><b><i>Where there’s smoke there’s fire, ...
1  J.D. Robb                    New York to Dallas (In Death, #33)       \nTwelve years ago, Eve Dallas was just a rook...
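
The single-DataFrame alternative mentioned in this answer could look roughly like the sketch below. It again relies on a hypothetical stand-in, `fake_get_description`, in place of `get_description(get_genre_url(genre))`, so the data here is made up purely to show the shape of the result.

```python
import pandas as pd

def fake_get_description(genre):
    # Hypothetical stand-in for get_description(get_genre_url(genre));
    # returns two placeholder rows per genre, no network access needed.
    return pd.DataFrame({
        'author': [f'{genre} author 1', f'{genre} author 2'],
        'title': [f'{genre} title 1', f'{genre} title 2'],
        'description': ['...', '...'],
    })

genres = ['fiction', 'mystery-thriller']

frames = []
for genre in genres:
    df = fake_get_description(genre)
    df['genre'] = genre          # tag every row with its genre
    frames.append(df)

# Stack the per-genre frames into one DataFrame with a fresh 0..n-1 index
all_descriptions = pd.concat(frames, ignore_index=True)

# A single genre can still be pulled out with a boolean filter
mystery = all_descriptions[all_descriptions['genre'] == 'mystery-thriller']
print(all_descriptions)
```

With this layout no dynamically named variables are needed, and per-genre views are one filter or groupby away.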

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/407886.html
