How do I convert this oddly formatted loop print function into a DataFrame with similar output?

I found a code block that is useful in my project, but I can't get it to build a DataFrame (2 columns) in the same format it produces when printing.

The code block and the desired output:

import nltk
import pandas as pd
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
 
# Step Two: Load Data
 
sentence = "Martin Luther King Jr. (born Michael King Jr.; January 15, 1929 – April 4, 1968) was an American Baptist minister and activist who became the most visible spokesman and leader in the American civil rights movement from 1955 until his assassination in 1968. King advanced civil rights through nonviolence and civil disobedience, inspired by his Christian beliefs and the nonviolent activism of Mahatma Gandhi. He was the son of early civil rights activist and minister Martin Luther King Sr."

# Step Three: Tokenise, find parts of speech and chunk words 

for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        if hasattr(chunk, 'label'):
            print(chunk.label(), ' '.join(c[0] for c in chunk))

The clean output I want, with the labels in one column and the entities in the other:

PERSON Martin
PERSON Luther King
PERSON Michael King
ORGANIZATION American
GPE American
GPE Christian
PERSON Mahatma Gandhi
PERSON Martin Luther

I tried something like this, but the result is not as clean.

df = []  # a plain list (despite the name), so each append stores the whole Tree
for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        if hasattr(chunk, 'label'):
            df.append(chunk)

Output:

[Tree('PERSON', [('Martin', 'NNP')]),
 Tree('PERSON', [('Luther', 'NNP'), ('King', 'NNP')]),
 Tree('PERSON', [('Michael', 'NNP'), ('King', 'NNP')]),
 Tree('ORGANIZATION', [('American', 'JJ')]),
 Tree('GPE', [('American', 'NNP')]),
 Tree('GPE', [('Christian', 'JJ')]),
 Tree('PERSON', [('Mahatma', 'NNP'), ('Gandhi', 'NNP')]),
 Tree('PERSON', [('Martin', 'NNP'), ('Luther', 'NNP')])]

Is there a simple way to turn the printed format into a DataFrame with just 2 columns?
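For context, each element of the list above is an `nltk` `Tree` that bundles the entity label with its `(word, tag)` leaves, which is why appending the chunk itself does not give two clean columns. A minimal sketch of pulling the two pieces apart, using one tree rebuilt by hand so that no NLTK models are needed (the variable names are illustrative only):

```python
from nltk.tree import Tree

# One element of the list above, rebuilt by hand for illustration
chunk = Tree('PERSON', [('Luther', 'NNP'), ('King', 'NNP')])

label = chunk.label()                            # the entity type, e.g. 'PERSON'
words = [word for word, tag in chunk.leaves()]   # keep the words, drop the POS tags
entity = ' '.join(words)

print(label, entity)  # PERSON Luther King
```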

Answer:

Create a nested list and convert it to a DataFrame:

L = []
for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        if hasattr(chunk, 'label'):
            L.append([chunk.label(), ' '.join(c[0] for c in chunk)])

df = pd.DataFrame(L, columns=['a','b'])
print(df)
              a               b
0        PERSON          Martin
1        PERSON     Luther King
2        PERSON    Michael King
3  ORGANIZATION        American
4           GPE        American
5           GPE       Christian
6        PERSON  Mahatma Gandhi
7        PERSON   Martin Luther

The list comprehension solution is:

L = [[chunk.label(), ' '.join(c[0] for c in chunk)]
     for sent in nltk.sent_tokenize(sentence)
     for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)))
     if hasattr(chunk, 'label')]

df = pd.DataFrame(L, columns=['a','b'])
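If the conversion is needed in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch: the function name `chunks_to_frame` and the column names `label` and `entity` are assumptions made for illustration, and the sample input is built by hand so that it runs without downloading any NLTK models. It uses `isinstance(chunk, Tree)` in place of the `hasattr` check, which selects the same named-entity chunks here.

```python
import pandas as pd
from nltk.tree import Tree


def chunks_to_frame(chunks):
    """Build a two-column DataFrame from ne_chunk-style output.

    Named-entity chunks are Tree objects; ordinary tokens are plain
    (word, tag) tuples and are skipped.
    """
    rows = [
        (chunk.label(), ' '.join(word for word, _tag in chunk.leaves()))
        for chunk in chunks
        if isinstance(chunk, Tree)
    ]
    return pd.DataFrame(rows, columns=['label', 'entity'])


# Hand-built sample that mimics one chunked sentence (hypothetical data)
sample = [
    Tree('PERSON', [('Mahatma', 'NNP'), ('Gandhi', 'NNP')]),
    ('was', 'VBD'),
    Tree('GPE', [('American', 'NNP')]),
]

df = chunks_to_frame(sample)
print(df)
```

With real text, the helper would be called once per sentence on the result of `nltk.ne_chunk(...)`, and the per-sentence frames could then be combined with `pd.concat`.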
