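I have a fairly large file that I need to parse. I have already split it into a list of strings. The next step is to take each string and turn it into a tuple, producing a list of tuples. The first element of each tuple should be the number at the start of the string; the second should be the sentence itself. That is easy to do with the .split method. The problem is that I need index 0 of each tuple (the number) to be an integer rather than a string. What are some possible ways to do this that are not hacky? Below is my attempt so far, along with some sample input from the file I am working with.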
Input:
"0 giuliani recalled that trump initially called it a muslim ban
0 trump was seen yesterday on television in mcdonalds commercials
0 illustration newsday photo by jon naso donald trump whom a casino analyst is suing for 2 million over trumps response to the analysts dire predictions for the taj mahal in atlantic city
0 donald trump beats hillary clinton on li by just under 20000 votes
0 and an outlier a pro trump painter depicting a snake stomping president surrounded by a young family cops miners and the military"
newbfile = r'train_orig.txt'
def getlines(filename):
    with open(filename, encoding='utf8') as fn:
        fn_list = [line.rstrip() for line in fn]
    return fn_list
def makeSentences(lines):
    x = [tuple(line.split('\t')) for line in lines]
    for index, number in enumerate(x):
        number[0] = int(number[0])
    return x
Output:
TypeError: 'tuple' object does not support item assignment
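A uj5u.com user replied:
Use str.split with the maxsplit= argument, so that only the leading number is split off and the rest of the sentence stays in one piece: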
s = """\
0 giuliani recalled that trump initially called it a muslim ban
0 trump was seen yesterday on television in mcdonalds commercials
0 illustration newsday photo by jon naso donald trump whom a casino analyst is suing for 2 million over trumps response to the analysts dire predictions for the taj mahal in atlantic city
0 donald trump beats hillary clinton on li by just under 20000 votes
0 and an outlier a pro trump painter depicting a snake stomping president surrounded by a young family cops miners and the military"""
for line in s.splitlines():
    line = line.split(maxsplit=1)
    my_tuple = int(line[0]), line[1]
    print(my_tuple)
Prints:
(0, 'giuliani recalled that trump initially called it a muslim ban')
(0, 'trump was seen yesterday on television in mcdonalds commercials')
(0, 'illustration newsday photo by jon naso donald trump whom a casino analyst is suing for 2 million over trumps response to the analysts dire predictions for the taj mahal in atlantic city')
(0, 'donald trump beats hillary clinton on li by just under 20000 votes')
(0, 'and an outlier a pro trump painter depicting a snake stomping president surrounded by a young family cops miners and the military')
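The same idea can be folded back into a function shaped like the question's makeSentences. The following is only a minimal sketch: the name make_sentences and the blank-line check are illustrative additions, and it assumes every non-empty line starts with an integer followed by whitespace (a tab or spaces) and then a sentence.

```python
def make_sentences(lines):
    """Return a list of (int, str) tuples, one per non-empty line."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines (an assumption, not in the original)
        # Split once on the first run of whitespace: number, then sentence.
        # A line containing only a number would raise ValueError here.
        number, sentence = line.split(maxsplit=1)
        result.append((int(number), sentence))
    return result


# Two sample lines from the question; one uses a tab, one a space.
lines = [
    "0\tgiuliani recalled that trump initially called it a muslim ban",
    "0 trump was seen yesterday on television in mcdonalds commercials",
]
pairs = make_sentences(lines)
print(pairs)
```

Building each tuple with the integer already converted avoids any need to modify a tuple after it has been created.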
A uj5u.com user replied:
You can use tuple unpacking on the result of .split:
tuples = []
with open(filename, encoding='utf8') as fn:
    for line in fn:
        split_line = line.rstrip().split('\t', maxsplit=1)
        (number, sentence) = split_line
        formatted_tuple = (int(number), sentence)
        tuples.append(formatted_tuple)
For clarity, I have included the variables split_line and formatted_tuple, although they are not strictly necessary here.
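Without those intermediate variables, the loop can be collapsed into a single list comprehension. This is a sketch rather than a drop-in replacement: it assumes the file is tab-separated, as in the question's code, and uses io.StringIO as a stand-in for the real file so the example is self-contained.

```python
import io

# io.StringIO stands in for open(filename, encoding='utf8') in this sketch.
fake_file = io.StringIO(
    "0\tgiuliani recalled that trump initially called it a muslim ban\n"
    "0\tdonald trump beats hillary clinton on li by just under 20000 votes\n"
)

# The inner generator splits each line once on the first tab;
# the outer comprehension unpacks the two parts and converts the number.
tuples = [
    (int(number), sentence)
    for number, sentence in (
        line.rstrip().split('\t', maxsplit=1) for line in fake_file
    )
]
print(tuples)
```

Whether the compact form is preferable to the explicit loop is a matter of taste; the loop is easier to extend if lines need validation.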
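A uj5u.com user replied:
Tuples are immutable in Python, which is why assigning to number[0] fails.
Instead, you can build new tuples with the integer already converted.
Here the re module is used to split each string on runs of two or more whitespace characters.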
import re
def makeSentences(lines):
    x = [re.split(r'\s{2,}', line) for line in lines]
    x = [(int(number),string) for number, string in x]
    return x
Result:
[(0, 'giuliani recalled that trump initially called it a muslim ban'),
(0, 'trump was seen yesterday on television in mcdonalds commercials'),
(0,
'illustration newsday photo by jon naso donald trump whom a casino analyst is suing for 2 million over trumps response to the analysts dire predictions for the taj mahal in atlantic city'),
(0, 'donald trump beats hillary clinton on li by just under 20000 votes'),
(0,
'and an outlier a pro trump painter depicting a snake stomping president surrounded by a young family cops miners and the military')]
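To make the immutability point concrete, the short sketch below reproduces the TypeError from the question on a single tuple and then shows the usual workaround of building a new tuple. The sample tuple and the variable names are made up for illustration.

```python
# A tuple as produced by tuple(line.split('\t')): both elements are strings.
pair = ("0", "donald trump beats hillary clinton on li by just under 20000 votes")

try:
    pair[0] = int(pair[0])  # in-place assignment is not allowed on tuples
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Workaround: create a new tuple with the first element converted.
fixed = (int(pair[0]), *pair[1:])
print(fixed)
```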
