How do I extract text from a file using ids?

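I have two files. One contains ids, and the other contains the sentence for each id, but the ids differ slightly between the two files, as in this example: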

File 1:

File 2:

111   some_text_1   some_text_1
222   some_text_2  some_text_2

I need to produce a file that contains each id together with its sentence:

111_3232   some_text_1   some_text_1
111_ewe2   some_text_1   some_text_1
111_3434   some_text_1   some_text_1
222_3843h  some_text_2  some_text_2
222_39092  some_text_2  some_text_2

I tried this code:

import os 

f = open("id","r")
ff = open("result","w")
fff = open("sentences.txt","r")
List = fff.readlines()    
i =0 
for line_id in f.readlines():
    for line_sentence in range(len(List)):
        if line_id in List[i]:
            ff.write(line_sentence)
        else : 
            i += 1

but got:

if line_id in List[i]:
IndexError: list index out of range

because I get the whole line from file 2 rather than just the id. Is there a better way to do this?

Edit

I tried using pandas, but I am not very familiar with it:

df = pd.read_csv('sentence.csv')    
for line_id in f.readline():
    for line_2 in df.iloc[:, 0] :
       for (idx, row) in df.iterrows():
            if line_id in line_2:
                ff.write(str(row) + '\n')
            else : 
                ff.write("empty" + '\n')

but I got the wrong data, because I could not pick out the correct row.

Answer:

A basic approach:

with open('file1.txt', 'r') as fd1, open('file2.txt', 'r') as fd2:
    lines1 = fd1.read().split() # remove \n
    lines2 = fd2.readlines()

new_text = ''
for l1 in lines1:
    for id_, t1, t2 in (l.split() for l in lines2):
        if l1.startswith(id_):
            new_text += f'{l1} {t1} {t2}\n'

with open('file3.txt', 'w') as fd:
    fd.write(new_text.strip())
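
One caveat with the loop above: unpacking into `id_, t1, t2` assumes every line of file 2 has exactly three whitespace-separated fields, so a longer or shorter sentence would raise a `ValueError`. A minimal sketch that keeps the same `startswith` matching but accepts sentences of any length could look like this. The function name `match_by_prefix` is hypothetical, and stopping at the first matching prefix is an assumption that the code above does not make.

```python
def match_by_prefix(ids, sentence_lines):
    """Pair each full id with the sentence whose prefix it starts with."""
    # Split every sentence line into (prefix, rest of line) once, up front.
    pairs = []
    for line in sentence_lines:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # ignore blank lines and lines without text
            pairs.append((parts[0], parts[1].strip()))

    out = []
    for full_id in ids:
        for prefix, text in pairs:
            if full_id.startswith(prefix):
                out.append(f"{full_id} {text}")
                break  # assumption: only the first matching prefix is wanted
    return out
```

The ids can be read with `fd1.read().split()` and the sentence lines with `fd2.readlines()`, exactly as in the code above.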

Answer:

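One way to get this result is to store each id and its sentence as a key-value pair in a dictionary, then iterate over the contents of the id file and look up the sentence for each id.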

sentences_dict = {}
# read all sentences into a dictionary as key value pair
with open("sentences.txt", "r") as sentences_file:
    for line in sentences_file.read().splitlines():
        split_lines = line.split(" ")
        sentences_dict.update({split_lines[0].strip():  "  ".join(split_lines[1:])})

result_file = open("result.txt", "w")

# iterate over id file and match the starting text
with open("id.txt", "r") as id_file:
    for file_id in id_file.read().splitlines():
        txt = sentences_dict.get(file_id.split("_")[0], "")
        result_file.write(f"{file_id}{txt}\n")
        
result_file.close()

Unless you open a file with the `with` keyword, make sure you always close it explicitly.
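
As a rough sketch of that advice, the same dictionary approach can be written so that `with` manages all three files and no explicit `close()` is needed. The names `build_result` and `write_result` are hypothetical, and the sketch assumes that the part of each id before the first `_` is the key used in the sentences file.

```python
def build_result(id_lines, sentence_lines):
    """Return one output line per id: the full id followed by its sentence."""
    # Map the id prefix (first column) to the remainder of the sentence line.
    sentences = {}
    for line in sentence_lines:
        parts = line.split(maxsplit=1)
        if parts:  # skip blank lines
            sentences[parts[0]] = parts[1].strip() if len(parts) > 1 else ""

    result = []
    for raw in id_lines:
        file_id = raw.strip()
        if not file_id:
            continue
        # Assumption: the text before the first "_" is the lookup key.
        prefix = file_id.split("_")[0]
        text = sentences.get(prefix, "")
        result.append(f"{file_id}   {text}".rstrip())
    return result


def write_result(id_path, sentences_path, result_path):
    # All three files are closed automatically when the with block exits.
    with open(id_path) as id_file, \
         open(sentences_path) as sentences_file, \
         open(result_path, "w") as result_file:
        for line in build_result(id_file, sentences_file):
            result_file.write(line + "\n")
```

With the file names from the code above, this would be called as `write_result("id.txt", "sentences.txt", "result.txt")`. An id with no matching prefix is written on its own, without a sentence.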
