Modifying a CSV file [cannot use pandas or numpy]

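I need to add a column to a CSV file that was created by importing data from the web. The new column must be the concatenation of two existing columns, for example 06_2018.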

New_Format_Data = ''

Output_File = open('Desktop/HW3/' + state_names[counter] + '.txt','w')

for counter in range(0 , len(urls)):#Will go through all the states.
    print (urls[counter])

    html = urllib.request.urlopen(urls[counter]).read().decode('utf-8')#opening url

    rows = html.splitlines(1)#Split the data in rows. The number 1 is very important

    if counter ==0:
        New_Format_Data = "Test" + rows[0] #Header

    for row in range(1, len(rows)): #First row...

        New_Format_Data += 'Test' + '\t' + rows[row]#Adding that state column.

Output_File.write(New_Format_Data)#Once the for loops have finished, it will write the file and close it.
Output_File.close()

A uj5u.com user replied:

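I don't know what you have in the rows, or which columns you want to concatenate, so I use columns 1 and 2 as an example.

You have to split each row (a string) into a list of values, then replace or add a value in this list, then join all the values back into a single string, and only then write it to the new file.

You will need to remove the `\n` from the end of each string, because the new value has to be added on the same line, so the `1` in `splitlines()` is no longer useful.

Something like this.

I take the strings directly from the list instead of using an index and `range(len(..))`.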

for row in rows[1:]:  # get directly string instead of index
    
    # convert to list
    row = row.split(',')
    
    # create new value using column 1 and 2
    new_value = row[1] + '_' + row[2]
    
    # append to list
    row.append(new_value)
    
    # convert back to string
    row = ','.join(row)
    
    # add new row and `\n` at the end
    New_Format_Data += 'Test' + '\t' + row + '\n'
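
The split, append and join steps can be tried on a small in-memory sample before touching the real download. This is only a sketch: the sample text and the helper name `add_joined_column` are made up for illustration, and columns 1 and 2 are just the example positions used above, not necessarily the real layout of the data.

```python
def add_joined_column(line, first=1, second=2, sep=','):
    """Return `line` with one extra field: fields `first` and `second` joined by '_'."""
    values = line.rstrip('\n').split(sep)   # string -> list of values
    values.append(values[first] + '_' + values[second])
    return sep.join(values)                 # list -> string again


# hypothetical sample data standing in for the downloaded text
sample = "state,month,year,value\nAL,06,2018,3.9\nAL,07,2018,4.1\n"

rows = sample.splitlines()                  # no `\n` left at the end of each row
new_rows = [rows[0] + ',new_column']        # header with the extra column name
new_rows += [add_joined_column(row) for row in rows[1:]]

result = '\n'.join(new_rows) + '\n'
print(result)
```

Collecting the rows in a list and joining them once at the end avoids repeated string concatenation, but `+=` on a string, as in the snippet above, works just as well for files of this size.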

The full code could look like this:

# PEP8: at least two spaces before `#` and one space after `#`

new_format_data = ''  # PEP8: `lower_case_names` for variables

output_file = open('Desktop/HW3/' + state_names[counter] + '.txt', 'w')

for counter, url in enumerate(urls):
    print('url:', url)
    
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    
    rows = html.splitlines()  # split the data in rows. DON'T NEED `1` because I don't need `\n`
    
    if counter == 0:
        new_format_data = "Test" + rows[0] + ',new_columns' + '\n'  # header with new column
        
    for row in rows[1:]:  # get directly string instead of index
        
        # convert to list
        row = row.split(',')
        
        # create new value using column 1 and 2
        new_value = row[1] + '_' + row[2]
        
        # append to list
        row.append(new_value)
        
        # convert back to string
        row = ','.join(row)
                    
        new_format_data += 'Test' + '\t' + row + '\n'  # adding that state column.
    
# --- after loop ---

output_file.write(new_format_data)  # once the for loops have finished, write everything and close the file.
output_file.close()

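However, this can cause problems if some column has `,` inside its value, because `split` will treat it as a separator. So it is better to use the standard `csv` module, which handles quoted values correctly.

Something like: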

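import csv
import urllib.request

# `newline=''` is what the csv docs recommend for files passed to csv.writer
output_file = open('Desktop/HW3/' + state_names[counter] + '.txt', 'w', newline='')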

# create csv writer 
output_csv = csv.writer(output_file)

for counter, url in enumerate(urls):
    print('url:', url)
    
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    
    # read all rows from csv 
    rows = list(csv.reader(html.splitlines()))
                
    if counter == 0:
        
        headers = rows[0]
        
        headers[0] = "Test" + headers[0]
        headers.append('new_colum')
        
        # write headers
        output_csv.writerow(headers)
        
    for row in rows[1:]:  # get directly string instead of index
        # create new value using column 1 and 2
        new_value = row[1] + '_' + row[2]
        
        # append to row
        row.append(new_value)
        
        # write row
        output_csv.writerow(row)
    
# --- after loop ---

output_file.close()
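
To illustrate the comma problem mentioned above, here is a small comparison of `split(',')` and `csv.reader` on a single row. The row itself is invented for the example; the real data may not contain quoted fields at all.

```python
import csv
import io

# hypothetical row whose second field contains a comma inside quotes
line = 'AL,"Birmingham, Jefferson",06,2018'

naive = line.split(',')              # also splits inside the quoted field
parsed = next(csv.reader([line]))    # respects the quotes

print(len(naive), naive)
print(len(parsed), parsed)

# csv.writer puts the quotes back when a field needs them
buffer = io.StringIO()
csv.writer(buffer).writerow(parsed + [parsed[2] + '_' + parsed[3]])
print(buffer.getvalue())
```

With `split` the row appears to have five values instead of four, so `row[1]` and `row[2]` would no longer point at the intended columns.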

PEP 8: Style Guide for Python Code

A uj5u.com user replied:

In the end it worked like this:

new_format_data = ''  # PEP8: `lower_case_names` for variables

output_file = open('Desktop/HW3_2/' + state_names[counter] + '.txt', 'w')

for counter, url in enumerate(urls):
    print('url:', url)
    
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    
    rows = html.splitlines()  # split the data in rows. DON'T NEED `1` because I don't need `\n`
    
    if counter == 0:
        # new_format_data = "Month_Year" + '\t' + rows[0] + '\n'  # header with new column

        new_format_data = rows[0] + "Month_Year" + '\n'  # header with new column
    for row in rows[1:]:  # get directly string instead of index
        
        # convert to list
        row = row.split('\t')
        
        # create new value using column 1 and 2
        new_value = row[2] + '_' + row[1]
        
        # append to list
        row.append(new_value)
        
        # convert back to string
        row = '\t'.join(row)
                    
        new_format_data += row + '\n'  # adding that state column.
   
output_file.write(new_format_data)  # once the for loops have finished, write everything and close the file.
output_file.close()

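I have to correct myself: the file is actually in txt format, not CSV. Now I am trying to delete a column and filter the information. One of the columns is "Year". The original data runs from 1976 to 2022, and I only need the information from 2015 till 2020. I tried a few things, but I broke the rest of the code :(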
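
A minimal sketch of that filtering step, still without pandas, might look like the following. The real header is not shown in the thread, so the column names (`Year`, `Footnote`), the helper name `filter_rows` and the sample text are all assumptions made for illustration.

```python
def filter_rows(lines, year_column, first_year, last_year, drop_column=None, sep='\t'):
    """Keep rows whose year is in [first_year, last_year]; optionally drop one column."""
    header = lines[0].split(sep)
    year_index = header.index(year_column)
    drop_index = header.index(drop_column) if drop_column is not None else None

    def strip_column(values):
        # return the values without the dropped column (or unchanged)
        if drop_index is None:
            return values
        return values[:drop_index] + values[drop_index + 1:]

    kept = [sep.join(strip_column(header))]
    for line in lines[1:]:
        values = line.split(sep)
        if first_year <= int(values[year_index]) <= last_year:
            kept.append(sep.join(strip_column(values)))
    return kept


# hypothetical tab-separated sample standing in for the downloaded text
sample = (
    "State\tMonth\tYear\tFootnote\tValue\n"
    "AL\t06\t1976\tx\t6.5\n"
    "AL\t06\t2015\tx\t6.1\n"
    "AL\t06\t2020\tx\t7.9\n"
    "AL\t06\t2022\tx\t2.6\n"
)

filtered = filter_rows(sample.splitlines(), 'Year', 2015, 2020, drop_column='Footnote')
print('\n'.join(filtered))
```

The year is compared before the column is dropped, so the indexes always refer to the original header. If the filter has to be combined with the `Month_Year` column from the code above, the new value would need to be appended before `strip_column` is applied.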
