CSV calculations in Python without any libraries

I have a CSV file like this:

ID,TASK1,TASK2,QUIZ1,QUIZ2
11061,50,75,50,78
11062,70,80,60,50
11063,60,75,77,79
11064,52,85,50,80
11065,70,85,50,80

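How can I add a new NO column and get the maximum and minimum scores for TASK1, TASK2, QUIZ1, and QUIZ2, and then overwrite the file? I am not allowed to use any libraries in Python. My expected output is: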

NO,ID,TASK1,TASK2,QUIZ1,QUIZ2
1,11061,50,75,50,78
2,11062,70,80,60,50
3,11063,60,75,77,79
4,11064,52,85,50,80
5,11065,70,85,50,80
MAX, ,70,85,77,80
MIN, ,50,75,50,50

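A uj5u.com user replied:

You can open the file in r+ mode, which lets you both read and write the file without reopening it.

A short (and somewhat complicated) version of the code (explained below):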

with open(r"filename.csv", "r+") as f:
    next(f)  # skip first line
    [*columns] = zip(*((int(i) for i in line.split(",")[1:]) for line in f))
    print("MAX", "", *map(max, columns), sep=",", file=f)
    print("MIN", "", *map(min, columns), sep=",", file=f)

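Now let's explain how it works. To get the maximum and minimum of each column, we have to group all values of a column into one container. To get the rows, we split each line on commas and convert each value to an integer. Since we don't need the values of the first column, we can skip it. All of this is done by a single line of code: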

[*columns] = zip(*((int(i) for i in line.split(",")[1:]) for line in f))

The same thing can be written more explicitly:

rows = []
for line in f:
    row = []
    for column in line.split(",")[1:]:  # skip first column
        row.append(int(column))  # convert string to int
    rows.append(row)
columns = list(zip(*rows))

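[*columns] = iterable is the same as columns = list(iterable).

Once we have a list of tuples containing the values of each column, we can apply max() and min() to each tuple to get the maximum and minimum. We can do this with a generator expression: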
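To make the transposition step concrete, here is a minimal sketch using only the first two data rows from the question (ID column already dropped). The name same_columns is introduced here purely for illustration.

```python
# Two sample rows from the question, without the ID column.
rows = [[50, 75, 50, 78], [70, 80, 60, 50]]

# zip(*rows) groups the i-th item of every row together,
# which turns a list of rows into a sequence of columns.
[*columns] = zip(*rows)

# Equivalent spelling without starred assignment (illustrative name).
same_columns = list(zip(*rows))

print(columns)
print(columns == same_columns)
```

Each element of columns is a tuple holding one column's values, which is the shape that max() and min() need.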

(max(column) for column in columns)
(min(column) for column in columns)

Or with map():

map(max, columns)
map(min, columns)
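As a quick check that both spellings give the same result, the sketch below applies them to the columns of the question's sample data. Both a generator expression and map() are lazy, so they are wrapped in list() here only to make the values visible; the variable names are illustrative.

```python
# Columns TASK1, TASK2, QUIZ1, QUIZ2 from the question's sample data.
columns = [
    (50, 70, 60, 52, 70),
    (75, 80, 75, 85, 85),
    (50, 60, 77, 50, 50),
    (78, 50, 79, 80, 80),
]

# Generator expression and map() produce the same values.
maxima_gen = list(max(column) for column in columns)
maxima_map = list(map(max, columns))
minima_map = list(map(min, columns))

print(maxima_gen, maxima_map, minima_map)
```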

To write the result to the file, we need to convert the values to strings and join them back into a single string with str.join():

max_row = ["MAX", ""]  # empty string needed to get two consecutive commas
min_row = ["MIN", ""]
for column in columns:
    max_row.append(str(max(column)))
    min_row.append(str(min(column)))
f.write("\n" + ",".join(max_row))
f.write("\n" + ",".join(min_row))

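Alternatively, we can use print(). We can unpack our results into the print() call, pass "," as the sep argument so that Python joins all arguments with commas, and pass our file object as the file argument so that the output goes to the file instead of the console.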

print("MAX", "", *map(max, columns), sep=",", file=f)
print("MIN", "", *map(min, columns), sep=",", file=f)
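To see exactly what text such a print() call produces without touching a real file, here is a small sketch. print() only requires that the file argument has a write() method, so a tiny stand-in class is enough; LineCollector is a hypothetical helper made up for this illustration.

```python
class LineCollector:
    """Hypothetical stand-in for a file object: it only records what is written."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


out = LineCollector()

# An empty string as the second argument yields two consecutive commas.
print("MAX", "", 70, 85, 77, 80, sep=",", file=out)
# A single space instead reproduces the "MAX, ," shape from the question.
print("MIN", " ", 50, 75, 50, 50, sep=",", file=out)

text = "".join(out.chunks)
print(repr(text))
```

Note that print() also appends a newline after each call, so no explicit "\n" is needed between the two rows.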

So here is the boring version of the code at the top of this answer:

with open(r"filename.csv", "r+") as f:
    next(f)  # skip first line
    rows = []
    for line in f:
        row = []
        for column in line.split(",")[1:]:  # skip first column
            row.append(int(column))  # convert string to int
        rows.append(row)
    max_row = ["MAX", ""]  # empty string needed to get two consecutive commas
    min_row = ["MIN", ""]
    for column in zip(*rows):
        max_row.append(str(max(column)))
        min_row.append(str(min(column)))
    f.write("\n" + ",".join(max_row))
    f.write("\n" + ",".join(min_row))

Upd. If you want to write the result to a separate file and add an index column, you also need to open a destination file for writing, copy each line from the source file to the destination (prefixed with its index), and append the minimums and maximums to the end of the file.

Modified short version of code (it gets even more complicated):

with open(r"input.csv") as inp_f, \
        open(r"output.csv", "w") as out_f:
    out_f.write("NO," + next(inp_f))
    [*columns] = zip(*(out_f.write(f"{index},{line.rstrip()}\n") and
                       (int(i) for i in line.split(",")[1:])
                       for index, line in enumerate(inp_f, 1)))  # number rows from 1
    print("MAX", "", *map(max, columns), sep=",", file=out_f)
    print("MIN", "", *map(min, columns), sep=",", file=out_f)

Modified boring version of code:

with open(r"input.csv") as inp_f, \
        open(r"output.csv", "w") as out_f:
    out_f.write("NO," + next(inp_f))
    rows = []
    for index, line in enumerate(inp_f, 1):
        out_f.write(str(index) + "," + line)
        row = []
        for column in line.split(",")[1:]:  # skip first column
            row.append(int(column))  # convert string to int
        rows.append(row)
    max_row = ["MAX", ""]  # empty string needed to get two consecutive commas
    min_row = ["MIN", ""]
    for column in zip(*rows):
        max_row.append(str(max(column)))
        min_row.append(str(min(column)))
    out_f.write("\n" + ",".join(max_row))
    out_f.write("\n" + ",".join(min_row))
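For checking the logic against the expected output without creating any files, the same steps can be sketched as a function that works on a list of lines. This is only an illustration under stated assumptions: build_report and sample are hypothetical names, trailing newlines are stripped so the result does not depend on whether the input ends with one, and "MAX, ," is written with a space to match the output shown in the question.

```python
def build_report(lines):
    """Return the output lines (without newlines) for CSV lines with a header first."""
    # Drop empty lines and trailing newlines so the last line needs no special care.
    header, *data = [line.rstrip("\n") for line in lines if line.strip()]
    out = ["NO," + header]
    rows = []
    for index, line in enumerate(data, 1):
        out.append(f"{index},{line}")
        # Skip the ID column and convert the scores to integers.
        rows.append([int(cell) for cell in line.split(",")[1:]])
    columns = list(zip(*rows))
    out.append("MAX, ," + ",".join(str(max(c)) for c in columns))
    out.append("MIN, ," + ",".join(str(min(c)) for c in columns))
    return out


# Sample data copied from the question.
sample = [
    "ID,TASK1,TASK2,QUIZ1,QUIZ2\n",
    "11061,50,75,50,78\n",
    "11062,70,80,60,50\n",
    "11063,60,75,77,79\n",
    "11064,52,85,50,80\n",
    "11065,70,85,50,80\n",
]
report = build_report(sample)
print("\n".join(report))
```

Writing the result is then a single call such as out_f.write("\n".join(report) + "\n") inside the with block.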

Upd. I have run some tests (code). Test results (lower is better):

olvin1: 0.687356842
olvin2: 0.5448804249999999
nikeros: 0.7540002289999999
balderman: 0.6384034139999999

So the "boring" method from this answer shows the best performance.

A uj5u.com user replied:

See below: zero imports

TASK1 = 1
TASK2 = 2
QUIZ1 = 3
QUIZ2 = 4

MIN = 0
MAX = 1
values = [[None,None],[None,None],[None,None],[None,None]]
lines = []
with open('in.txt') as f:
  for idx,line in enumerate(f):
    if idx > 0:
      lines.append(line)
      parts = line.split(',')
      for i in [TASK1,TASK2,QUIZ1,QUIZ2]:
        score = int(parts[i])  # compare as numbers; int() ignores the trailing newline
        if values[i-1][MIN] is None or values[i-1][MIN] > score:
          values[i-1][MIN] = score
        if values[i-1][MAX] is None or values[i-1][MAX] < score:
          values[i-1][MAX] = score
with open('out.txt','w') as f: 
  f.write('NO,ID,TASK1,TASK2,QUIZ1,QUIZ2\n') 
  for idx,line in enumerate(lines,1):
    f.write(f'{idx},{line}') 
  f.write('\n')  
  f.write('MAX, ,' + ','.join(str(x[MAX]) for x in values))
  f.write('\n')
  f.write('MIN, ,' + ','.join(str(x[MIN]) for x in values))

out.txt

NO,ID,TASK1,TASK2,QUIZ1,QUIZ2
1,11061,50,75,50,78
2,11062,70,80,60,50
3,11063,60,75,77,79
4,11064,52,85,50,80
5,11065,70,85,50,80
MAX, ,70,85,77,80
MIN, ,50,75,50,50
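One thing to watch when tracking minimums and maximums of values read from a text file: the pieces returned by line.split(',') are strings, and strings compare character by character rather than by numeric value. With the two-digit scores in this sample the two orderings happen to agree, but they diverge as soon as the scores differ in length. The values below are made up for illustration.

```python
# Hypothetical scores with different numbers of digits.
scores_as_text = ["9", "10", "100"]

# String comparison is lexicographic: "9" sorts after "10" and "100".
text_max = max(scores_as_text)
text_min = min(scores_as_text)

# Converting to int first gives the numeric answer.
int_max = max(int(s) for s in scores_as_text)
int_min = min(int(s) for s in scores_as_text)

print(text_max, text_min, int_max, int_min)
```

Converting each field with int() before comparing avoids this, and int() also tolerates the trailing newline on the last field of a line.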

A uj5u.com user replied:

Assuming your input is already in a CSV file, you can use the following code:

# This reads in your file
def load_csv(p):
    with open(p, "r") as f:
        return [l.strip().split(",") for l in f]

# This isolates the numeric columns you are interested in
def to_numeric(lines):
    return [list(map(int,line[1:])) for i, line in enumerate(lines) if i > 0]

# This applies a function to the columns results of to_numeric
def apply_function(numeric_lists, func=max):
    return [func(x) for x in zip(*numeric_lists)] 

# This applies the defined function, printing your result in the desired format
def main(p):
    lines = load_csv(p)
    numeric_lists = to_numeric(lines)
    mins = apply_function(numeric_lists, func=min)
    maxs = apply_function(numeric_lists)
    for i, l in enumerate(lines):
        print(("NO," if i == 0 else f"{i},") + ",".join(l))
    print(f'MAX, ,{",".join(map(str,maxs))}')
    print(f'MIN, ,{",".join(map(str,mins))}')

# Simply call
main(YOUR_FILE_PATH)

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qianduan/359275.html

Tags: Python, files
