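I have two .txt files, and for each one I want to split the dataframe into two parts based on the value in the first column. Rows whose value is less than "H1000" should go into the first dataframe, and rows whose value is greater than or equal to "H1000" should go into the second. The first column always starts with H followed by four digits. When comparing against 1000 in Python, I want to ignore the H.
I tried this, but it doesn't work.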
ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]
print(ht_data)
This is the first file.
H0002 Version 5
H0003 Date_generated 8-Aug-11
H0004 Reporting_period_end_date 19-Jun-11
H0005 State AW
H1000 Tene_no/Combined_rept_no E75/3794
H1001 Tenem_holder Magnetic Resources NL
H1003 LLD
I'd like the output to look like this:
df_h:
H0002 Version 5
H0003 Date_generated 8-Aug-11
H0004 Reporting_period_end_date 19-Jun-11
H0005 State AW
df_t:
H1000 Tene_no/Combined_rept_no E75/3794
H1001 Tenem_holder Magnetic Resources NL
H1003 LLD
This is the second file.
H0002 Version 45
H0003 Date_generated 6-Aug-11
H0004 Reporting_period_end_date 19-Jun-11
H0005 State AW
H0999 Tene_no/Combined_rept_no E70/3793
H1001 Tene_holder Magnetic Resources NL
This is my code:
from pathlib import Path
import pandas as pd

if (".txt" in str(path_txt).lower()) and path_txt.is_file():
    txt_files = [Path(path_txt)]
else:
    txt_files = list(Path(path_txt).glob("*.txt"))
for fn in txt_files:
    all_dfs = pd.read_csv(fn, sep="\t", header=None)  # Read the file
    all_dfs = all_dfs.dropna(axis=1, how='all')  # Drop columns where all values are NaN
    all_dfs = all_dfs.dropna(axis=0, how='all')  # Drop rows where all values are NaN
    print(all_dfs)
    # Raises IndexError when no row equals "H1000" (as in the second file)
    ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]
    print(ht_data)
    df_h = all_dfs[0:ht_data]  # Head Data
    df_t = all_dfs[ht_data:]  # Tene Data
Can anyone help me accomplish this in Python?
Reply from a uj5u.com user:
Assume this data:
import pandas as pd
data = pd.DataFrame(
    [
        ["H0002", "Version", "5"],
        ["H0003", "Date_generated", "8-Aug-11"],
        ["H0004", "Reporting_period_end_date", "19-Jun-11"],
        ["H0005", "State", "AW"],
        ["H1000", "Tene_no/Combined_rept_no", "E75/3794"],
        ["H1001", "Tenem_holder Magnetic Resources", "NL"],
    ],
    columns=["id", "col1", "col2"],
)
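We can strip the leading H, convert the rest to an integer, and build a boolean mask that is True for rows below a preset threshold, e.g. 1000. The mask selects the first dataframe and its negation selects the second.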
mask = data["id"].str.strip("H").astype(int) < 1000
df_h = data[mask]
df_t = data[~mask]
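The same mask can be applied to a file read with header=None, where the first column has no name. The following is only a sketch under assumptions: the file is tab-separated as in the question, the sample text stands in for the second file, and split_by_code is a hypothetical helper name, not part of pandas.

```python
import io
import pandas as pd

# Stand-in for the second file from the question (assumed to be tab-separated).
SAMPLE = (
    "H0002\tVersion\t45\n"
    "H0003\tDate_generated\t6-Aug-11\n"
    "H0004\tReporting_period_end_date\t19-Jun-11\n"
    "H0005\tState\tAW\n"
    "H0999\tTene_no/Combined_rept_no\tE70/3793\n"
    "H1001\tTene_holder\tMagnetic Resources NL\n"
)


def split_by_code(df, threshold=1000):
    """Hypothetical helper: split df on the numeric part of its first column."""
    # Select the first column by position, drop the leading "H", compare as int.
    codes = df.iloc[:, 0].astype(str).str.lstrip("H").astype(int)
    mask = codes < threshold
    return df[mask], df[~mask]


all_dfs = pd.read_csv(io.StringIO(SAMPLE), sep="\t", header=None)
df_h, df_t = split_by_code(all_dfs)
print(df_h)
print(df_t)
```

Because the split depends on the numeric value rather than on finding a row that equals "H1000", it also works for a file such as this one, where no H1000 row exists.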
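Reply from a uj5u.com user:
If you want to compare a value of the form val = HXXXX, where each X is a digit stored as a character, try the following:
val = 'H1003'
val_cmp = int(val[1:])  # drop the leading 'H' and convert the digits to an int
if val_cmp < 1000:
    # First Dataframe
    pass
else:
    # Second Dataframe
    pass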
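To show how that comparison could drive the actual split, here is a minimal sketch in plain Python. The rows are a subset of the first file from the question, and code_number, head_rows and tene_rows are hypothetical names chosen for illustration.

```python
def code_number(val):
    """Hypothetical helper: turn a code such as 'H1003' into the int 1003."""
    return int(val[1:])


# A few rows from the first file in the question, already split into fields.
rows = [
    ["H0002", "Version", "5"],
    ["H0005", "State", "AW"],
    ["H1000", "Tene_no/Combined_rept_no", "E75/3794"],
    ["H1003", "LLD"],
]

# Rows below 1000 form the first group, the rest form the second.
head_rows = [r for r in rows if code_number(r[0]) < 1000]
tene_rows = [r for r in rows if code_number(r[0]) >= 1000]
print(head_rows)
print(tene_rows)
```

Each list can then be passed to pd.DataFrame if a dataframe is needed.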
