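I am trying to replace a large number of strings (only a few examples are shown here, but I actually have thousands) with other strings defined in "replaceWord".
- There is no regular pattern in "replaceWord".
However, the code I wrote does not work as I expected.
After running the script, the output is as follows: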
before after
0 test1234 test1234
1 test1234 test1234
2 test1234 1349
3 test1234 test1234
4 test1234 test1234
I need output like the following:
before after
1 test1234 1349
2 test9012 te1210st
3 test5678 8579
4 april I was born August
5 mcdonalds i like checkin
Script:
import os.path, time, re
import pandas as pd
import csv
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
cols = ['before','after']
df = pd.DataFrame(index=[], columns=cols)
for word in replaceWord:
    body01_after = re.sub(word[0], word[1], body01_before)
    body02_after = re.sub(word[0], word[1], body02_before)
    body03_after = re.sub(word[0], word[1], body03_before)
    body04_after = re.sub(word[0], word[1], body04_before)
    body05_after = re.sub(word[0], word[1], body05_before)
    df=df.append({'before':body01_before,'after':body01_after}, ignore_index=True)
#df.head()
print(df)
df.to_csv('test_replace.csv')
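A helpful uj5u.com user replied:
Use a regular expression that captures the non-digits (\D+) as the first group and the digits (\d+) as the second group. In the replacement text, put the second group \2 first, followed by the first group \1:
df['after'] = df['before'].str.replace(r'(\D+)(\d+)', r'\2\1', regex=True)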
df
before after
1 test1234 1234test
2 test9012 9012test
3 test5678 5678test
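The same capture-group swap can be tried without a DataFrame. The following is a minimal sketch using only the standard `re` module; the list name `words` and the result name `swapped` are illustrative and not part of the original answer. Note that this only reorders the letters and digits; it does not apply the replaceWord pairs from the question.

```python
import re

# Sample strings taken from the question.
words = ["test1234", "test9012", "test5678"]

# (\D+) captures the leading non-digits, (\d+) the digits that follow;
# the replacement r"\2\1" emits the digits first, then the non-digits.
swapped = [re.sub(r"(\D+)(\d+)", r"\2\1", w) for w in words]

print(swapped)  # ['1234test', '9012test', '5678test']
```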
Edit
It seems you do not have a dataset; you have variables:
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
# Gather the variable names in a list
var_names = re.findall(r'body0\d[^,]+', ','.join(globals().keys()))
df = pd.DataFrame(var_names, columns=['before_1'])
# Obtain the value of each variable
df['before'] = df['before_1'].apply(lambda x: eval(x))
# Replacement function: return the mapped word, or the matched word unchanged if it has no mapping
repl = lambda x: x[0] if (rp := dict(replaceWord).get(x[0])) is None else rp
# Do the replacement
df['after'] = df['before'].str.replace(r'(\w+)', repl, regex=True)
df
before_1 before after
0 body01_before test1234 1349
1 body02_before test9012 te1210st
2 body03_before test5678 8579
3 body04_before i like mcdonalds i like chicken
4 body05_before I was born april I was born August
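If the strings can be kept in a list rather than in separate variables, the same token-lookup idea works without `globals()` or `eval`. The sketch below is one possible arrangement, not the answer author's code: the names `bodies`, `mapping`, and `replace_token` are hypothetical, and it uses only the standard `re` module.

```python
import re

replaceWord = [
    ["test9012", "te1210st"],
    ["test5678", "8579"],
    ["test1234", "1349"],
    ["april", "August"],
    ["mcdonalds", "chicken"],
]
# Build the lookup table once instead of on every match.
mapping = dict(replaceWord)

# Hypothetical list standing in for the body0N_before variables.
bodies = ["test1234", "test9012", "test5678", "i like mcdonalds", "I was born april"]

def replace_token(match):
    # Return the mapped word, or the token unchanged when it has no mapping.
    token = match.group(0)
    return mapping.get(token, token)

# \w+ matches one whole token at a time, so each token is looked up exactly once.
after = [re.sub(r"\w+", replace_token, body) for body in bodies]

for before, result in zip(bodies, after):
    print(before, "->", result)
```

Because the lookup is a dictionary access per token, the cost should not grow with the number of replacement pairs. The match is case-sensitive and applies to whole tokens only, so a key such as "april" would not be replaced inside "aprils".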
A helpful uj5u.com user replied:
Does this serve your purpose?
words = ["test9012", "test5678", "test1234"]
updated = []
for word in words:
    for i, char in enumerate(word):
        if 47 < ord(char) < 58:  # character codes 48-57 are the digits 0-9
            updated.append(f"{word[i:]}{word[:i]}")
            break
print(updated)
The code prints: ['9012test', '5678test', '1234test']
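A helpful uj5u.com user replied:
As I understand it, you have a list of strings and a mapping of the form {oldString1: newString1, oldString2: newString2, ...} that you want to use to replace the strings in the original list. The fastest (and perhaps most Pythonic) way I can think of is to simply store the mapping as a Python dict. For example: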
mapping = {
    "test9012": "9012test",
    "test5678": "5678test",
    "test1234": "1234test",
}
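If your list of strings is stored as a Python list, you can get the list of replacements with the following code:
new_list = [mapping.get(old_string, old_string) for old_string in old_list]
Note: we pass old_string as the second (default) argument to mapping.get() so that it returns old_string when it is not in the mapping dictionary. dict.get() takes its arguments positionally, not as keywords.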
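A short, self-contained sketch of the list case follows. The sample `old_list`, including the string "unmapped", is made up for illustration; the point is that `dict.get` takes the key and the fallback positionally, so a string with no entry in `mapping` passes through unchanged.

```python
mapping = {
    "test9012": "9012test",
    "test5678": "5678test",
    "test1234": "1234test",
}

# Hypothetical input; "unmapped" has no entry in the mapping.
old_list = ["test1234", "unmapped", "test9012"]

# The second positional argument to dict.get is the fallback value.
new_list = [mapping.get(old_string, old_string) for old_string in old_list]

print(new_list)  # ['1234test', 'unmapped', '9012test']
```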
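If your list of strings is stored in a Pandas Series (or a column of a Pandas DataFrame), you can quickly replace the strings with:
new_list = old_list.map(mapping).fillna(old_list)
Note: map() returns NaN for any value that is not in the mapping dictionary, so fillna(old_list) puts the original string back in those positions. Setting na_action='ignore' alone only skips values that are already NaN; it does not keep unmapped strings.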
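To see how a Series behaves when some strings have no entry in the mapping, here is a small sketch assuming pandas is installed. The sample Series and the names `mapped`, `kept`, and `replaced` are illustrative. It compares `Series.map` on its own, `map` followed by `fillna`, and `Series.replace` with a dict.

```python
import pandas as pd

mapping = {
    "test9012": "9012test",
    "test5678": "5678test",
    "test1234": "1234test",
}

# Hypothetical input; "unmapped" has no entry in the mapping.
old_list = pd.Series(["test1234", "unmapped", "test9012"])

# map() alone turns values missing from the dict into NaN.
mapped = old_list.map(mapping)

# Filling the NaN positions from the original Series keeps unmapped strings.
kept = mapped.fillna(old_list)

# replace() with a dict swaps whole values that equal a key and leaves the rest.
replaced = old_list.replace(mapping)

print(mapped.tolist())
print(kept.tolist())
print(replaced.tolist())
```

Both `kept` and `replaced` preserve "unmapped"; which of the two is faster for thousands of keys is worth measuring on the real data rather than assuming.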
A helpful uj5u.com user replied:
You can use a regular expression to match the pattern.
import re
import pandas as pd

words = ["test9012", "test5678", "test1234"]
rows = []
for word in words:
    # Match the leading run of letters (may be empty)
    textOnlyMatch = re.match(r"[A-Za-z]*", word)
    textOnly = textOnlyMatch.group(0)  # take the entire match
    numberPart = word[len(textOnly):]  # take everything after the letters
    result = numberPart + textOnly
    rows.append({'before': word, 'after': result})
# Build the DataFrame once, after the loop
df = pd.DataFrame(rows, columns=['before', 'after'])
print(df)
df.to_csv('test_replace.csv')
So, by matching with a regular expression, you can separate the letters-only part from the digits-only part.
