I am writing the following code, which takes as input a list called scrap containing some phrases:
scrap= ['Mutagenesis screens define conserved functions of metabolism and longevity', 'EK Bharath Shrestha Bharat(EBSB) - 100 commonly used sentences and their translations in 22 languages - P & D', 'OEB Special Seminar: “Phylogenetics and phylogenomics of Lentinula and the origin of cultivated shiitake mushrooms”', 'Student Exchange programme (Autumn Semester 2022) in University of Skovde, Sweden - CIR - Last Date: 04.03.2022', 'Ontario Institute for Studies in Education', 'Q Quest 2022 - AU TVS CQM - Last Date: 01.03.2022', 'National Conference on "Present Innovation Approaches and Paradigm in Physical Education"', 'Mahatma Gandhi University Newsletter ‘Insider’-Published.', 'STAGE Seminar', 'BOSM', 'Faculty of Law', 'UNIVERSITY UNION ELECTION 2019-20', 'Keynote Lecture: Sustainability for Africa: ...', 'Hillary Chute, "Maus Now: Spiegelman’s...', 'Conference on ‘Sustainable agriculture and farmers empowerment’ during 16th and 17th March 2021.', 'MIT Probability Seminar', 'Name of Programme', '49th All India Conference of Dravidian Linguists', 'Grad College Social Hour (GC common lounge)', 'MIT Symphony Orchestra: Márquez, Sarasate, and...', 'SCSB Colloquium Series: Etiology and impact of...', 'Celebration of National Science Day on 28th February 2022 - Dept. of Physics', 'PICASSO Tie-dye Event', 'Lunch & Learn with Muslim Life Program', '2022 Koch Institute Image Awards', 'Ideas & Images: The Power of Visual...', '30 Minutes Towards Better Bibliographies and Footnotes! (online)', 'Virtual Workshop on "Flight to a Bright Career-Enhance your Personality"', '4th Disaster Risk and Vulnerability Conference organised by SES scheduled on Oct 9-10 & 16-17.', 'French Education Fair 2022 organized by Campus France - CIR']
Now I want to append to TRUE_PROG every phrase in scrap that uses a word from prog_list:
prog_list=['writing', 'cryptography', 'recoding', 'decoding', 'program', 'code', 'planning', 'programming', 'encoding', 'gull', 'scheduling', 'tease', 'program', 'code']
TRUE_PROG =[]
I wrote some simple code with loops, but it produces output I did not expect:
Code:
import string

TRUE_PROG=[]
MIS_PROG=[]
c_list = []
p = string.punctuation
punc = list(p)
for i in scrap:
    # print(i)
    words_in_scrap = i.split()
    for j in words_in_scrap:
        words = j.lower()
        for k in words:
            # print(k)
            if k in punc:
                words = words.replace(k ," ")
        #CLEANSED DATA
        clean = words
        # print("clean=",clean)
        c_list.append(clean)
        # print("c_list=:",c_list)
    for c in c_list:
        if c ==" ":
            c_list.remove(c)
    # print("c_list cleaned of spaces=",c_list)
    for t in c_list:
        if t in prog_list:
            TRUE_PROG.append(i)
            #print("\ni=",i,"due to t=",t)
        else:
            MIS_PROG.append(i)
# print("\n\nPROG=",set(TRUE_PROG),"\n\n\n MIS_PROG=", set(MIS_PROG),"\n")
If you uncomment #print("\ni=",i,"due to t=",t)
you will see that some phrases that do not contain any of these words are appended as well. It gives me this:
i= Lunch & Learn with Muslim Life Program due to t= program
i= 2022 Koch Institute Image Awards due to t= program
i= Ideas & Images: The Power of Visual... due to t= program
i= 30 Minutes Towards Better Bibliographies and Footnotes! (online) due to t= program
i= Virtual Workshop on "Flight to a Bright Career-Enhance your Personality" due to t= program
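and so on. Apart from the first one, none of these phrases contains the word "program", yet they were still added. Any correction would be highly appreciated. Thanks!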
Answer from a uj5u.com user:
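# Remove duplicate keywords such as the repeated 'program' and 'code'
ps = list(set(prog_list))
for p in ps:
    for s in scrap:
        # Split each phrase on its own, so words from earlier phrases cannot leak in
        words = s.split()
        for w in words:
            if p == w.lower():
                r = s + f" - due to the word {p}"
                TRUE_PROG.append(r)
print(TRUE_PROG)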
Output:
['Lunch & Learn with Muslim Life Program - due to the word program']
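A likely cause of the unexpected output in the question is that c_list is created once, before the loop over scrap, and is never reset. Once "program" has been added to it, every later phrase is checked against a list that still contains that word. The sketch below is one possible way to avoid this: it builds a fresh set of tokens for each phrase and also fills the "no match" list once per phrase. The names split_by_keywords, sample_scrap and sample_prog_list are hypothetical and used only for illustration. Note that string.punctuation covers ASCII punctuation only, so curly quotes such as ‘ or ” are not stripped here.

```python
import string


def split_by_keywords(phrases, keywords):
    """Return (matched, unmatched) lists of phrases.

    A phrase counts as matched when at least one of its words,
    lower-cased and stripped of ASCII punctuation, is in keywords.
    """
    keyword_set = set(keywords)  # also removes duplicate keywords
    # Translation table that turns every ASCII punctuation character into a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    matched, unmatched = [], []
    for phrase in phrases:
        # Tokens are rebuilt for every phrase, so nothing carries over
        tokens = set(phrase.lower().translate(table).split())
        if tokens & keyword_set:
            matched.append(phrase)
        else:
            unmatched.append(phrase)
    return matched, unmatched


# Small illustrative sample taken from the lists in the question
sample_scrap = [
    "Lunch & Learn with Muslim Life Program",
    "2022 Koch Institute Image Awards",
    "Name of Programme",
    "MIT Probability Seminar",
]
sample_prog_list = ["program", "code", "programming"]

true_prog, mis_prog = split_by_keywords(sample_scrap, sample_prog_list)
print(true_prog)
print(mis_prog)
```

With this sample, only the "Lunch & Learn with Muslim Life Program" phrase should end up in true_prog. "Name of Programme" does not match because the comparison is on whole words, not substrings.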
