pandas Study Notes 2

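1. Filtering: loc, where
    - Conditional assignment with loc. Wrap each condition in parentheses and combine them with &. The string comparison on 投資時間 works because ISO-formatted dates sort chronologically:
        orgin_excel.loc[(orgin_excel['投資時間'].astype(str) < '2020-10-01') & (orgin_excel['資料狀態'].isnull()) & (orgin_excel['資料源'].str.contains('調研')), '資料狀態'] = '洗掉'
    - Conditional values with np.where (needs import numpy as np). Rows that fail the condition keep their current value:
        orgin_excel['洗掉理由'] = np.where(orgin_excel['資料狀態'] == '重復洗掉', '問卷填寫重復', orgin_excel['洗掉理由'])
    - Reindex one column so that it follows the index of another:
        diaoyan_money234.loc[:, 'New_ID_x'] = diaoyan_money234['New_ID_x'].reindex_like(diaoyan_money234['New_ID_y'])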
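The loc and np.where patterns above can be tried on a small frame. This is only a minimal sketch: the column names (date, source, status, reason) and the values are made up for illustration and do not come from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({
    "date": ["2020-09-15", "2020-11-01", "2020-08-20"],
    "source": ["survey A", "survey B", "database"],
    "status": [None, None, None],
})

# Each condition is a boolean Series; & combines them element-wise.
mask = (
    (df["date"] < "2020-10-01")
    & df["status"].isnull()
    & df["source"].str.contains("survey")
)

# loc[row_mask, column] assigns only to the rows where the mask is True.
df.loc[mask, "status"] = "delete"

# np.where(condition, value_if_true, value_if_false) builds a full column.
df["reason"] = np.where(df["status"] == "delete", "before cutoff", "")
```

Only the first row satisfies all three conditions, so only that row is marked.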
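    - Filter rows with a query expression (column names without spaces can be used directly):
        df.query('column1 > 2 and column2 < 1')
    - Select columns or index labels by name. This filters on labels, not on values:
        DataFrame.filter(items=None, like=None, regex=None, axis=None)
    - Filter whole groups:
        df1 = df.groupby('district').filter(lambda x: x['age'].mean() > 20)
      The result keeps every row of each district whose mean age is above 20 and drops all rows of the other districts.

2. Deduplication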
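    - Mark duplicates with duplicated(). Signature: DataFrame.duplicated(subset=None, keep='first'). With keep='first' the first occurrence is not flagged, only the later repeats are:
        orgin_excel['資料狀態'] = np.where(orgin_excel.duplicated(subset=['排名用全稱','受資方全稱','投資時間','基金全稱','投資幣種','投資金額(M)','資料源'], keep='first') & (orgin_excel['資料源'].str.contains('調研')), '重復洗掉', orgin_excel['資料狀態'])
    - Keep only the groups that contain more than one row:
        orgin_res2 = orgin_res1.groupby(['排名用全稱','受資方全稱','投資時間_x']).filter(lambda x: len(x) > 1)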
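To see how the two deduplication idioms differ, here is a small sketch on invented data (the investor, company and amount columns are assumptions for illustration). It also shows keep=False, which the notes do not use but which helps explain what keep='first' leaves out.

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({
    "investor": ["A", "A", "B", "C", "C", "C"],
    "company":  ["x", "x", "y", "z", "z", "w"],
    "amount":   [1, 1, 2, 3, 3, 4],
})

# keep='first': the first occurrence stays unflagged, later repeats are True.
is_repeat = df.duplicated(subset=["investor", "company"], keep="first")

# keep=False: every member of a duplicated set is True, including the first.
in_dup_set = df.duplicated(subset=["investor", "company"], keep=False)

# groupby(...).filter keeps all rows of the groups where the function is True.
multi_row_groups = df.groupby(["investor", "company"]).filter(lambda g: len(g) > 1)
```

On this data, groupby(...).filter(lambda g: len(g) > 1) selects the same rows as the keep=False mask, which is why the two approaches are often interchangeable for finding duplicated keys.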
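3. Type conversion
    - Convert a whole column to strings:
        orgin_excel['New_ID'] = orgin_excel['New_ID'].map(str)
    - Convert a time difference to a number of days and keep the rows within 90 days (both columns must be datetime64):
        orgin_res1 = orgin_res.loc[(orgin_res['投資時間_x'] - orgin_res['投資時間_y']).dt.days.abs() <= 90]
      These notes originally used .astype('timedelta64[D]').astype(float) for this step. That works on older pandas, but pandas 2.x rejects a cast to day resolution, so .dt.days is the safer form.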
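A minimal sketch of both conversions, on invented data (the id, date_x and date_y columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "date_x": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-06-01"]),
    "date_y": pd.to_datetime(["2021-03-01", "2021-05-01", "2021-05-15"]),
})

# Whole column to str, so that values can later be joined with ','.join.
df["id"] = df["id"].map(str)

# Subtracting two datetime columns gives a timedelta Series;
# .dt.days extracts the whole-day component as integers.
gap_days = (df["date_x"] - df["date_y"]).dt.days

# abs() makes the test symmetric: up to 90 days before or after.
within_90 = df.loc[gap_days.abs() <= 90]
```

The second row is 120 days apart and is dropped; the other two rows stay.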
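4. Counting rows per group: groupby and count
    - Join the IDs and count the rows of each group in one agg call. New_ID must hold strings for ','.join to work, and the 條數 column must already exist in the frame:
        diaoyan_org_row = orgin_excel.loc[(orgin_excel['資料源'].str.contains('調研')) & (orgin_excel['資料狀態'].isna())].groupby(['排名用全稱','受資方全稱','投資時間']).agg({'New_ID': ','.join, '條數': 'size'})
    - Count the rows per group and name the count column:
        diaoyan_org_list = orgin_excel.loc[orgin_excel['資料源'].str.contains('調研') & orgin_excel['資料狀態'].isnull()].groupby(['排名用全稱','受資方全稱','投資時間']).size().reset_index(name='counts')
    Summary:
    1. Count rows per group: size
    2. Concatenate strings per group: .agg({'New_ID': ','.join})
    3. Rename the count column: .size().reset_index(name='counts')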
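The three points above can be sketched on a small frame. The column names are invented, and the first variant uses named aggregation, which is an alternative to the dict form in the notes that does not need a pre-existing count column.

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({
    "investor": ["A", "A", "B"],
    "company":  ["x", "x", "y"],
    "id":       ["1", "2", "3"],  # strings, so ','.join works
})

# Named aggregation: output_column=(input_column, function).
summary = (
    df.groupby(["investor", "company"])
      .agg(ids=("id", ",".join), rows=("id", "size"))
      .reset_index()
)

# size() returns a Series; reset_index(name=...) turns it into a frame
# and names the count column in one step.
counts = df.groupby(["investor", "company"]).size().reset_index(name="counts")
```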
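5. Joining (merge)
    - Inner join on shared key columns. Signature: pd.merge(DataFrame1, DataFrame2, on=..., how=...):
        orgin_rows = pd.merge(diaoyan_row_org1, simutong_row_org1, on=['排名用全稱','受資方全稱'], how='inner')
    - Join and keep only the columns you want:
        diaoyan_jijin = pd.merge(orgin_excel, diaoyan_org_jijin, on=['New_ID','排名用全稱','受資方全稱','投資時間'], how='inner')[['排名用全稱','受資方全稱','投資時間','條數_y','New_ID','基金全稱']]
      When both frames have a non-key column with the same name, merge appends the suffixes _x and _y, which is presumably where 條數_y comes from.
    - Join and reset the index:
        orgin_jijin = pd.merge(diaoyan_jijin, simutong_jijin, on=['排名用全稱','受資方全稱','條數_y'], how='inner').reset_index()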
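A minimal sketch of the merge behavior described above, using two invented frames. The indicator=True variant is not in the notes; it is included as an assumption about what may be useful when checking which rows failed to match.

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
left = pd.DataFrame({
    "investor": ["A", "B", "C"],
    "company":  ["x", "y", "z"],
    "rows":     [2, 1, 1],
})
right = pd.DataFrame({
    "investor": ["A", "B", "D"],
    "company":  ["x", "y", "w"],
    "rows":     [5, 1, 3],
})

# Inner join: only key pairs present in both frames survive.
# The shared non-key column "rows" becomes rows_x (left) and rows_y (right).
merged = pd.merge(left, right, on=["investor", "company"], how="inner")

# Select the wanted columns right after the merge.
picked = merged[["investor", "company", "rows_y"]]

# Left join with indicator=True adds a _merge column showing the match status.
with_flag = pd.merge(left, right, on=["investor", "company"], how="left", indicator=True)
```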
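6. Membership tests: contains and isin
    - Test whether the values of a column contain a substring:
        orgin_money2['來源詳情'].str.contains('問卷')
    - Negate the test:
        orgin_money2['來源詳情'].str.contains('問卷') == False
      The more idiomatic form is ~orgin_money2['來源詳情'].str.contains('問卷', na=False). Note that the two differ on missing values: == False excludes them, while the ~ form with na=False includes them.
    - Test whether each value appears in a list with isin:
        new = data["Gender"].isin(["Male"])
      Signature: Series.isin(values) for one column, DataFrame.isin(values) to test every cell of a frame.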
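The sketch below runs contains, its negation and isin on invented data (the detail and gender columns are assumptions), including one missing value to show why na=False matters.

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({
    "detail": ["survey form", "phone call", None],
    "gender": ["Male", "Female", "Male"],
})

# na=False makes missing values count as "does not contain",
# so the result is a clean boolean mask.
has_survey = df["detail"].str.contains("survey", na=False)

# ~ inverts a boolean mask element-wise.
no_survey = ~has_survey

# isin checks each value against a list of allowed values.
is_male = df["gender"].isin(["Male"])

survey_rows = df[has_survey]
```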
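7. Handling null values
    - Add a column of nulls:
        orgin_excel['資料狀態'] = None
    - Filter on nulls:
        1. isna() / isnull(): True where the value is missing (the two names are aliases)
        2. notna() / notnull(): True where the value is present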
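A short sketch of the null-handling calls above, on invented data. The fillna line is an extra not mentioned in the notes, shown only as one common follow-up step.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({"name": ["a", "b", "c"], "amount": [1.0, np.nan, 3.0]})

# Assigning None creates a new column in which every value is missing.
df["status"] = None

# isna() and notna() return boolean masks that can index the frame.
missing = df[df["amount"].isna()]
present = df[df["amount"].notna()]

# Optional follow-up: replace missing values with a default.
filled = df["amount"].fillna(0)
```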
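One last thing: when I first started learning pandas I knew nothing, not even how to select two columns. For fellow beginners, here is how: pass a list of column names inside the brackets, XX[['me','you']]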
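As a quick illustration of the column selection just mentioned, here is a sketch on an invented frame (the third column, them, is an assumption added so there is something to leave out):

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
df = pd.DataFrame({"me": [1, 2], "you": [3, 4], "them": [5, 6]})

# Double brackets: the inner list names the columns, the result is a DataFrame.
two_cols = df[["me", "you"]]

# Single brackets with one name return a Series instead.
one_col = df["me"]
```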

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/308217.html
