Data frames in R: operating on columns based on the value of a factor in the first column

I'm not sure how to phrase my question, but here is what I'm trying to do (highly simplified). I have a data frame with 4 columns that looks like this: [table][1] [1]: https://i.stack.imgur.com/KLgBh.png

The first two columns are factors (Segment/Company). The last two columns are variables.

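I want to divide each value in the last two columns by the market value for that particular segment. As you can see in the picture, the problem is that some segments have 3 companies plus the market row, others have 2 companies plus the market row, and so on, so the groups are never the same size.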

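I have already solved this by creating a lot of "helper" data frames, each containing only one specific segment, but I believe there is a simpler way using dplyr or conditional statements.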

Something like this (pseudocode): if(df[Segment="Seg1"]){ df['Var1'] <- df['Var1']/df[4,3] & df['Var2'] <- df['Var2']/df[4,4] } else if (df[Segment="Seg2"]){ df['Var1'] <- df['Var1']/df[7,3] & df['Var2'] <- df['Var2']/df[7,4] } else if ....

But as you can imagine, this is not great code either: I am using market row positions that I looked up by hand, instead of having R find them.

Maybe something with mutate or left_join?

I hope my question is clear. Does anyone know how to do this?

A uj5u.com user replied:

library(tidyverse)
segment <- c(rep_len("Seg1", 4), rep_len("Seg2", 4))
company <- c(rep_len(c("a", "b", "c", "market"), 8))
var1 <- c(100, 100, 200, 400, 150, 200, 200, 800)
var2 <- c(200, 222, 333, 4444, 555, 666, 777, 888)
# tibble() replaces the deprecated data_frame()
df <- tibble(segment, company, var1, var2)

# Within each segment, divide every value by that segment's "market" row
df |> group_by(segment) |>
   mutate(new1 = var1/var1[company == "market"], new2 = var2/var2[company == "market"])
#> # A tibble: 8 × 6
#> # Groups:   segment [2]
#>   segment company  var1  var2  new1   new2
#>   <chr>   <chr>   <dbl> <dbl> <dbl>  <dbl>
#> 1 Seg1    a         100   200 0.25  0.0450
#> 2 Seg1    b         100   222 0.25  0.0500
#> 3 Seg1    c         200   333 0.5   0.0749
#> 4 Seg1    market    400  4444 1     1     
#> 5 Seg2    a         150   555 0.188 0.625 
#> 6 Seg2    b         200   666 0.25  0.75  
#> 7 Seg2    c         200   777 0.25  0.875 
#> 8 Seg2    market    800   888 1     1

Created on 2022-01-25 by the reprex package (v2.0.1)
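
If there are more than two variable columns, writing one `mutate()` term per column gets repetitive. A minimal sketch of the same grouped division using `across()` follows. It assumes the toy data from this answer, and the `_rel` suffix for the new columns is just an illustrative choice, not something from the original post.

```r
library(dplyr)

# Toy data mirroring the answer above (made-up numbers).
df <- tibble(
  segment = rep(c("Seg1", "Seg2"), each = 4),
  company = rep(c("a", "b", "c", "market"), times = 2),
  var1 = c(100, 100, 200, 400, 150, 200, 200, 800),
  var2 = c(200, 222, 333, 4444, 555, 666, 777, 888)
)

scaled <- df |>
  group_by(segment) |>
  # Apply the same "divide by the market row" rule to every listed column.
  mutate(across(c(var1, var2),
                ~ .x / .x[company == "market"],
                .names = "{.col}_rel")) |>
  ungroup()

print(scaled)
```

This relies on each segment having exactly one row where `company == "market"`. A segment with no market row, or with several, will not give a clean result, so it is worth checking that before dividing.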

A uj5u.com user replied:

So this is how I solved it.

1) I created a new "aux" df containing only the market values

market.df <- df %>%
  filter(Company == "Market")

Then I matched it with left_join (note that my market df is much smaller than my original df, so I can't just divide the original df by market.df)

new.df <- left_join(df, unique(market.df), by = "Segment", suffix = c("", ".market"))
Then I just split new.df into 2 data frames and divided one by the other

aux.1 <- select(new.df, 'Variable 1', 'Variable 2')
aux.2 <- select(new.df, 'Variable 1.market', 'Variable 2.market')
result <- aux.1/aux.2
Then I just take the first 2 columns of the original data frame to add Segment and Company back ...

The tricky part is that not all segments have the same length, so left_join and unique were essential for my solution to work.
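
The steps in this answer can be condensed into one pipeline. The following is a sketch only: the original table is available only as an image, so the column names (`Segment`, `Company`, `Variable 1`, `Variable 2`) are guessed from the code above and the numbers are made up. The segments are deliberately of different sizes.

```r
library(dplyr)

# Hypothetical data: column names guessed from the answer, values invented.
df <- tibble(
  Segment = c("Seg1", "Seg1", "Seg1", "Seg1", "Seg2", "Seg2", "Seg2"),
  Company = c("A", "B", "C", "Market", "A", "B", "Market"),
  `Variable 1` = c(100, 100, 200, 400, 150, 250, 800),
  `Variable 2` = c(20, 30, 50, 100, 10, 40, 50)
)

# One row per segment holding that segment's market values.
market.df <- df |>
  filter(Company == "Market") |>
  select(-Company)

# Attach the market values to every row of the same segment, divide,
# then drop the helper columns again.
result <- df |>
  left_join(market.df, by = "Segment", suffix = c("", ".market")) |>
  mutate(
    `Variable 1` = `Variable 1` / `Variable 1.market`,
    `Variable 2` = `Variable 2` / `Variable 2.market`
  ) |>
  select(-ends_with(".market"))

print(result)
```

Because the join keeps `Segment` and `Company` in place, there is no need to split the result into two data frames and add the first two columns back afterwards.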
