How to generate an ID for each participant in R

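I have medical data for roughly 10,000 patients. I want to replace each patient's ID / Social Security number (Patient_SSN) with a unique ID. Note that some rows share the same participant SSN, because the data is stored at the visit level. In other words, each visit is stored in a new row (with a different date), as in the "Mary" and "John" data.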

Patient_Name = c("Alex", "Mary", "Sarah", "John", "Susan", "Jessica", "Sarah", "Karen", "Mary", "John")
Patient_SSN  =  c(1234,    43251,    9320,    2901,  3229,     4291,     9320,    9218988,    43251 ,  2901)
Visit_Date   =  c('10_21', '10_21',  '10_25', '10_25','10_26','10_27','10_28','10_28','10_28' ,'10_29')
BMI = runif(10, min = 12, max = 25)

data_hospital = data.frame(Patient_Name, Patient_SSN, BMI, Visit_Date)

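My question is: how can I replace each SSN with a new ID to protect participant privacy, keeping in mind that some rows share the same SSN? The new SSN/ID should have the same number of characters as the original Patient_SSN. Thanks in advance for your help.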

Answer from a uj5u.com user:

dplyr has a function for this! Check out ?group_data:

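library(dplyr)
# Group by SSN first; group_indices() then returns, for every row, the index of its group,
# so all visits by the same patient get the same newid
data_hospital$newid <- data_hospital %>% group_by(Patient_SSN) %>% group_indices()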

   Patient_Name Patient_SSN      BMI Visit_Date newid
1          Alex        1234 21.70192      10_21     1
2          Mary       43251 18.75820      10_21     6
3         Sarah        9320 22.84921      10_25     5
4          John        2901 19.94831      10_25     2
5         Susan        3229 20.27007      10_26     3
6       Jessica        4291 14.39934      10_27     4
7         Sarah        9320 16.65728      10_28     5
8         Karen     9218988 17.99142      10_28     7
9          Mary       43251 20.71236      10_28     6
10         John        2901 12.67764      10_29     2
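
The group index keeps repeat visits linked, but the resulting IDs are short sequential integers, not values of the same length as the original SSN. As a rough base R sketch of the same idea without dplyr (assuming, as in the question's example, that Patient_SSN is numeric), a factor should give the same indices as the newid column above. The vector is copied from the question, and the name new_id is only illustrative:

```r
# SSNs copied from the question's example data
Patient_SSN <- c(1234, 43251, 9320, 2901, 3229, 4291, 9320, 9218988, 43251, 2901)

# factor() sorts the distinct values, so each SSN maps to its rank among them;
# rows that share an SSN therefore share an index
new_id <- as.integer(factor(Patient_SSN))

print(new_id)
```

Because the indices follow the sorted order of the SSNs, they still reveal which of two patients has the smaller number. If that matters, randomly assigned IDs are the safer choice.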

Answer from a uj5u.com user:

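One approach, if you want to keep the length of Patient_SSN, is to generate a random number between 0 and 1 and multiply it by 10^(length_of_number).

This does not guarantee that the IDs are unique, so you need to check for duplicates and generate new numbers where they occur. Collisions are unlikely for long IDs but become likely for short ones when there are many patients.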

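library(dplyr)
# Number of characters in each SSN
data_hospital <- data_hospital %>% mutate(id_length = nchar(Patient_SSN))
# One uniform random number in [0, 1] per row
data_hospital$random_number <- runif(n = nrow(data_hospital), min = 0, max = 1)
# Scale the random number up to the SSN's length and round to a whole number
data_hospital <- data_hospital %>% mutate(new_id = round(random_number * 10^id_length))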
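
The snippet above draws one number per row, so two visits by the same patient would receive different IDs, and a small random number yields an ID with fewer digits than the SSN. One possible variation, sketched here under the assumption that the SSNs are positive whole numbers, draws one random ID per distinct SSN from the range of numbers with the same digit count, sampling without replacement so that duplicates cannot occur. The helper name make_same_length_ids is hypothetical, not part of any package:

```r
# Hypothetical helper: one random ID per distinct SSN, same number of digits
make_same_length_ids <- function(ssn) {
  uniq <- unique(ssn)
  # sprintf("%.0f", ...) avoids scientific notation when counting digits
  n_digits <- nchar(sprintf("%.0f", uniq))
  new_id <- numeric(length(uniq))
  for (d in unique(n_digits)) {
    idx <- which(n_digits == d)
    # d-digit numbers run from 10^(d-1) to 10^d - 1, i.e. 9 * 10^(d-1) candidates;
    # sample.int() draws without replacement, so IDs of one length never repeat
    new_id[idx] <- sample.int(9 * 10^(d - 1), length(idx)) + 10^(d - 1) - 1
  }
  # Map each row back to the ID drawn for its SSN
  new_id[match(ssn, uniq)]
}

set.seed(42)  # only to make this sketch reproducible
Patient_SSN <- c(1234, 43251, 9320, 2901, 3229, 4291, 9320, 9218988, 43251, 2901)
new_id <- make_same_length_ids(Patient_SSN)
print(data.frame(Patient_SSN, new_id))
```

IDs of different lengths can never collide with each other, so sampling within each length is enough to keep them unique. If there are more distinct SSNs of a given length than there are numbers of that length, sample.int() stops with an error instead of silently reusing an ID.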

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/312483.html

Tags: r
