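There are plenty of related questions about this issue here, particularly ones using left_join from dplyr, but I still can't figure it out.
All I want to do is return LanguageClean from lookup based on a match with the Language column in df. If there is no match, just return NA. I want LanguageClean added to df as a new column.
I can see that my code below is duplicating the IDs, but I do not want it to. The ID column is irrelevant for my purposes, although I need to keep it in the final data frame.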
df <- structure(list(ID = structure(c(18L, 89L, 42L, 161L, 88L, 71L,
175L, 181L, 133L, 56L, 18L, 89L, 42L, 161L, 88L, 71L, 175L, 181L,
133L, 56L, 18L, 89L, 42L, 161L, 88L, 71L, 175L, 181L, 133L, 56L
), .Dim = c(10L, 3L)), Language = c("en", "", "lv", "en", "en",
"de", "en", "ms", "", "en"), Geo = c("us", "", "-", "us",
"us", "gb", "ca", "us", "-", "us")), class = "data.frame", row.names = c(NA,
-10L))
lookup <- structure(list(Language = c("af", "ar", "ar", "ar", "ar", "ar",
"ar", "ar", "ar", "eu", "be", "zh", "zh", "hr", "da", "nl", "en",
"en", "en", "en", "en", "en", "fo", "fi", "fr", "fr", "gd", "de",
"de", "de", "he", "hu", "id", "it", "ko", "lv", "mk", "mt", "no",
"pt", "rm", "ro", "ru", "sr", "sk", "sb", "es", "es", "es", "es",
"es", "es", "es", "es", "es", "sx", "sv", "ts", "tr", "ur", "vi",
"ji", "sq", "ar", "ar", "ar", "ar", "ar", "ar", "ar", "ar", "bg",
"ca", "zh", "zh", "cs", "nl", "en", "en", "en", "en", "en", "en",
"et", "fa", "fr", "fr", "fr", "ga", "de", "de", "el", "hi", "is",
"it", "ja", "ko", "lt", "ms", "no", "pl", "pt", "ro", "ru", "sz",
"sr", "sl", "es", "es", "es", "es", "es", "es", "es", "es", "es",
"es", "sv", "th", "tn", "uk", "ve", "xh", "zu"), LanguageClean = c("Afrikaans",
"Arabic", "Arabic", "Arabic", "Arabic", "Arabic", "Arabic", "Arabic",
"Arabic", "Basque", "Belarusian", "Chinese", "Chinese", "Croatian",
"Danish", "Dutch", "English", "English", "English", "English",
"English", "English", "Faeroese", "Finnish", "French", "French",
"Gaelic", "German", "German", "German", "Hebrew", "Hungarian",
"Indonesian", "Italian", "Korean", "Latvian", "Macedonian", "Maltese",
"Norwegian", "Portuguese", "Rhaeto-Romanic", "Romanian", "Russian",
"Serbian", "Slovak", "Sorbian", "Spanish", "Spanish", "Spanish",
"Spanish", "Spanish", "Spanish", "Spanish", "Spanish", "Spanish",
"Sutu", "Swedish", "Tsonga", "Turkish", "Urdu", "Vietnamese",
"Yiddish", "Albanian", "Arabic", "Arabic", "Arabic", "Arabic",
"Arabic", "Arabic", "Arabic", "Arabic", "Bulgarian", "Catalan",
"Chinese", "Chinese", "Czech", "Dutch", "English", "English",
"English", "English", "English", "English", "Estonian", "Farsi",
"French", "French", "French", "Irish", "German", "German", "Greek",
"Hindi", "Icelandic", "Italian", "Japanese", "Korean", "Lithuanian",
"Malaysian", "Norwegian", "Polish", "Portuguese", "Romanian",
"Russian", "Sami", "Serbian", "Slovenian", "Spanish", "Spanish",
"Spanish", "Spanish", "Spanish", "Spanish", "Spanish", "Spanish",
"Spanish", "Spanish", "Swedish", "Thai", "Tswana", "Ukrainian",
"Venda", "Xhosa", "Zulu")), class = "data.frame", row.names = c(NA,
-124L))
library(dplyr)
# Attempted join; the result has more rows than df
df <- left_join(df, lookup, by = "Language")
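Answer from a uj5u.com user:
The problem is that your lookup table contains multiple entries for some languages, so the join returns multiple matches for those rows. To fix this, use dplyr::distinct to reduce the lookup to its unique Language/LanguageClean combinations before joining: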
library(dplyr)
df <- left_join(df, distinct(lookup, Language, LanguageClean), by = "Language")
df
#>    ID.1 ID.2 ID.3 Language Geo LanguageClean
#> 1    18   18   18       en  us       English
#> 2    89   89   89                       <NA>
#> 3    42   42   42       lv   -       Latvian
#> 4   161  161  161       en  us       English
#> 5    88   88   88       en  us       English
#> 6    71   71   71       de  gb        German
#> 7   175  175  175       en  ca       English
#> 8   181  181  181       ms  us     Malaysian
#> 9   133  133  133            -          <NA>
#> 10   56   56   56       en  us       English
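To see why the duplicates appear and to check that the fix holds, here is a minimal sketch on toy data. The names df_small, lookup_small, lookup_unique, naive, fixed and matched are hypothetical and the tables are far smaller than the ones in the question. It assumes, as in the posted lookup, that each language code maps to a single clean name. Newer dplyr versions may also warn about a many-to-many relationship on the first join.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the tables in the question
df_small <- data.frame(
  ID       = 1:4,
  Language = c("en", "", "lv", "en"),
  stringsAsFactors = FALSE
)
lookup_small <- data.frame(
  Language      = c("en", "en", "lv", "de"),
  LanguageClean = c("English", "English", "Latvian", "German"),
  stringsAsFactors = FALSE
)

# Joining against the raw lookup repeats each "en" row of df_small,
# because "en" appears twice in lookup_small
naive <- left_join(df_small, lookup_small, by = "Language")

# List the keys that occur more than once in the lookup
lookup_small %>% count(Language) %>% filter(n > 1)

# Deduplicate, then confirm every code now appears exactly once
lookup_unique <- distinct(lookup_small, Language, LanguageClean)
stopifnot(anyDuplicated(lookup_unique$Language) == 0)

# The join now keeps one output row per row of df_small
fixed <- left_join(df_small, lookup_unique, by = "Language")

# Base R alternative: match() gives the index of the first match, or NA
matched <- lookup_small$LanguageClean[match(df_small$Language,
                                            lookup_small$Language)]
```

Comparing nrow(naive) with nrow(df_small) is a quick way to spot an inflating join. The match() variant never adds rows, but it silently takes the first entry when a code maps to more than one name, so the uniqueness check is still worth running first.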
