Working with Factors in R: An Introduction to the forcats Package (2)

Original · 阿越就是我 · 医学和生信笔记 · 2023-02-25

Today we continue with the forcats package. The previous post gave an overview of the package; from here on, each function is covered in detail.

Changing the order of factor levels

1.1 `fct_relevel()`

library(forcats)

## Create a factor
f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
f
## [1] a b c d
## Levels: b c d a

## Move c and d to the 1st and 2nd positions
fct_relevel(f, c("c", "d"))
## [1] a b c d
## Levels: c d b a

## Make `a` the 3rd level (place it after the 2nd)
fct_relevel(f, "a", after = 2)
## [1] a b c d
## Levels: b c a d

# Move `a` to the last position
fct_relevel(f, "a", after = Inf)
## [1] a b c d
## Levels: b c d a

## Reorder the levels with a function
fct_relevel(f, sort)
## [1] a b c d
## Levels: a b c d
## Note: the function is applied to the levels, i.e. `sort(c("a","b","c","d"))`, not to `sort(f)`

## Random order (the result varies from run to run)
fct_relevel(f, sample)
## [1] a b c d
## Levels: a b c d

## Reverse the order
fct_relevel(f, rev)
## [1] a b c d
## Levels: a d c b

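Here is an example that looks complicated but is not. It uses the built-in gss_cat data set, keeping only two columns; the goal is to move Don't know to the end of the levels in each column.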

## First look at the original factor levels
df  <- forcats::gss_cat[, c("rincome", "denom")]
lapply(df, levels) # apply `levels()` to every column of df
## $rincome
##  [1] "No answer"      "Don't know"     "Refused"        "$25000 or more"
##  [5] "$20000 - 24999" "$15000 - 19999" "$10000 - 14999" "$8000 to 9999" 
##  [9] "$7000 to 7999"  "$6000 to 6999"  "$5000 to 5999"  "$4000 to 4999" 
## [13] "$3000 to 3999"  "$1000 to 2999"  "Lt $1000"       "Not applicable"
## 
## $denom
##  [1] "No answer"            "Don't know"           "No denomination"     
##  [4] "Other"                "Episcopal"            "Presbyterian-dk wh"  
##  [7] "Presbyterian, merged" "Other presbyterian"   "United pres ch in us"
## [10] "Presbyterian c in us" "Lutheran-dk which"    "Evangelical luth"    
## [13] "Other lutheran"       "Wi evan luth synod"   "Lutheran-mo synod"   
## [16] "Luth ch in america"   "Am lutheran"          "Methodist-dk which"  
## [19] "Other methodist"      "United methodist"     "Afr meth ep zion"    
## [22] "Afr meth episcopal"   "Baptist-dk which"     "Other baptists"      
## [25] "Southern baptist"     "Nat bapt conv usa"    "Nat bapt conv of am" 
## [28] "Am bapt ch in usa"    "Am baptist asso"      "Not applicable"

Each column has a Don't know level. We want to move it to the end, and practise using lapply along the way.

# Apply `fct_relevel(..., "Don't know", after = Inf)` to every column of df
df2 <- lapply(df, fct_relevel, "Don't know", after = Inf)

lapply(df2, levels) # "Don't know" is now last in both columns
## $rincome
##  [1] "No answer"      "Refused"        "$25000 or more" "$20000 - 24999"
##  [5] "$15000 - 19999" "$10000 - 14999" "$8000 to 9999"  "$7000 to 7999" 
##  [9] "$6000 to 6999"  "$5000 to 5999"  "$4000 to 4999"  "$3000 to 3999" 
## [13] "$1000 to 2999"  "Lt $1000"       "Not applicable" "Don't know"    
## 
## $denom
##  [1] "No answer"            "No denomination"      "Other"               
##  [4] "Episcopal"            "Presbyterian-dk wh"   "Presbyterian, merged"
##  [7] "Other presbyterian"   "United pres ch in us" "Presbyterian c in us"
## [10] "Lutheran-dk which"    "Evangelical luth"     "Other lutheran"      
## [13] "Wi evan luth synod"   "Lutheran-mo synod"    "Luth ch in america"  
## [16] "Am lutheran"          "Methodist-dk which"   "Other methodist"     
## [19] "United methodist"     "Afr meth ep zion"     "Afr meth episcopal"  
## [22] "Baptist-dk which"     "Other baptists"       "Southern baptist"    
## [25] "Nat bapt conv usa"    "Nat bapt conv of am"  "Am bapt ch in usa"   
## [28] "Am baptist asso"      "Not applicable"       "Don't know"
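One detail worth knowing: `lapply()` returns a plain list, so `df2` above is no longer a data frame. A minimal sketch of one way to keep the data frame structure, assuming the columns are first copied into a base `data.frame` (that conversion is my choice for the sketch, not something the example above requires):

```r
library(forcats)

# Plain data.frame copy of the two columns used in the text
df <- as.data.frame(forcats::gss_cat[, c("rincome", "denom")])

# Assigning into df[] replaces the columns but keeps the data frame itself
df[] <- lapply(df, fct_relevel, "Don't know", after = Inf)

class(df)

# Last level of every column
last_levels <- vapply(df, function(col) tail(levels(col), 1), character(1))
last_levels
```

The `df[] <-` idiom is ordinary base R; only the values of the columns change, while row count and column names stay as they were.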

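If a requested level does not exist, fct_relevel() gives a warning (not an error) and leaves the levels unchanged: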

fct_relevel(f, "e")
## Warning: Unknown levels in `f`: e
## [1] a b c d
## Levels: b c d a
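If the warning is unwanted, one option is to filter the requested levels before calling `fct_relevel()`. This is only a sketch of that idea; the names `wanted` and `present` are made up for the illustration.

```r
library(forcats)

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))

wanted  <- c("e", "c")                    # "e" is not a level of f
present <- intersect(wanted, levels(f))   # keep only the levels that exist

out <- fct_relevel(f, present)            # only "c" is moved to the front
levels(out)
## [1] "c" "b" "d" "a"
```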

1.2 `fct_inorder()/fct_infreq()/fct_inseq()`

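These three functions belong to the same family. They all reorder the levels, but each uses a different criterion:

fct_inorder(): by the order in which each level first appears
fct_infreq(): by the frequency of each level (most frequent first)
fct_inseq(): by the numeric value of the levels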

f <- factor(c("b", "b", "a", "c", "c", "c"))
f # alphabetical order by default
## [1] b b a c c c
## Levels: a b c

fct_inorder(f) # by order of first appearance
## [1] b b a c c c
## Levels: b a c

fct_infreq(f) # by frequency, most frequent first
## [1] b b a c c c
## Levels: c b a

f <- factor(1:3, levels = c("3", "2", "1"))
f
## [1] 1 2 3
## Levels: 3 2 1

fct_inseq(f) # ordered by numeric value, even though the levels were defined as "3", "2", "1"
## [1] 1 2 3
## Levels: 1 2 3
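To make the three criteria concrete, here is a rough base R equivalent of each one. It is a sketch for understanding only, not how forcats implements these functions, and the frequency version ignores tie-breaking.

```r
library(forcats)

f <- factor(c("b", "b", "a", "c", "c", "c"))

# Order of first appearance, as in fct_inorder()
by_first <- factor(f, levels = unique(as.character(f)))

# Descending frequency, as in fct_infreq() (no ties in this example)
by_freq <- factor(f, levels = names(sort(table(f), decreasing = TRUE)))

# Numeric value of the levels, as in fct_inseq()
g  <- factor(1:3, levels = c("3", "2", "1"))
lv <- levels(g)
by_num <- factor(g, levels = lv[order(as.numeric(lv))])

# Compare with the forcats results
identical(levels(by_first), levels(fct_inorder(f)))
identical(levels(by_freq),  levels(fct_infreq(f)))
identical(levels(by_num),   levels(fct_inseq(g)))
```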

An example that is very useful when plotting:

Suppose you draw the following plot:

library(ggplot2)
library(dplyr) # provides the starwars data set

ggplot(starwars, aes(x = hair_color)) + 
  geom_bar() + 
  coord_flip()

[Figure: horizontal bar chart of hair_color counts, bars in the default (alphabetical) order]

But this is not what you want: you would like the bars ordered by their counts. You can either fix the order before plotting, or do it inline like this:

ggplot(starwars, aes(x = fct_infreq(hair_color))) +
  geom_bar() +
  coord_flip()

[Figure: the same bar chart with bars ordered by frequency]

Problem solved!
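With `coord_flip()` the first level is drawn at the bottom, so `fct_infreq()` alone puts the most frequent category at the bottom of the chart. If you would rather have it at the top, wrapping the call in `fct_rev()` is a common idiom. A small sketch, assuming ggplot2 and dplyr (for `starwars`) are installed; the variable names are made up for the illustration:

```r
library(ggplot2)
library(forcats)

hair <- dplyr::starwars$hair_color

hair_by_freq  <- fct_infreq(hair)        # most frequent level first
hair_for_flip <- fct_rev(hair_by_freq)   # reversed, so it ends up on top after the flip

# Build (but do not print) the plot; print(p) would draw it
p <- ggplot(dplyr::starwars, aes(x = fct_rev(fct_infreq(hair_color)))) +
  geom_bar() +
  coord_flip() +
  labs(x = "hair_color")
```

Missing values in `hair_color` are not a level of the factor, so they are not affected by either function.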

1.3 `fct_reorder()/fct_reorder2()/last2()/first2()`

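fct_reorder() is useful for one-dimensional displays where the factor is mapped to position; fct_reorder2() is for two-dimensional displays where the factor is mapped to a non-position aesthetic (such as colour). last2() and first2() are helper functions for fct_reorder2(): last2() finds the last value of y when the data are sorted by x, and first2() finds the first.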

## Build a small tibble
df <- tibble::tribble(
  ~color,     ~a, ~b,
  "blue",      1,  2,
  "green",     6,  2,
  "purple",    3,  3,
  "red",       2,  3,
  "yellow",    5,  1
)


## Convert color to a factor and look at its level order
df$color <- factor(df$color)
df$color
## [1] blue   green  purple red    yellow
## Levels: blue green purple red yellow

Reorder color by the values in column a, from smallest to largest. The levels of color have changed:

fct_reorder(df$color, df$a, min)
## [1] blue   green  purple red    yellow
## Levels: blue red purple yellow green
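In the example above every level appears once, so the summary function hardly matters. When a level has several values, `fct_reorder()` first summarises them with `.fun` (the median by default) and then sorts the levels by that summary. A small sketch with made-up data (`grp` and `val` are hypothetical):

```r
library(forcats)

grp <- factor(c("x", "x", "y", "y", "z", "z"))
val <- c(1, 10, 4, 5, 2, 3)
# medians: x = 5.5, y = 4.5, z = 2.5;  minima: x = 1, y = 4, z = 2

by_median <- fct_reorder(grp, val)                # default .fun = median
levels(by_median)
## [1] "z" "y" "x"

by_min <- fct_reorder(grp, val, .fun = min)       # sort by each level's minimum
levels(by_min)
## [1] "x" "z" "y"

by_median_desc <- fct_reorder(grp, val, .desc = TRUE)
levels(by_median_desc)
## [1] "x" "y" "z"
```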

A small plotting example with fct_reorder():

boxplot(Sepal.Width ~ Species, data = iris)

[Figure: box plot of Sepal.Width by Species, species in the default order]

boxplot(Sepal.Width ~ fct_reorder(Species, Sepal.Width), data = iris)

[Figure: the same box plot, species ordered by median Sepal.Width, ascending]

boxplot(Sepal.Width ~ fct_reorder(Species, Sepal.Width, .desc = TRUE), data = iris)

[Figure: the same box plot, species ordered by median Sepal.Width, descending]

fct_reorder2(df$color, df$a, df$b)
## [1] blue   green  purple red    yellow
## Levels: purple red blue green yellow

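fct_reorder2() looks complicated, but it is enough to remember that it is handy when plotting: it can make the order of the legend match the order of the lines. A small example: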

chks <- subset(ChickWeight, as.integer(Chick) < 10)
chks <- transform(chks, Chick = fct_shuffle(Chick))
chks
##     weight Time Chick Diet
## 85      42    0     8    1
## 86      50    2     8    1
## 87      61    4     8    1
## 88      71    6     8    1
## 89      84    8     8    1
## 90      93   10     8    1
## 91     110   12     8    1
## 92     116   14     8    1
## 93     126   16     8    1
## 94     134   18     8    1
## 95     125   20     8    1
## 96      42    0     9    1
## 97      51    2     9    1
## 98      59    4     9    1
## 99      68    6     9    1
## 100     85    8     9    1
## 101     96   10     9    1
## 102     90   12     9    1
## 103     92   14     9    1
## 104     93   16     9    1
## 105    100   18     9    1
## 106    100   20     9    1
## 107     98   21     9    1
## 108     41    0    10    1
## 109     44    2    10    1
## 110     52    4    10    1
## 111     63    6    10    1
## 112     74    8    10    1
## 113     81   10    10    1
## 114     89   12    10    1
## 115     96   14    10    1
## 116    101   16    10    1
## 117    112   18    10    1
## 118    120   20    10    1
## 119    124   21    10    1
## 144     41    0    13    1
## 145     48    2    13    1
## 146     53    4    13    1
## 147     60    6    13    1
## 148     65    8    13    1
## 149     67   10    13    1
## 150     71   12    13    1
## 151     70   14    13    1
## 152     71   16    13    1
## 153     81   18    13    1
## 154     91   20    13    1
## 155     96   21    13    1
## 168     41    0    15    1
## 169     49    2    15    1
## 170     56    4    15    1
## 171     64    6    15    1
## 172     68    8    15    1
## 173     68   10    15    1
## 174     67   12    15    1
## 175     68   14    15    1
## 176     41    0    16    1
## 177     45    2    16    1
## 178     49    4    16    1
## 179     51    6    16    1
## 180     57    8    16    1
## 181     51   10    16    1
## 182     54   12    16    1
## 183     42    0    17    1
## 184     51    2    17    1
## 185     61    4    17    1
## 186     72    6    17    1
## 187     83    8    17    1
## 188     89   10    17    1
## 189     98   12    17    1
## 190    103   14    17    1
## 191    113   16    17    1
## 192    123   18    17    1
## 193    133   20    17    1
## 194    142   21    17    1
## 195     39    0    18    1
## 196     35    2    18    1
## 209     41    0    20    1
## 210     47    2    20    1
## 211     54    4    20    1
## 212     58    6    20    1
## 213     65    8    20    1
## 214     73   10    20    1
## 215     77   12    20    1
## 216     89   14    20    1
## 217     98   16    20    1
## 218    107   18    20    1
## 219    115   20    20    1
## 220    117   21    20    1

ggplot(chks, aes(Time, weight, colour = Chick)) +
  geom_point() +
  geom_line()

[Figure: weight against Time, one coloured line per chick; legend in the shuffled level order]

# The legend order now matches the order of the lines
ggplot(chks, aes(Time, weight, colour = fct_reorder2(Chick, Time, weight))) +
  geom_point() +
  geom_line() +
  labs(colour = "Chick")

[Figure: the same plot; the legend order matches the order of the lines at the right edge]
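To see why the legend lines up, it helps to look at what `last2()` computes: for each level, the `y` value that belongs to the largest `x`. `fct_reorder2()` then sorts the levels by that value, in descending order by default. A minimal sketch with two made-up lines (`g`, `x` and `y` are hypothetical data):

```r
library(forcats)

g <- factor(rep(c("A", "B"), each = 3))
x <- rep(1:3, times = 2)
y <- c(5, 3, 1,   # line A ends at y = 1
       1, 2, 9)   # line B ends at y = 9

last_A <- last2(x[g == "A"], y[g == "A"])   # y at the largest x for A
last_B <- last2(x[g == "B"], y[g == "B"])   # y at the largest x for B

by_last <- fct_reorder2(g, x, y)            # default .fun = last2, descending
levels(by_last)
## [1] "B" "A"

by_first <- fct_reorder2(g, x, y, .fun = first2)
levels(by_first)
## [1] "A" "B"
```

Line B ends highest, so it comes first in the legend, which is the same top-to-bottom order the eye sees at the right edge of the plot.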

1.4 `fct_shuffle()`

Randomly permutes the levels, shuffling their order completely.

f <- factor(c("a", "b", "c"))
f
## [1] a b c
## Levels: a b c
set.seed(111)
fct_shuffle(f) # the order differs on every run unless a seed is set
## [1] a b c
## Levels: b a c
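A quick way to convince yourself that the seed controls the result, and that only the levels move while the values stay put (a small sketch; the seed value is arbitrary):

```r
library(forcats)

f <- factor(c("a", "b", "c"))

set.seed(111)
s1 <- fct_shuffle(f)

set.seed(111)
s2 <- fct_shuffle(f)

identical(levels(s1), levels(s2))              # same seed, same level order
identical(as.character(s1), as.character(f))   # the values are untouched
```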

1.5 `fct_rev()`

Reverses the order of the levels.

f <- factor(c("a", "b", "c"))
f
## [1] a b c
## Levels: a b c
fct_rev(f)
## [1] a b c
## Levels: c b a
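The printed values look identical before and after, which can be confusing. What changes is the integer code stored behind each value, as this short sketch shows:

```r
library(forcats)

f <- factor(c("a", "b", "c"))
r <- fct_rev(f)

as.character(r)   # labels are unchanged
## [1] "a" "b" "c"

as.integer(f)     # codes under the original level order
## [1] 1 2 3

as.integer(r)     # codes under the reversed level order
## [1] 3 2 1
```

Anything that works on level order, such as axis order in a plot or sorting with `sort()`, follows these codes.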

1.6 `fct_shift()`

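Shifts the factor levels to the left or right, wrapping around at the ends; by default they move one position to the left.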

x <- factor(
  c("Mon", "Tue", "Wed"),
  levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  ordered = TRUE
)
x
## [1] Mon Tue Wed
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

fct_shift(x)
## [1] Mon Tue Wed
## Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun

fct_shift(x, 2)
## [1] Mon Tue Wed
## Levels: Tue < Wed < Thu < Fri < Sat < Sun < Mon

fct_shift(x, -1)
## [1] Mon Tue Wed
## Levels: Sat < Sun < Mon < Tue < Wed < Thu < Fri
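The shift is a rotation of the level vector. The helper below, `rotate_left()`, is a hypothetical function written only to illustrate the index arithmetic; it is not part of forcats.

```r
library(forcats)

# Hypothetical helper: rotate a character vector n places to the left
# (a negative n rotates to the right)
rotate_left <- function(lv, n = 1) {
  k <- length(lv)
  lv[(seq_len(k) + n - 1) %% k + 1]
}

days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
x <- factor(c("Mon", "Tue", "Wed"), levels = days, ordered = TRUE)

rotate_left(days, 1)
## [1] "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"

# The rotation agrees with fct_shift() for the three calls shown above
identical(rotate_left(days, 1),  levels(fct_shift(x)))
identical(rotate_left(days, 2),  levels(fct_shift(x, 2)))
identical(rotate_left(days, -1), levels(fct_shift(x, -1)))
```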

That is all for today.
