<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>tidyverse | Tengku Hanis</title>
    <link>https://tengkuhanis.netlify.app/tag/tidyverse/</link>
      <atom:link href="https://tengkuhanis.netlify.app/tag/tidyverse/index.xml" rel="self" type="application/rss+xml" />
    <description>tidyverse</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>©Tengku Hanis 2020-2025 Made with [blogdown](https://github.com/rstudio/blogdown)</copyright><lastBuildDate>Tue, 18 May 2021 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://tengkuhanis.netlify.app/images/icon_hua2ec155b4296a9c9791d015323e16eb5_11927_512x512_fill_lanczos_center_2.png</url>
      <title>tidyverse</title>
      <link>https://tengkuhanis.netlify.app/tag/tidyverse/</link>
    </image>
    
    <item>
      <title>A summary of forcats package</title>
      <link>https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/</link>
      <pubDate>Tue, 18 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;img src=&#34;forcats_logo.png&#34; width=&#34;30%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I just watched a &lt;a href=&#34;https://youtu.be/qWYgNjnHNWI&#34;&gt;youtube video by Andrew Couch&lt;/a&gt; about his commonly used function in readr, stringr, and forcats packages. Although, I have used forcats package before, I realised that I have not fully utilised all of its function.&lt;/p&gt;
&lt;p&gt;So, in this post, I have summarised main function of forcats that I find useful in my day-to-day R coding. Basically, more like a note to myself.&lt;/p&gt;
&lt;div id=&#34;main-functions&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Main functions&lt;/h2&gt;
&lt;p&gt;We will use &lt;a href=&#34;https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars&#34;&gt;mtcars data&lt;/a&gt; to demonstrate each function. forcats is part of tiyverse packages. So, it will load, once we load the tidyverse packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
glimpse(mtcars)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Rows: 32
## Columns: 11
## $ mpg  &amp;lt;dbl&amp;gt; 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl  &amp;lt;dbl&amp;gt; 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp &amp;lt;dbl&amp;gt; 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   &amp;lt;dbl&amp;gt; 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat &amp;lt;dbl&amp;gt; 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt   &amp;lt;dbl&amp;gt; 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec &amp;lt;dbl&amp;gt; 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   &amp;lt;dbl&amp;gt; 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am   &amp;lt;dbl&amp;gt; 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear &amp;lt;dbl&amp;gt; 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb &amp;lt;dbl&amp;gt; 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 9 forcats functions that I think very useful.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;factor()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;factor()&lt;/code&gt; changes variable type into a factor or categorical type&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mtcars$carb &amp;lt;- factor(mtcars$carb)
glimpse(mtcars)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Rows: 32
## Columns: 11
## $ mpg  &amp;lt;dbl&amp;gt; 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl  &amp;lt;dbl&amp;gt; 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp &amp;lt;dbl&amp;gt; 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   &amp;lt;dbl&amp;gt; 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat &amp;lt;dbl&amp;gt; 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt   &amp;lt;dbl&amp;gt; 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec &amp;lt;dbl&amp;gt; 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   &amp;lt;dbl&amp;gt; 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am   &amp;lt;dbl&amp;gt; 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear &amp;lt;dbl&amp;gt; 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb &amp;lt;fct&amp;gt; 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_inorder()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function sorts factor levels based on the order of appearance in the dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_inorder(mtcars$carb) # levels based on the order of appearance&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 4 1 2 3 6 8&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_infreq()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function sorts factor levels based on the frequency of values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_count(mtcars$carb) # this is forcats function as well, count factor level&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   f         n
##   &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt;
## 1 1         7
## 2 2        10
## 3 3         3
## 4 4        10
## 5 6         1
## 6 8         1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_infreq(mtcars$carb) # levels based on the frequency values&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 2 4 1 3 6 8&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;4&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_relevel()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function can be used to change the order manually.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_relevel(mtcars$carb, c(&amp;quot;8&amp;quot;, &amp;quot;6&amp;quot;, &amp;quot;4&amp;quot;, &amp;quot;3&amp;quot;, &amp;quot;2&amp;quot;, &amp;quot;1&amp;quot;)) # manually changed new levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 8 6 4 3 2 1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_relevel()&lt;/code&gt; can also be used to change one factor level only.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_relevel(mtcars$carb, &amp;quot;8&amp;quot;, after = 2) # change level 8 to the third place&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 1 2 8 3 4 6&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;5&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_reorder()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function changes the order based on another variable. Let’s change variable carb’s levels based on value of variable disp.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_reorder(mtcars$carb, mtcars$disp, .fun = sum, .desc = TRUE) # new level based on disp value&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 4 2 1 3 8 6&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mtcars %&amp;gt;% 
  group_by(carb) %&amp;gt;% 
  summarise(sum_disp = sum(disp)) %&amp;gt;% 
  arrange(desc(sum_disp)) # this is basically what we do with fct_reorder() above&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   carb  sum_disp
##   &amp;lt;fct&amp;gt;    &amp;lt;dbl&amp;gt;
## 1 4        3088.
## 2 2        2082.
## 3 1         940.
## 4 3         827.
## 5 8         301 
## 6 6         145&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Additionally, &lt;code&gt;fct_reorder()&lt;/code&gt; can be used with plotting as well.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Original plot
ggplot(mtcars, aes(x = carb, y = disp)) +
  geom_col()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/index.en_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Plot with changed levels
mtcars %&amp;gt;% 
  mutate(carb = fct_reorder(carb, disp, .fun = sum, .desc = TRUE)) %&amp;gt;% 
  ggplot(aes(x = carb, y = disp)) +
  geom_col()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/index.en_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;ol start=&#34;6&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_lump()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function lumps factor levels into other factors. There are 5 variants of this function:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fct_lump()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_lump_min()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_lump_n()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_lump_lowfreq()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The remaining one variant is &lt;code&gt;fct_lump_prop()&lt;/code&gt;. It is not in the example below as I do not find it useful at least for my current R coding routine.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fct_lump()&lt;/code&gt; automatically lump small frequency factor group into one group.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_count(mtcars$carb) # this is forcats function as well, count factor level&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   f         n
##   &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt;
## 1 1         7
## 2 2        10
## 3 3         3
## 4 4        10
## 5 6         1
## 6 8         1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_lump(mtcars$carb) %&amp;gt;% fct_count() &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 4 x 2
##   f         n
##   &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt;
## 1 1         7
## 2 2        10
## 3 4        10
## 4 Other     5&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_lump_min()&lt;/code&gt; lump factor group into one group based on the given value.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_lump_min(mtcars$carb, min = 2)) # group 6 and 8 lump into one group&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     1     2     3     4 Other 
##     7    10     3    10     2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_lump_n()&lt;/code&gt; lump all level except for the &lt;em&gt;n&lt;/em&gt; most frequent factor groups.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_lump_n(mtcars$carb, n = 2)) # 2 frequent group only, others in one group&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     2     4 Other 
##    10    10    12&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_lump_lowfreq()&lt;/code&gt; lump small frequent groups into one group, while making sure that particular one group is still the smallest.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_lump_lowfreq(mtcars$carb, other_level = &amp;quot;low&amp;quot;)) # group low is still the smallest&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##   1   2   4 low 
##   7  10  10   5&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;7&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_other()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;fct_other()&lt;/code&gt; is much like &lt;code&gt;fct_lump()&lt;/code&gt;, except we manually choose which factor groups to be combined.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_other(mtcars$carb, keep = c(&amp;quot;8&amp;quot;, &amp;quot;6&amp;quot;))) &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     6     8 Other 
##     1     1    30&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;8&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_recode()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function is used to rename or relabel the factor group.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_recode(mtcars$carb, hanis = &amp;quot;8&amp;quot;)) &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     1     2     3     4     6 hanis 
##     7    10     3    10     1     1&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;9&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_relabel()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;fct_relabel()&lt;/code&gt; is extremely useful if we want to rename quite a number of factor groups.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(mtcars$carb) # original groups&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  1  2  3  4  6  8 
##  7 10  3 10  1  1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_relabel(mtcars$carb, ~ c(&amp;quot;abu&amp;quot;, &amp;quot;ali&amp;quot;, &amp;quot;chong&amp;quot;, &amp;quot;siti&amp;quot;, &amp;quot;krish&amp;quot;, &amp;quot;lee&amp;quot;))) # new named groups&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##   abu   ali chong  siti krish   lee 
##     7    10     3    10     1     1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Reference:&lt;br /&gt;
&lt;a href=&#34;https://forcats.tidyverse.org/index.html&#34; class=&#34;uri&#34;&gt;https://forcats.tidyverse.org/index.html&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Base R vs tidyverse</title>
      <link>https://tengkuhanis.netlify.app/post/2021-05-04-base-r-vs-tidyverse/</link>
      <pubDate>Tue, 04 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/2021-05-04-base-r-vs-tidyverse/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/2021-05-04-base-r-vs-tidyverse/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;First of all, this write up is mean for a beginner in R.&lt;/p&gt;
&lt;p&gt;Things can be done in many ways in R. In facts, R has been very flexible in this regard compared to other statistical softwares. Basic things such as selecting a column, slicing a row, filtering a data based on certain condition can be done using a base R function. However, all these things can also be done using a tidyverse approach.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;Tidyverse&lt;/a&gt; basically, a collection of packages that can be loaded in a line of function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tidyverse is developed by “RStudio people” pioneered by &lt;a href=&#34;http://hadley.nz/&#34;&gt;Hadley Wickham&lt;/a&gt;, which means that these packages will be continuously maintained and updated.&lt;/p&gt;
&lt;p&gt;So, without further ado, these are the comparisons between these two approaches for some very basic thingy:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Select or deselect a column and a row&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris[1:5, c(&amp;quot;Sepal.Length&amp;quot;, &amp;quot;Sepal.Width&amp;quot;)]
iris[1:5,c(1,2)] # similar to above
iris[1:5, -1]

# Tidyverse
iris %&amp;gt;% 
  select(Sepal.Length, Sepal.Width) %&amp;gt;% 
  slice(1:5)
iris %&amp;gt;% 
  select(-Sepal.Length) %&amp;gt;% 
  slice(1:5)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Filter based on condition&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris[iris$Species == &amp;quot;setosa&amp;quot;, ]

# Tidyverse
iris %&amp;gt;% 
  filter(Species == &amp;quot;setosa&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Mutate (transmute replace the variable)&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris$SL_minus10 &amp;lt;- iris$Sepal.Length - 10

# Tidyverse
iris %&amp;gt;% 
  mutate(SL_minus10 = Sepal.Length - 10)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;4&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Sort variable&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris[order(-iris$Sepal.Width),]

# Tidyverse
iris %&amp;gt;% 
  arrange(desc(Sepal.Length))&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;5&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Group by (and get mean for variable Sepal.Width)&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Not really base R
doBy::summaryBy(Sepal.Width~Species, iris, FUN = mean) 

# Tidyverse
iris %&amp;gt;% 
  group_by(Species) %&amp;gt;% 
  summarise(mean_SW = mean(Sepal.Width))&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;6&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Rename variable&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
colnames(iris)[6] &amp;lt;- &amp;quot;hanis&amp;quot;

# Tidyverse
iris %&amp;gt;% 
  rename(Species = hanis)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, that’s it. Overall, tidyverse give a clarity in understanding the code as it reads from left to right. On the contrary, the base R approach reads from inside to outside, especially for a more complicated code.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
