<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R | Tengku Hanis</title>
    <link>https://tengkuhanis.netlify.app/category/r/</link>
      <atom:link href="https://tengkuhanis.netlify.app/category/r/index.xml" rel="self" type="application/rss+xml" />
    <description>R</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>©Tengku Hanis 2020-2025 Made with [blogdown](https://github.com/rstudio/blogdown)</copyright><lastBuildDate>Wed, 22 Feb 2023 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://tengkuhanis.netlify.app/images/icon_hua2ec155b4296a9c9791d015323e16eb5_11927_512x512_fill_lanczos_center_2.png</url>
      <title>R</title>
      <link>https://tengkuhanis.netlify.app/category/r/</link>
    </image>
    
    <item>
      <title>Mapping the states in Malaysia</title>
      <link>https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/</link>
      <pubDate>Wed, 22 Feb 2023 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/</guid>
      <description>


&lt;p&gt;I have written two blog posts about making map in R:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/&#34;&gt;Making maps with R (my first attempt ever!)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/&#34;&gt;My first interactive map with {leaflet}&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This post is sort of a continuation to the &lt;a href=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/&#34;&gt;first blog post&lt;/a&gt;. I have shown how to plot a coordinate to a map in that post specifically for Malaysia.&lt;/p&gt;
&lt;p&gt;However, using the two approaches from the previous blog post, we cannot restrict the plot to a certain state in Malaysia. At least, I was unable to find out how to do that after googling around. But we can plot the Bornean or Peninsular side of Malaysia using the two approaches.&lt;/p&gt;
&lt;div id=&#34;plot-the-peninsular-of-malaysia-not-the-best-way&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plot Peninsular Malaysia (not the best way)&lt;/h2&gt;
&lt;p&gt;Load the necessary packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rworldmap) 
library(tidyverse)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, we get the data. The data is about desa clinics (klinik desa, rural community clinics) in Malaysia.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clinicDesa &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/clinic-data/main/clinicdesa.csv&amp;quot;)
head(clinicDesa)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   id facilities_id                     name              address postcode
## 1  1    KD01010019  KLINIK DESA ASSAM BUBOK     Jalan Batu Pahat    86400
## 2  2    KD01010020   KLINIK DESA BATU PUTIH    Jalan Behor Temak    83000
## 3  3    KD01010021      KLINIK DESA BEROLEH    Jalan Parit Besar    83300
## 4  4    KD01010022        KLINIK DESA BINDU Jalan Tongkang Pecah    83010
## 5  5    KD01010023 KLINIK DESA KAMPUNG BARU   Jalan Parit Kemang    83710
## 6  6    KD01010024 KLINIK DESA KANGKAR BARU      Jalan Meng Seng    85400
##             city   district  state tel fax website email image latitude
## 1     Ayer Hitam Batu Pahat Johor       NA      NA    NA    NA 1.933330
## 2          Bagan Batu Pahat Johor       NA      NA    NA    NA 1.889100
## 3     Sri Gading Batu Pahat Johor       NA      NA    NA    NA 1.877890
## 4 Tongkang Pecah Batu Pahat Johor       NA      NA    NA    NA 1.901515
## 5    Parit Yaani Batu Pahat Johor       NA      NA    NA    NA 1.905120
## 6      Yong Peng Batu Pahat Johor       NA      NA    NA    NA 2.065310
##   longitude likes rating status
## 1  103.1167     0      0    NEW
## 2  102.8778     0      0    NEW
## 3  102.9858     0      0    NEW
## 4  102.9665     0      0    NEW
## 5  103.0372     0      0    NEW
## 6  103.1248     0      0    NEW&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we plot the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(clinicDesa, aes(longitude, latitude)) +
  geom_point() +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Two points have clearly erroneous coordinates (their longitude is far outside Malaysia), so we remove them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clinicDesa2 &amp;lt;- clinicDesa %&amp;gt;% filter(longitude &amp;gt; 25)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, plot the updated data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(clinicDesa2, aes(longitude, latitude)) +
  geom_point() +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;From the plot, we can see that the left cluster consists of the coordinates in Peninsular Malaysia. So, we can limit our plot to longitude &amp;lt; 105 and longitude &amp;gt; 97.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Get base map
global &amp;lt;- map_data(&amp;quot;world&amp;quot;) 

# Plot
ggplot() + 
  geom_polygon(data = global %&amp;gt;% filter(region == &amp;quot;Malaysia&amp;quot;), aes(x=long, y = lat, group = group), 
               fill = &amp;quot;gray85&amp;quot;) + 
  coord_fixed(1.3) +
  geom_point(data = clinicDesa2, aes(x = longitude, y = latitude)) +
  theme_minimal() + 
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Desa clinic in the peninsular of Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data last updated: Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;)))) +
  xlim(97, 105) #limit overall map to peninsular of Malaysia&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I am not going to re-explain the code above and below as I have explained it in &lt;a href=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/&#34;&gt;the previous blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This approach also works with &lt;code&gt;rworldmap&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Get base map
world &amp;lt;- getMap(resolution = &amp;quot;low&amp;quot;)
msia &amp;lt;- world[world@data$ADMIN == &amp;quot;Malaysia&amp;quot;, ]

# Plot
ggplot() +
  geom_polygon(data = msia, aes(x = long, y = lat, group = group), fill = NA, colour = &amp;quot;black&amp;quot;) +
  geom_point(data = clinicDesa2, aes(x = longitude, y = latitude)) +
  coord_quickmap() + 
  theme_minimal() + 
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Desa clinic in the peninsular of Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data last updated: Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;)))) +
  xlim(97, 105) #limit overall map to peninsular of Malaysia&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As we can see, using the two approaches, we can plot the Bornean and Peninsular sides of Malaysia. But, at least to my knowledge, we cannot apply these approaches if we want to restrict the plot to a certain state in Malaysia.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;plot-the-states-in-malaysia&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plot the states in Malaysia&lt;/h2&gt;
&lt;p&gt;Load the necessary package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(geodata)
library(tidyterra)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This time we are going to use the &lt;code&gt;geodata&lt;/code&gt; package; &lt;code&gt;tidyterra&lt;/code&gt; is used to supplement ggplot2. First, let’s limit the data to desa clinics in Terengganu only.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clinic_trg &amp;lt;- 
  clinicDesa %&amp;gt;% 
  filter(state == &amp;quot;Terengganu&amp;quot;) %&amp;gt;% 
  dplyr::select(latitude, longitude) 
head(clinic_trg)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   latitude longitude
## 1  5.48533  102.4914
## 2  5.81578  102.5778
## 3  5.70886  102.4892
## 4  5.75722  102.5303
## 5  5.67444  102.6289
## 6  5.69875  102.5430&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we get the map from the &lt;code&gt;geodata&lt;/code&gt; package with the boundaries at the district level.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Malaysia &amp;lt;- gadm(country = &amp;quot;MYS&amp;quot;, level = 2, path=tempdir())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can use the below information to limit the map to Terengganu state only.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Malaysia$NAME_1&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   [1] &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;          
##   [5] &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;          
##   [9] &amp;quot;Johor&amp;quot;           &amp;quot;Johor&amp;quot;           &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;          
##  [13] &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;          
##  [17] &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;          
##  [21] &amp;quot;Kedah&amp;quot;           &amp;quot;Kedah&amp;quot;           &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;       
##  [25] &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;       
##  [29] &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;        &amp;quot;Kelantan&amp;quot;       
##  [33] &amp;quot;Kuala Lumpur&amp;quot;    &amp;quot;Labuan&amp;quot;          &amp;quot;Melaka&amp;quot;          &amp;quot;Melaka&amp;quot;         
##  [37] &amp;quot;Melaka&amp;quot;          &amp;quot;Negeri Sembilan&amp;quot; &amp;quot;Negeri Sembilan&amp;quot; &amp;quot;Negeri Sembilan&amp;quot;
##  [41] &amp;quot;Negeri Sembilan&amp;quot; &amp;quot;Negeri Sembilan&amp;quot; &amp;quot;Negeri Sembilan&amp;quot; &amp;quot;Negeri Sembilan&amp;quot;
##  [45] &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;         
##  [49] &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;         
##  [53] &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;          &amp;quot;Pahang&amp;quot;          &amp;quot;Perak&amp;quot;          
##  [57] &amp;quot;Perak&amp;quot;           &amp;quot;Perak&amp;quot;           &amp;quot;Perak&amp;quot;           &amp;quot;Perak&amp;quot;          
##  [61] &amp;quot;Perak&amp;quot;           &amp;quot;Perak&amp;quot;           &amp;quot;Perak&amp;quot;           &amp;quot;Perak&amp;quot;          
##  [65] &amp;quot;Perak&amp;quot;           &amp;quot;Perlis&amp;quot;          &amp;quot;Pulau Pinang&amp;quot;    &amp;quot;Pulau Pinang&amp;quot;   
##  [69] &amp;quot;Pulau Pinang&amp;quot;    &amp;quot;Pulau Pinang&amp;quot;    &amp;quot;Pulau Pinang&amp;quot;    &amp;quot;Putrajaya&amp;quot;      
##  [73] &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;          
##  [77] &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;          
##  [81] &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;          
##  [85] &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;          
##  [89] &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;          
##  [93] &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;           &amp;quot;Sabah&amp;quot;          
##  [97] &amp;quot;Sabah&amp;quot;           &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [101] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [105] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [109] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [113] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [117] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [121] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [125] &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;         &amp;quot;Sarawak&amp;quot;        
## [129] &amp;quot;Selangor&amp;quot;        &amp;quot;Selangor&amp;quot;        &amp;quot;Selangor&amp;quot;        &amp;quot;Selangor&amp;quot;       
## [133] &amp;quot;Selangor&amp;quot;        &amp;quot;Selangor&amp;quot;        &amp;quot;Selangor&amp;quot;        &amp;quot;Selangor&amp;quot;       
## [137] &amp;quot;Selangor&amp;quot;        &amp;quot;Trengganu&amp;quot;       &amp;quot;Trengganu&amp;quot;       &amp;quot;Trengganu&amp;quot;      
## [141] &amp;quot;Trengganu&amp;quot;       &amp;quot;Trengganu&amp;quot;       &amp;quot;Trengganu&amp;quot;       &amp;quot;Trengganu&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, this is the plot for Terengganu.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Trg &amp;lt;- Malaysia[138:144,]
plot(Trg)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
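&lt;p&gt;Hardcoded row indices like &lt;code&gt;138:144&lt;/code&gt; can break when GADM updates its data. As a sketch of a more robust alternative, we can subset by the state name instead (note that GADM spells the state &lt;code&gt;Trengganu&lt;/code&gt;, without the first &lt;code&gt;e&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Subset by name rather than by row position
Trg2 &amp;lt;- Malaysia[Malaysia$NAME_1 == &amp;quot;Trengganu&amp;quot;, ]
plot(Trg2)&lt;/code&gt;&lt;/pre&gt;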
&lt;p&gt;We are going to plot this with ggplot2, stacking the map layer with the coordinate layer.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot() +
  geom_spatvector(data = Trg, color = &amp;quot;grey&amp;quot;, fill = NA) +
  geom_point(data = clinic_trg, aes(x = longitude, y = latitude, color = &amp;quot;red&amp;quot;)) +
  theme_minimal() +
  theme(legend.position = &amp;quot;none&amp;quot;) +
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Desa clinic in Terengganu, Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data last updated: Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;)))) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;geom_spatvector&lt;/code&gt; is from the &lt;code&gt;tidyterra&lt;/code&gt; package. Alternatively, we can plot using &lt;code&gt;geom_sf&lt;/code&gt;, but we need to convert the &lt;code&gt;SpatVector&lt;/code&gt; data into an &lt;code&gt;sf&lt;/code&gt; object using &lt;code&gt;sf::st_as_sf&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(data = sf::st_as_sf(Trg)) +
  geom_sf(color = &amp;quot;grey&amp;quot;, fill = NA) +
  geom_point(data = clinic_trg, aes(x = longitude, y = latitude, color = &amp;quot;red&amp;quot;)) +
  theme_minimal() +
  theme(legend.position = &amp;quot;none&amp;quot;) +
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Desa clinic in Terengganu, Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data last updated: Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;)))) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Both approaches produce the same plot.&lt;/p&gt;
&lt;p&gt;We can further add district labels to the plot. For example, using &lt;code&gt;geom_sf&lt;/code&gt;, we can stack it with a &lt;code&gt;geom_sf_label&lt;/code&gt; layer. We can also use &lt;code&gt;theme_void&lt;/code&gt; to remove the background and the map axes.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(data = sf::st_as_sf(Trg)) +
  geom_sf(color = &amp;quot;grey&amp;quot;, fill = NA) +
  geom_sf_label(aes(label = NAME_2)) +
  geom_point(data = clinic_trg, aes(x = longitude, y = latitude, color = &amp;quot;red&amp;quot;)) +
  theme_void() +
  theme(legend.position = &amp;quot;none&amp;quot;) +
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Desa clinic in Terengganu, Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data last updated: Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;)))) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/mapping-the-states-in-malaysia/index.en_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Using UMAP preprocessing for image classification</title>
      <link>https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/</link>
      <pubDate>Wed, 16 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;umap&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;UMAP&lt;/h2&gt;
&lt;p&gt;Uniform manifold approximation and projection, or UMAP for short, is a dimension reduction technique. Basically, UMAP projects a set of features into a smaller space. UMAP can be a supervised technique, in which we supply a label or an outcome, or an unsupervised one. Those interested in the details of how UMAP works can refer to this &lt;a href=&#34;https://umap-learn.readthedocs.io/en/latest/how_umap_works.html&#34;&gt;reference&lt;/a&gt;. For those who prefer a simpler or shorter version, I recommend a &lt;a href=&#34;https://www.youtube.com/watch?v=eN0wFzBA4Sc&amp;amp;list=WL&amp;amp;index=2&#34;&gt;YouTube video by Joshua Starmer&lt;/a&gt;.&lt;/p&gt;
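&lt;p&gt;As a minimal sketch (using the &lt;code&gt;uwot&lt;/code&gt; package, which is not used in the rest of this post), the supervised and unsupervised variants differ only in whether we pass the labels via &lt;code&gt;y&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(uwot)

set.seed(123)
# Unsupervised: project the 4 iris measurements into 2 dimensions
emb_unsup &amp;lt;- umap(iris[, 1:4], n_components = 2)

# Supervised: the species labels guide the embedding
emb_sup &amp;lt;- umap(iris[, 1:4], y = iris$Species, n_components = 2)

dim(emb_unsup) # 150 rows, 2 columns&lt;/code&gt;&lt;/pre&gt;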
&lt;/div&gt;
&lt;div id=&#34;example-in-r&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example in R&lt;/h2&gt;
&lt;p&gt;We are going to see how to apply UMAP as an image-preprocessing step and then classify the images using kNN and naive Bayes.&lt;/p&gt;
&lt;p&gt;These are the packages that we need.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(keras) #for data and reshape to tabular format
library(tidymodels)
library(embed) #for umap
library(discrim) #for naive bayes model&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are going to use the famous MNIST dataset, which contains handwritten digits from 0 to 9. The dataset is available in the &lt;code&gt;keras&lt;/code&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mnist_data &amp;lt;- dataset_mnist()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loaded Tensorflow version 2.2.0&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;image_data &amp;lt;- mnist_data$train$x
image_labels &amp;lt;- mnist_data$train$y
image_data %&amp;gt;% dim()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 60000    28    28&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, this is the image in the second row.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;image_data[2, 1:28, 1:28] %&amp;gt;% 
  t() %&amp;gt;% 
  image(col = gray.colors(256))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/index.en_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Next, we are going to reshape the images into a tabular data frame format. We are going to limit the data to the first 10,000 rows or images out of the total 60,000 images.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Reformat to tabular format
image_data &amp;lt;- array_reshape(image_data, dim = c(60000, 28*28))
image_data %&amp;gt;% dim()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 60000   784&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;image_data &amp;lt;- image_data[1:10000,]
image_labels &amp;lt;- image_labels[1:10000]

# Reformat to data frame
full_data &amp;lt;- 
  data.frame(image_data) %&amp;gt;% 
  bind_cols(label = image_labels) %&amp;gt;% 
  mutate(label = as.factor(label))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, we are going to split the data and create 3-fold cross-validation sets for the sake of simplicity.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Split data
set.seed(123)
ind &amp;lt;- initial_split(full_data)
data_train &amp;lt;- training(ind)  
data_test &amp;lt;- testing(ind)

# 3-fold CV
set.seed(123)
data_cv &amp;lt;- vfold_cv(data_train, v = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the recipe specification, we are going to center and scale all the predictors after creating the new variables using &lt;code&gt;step_umap()&lt;/code&gt;. Notice that in &lt;code&gt;step_umap()&lt;/code&gt; we supply the outcome and tune the number of components (&lt;code&gt;num_comp&lt;/code&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rec &amp;lt;- 
  recipe(label ~ ., data = data_train) %&amp;gt;% 
  step_umap(all_predictors(), outcome = vars(label), num_comp = tune()) %&amp;gt;% 
  step_center(all_predictors()) %&amp;gt;% 
  step_scale(all_predictors())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We create a base workflow.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wf &amp;lt;- 
  workflow() %&amp;gt;% 
  add_recipe(rec) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are going to use two models as classifiers:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;kNN&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Naive Bayes&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For each classifier, we are going to create a regular grid of the parameters to be tuned and then run a grid search.&lt;/p&gt;
&lt;p&gt;For kNN.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# knn model
knn_mod &amp;lt;- 
  nearest_neighbor(neighbors = tune()) %&amp;gt;% 
  set_mode(&amp;quot;classification&amp;quot;) %&amp;gt;% 
  set_engine(&amp;quot;kknn&amp;quot;)

# knn grid
knn_grid &amp;lt;- grid_regular(neighbors(), num_comp(range = c(2, 8)), levels = 3)

# Tune grid search
knn_tune &amp;lt;- 
  tune_grid(
  wf %&amp;gt;% add_model(knn_mod),
  resamples = data_cv,
  grid = knn_grid, 
  control = control_grid(verbose = F)
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For naive Bayes.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# nb model
nb_mod &amp;lt;- 
  naive_Bayes(smoothness = tune()) %&amp;gt;% 
  set_mode(&amp;quot;classification&amp;quot;) %&amp;gt;% 
  set_engine(&amp;quot;naivebayes&amp;quot;)

# nb grid
nb_grid &amp;lt;- grid_regular(smoothness(), num_comp(range = c(2, 10)), levels = 3)

# Tune grid search
nb_tune &amp;lt;- 
  tune_grid(
    wf %&amp;gt;% add_model(nb_mod),
    resamples = data_cv,
    grid = nb_grid, 
    control = control_grid(verbose = F)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see the tuning performance of our models.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# knn model
knn_tune %&amp;gt;% 
  show_best(&amp;quot;roc_auc&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 8
##   neighbors num_comp .metric .estimator  mean     n  std_err .config            
##       &amp;lt;int&amp;gt;    &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;              
## 1        10        8 roc_auc hand_till  0.961     3 0.000268 Preprocessor3_Mode~
## 2        10        5 roc_auc hand_till  0.961     3 0.000421 Preprocessor2_Mode~
## 3         5        8 roc_auc hand_till  0.959     3 0.000757 Preprocessor3_Mode~
## 4        10        2 roc_auc hand_till  0.959     3 0.000737 Preprocessor1_Mode~
## 5         5        5 roc_auc hand_till  0.958     3 0.000740 Preprocessor2_Mode~&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;knn_tune %&amp;gt;% 
  show_best(&amp;quot;accuracy&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 8
##   neighbors num_comp .metric  .estimator  mean     n std_err .config            
##       &amp;lt;int&amp;gt;    &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;              
## 1        10        8 accuracy multiclass 0.914     3 0.00104 Preprocessor3_Mode~
## 2         5        8 accuracy multiclass 0.913     3 0.00315 Preprocessor3_Mode~
## 3        10        5 accuracy multiclass 0.912     3 0.00114 Preprocessor2_Mode~
## 4         5        5 accuracy multiclass 0.91      3 0.00139 Preprocessor2_Mode~
## 5        10        2 accuracy multiclass 0.910     3 0.00175 Preprocessor1_Mode~&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# nb model
nb_tune %&amp;gt;% 
  show_best(&amp;quot;roc_auc&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 8
##   smoothness num_comp .metric .estimator  mean     n  std_err .config           
##        &amp;lt;dbl&amp;gt;    &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;             
## 1        1.5       10 roc_auc hand_till  0.971     3 0.000400 Preprocessor3_Mod~
## 2        1.5        6 roc_auc hand_till  0.971     3 0.000997 Preprocessor2_Mod~
## 3        1         10 roc_auc hand_till  0.971     3 0.000634 Preprocessor3_Mod~
## 4        1          6 roc_auc hand_till  0.970     3 0.00124  Preprocessor2_Mod~
## 5        0.5       10 roc_auc hand_till  0.969     3 0.000808 Preprocessor3_Mod~&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nb_tune %&amp;gt;% 
  show_best(&amp;quot;accuracy&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 8
##   smoothness num_comp .metric  .estimator  mean     n  std_err .config          
##        &amp;lt;dbl&amp;gt;    &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;            
## 1        1         10 accuracy multiclass 0.913     3 0.000481 Preprocessor3_Mo~
## 2        1.5       10 accuracy multiclass 0.913     3 0.000267 Preprocessor3_Mo~
## 3        0.5       10 accuracy multiclass 0.912     3 0.000462 Preprocessor3_Mo~
## 4        1.5        6 accuracy multiclass 0.911     3 0.00135  Preprocessor2_Mo~
## 5        1          6 accuracy multiclass 0.910     3 0.00157  Preprocessor2_Mo~&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we are going to select the best model from the tuned parameters and finalise our model using &lt;code&gt;last_fit()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For the kNN model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Finalize
knn_best &amp;lt;- knn_tune %&amp;gt;% select_best(&amp;quot;roc_auc&amp;quot;)
knn_rec &amp;lt;- 
  recipe(label ~ ., data = data_train) %&amp;gt;% 
  step_umap(all_predictors(), outcome = vars(label), num_comp = knn_best$num_comp) %&amp;gt;% 
  step_center(all_predictors()) %&amp;gt;% 
  step_scale(all_predictors())

knn_wf &amp;lt;- 
  workflow() %&amp;gt;% 
  add_recipe(knn_rec) %&amp;gt;% 
  add_model(knn_mod) %&amp;gt;% 
  finalize_workflow(knn_best) 

# Last fit
knn_lastfit &amp;lt;- 
  knn_wf %&amp;gt;% 
  last_fit(ind)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the naive Bayes model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Finalize
nb_best &amp;lt;- nb_tune %&amp;gt;% select_best(&amp;quot;roc_auc&amp;quot;)
nb_rec &amp;lt;- 
  recipe(label ~ ., data = data_train) %&amp;gt;% 
  step_umap(all_predictors(), outcome = vars(label), num_comp = nb_best$num_comp) %&amp;gt;% 
  step_center(all_predictors()) %&amp;gt;% 
  step_scale(all_predictors())

nb_wf &amp;lt;- 
  workflow() %&amp;gt;% 
  add_recipe(nb_rec) %&amp;gt;% 
  add_model(nb_mod) %&amp;gt;% 
  finalize_workflow(nb_best) 

# Last fit
nb_lastfit &amp;lt;- 
  nb_wf %&amp;gt;% 
  last_fit(ind)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see the model performance on the testing data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;knn_lastfit %&amp;gt;% 
  collect_metrics() %&amp;gt;% 
  mutate(model = &amp;quot;knn&amp;quot;) %&amp;gt;% 
  dplyr::bind_rows(nb_lastfit %&amp;gt;% 
                     collect_metrics() %&amp;gt;% 
                     mutate(model = &amp;quot;nb&amp;quot;)) %&amp;gt;% 
  select(-.config)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 4 x 4
##   .metric  .estimator .estimate model
##   &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;          &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;
## 1 accuracy multiclass     0.938 knn  
## 2 roc_auc  hand_till      0.971 knn  
## 3 accuracy multiclass     0.936 nb   
## 4 roc_auc  hand_till      0.980 nb&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These are the confusion matrices.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;knn_lastfit %&amp;gt;% 
  collect_predictions() %&amp;gt;%
  conf_mat(label, .pred_class) %&amp;gt;% 
  autoplot(type = &amp;quot;heatmap&amp;quot;) +
  labs(title = &amp;quot;Confusion matrix - kNN&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/index.en_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nb_lastfit %&amp;gt;% 
  collect_predictions() %&amp;gt;%
  conf_mat(label, .pred_class) %&amp;gt;% 
  autoplot(type = &amp;quot;heatmap&amp;quot;) +
  labs(title = &amp;quot;Confusion matrix - naive bayes&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/index.en_files/figure-html/unnamed-chunk-14-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Lastly, we can compare the ROC plots for each class.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;knn_lastfit %&amp;gt;% 
  collect_predictions() %&amp;gt;%
  mutate(id = &amp;quot;knn&amp;quot;) %&amp;gt;% 
  bind_rows(
    nb_lastfit %&amp;gt;% 
      collect_predictions() %&amp;gt;% 
      mutate(id = &amp;quot;nb&amp;quot;)
            ) %&amp;gt;% 
  group_by(id) %&amp;gt;% 
  roc_curve(label, .pred_0:.pred_9) %&amp;gt;% 
  autoplot()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/using-umap-preprocessing-for-image-classification/index.en_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I believe UMAP is quite good and can be used as a preprocessing step in image classification. We were able to get a pretty good performance result in this post, and I believe that with a more rigorous parameter tuning approach, the performance would be even better.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Fitted vs predict in R</title>
      <link>https://tengkuhanis.netlify.app/post/fitted-vs-predict-in-r/</link>
      <pubDate>Sun, 09 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/fitted-vs-predict-in-r/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/fitted-vs-predict-in-r/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;There are two functions in R that seem almost similar yet are different:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fitted()&lt;/code&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;predict()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;First, let’s prepare some data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Packages
library(dplyr)

# Data
set.seed(123)
dat &amp;lt;- 
  iris %&amp;gt;% 
  mutate(twoGp = sample(c(&amp;quot;Gp1&amp;quot;, &amp;quot;Gp2&amp;quot;), 150, replace = T), #create two group factor
         twoGp = as.factor(twoGp))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species   twoGp   
##  setosa    :50   Gp1:76  
##  versicolor:50   Gp2:74  
##  virginica :50           
##                          
##                          
## &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fitted()&lt;/code&gt; is used to get the predicted values or &lt;span class=&#34;math inline&#34;&gt;\(\hat{y}\)&lt;/span&gt; based on the data used to fit the model. Let’s see this with a logistic regression.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;logR &amp;lt;- glm(twoGp ~ ., family = binomial(), data = dat)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These are the fitted values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fitted(logR) %&amp;gt;% head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         1         2         3         4         5         6 
## 0.4074988 0.3385228 0.3772767 0.3555640 0.4255196 0.4602198&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For &lt;code&gt;predict()&lt;/code&gt;, we have three types:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Response&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Link - default&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Terms&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If no new data is supplied to &lt;code&gt;predict()&lt;/code&gt;, it will use the original data used to fit the model.&lt;/p&gt;
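&lt;p&gt;As a quick side note (this example is mine, not part of the original comparison), supplying a &lt;code&gt;newdata&lt;/code&gt; argument makes &lt;code&gt;predict()&lt;/code&gt; score those rows instead:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Predict only for the first three rows of the data
predict(logR, newdata = dat[1:3, ], type = &amp;quot;response&amp;quot;)&lt;/code&gt;&lt;/pre&gt;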
&lt;p&gt;&lt;strong&gt;1. Response&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;response&lt;/code&gt; type is identical to &lt;code&gt;fitted()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;predict(logR, type = &amp;quot;response&amp;quot;) %&amp;gt;% head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         1         2         3         4         5         6 
## 0.4074988 0.3385228 0.3772767 0.3555640 0.4255196 0.4602198&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can confirm this as below.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;all.equal(fitted(logR), predict(logR, type = &amp;quot;response&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Thus, &lt;code&gt;fitted()&lt;/code&gt; and &lt;code&gt;predict(type = &#34;response&#34;)&lt;/code&gt; give us predicted probabilities on the scale of the response variable. The first value can be interpreted as: the probability of Gp2 (Gp1 being the reference group) for the first observation is 0.41.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Link&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;predict(type = &#34;link&#34;)&lt;/code&gt; gives us predicted probabilities on the logit scale or log odds prediction.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;predict(logR, type = &amp;quot;link&amp;quot;) %&amp;gt;% head() #similar to predict(logR)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          1          2          3          4          5          6 
## -0.3743150 -0.6698840 -0.5011235 -0.5946702 -0.3001551 -0.1594578&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, the log odds prediction of Gp2 for the first observation is -0.37. We can get the same values if we apply a &lt;a href=&#34;https://en.wikipedia.org/wiki/Generalized_linear_model#Link_function&#34;&gt;link function&lt;/a&gt; to the fitted values.&lt;/p&gt;
&lt;p&gt;The link function for logistic regression is:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[
ln(\frac{\mu}{1 - \mu})
\]&lt;/span&gt;
So, we apply this link function to the fitted values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;logOddsProb &amp;lt;- log(fitted(logR) / (1 - fitted(logR))) 
head(logOddsProb)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          1          2          3          4          5          6 
## -0.3743150 -0.6698840 -0.5011235 -0.5946702 -0.3001551 -0.1594578&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can further confirm this as we did previously.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;all.equal(logOddsProb, predict(logR, type = &amp;quot;link&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Also, we can conclude that &lt;code&gt;predict(type = &#34;link&#34;)&lt;/code&gt; gives us the fitted values on the scale of the link function (the log odds).&lt;/p&gt;
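&lt;p&gt;As a quick check (my addition), applying the inverse link - the logistic function, &lt;code&gt;plogis()&lt;/code&gt; in base R - to the link-scale predictions should recover the fitted probabilities:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# The inverse logit maps log odds back to probabilities
all.equal(plogis(predict(logR, type = &amp;quot;link&amp;quot;)), fitted(logR))&lt;/code&gt;&lt;/pre&gt;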
&lt;p&gt;&lt;strong&gt;3. Terms&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Lastly, we have &lt;code&gt;predict(type = &#34;terms&#34;)&lt;/code&gt;. This type gives us a matrix of fitted values for each variable and each observation in the model, on the scale of the linear predictor.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;predict(logR, type = &amp;quot;terms&amp;quot;) %&amp;gt;% head() &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1   0.07988782  0.28070682    0.4819893  -0.2736677 -0.9178543
## 2   0.10138230 -0.03635661    0.4819893  -0.2736677 -0.9178543
## 3   0.12287679  0.09046877    0.5024299  -0.2736677 -0.9178543
## 4   0.13362403  0.02705608    0.4615487  -0.2736677 -0.9178543
## 5   0.09063506  0.34411951    0.4819893  -0.2736677 -0.9178543
## 6   0.04764610  0.53435757    0.4206675  -0.2188976 -0.9178543&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, if we add up the values of the first observation and the constant (or intercept), we will get the same value as the log odds prediction (&lt;code&gt;predict(type = &#34;link&#34;)&lt;/code&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;predTerm &amp;lt;- predict(logR, type = &amp;quot;terms&amp;quot;)
sum(predTerm[1, ], attr(predTerm, &amp;quot;constant&amp;quot;)) #add up the first observation and the constant&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] -0.374315&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;logOddsProb[1]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         1 
## -0.374315&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These values are also the same as those we get by calculating manually using the coefficients from &lt;code&gt;summary()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[
LogOdds(Gp2) = \beta_0 + \beta_1(Sepal.Length) + \beta_2(Sepal.Width) + \\
\beta_3(Petal.Length) + \beta_4(Petal.Width) + \beta_5(Species)
\]&lt;/span&gt;
So, this is the value we get for the first observation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;coef(logR)[1] + coef(logR)[2]*dat$Sepal.Length[1] + coef(logR)[3]*dat$Sepal.Width[1] + coef(logR)[4]*dat$Petal.Length[1] + coef(logR)[5]*dat$Petal.Width[1] + 0 #setosa species&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## (Intercept) 
##   -0.374315&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, in &lt;code&gt;predict(type = &#34;terms&#34;)&lt;/code&gt; the values are &lt;a href=&#34;https://www.statology.org/center-data-in-r/&#34;&gt;centered&lt;/a&gt;, thus we get different values for the constant/intercept and for &lt;span class=&#34;math inline&#34;&gt;\(\beta_1(Sepal.Length)\)&lt;/span&gt;, &lt;span class=&#34;math inline&#34;&gt;\(\beta_2(Sepal.Width)\)&lt;/span&gt;, and so on. For example, the intercept values from the two approaches are:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intercept/constant from predict(type = &amp;quot;terms&amp;quot;)
attr(predTerm, &amp;quot;constant&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] -0.02537694&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intercept/constant from summary()
coef(logR)[1]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## (Intercept) 
##   -1.814251&lt;/code&gt;&lt;/pre&gt;
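&lt;p&gt;One way to see where this constant comes from (a sanity check of my own): since each centered term averages to zero over the data, the constant should equal the mean of the link-scale predictions:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Centered terms have mean zero, so the constant is the mean linear predictor
mean(predict(logR, type = &amp;quot;link&amp;quot;))&lt;/code&gt;&lt;/pre&gt;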
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://stackoverflow.com/a/12201502/11215767&#34; class=&#34;uri&#34;&gt;https://stackoverflow.com/a/12201502/11215767&lt;/a&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://stackoverflow.com/a/47854088/11215767&#34; class=&#34;uri&#34;&gt;https://stackoverflow.com/a/47854088/11215767&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>My first interactive map with {leaflet}</title>
      <link>https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/</link>
      <pubDate>Sun, 28 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/index.en_files/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/index.en_files/pymjs/pym.v1.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/index.en_files/widgetframe-binding/widgetframe.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;I have tried creating a map with ggplot2 &lt;a href=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/&#34;&gt;previously&lt;/a&gt;. In this post, I will try to create an interactive map using &lt;code&gt;leaflet&lt;/code&gt; package in R.&lt;/p&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(tidygeocoder)
library(leaflet)
library(htmltools)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, I’m going to use clinic location data for Malaysia. I have already uploaded this data to my &lt;a href=&#34;https://github.com/tengku-hanis/clinic-data&#34;&gt;GitHub repo&lt;/a&gt;. I will skip the explanation of the pre-processing part, as it is the same pre-processing as in my &lt;a href=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/&#34;&gt;previous post&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read the data
clinic1m &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/clinic-data/main/clinic1m.csv&amp;quot;)
clinicDesa &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/clinic-data/main/clinicdesa.csv&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;details&gt;
&lt;summary&gt;
Show code for pre-processing
&lt;/summary&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Get the missing coordinate based on postal codes
clinic1m2 &amp;lt;- 
  clinic1m %&amp;gt;%
  mutate(country = &amp;quot;malaysia&amp;quot;) %&amp;gt;% 
  select(name, postcode, country) %&amp;gt;% 
  mutate(postcode = ifelse(nchar(postcode) == 4, paste0(0, postcode), postcode)) %&amp;gt;%
  geocode(postalcode = postcode, country = country, method = &amp;quot;osm&amp;quot;)

# Add coordinate from external sources for the still missing coordinates
add_coord &amp;lt;- 
  read.table(header = T, text = &amp;quot;
postal_code    latitude   longitude
16070            6.0334    102.3499
26060            3.6228    102.3926
90700            5.8456    118.0571
26060            3.6228    102.3926&amp;quot;)

# Fill in the added coordinates, then drop clinics still missing one
clinic1m2 &amp;lt;- 
  clinic1m2 %&amp;gt;% 
  mutate(lat = ifelse(postcode %in% add_coord$postal_code, add_coord$latitude, lat), 
         long = ifelse(postcode %in% add_coord$postal_code, add_coord$longitude, long)) %&amp;gt;% 
  drop_na() #drop 2 clinic1m

# Bind the 2 data
all_clinic &amp;lt;- 
  clinic1m2 %&amp;gt;% 
  mutate(Type = &amp;quot;1Malaysia&amp;quot;) %&amp;gt;% 
  select(name, Type, lat, long) %&amp;gt;% 
  bind_rows(clinicDesa %&amp;gt;% 
              mutate(Type = &amp;quot;Desa&amp;quot;, 
                     lat = latitude, 
                     long = longitude) %&amp;gt;% 
              select(name, Type, lat, long)) %&amp;gt;% 
  mutate(name = str_to_title(name))&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;p&gt;First, we are going to plot the coordinates to see if there is anything strange.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(all_clinic, aes(long, lat, color = Type)) +
  geom_point() +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/my-first-interactive-map-with-leaflet/index.en_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, we are going to remove the two isolated points as seen from the plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;all_clinic2 &amp;lt;- all_clinic %&amp;gt;% filter(long &amp;gt; 25)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we have our data ready, we can supply it to &lt;code&gt;leaflet&lt;/code&gt;. We can choose the type of map from &lt;code&gt;addProviderTiles()&lt;/code&gt;; some providers need an API key, but the one we choose here does not. We supply the longitude and latitude of our data to &lt;code&gt;addCircleMarkers()&lt;/code&gt;, and use &lt;code&gt;clusterOptions&lt;/code&gt; to cluster our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;leaflet(all_clinic2) %&amp;gt;% 
  addProviderTiles(providers$Stamen.Watercolor) %&amp;gt;%
  addProviderTiles(providers$Stamen.TerrainLabels) %&amp;gt;%
  addCircleMarkers(~long, ~lat, 
                   clusterOptions = markerClusterOptions())&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;htmlwidget-1&#34; style=&#34;width:100%;height:480px;&#34; class=&#34;widgetframe html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-1&#34;&gt;{&#34;x&#34;:{&#34;url&#34;:&#34;index.en_files/figure-html//widgets/widget_unnamed-chunk-7.html&#34;,&#34;options&#34;:{&#34;xdomain&#34;:&#34;*&#34;,&#34;allowfullscreen&#34;:false,&#34;lazyload&#34;:false}},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p&gt;Next, we can add a label.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;labels &amp;lt;- 
  sprintf(&amp;quot;&amp;lt;strong&amp;gt;%s&amp;lt;/strong&amp;gt;&amp;quot;, all_clinic2$name) %&amp;gt;% #use the filtered data to match the map
  lapply(htmltools::HTML)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Also, we can add a mini map to our map. Here, I change the type of map to a more appropriate one.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;leaflet(all_clinic2) %&amp;gt;% 
  addProviderTiles(providers$OpenStreetMap) %&amp;gt;%
  addCircleMarkers(~long, ~lat, popup = ~labels, # popup add the label
                   clusterOptions = markerClusterOptions()) %&amp;gt;% 
    # add a mini map
  addMiniMap(tiles = providers$OpenStreetMap, zoomLevelOffset = -3)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;htmlwidget-2&#34; style=&#34;width:100%;height:480px;&#34; class=&#34;widgetframe html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-2&#34;&gt;{&#34;x&#34;:{&#34;url&#34;:&#34;index.en_files/figure-html//widgets/widget_unnamed-chunk-10.html&#34;,&#34;options&#34;:{&#34;xdomain&#34;:&#34;*&#34;,&#34;allowfullscreen&#34;:false,&#34;lazyload&#34;:false}},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p&gt;Notice that the coordinates look more accurate as compared to the map I created with &lt;code&gt;ggplot2&lt;/code&gt; previously.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://lauriebaker.rbind.io/post/where_work/&#34; class=&#34;uri&#34;&gt;https://lauriebaker.rbind.io/post/where_work/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://laurielbaker.github.io/DSCA_leaflet_mapping_in_r/&#34; class=&#34;uri&#34;&gt;https://laurielbaker.github.io/DSCA_leaflet_mapping_in_r/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Variable selection for imputation model in {mice}</title>
      <link>https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/</link>
      <pubDate>Mon, 22 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;some-note&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Some note&lt;/h2&gt;
&lt;p&gt;I have written a &lt;a href=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/&#34;&gt;short post&lt;/a&gt; about missing data and multiple imputation in &lt;code&gt;mice&lt;/code&gt; package previously. This post will add to that previous post.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;imputation-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Imputation model&lt;/h2&gt;
&lt;p&gt;The imputation model is the model that we use for our imputation approach. There is another term, the complete-data model, which is the model that we want to fit after we impute the missing values (i.e., the complete-data model is the final model).&lt;/p&gt;
&lt;p&gt;Generally, we need to include as many relevant variables as possible in the imputation model. However, this general advice may not be very efficient, as we may face multicollinearity and computational issues if we include too many predictors. As a rule of thumb, the number of included variables should be no more than 15-20. &lt;a href=&#34;https://www.jstatsoft.org/article/view/v045i03&#34;&gt;van Buuren &lt;em&gt;et al.&lt;/em&gt; (2011)&lt;/a&gt; mentioned that the increase in explained variance in linear regression is negligible after 15 variables are included.&lt;/p&gt;
&lt;p&gt;There are 4 steps suggested by &lt;a href=&#34;https://stefvanbuuren.name/publications/Flexible%20multivariate%20-%20TNO99054%201999.pdf&#34;&gt;van Buuren &lt;em&gt;et al.&lt;/em&gt; (1999)&lt;/a&gt; for variable selection in the case of big data:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Include all variables that appear in the complete-data model (final model)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This may include the interaction terms as well (passive imputation can be used to specify the interaction terms in &lt;code&gt;mice&lt;/code&gt; package)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include variable that have influence on the occurrence of the missing data&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This can be assessed by a correlation matrix between NAs variables and non-NAs variables&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include variable that explain a considerable amount of variance&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This can be crudely assessed by a correlation matrix between NAs variables and non-NAs variables&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove variable that have too many missing values within the subgroup of incomplete cases&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This can be assessed by a proportion of usable cases (PUC) - how many cases with missing data in a certain variable have an observed values on the predictor variables&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All these steps should be done on the key variables only. There is another more efficient yet laborious approach suggested by &lt;a href=&#34;https://stefvanbuuren.name/publications/Flexible%20multiple%20-%20TNO99045%201999.pdf&#34;&gt;Oudshoorn &lt;em&gt;et al.&lt;/em&gt; (1999)&lt;/a&gt;, which takes into account the important predictors of the predictors themselves. We are going to focus on the four steps above and will not cover the latter approach in this post.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-codes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;R codes&lt;/h2&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(mice)
library(corrplot)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(airquality)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      Ozone           Solar.R           Wind             Temp      
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
##  NA&amp;#39;s   :37       NA&amp;#39;s   :7                                       
##      Month            Day      
##  Min.   :5.000   Min.   : 1.0  
##  1st Qu.:6.000   1st Qu.: 8.0  
##  Median :7.000   Median :16.0  
##  Mean   :6.993   Mean   :15.8  
##  3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :9.000   Max.   :31.0  
## &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We have 2 variables, Ozone and Solar.R, with missing values or NAs. We can further explore the pattern of missingness.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;md.pattern(airquality)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/index.en_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##     Wind Temp Month Day Solar.R Ozone   
## 111    1    1     1   1       1     1  0
## 35     1    1     1   1       1     0  1
## 5      1    1     1   1       0     1  1
## 2      1    1     1   1       0     0  2
##        0    0     0   0       7    37 44&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 2 rows with NAs in both Ozone and Solar.R, 35 rows with NAs only in Ozone, and 5 rows with NAs only in Solar.R. Next, we can check the correlation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(airquality, use = &amp;quot;pairwise.complete.obs&amp;quot;) |&amp;gt;
  corrplot(method = &amp;quot;number&amp;quot;, type = &amp;quot;upper&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/index.en_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The correlations of Ozone-Temp and Ozone-Wind are the highest. Now, let’s do a correlation between the NAs variable and non-NAs variable.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(y = airquality, x = !is.na(airquality), use = &amp;quot;pairwise.complete.obs&amp;quot;) |&amp;gt;
  round(digits = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R  Wind Temp Month   Day
## Ozone      NA   -0.02 -0.05 0.00  0.26 -0.05
## Solar.R     0      NA  0.06 0.11  0.11  0.17
## Wind       NA      NA    NA   NA    NA    NA
## Temp       NA      NA    NA   NA    NA    NA
## Month      NA      NA    NA   NA    NA    NA
## Day        NA      NA    NA   NA    NA    NA&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can ignore the warnings and the NAs, as only Ozone and Solar.R have missing values. So, the highest correlation is 0.26 for Month-Ozone - the correlation between the values of Month and the indicator of whether Ozone is observed. In this correlation matrix, the row variables are the missingness indicators (from &lt;code&gt;!is.na()&lt;/code&gt;) and the column variables are the observed values.&lt;/p&gt;
&lt;p&gt;Lastly, we can calculate the PUC (proportion of usable cases) ‘manually’. &lt;code&gt;md.pairs()&lt;/code&gt; here calculates the number of observations per variable pair.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;var_pair &amp;lt;- md.pairs(airquality)
round(var_pair$mr / (var_pair$mr + var_pair$mm), digits = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R Wind Temp Month Day
## Ozone   0.000   0.946    1    1     1   1
## Solar.R 0.714   0.000    1    1     1   1
## Wind      NaN     NaN  NaN  NaN   NaN NaN
## Temp      NaN     NaN  NaN  NaN   NaN NaN
## Month     NaN     NaN  NaN  NaN   NaN NaN
## Day       NaN     NaN  NaN  NaN   NaN NaN&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A low PUC value indicates there is little information in the predictor to impute the target NA variable. NaN is shown where the variables have no missing values. The row variables are the target variables to be imputed, and the column variables are the predictors in the imputation model. We can see that to impute Solar.R (on the row), Ozone has a little less information (0.714) compared to Wind, Temp, and Day. The diagonal elements will always be 0 or NaN. So, from here we can drop predictors with, say, 0 PUC, as they contain no information to help impute the target NA variable.&lt;/p&gt;
&lt;p&gt;Actually, we have a nice function from &lt;code&gt;mice&lt;/code&gt; that can do what we ‘manually’ did just now.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;quickpred(airquality)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R Wind Temp Month Day
## Ozone       0       1    1    1     1   0
## Solar.R     1       0    0    1     1   1
## Wind        0       0    0    0     0   0
## Temp        0       0    0    0     0   0
## Month       0       0    0    0     0   0
## Day         0       0    0    0     0   0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, the column variables are the predictors, and the row variables are the target NA variables. The above matrix is known as a predictor matrix, which is going to be used in the imputation model. A 1 denotes that a variable is included as a predictor, and a 0 that it is excluded. The two main arguments in &lt;code&gt;quickpred()&lt;/code&gt; are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;mincor - if any of the absolute correlations from the two correlation matrices that we computed earlier exceeds 0.1 (the default), the predictor will be included in the predictor matrix&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;minpuc - the default minimum value for the PUC is 0, so predictors are retained even if they have no information to help the imputation model&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
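&lt;p&gt;As a small sketch (the threshold values here are arbitrary, chosen only for illustration), raising both arguments makes the selection stricter:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Require an absolute correlation above 0.2 and a PUC above 0.5
quickpred(airquality, mincor = 0.2, minpuc = 0.5)&lt;/code&gt;&lt;/pre&gt;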
&lt;p&gt;Notice that the variable Day is excluded from the predictors of Ozone. Its correlation values are 0 and -0.05 in the first and second correlation matrices, respectively, which do not exceed the default threshold of 0.1; that is why Day is excluded. We can observe a similar situation for the variable Wind, which is excluded from the predictors of Solar.R (the correlation coefficients are -0.06 and 0.06). The negative (-) sign does not matter as we actually evaluate the absolute values.&lt;/p&gt;
&lt;p&gt;Intuitively, we can change these two arguments as we see fit to do a variable selection for imputation model. Once we finalise our variable selection, we can do the multiple imputation using &lt;code&gt;mice()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Finalised variable selection
var_sel &amp;lt;- quickpred(airquality)
var_sel&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R Wind Temp Month Day
## Ozone       0       1    1    1     1   0
## Solar.R     1       0    0    1     1   1
## Wind        0       0    0    0     0   0
## Temp        0       0    0    0     0   0
## Month       0       0    0    0     0   0
## Day         0       0    0    0     0   0&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Impute
imp &amp;lt;- mice(airquality, m = 5, predictorMatrix = var_sel, printFlag = F)
imp&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##   Ozone Solar.R    Wind    Temp   Month     Day 
##   &amp;quot;pmm&amp;quot;   &amp;quot;pmm&amp;quot;      &amp;quot;&amp;quot;      &amp;quot;&amp;quot;      &amp;quot;&amp;quot;      &amp;quot;&amp;quot; 
## PredictorMatrix:
##         Ozone Solar.R Wind Temp Month Day
## Ozone       0       1    1    1     1   0
## Solar.R     1       0    0    1     1   1
## Wind        0       0    0    0     0   0
## Temp        0       0    0    0     0   0
## Month       0       0    0    0     0   0
## Day         0       0    0    0     0   0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that &lt;code&gt;mice()&lt;/code&gt; uses the predictor matrix that we provide.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jstatsoft.org/article/view/v045i03&#34; class=&#34;uri&#34;&gt;https://www.jstatsoft.org/article/view/v045i03&lt;/a&gt; - paper written by Stef van Buuren (a bit outdated in terms of code, but runnable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://stefvanbuuren.name/fimd/&#34; class=&#34;uri&#34;&gt;https://stefvanbuuren.name/fimd/&lt;/a&gt; - online book written by Stef van Buuren (See chapter 6.3.2 and 9.1.6)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Making maps with R (my first attempt ever!)</title>
      <link>https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/</link>
      <pubDate>Fri, 12 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;As written in the title of the post, this is my first try ever at making a map with R. I found a great dataset on the distribution of clinics in Malaysia. The two types of clinics that we have here are:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Klinik 1Malaysia (1Malaysia clinic)&lt;/li&gt;
&lt;li&gt;Klinik Desa (Desa clinic)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Originally, these were two separate datasets. Both can be downloaded from &lt;a href=&#34;https://www.data.gov.my/data/ms_MY/group/pemetaan&#34;&gt;here&lt;/a&gt;. I have also uploaded them to my &lt;a href=&#34;https://github.com/tengku-hanis/clinic-data&#34;&gt;GitHub repo&lt;/a&gt; for those interested. The Klinik Desa data has latitude and longitude information, but the Klinik 1Malaysia data does not.&lt;/p&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rworldmap) #to get a Malaysia map
library(tidyverse)
library(tidygeocoder) #to get latitude and longitude&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Read the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clinic1m &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/clinic-data/main/clinic1m.csv&amp;quot;)
clinicDesa &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/clinic-data/main/clinicdesa.csv&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, we need to get latitude and longitude information for the Klinik 1Malaysia data. We are going to retrieve the coordinates based on the postal code, though this is not very accurate. We can use &lt;code&gt;tidygeocoder&lt;/code&gt; for this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clinic1m2 &amp;lt;- 
  clinic1m %&amp;gt;%
  mutate(country = &amp;quot;malaysia&amp;quot;) %&amp;gt;% 
  select(name, postcode, country) %&amp;gt;% 
  mutate(postcode = ifelse(nchar(postcode) == 4, paste0(0, postcode), postcode)) %&amp;gt;%
  geocode(postalcode = postcode, country = country, method = &amp;quot;osm&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
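As an aside on the `paste0(0, postcode)` line above: Malaysian postal codes have five digits, and reading the CSV as numeric drops any leading zero. A minimal base R sketch of the same padding idea, with invented postcodes (not from the clinic data):

```r
# Invented example postcodes read as numbers, so leading zeros are lost
postcode <- c(5050, 43000, 1000)

# Pad every code to 5 characters with leading zeros
# (formatC is a base R alternative to the ifelse()/paste0() fix above)
padded <- formatC(postcode, width = 5, format = "d", flag = "0")
padded
# → "05050" "43000" "01000"
```

Unlike the `ifelse()` version, which only handles 4-character codes, this also repairs codes that lost more than one leading zero.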
&lt;p&gt;Checking the data further, we notice that 5 clinics have no coordinate information.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clinic1m2 %&amp;gt;% filter(is.na(lat) | is.na(long))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 5
##   name                                     postcode country    lat  long
##   &amp;lt;chr&amp;gt;                                    &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 Klinik 1 Malaysia Bandar Lela            90700    malaysia    NA    NA
## 2 Klinik 1 Malaysia Batu Melintang         17250    malaysia    NA    NA
## 3 Klinik 1 Malaysia Cakerapurnama          45010    malaysia    NA    NA
## 4 Klinik 1 Malaysia Jelawat                16070    malaysia    NA    NA
## 5 Klinik 1 Malaysia Taman Kempadang Makmur 26060    malaysia    NA    NA&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;some-data-pre-processing&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Some data pre-processing&lt;/h2&gt;
&lt;p&gt;After some googling, I found this &lt;a href=&#34;https://www.listendata.com/2020/11/zip-code-to-latitude-and-longitude.html&#34;&gt;data&lt;/a&gt;, which gives coordinates based on the postal code. So, we are going to fill in the missing coordinates using this online data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;add_coord &amp;lt;- 
  read.table(header = T, text = &amp;quot;
postal_code    latitude   longitude
16070            6.0334    102.3499
26060            3.6228    102.3926
90700            5.8456    118.0571
26060            3.6228    102.3926&amp;quot;)

clinic1m2 &amp;lt;- 
  clinic1m2 %&amp;gt;% 
  mutate(lat = ifelse(is.na(lat), add_coord$latitude[match(postcode, add_coord$postal_code)], lat), 
         long = ifelse(is.na(long), add_coord$longitude[match(postcode, add_coord$postal_code)], long)) %&amp;gt;% 
  drop_na() #fill NAs by matching each postcode, then drop 2 clinic1m&lt;/code&gt;&lt;/pre&gt;
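One way to fill coordinates from a lookup table row by row is base R's `match()`, which lines each postcode up with its own row in the table. A minimal sketch with invented values (not the clinic data):

```r
# Invented lookup table of postcode -> latitude
lookup <- data.frame(postal_code = c("16070", "90700"),
                     latitude    = c(6.0334, 5.8456))

lat      <- c(NA, 3.1, NA)                 # observed latitudes, two missing
postcode <- c("16070", "50000", "99999")   # postcodes of the same rows

# match() gives, for each postcode, its row index in the lookup table (NA if absent)
idx        <- match(postcode, lookup$postal_code)
lat_filled <- ifelse(is.na(lat), lookup$latitude[idx], lat)
lat_filled  # 6.0334, then the untouched 3.1, then NA (no match for "99999")
```

Rows whose postcode is not in the lookup table stay `NA` and can then be dropped, as in the post.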
&lt;p&gt;Even after adding in the missing coordinates, we are still missing 2 of them. So, we are going to drop those 2 clinics. Next, we combine both datasets.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;all_clinic &amp;lt;- 
  clinic1m2 %&amp;gt;% 
  mutate(Type = &amp;quot;1Malaysia&amp;quot;) %&amp;gt;% 
  select(Type, lat, long) %&amp;gt;% 
  bind_rows(clinicDesa %&amp;gt;% 
              mutate(Type = &amp;quot;Desa&amp;quot;, 
                     lat = latitude, 
                     long = longitude) %&amp;gt;% 
              select(Type, lat, long))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s try plotting the data first.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(all_clinic, aes(long, lat, color = Type)) +
  geom_point() +
  theme_minimal() #should remove the two isolated points&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We have 2 isolated points from Klinik Desa data. We will drop these 2 points as well.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;all_clinic2 &amp;lt;- all_clinic %&amp;gt;% filter(long &amp;gt; 25)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;plotting-the-map&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plotting the map&lt;/h2&gt;
&lt;p&gt;There are 2 ways to plot our data on a map of Malaysia that we are going to cover in this post.&lt;/p&gt;
&lt;div id=&#34;map-from-ggplot2&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;1) map from &lt;code&gt;ggplot2&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;First, we need to get the map.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;global &amp;lt;- map_data(&amp;quot;world&amp;quot;) #get map&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we have retrieved the map, we need to filter the region to Malaysia. The rest of the code is the &lt;code&gt;ggplot2&lt;/code&gt; functions as we know them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot() + 
  geom_polygon(data = global %&amp;gt;% filter(region == &amp;quot;Malaysia&amp;quot;), aes(x=long, y = lat, group = group), 
               fill = &amp;quot;gray85&amp;quot;) + 
  coord_fixed(1.3) +
  geom_point(data = all_clinic2, aes(x = long, y = lat, group = Type, color = Type, shape = Type)) +
  theme_void() + 
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Klinik 1Malaysia dan Klinik Desa di Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data dikemaskini: Klinik 1Malaysia - 16 Mac 2021, Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;))), 
       color = &amp;quot;Jenis klinik:&amp;quot;, 
       shape = &amp;quot;Jenis klinik:&amp;quot;) +
  theme(plot.title = element_text(hjust = 0.5), 
        plot.subtitle = element_text(hjust = 0.5), 
        legend.position = &amp;quot;bottom&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/index.en_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;map-from-rworldmap&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;2) map from &lt;code&gt;rworldmap&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The flow is similar: we get the map first, then restrict it to the Malaysia region.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;world &amp;lt;- getMap(resolution = &amp;quot;low&amp;quot;) #get map
msia &amp;lt;- world[world@data$ADMIN == &amp;quot;Malaysia&amp;quot;, ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The rest of the code is similar to the first approach, but we are going to change the theme a bit.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot() +
  geom_polygon(data = msia, aes(x = long, y = lat, group = group), fill = NA, colour = &amp;quot;black&amp;quot;) +
  geom_point(data = all_clinic2, aes(x = long, y = lat, group = Type, color = Type, shape = Type)) +
  coord_quickmap() + 
  theme_minimal() + 
  xlab(&amp;quot;Longitude&amp;quot;) +
  ylab(&amp;quot;Latitude&amp;quot;) +
  labs(title = &amp;quot;Klinik 1Malaysia dan Klinik Desa di Malaysia&amp;quot;, 
       subtitle = &amp;quot;(Data dikemaskini: Klinik 1Malaysia - 16 Mac 2021, Klinik Desa - 9 Mac 2021)&amp;quot;,
       caption = expression(paste(italic(&amp;quot;Sumber data: https://www.data.gov.my/data/ms_MY/group/pemetaan&amp;quot;))), 
       color = &amp;quot;Jenis klinik:&amp;quot;, 
       shape = &amp;quot;Jenis klinik:&amp;quot;) +
  theme(plot.title = element_text(hjust = 0.5), 
        plot.subtitle = element_text(hjust = 0.5), 
        legend.position = &amp;quot;bottom&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/making-maps-with-r-my-first-attempt-ever/index.en_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The coordinates that we have are not as accurate as they should be, or maybe there is something I missed along the way. As we can see, we have clinics in the ocean; as far as I know, we Malaysians are not that advanced yet. Also, notice that we are severely lacking clinics in Sarawak, assuming our data is correct.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Some COVID-19 plots for Southeast Asian countries</title>
      <link>https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/</link>
      <pubDate>Wed, 10 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;Recently, I found a GitHub &lt;a href=&#34;https://github.com/owid/covid-19-data/tree/master/public/data&#34;&gt;repo&lt;/a&gt; containing a global COVID-19 dataset. I thought, why not try to do some plotting for Southeast Asian countries. So, I downloaded the data and limited the data to Southeast Asian countries only (Brunei, Indonesia, Malaysia, Philippines, Singapore, Thailand and Vietnam). I have uploaded this restricted data to my GitHub &lt;a href=&#34;https://github.com/tengku-hanis/data-owid-covid&#34;&gt;repo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are not going to do anything fancy, just some visualisations.&lt;/p&gt;
&lt;p&gt;Let’s begin by reading the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
covid_sea &amp;lt;- read_csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/data-owid-covid/main/covid_sea.csv&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are going to compare the Southeast Asian countries in terms of:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Daily cases&lt;/li&gt;
&lt;li&gt;Daily deaths&lt;/li&gt;
&lt;li&gt;Daily tests&lt;/li&gt;
&lt;li&gt;Daily vaccinations&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Before that, we need to write a function, as all the items above share the same plotting code except for the y axis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;easy_plot &amp;lt;- function(var1, lab_title, yaxis_lab, span = 0.14){
  covid_sea %&amp;gt;% 
    select(date, location, {{var1}}) %&amp;gt;% 
    drop_na() %&amp;gt;% 
    ggplot(aes(date, {{var1}}, color = location)) +
    geom_smooth(se = F, span = span) + #use the span argument instead of a hard-coded value
    geom_point(aes(color = location), alpha = 0.2) +
    geom_line(aes(color = location), alpha = 0.2, linetype = &amp;quot;dashed&amp;quot;) +
    labs(title = {{lab_title}}) +
    ylab({{yaxis_lab}}) +
    xlab(&amp;quot;Date&amp;quot;) +
    theme_minimal() 
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;var1&lt;/code&gt; is the variable that we want to compare, &lt;code&gt;lab_title&lt;/code&gt; is the plot title, &lt;code&gt;yaxis_lab&lt;/code&gt; is the label on the y axis, and &lt;code&gt;span&lt;/code&gt; controls how smooth the smoothed line should be.&lt;/p&gt;
&lt;div id=&#34;daily-cases&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Daily cases&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;easy_plot(new_cases, &amp;quot;Daily cases for southeast Asian countries&amp;quot;, &amp;quot;Daily cases&amp;quot;, span = 0.8)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/index.en_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We cannot compare the raw frequencies, as big countries like Indonesia are expected to have a higher number of daily cases. A smoothed line, though very basic, may indicate a simple trend. Thailand, Malaysia, the Philippines and Indonesia seem to have a decreasing trend of cases. On the other hand, the daily cases in Vietnam seem to be starting to increase. Singapore had a more stable trend of cases, though a higher number of cases was observed in the latest period. Lastly, Brunei had too few cases for us to see any sort of trend at the scale of a between-country comparison.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;daily-deaths&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Daily deaths&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;easy_plot(new_deaths, &amp;quot;Daily deaths for southeast Asian countries&amp;quot;, &amp;quot;Daily deaths&amp;quot;, span = 0.8)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/index.en_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The Philippines and Indonesia seem to have started a slightly increasing trend. Other countries look okay.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;daily-tests&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Daily tests&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;easy_plot(new_tests, &amp;quot;Daily tests for southeast Asian countries&amp;quot;, &amp;quot;Daily tests&amp;quot;, span = 0.2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/index.en_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The daily tests plot looks a bit weird for Vietnam. The daily tests below zero are actually not available (I am not sure whether no tests were done in that period or the values are just missing), hence the weird-looking plot for Vietnam. Data for Brunei and Thailand are not available. Malaysia seems to be quite aggressive in COVID-19 testing, even on par with Indonesia. Also, Vietnam seems to be very aggressive in the latest period, probably to cover the lack of COVID-19 testing previously.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;daily-vaccinations&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Daily vaccinations&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;easy_plot(new_vaccinations, &amp;quot;Daily vaccinations for southeast Asian countries&amp;quot;, &amp;quot;Daily vaccinations&amp;quot;, span = 0.9)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/index.en_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Malaysia and Singapore had quite similar distributions. Vietnam, the Philippines, Thailand and Indonesia were also quite similar to each other, in that they had a series of waves in the rate of vaccinations, though the waves for Thailand are less obvious. Again, the numbers in Brunei were too small for us to see any trend or distribution at this scale.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;malaysia-situation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Malaysia situation&lt;/h2&gt;
&lt;p&gt;Let’s do a plot specific to Malaysia. We are going to scale the numbers so that we are able to compare the trends and distributions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;covid_sea %&amp;gt;% 
  filter(location == &amp;quot;Malaysia&amp;quot;) %&amp;gt;% 
  mutate(new_cases = scale(new_cases), 
         new_deaths = scale(new_deaths), 
         new_tests = scale(new_tests), 
         new_vaccinations = scale(new_vaccinations)) %&amp;gt;% 
  ggplot(aes(date)) +
  geom_line(aes(y = new_cases, color = &amp;quot;new_cases&amp;quot;), alpha = 0.3) +
  geom_line(aes(y = new_deaths, color = &amp;quot;new_deaths&amp;quot;), alpha = 0.3) +
  geom_line(aes(y = new_tests, color = &amp;quot;new_tests&amp;quot;), alpha = 0.3) +
  geom_line(aes(y = new_vaccinations, color = &amp;quot;new_vaccinations&amp;quot;), alpha = 0.3) +
  geom_point(aes(y = new_cases, color = &amp;quot;new_cases&amp;quot;), alpha = 0.3) +
  geom_point(aes(y = new_deaths, color = &amp;quot;new_deaths&amp;quot;), alpha = 0.3) +
  geom_point(aes(y = new_tests, color = &amp;quot;new_tests&amp;quot;), alpha = 0.3) +
  geom_point(aes(y = new_vaccinations, color = &amp;quot;new_vaccinations&amp;quot;), alpha = 0.3) +
  geom_smooth(aes(y = new_cases, color = &amp;quot;new_cases&amp;quot;), se = F, span = 0.3) +
  geom_smooth(aes(y = new_deaths, color = &amp;quot;new_deaths&amp;quot;), se = F, span = 0.3) +
  geom_smooth(aes(y = new_tests, color = &amp;quot;new_tests&amp;quot;), se = F, span = 0.3) +
  geom_smooth(aes(y = new_vaccinations, color = &amp;quot;new_vaccinations&amp;quot;), se = F, span = 0.6) +
  labs(title = &amp;quot;Situation in Malaysia&amp;quot;) +
  ylab(&amp;quot;Scaled Frequency&amp;quot;) +
  xlab(&amp;quot;Date&amp;quot;) +
  guides(color = guide_legend(&amp;quot;Items&amp;quot;)) +
  scale_color_discrete(labels = c(&amp;quot;Daily cases&amp;quot;, &amp;quot;Daily deaths&amp;quot;, &amp;quot;Daily tests&amp;quot;, &amp;quot;Daily vaccinations&amp;quot;)) +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/some-covid-19-plots-for-southeast-asian-countries/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Interestingly, as the number of vaccinations increased up to a certain threshold, the numbers of daily cases and daily deaths started to decrease. The daily testing also decreased, as COVID-19 testing in Malaysia is done on suspected cases and their contacts rather than by mass testing.&lt;/p&gt;
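As a design note on the plot above: the repeated `geom_line()`/`geom_point()`/`geom_smooth()` calls can be avoided by reshaping the data to long format first, so one call of each geom handles all the series via the colour aesthetic. A base R sketch of the reshape, with invented values and only two of the four columns shown:

```r
# Toy wide data in the shape of the COVID dataset (values invented)
dat <- data.frame(
  date       = as.Date("2021-01-01") + 0:2,
  new_cases  = c(10, 20, 30),
  new_deaths = c(1, 2, 3)
)

# Stack the value columns into one column, with an 'item' label per row
long <- reshape(
  dat,
  direction = "long",
  varying   = c("new_cases", "new_deaths"),
  v.names   = "value",
  timevar   = "item",
  times     = c("new_cases", "new_deaths")
)

# Each date now appears once per item; in ggplot2 this becomes
# aes(date, value, colour = item) with a single geom_line()/geom_smooth()
head(long)
```

Since the tidyverse is already loaded in this post, `tidyr::pivot_longer()` would do the same reshape more idiomatically.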
&lt;p&gt;&lt;em&gt;Disclaimer: Please take anything written here with a massive grain of salt.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Data source:
&lt;a href=&#34;https://github.com/owid/covid-19-data/tree/master/public/data&#34; class=&#34;uri&#34;&gt;https://github.com/owid/covid-19-data/tree/master/public/data&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Extract a table from a pdf</title>
      <link>https://tengkuhanis.netlify.app/post/extract-a-table-from-a-pdf/</link>
      <pubDate>Mon, 01 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/extract-a-table-from-a-pdf/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/extract-a-table-from-a-pdf/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;In a couple of days, I am going to conduct a pre-conference workshop for the Malaysian &lt;a href=&#34;https://www.r-conference.com/&#34;&gt;R conference 2021&lt;/a&gt;. Some of the data that I am going to use for this workshop is available as a table in a pdf. Hence, this post is about how I got that particular table from the pdf into R for further analysis.&lt;/p&gt;
&lt;p&gt;So, this is the table we are going to extract.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;images/table.png&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;extracting-a-table-from-pdf&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Extracting a table from pdf&lt;/h2&gt;
&lt;p&gt;We are going to use the &lt;code&gt;tabulizer&lt;/code&gt; package for this. However, not every pdf works with this package. In our case it works, but the result needs further preprocessing.&lt;/p&gt;
&lt;p&gt;Load the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tabulizer)
library(dplyr)
library(stringr)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Read a table from a pdf.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;raw_table &amp;lt;- extract_tables(&amp;quot;https://static-content.springer.com/esm/art%3A10.1038%2Fs41440-021-00720-3/MediaObjects/41440_2021_720_MOESM1_ESM.pdf&amp;quot;, 
                          pages = 17, 
                          output = &amp;quot;data.frame&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, this is the extracted table.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;raw_table[[1]] %&amp;gt;% head(10)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                X     X.1     X.2     X.3  X.4     X.5 X.6     X.7  X.8
## 1                                                                     
## 2                                                                     
## 3    Ahmed, 2019 Unclear Unclear Unclear High Unclear Low Unclear High
## 4                                                                     
## 5   Badrov, 2013 Unclear    High    High High Unclear Low Unclear High
## 6   Baross, 2012 Unclear Unclear    High High Unclear Low Unclear High
## 7   Baross, 2013 Unclear Unclear    High High Unclear Low Unclear High
## 8  Carlson, 2016     Low    High    High  Low Unclear Low     Low High
## 9  Correia, 2020     Low     Low     Low High Unclear Low     Low High
## 10                                                                    
##                              X.9
## 1      1- selection bias: random
## 2            sequence generation
## 3  2- selection bias: allocation
## 4                    concealment
## 5                               
## 6   3- reporting bias: selective
## 7                      reporting
## 8                               
## 9  4- Performance bias: blinding
## 10  (participants and personnel)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, a few preprocessing steps are needed:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Remove column X.9 - this column is supposed to be part of the header&lt;/li&gt;
&lt;li&gt;Rename the headers based on column X.9&lt;/li&gt;
&lt;li&gt;Remove the space in the author names - “Ahmed,2019” instead of “Ahmed, 2019”&lt;/li&gt;
&lt;li&gt;Remove empty rows&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;irt_rob &amp;lt;- 
  raw_table[[1]] %&amp;gt;% 
  select(-X.9) %&amp;gt;%  
  rename(Study = X, 
         Random.sequence.generation. = X.1, 
         Allocation.concealment. = X.2,
         Selective.reporting. = X.3,
         Blinding.of.participants.and.personnel. = X.4, 
         Blinding.of.outcome.assessment = X.5, 
         Incomplete.outcome.data = X.6, 
         Other.sources.of.bias. = X.7, 
         Overall = X.8) %&amp;gt;% 
  as_tibble() %&amp;gt;% 
  mutate(Study = str_replace_all(Study, &amp;quot; &amp;quot;, &amp;quot;&amp;quot;)) %&amp;gt;% 
  mutate(id_del = str_match(Study, &amp;quot;.&amp;quot;)) %&amp;gt;% 
  filter(!is.na(id_del)) %&amp;gt;% 
  select(-id_del)&lt;/code&gt;&lt;/pre&gt;
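For reference, the empty-row filter at the end (the `id_del` helper) relies on `str_match()` returning `NA` for an empty string. Base R's `nzchar()` expresses the same idea more directly; a sketch with invented rows:

```r
# Toy data frame with the kind of blank rows the PDF extraction produces
df <- data.frame(Study   = c("Ahmed,2019", "", "Badrov,2013", ""),
                 Overall = c("High", "", "High", ""))

# Keep only rows whose Study cell is a non-empty string
df_clean <- df[nzchar(df$Study), ]
df_clean$Study
# → "Ahmed,2019" "Badrov,2013"
```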
&lt;p&gt;Finally, our data is ready.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;irt_rob&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          Study Random.sequence.generation. Allocation.concealment.
## 1   Ahmed,2019                     Unclear                 Unclear
## 2  Badrov,2013                     Unclear                    High
## 3  Baross,2012                     Unclear                 Unclear
## 4  Baross,2013                     Unclear                 Unclear
## 5 Carlson,2016                         Low                    High
##   Selective.reporting. Blinding.of.participants.and.personnel.
## 1              Unclear                                    High
## 2                 High                                    High
## 3                 High                                    High
## 4                 High                                    High
## 5                 High                                     Low
##   Blinding.of.outcome.assessment Incomplete.outcome.data Other.sources.of.bias.
## 1                        Unclear                     Low                Unclear
## 2                        Unclear                     Low                Unclear
## 3                        Unclear                     Low                Unclear
## 4                        Unclear                     Low                Unclear
## 5                        Unclear                     Low                    Low
##   Overall
## 1    High
## 2    High
## 3    High
## 4    High
## 5    High&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>A short note on multiple imputation</title>
      <link>https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/</link>
      <pubDate>Fri, 29 Oct 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;background&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Missing data is quite challenging to deal with. Deleting it may be the easiest solution, but not necessarily the best one. Missing data can be categorised into 3 types (&lt;a href=&#34;https://www.jstor.org/stable/2335739&#34;&gt;Rubin, 1976&lt;/a&gt;):&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;MCAR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Missing Completely At Random&lt;/li&gt;
&lt;li&gt;Example: some of the observations are missing because the records were lost during a flood&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MAR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Missing At Random&lt;/li&gt;
&lt;li&gt;Example: the income variable is missing for some participants who refuse to give their salary information, which they deem very personal&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MNAR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Missing Not At Random&lt;/li&gt;
&lt;li&gt;Example: the weight variable is missing for morbidly obese participants because the scale is unable to weigh them&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
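The three mechanisms can be mimicked in a small simulation, which also makes the definitions concrete. A base R sketch (variable names and probabilities are invented for illustration):

```r
set.seed(1)
n      <- 1000
age    <- rnorm(n, mean = 40, sd = 10)
income <- rnorm(n, mean = 3000, sd = 500)

# MCAR: every value has the same 10% chance of going missing
income_mcar <- ifelse(runif(n) < 0.1, NA, income)

# MAR: missingness depends on another *observed* variable (older -> more refusals)
income_mar <- ifelse(runif(n) < plogis((age - 40) / 10), NA, income)

# MNAR: missingness depends on the unobserved value itself (high earners withhold)
income_mnar <- ifelse(income > 3500, NA, income)

c(MCAR = mean(is.na(income_mcar)),
  MAR  = mean(is.na(income_mar)),
  MNAR = mean(is.na(income_mnar)))
```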
&lt;p&gt;Out of the 3 types above, the most problematic is MNAR, though there exist methods to deal with this type. For example, the &lt;a href=&#34;https://cran.r-project.org/web/packages/miceMNAR/miceMNAR.pdf&#34;&gt;miceMNAR&lt;/a&gt; package in R.&lt;/p&gt;
&lt;p&gt;There are several approaches to handling missing data:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Listwise-deletion&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Best approach if the amount of missingness is very small&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using mean/median/mode imputation&lt;/li&gt;
&lt;li&gt;This approach is not advisable as it leads to bias due to the reduced variance, though the mean is not affected&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Single imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The simple imputation above is considered a single imputation as well&lt;/li&gt;
&lt;li&gt;This approach ignores the uncertainty of the imputation and almost always underestimates the variance&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A bit more advanced, and it covers the limitations of the single imputation approach&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
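The claim under point 2, that mean imputation preserves the mean but shrinks the variance, is easy to demonstrate. A small base R sketch on simulated data:

```r
set.seed(42)
x      <- rnorm(200, mean = 50, sd = 10)
x_miss <- x
x_miss[sample(200, 60)] <- NA            # knock out 30% of values (MCAR)

# Replace every NA with the mean of the observed values
x_imp <- ifelse(is.na(x_miss), mean(x_miss, na.rm = TRUE), x_miss)

# The mean is (essentially) unchanged, but the spread is underestimated
round(c(mean_obs = mean(x_miss, na.rm = TRUE), mean_imp = mean(x_imp),
        sd_obs   = sd(x_miss, na.rm = TRUE),   sd_imp   = sd(x_imp)), 2)
```

The imputed values form a spike at the mean, which is exactly the reduced-variance bias mentioned above.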
&lt;p&gt;However, the main assumption of any imputation method is that the missingness is MCAR or MAR.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;multiple-imputation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Multiple imputation&lt;/h2&gt;
&lt;p&gt;In short, there are 2 approaches to multiple imputation implemented by R packages:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Joint modeling (JM) or joint multivariate normal distribution multiple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The main assumption for this method is that the observed data follows a multivariate normal distribution&lt;/li&gt;
&lt;li&gt;A violation of this assumption produces incorrect values, though a slight violation is still okay&lt;/li&gt;
&lt;li&gt;Some packages that implement this method: &lt;code&gt;Amelia&lt;/code&gt; and &lt;code&gt;norm&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fully conditional specification (FCS) or conditional multiple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Also known as multivariate imputation by chained equation (MICE)&lt;/li&gt;
&lt;li&gt;This approach is more flexible, as a distribution is assumed for each variable rather than for the whole dataset&lt;/li&gt;
&lt;li&gt;Some packages that implement this method: &lt;code&gt;mice&lt;/code&gt; and &lt;code&gt;mi&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;In &lt;code&gt;mice&lt;/code&gt; package, the general steps are:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;mice()&lt;/code&gt; - impute the NAs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;with()&lt;/code&gt; - run the analysis (lm, glm, etc)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pool()&lt;/code&gt; - pool the results&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:unnamed-chunk-1&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;Screenshot%202021-11-20%20145517.png&#34; alt=&#34;Main steps in mice package.&#34; width=&#34;90%&#34; height=&#34;90%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Main steps in mice package.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(mice)
library(VIM)
#library(missForest) #we only want the prodNA() function from this package
library(naniar)
library(niceFunction) #install from github (https://github.com/tengku-hanis/niceFunction)
library(dplyr)
library(gtsummary)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are going to introduce some NAs randomly.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(123)
dat &amp;lt;- iris %&amp;gt;% 
  select(-Sepal.Length)%&amp;gt;% 
  missForest::prodNA(0.2) %&amp;gt;%  # randomly insert 20% NAs
  mutate(Sepal.Length = iris$Sepal.Length)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Explore the NAs and the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naniar::miss_var_summary(dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 3
##   variable     n_miss pct_miss
##   &amp;lt;chr&amp;gt;         &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt;
## 1 Petal.Length     38     25.3
## 2 Sepal.Width      33     22  
## 3 Species          28     18.7
## 4 Petal.Width      21     14  
## 5 Sepal.Length      0      0&lt;/code&gt;&lt;/pre&gt;
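For reference, the same per-variable counts can be obtained in base R with `colSums()`/`colMeans()` over `is.na()`; a quick sketch on a toy data frame:

```r
# Toy data frame with some NAs
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), d = 1:3)

n_miss   <- colSums(is.na(df))        # number of NAs per column
pct_miss <- 100 * colMeans(is.na(df)) # percentage of NAs per column

rbind(n_miss, pct_miss)
```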
&lt;p&gt;Some references recommend removing variables with more than 50% NAs. Here, we purposely introduced 20% NAs into our data.&lt;/p&gt;
&lt;p&gt;As a guideline, we can test whether our NAs are MCAR.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naniar::mcar_test(dat) #p &amp;gt; 0.05, MCAR is indicated&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 1 x 4
##   statistic    df p.value missing.patterns
##       &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;            &amp;lt;int&amp;gt;
## 1      38.8    40   0.522               14&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next step is to evaluate the pattern of missingness in our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;md.pattern(dat, rotate.names = T, plot = T) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##    Sepal.Length Petal.Width Species Sepal.Width Petal.Length    
## 64            1           1       1           1            1   0
## 21            1           1       1           1            0   1
## 15            1           1       1           0            1   1
## 3             1           1       1           0            0   2
## 14            1           1       0           1            1   1
## 4             1           1       0           1            0   2
## 6             1           1       0           0            1   2
## 2             1           1       0           0            0   3
## 7             1           0       1           1            1   1
## 6             1           0       1           1            0   2
## 4             1           0       1           0            1   2
## 2             1           0       1           0            0   3
## 1             1           0       0           1            1   2
## 1             1           0       0           0            1   3
##               0          21      28          33           38 120&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;aggr(dat, prop = F, numbers = T) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We have 13 patterns of NAs (the numbers on the right) in our data. These two functions work well with a small dataset, but with a larger dataset (and many more NA patterns), it is probably quite difficult to assess the patterns visually.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;matrixplot()&lt;/code&gt; is probably more appropriate for a larger dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;matrixplot(dat)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In terms of the missingness pattern, we can also assess whether the distribution of NAs in Sepal.Width depends on the variable Sepal.Length.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;niceFunction::histNA_byVar(dat, Sepal.Width, Sepal.Length)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As we can see, the distribution and range of the histograms for the NAs (TRUE) and non-NAs (FALSE) are quite similar. This may indicate that Sepal.Width is at least MAR. However, ideally we should do this for each pair of numerical variables before jumping to any conclusion.&lt;/p&gt;
&lt;p&gt;Another good thing to assess is the correlation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Data with 1 = NAs, 0 = non-NAs
x &amp;lt;- as.data.frame(abs(is.na(dat))) %&amp;gt;% 
  dplyr::select(-Sepal.Length) # keep only the variables with NAs&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Firstly, the correlation between the variables with missing data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(x) %&amp;gt;% 
  corrplot::corrplot()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;There is no high correlation among the variables with NAs. Secondly, let’s see the correlation between the NAs in a variable and the observed values of the other variables.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(dat %&amp;gt;% mutate(Species = as.numeric(Species)), x, use = &amp;quot;pairwise.complete.obs&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##               Sepal.Width Petal.Length  Petal.Width     Species
## Sepal.Width            NA  0.049158733 -0.065917718  0.09948263
## Petal.Length  0.042075695           NA -0.004572405 -0.17265919
## Petal.Width   0.096195805 -0.003320601           NA -0.11024288
## Species       0.045849046 -0.104143925 -0.081055707          NA
## Sepal.Length -0.006435044 -0.052871701 -0.091024799 -0.08527514&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, there is no high correlation. To interpret this correlation matrix: the rows are the observed variables and the columns represent the missingness. For example, values of Sepal.Width are slightly more likely to be missing for observations with a high value of Petal.Width (though a correlation of about 0.1 means this relationship is negligible).&lt;/p&gt;
&lt;p&gt;Now, we can do multiple imputation. These are the methods in the &lt;code&gt;mice&lt;/code&gt; package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;methods(mice)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] mice.impute.2l.bin       mice.impute.2l.lmer      mice.impute.2l.norm     
##  [4] mice.impute.2l.pan       mice.impute.2lonly.mean  mice.impute.2lonly.norm 
##  [7] mice.impute.2lonly.pmm   mice.impute.cart         mice.impute.jomoImpute  
## [10] mice.impute.lda          mice.impute.logreg       mice.impute.logreg.boot 
## [13] mice.impute.mean         mice.impute.midastouch   mice.impute.mnar.logreg 
## [16] mice.impute.mnar.norm    mice.impute.norm         mice.impute.norm.boot   
## [19] mice.impute.norm.nob     mice.impute.norm.predict mice.impute.panImpute   
## [22] mice.impute.passive      mice.impute.pmm          mice.impute.polr        
## [25] mice.impute.polyreg      mice.impute.quadratic    mice.impute.rf          
## [28] mice.impute.ri           mice.impute.sample       mice.mids               
## [31] mice.theme              
## see &amp;#39;?methods&amp;#39; for accessing help and source code&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By default, mice uses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pmm (predictive mean matching) for numeric data&lt;/li&gt;
&lt;li&gt;logreg (logistic regression imputation) for binary data, factor with 2 levels&lt;/li&gt;
&lt;li&gt;polyreg (polytomous regression imputation) for unordered categorical data (factor &amp;gt; 2 levels)&lt;/li&gt;
&lt;li&gt;polr (proportional odds model) for ordered categorical data (factor with &amp;gt; 2 levels)&lt;/li&gt;
&lt;/ul&gt;
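&lt;p&gt;A small sketch of how to inspect (and, if needed, override) these defaults before imputing; &lt;code&gt;make.method()&lt;/code&gt; returns the method &lt;code&gt;mice&lt;/code&gt; would assign to each variable:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;meth &amp;lt;- make.method(dat)  # default method per variable
meth
# To override, edit the vector and pass it to mice(), e.g.:
# meth[&amp;quot;Sepal.Width&amp;quot;] &amp;lt;- &amp;quot;norm&amp;quot;
# mice(dat, method = meth, printFlag = FALSE)&lt;/code&gt;&lt;/pre&gt;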
&lt;p&gt;Let’s run the &lt;code&gt;mice()&lt;/code&gt; function on our data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;imp &amp;lt;- mice(dat, m = 5, seed=1234, maxit = 5, printFlag = F) 
imp&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##  Sepal.Width Petal.Length  Petal.Width      Species Sepal.Length 
##        &amp;quot;pmm&amp;quot;        &amp;quot;pmm&amp;quot;        &amp;quot;pmm&amp;quot;    &amp;quot;polyreg&amp;quot;           &amp;quot;&amp;quot; 
## PredictorMatrix:
##              Sepal.Width Petal.Length Petal.Width Species Sepal.Length
## Sepal.Width            0            1           1       1            1
## Petal.Length           1            0           1       1            1
## Petal.Width            1            1           0       1            1
## Species                1            1           1       0            1
## Sepal.Length           1            1           1       1            0&lt;/code&gt;&lt;/pre&gt;
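&lt;p&gt;As a side note, if we ever need a single completed dataset (for a quick check, say), it can be extracted with &lt;code&gt;complete()&lt;/code&gt;; a minimal sketch reusing the &lt;code&gt;imp&lt;/code&gt; object above:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;completed1 &amp;lt;- complete(imp, 1)  # the first of the m = 5 completed datasets
anyNA(completed1)               # should be FALSE, as all variables were imputed&lt;/code&gt;&lt;/pre&gt;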
&lt;p&gt;Next, we can do some diagnostic assessment of the imputed data. These are the imputed values for Sepal.Width (one column per imputation):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;imp$imp$Sepal.Width %&amp;gt;% head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      1   2   3   4   5
## 5  3.4 3.4 4.1 3.1 3.5
## 13 3.2 3.1 3.2 3.6 3.1
## 14 3.1 3.2 2.9 3.4 3.0
## 23 3.6 3.2 3.0 3.8 3.1
## 26 4.1 3.0 3.1 3.5 3.0
## 34 3.4 3.7 3.7 3.4 4.4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One important thing to check is convergence. We are going to increase the number of iterations for this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;imp_conv &amp;lt;- mice.mids(imp, maxit = 30, printFlag = F)
plot(imp_conv)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;672&#34; /&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-16-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The lines in the plot should be intermingled, and no obvious trend should be observed. Our plots above indicate convergence.&lt;/p&gt;
&lt;p&gt;We can also compare the density plots of the imputed and the observed data. Blue is the observed data and red is the imputed data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;densityplot(imp)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can further assess the variable Sepal.Width in each imputed dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;densityplot(imp, ~ Sepal.Width | .imp)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-18-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Lastly, we can assess the strip plot. The imputed observations (red) should not be distributed too far from the observed data (blue).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;stripplot(imp)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, once we finish the diagnostic checking, we could actually go back and change the imputation method for Sepal.Width, since its distribution changes quite noticeably across imputations. But we are not going to do that; instead, we are going to proceed to the analysis.&lt;/p&gt;
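&lt;p&gt;For completeness, a minimal sketch of how the method for Sepal.Width could be changed and the imputation re-run (reusing the setup above; &amp;quot;norm&amp;quot; is just one possible alternative to pmm):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;meth &amp;lt;- imp$method              # methods used in the imputation above
meth[&amp;quot;Sepal.Width&amp;quot;] &amp;lt;- &amp;quot;norm&amp;quot;   # e.g. Bayesian linear regression instead of pmm
imp2 &amp;lt;- mice(dat, method = meth, m = 5, seed = 1234, maxit = 5, printFlag = FALSE)
densityplot(imp2, ~ Sepal.Width)&lt;/code&gt;&lt;/pre&gt;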
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# run regression
fit &amp;lt;- with(imp, lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species))
# pool all imputed set
pooled &amp;lt;- pool(fit) 
summary(pooled)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                term   estimate  std.error statistic       df      p.value
## 1       (Intercept)  2.2008307 0.34577321  6.364954 29.02484 5.859560e-07
## 2       Sepal.Width  0.5233500 0.09717217  5.385801 50.89918 1.854832e-06
## 3      Petal.Length  0.7409159 0.09020153  8.214006 12.73722 1.921415e-06
## 4       Petal.Width -0.3623895 0.18562168 -1.952301 22.34517 6.354332e-02
## 5 Speciesversicolor -0.3891112 0.28166528 -1.381467 15.07547 1.872683e-01
## 6  Speciesvirginica -0.5237106 0.42629920 -1.228505 10.82804 2.452897e-01&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we have the original dataset without the NAs, we are going to compare the three models.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mimpute &amp;lt;- 
  fit %&amp;gt;% 
  tbl_regression() #with mice

noimpute &amp;lt;- 
  dat %&amp;gt;% 
  lm(Sepal.Length ~ ., data = .) %&amp;gt;% 
  tbl_regression() #w/o mice

original &amp;lt;- 
  iris %&amp;gt;% 
  lm(Sepal.Length ~ ., data = .) %&amp;gt;% 
  tbl_regression() #original data

tbl_merge(
  tbls = list(mimpute, noimpute, original), 
  tab_spanner = c(&amp;quot;With MICE&amp;quot;, &amp;quot;Without MICE&amp;quot;, &amp;quot;Original data&amp;quot;)
)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;kofvwjwgme&#34; style=&#34;overflow-x:auto;overflow-y:auto;width:auto;height:auto;&#34;&gt;
&lt;style&gt;html {
  font-family: -apple-system, BlinkMacSystemFont, &#39;Segoe UI&#39;, Roboto, Oxygen, Ubuntu, Cantarell, &#39;Helvetica Neue&#39;, &#39;Fira Sans&#39;, &#39;Droid Sans&#39;, Arial, sans-serif;
}

#kofvwjwgme .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#kofvwjwgme .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#kofvwjwgme .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#kofvwjwgme .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kofvwjwgme .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#kofvwjwgme .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#kofvwjwgme .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#kofvwjwgme .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#kofvwjwgme .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#kofvwjwgme .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#kofvwjwgme .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#kofvwjwgme .gt_from_md &gt; :first-child {
  margin-top: 0;
}

#kofvwjwgme .gt_from_md &gt; :last-child {
  margin-bottom: 0;
}

#kofvwjwgme .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#kofvwjwgme .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#kofvwjwgme .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kofvwjwgme .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#kofvwjwgme .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kofvwjwgme .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#kofvwjwgme .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#kofvwjwgme .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kofvwjwgme .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#kofvwjwgme .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#kofvwjwgme .gt_left {
  text-align: left;
}

#kofvwjwgme .gt_center {
  text-align: center;
}

#kofvwjwgme .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#kofvwjwgme .gt_font_normal {
  font-weight: normal;
}

#kofvwjwgme .gt_font_bold {
  font-weight: bold;
}

#kofvwjwgme .gt_font_italic {
  font-style: italic;
}

#kofvwjwgme .gt_super {
  font-size: 65%;
}

#kofvwjwgme .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 65%;
}
&lt;/style&gt;
&lt;table class=&#34;gt_table&#34;&gt;
  
  &lt;thead class=&#34;gt_col_headings&#34;&gt;
    &lt;tr&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_left&#34; rowspan=&#34;2&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Characteristic&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_center gt_columns_top_border gt_column_spanner_outer&#34; rowspan=&#34;1&#34; colspan=&#34;3&#34;&gt;
        &lt;span class=&#34;gt_column_spanner&#34;&gt;With MICE&lt;/span&gt;
      &lt;/th&gt;
      &lt;th class=&#34;gt_center gt_columns_top_border gt_column_spanner_outer&#34; rowspan=&#34;1&#34; colspan=&#34;3&#34;&gt;
        &lt;span class=&#34;gt_column_spanner&#34;&gt;Without MICE&lt;/span&gt;
      &lt;/th&gt;
      &lt;th class=&#34;gt_center gt_columns_top_border gt_column_spanner_outer&#34; rowspan=&#34;1&#34; colspan=&#34;3&#34;&gt;
        &lt;span class=&#34;gt_column_spanner&#34;&gt;Original data&lt;/span&gt;
      &lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody class=&#34;gt_table_body&#34;&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Sepal.Width&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.52&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.33, 0.72&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;0.001&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.48&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.17, 0.79&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.003&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.50&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.33, 0.67&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;0.001&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Petal.Length&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.74&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.55, 0.94&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;0.001&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.71&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.51, 0.90&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;0.001&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.83&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.69, 1.0&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;0.001&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Petal.Width&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.36&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.75, 0.02&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.064&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.35&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.85, 0.14&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.2&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.32&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.61, -0.02&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.039&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Species&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34; style=&#34;text-align: left; text-indent: 10px;&#34;&gt;setosa&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34; style=&#34;text-align: left; text-indent: 10px;&#34;&gt;versicolor&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.39&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.0, 0.21&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.2&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.42&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.1, 0.30&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.3&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.72&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.2, -0.25&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.003&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34; style=&#34;text-align: left; text-indent: 10px;&#34;&gt;virginica&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.52&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.5, 0.42&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.2&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.42&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.5, 0.63&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.4&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.0&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.7, -0.36&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.003&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
  
  &lt;tfoot&gt;
    &lt;tr class=&#34;gt_footnotes&#34;&gt;
      &lt;td colspan=&#34;10&#34;&gt;
        &lt;p class=&#34;gt_footnote&#34;&gt;
          &lt;sup class=&#34;gt_footnote_marks&#34;&gt;
            &lt;em&gt;1&lt;/em&gt;
          &lt;/sup&gt;
           
          CI = Confidence Interval
          &lt;br /&gt;
        &lt;/p&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tfoot&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;There is a difference in the results between the original dataset (no NAs) and the mice-imputed data. Exploring other imputation methods would probably produce a better result.&lt;/p&gt;
&lt;p&gt;There is a lot more that is not covered in this post, for example &lt;a href=&#34;https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html&#34;&gt;passive imputation and post-processing&lt;/a&gt;. In fact, there is a series of &lt;a href=&#34;https://github.com/amices/mice#vignettes&#34;&gt;vignettes&lt;/a&gt; written by Gerko Vink and Stef van Buuren (the authors of &lt;code&gt;mice&lt;/code&gt;) which provides a good, though quite advanced, tutorial on using &lt;code&gt;mice&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Suggested online books (though I have not studied either of them in depth yet):&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;https://stefvanbuuren.name/fimd/&#34;&gt;Flexible imputation of missing data&lt;/a&gt; by Stef van Buuren&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://bookdown.org/mwheymans/bookmi/&#34;&gt;Applied missing data analysis with SPSS and (R)Studio&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;References for this post:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;http://www.cs.uni.edu/~jacobson/4772/week11/R_in_Action.pdf&#34;&gt;R in Action, Data analysis and graphics with R&lt;/a&gt; (Chapter 15)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://data.library.virginia.edu/getting-started-with-multiple-imputation-in-r/&#34; class=&#34;uri&#34;&gt;https://data.library.virginia.edu/getting-started-with-multiple-imputation-in-r/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://stats.idre.ucla.edu/r/faq/how-do-i-perform-multiple-imputation-using-predictive-mean-matching-in-r/&#34; class=&#34;uri&#34;&gt;https://stats.idre.ucla.edu/r/faq/how-do-i-perform-multiple-imputation-using-predictive-mean-matching-in-r/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.jstatsoft.org/article/view/v045i03&#34;&gt;mice: Multivariate Imputation by Chained Equations in R&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>COVID-19 vaccine interest in Malaysia</title>
      <link>https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/</link>
      <pubDate>Sun, 17 Oct 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;We are going to do a basic Google Trends search using the &lt;code&gt;gtrendsR&lt;/code&gt; package and do some plotting with &lt;code&gt;ggplot2&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(gtrendsR)
library(tidyverse)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run the &lt;code&gt;gtrends()&lt;/code&gt; function to search for our keywords of interest (i.e., the vaccine types). So far, only &lt;a href=&#34;https://covidnow.moh.gov.my/vaccinations/&#34;&gt;4 types of vaccines&lt;/a&gt; have been used in Malaysia.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;vaccine &amp;lt;- gtrends(c(&amp;quot;pfizer&amp;quot;, &amp;quot;astrazeneca&amp;quot;, &amp;quot;sinovac&amp;quot;, &amp;quot;cansino&amp;quot;), geo = &amp;quot;MY&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, plot our keywords.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(vaccine)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/index.en_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;
It’s probably better if we filter the dates to when the COVID-19 pandemic started, around March 2020.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;vaccine$interest_over_time %&amp;gt;% 
  group_by(keyword) %&amp;gt;% 
  filter(hits != &amp;quot;&amp;lt;1&amp;quot; &amp;amp; date &amp;gt; as.Date(&amp;quot;2020-03-01&amp;quot;)) %&amp;gt;% 
  mutate(hits = as.numeric(hits), 
         date = as.Date(date)) %&amp;gt;% 
  ggplot() + 
  geom_line(aes(x = date, y = hits, color = keyword), size = 0.8) +
  theme_minimal() +
  labs(title = &amp;quot;COVID-19 vaccine interest in Malaysia&amp;quot;, y = &amp;quot;Search hits&amp;quot;, x = &amp;quot;Date&amp;quot;) +
  scale_x_date(date_breaks = &amp;quot;4 month&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/index.en_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, the AstraZeneca vaccine draws the highest interest, probably due to the infamous blood-clotting issue. Next, we can also get the search interest by state.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;vaccine$interest_by_region %&amp;gt;% 
  group_by(location) %&amp;gt;% 
  ggplot(aes(location, hits, fill = keyword)) +
  geom_col(alpha = 0.8) +
  coord_flip() +
  theme_minimal() +
  scale_fill_viridis_d() +
  labs(title = &amp;quot;COVID-19 vaccine interest in Malaysia by states&amp;quot;, y = &amp;quot;Search hits&amp;quot;, x = &amp;quot;&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/index.en_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Lastly, we can plot the search keywords by city.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;vaccine$interest_by_city %&amp;gt;% 
  group_by(location) %&amp;gt;% 
  drop_na() %&amp;gt;% 
  ggplot(aes(location, hits, fill = keyword)) +
  geom_col(alpha = 0.8) +
  coord_flip() +
  theme_minimal() +
  scale_fill_viridis_d() +
  labs(title = &amp;quot;COVID-19 vaccine interest in Malaysia by cities&amp;quot;, y = &amp;quot;Search hits&amp;quot;, x = &amp;quot;&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/covid-19-vaccine-interest-in-malaysia/index.en_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;gtrendsR&lt;/code&gt;, with just a few plots, is certainly very useful if we want to gauge interest in certain issues in the community.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Wordcloud of COVID-19 research in Malaysia</title>
      <link>https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/</link>
      <pubDate>Sat, 11 Sep 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/wordcloud2/wordcloud.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/wordcloud2/wordcloud2.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/wordcloud2/hover.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/wordcloud2-binding/wordcloud2.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;Let’s see how much COVID-19 research has been done in Malaysia. In this analysis, we are going to use the &lt;a href=&#34;https://www.scopus.com/search/form.uri?display=basic&amp;amp;zone=header&amp;amp;origin=#basic&#34;&gt;Scopus database&lt;/a&gt; to access the relevant papers, focusing on 4 specific parts of each paper:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Title&lt;/li&gt;
&lt;li&gt;Abstract&lt;/li&gt;
&lt;li&gt;Author’s keywords&lt;/li&gt;
&lt;li&gt;Scopus’s keywords&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&#34;images/sample%20paper.png&#34; alt=&#34;Sample of paper&#34; /&gt;
Above is a sample paper showing the sections that we are going to use in our analysis. The Scopus keywords are generated by the Scopus database, so they do not appear on the paper itself.&lt;/p&gt;
&lt;p&gt;So, the analysis will be applied separately to these 4 parts of the papers. Also, we are going to use &lt;code&gt;map()&lt;/code&gt; (equivalent to a loop) since the flow of the analysis is the same for each part.&lt;/p&gt;
&lt;p&gt;Load the related packages. The main package is &lt;code&gt;quanteda&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(quanteda)
library(quanteda.textstats)
library(quanteda.textplots)
library(patchwork)
library(wordcloud2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I have uploaded the data that I downloaded from the Scopus database to my GitHub.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read data from GitHub repo
df &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/tengku-hanis/scopus-data/main/covid-malaysia.csv&amp;quot;) %&amp;gt;% 
  janitor::clean_names() %&amp;gt;% 
  rename(title = i_title)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, we need to tokenize the text. In other words, we break the sentences down into individual words (tokens).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Tokenize
tok_list &amp;lt;- 
  df %&amp;gt;% 
  select(title, abstract, author_keywords, index_keywords) %&amp;gt;% 
  map(tokens, 
      remove_punct = T, 
      remove_numbers = T,               
      remove_symbols = T)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we remove words that are not meaningful, such as ‘a’, ‘the’, etc. These words are known as stop words.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Remove stop words
nostop_toks &amp;lt;- 
  tok_list %&amp;gt;% 
  map(tokens_select, 
      c(tidytext::stop_words$word, stopwords(&amp;quot;en&amp;quot;)), 
      selection = &amp;quot;remove&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, we create a document feature matrix (DFM). Basically, a DFM is a matrix that represents the frequency of each word (feature) in each document (in our case, each paper or manuscript). Another name for a DFM is a document-term matrix (DTM); &lt;code&gt;quanteda&lt;/code&gt; uses the term DFM, while some other packages use DTM.&lt;/p&gt;
&lt;p&gt;Additionally, we also apply term frequency-inverse document frequency (TF-IDF) weighting. In scientific papers, words such as ‘determine’, ‘conclusion’, ‘introduction’, etc. are very frequent, and these words are not meaningful either. Instead of removing them manually one by one, we use TF-IDF. TF-IDF basically downweights words that appear in most of the documents, so we are left with only the relevant or important words.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Create DFM and apply tf_idf
covid_dfm_list &amp;lt;- 
  nostop_toks %&amp;gt;% 
  map(dfm) %&amp;gt;% 
  map(dfm_tfidf)&lt;/code&gt;&lt;/pre&gt;
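&lt;p&gt;To see what the TF-IDF weighting does, here is a minimal sketch on a hypothetical three-document corpus (not part of the actual analysis): a word that appears in every document gets a weight of zero, while words confined to fewer documents keep a positive weight.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(quanteda)

# Hypothetical mini-corpus for illustration only
toy &amp;lt;- c(doc1 = &amp;quot;covid vaccine study&amp;quot;,
         doc2 = &amp;quot;covid vaccine trial&amp;quot;,
         doc3 = &amp;quot;covid lockdown policy&amp;quot;)

toy_dfm &amp;lt;- dfm(tokens(toy))
dfm_tfidf(toy_dfm)
# &amp;quot;covid&amp;quot; occurs in all 3 documents, so its weight is log10(3/3) = 0;
# a word occurring in a single document gets log10(3/1), about 0.48&lt;/code&gt;&lt;/pre&gt;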
Once we have our weighted DFMs, we can plot the most relevant terms based on TF-IDF.
&lt;details&gt;
&lt;summary&gt;
Show code
&lt;/summary&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Plot top features
A &amp;lt;- 
  covid_dfm_list$title %&amp;gt;% 
  textstat_frequency(n = 15, force = T) %&amp;gt;% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point(size = 4, colour = &amp;quot;blueviolet&amp;quot;) +
  coord_flip() +
  labs(x = NULL, y = &amp;quot;Frequency (tf-idf)&amp;quot;) +
  theme_minimal() +
  labs(title = &amp;quot;Top relevant terms for covid research based on the title&amp;quot;)

B &amp;lt;- 
  covid_dfm_list$abstract %&amp;gt;% 
  textstat_frequency(n = 15, force = T) %&amp;gt;% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point(size = 4, colour = &amp;quot;darkolivegreen3&amp;quot;) +
  coord_flip() +
  labs(x = NULL, y = &amp;quot;Frequency (tf-idf)&amp;quot;) +
  theme_minimal() +
  labs(title = &amp;quot;Top relevant terms for covid research based on the abstract&amp;quot;)

C &amp;lt;- 
  covid_dfm_list$author_keywords %&amp;gt;% 
  textstat_frequency(n = 15, force = T) %&amp;gt;% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point(size = 4, colour = &amp;quot;deepskyblue2&amp;quot;) +
  coord_flip() +
  labs(x = NULL, y = &amp;quot;Frequency (tf-idf)&amp;quot;) +
  theme_minimal() +
  labs(title = &amp;quot;Top relevant terms for covid research based on the author&amp;#39;s keywords&amp;quot;)

D &amp;lt;- 
  covid_dfm_list$index_keywords %&amp;gt;% 
  textstat_frequency(n = 15, force = T) %&amp;gt;% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point(size = 4, colour = &amp;quot;aquamarine2&amp;quot;) +
  coord_flip() +
  labs(x = NULL, y = &amp;quot;Frequency (tf-idf)&amp;quot;) +
  theme_minimal() +
  labs(title = &amp;quot;Top relevant terms for covid research based on the Scopus&amp;#39;s keywords&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;p&gt;These are the plots of the most relevant terms in COVID-19 research in Malaysia.
&lt;img src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/figure-html/unnamed-chunk-7-2.png&#34; width=&#34;672&#34; /&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/figure-html/unnamed-chunk-7-3.png&#34; width=&#34;672&#34; /&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/wordcloud-of-covid-19-research-in-malaysia/index.en_files/figure-html/unnamed-chunk-7-4.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;wordcloud&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Wordcloud&lt;/h2&gt;
&lt;p&gt;Finally, we can make our wordcloud, but first we need to convert our DFMs into frequency data frames. We are also going to round the TF-IDF values and limit each cloud to the top 1000 terms only.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;covid_wc &amp;lt;- 
  covid_dfm_list %&amp;gt;% 
  map(textstat_frequency, force = T)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Actually, &lt;code&gt;quanteda&lt;/code&gt; itself is able to produce a wordcloud. However, the wordcloud from &lt;code&gt;wordcloud2&lt;/code&gt; is interactive, and we can see the TF-IDF value if we click on a word.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wordcloud2(covid_wc$title %&amp;gt;% 
             slice(1:1000) %&amp;gt;% 
             mutate(frequency = round(frequency)))&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:unnamed-chunk-9&#34;&gt;&lt;/span&gt;
&lt;div id=&#34;htmlwidget-1&#34; style=&#34;width:672px;height:480px;&#34; class=&#34;wordcloud2 html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-1&#34;&gt;{&#34;x&#34;:{&#34;word&#34;:[&#34;pandemic&#34;,&#34;malaysia&#34;,&#34;ð&#34;,&#34;impact&#34;,&#34;health&#34;,&#34;study&#34;,&#34;patients&#34;,&#34;sars-cov-2&#34;,&#34;learning&#34;,&#34;review&#34;,&#34;analysis&#34;,&#34;students&#34;,&#34;malaysian&#34;,&#34;coronavirus&#34;,&#34;control&#34;,&#34;online&#34;,&#34;global&#34;,&#34;outbreak&#34;,&#34;social&#34;,&#34;risk&#34;,&#34;healthcare&#34;,&#34;model&#34;,&#34;covid-19&#34;,&#34;challenges&#34;,&#34;education&#34;,&#34;disease&#34;,&#34;workers&#34;,&#34;lockdown&#34;,&#34;factors&#34;,&#34;countries&#34;,&#34;survey&#34;,&#34;mental&#34;,&#34;response&#34;,&#34;potential&#34;,&#34;psychological&#34;,&#34;management&#34;,&#34;system&#34;,&#34;perspective&#34;,&#34;movement&#34;,&#34;public&#34;,&#34;clinical&#34;,&#34;role&#34;,&#34;medical&#34;,&#34;infection&#34;,&#34;â&#34;,&#34;detection&#34;,&#34;cross-sectional&#34;,&#34;university&#34;,&#34;implications&#34;,&#34;impacts&#34;,&#34;systematic&#34;,&#34;transmission&#34;,&#34;covid&#34;,&#34;care&#34;,&#34;development&#34;,&#34;evidence&#34;,&#34;effect&#34;,&#34;meta-analysis&#34;,&#34;experience&#34;,&#34;teaching&#34;,&#34;based&#34;,&#34;treatment&#34;,&#34;knowledge&#34;,&#34;practice&#34;,&#34;approach&#34;,&#34;economic&#34;,&#34;strategies&#34;,&#34;effects&#34;,&#34;ñ&#34;,&#34;quality&#34;,&#34;media&#34;,&#34;measures&#34;,&#34;indonesia&#34;,&#34;asia&#34;,&#34;amid&#34;,&#34;vaccine&#34;,&#34;pakistan&#34;,&#34;mortality&#34;,&#34;application&#34;,&#34;tourism&#34;,&#34;images&#34;,&#34;intention&#34;,&#34;digital&#34;,&#34;coping&#34;,&#34;anxiety&#34;,&#34;spread&#34;,&#34;islamic&#34;,&#34;epidemic&#34;,&#34;data&#34;,&#34;people&#34;,&#34;stress&#34;,&#34;screening&#34;,&#34;future&#34;,&#34;performance&#34;,&#34;post-covid-19&#34;,&#34;perception&#34;,&#34;crisis&#34;,&#34;machine&#34;,&#34;industry&#34;,&#34;era&#34;,&#34;hospital&#34;,&#34;adults&#
34;,&#34;assessment&#34;,&#34;critical&#34;,&#34;services&#34;,&#34;international&#34;,&#34;time&#34;,&#34;financial&#34;,&#34;rapid&#34;,&#34;depression&#34;,&#34;food&#34;,&#34;bangladesh&#34;,&#34;sustainable&#34;,&#34;stock&#34;,&#34;understanding&#34;,&#34;deep&#34;,&#34;virtual&#34;,&#34;molecular&#34;,&#34;x-ray&#34;,&#34;patient&#34;,&#34;prevention&#34;,&#34;asian&#34;,&#34;chest&#34;,&#34;responses&#34;,&#34;comparison&#34;,&#34;acute&#34;,&#34;effectiveness&#34;,&#34;perspectives&#34;,&#34;population&#34;,&#34;dynamics&#34;,&#34;e-learning&#34;,&#34;policy&#34;,&#34;awareness&#34;,&#34;security&#34;,&#34;inhibitors&#34;,&#34;framework&#34;,&#34;cancer&#34;,&#34;severe&#34;,&#34;network&#34;,&#34;perceptions&#34;,&#34;prediction&#34;,&#34;relationship&#34;,&#34;association&#34;,&#34;technology&#34;,&#34;china&#34;,&#34;recovery&#34;,&#34;mco&#34;,&#34;drugs&#34;,&#34;lessons&#34;,&#34;women&#34;,&#34;air&#34;,&#34;perceived&#34;,&#34;research&#34;,&#34;neural&#34;,&#34;behavior&#34;,&#34;level&#34;,&#34;life&#34;,&#34;ct&#34;,&#34;exploring&#34;,&#34;information&#34;,&#34;therapy&#34;,&#34;overview&#34;,&#34;distress&#34;,&#34;environmental&#34;,&#34;monitoring&#34;,&#34;emergency&#34;,&#34;southeast&#34;,&#34;strategy&#34;,&#34;market&#34;,&#34;environment&#34;,&#34;syndrome&#34;,&#34;status&#34;,&#34;community&#34;,&#34;children&#34;,&#34;preventive&#34;,&#34;prevalence&#34;,&#34;diagnosis&#34;,&#34;adoption&#34;,&#34;sector&#34;,&#34;approaches&#34;,&#34;threat&#34;,&#34;modelling&#34;,&#34;preliminary&#34;,&#34;managing&#34;,&#34;characteristics&#34;,&#34;region&#34;,&#34;post&#34;,&#34;practices&#34;,&#34;due&#34;,&#34;recommendations&#34;,&#34;waste&#34;,&#34;respiratory&#34;,&#34;preparedness&#34;,&#34;human&#34;,&#34;support&#34;,&#34;fear&#34;,&#34;amidst&#34;,&#34;therapeutic&#34;,&#34;drug&#34;,&#34;resilience&#34;,&#34;current&#34;,&#34;travel&#34;,&#34;severity&#34;,&#34;attitude&#34;,&#34;vaccines&#34;,&#34;effective&#34;,&#34;methods&#34;,&#
34;insights&#34;,&#34;news&#34;,&#34;economy&#34;,&#34;acceptance&#34;,&#34;systems&#34;,&#34;vaccination&#34;,&#34;dental&#34;,&#34;infections&#34;,&#34;image&#34;,&#34;academic&#34;,&#34;physical&#34;,&#34;activity&#34;,&#34;report&#34;,&#34;influence&#34;,&#34;implementation&#34;,&#34;diagnostic&#34;,&#34;protease&#34;,&#34;diseases&#34;,&#34;immune&#34;,&#34;outcomes&#34;,&#34;well-being&#34;,&#34;surgery&#34;,&#34;period&#34;,&#34;home&#34;,&#34;evaluation&#34;,&#34;experiences&#34;,&#34;developing&#34;,&#34;studentsâ&#34;,&#34;energy&#34;,&#34;professionals&#34;,&#34;ðµð&#34;,&#34;sharing&#34;,&#34;opportunities&#34;,&#34;behaviour&#34;,&#34;method&#34;,&#34;sars&#34;,&#34;virus&#34;,&#34;design&#34;,&#34;qualitative&#34;,&#34;predictors&#34;,&#34;government&#34;,&#34;related&#34;,&#34;student&#34;,&#34;engagement&#34;,&#34;national&#34;,&#34;infected&#34;,&#34;pneumonia&#34;,&#34;editor&#34;,&#34;pharmacy&#34;,&#34;handling&#34;,&#34;models&#34;,&#34;safe&#34;,&#34;india&#34;,&#34;consequences&#34;,&#34;society&#34;,&#34;emerging&#34;,&#34;nexus&#34;,&#34;hydroxychloroquine&#34;,&#34;adverse&#34;,&#34;classification&#34;,&#34;testing&#34;,&#34;safety&#34;,&#34;moderating&#34;,&#34;psychosocial&#34;,&#34;findings&#34;,&#34;symptoms&#34;,&#34;trials&#34;,&#34;issues&#34;,&#34;readiness&#34;,&#34;distance&#34;,&#34;influencing&#34;,&#34;protection&#34;,&#34;mechanisms&#34;,&#34;training&#34;,&#34;surgical&#34;,&#34;techniques&#34;,&#34;letter&#34;,&#34;de&#34;,&#34;fake&#34;,&#34;en&#34;,&#34;reality&#34;,&#34;di&#34;,&#34;hospitals&#34;,&#34;investigation&#34;,&#34;price&#34;,&#34;oral&#34;,&#34;communication&#34;,&#34;country&#34;,&#34;context&#34;,&#34;markets&#34;,&#34;interventions&#34;,&#34;spike&#34;,&#34;diabetes&#34;,&#34;asia-pacific&#34;,&#34;identification&#34;,&#34;assessing&#34;,&#34;science&#34;,&#34;technologies&#34;,&#34;iot&#34;,&#34;educational&#34;,&#34;malaysiaâ&#34;,&#34;trends&#34;,&#34;combating&#34;,&#34;satisfaction&#34;,&#34;sustainabi
lity&#34;,&#34;pandemik&#34;,&#34;service&#34;,&#34;negative&#34;,&#34;considerations&#34;,&#34;action&#34;,&#34;integrated&#34;,&#34;pattern&#34;,&#34;literature&#34;,&#34;la&#34;,&#34;position&#34;,&#34;sleep&#34;,&#34;protein&#34;,&#34;hybrid&#34;,&#34;covidâ&#34;,&#34;theory&#34;,&#34;responsibility&#34;,&#34;call&#34;,&#34;middle-income&#34;,&#34;tertiary&#34;,&#34;comparative&#34;,&#34;pacific&#34;,&#34;sabah&#34;,&#34;mitigating&#34;,&#34;events&#34;,&#34;nationwide&#34;,&#34;protective&#34;,&#34;secondary&#34;,&#34;delivery&#34;,&#34;age&#34;,&#34;phase&#34;,&#34;confirmed&#34;,&#34;hospitalized&#34;,&#34;saudi&#34;,&#34;main&#34;,&#34;mathematical&#34;,&#34;business&#34;,&#34;uk&#34;,&#34;concerns&#34;,&#34;natural&#34;,&#34;mass&#34;,&#34;telemedicine&#34;,&#34;scenario&#34;,&#34;wave&#34;,&#34;daily&#34;,&#34;frontline&#34;,&#34;normal&#34;,&#34;medicine&#34;,&#34;applications&#34;,&#34;learned&#34;,&#34;quarantine&#34;,&#34;forecasting&#34;,&#34;roles&#34;,&#34;predicting&#34;,&#34;modeling&#34;,&#34;building&#34;,&#34;ace2&#34;,&#34;attitudes&#34;,&#34;antiviral&#34;,&#34;mitigation&#34;,&#34;simulation&#34;,&#34;world&#34;,&#34;reduce&#34;,&#34;dynamic&#34;,&#34;manifestations&#34;,&#34;rate&#34;,&#34;ðµ&#34;,&#34;mobile&#34;,&#34;endoscopy&#34;,&#34;genome&#34;,&#34;type&#34;,&#34;spatial&#34;,&#34;marketing&#34;,&#34;major&#34;,&#34;structural&#34;,&#34;fuzzy&#34;,&#34;empirical&#34;,&#34;aftermath&#34;,&#34;combat&#34;,&#34;behavioural&#34;,&#34;outpatient&#34;,&#34;affect&#34;,&#34;suspected&#34;,&#34;cardiovascular&#34;,&#34;products&#34;,&#34;policies&#34;,&#34;destination&#34;,&#34;examining&#34;,&#34;smart&#34;,&#34;properties&#34;,&#34;bangladeshi&#34;,&#34;arabia&#34;,&#34;artificial&#34;,&#34;randomized&#34;,&#34;storm&#34;,&#34;providers&#34;,&#34;key&#34;,&#34;death&#34;,&#34;platform&#34;,&#34;barriers&#34;,&#34;school&#34;,&#34;dan&#34;,&#34;pulmonary&#34;,&#34;province&#34;,&#34;protocol&#34;,&#34;activities&#34;,&#34;supply&#34;,&#34;i
ndex&#34;,&#34;scale&#34;,&#34;systemic&#34;,&#34;times&#34;,&#34;consensus&#34;,&#34;contact&#34;,&#34;medicines&#34;,&#34;computational&#34;,&#34;hiv&#34;,&#34;laboratory&#34;,&#34;corporate&#34;,&#34;narrative&#34;,&#34;link&#34;,&#34;convolutional&#34;,&#34;confinement&#34;,&#34;usage&#34;,&#34;statements&#34;,&#34;body&#34;,&#34;cloud&#34;,&#34;influenza&#34;,&#34;continuity&#34;,&#34;disaster&#34;,&#34;nursing&#34;,&#34;aerosol&#34;,&#34;levels&#34;,&#34;religious&#34;,&#34;silent&#34;,&#34;proteins&#34;,&#34;scoping&#34;,&#34;united&#34;,&#34;kingdom&#34;,&#34;staff&#34;,&#34;correction&#34;,&#34;solidarity&#34;,&#34;web-based&#34;,&#34;versus&#34;,&#34;chinese&#34;,&#34;covid-19â&#34;,&#34;success&#34;,&#34;mutation&#34;,&#34;individuals&#34;,&#34;addressing&#34;,&#34;detect&#34;,&#34;uncertainty&#34;,&#34;silico&#34;,&#34;residents&#34;,&#34;compliance&#34;,&#34;test&#34;,&#34;construction&#34;,&#34;docking&#34;,&#34;database&#34;,&#34;climate&#34;,&#34;animal&#34;,&#34;automatic&#34;,&#34;features&#34;,&#34;conditions&#34;,&#34;efficient&#34;,&#34;networks&#34;,&#34;cerebral&#34;,&#34;venous&#34;,&#34;cytokine&#34;,&#34;teachers&#34;,&#34;undergoing&#34;,&#34;elective&#34;,&#34;district&#34;,&#34;income&#34;,&#34;results&#34;,&#34;plant&#34;,&#34;surveillance&#34;,&#34;pandemics&#34;,&#34;detected&#34;,&#34;asymptomatic&#34;,&#34;male&#34;,&#34;goals&#34;,&#34;innovation&#34;,&#34;receptor&#34;,&#34;intervention&#34;,&#34;employee&#34;,&#34;wellbeing&#34;,&#34;nigeria&#34;,&#34;resources&#34;,&#34;studies&#34;,&#34;mediating&#34;,&#34;aspects&#34;,&#34;innovative&#34;,&#34;green&#34;,&#34;inflammatory&#34;,&#34;practical&#34;,&#34;setting&#34;,&#34;immunity&#34;,&#34;local&#34;,&#34;injury&#34;,&#34;transformation&#34;,&#34;participation&#34;,&#34;collaboration&#34;,&#34;positive&#34;,&#34;banking&#34;,&#34;cluster&#34;,&#34;remdesivir&#34;,&#34;motivation&#34;,&#34;statement&#34;,&#34;pakistani&#34;,&#34;neurosurgical&#34;,&#34;sensing&#34;,&#34;stability
&#34;,&#34;equipment&#34;,&#34;estimation&#34;,&#34;distancing&#34;,&#34;host&#34;,&#34;remote&#34;,&#34;alternative&#34;,&#34;guidelines&#34;,&#34;improving&#34;,&#34;asean&#34;,&#34;war&#34;,&#34;migrant&#34;,&#34;patterns&#34;,&#34;google&#34;,&#34;comprehensive&#34;,&#34;share&#34;,&#34;stroke&#34;,&#34;isolation&#34;,&#34;collective&#34;,&#34;hand&#34;,&#34;promote&#34;,&#34;reproductive&#34;,&#34;longitudinal&#34;,&#34;concurrent&#34;,&#34;sectors&#34;,&#34;detecting&#34;,&#34;matter&#34;,&#34;pm2.5&#34;,&#34;change&#34;,&#34;interactive&#34;,&#34;antibody&#34;,&#34;cells&#34;,&#34;option&#34;,&#34;reported&#34;,&#34;plastic&#34;,&#34;opportunity&#34;,&#34;middle&#34;,&#34;algorithms&#34;,&#34;fresh&#34;,&#34;saliva&#34;,&#34;influences&#34;,&#34;distribution&#34;,&#34;gender&#34;,&#34;project&#34;,&#34;estimating&#34;,&#34;descriptive&#34;,&#34;mining&#34;,&#34;emotion&#34;,&#34;opinion&#34;,&#34;content&#34;,&#34;humans&#34;,&#34;traditional&#34;,&#34;copd&#34;,&#34;thrombosis&#34;,&#34;industrial&#34;,&#34;angiotensin&#34;,&#34;convalescent&#34;,&#34;plasma&#34;,&#34;fractal-fractional&#34;,&#34;cohort&#34;,&#34;mask&#34;,&#34;emotional&#34;,&#34;midst&#34;,&#34;hesitancy&#34;,&#34;density&#34;,&#34;urban&#34;,&#34;obese&#34;,&#34;iran&#34;,&#34;smes&#34;,&#34;initiatives&#34;,&#34;guide&#34;,&#34;small-scale&#34;,&#34;thailand&#34;,&#34;italy&#34;,&#34;essential&#34;,&#34;trend&#34;,&#34;treat&#34;,&#34;scenarios&#34;,&#34;efficacy&#34;,&#34;measure&#34;,&#34;scan&#34;,&#34;nurses&#34;,&#34;geriatric&#34;,&#34;cell&#34;,&#34;city&#34;,&#34;preventing&#34;,&#34;algorithm&#34;,&#34;renin-angiotensin&#34;,&#34;optimal&#34;,&#34;facilities&#34;,&#34;self-efficacy&#34;,&#34;proposed&#34;,&#34;return&#34;,&#34;eating&#34;,&#34;growth&#34;,&#34;short-term&#34;,&#34;lung&#34;,&#34;situation&#34;,&#34;tool&#34;,&#34;bibliometric&#34;,&#34;wake&#34;,&#34;binding&#34;,&#34;analisis&#34;,&#34;disorder&#34;,&#34;rheumatic&#34;,&#34;restriction&#34;,&#34;curve&#34;,&#34
;tracing&#34;,&#34;institution&#34;,&#34;update&#34;,&#34;actions&#34;,&#34;conspiracy&#34;,&#34;theories&#34;,&#34;hypertension&#34;,&#34;intelligence&#34;,&#34;semasa&#34;,&#34;dalam&#34;,&#34;direct&#34;,&#34;paediatric&#34;,&#34;gastroenterology&#34;,&#34;platforms&#34;,&#34;induced&#34;,&#34;yemen&#34;,&#34;targeted&#34;,&#34;topsis&#34;,&#34;risks&#34;,&#34;predictive&#34;,&#34;stressors&#34;,&#34;therapeutics&#34;,&#34;resistance&#34;,&#34;aid&#34;,&#34;leadership&#34;,&#34;spectrum&#34;,&#34;variations&#34;,&#34;architecture&#34;,&#34;pollution&#34;,&#34;box&#34;,&#34;responding&#34;,&#34;efforts&#34;,&#34;volatility&#34;,&#34;silver&#34;,&#34;poor&#34;,&#34;conceptual&#34;,&#34;healthy&#34;,&#34;violence&#34;,&#34;viral&#34;,&#34;identify&#34;,&#34;chain&#34;,&#34;frailty&#34;,&#34;thromboembolism&#34;,&#34;klang&#34;,&#34;valley&#34;,&#34;persistent&#34;,&#34;penang&#34;,&#34;reactions&#34;,&#34;asthma&#34;,&#34;unprecedented&#34;,&#34;leading&#34;,&#34;trajectory&#34;,&#34;engineering&#34;,&#34;airway&#34;,&#34;kuala&#34;,&#34;lumpur&#34;,&#34;selangor&#34;,&#34;feature&#34;,&#34;fight&#34;,&#34;africa&#34;,&#34;pandemicâ&#34;,&#34;belief&#34;,&#34;thinking&#34;,&#34;targets&#34;,&#34;exposure&#34;,&#34;meteorological&#34;,&#34;robust&#34;,&#34;reducing&#34;,&#34;willingness&#34;,&#34;classroom&#34;,&#34;iraq&#34;,&#34;ã&#34;,&#34;solutions&#34;,&#34;malay&#34;,&#34;emergence&#34;,&#34;common&#34;,&#34;clustering&#34;,&#34;ñƒñ&#34;,&#34;counselling&#34;,&#34;inhaler&#34;,&#34;lower&#34;,&#34;radiotherapy&#34;,&#34;cruise&#34;,&#34;gynecological&#34;,&#34;lupus&#34;,&#34;brand&#34;,&#34;fasting&#34;,&#34;music&#34;,&#34;pandemije&#34;,&#34;gen&#34;,&#34;shocks&#34;,&#34;vitamin&#34;,&#34;liver&#34;,&#34;proposal&#34;,&#34;upper&#34;,&#34;universal&#34;,&#34;coverage&#34;,&#34;multiple&#34;,&#34;populations&#34;,&#34;covid-19-related&#34;,&#34;function&#34;,&#34;search&#34;,&#34;types&#34;,&#34;universities&#34;,&#34;communications&#34;,&#34;sri&#34;,&#34;s
exual&#34;,&#34;sedentary&#34;,&#34;ventilation&#34;,&#34;private&#34;,&#34;lives&#34;,&#34;scans&#34;,&#34;dexamethasone&#34;,&#34;aged&#34;,&#34;dataset&#34;,&#34;nanomaterials&#34;,&#34;cities&#34;,&#34;wavelet-based&#34;,&#34;cov-2&#34;,&#34;reduction&#34;,&#34;consumers&#34;,&#34;buying&#34;,&#34;survivors&#34;,&#34;factor&#34;,&#34;personal&#34;,&#34;limited&#34;,&#34;lifestyle&#34;,&#34;admitted&#34;,&#34;masks&#34;,&#34;ongoing&#34;,&#34;past&#34;,&#34;panic&#34;,&#34;rising&#34;,&#34;infrastructure&#34;,&#34;anti-sars-cov-2&#34;,&#34;peptides&#34;,&#34;mrna&#34;,&#34;affected&#34;,&#34;administration&#34;,&#34;kits&#34;,&#34;projects&#34;,&#34;phytochemicals&#34;,&#34;large-scale&#34;,&#34;restrictions&#34;,&#34;sentiment&#34;,&#34;strains&#34;,&#34;co2&#34;,&#34;options&#34;,&#34;provide&#34;,&#34;solution&#34;,&#34;supporting&#34;,&#34;circular&#34;,&#34;prophylaxis&#34;,&#34;low&#34;,&#34;outbreaks&#34;,&#34;inhibitor&#34;,&#34;controlled&#34;,&#34;combined&#34;,&#34;modulation&#34;,&#34;sir&#34;,&#34;derivative&#34;,&#34;wastewater&#34;,&#34;tocilizumab&#34;,&#34;commentary&#34;,&#34;planning&#34;,&#34;perioperative&#34;,&#34;illness&#34;,&#34;college&#34;,&#34;importance&#34;,&#34;agricultural&#34;,&#34;costs&#34;,&#34;enhancing&#34;,&#34;socioeconomic&#34;,&#34;entrepreneurs&#34;,&#34;peninsular&#34;,&#34;australian&#34;,&#34;norms&#34;,&#34;paradigm&#34;,&#34;entrepreneurial&#34;,&#34;contagion&#34;,&#34;integration&#34;,&#34;disinfectant&#34;,&#34;terhadap&#34;,&#34;pengalaman&#34;,&#34;improved&#34;,&#34;road&#34;,&#34;fluid&#34;,&#34;sociodemographic&#34;,&#34;process&#34;,&#34;borneo&#34;,&#34;lineage&#34;,&#34;epidemiological&#34;,&#34;weight&#34;,&#34;controlling&#34;,&#34;reproduction&#34;,&#34;nursesâ&#34;,&#34;aquatic&#34;,&#34;chains&#34;,&#34;palliative&#34;,&#34;technique&#34;,&#34;europe&#34;,&#34;burnout&#34;,&#34;cross&#34;,&#34;sectional&#34;,&#34;emergencies&#34;,&#34;preadmission&#34;,&#34;disorders&#34;,&#34;repurposing&#34;,&#34;r
evolution&#34;,&#34;vulnerable&#34;,&#34;affecting&#34;,&#34;examination&#34;,&#34;implementing&#34;,&#34;mixed-method&#34;,&#34;statistical&#34;,&#34;dentists&#34;,&#34;pregnancy&#34;,&#34;progression&#34;,&#34;users&#34;,&#34;focused&#34;,&#34;qt&#34;,&#34;caring&#34;,&#34;precautionary&#34;,&#34;nigerian&#34;,&#34;tools&#34;,&#34;drives&#34;,&#34;free&#34;,&#34;agenda&#34;,&#34;prevent&#34;,&#34;female&#34;,&#34;metabolism&#34;,&#34;nasopharyngeal&#34;,&#34;converting&#34;,&#34;enzyme&#34;,&#34;manage&#34;,&#34;institutional&#34;,&#34;synthetic&#34;,&#34;routine&#34;,&#34;myths&#34;,&#34;polymerase&#34;,&#34;genetic&#34;,&#34;herd&#34;,&#34;wuhan&#34;,&#34;special&#34;,&#34;kidney&#34;,&#34;sample&#34;,&#34;possibly&#34;,&#34;isolated&#34;,&#34;quarantined&#34;,&#34;augmented&#34;,&#34;e-commerce&#34;,&#34;antibiotics&#34;,&#34;recurrent&#34;,&#34;mini-review&#34;,&#34;battling&#34;,&#34;internet&#34;,&#34;programme&#34;,&#34;fever&#34;,&#34;adult&#34;,&#34;advance&#34;,&#34;schools&#34;,&#34;immunomodulatory&#34;,&#34;dysfunction&#34;,&#34;infectious&#34;,&#34;english&#34;,&#34;inquiry&#34;,&#34;singapore&#34;,&#34;firms&#34;,&#34;postgraduate&#34;,&#34;instant&#34;,&#34;burden&#34;,&#34;australia&#34;,&#34;indicators&#34;,&#34;intentions&#34;,&#34;homes&#34;,&#34;interactions&#34;,&#34;indonesian&#34;,&#34;aquaculture&#34;,&#34;heparin&#34;,&#34;concentration&#34;,&#34;reporting&#34;,&#34;insecurity&#34;,&#34;happiness&#34;,&#34;massive&#34;,&#34;nutrition&#34;,&#34;automated&#34;,&#34;america&#34;,&#34;journal&#34;,&#34;urology&#34;,&#34;observational&#34;,&#34;dominant&#34;,&#34;methodology&#34;,&#34;determinants&#34;,&#34;domestic&#34;,&#34;blood&#34;,&#34;referral&#34;,&#34;temporal&#34;,&#34;communities&#34;,&#34;waves&#34;,&#34;availability&#34;,&#34;selected&#34;,&#34;regions&#34;,&#34;biological&#34;,&#34;benefits&#34;,&#34;finance&#34;,&#34;health-care&#34;,&#34;logistics&#34;,&#34;deployment&#34;,&#34;geographical&#34;,&#34;critically&#34;,&#34;ill&#3
4;,&#34;south-east&#34;,&#34;mini&#34;,&#34;requirements&#34;,&#34;exploration&#34;,&#34;statins&#34;,&#34;regulatory&#34;,&#34;faculty&#34;,&#34;correlation&#34;,&#34;rights&#34;,&#34;weather&#34;,&#34;visual&#34;,&#34;battle&#34;,&#34;extraction&#34;,&#34;respond&#34;,&#34;lining&#34;,&#34;robotic&#34;,&#34;generated&#34;,&#34;esl&#34;,&#34;electronic&#34;,&#34;compounds&#34;,&#34;orientation&#34;,&#34;pasaran&#34;,&#34;faced&#34;,&#34;na&#34;,&#34;zinc&#34;,&#34;putative&#34;,&#34;ethical&#34;,&#34;disinfection&#34;,&#34;apps&#34;,&#34;mindfulness&#34;,&#34;malaysia&#39;s&#34;,&#34;cascade&#34;,&#34;orthopaedic&#34;,&#34;contemporary&#34;,&#34;sarawak&#34;,&#34;manifestation&#34;,&#34;assay&#34;,&#34;worldwide&#34;,&#34;target&#34;,&#34;chloroquine&#34;,&#34;pharmacologic&#34;,&#34;agents&#34;,&#34;cycle&#34;,&#34;south&#34;,&#34;corticosteroids&#34;,&#34;corona&#34;,&#34;mediated&#34;,&#34;neurological&#34;,&#34;reverse&#34;,&#34;transcription&#34;,&#34;amplification&#34;,&#34;prophylactic&#34;,&#34;reference&#34;,&#34;multicenter&#34;,&#34;azithromycin&#34;,&#34;pharmacotherapeutic&#34;,&#34;receiving&#34;,&#34;al-quran&#34;,&#34;expert&#34;,&#34;plan&#34;],&#34;freq&#34;:[259,198,193,144,143,135,134,132,128,122,113,100,99,93,90,90,88,84,80,80,79,76,76,75,74,74,71,70,69,68,68,68,68,68,68,68,67,67,66,65,63,63,63,62,61,61,61,58,58,57,57,56,56,55,53,53,53,53,52,51,51,49,49,49,49,49,49,49,48,48,48,47,47,45,45,44,44,44,44,44,43,43,42,42,41,41,40,40,40,40,40,40,40,40,38,38,38,38,38,37,37,37,37,37,37,37,37,36,36,36,36,35,35,35,35,35,34,34,34,34,34,34,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,31,31,31,31,31,30,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,28,28,28,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,25,25,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,23,23,23,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,20,20,20,20,20,20,20,20,20
,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,19,19,19,19,19,19,19,19,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,16,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,14,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,11,11,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8],&#34;fontFamily&#34;:&#34;Segoe 
UI&#34;,&#34;fontWeight&#34;:&#34;bold&#34;,&#34;color&#34;:&#34;random-dark&#34;,&#34;minSize&#34;:0,&#34;weightFactor&#34;:0.694980694980695,&#34;backgroundColor&#34;:&#34;white&#34;,&#34;gridSize&#34;:0,&#34;minRotation&#34;:-0.785398163397448,&#34;maxRotation&#34;:0.785398163397448,&#34;shuffle&#34;:true,&#34;rotateRatio&#34;:0.4,&#34;shape&#34;:&#34;circle&#34;,&#34;ellipticity&#34;:0.65,&#34;figBase64&#34;:null,&#34;hover&#34;:null},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Top 1000 terms extracted from the title
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wordcloud2(covid_wc$abstract %&amp;gt;% 
             slice(1:1000) %&amp;gt;% 
             mutate(frequency = round(frequency)))&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:unnamed-chunk-10&#34;&gt;&lt;/span&gt;
&lt;div id=&#34;htmlwidget-2&#34; style=&#34;width:672px;height:480px;&#34; class=&#34;wordcloud2 html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-2&#34;&gt;{&#34;x&#34;:{&#34;word&#34;:[&#34;patients&#34;,&#34;students&#34;,&#34;learning&#34;,&#34;study&#34;,&#34;health&#34;,&#34;covid-19&#34;,&#34;sars-cov-2&#34;,&#34;pandemic&#34;,&#34;anxiety&#34;,&#34;online&#34;,&#34;malaysia&#34;,&#34;social&#34;,&#34;data&#34;,&#34;disease&#34;,&#34;model&#34;,&#34;countries&#34;,&#34;mco&#34;,&#34;research&#34;,&#34;coronavirus&#34;,&#34;impact&#34;,&#34;healthcare&#34;,&#34;control&#34;,&#34;risk&#34;,&#34;results&#34;,&#34;analysis&#34;,&#34;virus&#34;,&#34;ci&#34;,&#34;infection&#34;,&#34;clinical&#34;,&#34;care&#34;,&#34;education&#34;,&#34;knowledge&#34;,&#34;stress&#34;,&#34;measures&#34;,&#34;public&#34;,&#34;depression&#34;,&#34;spread&#34;,&#34;significant&#34;,&#34;system&#34;,&#34;outbreak&#34;,&#34;information&#34;,&#34;psychological&#34;,&#34;factors&#34;,&#34;world&#34;,&#34;studies&#34;,&#34;findings&#34;,&#34;reported&#34;,&#34;respondents&#34;,&#34;media&#34;,&#34;global&#34;,&#34;medical&#34;,&#34;vaccine&#34;,&#34;lockdown&#34;,&#34;paper&#34;,&#34;participants&#34;,&#34;transmission&#34;,&#34;review&#34;,&#34;positive&#34;,&#34;mental&#34;,&#34;people&#34;,&#34;based&#34;,&#34;economic&#34;,&#34;severe&#34;,&#34;symptoms&#34;,&#34;management&#34;,&#34;methods&#34;,&#34;survey&#34;,&#34;level&#34;,&#34;current&#34;,&#34;treatment&#34;,&#34;food&#34;,&#34;respiratory&#34;,&#34;perceived&#34;,&#34;due&#34;,&#34;development&#34;,&#34;performance&#34;,&#34;total&#34;,&#34;mortality&#34;,&#34;quality&#34;,&#34;challenges&#34;,&#34;rights&#34;,&#34;government&#34;,&#34;including&#34;,&#34;teaching&#34;,&#34;strategies&#34;,&#34;conducted&#34;,&#34;services&#34;,&#34;future&#34;,&#34;time&#34;,&#34;human&#34;,&#34;approach&#34;,&#34;workers&#34;,&#34;method&#34;,&#34;malaysian&#34;,&#34;reserved&#34;,&#34;crisis&#34;,&#34;significantly&#34;,&#34;effects&#34;,&#34;support&#34;,&#34;tourism&#34;,&#34;period&#34;,&#34;proposed&#34;,&#34;movem
ent&#34;,&#34;design&#34;,&#34;effect&#34;,&#34;university&#34;,&#34;potential&#34;,&#34;compared&#34;,&#34;rate&#34;,&#34;abstract&#34;,&#34;affected&#34;,&#34;found&#34;,&#34;score&#34;,&#34;effective&#34;,&#34;financial&#34;,&#34;limited&#34;,&#34;march&#34;,&#34;author&#34;,&#34;provide&#34;,&#34;population&#34;,&#34;acute&#34;,&#34;technology&#34;,&#34;china&#34;,&#34;levels&#34;,&#34;intention&#34;,&#34;air&#34;,&#34;activities&#34;,&#34;detection&#34;,&#34;syndrome&#34;,&#34;caused&#34;,&#34;aims&#34;,&#34;related&#34;,&#34;negative&#34;,&#34;impacts&#34;,&#34;literature&#34;,&#34;physical&#34;,&#34;questionnaire&#34;,&#34;waste&#34;,&#34;practice&#34;,&#34;response&#34;,&#34;market&#34;,&#34;viral&#34;,&#34;infected&#34;,&#34;article&#34;,&#34;relationship&#34;,&#34;authors&#34;,&#34;vaccines&#34;,&#34;increased&#34;,&#34;conclusion&#34;,&#34;international&#34;,&#34;e-learning&#34;,&#34;worldwide&#34;,&#34;practices&#34;,&#34;patient&#34;,&#34;role&#34;,&#34;images&#34;,&#34;test&#34;,&#34;age&#34;,&#34;industry&#34;,&#34;islamic&#34;,&#34;responses&#34;,&#34;prevalence&#34;,&#34;community&#34;,&#34;implications&#34;,&#34;perception&#34;,&#34;process&#34;,&#34;background&#34;,&#34;cancer&#34;,&#34;published&#34;,&#34;coping&#34;,&#34;digital&#34;,&#34;distancing&#34;,&#34;drugs&#34;,&#34;developed&#34;,&#34;systems&#34;,&#34;country&#34;,&#34;evidence&#34;,&#34;sector&#34;,&#34;business&#34;,&#34;science&#34;,&#34;individuals&#34;,&#34;experience&#34;,&#34;identify&#34;,&#34;association&#34;,&#34;policy&#34;,&#34;terms&#34;,&#34;hospital&#34;,&#34;included&#34;,&#34;reduce&#34;,&#34;daily&#34;,&#34;purpose&#34;,&#34;increase&#34;,&#34;identified&#34;,&#34;understanding&#34;,&#34;children&#34;,&#34;life&#34;,&#34;implementation&#34;,&#34;society&#34;,&#34;â&#34;,&#34;situation&#34;,&#34;stock&#34;,&#34;models&#34;,&#34;collected&#34;,&#34;home&#34;,&#34;access&#34;,&#34;trials&#34;,&#34;accuracy&#34;,&#34;prevention&#34;,&#34;april&#34;,&#34;main&#34;,&#34;lo
wer&#34;,&#34;investigate&#34;,&#34;epidemic&#34;,&#34;travel&#34;,&#34;observed&#34;,&#34;diseases&#34;,&#34;attitude&#34;,&#34;technologies&#34;,&#34;revealed&#34;,&#34;image&#34;,&#34;personal&#34;,&#34;aimed&#34;,&#34;distress&#34;,&#34;environment&#34;,&#34;infections&#34;,&#34;drug&#34;,&#34;diagnosis&#34;,&#34;result&#34;,&#34;nature&#34;,&#34;scale&#34;,&#34;confirmed&#34;,&#34;preventive&#34;,&#34;issues&#34;,&#34;acceptance&#34;,&#34;communication&#34;,&#34;objective&#34;,&#34;safety&#34;,&#34;resources&#34;,&#34;switzerland&#34;,&#34;virtual&#34;,&#34;major&#34;,&#34;testing&#34;,&#34;outcomes&#34;,&#34;performed&#34;,&#34;variables&#34;,&#34;articles&#34;,&#34;phase&#34;,&#34;developing&#34;,&#34;framework&#34;,&#34;provided&#34;,&#34;key&#34;,&#34;techniques&#34;,&#34;deaths&#34;,&#34;considered&#34;,&#34;critical&#34;,&#34;features&#34;,&#34;rapid&#34;,&#34;low&#34;,&#34;screening&#34;,&#34;women&#34;,&#34;universities&#34;,&#34;cross-sectional&#34;,&#34;springer&#34;,&#34;status&#34;,&#34;asia&#34;,&#34;environmental&#34;,&#34;emergency&#34;,&#34;majority&#34;,&#34;activity&#34;,&#34;hcws&#34;,&#34;indonesia&#34;,&#34;institutions&#34;,&#34;severity&#34;,&#34;policies&#34;,&#34;protein&#34;,&#34;assess&#34;,&#34;fear&#34;,&#34;days&#34;,&#34;death&#34;,&#34;licensee&#34;,&#34;security&#34;,&#34;news&#34;,&#34;protective&#34;,&#34;change&#34;,&#34;examine&#34;,&#34;mdpi&#34;,&#34;basel&#34;,&#34;national&#34;,&#34;contact&#34;,&#34;multiple&#34;,&#34;obtained&#34;,&#34;conditions&#34;,&#34;aim&#34;,&#34;distribution&#34;,&#34;selected&#34;,&#34;local&#34;,&#34;conclusions&#34;,&#34;prevent&#34;,&#34;behaviour&#34;,&#34;elsevier&#34;,&#34;sample&#34;,&#34;dental&#34;,&#34;staff&#34;,&#34;quarantine&#34;,&#34;regression&#34;,&#34;ensure&#34;,&#34;hospitals&#34;,&#34;engagement&#34;,&#34;addition&#34;,&#34;income&#34;,&#34;reduction&#34;,&#34;specific&#34;,&#34;training&#34;,&#34;interventions&#34;,&#34;lack&#34;,&#34;restrictions&#34;,&#34;effectiveness&
#34;,&#34;characteristics&#34;,&#34;satisfaction&#34;,&#34;student&#34;,&#34;improve&#34;,&#34;application&#34;,&#34;methodology&#34;,&#34;stroke&#34;,&#34;moderate&#34;,&#34;december&#34;,&#34;al&#34;,&#34;ppe&#34;,&#34;criteria&#34;,&#34;network&#34;,&#34;region&#34;,&#34;organization&#34;,&#34;importance&#34;,&#34;growth&#34;,&#34;energy&#34;,&#34;pakistan&#34;,&#34;teachers&#34;,&#34;sustainable&#34;,&#34;systematic&#34;,&#34;scores&#34;,&#34;recommendations&#34;,&#34;academic&#34;,&#34;economy&#34;,&#34;million&#34;,&#34;guidelines&#34;,&#34;approaches&#34;,&#34;strategy&#34;,&#34;implemented&#34;,&#34;vaccination&#34;,&#34;recent&#34;,&#34;evaluate&#34;,&#34;adults&#34;,&#34;family&#34;,&#34;similar&#34;,&#34;distributed&#34;,&#34;equipment&#34;,&#34;recovery&#34;,&#34;correlation&#34;,&#34;reduced&#34;,&#34;binding&#34;,&#34;body&#34;,&#34;sharing&#34;,&#34;researchers&#34;,&#34;wuhan&#34;,&#34;machine&#34;,&#34;analyzed&#34;,&#34;immune&#34;,&#34;assessment&#34;,&#34;awareness&#34;,&#34;x-ray&#34;,&#34;religious&#34;,&#34;sensitivity&#34;,&#34;globally&#34;,&#34;explore&#34;,&#34;ongoing&#34;,&#34;led&#34;,&#34;behavior&#34;,&#34;publishing&#34;,&#34;direct&#34;,&#34;procedures&#34;,&#34;theory&#34;,&#34;rates&#34;,&#34;determine&#34;,&#34;develop&#34;,&#34;resilience&#34;,&#34;bangladesh&#34;,&#34;exposure&#34;,&#34;context&#34;,&#34;factor&#34;,&#34;focus&#34;,&#34;understand&#34;,&#34;pneumonia&#34;,&#34;questions&#34;,&#34;readiness&#34;,&#34;well-being&#34;,&#34;î&#34;,&#34;deep&#34;,&#34;relevant&#34;,&#34;required&#34;,&#34;asian&#34;,&#34;usage&#34;,&#34;compounds&#34;,&#34;differences&#34;,&#34;odds&#34;,&#34;january&#34;,&#34;copyright&#34;,&#34;threat&#34;,&#34;covid&#34;,&#34;aspects&#34;,&#34;chest&#34;,&#34;essential&#34;,&#34;google&#34;,&#34;delivery&#34;,&#34;sustainability&#34;,&#34;sectors&#34;,&#34;affect&#34;,&#34;monitoring&#34;,&#34;primary&#34;,&#34;qualitative&#34;,&#34;ratio&#34;,&#34;discussed&#34;,&#34;ct&#34;,&#34;hand&#34;,&#34;
antiviral&#34;,&#34;common&#34;,&#34;infectious&#34;,&#34;perceptions&#34;,&#34;existing&#34;,&#34;suggest&#34;,&#34;normal&#34;,&#34;sampling&#34;,&#34;professionals&#34;,&#34;universiti&#34;,&#34;concern&#34;,&#34;solution&#34;,&#34;search&#34;,&#34;individual&#34;,&#34;carried&#34;,&#34;standard&#34;,&#34;concerns&#34;,&#34;assessed&#34;,&#34;tested&#34;,&#34;lives&#34;,&#34;attitudes&#34;,&#34;facilities&#34;,&#34;rapidly&#34;,&#34;quantitative&#34;,&#34;service&#34;,&#34;preparedness&#34;,&#34;influence&#34;,&#34;limitations&#34;,&#34;presence&#34;,&#34;confidence&#34;,&#34;internet&#34;,&#34;parameters&#34;,&#34;applications&#34;,&#34;index&#34;,&#34;report&#34;,&#34;databases&#34;,&#34;ace2&#34;,&#34;analyses&#34;,&#34;efforts&#34;,&#34;increasing&#34;,&#34;type&#34;,&#34;previous&#34;,&#34;markets&#34;,&#34;inhibitors&#34;,&#34;unprecedented&#34;,&#34;applied&#34;,&#34;highly&#34;,&#34;reports&#34;,&#34;products&#34;,&#34;wave&#34;,&#34;structural&#34;,&#34;compliance&#34;,&#34;surgery&#34;,&#34;crucial&#34;,&#34;received&#34;,&#34;faced&#34;,&#34;source&#34;,&#34;âˆ&#34;,&#34;production&#34;,&#34;adequate&#34;,&#34;providing&#34;,&#34;impacted&#34;,&#34;enhance&#34;,&#34;illness&#34;,&#34;therapy&#34;,&#34;intervention&#34;,&#34;educational&#34;,&#34;governments&#34;,&#34;dataset&#34;,&#34;technique&#34;,&#34;vulnerable&#34;,&#34;spreading&#34;,&#34;hydroxychloroquine&#34;,&#34;platform&#34;,&#34;molecular&#34;,&#34;experiences&#34;,&#34;day&#34;,&#34;gender&#34;,&#34;practical&#34;,&#34;demonstrated&#34;,&#34;laboratory&#34;,&#34;faculty&#34;,&#34;traditional&#34;,&#34;attention&#34;,&#34;address&#34;,&#34;lead&#34;,&#34;analyze&#34;,&#34;surveillance&#34;,&#34;studentsâ&#34;,&#34;recommended&#34;,&#34;fake&#34;,&#34;prediction&#34;,&#34;set&#34;,&#34;range&#34;,&#34;dynamics&#34;,&#34;mild&#34;,&#34;examined&#34;,&#34;measure&#34;,&#34;investigated&#34;,&#34;platforms&#34;,&#34;poor&#34;,&#34;increases&#34;,&#34;pm2.5&#34;,&#34;classification&#34;,&#34;ch
ain&#34;,&#34;diagnostic&#34;,&#34;pooled&#34;,&#34;trend&#34;,&#34;items&#34;,&#34;involved&#34;,&#34;emotional&#34;,&#34;mobile&#34;,&#34;samples&#34;,&#34;objectives&#34;,&#34;nurses&#34;,&#34;tools&#34;,&#34;mass&#34;,&#34;size&#34;,&#34;saliva&#34;,&#34;qtc&#34;,&#34;predict&#34;,&#34;form&#34;,&#34;uk&#34;,&#34;play&#34;,&#34;average&#34;,&#34;male&#34;,&#34;adverse&#34;,&#34;tests&#34;,&#34;statistical&#34;,&#34;classes&#34;,&#34;therapeutic&#34;,&#34;licence&#34;,&#34;taylor&#34;,&#34;francis&#34;,&#34;telemedicine&#34;,&#34;wellbeing&#34;,&#34;predicted&#34;,&#34;evaluation&#34;,&#34;types&#34;,&#34;june&#34;,&#34;shown&#34;,&#34;active&#34;,&#34;employees&#34;,&#34;challenge&#34;,&#34;employed&#34;,&#34;trading&#34;,&#34;india&#34;,&#34;mitigate&#34;,&#34;past&#34;,&#34;patterns&#34;,&#34;analysed&#34;,&#34;affecting&#34;,&#34;motivation&#34;,&#34;consequences&#34;,&#34;reality&#34;,&#34;private&#34;,&#34;providers&#34;,&#34;opportunities&#34;,&#34;issue&#34;,&#34;managing&#34;,&#34;medicine&#34;,&#34;experienced&#34;,&#34;confinement&#34;,&#34;meta-analysis&#34;,&#34;include&#34;,&#34;causing&#34;,&#34;supply&#34;,&#34;scientific&#34;,&#34;outcome&#34;,&#34;events&#34;,&#34;descriptive&#34;,&#34;authorities&#34;,&#34;consumers&#34;,&#34;potentially&#34;,&#34;introduction&#34;,&#34;organizations&#34;,&#34;risks&#34;,&#34;interviews&#34;,&#34;secondary&#34;,&#34;demand&#34;,&#34;sd&#34;,&#34;times&#34;,&#34;declared&#34;,&#34;rna&#34;,&#34;tool&#34;,&#34;construction&#34;,&#34;users&#34;,&#34;apps&#34;,&#34;resulted&#34;,&#34;sleep&#34;,&#34;stability&#34;,&#34;completed&#34;,&#34;safe&#34;,&#34;burden&#34;,&#34;action&#34;,&#34;fever&#34;,&#34;leading&#34;,&#34;hygiene&#34;,&#34;networks&#34;,&#34;alternative&#34;,&#34;sources&#34;,&#34;outbreaks&#34;,&#34;female&#34;,&#34;suggested&#34;,&#34;adjusted&#34;,&#34;estimated&#34;,&#34;llc&#34;,&#34;finally&#34;,&#34;specifically&#34;,&#34;i.e&#34;,&#34;months&#34;,&#34;marketing&#34;,&#34;emerged&#34;,&#34;ori
ginal&#34;,&#34;disorders&#34;,&#34;mechanisms&#34;,&#34;ieee&#34;,&#34;long-term&#34;,&#34;solutions&#34;,&#34;pollution&#34;,&#34;distance&#34;,&#34;decision&#34;,&#34;globe&#34;,&#34;benefits&#34;,&#34;pattern&#34;,&#34;temperature&#34;,&#34;lung&#34;,&#34;ict&#34;,&#34;iot&#34;,&#34;policymakers&#34;,&#34;chronic&#34;,&#34;reproduction&#34;,&#34;central&#34;,&#34;recently&#34;,&#34;insights&#34;,&#34;aor&#34;,&#34;algorithm&#34;,&#34;southeast&#34;,&#34;difference&#34;,&#34;isolation&#34;,&#34;sars&#34;,&#34;assay&#34;,&#34;describe&#34;,&#34;emergence&#34;,&#34;content&#34;,&#34;protection&#34;,&#34;skills&#34;,&#34;manage&#34;,&#34;healthy&#34;,&#34;originality&#34;,&#34;adopted&#34;,&#34;barriers&#34;,&#34;designed&#34;,&#34;february&#34;,&#34;mitigation&#34;,&#34;communities&#34;,&#34;asymptomatic&#34;,&#34;comprehensive&#34;,&#34;oral&#34;,&#34;highlights&#34;,&#34;trust&#34;,&#34;males&#34;,&#34;continue&#34;,&#34;affects&#34;,&#34;humans&#34;,&#34;engineering&#34;,&#34;person&#34;,&#34;prices&#34;,&#34;fast&#34;,&#34;pubmed&#34;,&#34;english&#34;,&#34;medium&#34;,&#34;spike&#34;,&#34;close&#34;,&#34;prior&#34;,&#34;reporting&#34;,&#34;target&#34;,&#34;regions&#34;,&#34;short&#34;,&#34;discuss&#34;,&#34;penerbit&#34;,&#34;africa&#34;,&#34;host&#34;,&#34;neural&#34;,&#34;predictors&#34;,&#34;basis&#34;,&#34;females&#34;,&#34;school&#34;,&#34;natural&#34;,&#34;institute&#34;,&#34;week&#34;,&#34;genome&#34;,&#34;emerging&#34;,&#34;pandemics&#34;,&#34;east&#34;,&#34;aerosol&#34;,&#34;suspected&#34;,&#34;conventional&#34;,&#34;specificity&#34;,&#34;achieve&#34;,&#34;odl&#34;,&#34;unique&#34;,&#34;version&#34;,&#34;detected&#34;,&#34;investment&#34;,&#34;guide&#34;,&#34;disruption&#34;,&#34;press&#34;,&#34;median&#34;,&#34;burnout&#34;,&#34;availability&#34;,&#34;collection&#34;,&#34;emerald&#34;,&#34;informa&#34;,&#34;e.g&#34;,&#34;basic&#34;,&#34;city&#34;,&#34;ability&#34;,&#34;require&#34;,&#34;planning&#34;,&#34;creative&#34;,&#34;capacity&#34;,&#34;cells&#
34;,&#34;cell&#34;,&#34;independent&#34;,&#34;wiley&#34;,&#34;dentists&#34;,&#34;epidemiological&#34;,&#34;self-efficacy&#34;,&#34;effectively&#34;,&#34;price&#34;,&#34;values&#34;,&#34;proteins&#34;,&#34;behavioural&#34;,&#34;surgical&#34;,&#34;journal&#34;,&#34;facing&#34;,&#34;protease&#34;,&#34;medicines&#34;,&#34;coverage&#34;,&#34;field&#34;,&#34;reducing&#34;,&#34;oil&#34;,&#34;condition&#34;,&#34;evaluated&#34;,&#34;real-time&#34;,&#34;contribute&#34;,&#34;highlight&#34;,&#34;receiving&#34;,&#34;partial&#34;,&#34;efficacy&#34;,&#34;living&#34;,&#34;proper&#34;,&#34;remdesivir&#34;,&#34;remains&#34;,&#34;studied&#34;,&#34;statistics&#34;,&#34;reviewed&#34;,&#34;cognitive&#34;,&#34;hypertension&#34;,&#34;sites&#34;,&#34;aged&#34;,&#34;fight&#34;,&#34;exclusive&#34;,&#34;inclusion&#34;,&#34;immunity&#34;,&#34;complications&#34;,&#34;banking&#34;,&#34;additional&#34;,&#34;detect&#34;,&#34;building&#34;,&#34;consumption&#34;,&#34;assist&#34;,&#34;efficient&#34;,&#34;improvement&#34;,&#34;coronaviruses&#34;,&#34;incidence&#34;,&#34;united&#34;,&#34;job&#34;,&#34;estimate&#34;,&#34;properties&#34;,&#34;no2&#34;,&#34;behavioral&#34;,&#34;expected&#34;,&#34;improved&#34;,&#34;doctors&#34;,&#34;destination&#34;,&#34;modelling&#34;,&#34;electronic&#34;,&#34;involving&#34;,&#34;simulation&#34;,&#34;morbidity&#34;,&#34;chinese&#34;,&#34;database&#34;,&#34;initial&#34;,&#34;widely&#34;,&#34;face-to-face&#34;,&#34;integrated&#34;,&#34;diabetes&#34;,&#34;logistic&#34;,&#34;decrease&#34;,&#34;versus&#34;,&#34;interval&#34;,&#34;web&#34;,&#34;demographic&#34;,&#34;phases&#34;,&#34;date&#34;,&#34;influenza&#34;,&#34;achieved&#34;,&#34;actions&#34;,&#34;additionally&#34;,&#34;tracing&#34;,&#34;smart&#34;,&#34;europe&#34;,&#34;contagious&#34;,&#34;middle&#34;,&#34;perspective&#34;,&#34;climate&#34;,&#34;swab&#34;,&#34;language&#34;,&#34;linear&#34;,&#34;treat&#34;,&#34;interaction&#34;,&#34;interactions&#34;,&#34;zakat&#34;,&#34;health-care&#34;,&#34;eating&#34;,&#34;resulting&#
34;,&#34;relationships&#34;,&#34;questionnaires&#34;,&#34;discussion&#34;,&#34;license&#34;,&#34;spatial&#34;,&#34;ministry&#34;,&#34;vital&#34;,&#34;discusses&#34;,&#34;recorded&#34;,&#34;usefulness&#34;,&#34;programs&#34;,&#34;american&#34;,&#34;depressive&#34;,&#34;materials&#34;,&#34;strong&#34;,&#34;modified&#34;,&#34;cost&#34;,&#34;explored&#34;,&#34;random&#34;,&#34;weeks&#34;,&#34;adoption&#34;,&#34;comorbidities&#34;,&#34;ml&#34;,&#34;algorithms&#34;,&#34;comparison&#34;,&#34;improving&#34;,&#34;established&#34;,&#34;structure&#34;,&#34;function&#34;,&#34;physicians&#34;,&#34;degree&#34;,&#34;entry&#34;,&#34;highlighted&#34;,&#34;reaction&#34;,&#34;singapore&#34;,&#34;infrastructure&#34;,&#34;complex&#34;,&#34;suitable&#34;,&#34;section&#34;,&#34;demonstrate&#34;,&#34;sciences&#34;,&#34;challenging&#34;,&#34;generated&#34;,&#34;decreased&#34;,&#34;south&#34;,&#34;masks&#34;,&#34;selection&#34;,&#34;hajj&#34;,&#34;rt-pcr&#34;,&#34;single&#34;,&#34;proportion&#34;,&#34;b.v&#34;,&#34;equation&#34;,&#34;ai&#34;,&#34;series&#34;,&#34;feature&#34;,&#34;mechanism&#34;,&#34;believed&#34;,&#34;accurate&#34;,&#34;scopus&#34;,&#34;reliability&#34;,&#34;pâ&#34;,&#34;lt&#34;,&#34;measured&#34;,&#34;history&#34;,&#34;finding&#34;,&#34;liver&#34;,&#34;companies&#34;,&#34;sars-cov&#34;,&#34;nigeria&#34;,&#34;receptor&#34;,&#34;frontline&#34;,&#34;combination&#34;,&#34;software&#34;,&#34;determined&#34;,&#34;urgent&#34;,&#34;returns&#34;,&#34;participated&#34;,&#34;acid&#34;,&#34;post-covid-19&#34;,&#34;psychosocial&#34;,&#34;successful&#34;,&#34;empirical&#34;,&#34;directly&#34;,&#34;real&#34;,&#34;spss&#34;,&#34;duration&#34;,&#34;plan&#34;,&#34;beginning&#34;,&#34;covid-19-related&#34;,&#34;success&#34;,&#34;blood&#34;,&#34;personnel&#34;,&#34;remain&#34;,&#34;imposed&#34;,&#34;created&#34;,&#34;examines&#34;,&#34;requires&#34;,&#34;called&#34;,&#34;commons&#34;,&#34;focused&#34;,&#34;final&#34;,&#34;water&#34;,&#34;periods&#34;,&#34;advanced&#34;,&#34;utilized&#34;,&#34
;addressing&#34;,&#34;damage&#34;,&#34;citizens&#34;,&#34;plasma&#34;,&#34;amount&#34;,&#34;influenced&#34;,&#34;curve&#34;,&#34;curb&#34;,&#34;discovered&#34;,&#34;icu&#34;,&#34;treatments&#34;,&#34;nations&#34;,&#34;optimal&#34;,&#34;urological&#34;,&#34;neurosurgical&#34;,&#34;rf-ssa&#34;,&#34;trends&#34;,&#34;frequency&#34;,&#34;italy&#34;,&#34;stakeholders&#34;,&#34;reliable&#34;,&#34;handling&#34;,&#34;enhanced&#34;,&#34;august&#34;,&#34;agents&#34;,&#34;viruses&#34;,&#34;perspectives&#34;,&#34;emotion&#34;,&#34;negatively&#34;,&#34;introduced&#34;,&#34;analyse&#34;,&#34;residents&#34;,&#34;adult&#34;,&#34;corona&#34;,&#34;approved&#34;,&#34;nursing&#34;,&#34;thinking&#34;,&#34;forced&#34;,&#34;complete&#34;,&#34;suggests&#34;,&#34;populations&#34;,&#34;emissions&#34;,&#34;economies&#34;],&#34;freq&#34;:[634,569,561,539,526,483,481,447,443,441,433,417,396,394,388,374,367,355,353,352,343,336,335,334,332,331,330,329,322,321,319,314,311,310,302,300,299,299,296,295,294,293,292,290,290,288,288,285,284,282,280,279,279,276,273,272,271,271,269,268,268,265,264,261,260,258,257,257,252,252,252,250,248,248,248,248,242,241,241,241,237,232,232,231,230,230,230,228,228,227,226,226,225,225,224,224,223,221,221,219,215,215,214,214,214,213,211,211,211,210,209,208,206,204,204,202,199,199,198,197,196,196,196,196,195,195,194,194,194,192,192,192,191,190,190,190,189,189,187,186,186,185,185,185,183,182,179,179,178,178,177,176,176,176,175,175,172,172,172,171,169,169,169,168,168,167,167,165,165,165,164,164,164,162,162,161,161,161,160,160,159,159,159,159,158,158,157,157,157,156,155,155,154,154,154,154,154,153,153,153,152,152,152,152,152,151,151,151,151,150,150,149,149,149,149,148,148,148,148,148,147,147,147,146,146,145,144,144,144,143,142,142,142,142,142,141,141,141,141,140,140,140,140,140,139,139,139,139,138,138,138,138,138,137,137,137,137,137,136,136,136,135,135,134,134,133,133,133,133,133,132,132,131,131,131,131,130,130,130,129,128,128,128,128,127,127,127,127,126,126,126,126,125,125,12
5,125,125,124,124,124,123,123,123,123,122,122,122,121,121,121,120,120,120,120,120,119,119,119,119,119,118,118,118,118,118,118,118,117,116,116,116,116,116,115,115,115,115,115,115,115,114,114,114,114,114,114,113,113,113,113,112,112,112,112,112,112,111,111,111,111,111,111,111,111,111,110,110,110,110,110,110,109,109,109,109,109,109,108,108,108,108,107,107,107,107,107,107,107,107,106,106,106,105,105,105,105,105,105,105,105,105,105,105,105,105,104,104,104,104,104,104,103,103,103,102,102,102,102,102,102,102,102,101,101,101,101,101,101,101,101,101,101,101,101,101,100,99,99,99,99,99,99,99,98,98,98,98,98,98,98,98,97,97,97,97,97,97,97,97,97,96,96,96,95,95,95,95,95,95,95,95,95,95,95,94,94,94,94,94,94,94,94,94,94,94,94,94,94,94,94,93,93,93,93,93,93,93,92,92,92,92,92,92,92,92,91,91,91,91,91,91,91,91,91,91,90,90,90,90,90,90,90,90,90,90,90,90,89,89,89,89,89,89,88,88,88,88,88,88,88,88,88,88,88,88,87,87,87,87,87,87,87,87,86,86,86,86,86,86,86,86,86,86,86,86,86,85,85,85,85,85,85,85,84,84,84,84,84,84,84,84,84,84,84,84,84,83,83,83,83,83,83,83,83,82,82,82,82,82,82,82,82,82,81,81,81,81,81,81,81,81,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,79,79,79,79,79,79,79,79,78,78,78,78,78,78,78,78,77,77,77,77,77,77,77,77,77,77,77,76,76,76,76,76,76,76,76,76,76,76,75,75,75,75,75,75,75,75,75,75,75,75,75,75,75,75,74,74,74,74,74,74,74,74,74,74,74,74,73,73,73,73,73,73,73,73,73,73,73,72,72,72,72,72,72,72,72,72,72,72,72,71,71,71,71,71,71,71,71,71,71,71,71,71,71,71,71,71,70,70,70,70,70,70,70,70,70,70,70,70,70,70,70,70,70,70,69,69,69,69,69,69,69,69,69,69,69,69,69,68,68,68,68,68,68,68,68,68,68,68,68,68,68,68,68,68,68,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,63,63,63,63,63,63,63,63,63,63,63,63,63,63,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,61,60,60,
60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,59,59,59,59,59,59,59,59,59,59,59,59,59,59,59,59,59,59,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,56,56,56,56,56,56],&#34;fontFamily&#34;:&#34;Segoe UI&#34;,&#34;fontWeight&#34;:&#34;bold&#34;,&#34;color&#34;:&#34;random-dark&#34;,&#34;minSize&#34;:0,&#34;weightFactor&#34;:0.28391167192429,&#34;backgroundColor&#34;:&#34;white&#34;,&#34;gridSize&#34;:0,&#34;minRotation&#34;:-0.785398163397448,&#34;maxRotation&#34;:0.785398163397448,&#34;shuffle&#34;:true,&#34;rotateRatio&#34;:0.4,&#34;shape&#34;:&#34;circle&#34;,&#34;ellipticity&#34;:0.65,&#34;figBase64&#34;:null,&#34;hover&#34;:null},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: Top 1000 terms extracted from the abstract
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wordcloud2(covid_wc$author_keywords %&amp;gt;% 
             slice(1:1000) %&amp;gt;% 
             mutate(frequency = round(frequency)))&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:unnamed-chunk-11&#34;&gt;&lt;/span&gt;
&lt;div id=&#34;htmlwidget-3&#34; style=&#34;width:672px;height:480px;&#34; class=&#34;wordcloud2 html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-3&#34;&gt;{&#34;x&#34;:{&#34;word&#34;:[&#34;covid-19&#34;,&#34;pandemic&#34;,&#34;learning&#34;,&#34;coronavirus&#34;,&#34;health&#34;,&#34;sars-cov-2&#34;,&#34;malaysia&#34;,&#34;social&#34;,&#34;online&#34;,&#34;education&#34;,&#34;disease&#34;,&#34;analysis&#34;,&#34;control&#34;,&#34;anxiety&#34;,&#34;technology&#34;,&#34;students&#34;,&#34;teaching&#34;,&#34;mental&#34;,&#34;movement&#34;,&#34;model&#34;,&#34;public&#34;,&#34;management&#34;,&#34;media&#34;,&#34;stress&#34;,&#34;healthcare&#34;,&#34;lockdown&#34;,&#34;machine&#34;,&#34;psychological&#34;,&#34;quality&#34;,&#34;risk&#34;,&#34;medical&#34;,&#34;food&#34;,&#34;policy&#34;,&#34;system&#34;,&#34;depression&#34;,&#34;vaccine&#34;,&#34;respiratory&#34;,&#34;care&#34;,&#34;university&#34;,&#34;impact&#34;,&#34;clinical&#34;,&#34;deep&#34;,&#34;knowledge&#34;,&#34;economic&#34;,&#34;virus&#34;,&#34;diseases&#34;,&#34;tourism&#34;,&#34;digital&#34;,&#34;neural&#34;,&#34;network&#34;,&#34;theory&#34;,&#34;waste&#34;,&#34;development&#34;,&#34;islamic&#34;,&#34;image&#34;,&#34;epidemic&#34;,&#34;e-learning&#34;,&#34;mortality&#34;,&#34;performance&#34;,&#34;infectious&#34;,&#34;sustainable&#34;,&#34;workers&#34;,&#34;covid&#34;,&#34;syndrome&#34;,&#34;artificial&#34;,&#34;intention&#34;,&#34;antiviral&#34;,&#34;drug&#34;,&#34;asia&#34;,&#34;transmission&#34;,&#34;practice&#34;,&#34;infection&#34;,&#34;global&#34;,&#34;index&#34;,&#34;perception&#34;,&#34;air&#34;,&#34;security&#34;,&#34;acceptance&#34;,&#34;medicine&#34;,&#34;stock&#34;,&#34;distance&#34;,&#34;distancing&#34;,&#34;information&#34;,&#34;pneumonia&#34;,&#34;communication&#34;,&#34;resilience&#34;,&#34;screening&#34;,&#34;acute&#34;,&#34;perceived&#34;,&#34;attitude&#34;,&#34;virtual&#34;,&#34;outbreak&#34;,&#34;sars&#34;,&#34;sustainability&#34;,&#34;data&#34;,&#34;crisis&#34;,&#34;pollution&#34;,&#34;community&#34;,&#34;measures&#34;,&#34;fear&#34;,&#34;review&#34;,&#34;pr
otective&#34;,&#34;support&#34;,&#34;behavior&#34;,&#34;emergency&#34;,&#34;response&#34;,&#34;financial&#34;,&#34;student&#34;,&#34;systems&#34;,&#34;epidemiology&#34;,&#34;life&#34;,&#34;quarantine&#34;,&#34;prevention&#34;,&#34;industry&#34;,&#34;intelligence&#34;,&#34;forecasting&#34;,&#34;smart&#34;,&#34;coping&#34;,&#34;pandemics&#34;,&#34;drugs&#34;,&#34;x-ray&#34;,&#34;travel&#34;,&#34;detection&#34;,&#34;survey&#34;,&#34;cancer&#34;,&#34;monitoring&#34;,&#34;bangladesh&#34;,&#34;hydroxychloroquine&#34;,&#34;physical&#34;,&#34;pakistan&#34;,&#34;immunity&#34;,&#34;molecular&#34;,&#34;decision&#34;,&#34;equipment&#34;,&#34;services&#34;,&#34;news&#34;,&#34;distress&#34;,&#34;price&#34;,&#34;service&#34;,&#34;factors&#34;,&#34;chest&#34;,&#34;destination&#34;,&#34;human&#34;,&#34;satisfaction&#34;,&#34;research&#34;,&#34;hospital&#34;,&#34;regression&#34;,&#34;diagnosis&#34;,&#34;vaccines&#34;,&#34;modelling&#34;,&#34;simulation&#34;,&#34;behaviour&#34;,&#34;covid19&#34;,&#34;mco&#34;,&#34;networks&#34;,&#34;transfer&#34;,&#34;supply&#34;,&#34;chain&#34;,&#34;environmental&#34;,&#34;therapy&#34;,&#34;corona&#34;,&#34;rapid&#34;,&#34;government&#34;,&#34;personal&#34;,&#34;stability&#34;,&#34;southeast&#34;,&#34;home&#34;,&#34;market&#34;,&#34;internet&#34;,&#34;motivation&#34;,&#34;fake&#34;,&#34;optimization&#34;,&#34;neurosurgery&#34;,&#34;events&#34;,&#34;energy&#34;,&#34;surgery&#34;,&#34;children&#34;,&#34;change&#34;,&#34;hiv&#34;,&#34;assessment&#34;,&#34;safety&#34;,&#34;population&#34;,&#34;strategy&#34;,&#34;protection&#34;,&#34;treatment&#34;,&#34;international&#34;,&#34;immune&#34;,&#34;activity&#34;,&#34;environment&#34;,&#34;vaccination&#34;,&#34;docking&#34;,&#34;2019-ncov&#34;,&#34;women&#34;,&#34;engagement&#34;,&#34;severe&#34;,&#34;challenges&#34;,&#34;diabetes&#34;,&#34;mobile&#34;,&#34;project&#34;,&#34;spike&#34;,&#34;markets&#34;,&#34;adverse&#34;,&#34;convolutional&#34;,&#34;perceptions&#34;,&#34;protein&#34;,&#34;protease&#34;,&#34;sha
ring&#34;,&#34;plasma&#34;,&#34;storm&#34;,&#34;disorders&#34;,&#34;educational&#34;,&#34;volatility&#34;,&#34;oil&#34;,&#34;countries&#34;,&#34;ct&#34;,&#34;reproduction&#34;,&#34;climate&#34;,&#34;impacts&#34;,&#34;well-being&#34;,&#34;mass&#34;,&#34;preventive&#34;,&#34;mathematical&#34;,&#34;readiness&#34;,&#34;worker&#34;,&#34;people&#34;,&#34;policies&#34;,&#34;school&#34;,&#34;indonesia&#34;,&#34;diagnostic&#34;,&#34;resources&#34;,&#34;nigeria&#34;,&#34;images&#34;,&#34;ace2&#34;,&#34;approach&#34;,&#34;systematic&#34;,&#34;infections&#34;,&#34;nutrition&#34;,&#34;rna&#34;,&#34;cytokine&#34;,&#34;business&#34;,&#34;fuzzy&#34;,&#34;pharmacy&#34;,&#34;prediction&#34;,&#34;pharmacists&#34;,&#34;classification&#34;,&#34;academic&#34;,&#34;adults&#34;,&#34;china&#34;,&#34;wavelet&#34;,&#34;economy&#34;,&#34;test&#34;,&#34;intervention&#34;,&#34;saudi&#34;,&#34;arabia&#34;,&#34;herd&#34;,&#34;critical&#34;,&#34;method&#34;,&#34;computed&#34;,&#34;tomography&#34;,&#34;viral&#34;,&#34;dental&#34;,&#34;awareness&#34;,&#34;preparedness&#34;,&#34;status&#34;,&#34;application&#34;,&#34;trend&#34;,&#34;forest&#34;,&#34;finance&#34;,&#34;rate&#34;,&#34;testing&#34;,&#34;strategies&#34;,&#34;africa&#34;,&#34;google&#34;,&#34;brand&#34;,&#34;iran&#34;,&#34;literacy&#34;,&#34;loss&#34;,&#34;nervous&#34;,&#34;hand&#34;,&#34;hygiene&#34;,&#34;cytokines&#34;,&#34;training&#34;,&#34;therapeutics&#34;,&#34;body&#34;,&#34;indices&#34;,&#34;construction&#34;,&#34;stroke&#34;,&#34;convalescent&#34;,&#34;ict&#34;,&#34;modeling&#34;,&#34;patients&#34;,&#34;reality&#34;,&#34;responsibility&#34;,&#34;equity&#34;,&#34;gold&#34;,&#34;misinformation&#34;,&#34;pedagogy&#34;,&#34;meta-analysis&#34;,&#34;thinking&#34;,&#34;recovery&#34;,&#34;experience&#34;,&#34;hesitancy&#34;,&#34;sir&#34;,&#34;malaysian&#34;,&#34;multi-criteria&#34;,&#34;industrial&#34;,&#34;delivery&#34;,&#34;derivative&#34;,&#34;numerical&#34;,&#34;national&#34;,&#34;cov&#34;,&#34;interaction&#34;,&#34;world&#34;,&#34;rep
urposing&#34;,&#34;innovation&#34;,&#34;biomarkers&#34;,&#34;vector&#34;,&#34;wellbeing&#34;,&#34;angiotensin-converting&#34;,&#34;enzyme&#34;,&#34;contact&#34;,&#34;growth&#34;,&#34;pathology&#34;,&#34;mining&#34;,&#34;chloroquine&#34;,&#34;design&#34;,&#34;smes&#34;,&#34;telemedicine&#34;,&#34;characteristics&#34;,&#34;oral&#34;,&#34;influenza&#34;,&#34;employee&#34;,&#34;fractional&#34;,&#34;algorithm&#34;,&#34;revolution&#34;,&#34;otolaryngology&#34;,&#34;angiotensin&#34;,&#34;leadership&#34;,&#34;privacy&#34;,&#34;iomt&#34;,&#34;country&#34;,&#34;convolution&#34;,&#34;staff&#34;,&#34;isolation&#34;,&#34;consumption&#34;,&#34;cardiovascular&#34;,&#34;sars-cov2&#34;,&#34;aids&#34;,&#34;sensitivity&#34;,&#34;generation&#34;,&#34;laboratory&#34;,&#34;imaging&#34;,&#34;basic&#34;,&#34;sentiment&#34;,&#34;psychosocial&#34;,&#34;feature&#34;,&#34;cnn&#34;,&#34;pulmonary&#34;,&#34;stimulus&#34;,&#34;inflammatory&#34;,&#34;anaesthesia&#34;,&#34;religious&#34;,&#34;secondary&#34;,&#34;evolution&#34;,&#34;inhibitors&#34;,&#34;factor&#34;,&#34;aid&#34;,&#34;normal&#34;,&#34;remdesivir&#34;,&#34;spectrum&#34;,&#34;random&#34;,&#34;study&#34;,&#34;receptor&#34;,&#34;sciences&#34;,&#34;multiple&#34;,&#34;symptoms&#34;,&#34;banking&#34;,&#34;nanoparticles&#34;,&#34;curve&#34;,&#34;practices&#34;,&#34;action&#34;,&#34;topsis&#34;,&#34;english&#34;,&#34;asean&#34;,&#34;corporate&#34;,&#34;conservation&#34;,&#34;outcome&#34;,&#34;urology&#34;,&#34;carbon&#34;,&#34;remote&#34;,&#34;cross-sectional&#34;,&#34;seir&#34;,&#34;entropy&#34;,&#34;opportunity&#34;,&#34;hearing&#34;,&#34;visual&#34;,&#34;spillover&#34;,&#34;type&#34;,&#34;mask&#34;,&#34;airline&#34;,&#34;wastewater&#34;,&#34;covidâ&#34;,&#34;trade&#34;,&#34;science&#34;,&#34;perspective&#34;,&#34;frailty&#34;,&#34;app&#34;,&#34;psychology&#34;,&#34;discourse&#34;,&#34;customer&#34;,&#34;organizational&#34;,&#34;text&#34;,&#34;music&#34;,&#34;belief&#34;,&#34;sequence&#34;,&#34;middle-income&#34;,&#34;bitcoin&#34;,&#34;inve
stment&#34;,&#34;male&#34;,&#34;panel&#34;,&#34;circular&#34;,&#34;cell&#34;,&#34;wave&#34;,&#34;gender&#34;,&#34;plastic&#34;,&#34;cloud&#34;,&#34;severity&#34;,&#34;uncertainty&#34;,&#34;real-time&#34;,&#34;infrastructure&#34;,&#34;sensors&#34;,&#34;electronic&#34;,&#34;content&#34;,&#34;co2&#34;,&#34;collaboration&#34;,&#34;chronic&#34;,&#34;venous&#34;,&#34;immunotherapy&#34;,&#34;fractal-fractional&#34;,&#34;fisheries&#34;,&#34;aerosols&#34;,&#34;trial&#34;,&#34;occupational&#34;,&#34;density&#34;,&#34;income&#34;,&#34;entrepreneurial&#34;,&#34;surveillance&#34;,&#34;institutions&#34;,&#34;natural&#34;,&#34;vulnerability&#34;,&#34;linear&#34;,&#34;structural&#34;,&#34;engineering&#34;,&#34;mindfulness&#34;,&#34;features&#34;,&#34;mitigation&#34;,&#34;reproductive&#34;,&#34;aquaculture&#34;,&#34;biosensor&#34;,&#34;asia-pacific&#34;,&#34;adaptive&#34;,&#34;disinfection&#34;,&#34;eating&#34;,&#34;methods&#34;,&#34;iot&#34;,&#34;favipiravir&#34;,&#34;recurrent&#34;,&#34;singular&#34;,&#34;vulnerable&#34;,&#34;integration&#34;,&#34;interventions&#34;,&#34;k-nearest&#34;,&#34;vision&#34;,&#34;economics&#34;,&#34;pregnancy&#34;,&#34;fatality&#34;,&#34;tracing&#34;,&#34;guidelines&#34;,&#34;models&#34;,&#34;conspiracy&#34;,&#34;nasopharyngeal&#34;,&#34;augmented&#34;,&#34;antibody&#34;,&#34;results&#34;,&#34;mutation&#34;,&#34;health-care&#34;,&#34;shopping&#34;,&#34;lifestyle&#34;,&#34;resistance&#34;,&#34;attitudes&#34;,&#34;availability&#34;,&#34;emissions&#34;,&#34;arima&#34;,&#34;medicines&#34;,&#34;family&#34;,&#34;returns&#34;,&#34;behavioural&#34;,&#34;governance&#34;,&#34;tool&#34;,&#34;language&#34;,&#34;clinic&#34;,&#34;qualitative&#34;,&#34;orientation&#34;,&#34;insecurity&#34;,&#34;emerging&#34;,&#34;zoonotic&#34;,&#34;logistic&#34;,&#34;sars-cov&#34;,&#34;zinc&#34;,&#34;rises&#34;,&#34;cov-2&#34;,&#34;cognitive&#34;,&#34;experiences&#34;,&#34;teachers&#34;,&#34;classroom&#34;,&#34;correlation&#34;,&#34;dynamic&#34;,&#34;ann&#34;,&#34;professionals&#34;,&
#34;geographical&#34;,&#34;thromboembolism&#34;,&#34;coronaviruses&#34;,&#34;dentistry&#34;,&#34;endoscopy&#34;,&#34;clustering&#34;,&#34;proteomics&#34;,&#34;sources&#34;,&#34;computing&#34;,&#34;tree&#34;,&#34;planning&#34;,&#34;poverty&#34;,&#34;sales&#34;,&#34;jakarta&#34;,&#34;efficiency&#34;,&#34;operator&#34;,&#34;systemic&#34;,&#34;gut&#34;,&#34;chart&#34;,&#34;water&#34;,&#34;exercise&#34;,&#34;gastrointestinal&#34;,&#34;ecological&#34;,&#34;binary&#34;,&#34;selection&#34;,&#34;form&#34;,&#34;rights&#34;,&#34;wildlife&#34;,&#34;continuity&#34;,&#34;green&#34;,&#34;upper&#34;,&#34;pacific&#34;,&#34;function&#34;,&#34;infertility&#34;,&#34;semen&#34;,&#34;concern&#34;,&#34;collaborative&#34;,&#34;azithromycin&#34;,&#34;buying&#34;,&#34;usage&#34;,&#34;surface&#34;,&#34;product&#34;,&#34;scarcity&#34;,&#34;adoption&#34;,&#34;sequencing&#34;,&#34;accessibility&#34;,&#34;mers-cov&#34;,&#34;biomarker&#34;,&#34;processing&#34;,&#34;aviation&#34;,&#34;rt-pcr&#34;,&#34;polymerase&#34;,&#34;variants&#34;,&#34;adolescents&#34;,&#34;cycle&#34;,&#34;panic&#34;,&#34;module&#34;,&#34;domestic&#34;,&#34;emission&#34;,&#34;pm2.5&#34;,&#34;lightweight&#34;,&#34;non-pharmaceutical&#34;,&#34;dynamics&#34;,&#34;twitter&#34;,&#34;opinion&#34;,&#34;animal&#34;,&#34;cerebral&#34;,&#34;thrombosis&#34;,&#34;tools&#34;,&#34;enterprises&#34;,&#34;sector&#34;,&#34;existence&#34;,&#34;adams-bashforth&#34;,&#34;ab&#34;,&#34;package&#34;,&#34;bias&#34;,&#34;emotional&#34;,&#34;city&#34;,&#34;illness&#34;,&#34;law&#34;,&#34;window&#34;,&#34;decision-making&#34;,&#34;process&#34;,&#34;droplets&#34;,&#34;scan&#34;,&#34;cells&#34;,&#34;inflammation&#34;,&#34;physics&#34;,&#34;biosensors&#34;,&#34;segmentation&#34;,&#34;airway&#34;,&#34;sedentary&#34;,&#34;weight&#34;,&#34;active&#34;,&#34;matrix&#34;,&#34;optimal&#34;,&#34;nurses&#34;,&#34;chains&#34;,&#34;palliative&#34;,&#34;capacity&#34;,&#34;techniques&#34;,&#34;coherence&#34;,&#34;burnout&#34;,&#34;evaluation&#34;,&#34;devices&#34;,&#34;
surveys&#34;,&#34;questionnaires&#34;,&#34;psychiatry&#34;,&#34;organization&#34;,&#34;ivermectin&#34;,&#34;robot&#34;,&#34;goals&#34;,&#34;sdg&#34;,&#34;behavioral&#34;,&#34;rational&#34;,&#34;phytochemicals&#34;,&#34;neighbor&#34;,&#34;matter&#34;,&#34;blood&#34;,&#34;nucleocapsid&#34;,&#34;workplace&#34;,&#34;inhibitor&#34;,&#34;consensus&#34;,&#34;particulate&#34;,&#34;limited&#34;,&#34;migrant&#34;,&#34;silver&#34;,&#34;job&#34;,&#34;pay&#34;,&#34;marketing&#34;,&#34;technologies&#34;,&#34;lstm&#34;,&#34;institution&#34;,&#34;chinese&#34;,&#34;facilities&#34;,&#34;saliva&#34;,&#34;dysfunction&#34;,&#34;hypertension&#34;,&#34;thailand&#34;,&#34;coverage&#34;,&#34;immunoassay&#34;,&#34;targeted&#34;,&#34;agents&#34;,&#34;lung&#34;,&#34;genetic&#34;,&#34;yemen&#34;,&#34;parameters&#34;,&#34;integrated&#34;,&#34;addiction&#34;,&#34;opioid&#34;,&#34;substance&#34;,&#34;stressors&#34;,&#34;apps&#34;,&#34;behaviors&#34;,&#34;effectiveness&#34;,&#34;trust&#34;,&#34;sarawak&#34;,&#34;tb&#34;,&#34;stigma&#34;,&#34;taiwan&#34;,&#34;mhealth&#34;,&#34;utaut2&#34;,&#34;emotion&#34;,&#34;spatial&#34;,&#34;pollutants&#34;,&#34;rural&#34;,&#34;singapore&#34;,&#34;exchange&#34;,&#34;utaut&#34;,&#34;principles&#34;,&#34;humanitarian&#34;,&#34;disaster&#34;,&#34;fiscal&#34;,&#34;barriers&#34;,&#34;self-efficacy&#34;,&#34;pattern&#34;,&#34;relationship&#34;,&#34;trials&#34;,&#34;employment&#34;,&#34;inclusion&#34;,&#34;contagion&#34;,&#34;asthma&#34;,&#34;happiness&#34;,&#34;alternative&#34;,&#34;death&#34;,&#34;ppe&#34;,&#34;condition&#34;,&#34;mcdm&#34;,&#34;violence&#34;,&#34;simulations&#34;,&#34;temporal&#34;,&#34;users&#34;,&#34;coronavirus-2&#34;,&#34;shortages&#34;,&#34;india&#34;,&#34;literature&#34;,&#34;logistics&#34;,&#34;d-dimer&#34;,&#34;fintech&#34;,&#34;sabah&#34;,&#34;unemployment&#34;,&#34;issues&#34;,&#34;ground-glass&#34;,&#34;regulatory&#34;,&#34;exposure&#34;,&#34;spread&#34;,&#34;forecast&#34;,&#34;hospitality&#34;,&#34;willingness&#34;,&#34;agency&#34;,&#34;
x-rays&#34;,&#34;esl&#34;,&#34;pathogenesis&#34;,&#34;low&#34;,&#34;studentâ&#34;,&#34;ethical&#34;,&#34;consumer&#34;,&#34;injury&#34;,&#34;delay&#34;,&#34;aerosol&#34;,&#34;efficacy&#34;,&#34;habits&#34;,&#34;gamification&#34;,&#34;resource&#34;,&#34;local&#34;,&#34;monetary&#34;,&#34;oxidative&#34;,&#34;scale&#34;,&#34;ncov&#34;,&#34;tract&#34;,&#34;ethics&#34;,&#34;fractal&#34;,&#34;complexity&#34;,&#34;covid-&#34;,&#34;studies&#34;,&#34;mellitus&#34;,&#34;divide&#34;,&#34;wallet&#34;,&#34;software&#34;,&#34;disinfectant&#34;,&#34;mosque&#34;,&#34;post-acute&#34;,&#34;graph&#34;,&#34;health-promoting&#34;,&#34;structures&#34;,&#34;cruise&#34;,&#34;haematology&#34;,&#34;t-cell&#34;,&#34;kidney&#34;,&#34;aedes&#34;,&#34;microbiome&#34;,&#34;aec&#34;,&#34;anatomy&#34;,&#34;framing&#34;,&#34;atrial&#34;,&#34;kit&#34;,&#34;absolute&#34;,&#34;shrinkage&#34;,&#34;lasso&#34;,&#34;ultrasound&#34;,&#34;fracture&#34;,&#34;expectancy&#34;,&#34;peptides&#34;,&#34;website&#34;,&#34;liver&#34;,&#34;corticosteroid&#34;,&#34;motility&#34;,&#34;space&#34;,&#34;referees&#34;,&#34;structure&#34;,&#34;publication&#34;,&#34;ensemble&#34;,&#34;commodities&#34;,&#34;sexual&#34;,&#34;sri&#34;,&#34;lanka&#34;,&#34;behaviours&#34;,&#34;outdoors&#34;,&#34;play&#34;,&#34;cytomegalovirus&#34;,&#34;ministry&#34;,&#34;private&#34;,&#34;fcv-19s&#34;,&#34;infodemiology&#34;,&#34;diploma&#34;,&#34;production&#34;,&#34;pls-sem&#34;,&#34;conventional&#34;,&#34;outpatient&#34;,&#34;entry&#34;,&#34;hajj&#34;,&#34;graphene&#34;,&#34;value-added&#34;,&#34;purchasing&#34;,&#34;cost&#34;,&#34;platform&#34;,&#34;epilepsy&#34;,&#34;turkey&#34;,&#34;japan&#34;,&#34;nanomaterials&#34;,&#34;fossil&#34;,&#34;fuel&#34;,&#34;peptide&#34;,&#34;bioinformatics&#34;,&#34;descriptive&#34;,&#34;child&#34;,&#34;thoracic&#34;,&#34;communications&#34;,&#34;reliability&#34;,&#34;validity&#34;,&#34;antigen&#34;,&#34;foreign&#34;,&#34;main&#34;,&#34;restrictions&#34;,&#34;repair&#34;,&#34;success&#34;,&#34;projects&#34;,&#3
4;coagulopathy&#34;,&#34;immunomodulatory&#34;,&#34;obstructive&#34;,&#34;immunomodulation&#34;,&#34;procedures&#34;,&#34;department&#34;,&#34;spacer&#34;,&#34;loneliness&#34;,&#34;bayesian&#34;,&#34;size&#34;,&#34;reactions&#34;,&#34;crowding&#34;,&#34;culture&#34;,&#34;disinfectants&#34;,&#34;effects&#34;,&#34;entrepreneurs&#34;,&#34;coastal&#34;,&#34;therapeutic&#34;,&#34;biomedical&#34;,&#34;balance&#34;,&#34;cross-cultural&#34;,&#34;empathy&#34;,&#34;individualism&#34;,&#34;power&#34;,&#34;multilevel&#34;,&#34;equation&#34;,&#34;paediatric&#34;,&#34;responses&#34;,&#34;equilibrium&#34;,&#34;adaptation&#34;,&#34;relations&#34;,&#34;electrochemical&#34;,&#34;region&#34;,&#34;seafood&#34;,&#34;prices&#34;,&#34;handwashing&#34;,&#34;planned&#34;,&#34;mixed&#34;,&#34;taste&#34;,&#34;road&#34;,&#34;transport&#34;,&#34;technical&#34;,&#34;video&#34;,&#34;geofencing&#34;,&#34;location&#34;,&#34;tracking&#34;,&#34;andrology&#34;,&#34;adult&#34;,&#34;capitalists&#34;,&#34;handling&#34;,&#34;mutual&#34;,&#34;assistance&#34;,&#34;advantages&#34;,&#34;eigentriples&#34;,&#34;length&#34;,&#34;fourth&#34;,&#34;adolescent&#34;,&#34;long-covid&#34;,&#34;phenoconversion&#34;,&#34;correlations&#34;,&#34;non-rational&#34;,&#34;self-isolation&#34;,&#34;synthetic&#34;,&#34;affect&#34;,&#34;bibliometric&#34;,&#34;optical&#34;,&#34;zoonosis&#34;,&#34;blended&#34;,&#34;cluster&#34;,&#34;nsp15&#34;,&#34;phase&#34;,&#34;prognostic&#34;,&#34;indicators&#34;,&#34;post&#34;,&#34;dentists&#34;,&#34;goal&#34;,&#34;promotion&#34;,&#34;set&#34;,&#34;guidance&#34;,&#34;nanomedicine&#34;,&#34;reinfection&#34;,&#34;sirs&#34;,&#34;qtc&#34;,&#34;prolongation&#34;,&#34;commitment&#34;,&#34;trends&#34;,&#34;recognition&#34;,&#34;measurement&#34;,&#34;firm&#34;,&#34;intelligent&#34;,&#34;marine&#34;,&#34;agricultural&#34;,&#34;regulation&#34;,&#34;anti-covid-19&#34;,&#34;traditional&#34;,&#34;e-government&#34;,&#34;reasoned&#34;,&#34;theories&#34;,&#34;b40&#34;,&#34;household&#34;,&#34;disposal&#34;,&#3
4;metabolic&#34;,&#34;swab&#34;,&#34;medicinal&#34;,&#34;plants&#34;,&#34;converting&#34;,&#34;turnover&#34;,&#34;lmics&#34;,&#34;quantitative&#34;,&#34;binding&#34;,&#34;comparative&#34;,&#34;flavonoid&#34;,&#34;electrocardiogram&#34;,&#34;prolonged&#34;,&#34;susceptibility&#34;,&#34;dining&#34;,&#34;experiencescape&#34;,&#34;female&#34;,&#34;travelers&#34;,&#34;compliance&#34;,&#34;wuhan&#34;,&#34;mpro&#34;,&#34;layer&#34;,&#34;myanmar&#34;,&#34;prioritisation&#34;,&#34;serological&#34;,&#34;lupus&#34;,&#34;erythematosus&#34;,&#34;harm&#34;,&#34;reduction&#34;,&#34;agonist&#34;,&#34;disorder&#34;,&#34;borneo&#34;,&#34;sociodemographic&#34;,&#34;wellness&#34;,&#34;migration&#34;,&#34;database&#34;,&#34;arm&#34;,&#34;arduino&#34;,&#34;nano&#34;,&#34;confinement&#34;,&#34;kap&#34;,&#34;asymptomatic&#34;,&#34;mann-kendall&#34;,&#34;rf&#34;,&#34;ssa&#34;,&#34;anthropogenic&#34;,&#34;aquatic&#34;,&#34;tuberculosis&#34;,&#34;peritraumatic&#34;,&#34;operation&#34;,&#34;blockchain&#34;,&#34;integrity&#34;,&#34;particle&#34;,&#34;swarm&#34;,&#34;domain&#34;,&#34;oropharyngeal&#34;,&#34;smell&#34;,&#34;capital&#34;,&#34;sensorineural&#34;,&#34;s-o-r&#34;,&#34;renal&#34;,&#34;failure&#34;,&#34;transfusion&#34;],&#34;freq&#34;:[225,191,189,179,175,166,119,114,114,103,94,83,77,74,72,70,68,66,65,65,63,63,61,58,58,57,57,56,56,56,55,55,54,54,53,52,50,49,49,49,49,49,48,47,47,46,46,45,45,45,45,45,44,44,44,43,43,43,42,42,41,41,41,41,38,38,37,37,37,37,37,36,36,36,35,35,35,34,34,34,34,34,33,33,33,33,33,33,33,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,31,31,30,30,30,29,29,29,29,29,29,29,28,28,27,27,27,27,27,27,27,26,26,26,26,26,26,26,26,26,26,25,25,25,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,23,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,21,21,21,21,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,17,17,17,
17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,16,16,16,16,16,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,14,14,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,12,12,12,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6],&#34;fontFamily&#34;:&#34;Segoe 
UI&#34;,&#34;fontWeight&#34;:&#34;bold&#34;,&#34;color&#34;:&#34;random-dark&#34;,&#34;minSize&#34;:0,&#34;weightFactor&#34;:0.8,&#34;backgroundColor&#34;:&#34;white&#34;,&#34;gridSize&#34;:0,&#34;minRotation&#34;:-0.785398163397448,&#34;maxRotation&#34;:0.785398163397448,&#34;shuffle&#34;:true,&#34;rotateRatio&#34;:0.4,&#34;shape&#34;:&#34;circle&#34;,&#34;ellipticity&#34;:0.65,&#34;figBase64&#34;:null,&#34;hover&#34;:null},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: Top 1000 terms extracted from the author’s keywords
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wordcloud2(covid_wc$index_keywords %&amp;gt;% 
             slice(1:1000) %&amp;gt;% 
             mutate(frequency = round(frequency)))&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:unnamed-chunk-12&#34;&gt;&lt;/span&gt;
&lt;div id=&#34;htmlwidget-4&#34; style=&#34;width:672px;height:480px;&#34; class=&#34;wordcloud2 html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-4&#34;&gt;{&#34;x&#34;:{&#34;word&#34;:[&#34;health&#34;,&#34;disease&#34;,&#34;coronavirus&#34;,&#34;care&#34;,&#34;virus&#34;,&#34;adult&#34;,&#34;drug&#34;,&#34;study&#34;,&#34;aged&#34;,&#34;pneumonia&#34;,&#34;infection&#34;,&#34;male&#34;,&#34;female&#34;,&#34;malaysia&#34;,&#34;betacoronavirus&#34;,&#34;covid-19&#34;,&#34;risk&#34;,&#34;pandemic&#34;,&#34;control&#34;,&#34;human&#34;,&#34;syndrome&#34;,&#34;respiratory&#34;,&#34;clinical&#34;,&#34;humans&#34;,&#34;analysis&#34;,&#34;social&#34;,&#34;middle&#34;,&#34;article&#34;,&#34;protein&#34;,&#34;sars-cov-2&#34;,&#34;learning&#34;,&#34;acute&#34;,&#34;pandemics&#34;,&#34;personnel&#34;,&#34;viral&#34;,&#34;cross-sectional&#34;,&#34;mental&#34;,&#34;transmission&#34;,&#34;agent&#34;,&#34;patient&#34;,&#34;severe&#34;,&#34;management&#34;,&#34;hospital&#34;,&#34;factor&#34;,&#34;system&#34;,&#34;infections&#34;,&#34;waste&#34;,&#34;blood&#34;,&#34;medical&#34;,&#34;anxiety&#34;,&#34;education&#34;,&#34;therapy&#34;,&#34;reaction&#34;,&#34;public&#34;,&#34;priority&#34;,&#34;journal&#34;,&#34;chain&#34;,&#34;mortality&#34;,&#34;review&#34;,&#34;stress&#34;,&#34;questionnaire&#34;,&#34;assessment&#34;,&#34;polymerase&#34;,&#34;prevention&#34;,&#34;studies&#34;,&#34;angiotensin&#34;,&#34;child&#34;,&#34;epidemiology&#34;,&#34;air&#34;,&#34;epidemic&#34;,&#34;time&#34;,&#34;adolescent&#34;,&#34;controlled&#34;,&#34;practice&#34;,&#34;enzyme&#34;,&#34;acid&#34;,&#34;procedures&#34;,&#34;cell&#34;,&#34;diagnosis&#34;,&#34;interleukin&#34;,&#34;tomography&#34;,&#34;quality&#34;,&#34;receptor&#34;,&#34;organization&#34;,&#34;severity&#34;,&#34;quarantine&#34;,&#34;lung&#34;,&#34;letter&#34;,&#34;china&#34;,&#34;depression&#34;,&#34;nonhuman&#34;,&#34;outcome&#34;,&#34;vaccine&#34;,&#34;surveys&#34;,&#34;asia&#34;,&#34;global&#34;,&#34;behavior&#34;,&#34;data&#34;,&#34;pollution&#34;,&#34;factors&#34;,&#34;research&#34;,&#34;environmental&#34;,&#34
;surgery&#34;,&#34;rate&#34;,&#34;topic&#34;,&#34;binding&#34;,&#34;virology&#34;,&#34;isolation&#34;,&#34;computer&#34;,&#34;communicable&#34;,&#34;attitude&#34;,&#34;major&#34;,&#34;diagnostic&#34;,&#34;information&#34;,&#34;psychology&#34;,&#34;disorder&#34;,&#34;hydroxychloroquine&#34;,&#34;decision&#34;,&#34;reverse&#34;,&#34;treatment&#34;,&#34;occupational&#34;,&#34;monitoring&#34;,&#34;psychological&#34;,&#34;scale&#34;,&#34;activity&#34;,&#34;economic&#34;,&#34;systems&#34;,&#34;safety&#34;,&#34;screening&#34;,&#34;equipment&#34;,&#34;protective&#34;,&#34;transcription&#34;,&#34;service&#34;,&#34;detection&#34;,&#34;complication&#34;,&#34;prevalence&#34;,&#34;media&#34;,&#34;united&#34;,&#34;cancer&#34;,&#34;policy&#34;,&#34;inhibitor&#34;,&#34;immunity&#34;,&#34;antiviral&#34;,&#34;food&#34;,&#34;pakistan&#34;,&#34;trial&#34;,&#34;questionnaires&#34;,&#34;diabetes&#34;,&#34;emergency&#34;,&#34;molecular&#34;,&#34;rna&#34;,&#34;testing&#34;,&#34;mellitus&#34;,&#34;computed&#34;,&#34;student&#34;,&#34;immune&#34;,&#34;population&#34;,&#34;comorbidity&#34;,&#34;cytokine&#34;,&#34;artificial&#34;,&#34;status&#34;,&#34;model&#34;,&#34;agents&#34;,&#34;neural&#34;,&#34;machine&#34;,&#34;fever&#34;,&#34;antibody&#34;,&#34;laboratory&#34;,&#34;sensitivity&#34;,&#34;viruses&#34;,&#34;vaccination&#34;,&#34;age&#34;,&#34;perception&#34;,&#34;distress&#34;,&#34;brain&#34;,&#34;x-ray&#34;,&#34;image&#34;,&#34;intensive&#34;,&#34;retrospective&#34;,&#34;test&#34;,&#34;hypertension&#34;,&#34;converting&#34;,&#34;diseases&#34;,&#34;government&#34;,&#34;world&#34;,&#34;spike&#34;,&#34;immunoglobulin&#34;,&#34;chronic&#34;,&#34;delivery&#34;,&#34;failure&#34;,&#34;techniques&#34;,&#34;mass&#34;,&#34;cost&#34;,&#34;survey&#34;,&#34;development&#34;,&#34;networks&#34;,&#34;hospitalization&#34;,&#34;guideline&#34;,&#34;ventilation&#34;,&#34;income&#34;,&#34;distancing&#34;,&#34;kidney&#34;,&#34;support&#34;,&#34;response&#34;,&#34;liver&#34;,&#34;internet&#34;,&#34;effect&#34;
,&#34;knowledge&#34;,&#34;impact&#34;,&#34;lopinavir&#34;,&#34;tract&#34;,&#34;real&#34;,&#34;interferon&#34;,&#34;ritonavir&#34;,&#34;azithromycin&#34;,&#34;radiography&#34;,&#34;personal&#34;,&#34;online&#34;,&#34;life&#34;,&#34;dipeptidyl&#34;,&#34;antivirus&#34;,&#34;students&#34;,&#34;asymptomatic&#34;,&#34;genetic&#34;,&#34;east&#34;,&#34;exposure&#34;,&#34;body&#34;,&#34;thrombosis&#34;,&#34;lymphocyte&#34;,&#34;hand&#34;,&#34;telemedicine&#34;,&#34;deep&#34;,&#34;specificity&#34;,&#34;purification&#34;,&#34;international&#34;,&#34;teaching&#34;,&#34;illness&#34;,&#34;elderly&#34;,&#34;level&#34;,&#34;sars&#34;,&#34;coughing&#34;,&#34;imaging&#34;,&#34;examination&#34;,&#34;sex&#34;,&#34;assisted&#34;,&#34;remdesivir&#34;,&#34;systematic&#34;,&#34;planning&#34;,&#34;bangladesh&#34;,&#34;university&#34;,&#34;sleep&#34;,&#34;cardiovascular&#34;,&#34;europe&#34;,&#34;contact&#34;,&#34;medicine&#34;,&#34;infectious&#34;,&#34;heart&#34;,&#34;gene&#34;,&#34;glycoprotein&#34;,&#34;metabolism&#34;,&#34;communication&#34;,&#34;models&#34;,&#34;tumor&#34;,&#34;africa&#34;,&#34;chloroquine&#34;,&#34;carboxypeptidase&#34;,&#34;thorax&#34;,&#34;indonesia&#34;,&#34;country&#34;,&#34;mutation&#34;,&#34;physical&#34;,&#34;immunodeficiency&#34;,&#34;unit&#34;,&#34;elective&#34;,&#34;feature&#34;,&#34;animal&#34;,&#34;influenza&#34;,&#34;insulin&#34;,&#34;energy&#34;,&#34;oxygen&#34;,&#34;industry&#34;,&#34;distance&#34;,&#34;incidence&#34;,&#34;injury&#34;,&#34;sequence&#34;,&#34;meta&#34;,&#34;extract&#34;,&#34;association&#34;,&#34;community&#34;,&#34;surgical&#34;,&#34;structure&#34;,&#34;vaccines&#34;,&#34;classification&#34;,&#34;forecasting&#34;,&#34;services&#34;,&#34;thromboembolism&#34;,&#34;report&#34;,&#34;corticosteroid&#34;,&#34;design&#34;,&#34;comparative&#34;,&#34;genetics&#34;,&#34;economics&#34;,&#34;death&#34;,&#34;antagonist&#34;,&#34;gastrointestinal&#34;,&#34;distribution&#34;,&#34;low&#34;,&#34;particulate&#34;,&#34;matter&#34;,&#34;index&#34;,&#34;inte
raction&#34;,&#34;unclassified&#34;,&#34;accuracy&#34;,&#34;venous&#34;,&#34;cohort&#34;,&#34;simulation&#34;,&#34;pressure&#34;,&#34;symptom&#34;,&#34;biological&#34;,&#34;technology&#34;,&#34;aerosol&#34;,&#34;note&#34;,&#34;physiology&#34;,&#34;coronaviruses&#34;,&#34;movement&#34;,&#34;network&#34;,&#34;statistical&#34;,&#34;heparin&#34;,&#34;hepatitis&#34;,&#34;environment&#34;,&#34;derivative&#34;,&#34;pregnancy&#34;,&#34;asthma&#34;,&#34;dynamics&#34;,&#34;physician&#34;,&#34;efficacy&#34;,&#34;lockdown&#34;,&#34;pathogenicity&#34;,&#34;weight&#34;,&#34;randomized&#34;,&#34;training&#34;,&#34;necrosis&#34;,&#34;home&#34;,&#34;method&#34;,&#34;dyspnea&#34;,&#34;immunology&#34;,&#34;genome&#34;,&#34;educational&#34;,&#34;sustainable&#34;,&#34;newborn&#34;,&#34;animals&#34;,&#34;plasma&#34;,&#34;release&#34;,&#34;antigen&#34;,&#34;change&#34;,&#34;facility&#34;,&#34;surveillance&#34;,&#34;obesity&#34;,&#34;mobile&#34;,&#34;intelligence&#34;,&#34;aspect&#34;,&#34;water&#34;,&#34;construction&#34;,&#34;function&#34;,&#34;access&#34;,&#34;singapore&#34;,&#34;industrial&#34;,&#34;adverse&#34;,&#34;replication&#34;,&#34;vitamin&#34;,&#34;algorithm&#34;,&#34;fear&#34;,&#34;cerebrovascular&#34;,&#34;plastic&#34;,&#34;washing&#34;,&#34;follow&#34;,&#34;swab&#34;,&#34;approach&#34;,&#34;expression&#34;,&#34;throat&#34;,&#34;multiple&#34;,&#34;disorders&#34;,&#34;process&#34;,&#34;methods&#34;,&#34;endoscopy&#34;,&#34;immunization&#34;,&#34;antibiotic&#34;,&#34;load&#34;,&#34;infant&#34;,&#34;nasopharynx&#34;,&#34;developing&#34;,&#34;count&#34;,&#34;satisfaction&#34;,&#34;kingdom&#34;,&#34;outbreaks&#34;,&#34;pathophysiology&#34;,&#34;coping&#34;,&#34;fatigue&#34;,&#34;nucleic&#34;,&#34;disinfection&#34;,&#34;pain&#34;,&#34;performance&#34;,&#34;prediction&#34;,&#34;compliance&#34;,&#34;awareness&#34;,&#34;angiotensin-converting&#34;,&#34;inhibitors&#34;,&#34;qualitative&#34;,&#34;critical&#34;,&#34;professional&#34;,&#34;tocilizumab&#34;,&#34;difference&#34;,&#34;reduct
ion&#34;,&#34;multicenter&#34;,&#34;anticoagulant&#34;,&#34;prognosis&#34;,&#34;geographic&#34;,&#34;engineering&#34;,&#34;north&#34;,&#34;diarrhea&#34;,&#34;algorithms&#34;,&#34;amino&#34;,&#34;innate&#34;,&#34;production&#34;,&#34;coronary&#34;,&#34;pharmacy&#34;,&#34;dioxide&#34;,&#34;dependent&#34;,&#34;phylogeny&#34;,&#34;e-learning&#34;,&#34;malaysian&#34;,&#34;reactive&#34;,&#34;socioeconomics&#34;,&#34;favipiravir&#34;,&#34;school&#34;,&#34;computing&#34;,&#34;activities&#34;,&#34;nucleocapsid&#34;,&#34;prospective&#34;,&#34;repositioning&#34;,&#34;cooperation&#34;,&#34;dna&#34;,&#34;preschool&#34;,&#34;processing&#34;,&#34;nasopharyngeal&#34;,&#34;amplification&#34;,&#34;consensus&#34;,&#34;nitrogen&#34;,&#34;variation&#34;,&#34;entry&#34;,&#34;south&#34;,&#34;job&#34;,&#34;carbon&#34;,&#34;transfusion&#34;,&#34;digital&#34;,&#34;adaptive&#34;,&#34;mathematical&#34;,&#34;nose&#34;,&#34;loss&#34;,&#34;procedure&#34;,&#34;effectiveness&#34;,&#34;culture&#34;,&#34;supply&#34;,&#34;proteins&#34;,&#34;travel&#34;,&#34;participation&#34;,&#34;tuberculosis&#34;,&#34;america&#34;,&#34;alanine&#34;,&#34;attitudes&#34;,&#34;chemistry&#34;,&#34;demography&#34;,&#34;fatality&#34;,&#34;dexamethasone&#34;,&#34;antiinflammatory&#34;,&#34;convolutional&#34;,&#34;hospitals&#34;,&#34;spread&#34;,&#34;technique&#34;,&#34;workforce&#34;,&#34;workplace&#34;,&#34;security&#34;,&#34;administration&#34;,&#34;clustering&#34;,&#34;aminotransferase&#34;,&#34;phase&#34;,&#34;resilience&#34;,&#34;tertiary&#34;,&#34;cross&#34;,&#34;based&#34;,&#34;quantitative&#34;,&#34;storm&#34;,&#34;chinese&#34;,&#34;measurement&#34;,&#34;handling&#34;,&#34;obstructive&#34;,&#34;reproduction&#34;,&#34;spatial&#34;,&#34;southeast&#34;,&#34;sector&#34;,&#34;correlation&#34;,&#34;local&#34;,&#34;pathology&#34;,&#34;financial&#34;,&#34;neoplasm&#34;,&#34;inflammation&#34;,&#34;nucleotide&#34;,&#34;italy&#34;,&#34;enhancement&#34;,&#34;sequencing&#34;,&#34;hemorrhage&#34;,&#34;intelligent&#34;,&#34;headac
he&#34;,&#34;pyrolysis&#34;,&#34;interview&#34;,&#34;protease&#34;,&#34;plant&#34;,&#34;survival&#34;,&#34;clotting&#34;,&#34;regression&#34;,&#34;anesthesia&#34;,&#34;preoperative&#34;,&#34;sustainability&#34;,&#34;poverty&#34;,&#34;temperature&#34;,&#34;artery&#34;,&#34;nurse&#34;,&#34;burden&#34;,&#34;hearing&#34;,&#34;literature&#34;,&#34;software&#34;,&#34;socioeconomic&#34;,&#34;applications&#34;,&#34;deficiency&#34;,&#34;ward&#34;,&#34;household&#34;,&#34;signal&#34;,&#34;glucose&#34;,&#34;zinc&#34;,&#34;neurosurgery&#34;,&#34;admission&#34;,&#34;patterns&#34;,&#34;event&#34;,&#34;exacerbation&#34;,&#34;morbidity&#34;,&#34;application&#34;,&#34;kinase&#34;,&#34;predictive&#34;,&#34;docking&#34;,&#34;site&#34;,&#34;biology&#34;,&#34;complement&#34;,&#34;proteinase&#34;,&#34;taiwan&#34;,&#34;cluster&#34;,&#34;employment&#34;,&#34;well-being&#34;,&#34;devices&#34;,&#34;gamma&#34;,&#34;hygiene&#34;,&#34;evaluation&#34;,&#34;type&#34;,&#34;worker&#34;,&#34;neoplasms&#34;,&#34;methodology&#34;,&#34;mechanism&#34;,&#34;assay&#34;,&#34;nursing&#34;,&#34;oil&#34;,&#34;utilization&#34;,&#34;scoring&#34;,&#34;effects&#34;,&#34;ribavirin&#34;,&#34;dimer&#34;,&#34;contamination&#34;,&#34;tourism&#34;,&#34;theory&#34;,&#34;epidemiological&#34;,&#34;transport&#34;,&#34;basic&#34;,&#34;disposal&#34;,&#34;specimen&#34;,&#34;stroke&#34;,&#34;chest&#34;,&#34;renin&#34;,&#34;family&#34;,&#34;vomiting&#34;,&#34;rating&#34;,&#34;host&#34;,&#34;tissue&#34;,&#34;length&#34;,&#34;stay&#34;,&#34;india&#34;,&#34;staff&#34;,&#34;videoconferencing&#34;,&#34;infarction&#34;,&#34;infertility&#34;,&#34;metformin&#34;,&#34;bilirubin&#34;,&#34;myalgia&#34;,&#34;resource&#34;,&#34;conditions&#34;,&#34;wellbeing&#34;,&#34;editorial&#34;,&#34;density&#34;,&#34;qt&#34;,&#34;resistance&#34;,&#34;longitudinal&#34;,&#34;smoking&#34;,&#34;sexual&#34;,&#34;sinus&#34;,&#34;turkey&#34;,&#34;malaria&#34;,&#34;fibrinolytic&#34;,&#34;thailand&#34;,&#34;antibodies&#34;,&#34;dose&#34;,&#34;intervention&#34;,
&#34;organ&#34;,&#34;frailty&#34;,&#34;fuzzy&#34;,&#34;vector&#34;,&#34;computerized&#34;,&#34;aldosterone&#34;,&#34;extraction&#34;,&#34;shedding&#34;,&#34;coronavirinae&#34;,&#34;radiation&#34;,&#34;intake&#34;,&#34;validity&#34;,&#34;intubation&#34;,&#34;climate&#34;,&#34;experiment&#34;,&#34;hyperglycemia&#34;,&#34;workload&#34;,&#34;crowding&#34;,&#34;mixed&#34;,&#34;reductase&#34;,&#34;islam&#34;,&#34;exercise&#34;,&#34;oropharynx&#34;,&#34;rural&#34;,&#34;postoperative&#34;,&#34;accident&#34;,&#34;spatiotemporal&#34;,&#34;primary&#34;,&#34;single&#34;,&#34;observational&#34;,&#34;nausea&#34;,&#34;nonstructural&#34;,&#34;domestic&#34;,&#34;pacific&#34;,&#34;monoclonal&#34;,&#34;japan&#34;,&#34;passive&#34;,&#34;republic&#34;,&#34;adenosine&#34;,&#34;hemorrhagic&#34;,&#34;immunosuppressive&#34;,&#34;ambulatory&#34;,&#34;healthcare&#34;,&#34;acceptance&#34;,&#34;short&#34;,&#34;emotional&#34;,&#34;gender&#34;,&#34;interactions&#34;,&#34;immunomodulation&#34;,&#34;trend&#34;,&#34;macrophage&#34;,&#34;modeling&#34;,&#34;center&#34;,&#34;asian&#34;,&#34;smear&#34;,&#34;critically&#34;,&#34;ill&#34;,&#34;strategy&#34;,&#34;standard&#34;,&#34;outpatient&#34;,&#34;real-time&#34;,&#34;anosmia&#34;,&#34;rhinorrhea&#34;,&#34;physicians&#34;,&#34;guidelines&#34;,&#34;tests&#34;,&#34;behavioral&#34;,&#34;cerebral&#34;,&#34;remote&#34;,&#34;functional&#34;,&#34;brazil&#34;,&#34;herd&#34;,&#34;embolism&#34;,&#34;iran&#34;,&#34;mining&#34;,&#34;sperm&#34;,&#34;marketing&#34;,&#34;epilepsy&#34;,&#34;immunoassay&#34;,&#34;biomarkers&#34;,&#34;promotion&#34;,&#34;emotion&#34;,&#34;period&#34;,&#34;ratio&#34;,&#34;testis&#34;,&#34;fractional&#34;,&#34;spain&#34;,&#34;antihypertensive&#34;,&#34;bioinformatics&#34;,&#34;carrier&#34;,&#34;1beta&#34;,&#34;physiological&#34;,&#34;immunotherapy&#34;,&#34;vein&#34;,&#34;west&#34;,&#34;protection&#34;,&#34;score&#34;,&#34;methotrexate&#34;,&#34;predisposition&#34;,&#34;atmospheric&#34;,&#34;consumption&#34;,&#34;secondary&#34;,&#34;evide
nce&#34;,&#34;serine&#34;,&#34;muscle&#34;,&#34;sore&#34;,&#34;oseltamivir&#34;,&#34;alpha&#34;,&#34;beta&#34;,&#34;lactate&#34;,&#34;dehydrogenase&#34;,&#34;arterial&#34;,&#34;spectrometry&#34;,&#34;glycemic&#34;,&#34;burnout&#34;,&#34;operating&#34;,&#34;adrenal&#34;,&#34;hemoglobin&#34;,&#34;dissemination&#34;,&#34;spectroscopy&#34;,&#34;taste&#34;,&#34;aortic&#34;,&#34;selection&#34;,&#34;nervous&#34;,&#34;dizziness&#34;,&#34;differential&#34;,&#34;science&#34;,&#34;western&#34;,&#34;concept&#34;,&#34;intention&#34;,&#34;database&#34;,&#34;activation&#34;,&#34;structural&#34;,&#34;reproducibility&#34;,&#34;ncov&#34;,&#34;laryngoscopy&#34;,&#34;nanoparticle&#34;,&#34;bayes&#34;,&#34;theorem&#34;,&#34;rheumatic&#34;,&#34;nanomedicine&#34;,&#34;competence&#34;,&#34;frail&#34;,&#34;violence&#34;,&#34;ethnic&#34;,&#34;concentration&#34;,&#34;preventive&#34;,&#34;referral&#34;,&#34;fatty&#34;,&#34;lavage&#34;,&#34;transfer&#34;,&#34;lifestyle&#34;,&#34;philippines&#34;,&#34;sneezing&#34;,&#34;antimalarial&#34;,&#34;pollutant&#34;,&#34;motivation&#34;,&#34;urban&#34;,&#34;wuhan&#34;,&#34;pharmacist&#34;,&#34;fluid&#34;,&#34;bulgaria&#34;,&#34;deafness&#34;,&#34;valve&#34;,&#34;publication&#34;,&#34;universities&#34;,&#34;infectivity&#34;,&#34;linked&#34;,&#34;immunosorbent&#34;,&#34;search&#34;,&#34;measures&#34;,&#34;adaptation&#34;,&#34;structured&#34;,&#34;trials&#34;,&#34;size&#34;,&#34;product&#34;,&#34;experience&#34;,&#34;inventory&#34;,&#34;marker&#34;,&#34;methylprednisolone&#34;,&#34;religion&#34;,&#34;azathioprine&#34;,&#34;regulation&#34;,&#34;inflammatory&#34;,&#34;virtual&#34;,&#34;consultation&#34;,&#34;medium&#34;,&#34;patient-to-professional&#34;,&#34;medication&#34;,&#34;recycling&#34;,&#34;square&#34;,&#34;palliative&#34;,&#34;maternal&#34;,&#34;essential&#34;,&#34;ischemia&#34;,&#34;lupus&#34;,&#34;combination&#34;,&#34;esophagus&#34;,&#34;workflow&#34;,&#34;urology&#34;,&#34;oral&#34;,&#34;language&#34;,&#34;line&#34;,&#34;cycle&#34;,&#34;emission&
#34;,&#34;disaster&#34;,&#34;department&#34;,&#34;series&#34;,&#34;history&#34;,&#34;natural&#34;,&#34;numerical&#34;,&#34;gas&#34;,&#34;particle&#34;,&#34;australia&#34;,&#34;transduction&#34;,&#34;feeding&#34;,&#34;oxidative&#34;,&#34;immunocompromised&#34;,&#34;falciparum&#34;,&#34;electrochemical&#34;,&#34;degradation&#34;,&#34;incineration&#34;,&#34;morbid&#34;,&#34;vertigo&#34;,&#34;abuse&#34;,&#34;endoscopic&#34;,&#34;solid&#34;,&#34;hiv&#34;,&#34;chains&#34;,&#34;vulnerable&#34;,&#34;error&#34;,&#34;uncertainty&#34;,&#34;viet&#34;,&#34;nam&#34;,&#34;sulfur&#34;,&#34;convolution&#34;,&#34;recombinant&#34;,&#34;parameters&#34;,&#34;histogram&#34;,&#34;organizational&#34;,&#34;program&#34;,&#34;saudi&#34;,&#34;arabia&#34;,&#34;commerce&#34;,&#34;sampling&#34;,&#34;occupation&#34;,&#34;glucocorticoid&#34;,&#34;risks&#34;,&#34;iot&#34;,&#34;teleconsultation&#34;,&#34;serology&#34;,&#34;tools&#34;,&#34;sites&#34;,&#34;growth&#34;,&#34;capacity&#34;,&#34;peptide&#34;,&#34;bronchoscopy&#34;,&#34;results&#34;,&#34;endotracheal&#34;,&#34;availability&#34;,&#34;shortage&#34;,&#34;mask&#34;,&#34;countries&#34;,&#34;electronic&#34;,&#34;placebo&#34;,&#34;pathogenesis&#34;,&#34;renin-angiotensin&#34;,&#34;platforms&#34;,&#34;affinity&#34;,&#34;newcastle-ottawa&#34;,&#34;crisis&#34;,&#34;chemical&#34;,&#34;validation&#34;,&#34;stigma&#34;,&#34;bleeding&#34;,&#34;conceptual&#34;,&#34;pathogen&#34;,&#34;computational&#34;,&#34;shop&#34;,&#34;peptidyl-dipeptidase&#34;,&#34;malignant&#34;,&#34;anticoagulants&#34;,&#34;pharmaceutical&#34;,&#34;sanitizer&#34;,&#34;municipal&#34;,&#34;radiology&#34;,&#34;architecture&#34;,&#34;nepal&#34;,&#34;erythematosus&#34;,&#34;intestine&#34;,&#34;toxicity&#34;,&#34;microbiology&#34;,&#34;tenofovir&#34;,&#34;linear&#34;,&#34;ireland&#34;,&#34;genital&#34;,&#34;yemen&#34;,&#34;immunologic&#34;,&#34;nigeria&#34;,&#34;folic&#34;,&#34;private&#34;,&#34;biosensing&#34;,&#34;korea&#34;,&#34;iv&#34;,&#34;dental&#34;,&#34;obstruction&#34;,&#34;monox
ide&#34;,&#34;politics&#34;,&#34;humoral&#34;,&#34;systemic&#34;,&#34;dietary&#34;,&#34;germany&#34;,&#34;toll&#34;,&#34;business&#34;,&#34;cardiac&#34;,&#34;sample&#34;,&#34;dengue&#34;,&#34;transplantation&#34;,&#34;creatinine&#34;,&#34;entropy&#34;,&#34;ii&#34;,&#34;impedance&#34;,&#34;mapping&#34;,&#34;power&#34;,&#34;fracture&#34;,&#34;people&#34;,&#34;thoracic&#34;,&#34;partial&#34;,&#34;sensing&#34;,&#34;framework&#34;,&#34;cognitive&#34;,&#34;insurance&#34;,&#34;person&#34;,&#34;psychosocial&#34;,&#34;logistic&#34;,&#34;cultural&#34;,&#34;inhibition&#34;,&#34;nearest&#34;,&#34;myanmar&#34;,&#34;membrane&#34;,&#34;strategic&#34;,&#34;condition&#34;,&#34;content&#34;,&#34;traffic&#34;,&#34;virulence&#34;,&#34;diet&#34;,&#34;cd4&#34;,&#34;tachycardia&#34;,&#34;ferritin&#34;,&#34;respiration&#34;,&#34;comparison&#34;,&#34;storage&#34;,&#34;oxygenation&#34;,&#34;cd8&#34;,&#34;networking&#34;,&#34;market&#34;,&#34;automatic&#34;,&#34;current&#34;,&#34;coefficient&#34;,&#34;ethnicity&#34;,&#34;transmissions&#34;,&#34;mitigation&#34;,&#34;interpersonal&#34;,&#34;misinformation&#34;,&#34;oxides&#34;,&#34;emissions&#34;,&#34;pulmonary&#34;,&#34;baricitinib&#34;,&#34;convalescent&#34;,&#34;erythrocyte&#34;,&#34;metabolic&#34;,&#34;role&#34;,&#34;protocol&#34;,&#34;sectors&#34;,&#34;antiinfective&#34;,&#34;umifenovir&#34;,&#34;resources&#34;,&#34;monocyte&#34;,&#34;phenomena&#34;,&#34;southeastern&#34;,&#34;society&#34;,&#34;shock&#34;],&#34;freq&#34;:[674,530,524,418,377,371,363,362,359,349,335,318,311,310,309,305,295,290,290,287,273,264,262,260,255,250,246,242,237,236,228,223,220,219,218,217,209,201,197,197,194,189,182,181,177,176,174,170,165,164,162,160,159,159,156,156,156,155,155,155,154,153,153,152,147,146,145,144,141,139,137,137,136,136,135,134,133,133,131,130,129,127,127,126,125,124,123,123,123,123,121,120,120,118,115,113,113,112,111,110,109,109,109,108,108,108,108,106,106,106,105,105,105,105,105,104,102,102,102,102,101,100,100,100,100,100,99,98,98,97,97,97,97,96
,96,95,95,95,94,94,94,92,92,92,92,91,91,91,90,89,89,88,88,88,88,88,88,87,86,86,86,86,86,86,85,85,85,84,84,82,82,82,82,81,81,80,80,80,80,79,79,78,78,78,77,77,77,77,77,76,76,76,76,76,75,75,74,73,73,72,71,71,71,70,70,70,70,69,68,68,68,68,67,67,67,67,67,66,66,65,65,65,65,65,65,65,65,65,64,64,64,64,64,63,63,63,63,62,62,62,62,62,62,62,62,62,61,61,61,61,61,61,61,61,61,61,61,61,60,60,59,59,59,59,58,57,57,57,57,57,56,56,56,56,56,56,56,55,55,54,54,54,54,54,54,54,54,53,53,53,53,53,52,52,52,52,52,52,51,51,51,51,51,50,50,50,50,50,50,50,50,49,49,49,49,49,49,49,49,49,48,48,48,48,48,48,48,48,48,48,47,47,47,47,47,47,47,47,46,46,46,45,45,45,45,45,45,45,45,45,45,45,45,45,45,44,44,44,44,44,44,44,44,44,44,43,43,43,43,43,43,43,43,43,43,43,43,43,43,43,42,42,42,42,42,42,42,42,42,42,41,41,41,41,41,41,41,41,41,41,41,41,41,41,41,41,41,40,40,40,40,40,40,40,40,40,39,39,39,39,39,39,39,39,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,38,37,37,37,37,37,37,37,37,36,36,36,36,36,36,36,36,36,36,36,36,36,36,35,35,35,35,35,35,35,35,35,35,35,35,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,30,30,30,30,30,30,30,30,30,30,30,30,30,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,26,26,26,26,26,26,26,26,26,26,26,26,26,26,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,2
0,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17],&#34;fontFamily&#34;:&#34;Segoe UI&#34;,&#34;fontWeight&#34;:&#34;bold&#34;,&#34;color&#34;:&#34;random-dark&#34;,&#34;minSize&#34;:0,&#34;weightFactor&#34;:0.267062314540059,&#34;backgroundColor&#34;:&#34;white&#34;,&#34;gridSize&#34;:0,&#34;minRotation&#34;:-0.785398163397448,&#34;maxRotation&#34;:0.785398163397448,&#34;shuffle&#34;:true,&#34;rotateRatio&#34;:0.4,&#34;shape&#34;:&#34;circle&#34;,&#34;ellipticity&#34;:0.65,&#34;figBase64&#34;:null,&#34;hover&#34;:null},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: Top 1000 terms extracted from the Scopus’s keywords
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are some weird symbols in the plot and the wordcloud; it would be better to remove them. However, I am too lazy to do so, so I will leave them as they are 😃.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;These are some of the exploratory text analyses that can be done. The relevant terms may provide some insight into the current COVID-19 research in Malaysia. However, they by no means fully reflect our current COVID-19 research.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Hyperparameter tuning in tidymodels</title>
      <link>https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/</link>
      <pubDate>Sun, 05 Sep 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;This post will not go into much detail on each approach to hyperparameter tuning. It mainly aims to summarize a few things that I studied over the last couple of days.
Generally, there are two approaches to hyperparameter tuning in tidymodels.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Grid search:&lt;br /&gt;
– Regular grid search&lt;br /&gt;
– Random grid search&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Iterative search:&lt;br /&gt;
– Bayesian optimization&lt;br /&gt;
– Simulated annealing&lt;/li&gt;
&lt;/ol&gt;
&lt;div id=&#34;grid-search&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Grid search&lt;/h2&gt;
&lt;p&gt;In grid search, we provide a set of parameter combinations and the algorithm evaluates them. There are two types of grid search:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Regular grid search&lt;br /&gt;
– The algorithm will go through every combination of parameters.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;grid_regular(mtry(c(1, 13)), 
             trees(), 
             min_n(),
             levels = 3) # how many from each parameter&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 27 x 3
##     mtry trees min_n
##    &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;
##  1     1     1     2
##  2     7     1     2
##  3    13     1     2
##  4     1  1000     2
##  5     7  1000     2
##  6    13  1000     2
##  7     1  2000     2
##  8     7  2000     2
##  9    13  2000     2
## 10     1     1    21
## # ... with 17 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Random grid search&lt;br /&gt;
– The algorithm will randomly select a number of parameter combinations instead of going through each of them.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;grid_random(mtry(c(1, 13)),
            trees(), 
            min_n(), 
            size = 100) # size of parameters combination&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 100 x 3
##     mtry trees min_n
##    &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;
##  1     5  1216    40
##  2     8  1374    13
##  3     9   859    39
##  4     6   282    12
##  5     2  1210     9
##  6     8  1828    39
##  7    11   550    14
##  8    13  1157    32
##  9     5   282     6
## 10    10  1018    28
## # ... with 90 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By default, tidymodels uses a space-filling design to make sure the parameter combinations are roughly equidistant from each other.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;iterative-search&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Iterative search&lt;/h2&gt;
&lt;p&gt;In iterative search, we need to specify some initial parameters/values to start the search.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Bayesian optimization&lt;br /&gt;
– This algorithm will search for the next best parameter combination based on the previous combinations (the prior).&lt;/li&gt;
&lt;li&gt;Simulated annealing&lt;br /&gt;
– Generally, this algorithm works similarly to Bayesian optimization.&lt;br /&gt;
– However, as the figure below illustrates, this algorithm is able to temporarily accept worse parameter combinations (escaping the barrier of a local search) in order to find the best combination (the global minimum).
&lt;img src=&#34;images/sim-anneal.png&#34; alt=&#34;Simulated annealing&#34; /&gt;&lt;/li&gt;
&lt;/ul&gt;
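&lt;p&gt;To make the idea concrete, here is a toy sketch of the acceptance rule behind simulated annealing (the numbers are made up and this is not tidymodels code): a worse result can still be accepted with a probability that shrinks as the search “cools down”.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy illustration of the simulated annealing acceptance rule
# (hypothetical numbers, not tidymodels internals)
current_loss &amp;lt;- 0.30  # loss of the current parameter combination
new_loss     &amp;lt;- 0.35  # a worse candidate combination
temperature  &amp;lt;- 0.10  # decreases over iterations (&amp;quot;cooling&amp;quot;)

# Probability of accepting the worse candidate
accept_prob &amp;lt;- exp(-(new_loss - current_loss) / temperature)
accept_prob # about 0.61 here; as the temperature drops, this shrinks toward 0&lt;/code&gt;&lt;/pre&gt;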
&lt;p&gt;Further details on iterative search and the two methods above can be found &lt;a href=&#34;https://www.tmwr.org/iterative-search.html#iterative-search&#34;&gt;here&lt;/a&gt;. Since both iterative methods need starting parameters, we can combine them with either of the grid search methods.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;other-methods&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Other methods&lt;/h2&gt;
&lt;p&gt;By default, if we do not supply any parameter combinations, tidymodels will randomly pick 10 combinations from the model’s default range of values. Additionally, we can set this value to something else as shown below:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tune_grid(
  resamples = dat_cv, # cross validation data set
  grid = 20,  # 20 combinations of parameters
  control = control, # some control parameters
  metrics = metrics # some metrics parameters (roc_auc, etc)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are also two special cases of grid search: &lt;code&gt;tune_race_anova()&lt;/code&gt; and &lt;code&gt;tune_race_win_loss()&lt;/code&gt;. Both of these methods are supposed to be more efficient versions of grid search. In general, both evaluate the tuning parameters on a small initial set of resamples, and the parameter combinations with the worst performance are eliminated, which makes the grid search more efficient. The main difference between the two methods is how the worst parameter combinations are evaluated and eliminated.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-codes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;R codes&lt;/h2&gt;
&lt;p&gt;Load the packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Packages
library(tidyverse)
library(tidymodels)
library(finetune)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We will only use a small chunk of the data for ease of computation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Data
data(income, package = &amp;quot;kernlab&amp;quot;)

# Make data smaller for computation
set.seed(2021)
income2 &amp;lt;- 
  income %&amp;gt;% 
  filter(INCOME == &amp;quot;[75.000-&amp;quot; | INCOME == &amp;quot;[50.000-75.000)&amp;quot;) %&amp;gt;% 
  slice_sample(n = 600) %&amp;gt;% 
  mutate(INCOME = fct_drop(INCOME), 
         INCOME = fct_recode(INCOME, 
                             rich = &amp;quot;[75.000-&amp;quot;,
                             less_rich = &amp;quot;[50.000-75.000)&amp;quot;), 
         INCOME = factor(INCOME, ordered = F)) %&amp;gt;% 
  mutate(across(-INCOME, fct_drop))

# Summary of data
glimpse(income2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Rows: 600
## Columns: 14
## $ INCOME         &amp;lt;fct&amp;gt; less_rich, rich, rich, rich, less_rich, rich, rich, les~
## $ SEX            &amp;lt;fct&amp;gt; F, M, F, M, F, F, F, M, F, M, M, M, F, F, F, F, M, M, M~
## $ MARITAL.STATUS &amp;lt;fct&amp;gt; Married, Married, Married, Single, Single, NA, Married,~
## $ AGE            &amp;lt;ord&amp;gt; 35-44, 25-34, 45-54, 18-24, 18-24, 14-17, 25-34, 25-34,~
## $ EDUCATION      &amp;lt;ord&amp;gt; 1 to 3 years of college, Grad Study, College graduate, ~
## $ OCCUPATION     &amp;lt;fct&amp;gt; &amp;quot;Professional/Managerial&amp;quot;, &amp;quot;Professional/Managerial&amp;quot;, &amp;quot;~
## $ AREA           &amp;lt;ord&amp;gt; 10+ years, 7-10 years, 10+ years, -1 year, 4-6 years, 7~
## $ DUAL.INCOMES   &amp;lt;fct&amp;gt; Yes, Yes, Yes, Not Married, Not Married, Not Married, N~
## $ HOUSEHOLD.SIZE &amp;lt;ord&amp;gt; Five, Two, Four, Two, Four, Two, Three, Two, Five, One,~
## $ UNDER18        &amp;lt;ord&amp;gt; Three, None, None, None, None, None, One, None, Three, ~
## $ HOUSEHOLDER    &amp;lt;fct&amp;gt; Own, Own, Own, Rent, Family, Own, Own, Rent, Own, Own, ~
## $ HOME.TYPE      &amp;lt;fct&amp;gt; House, House, House, House, House, Apartment, House, Ho~
## $ ETHNIC.CLASS   &amp;lt;fct&amp;gt; White, White, White, White, White, White, White, White,~
## $ LANGUAGE       &amp;lt;fct&amp;gt; English, English, English, English, English, NA, Englis~&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Outcome variable
table(income2$INCOME)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## less_rich      rich 
##       362       238&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Missing data
DataExplorer::plot_missing(income)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Split the data and create a 10-fold cross-validation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(2021)
dat_index &amp;lt;- initial_split(income2, strata = INCOME)
dat_train &amp;lt;- training(dat_index)
dat_test &amp;lt;- testing(dat_index)

## CV
set.seed(2021)
dat_cv &amp;lt;- vfold_cv(dat_train, v = 10, repeats = 1, strata = INCOME)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are going to impute the NAs with the mode since all the variables are categorical.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Recipe
dat_rec &amp;lt;- 
  recipe(INCOME ~ ., data = dat_train) %&amp;gt;% 
  step_impute_mode(all_predictors()) %&amp;gt;% 
  step_ordinalscore(AGE, EDUCATION, AREA, HOUSEHOLD.SIZE, UNDER18)

# Model
rf_mod &amp;lt;- 
  rand_forest(mtry = tune(),
              trees = tune(),
              min_n = tune()) %&amp;gt;% 
  set_mode(&amp;quot;classification&amp;quot;) %&amp;gt;% 
  set_engine(&amp;quot;ranger&amp;quot;)

# Workflow
rf_wf &amp;lt;- 
  workflow() %&amp;gt;% 
  add_recipe(dat_rec) %&amp;gt;% 
  add_model(rf_mod)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Parameters for grid search&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Regular grid
reg_grid &amp;lt;- grid_regular(mtry(c(1, 13)), 
                         trees(), 
                         min_n(), 
                         levels = 3)

# Random grid
rand_grid &amp;lt;- grid_random(mtry(c(1, 13)), 
                         trees(), 
                         min_n(), 
                         size = 100)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tune the models using regular grid search. We are going to use the &lt;code&gt;doParallel&lt;/code&gt; package for parallel processing.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ctrl &amp;lt;- control_grid(save_pred = T,
                        extract = extract_model)
measure &amp;lt;- metric_set(roc_auc)  

# Parallel for regular grid
library(doParallel)

# Create a cluster object and then register: 
cl &amp;lt;- makePSOCKcluster(4)
registerDoParallel(cl)

# Run tune
set.seed(2021)
tune_regular &amp;lt;- 
  rf_wf %&amp;gt;% 
  tune_grid(
    resamples = dat_cv, 
    grid = reg_grid,         
    control = ctrl, 
    metrics = measure)

stopCluster(cl)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Result for regular grid search:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;autoplot(tune_regular)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;show_best(tune_regular)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 9
##    mtry trees min_n .metric .estimator  mean     n std_err .config              
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                
## 1     7  1000    21 roc_auc binary     0.690    10  0.0148 Preprocessor1_Model14
## 2     7  1000    40 roc_auc binary     0.689    10  0.0179 Preprocessor1_Model23
## 3     7  2000    40 roc_auc binary     0.689    10  0.0178 Preprocessor1_Model26
## 4     7  1000     2 roc_auc binary     0.688    10  0.0173 Preprocessor1_Model05
## 5     7  2000    21 roc_auc binary     0.688    10  0.0159 Preprocessor1_Model17&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tune models using random grid search.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Parallel for random grid
# Create a cluster object and then register: 
cl &amp;lt;- makePSOCKcluster(4)
registerDoParallel(cl)

# Run tune
set.seed(2021)
tune_random &amp;lt;- 
  rf_wf %&amp;gt;% 
  tune_grid(
    resamples = dat_cv, 
    grid = rand_grid,         
    control = ctrl, 
    metrics = measure)

stopCluster(cl)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Result for random grid search:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;autoplot(tune_random)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;show_best(tune_random)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 9
##    mtry trees min_n .metric .estimator  mean     n std_err .config              
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                
## 1     4  1016     4 roc_auc binary     0.694    10  0.0164 Preprocessor1_Model0~
## 2     5  1360     3 roc_auc binary     0.693    10  0.0168 Preprocessor1_Model0~
## 3     6   129    14 roc_auc binary     0.693    10  0.0164 Preprocessor1_Model0~
## 4     5  1235     3 roc_auc binary     0.692    10  0.0168 Preprocessor1_Model0~
## 5     6   160    31 roc_auc binary     0.692    10  0.0172 Preprocessor1_Model0~&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Random grid search has a slightly better result. Let’s use this random search result as a base for iterative search. First, we limit the parameters based on the plot from the random grid search.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rf_param &amp;lt;- 
  rf_wf %&amp;gt;% 
  parameters() %&amp;gt;% 
  update(mtry = mtry(c(5, 13)), 
         trees = trees(c(1, 500)), 
         min_n = min_n(c(5, 30)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we do Bayesian optimization.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Parallel for bayesian optimization
# Create a cluster object and then register: 
cl &amp;lt;- makePSOCKcluster(4)
registerDoParallel(cl)

# Run tune
set.seed(2021)
bayes_tune &amp;lt;-  
  rf_wf %&amp;gt;% 
  tune_bayes(    
    resamples = dat_cv,
    param_info = rf_param,
    iter = 60,
    initial = tune_random, # result from random grid search        
    control = control_bayes(no_improve = 30, verbose = T, save_pred = T), 
    metrics = measure)

stopCluster(cl)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Result for Bayesian optimization:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;autoplot(bayes_tune, &amp;quot;performance&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;show_best(bayes_tune)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 10
##    mtry trees min_n .metric .estimator  mean     n std_err .config         .iter
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;           &amp;lt;int&amp;gt;
## 1     4  1016     4 roc_auc binary     0.694    10  0.0164 Preprocessor1_~     0
## 2     5  1360     3 roc_auc binary     0.693    10  0.0168 Preprocessor1_~     0
## 3     6   129    14 roc_auc binary     0.693    10  0.0164 Preprocessor1_~     0
## 4     6   189    15 roc_auc binary     0.693    10  0.0153 Iter1               1
## 5     5  1235     3 roc_auc binary     0.692    10  0.0168 Preprocessor1_~     0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We get a slightly better result from Bayesian optimization. I will not do a simulated annealing approach since I got an error, though I am not sure why.&lt;/p&gt;
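&lt;p&gt;For reference, the call follows the same pattern as &lt;code&gt;tune_bayes()&lt;/code&gt;; the following is an untested sketch using &lt;code&gt;tune_sim_anneal()&lt;/code&gt; and &lt;code&gt;control_sim_anneal()&lt;/code&gt; from the finetune package, mirroring the arguments used earlier:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Untested sketch of simulated annealing tuning with {finetune}
set.seed(2021)
sa_tune &amp;lt;-
  rf_wf %&amp;gt;%
  tune_sim_anneal(
    resamples = dat_cv,
    param_info = rf_param,
    iter = 60,
    initial = tune_random, # result from random grid search
    control = control_sim_anneal(no_improve = 30, verbose = T, save_pred = T),
    metrics = measure)&lt;/code&gt;&lt;/pre&gt;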
&lt;p&gt;Lastly, we do a race ANOVA.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Parallel for race anova
# Create a cluster object and then register: 
cl &amp;lt;- makePSOCKcluster(4)
registerDoParallel(cl)

# Run tune
set.seed(2021)
tune_efficient &amp;lt;- 
  rf_wf %&amp;gt;% 
  tune_race_anova(
    resamples = dat_cv, 
    grid = rand_grid,         
    control = control_race(verbose_elim = T, save_pred = T), 
    metrics = measure)

stopCluster(cl)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We get a result relatively similar to random grid search, but with faster computation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;autoplot(tune_efficient)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;show_best(tune_efficient)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 9
##    mtry trees min_n .metric .estimator  mean     n std_err .config              
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                
## 1     5  1425     5 roc_auc binary     0.695    10  0.0161 Preprocessor1_Model0~
## 2    11   406     2 roc_auc binary     0.694    10  0.0183 Preprocessor1_Model0~
## 3     6   631     3 roc_auc binary     0.692    10  0.0171 Preprocessor1_Model0~
## 4     7  1264     4 roc_auc binary     0.692    10  0.0159 Preprocessor1_Model0~
## 5     9  1264     3 roc_auc binary     0.692    10  0.0188 Preprocessor1_Model0~&lt;/code&gt;&lt;/pre&gt;
We can also compare the ROCs of all approaches. All approaches look more or less similar.
&lt;details&gt;
&lt;summary&gt;
Show code
&lt;/summary&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# regular grid
rf_reg &amp;lt;- 
  tune_regular %&amp;gt;% 
  select_best(metric = &amp;quot;roc_auc&amp;quot;)

reg_auc &amp;lt;- 
  tune_regular %&amp;gt;% 
  collect_predictions(parameters = rf_reg) %&amp;gt;% 
  roc_curve(INCOME, .pred_less_rich) %&amp;gt;% 
  mutate(model = &amp;quot;regular_grid&amp;quot;)

# random grid
rf_rand &amp;lt;- 
  tune_random %&amp;gt;% 
  select_best(metric = &amp;quot;roc_auc&amp;quot;)

rand_auc &amp;lt;- 
  tune_random %&amp;gt;% 
  collect_predictions(parameters = rf_rand) %&amp;gt;% 
  roc_curve(INCOME, .pred_less_rich) %&amp;gt;% 
  mutate(model = &amp;quot;random_grid&amp;quot;)

# bayes
rf_bayes &amp;lt;- 
  bayes_tune %&amp;gt;% 
  select_best(metric = &amp;quot;roc_auc&amp;quot;)

bayes_auc &amp;lt;- 
  bayes_tune %&amp;gt;% 
  collect_predictions(parameters = rf_bayes) %&amp;gt;% 
  roc_curve(INCOME, .pred_less_rich) %&amp;gt;% 
  mutate(model = &amp;quot;bayes&amp;quot;)

# race_anova
rf_eff &amp;lt;- 
  tune_efficient %&amp;gt;% 
  select_best(metric = &amp;quot;roc_auc&amp;quot;)

eff_auc &amp;lt;- 
  tune_efficient %&amp;gt;% 
  collect_predictions(parameters = rf_eff) %&amp;gt;%
  roc_curve(INCOME, .pred_less_rich) %&amp;gt;% 
  mutate(model = &amp;quot;race_anova&amp;quot;)

# Compare ROC between all tuning approach
bind_rows(reg_auc, rand_auc, bayes_auc, eff_auc) %&amp;gt;% 
  ggplot(aes(x = 1 - specificity, y = sensitivity, col = model)) + 
  geom_path(lwd = 1.5, alpha = 0.8) +
  geom_abline(lty = 3) + 
  coord_equal() + 
  scale_color_viridis_d(option = &amp;quot;plasma&amp;quot;, end = .6) +
  theme_bw()&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-21-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Finally, we fit our best model (from Bayesian optimization) to the testing data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Finalize workflow
best_rf &amp;lt;-
  select_best(bayes_tune, &amp;quot;roc_auc&amp;quot;)

final_wf &amp;lt;- 
  rf_wf %&amp;gt;% 
  finalize_workflow(best_rf)
final_wf&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## == Workflow ====================================================================
## Preprocessor: Recipe
## Model: rand_forest()
## 
## -- Preprocessor ----------------------------------------------------------------
## 2 Recipe Steps
## 
## * step_impute_mode()
## * step_ordinalscore()
## 
## -- Model -----------------------------------------------------------------------
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = 4
##   trees = 1016
##   min_n = 4
## 
## Computational engine: ranger&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Last fit
test_fit &amp;lt;- 
  final_wf %&amp;gt;%
  last_fit(dat_index) 

# Evaluation metrics 
test_fit %&amp;gt;%
  collect_metrics()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;          &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;               
## 1 accuracy binary         0.583 Preprocessor1_Model1
## 2 roc_auc  binary         0.611 Preprocessor1_Model1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;test_fit %&amp;gt;%
  collect_predictions() %&amp;gt;% 
  roc_curve(INCOME, .pred_less_rich) %&amp;gt;% 
  autoplot()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/hyperparameter-tuning-in-tidymodels/index.en_files/figure-html/unnamed-chunk-22-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The result is not that good; our AUC is quite low. However, we used only about 8% of the overall data. Nonetheless, the aim of this post was to give an overview of hyperparameter tuning in tidymodels.&lt;/p&gt;
&lt;p&gt;Additionally, there are two other functions to construct parameter grids that I did not cover in this post: &lt;code&gt;grid_max_entropy()&lt;/code&gt; and &lt;code&gt;grid_latin_hypercube()&lt;/code&gt;. There are not many resources explaining these functions (or at least I did not find any); however, for those interested, a good start is the tidymodels &lt;a href=&#34;https://dials.tidymodels.org/reference/grid_max_entropy.html&#34;&gt;website&lt;/a&gt;.&lt;/p&gt;
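&lt;p&gt;As a starting point, both functions can be called in the same way as the grid functions used earlier (a quick sketch; the &lt;code&gt;size&lt;/code&gt; value here is arbitrary):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Both return a tibble of parameter combinations, like grid_random()
grid_max_entropy(mtry(c(1, 13)), trees(), min_n(), size = 20)
grid_latin_hypercube(mtry(c(1, 13)), trees(), min_n(), size = 20)&lt;/code&gt;&lt;/pre&gt;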
&lt;p&gt;References:&lt;br /&gt;
&lt;a href=&#34;https://www.tmwr.org/grid-search.html&#34; class=&#34;uri&#34;&gt;https://www.tmwr.org/grid-search.html&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://www.tmwr.org/iterative-search.html&#34; class=&#34;uri&#34;&gt;https://www.tmwr.org/iterative-search.html&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://oliviergimenez.github.io/learning-machine-learning/#&#34; class=&#34;uri&#34;&gt;https://oliviergimenez.github.io/learning-machine-learning/#&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://towardsdatascience.com/optimization-techniques-simulated-annealing-d6a4785a1de7&#34; class=&#34;uri&#34;&gt;https://towardsdatascience.com/optimization-techniques-simulated-annealing-d6a4785a1de7&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data exploration in R</title>
      <link>https://tengkuhanis.netlify.app/post/data-exploration-in-r/</link>
      <pubDate>Sun, 22 Aug 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/data-exploration-in-r/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;These are some of the packages that I find useful for data exploration. Basically, this post serves more as a note for my future reference. I will list packages (and some awesome functions from each package) rather than specific functions. Further, base R and the tidyverse packages will not be specifically included in this list.&lt;/p&gt;
&lt;p&gt;Load supporting packages&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The data we are going to use is from the dlookr package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;glimpse(heartfailure)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Rows: 299
## Columns: 13
## $ age               &amp;lt;int&amp;gt; 75, 55, 65, 50, 65, 90, 75, 60, 65, 80, 75, 62, 45, ~
## $ anaemia           &amp;lt;fct&amp;gt; No, No, No, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, N~
## $ cpk_enzyme        &amp;lt;dbl&amp;gt; 582, 7861, 146, 111, 160, 47, 246, 315, 157, 123, 81~
## $ diabetes          &amp;lt;fct&amp;gt; No, No, No, No, Yes, No, No, Yes, No, No, No, No, No~
## $ ejection_fraction &amp;lt;dbl&amp;gt; 20, 38, 20, 20, 20, 40, 15, 60, 65, 35, 38, 25, 30, ~
## $ hblood_pressure   &amp;lt;fct&amp;gt; Yes, No, No, No, No, Yes, No, No, No, Yes, Yes, Yes,~
## $ platelets         &amp;lt;dbl&amp;gt; 265000, 263358, 162000, 210000, 327000, 204000, 1270~
## $ creatinine        &amp;lt;dbl&amp;gt; 1.90, 1.10, 1.30, 1.90, 2.70, 2.10, 1.20, 1.10, 1.50~
## $ sodium            &amp;lt;dbl&amp;gt; 130, 136, 129, 137, 116, 132, 137, 131, 138, 133, 13~
## $ sex               &amp;lt;fct&amp;gt; Male, Male, Male, Male, Female, Male, Male, Male, Fe~
## $ smoking           &amp;lt;fct&amp;gt; No, No, Yes, No, No, Yes, No, Yes, No, Yes, Yes, Yes~
## $ time              &amp;lt;int&amp;gt; 4, 6, 7, 7, 8, 8, 10, 10, 10, 10, 10, 10, 11, 11, 12~
## $ death_event       &amp;lt;fct&amp;gt; Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We will create a few NAs in our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(2021)
heartfailure[sample(seq(nrow(heartfailure)), 20), &amp;quot;age&amp;quot;] &amp;lt;- NA
heartfailure[sample(seq(nrow(heartfailure)), 10), &amp;quot;sex&amp;quot;] &amp;lt;- NA&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;1) dataMaid&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dataMaid)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One of the most useful functions in dataMaid is &lt;code&gt;makeDataReport()&lt;/code&gt;, which generates a report on the data. By default it produces a PDF, but other output options such as Word and HTML are also available.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;makeDataReport(heartfailure, replace = T)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is an example of the output in &lt;a href=&#34;https://tengkuhanis.netlify.app/files/dataMaid_heartfailure.pdf&#34;&gt;pdf&lt;/a&gt;.&lt;/p&gt;
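&lt;p&gt;For instance, to request an HTML report instead, we can use the &lt;code&gt;output&lt;/code&gt; argument. This is only a sketch; I have tested the default PDF output myself:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Request an HTML report instead of the default pdf
makeDataReport(heartfailure, output = &amp;quot;html&amp;quot;, replace = T)&lt;/code&gt;&lt;/pre&gt;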
&lt;p&gt;&lt;strong&gt;2) DataExplorer&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(DataExplorer)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;General visualization:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% plot_intro()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Since we have missing data, we can further visualize it:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% plot_missing()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% profile_missing()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##              feature num_missing pct_missing
## 1                age          20  0.06688963
## 2            anaemia           0  0.00000000
## 3         cpk_enzyme           0  0.00000000
## 4           diabetes           0  0.00000000
## 5  ejection_fraction           0  0.00000000
## 6    hblood_pressure           0  0.00000000
## 7          platelets           0  0.00000000
## 8         creatinine           0  0.00000000
## 9             sodium           0  0.00000000
## 10               sex          10  0.03344482
## 11           smoking           0  0.00000000
## 12              time           0  0.00000000
## 13       death_event           0  0.00000000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also do a correlation plot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% 
  select_if(is.numeric) %&amp;gt;% 
  drop_na() %&amp;gt;% 
  plot_correlation()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;However, I think the correlation plot from the corrplot package is cleaner and easier to read. Here is a plot from corrplot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(corrplot)

heartfailure %&amp;gt;% 
  select_if(is.numeric) %&amp;gt;% 
  drop_na() %&amp;gt;% 
  cor() %&amp;gt;% 
  corrplot(type = &amp;quot;upper&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Finally, we can get an overall HTML report from the DataExplorer package using the &lt;code&gt;create_report()&lt;/code&gt; function.&lt;/p&gt;
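&lt;p&gt;A minimal call looks like this (a sketch; by default the report is written as an html file in the working directory, and the response variable argument is optional):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Generate the overall EDA report; death_event is used as the response here
create_report(heartfailure, y = &amp;quot;death_event&amp;quot;)&lt;/code&gt;&lt;/pre&gt;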
&lt;p&gt;&lt;strong&gt;3) dlookr&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dlookr)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can assess the normality of the data using this package. The code below will plot normality for all numeric variables.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% 
  plot_normality()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, for the sake of simplicity, we will run it for only one variable in this post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% 
  plot_normality(age)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can also get a correlation matrix plot from this package, and there is no need to remove the NAs or select the numeric variables before running the function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% 
  plot_correlate()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Lastly, dlookr can produce an overall report of the data exploration in PDF (and other formats as well). This report is quite comprehensive; have a &lt;a href=&#34;https://tengkuhanis.netlify.app/files/EDA_Paged_Report.pdf&#34;&gt;look&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heartfailure %&amp;gt;% 
  eda_paged_report(target = &amp;quot;death_event&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;4) skimr&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The skimr package, especially its &lt;code&gt;skim()&lt;/code&gt; function, did not display correctly in blogdown. Hence, I have included a screenshot of the result that we would typically see in the R console.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(skimr)
skim(heartfailure) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;images/black.png&#34; style=&#34;width:100.0%;height:100.0%&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, from skimr we get an overview that includes histograms for the numerical data as well.&lt;/p&gt;
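&lt;p&gt;If the inline histograms are what breaks the rendering, skimr also provides &lt;code&gt;skim_without_charts()&lt;/code&gt;, which returns the same summary minus the tiny histograms. This may be a workaround, though I have not tested it with blogdown:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Same summary as skim(), but without the spark histograms
skim_without_charts(heartfailure)&lt;/code&gt;&lt;/pre&gt;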
&lt;p&gt;&lt;strong&gt;5) outliertree&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This package identifies outliers using a decision tree. I will not go into detail about the approach; those interested can read &lt;a href=&#34;https://arxiv.org/abs/2001.00636&#34;&gt;further&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(outliertree)
outlier.tree(heartfailure)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Reporting top 2 outliers [out of 2 found]
## 
## row [251] - suspicious column: [creatinine] - suspicious value: [0.50]
##  distribution: 96.000% &amp;gt;= 0.70 - [mean: 1.35] - [sd: 1.22] - [norm. obs: 24]
##  given:
##      [cpk_enzyme] &amp;gt; [1610.00] (value: 2522.00)
## 
## 
## row [32] - suspicious column: [cpk_enzyme] - suspicious value: [23.00]
##  distribution: 98.958% &amp;gt;= 47.00 - [mean: 677.01] - [sd: 1321.86] - [norm. obs: 95]
##  given:
##      [death_event] = [Yes]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Outlier Tree model
##  Numeric variables: 7
##  Categorical variables: 6
## 
## Consists of 369 clusters, spread across 48 tree branches&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can further explore the detected outliers using a histogram and a boxplot. Let’s do so for the variable creatinine.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# histogram
hist(heartfailure$creatinine, breaks = 50, col = &amp;quot;navy&amp;quot;,
     xlab = &amp;quot;Creatinine&amp;quot;, 
     main = &amp;quot;Creatinine level&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# boxplot
boxplot(heartfailure$creatinine)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/data-exploration-in-r/index.en_files/figure-html/unnamed-chunk-19-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Probably in the future I will delve into outlier detection and related R packages in more detail. If I ever write a post about it, I will link it here.&lt;/p&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;These are some useful packages that I have found. I may edit this post in the future to add more data exploration packages. Furthermore, there are Shiny apps for data exploration as well, though I think it is better to stick with a coded approach in data analysis/exploration. Thus, I did not explore those apps in this post. Another thing to remember is to set the variable types accordingly prior to the data exploration.&lt;/p&gt;
&lt;p&gt;Hope this is useful!&lt;/p&gt;
&lt;p&gt;References:&lt;br /&gt;
&lt;a href=&#34;https://github.com/ekstroem/dataMaid&#34; class=&#34;uri&#34;&gt;https://github.com/ekstroem/dataMaid&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://finnstats.com/index.php/2021/05/04/exploratory-data-analysis/&#34; class=&#34;uri&#34;&gt;https://finnstats.com/index.php/2021/05/04/exploratory-data-analysis/&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html&#34; class=&#34;uri&#34;&gt;https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://cran.r-project.org/web/packages/outliertree/vignettes/Introducing_OutlierTree.html&#34; class=&#34;uri&#34;&gt;https://cran.r-project.org/web/packages/outliertree/vignettes/Introducing_OutlierTree.html&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>A summary of forcats package</title>
      <link>https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/</link>
      <pubDate>Tue, 18 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;img src=&#34;forcats_logo.png&#34; width=&#34;30%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I just watched a &lt;a href=&#34;https://youtu.be/qWYgNjnHNWI&#34;&gt;youtube video by Andrew Couch&lt;/a&gt; about his commonly used functions in the readr, stringr, and forcats packages. Although I have used the forcats package before, I realised that I have not fully utilised all of its functions.&lt;/p&gt;
&lt;p&gt;So, in this post, I have summarised the main forcats functions that I find useful in my day-to-day R coding. Basically, it is more of a note to myself.&lt;/p&gt;
&lt;div id=&#34;main-functions&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Main functions&lt;/h2&gt;
&lt;p&gt;We will use the &lt;a href=&#34;https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars&#34;&gt;mtcars data&lt;/a&gt; to demonstrate each function. forcats is part of the tidyverse, so it will load once we load the tidyverse packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
glimpse(mtcars)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Rows: 32
## Columns: 11
## $ mpg  &amp;lt;dbl&amp;gt; 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl  &amp;lt;dbl&amp;gt; 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp &amp;lt;dbl&amp;gt; 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   &amp;lt;dbl&amp;gt; 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat &amp;lt;dbl&amp;gt; 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt   &amp;lt;dbl&amp;gt; 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec &amp;lt;dbl&amp;gt; 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   &amp;lt;dbl&amp;gt; 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am   &amp;lt;dbl&amp;gt; 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear &amp;lt;dbl&amp;gt; 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb &amp;lt;dbl&amp;gt; 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 9 forcats functions that I think are very useful.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;factor()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;factor()&lt;/code&gt; changes a variable’s type into a factor or categorical type.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mtcars$carb &amp;lt;- factor(mtcars$carb)
glimpse(mtcars)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Rows: 32
## Columns: 11
## $ mpg  &amp;lt;dbl&amp;gt; 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl  &amp;lt;dbl&amp;gt; 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp &amp;lt;dbl&amp;gt; 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   &amp;lt;dbl&amp;gt; 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat &amp;lt;dbl&amp;gt; 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt   &amp;lt;dbl&amp;gt; 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec &amp;lt;dbl&amp;gt; 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   &amp;lt;dbl&amp;gt; 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am   &amp;lt;dbl&amp;gt; 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear &amp;lt;dbl&amp;gt; 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb &amp;lt;fct&amp;gt; 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_inorder()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function sorts factor levels based on the order of appearance in the dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_inorder(mtcars$carb) # levels based on the order of appearance&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 4 1 2 3 6 8&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_infreq()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function sorts factor levels based on the frequency of values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_count(mtcars$carb) # this is forcats function as well, count factor level&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   f         n
##   &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt;
## 1 1         7
## 2 2        10
## 3 3         3
## 4 4        10
## 5 6         1
## 6 8         1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_infreq(mtcars$carb) # levels based on the frequency values&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 2 4 1 3 6 8&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;4&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_relevel()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function can be used to change the order manually.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_relevel(mtcars$carb, c(&amp;quot;8&amp;quot;, &amp;quot;6&amp;quot;, &amp;quot;4&amp;quot;, &amp;quot;3&amp;quot;, &amp;quot;2&amp;quot;, &amp;quot;1&amp;quot;)) # manually changed new levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 8 6 4 3 2 1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_relevel()&lt;/code&gt; can also be used to move a single factor level.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_relevel(mtcars$carb, &amp;quot;8&amp;quot;, after = 2) # change level 8 to the third place&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 1 2 8 3 4 6&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;5&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_reorder()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function changes the order based on another variable. Let’s change the levels of the variable carb based on the values of the variable disp.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;levels(mtcars$carb) # original levels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;1&amp;quot; &amp;quot;2&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;6&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_reorder(mtcars$carb, mtcars$disp, .fun = sum, .desc = TRUE) # new level based on disp value&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 4 2 1 3 8 6&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mtcars %&amp;gt;% 
  group_by(carb) %&amp;gt;% 
  summarise(sum_disp = sum(disp)) %&amp;gt;% 
  arrange(desc(sum_disp)) # this is basically what we do with fct_reorder() above&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   carb  sum_disp
##   &amp;lt;fct&amp;gt;    &amp;lt;dbl&amp;gt;
## 1 4        3088.
## 2 2        2082.
## 3 1         940.
## 4 3         827.
## 5 8         301 
## 6 6         145&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Additionally, &lt;code&gt;fct_reorder()&lt;/code&gt; can be used when plotting.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Original plot
ggplot(mtcars, aes(x = carb, y = disp)) +
  geom_col()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/index.en_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Plot with changed levels
mtcars %&amp;gt;% 
  mutate(carb = fct_reorder(carb, disp, .fun = sum, .desc = TRUE)) %&amp;gt;% 
  ggplot(aes(x = carb, y = disp)) +
  geom_col()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-summary-of-forcats-package/index.en_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;ol start=&#34;6&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_lump()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function lumps factor levels into a combined level. There are 5 variants of this function:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fct_lump()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_lump_min()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_lump_n()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_lump_lowfreq()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The remaining variant is &lt;code&gt;fct_lump_prop()&lt;/code&gt;. It is not in the examples below, as I have not found it useful, at least in my current R coding routine.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fct_lump()&lt;/code&gt; automatically lumps the small-frequency factor groups into one group.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_count(mtcars$carb) # this is forcats function as well, count factor level&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   f         n
##   &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt;
## 1 1         7
## 2 2        10
## 3 3         3
## 4 4        10
## 5 6         1
## 6 8         1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fct_lump(mtcars$carb) %&amp;gt;% fct_count() &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 4 x 2
##   f         n
##   &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt;
## 1 1         7
## 2 2        10
## 3 4        10
## 4 Other     5&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_lump_min()&lt;/code&gt; lumps factor groups that appear fewer times than the given minimum into one group.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_lump_min(mtcars$carb, min = 2)) # group 6 and 8 lump into one group&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     1     2     3     4 Other 
##     7    10     3    10     2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_lump_n()&lt;/code&gt; lumps all levels except the &lt;em&gt;n&lt;/em&gt; most frequent factor groups.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_lump_n(mtcars$carb, n = 2)) # 2 frequent group only, others in one group&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     2     4 Other 
##    10    10    12&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;fct_lump_lowfreq()&lt;/code&gt; lumps the least frequent groups into one group, while ensuring that the lumped group is still the smallest.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_lump_lowfreq(mtcars$carb, other_level = &amp;quot;low&amp;quot;)) # group low is still the smallest&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##   1   2   4 low 
##   7  10  10   5&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;7&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_other()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;fct_other()&lt;/code&gt; is much like &lt;code&gt;fct_lump()&lt;/code&gt;, except that we manually choose which factor groups are combined.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_other(mtcars$carb, keep = c(&amp;quot;8&amp;quot;, &amp;quot;6&amp;quot;))) &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     6     8 Other 
##     1     1    30&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;8&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_recode()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function is used to rename or relabel factor groups.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_recode(mtcars$carb, hanis = &amp;quot;8&amp;quot;)) &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##     1     2     3     4     6 hanis 
##     7    10     3    10     1     1&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;9&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;fct_relabel()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;fct_relabel()&lt;/code&gt; is extremely useful if we want to rename quite a number of factor groups.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(mtcars$carb) # original groups&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  1  2  3  4  6  8 
##  7 10  3 10  1  1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_relabel(mtcars$carb, ~ c(&amp;quot;abu&amp;quot;, &amp;quot;ali&amp;quot;, &amp;quot;chong&amp;quot;, &amp;quot;siti&amp;quot;, &amp;quot;krish&amp;quot;, &amp;quot;lee&amp;quot;))) # new named groups&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##   abu   ali chong  siti krish   lee 
##     7    10     3    10     1     1&lt;/code&gt;&lt;/pre&gt;
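&lt;p&gt;Since &lt;code&gt;fct_relabel()&lt;/code&gt; accepts a function, we can also relabel all groups programmatically instead of typing out a vector of names. A small sketch:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(fct_relabel(mtcars$carb, ~ paste0(&amp;quot;carb_&amp;quot;, .x))) # add a prefix to every level&lt;/code&gt;&lt;/pre&gt;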
&lt;p&gt;Reference:&lt;br /&gt;
&lt;a href=&#34;https://forcats.tidyverse.org/index.html&#34; class=&#34;uri&#34;&gt;https://forcats.tidyverse.org/index.html&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Handling imbalanced data</title>
      <link>https://tengkuhanis.netlify.app/post/handling-imbalanced-data/</link>
      <pubDate>Fri, 14 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/handling-imbalanced-data/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/handling-imbalanced-data/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;overview&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;Imbalanced data happens when there is an unequal distribution of classes within a categorical outcome variable. Imbalanced data occurs for several reasons, such as a biased sampling method or measurement errors. However, the imbalance may also be an inherent characteristic of the data; for example, in a rare-disease predictive model the imbalance is expected.&lt;/p&gt;
&lt;p&gt;Generally, there are two types of imbalance problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slight imbalance: the imbalance is small, like 4:6&lt;/li&gt;
&lt;li&gt;Severe imbalance: the imbalance is large, like 1:100 or more&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Slightly imbalanced cases are usually not a concern, while severely imbalanced cases require a more specialised method to build a predictive model.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-problem&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The problem&lt;/h2&gt;
&lt;p&gt;What’s the problem with imbalanced data?&lt;br /&gt;
Firstly, a predictive model built on imbalanced data is biased towards the majority class. The minority class becomes harder to predict, as there are few data points from this class. So, the detection rate for the minority class will be very low.
Secondly, accuracy is not a good measure in this case. We may get a good accuracy, but in reality the accuracy does not reflect the unequal distribution of the data. This is known as the &lt;a href=&#34;https://en.wikipedia.org/wiki/Accuracy_paradox&#34;&gt;accuracy paradox&lt;/a&gt;. Imagine we have 90% of the data belonging to the majority class, while the remaining 10% belong to the minority class. Just by predicting every observation as the majority class, the model can easily get 90% accuracy.&lt;/p&gt;
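&lt;p&gt;A quick simulation in base R illustrates this paradox (the 90:10 split here is made up for illustration):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up 90:10 outcome to illustrate the accuracy paradox
outcome &amp;lt;- factor(c(rep(&amp;quot;No&amp;quot;, 90), rep(&amp;quot;Yes&amp;quot;, 10)))

# A useless model that always predicts the majority class
pred &amp;lt;- factor(rep(&amp;quot;No&amp;quot;, 100), levels = levels(outcome))

mean(pred == outcome) # 0.9 accuracy, yet not a single &amp;quot;Yes&amp;quot; detected&lt;/code&gt;&lt;/pre&gt;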
&lt;/div&gt;
&lt;div id=&#34;handling-approach&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Handling approach&lt;/h2&gt;
&lt;p&gt;The easiest approach is to collect more data, though this may not be practical in every situation. Fortunately, there are a few machine learning techniques available to tackle this problem.&lt;/p&gt;
&lt;p&gt;Here is a summary of resampling techniques available in &lt;code&gt;themis&lt;/code&gt; package.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;method-themis.png&#34; width=&#34;90%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The over-sampling approach is preferred when the dataset is small. The under-sampling approach can be used when the dataset is large, though it may lead to a loss of information. Additionally, an ensemble technique such as random forest is said to be able to model imbalanced data, though some references/blogs say otherwise.&lt;/p&gt;
&lt;p&gt;So, we are going to compare four over-sampling techniques (upsample, SMOTE, ADASYN, and ROSE) and three under-sampling techniques (downsample, nearmiss, and tomek). The base model, a decision tree, will be used with all the techniques. For the sake of simplicity, the decision trees will not be extensively hyperparameter-tuned. Additionally, a random forest will also be included in the comparison.&lt;/p&gt;
&lt;p&gt;The dataset is from &lt;a href=&#34;https://raw.githubusercontent.com/finnstats/finnstats/main/binary.csv&#34;&gt;here&lt;/a&gt;. This is a summary of the dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(df)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  admit        gre             gpa        rank   
##  0:273   Min.   :220.0   Min.   :2.260   1: 61  
##  1:127   1st Qu.:520.0   1st Qu.:3.130   2:151  
##          Median :580.0   Median :3.395   3:121  
##          Mean   :587.7   Mean   :3.390   4: 67  
##          3rd Qu.:660.0   3rd Qu.:3.670          
##          Max.   :800.0   Max.   :4.000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we can see from the summary, the variable admit is moderately imbalanced, with roughly a 1:2 ratio (127 vs 273).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(df, aes(admit)) + 
  geom_bar() +
  theme_bw()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/handling-imbalanced-data/index.en_files/figure-html/barplot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Below is the code for each model.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
Show code
&lt;/summary&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Packages
library(tidyverse)
library(magrittr)
library(tidymodels)
library(themis)

# Data
df &amp;lt;- read.csv(&amp;quot;https://raw.githubusercontent.com/finnstats/finnstats/main/binary.csv&amp;quot;)

# Split data
set.seed(1234)
df_split &amp;lt;- initial_split(df)
df_train &amp;lt;- training(df_split)
df_test &amp;lt;- testing(df_split)

# 1) Decision tree ----

# Recipe
dt_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank)

df_train_rec &amp;lt;- 
  dt_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)
  
df_test_rec &amp;lt;- 
  dt_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv &amp;lt;- vfold_cv(df_train_rec)

# Tune and finalize workflow
## Specify model
dt_mod &amp;lt;- 
  decision_tree(
    cost_complexity = tune(),
    tree_depth = tune(),
    min_n = tune()
  ) %&amp;gt;% 
  set_engine(&amp;quot;rpart&amp;quot;) %&amp;gt;% 
  set_mode(&amp;quot;classification&amp;quot;)

## Specify workflow
dt_wf &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune &amp;lt;- 
  dt_wf %&amp;gt;% 
  tune_grid(resamples = df_cv,
            metrics = metric_set(accuracy))

## Select best model
best_tune &amp;lt;- dt_tune %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final &amp;lt;- 
  dt_wf %&amp;gt;% 
  finalize_workflow(best_tune)

# Fit on train data
dt_train &amp;lt;- 
  dt_wf_final %&amp;gt;% 
  fit(data = df_train_rec)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train, new_data = df_test_rec)) %&amp;gt;% 
  rename(pred = .pred_class)

# 2) Oversampling ----
## step_upsample() ----

# Recipe
up_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_upsample(admit,
                seed = 1234)

df_train_up &amp;lt;- 
  up_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_up &amp;lt;- 
  up_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_up &amp;lt;- vfold_cv(df_train_up)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_up &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_up &amp;lt;- 
  dt_wf_up %&amp;gt;% 
  tune_grid(resamples = df_cv_up,
            metrics = metric_set(accuracy))

## Select best model
best_tune_up &amp;lt;- dt_tune_up %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_up &amp;lt;- 
  dt_wf_up %&amp;gt;% 
  finalize_workflow(best_tune_up)

# Fit on train data
dt_train_up &amp;lt;- 
  dt_wf_final_up %&amp;gt;% 
  fit(data = df_train_up)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_up, new_data = df_test_rec_up)) %&amp;gt;% 
  rename(pred_up = .pred_class)

## step_smote() ----

# Recipe
smote_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_smote(admit, 
             seed = 1234)

df_train_smote &amp;lt;- 
  smote_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_smote &amp;lt;- 
  smote_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_smote &amp;lt;- vfold_cv(df_train_smote)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_smote &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_smote &amp;lt;- 
  dt_wf_smote %&amp;gt;% 
  tune_grid(resamples = df_cv_smote,
            metrics = metric_set(accuracy))

## Select best model
best_tune_smote &amp;lt;- dt_tune_smote %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_smote &amp;lt;- 
  dt_wf_smote %&amp;gt;% 
  finalize_workflow(best_tune_smote)

# Fit on train data
dt_train_smote &amp;lt;- 
  dt_wf_final_smote %&amp;gt;% 
  fit(data = df_train_smote)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_smote, new_data = df_test_rec_smote)) %&amp;gt;% 
  rename(pred_smote = .pred_class)

## step_rose() ----

# Recipe
rose_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_rose(admit, 
             seed = 1234)

df_train_rose &amp;lt;- 
  rose_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_rose &amp;lt;- 
  rose_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_rose &amp;lt;- vfold_cv(df_train_rose)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_rose &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_rose &amp;lt;- 
  dt_wf_rose %&amp;gt;% 
  tune_grid(resamples = df_cv_rose,
            metrics = metric_set(accuracy))

## Select best model
best_tune_rose &amp;lt;- dt_tune_rose %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_rose &amp;lt;- 
  dt_wf_rose %&amp;gt;% 
  finalize_workflow(best_tune_rose)

# Fit on train data
dt_train_rose &amp;lt;- 
  dt_wf_final_rose %&amp;gt;% 
  fit(data = df_train_rose)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_rose, new_data = df_test_rec_rose)) %&amp;gt;% 
  rename(pred_rose = .pred_class)

## step_adasyn() ----

# Recipe
adasyn_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_adasyn(admit, 
            seed = 1234)

df_train_adasyn &amp;lt;- 
  adasyn_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_adasyn &amp;lt;- 
  adasyn_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_adasyn &amp;lt;- vfold_cv(df_train_adasyn)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_adasyn &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_adasyn &amp;lt;- 
  dt_wf_adasyn %&amp;gt;% 
  tune_grid(resamples = df_cv_adasyn,
            metrics = metric_set(accuracy))

## Select best model
best_tune_adasyn &amp;lt;- dt_tune_adasyn %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_adasyn &amp;lt;- 
  dt_wf_adasyn %&amp;gt;% 
  finalize_workflow(best_tune_adasyn)

# Fit on train data
dt_train_adasyn &amp;lt;- 
  dt_wf_final_adasyn %&amp;gt;% 
  fit(data = df_train_adasyn)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_adasyn, new_data = df_test_rec_adasyn)) %&amp;gt;% 
  rename(pred_adasyn = .pred_class)

# 3) Undersampling ----
## step_downsample() ----

# Recipe
down_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_downsample(admit,
                seed = 1234)

df_train_down &amp;lt;- 
  down_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_down &amp;lt;- 
  down_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_down &amp;lt;- vfold_cv(df_train_down)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_down &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_down &amp;lt;- 
  dt_wf_down %&amp;gt;% 
  tune_grid(resamples = df_cv_down,
            metrics = metric_set(accuracy))

## Select best model
best_tune_down &amp;lt;- dt_tune_down %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_down &amp;lt;- 
  dt_wf_down %&amp;gt;% 
  finalize_workflow(best_tune_down)

# Fit on train data
dt_train_down &amp;lt;- 
  dt_wf_final_down %&amp;gt;% 
  fit(data = df_train_down)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_down, new_data = df_test_rec_down)) %&amp;gt;% 
  rename(pred_down = .pred_class)

## step_nearmiss() ----

# Recipe
nearmiss_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_nearmiss(admit)

df_train_nearmiss &amp;lt;- 
  nearmiss_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_nearmiss &amp;lt;- 
  nearmiss_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_nearmiss &amp;lt;- vfold_cv(df_train_nearmiss)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_nearmiss &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_nearmiss &amp;lt;- 
  dt_wf_nearmiss %&amp;gt;% 
  tune_grid(resamples = df_cv_nearmiss,
            metrics = metric_set(accuracy))

## Select best model
best_tune_nearmiss &amp;lt;- dt_tune_nearmiss %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_nearmiss &amp;lt;- 
  dt_wf_nearmiss %&amp;gt;% 
  finalize_workflow(best_tune_nearmiss)

# Fit on train data
dt_train_nearmiss &amp;lt;- 
  dt_wf_final_nearmiss %&amp;gt;% 
  fit(data = df_train_nearmiss)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_nearmiss, new_data = df_test_rec_nearmiss)) %&amp;gt;% 
  rename(pred_nearmiss = .pred_class)

## step_tomek() ----

# Recipe
tomek_rec &amp;lt;- 
  recipe(admit ~., data = df_train) %&amp;gt;% 
  step_mutate_at(c(&amp;quot;admit&amp;quot;, &amp;quot;rank&amp;quot;), fn = as_factor) %&amp;gt;% 
  step_dummy(rank) %&amp;gt;% 
  step_tomek(admit)

df_train_tomek &amp;lt;- 
  tomek_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = NULL)

df_test_rec_tomek &amp;lt;- 
  tomek_rec %&amp;gt;% 
  prep() %&amp;gt;% 
  bake(new_data = df_test)

## 10-folds CV
set.seed(1234)
df_cv_tomek &amp;lt;- vfold_cv(df_train_tomek)

# Tune and finalize workflow
## Specify model
# same as before

## Specify workflow
dt_wf_tomek &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(dt_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
dt_tune_tomek &amp;lt;- 
  dt_wf_tomek %&amp;gt;% 
  tune_grid(resamples = df_cv_tomek,
            metrics = metric_set(accuracy))

## Select best model
best_tune_tomek &amp;lt;- dt_tune_tomek %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
dt_wf_final_tomek &amp;lt;- 
  dt_wf_tomek %&amp;gt;% 
  finalize_workflow(best_tune_tomek)

# Fit on train data
dt_train_tomek &amp;lt;- 
  dt_wf_final_tomek %&amp;gt;% 
  fit(data = df_train_tomek)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(dt_train_tomek, new_data = df_test_rec_tomek)) %&amp;gt;% 
  rename(pred_tomek = .pred_class)

# 4) Ensemble approach: random forest ----

## 10-folds CV
set.seed(1234)
df_cv &amp;lt;- vfold_cv(df_train_rec)

# Tune and finalize workflow
## Specify model
rf_mod &amp;lt;- rand_forest(
 mtry = tune(),
 trees = tune(),
 min_n = tune()
 ) %&amp;gt;% 
  set_engine(&amp;quot;ranger&amp;quot;) %&amp;gt;% 
  set_mode(&amp;quot;classification&amp;quot;)

## Specify workflow
rf_wf &amp;lt;- 
  workflow() %&amp;gt;% 
  add_model(rf_mod) %&amp;gt;% 
  add_formula(admit ~.)

## Tune model
set.seed(1234)
rf_tune &amp;lt;- 
  rf_wf %&amp;gt;% 
  tune_grid(resamples = df_cv,
            metrics = metric_set(accuracy))

## Select best model
best_tune &amp;lt;- rf_tune %&amp;gt;% select_best(&amp;quot;accuracy&amp;quot;)

## Finalize workflow
rf_wf_final &amp;lt;- 
  rf_wf %&amp;gt;% 
  finalize_workflow(best_tune)

# Fit on train data
rf_train &amp;lt;- 
  rf_wf_final %&amp;gt;% 
  fit(data = df_train_rec)

# Fit on test data and get accuracy
df_test  %&amp;lt;&amp;gt;%  
  bind_cols(predict(rf_train, new_data = df_test_rec)) %&amp;gt;% 
  rename(pred_rf = .pred_class)&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;p&gt;Now, let’s get the accuracy, sensitivity, specificity, and &lt;a href=&#34;https://en.wikipedia.org/wiki/Matthews_correlation_coefficient#Advantages_of_MCC_over_accuracy_and_F1_score&#34;&gt;Matthews Correlation Coefficient (MCC)&lt;/a&gt; for each model.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
Show code
&lt;/summary&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Get all measurements
df_test$admit %&amp;lt;&amp;gt;% as_factor()
pred_col &amp;lt;- colnames(df_test)[5:13]
result &amp;lt;- vector(&amp;quot;list&amp;quot;, 0)
sensi &amp;lt;- vector(&amp;quot;list&amp;quot;, 0)
specif &amp;lt;- vector(&amp;quot;list&amp;quot;, 0)
mathew &amp;lt;- vector(&amp;quot;list&amp;quot;, 0)

for (i in seq_along(pred_col)) {
  # accuracy
  result[[i]] &amp;lt;-
    df_test %&amp;gt;% 
    accuracy(admit, df_test[,pred_col[i]])
  
  # sensitivity
  sensi[[i]] &amp;lt;-
    df_test %&amp;gt;% 
    sensitivity(admit, df_test[,pred_col[i]])
  
  # specificity
  specif[[i]] &amp;lt;-
    df_test %&amp;gt;% 
    specificity(admit, df_test[,pred_col[i]])
  
  # MCC
  mathew[[i]] &amp;lt;-
    df_test %&amp;gt;% 
    mcc(admit, df_test[,pred_col[i]])
}

## Turn into dataframe
result  %&amp;lt;&amp;gt;%  
  enframe() %&amp;gt;% 
  unnest(cols = c(&amp;quot;value&amp;quot;)) %&amp;gt;% 
  rename(model = name, 
         accuracy = .estimate) %&amp;gt;% 
  select(model, accuracy) %&amp;gt;% 
  mutate(model = factor(model,labels = 
                          c(
                            &amp;quot;1&amp;quot; = &amp;quot;base&amp;quot;,
                            &amp;quot;2&amp;quot; = &amp;quot;upsample&amp;quot;,
                            &amp;quot;3&amp;quot; = &amp;quot;smote&amp;quot;,
                            &amp;quot;4&amp;quot; = &amp;quot;rose&amp;quot;,
                            &amp;quot;5&amp;quot; = &amp;quot;adasyn&amp;quot;,
                            &amp;quot;6&amp;quot; = &amp;quot;downsample&amp;quot;,
                            &amp;quot;7&amp;quot; = &amp;quot;nearmiss&amp;quot;,
                            &amp;quot;8&amp;quot; = &amp;quot;tomek&amp;quot;,
                            &amp;quot;9&amp;quot; = &amp;quot;random_forest&amp;quot;
                            )
                        ))

sensi  %&amp;lt;&amp;gt;%  
  enframe() %&amp;gt;% 
  unnest(cols = c(&amp;quot;value&amp;quot;))

specif %&amp;lt;&amp;gt;% 
  enframe() %&amp;gt;% 
  unnest(cols = c(&amp;quot;value&amp;quot;))

mathew %&amp;lt;&amp;gt;% 
  enframe() %&amp;gt;% 
  unnest(cols = c(&amp;quot;value&amp;quot;))

result %&amp;lt;&amp;gt;% 
  bind_cols(sensitive = sensi$.estimate, specific = specif$.estimate, mathew = mathew$.estimate)

# Plot the result
result %&amp;gt;% 
  pivot_longer(cols = 2:5, names_to = &amp;quot;measure&amp;quot;) %&amp;gt;% 
  ggplot(aes(x = model, y = value, fill = measure)) +
  geom_bar(position = &amp;quot;dodge&amp;quot;, stat = &amp;quot;identity&amp;quot;) +
  theme_bw() +
  coord_flip() +
  geom_text(aes(label = paste0(round(value*100, digits = 1), &amp;quot;%&amp;quot;)), 
            position = position_dodge(0.9), vjust = 0.3, size = 2.7, hjust = -0.1) +
  labs(title = &amp;quot;Comparison of unbalanced data techniques&amp;quot;, 
       x = &amp;quot;Techniques&amp;quot;, 
       y = &amp;quot;Performance&amp;quot;) +
  scale_fill_discrete(name = &amp;quot;Metrics:&amp;quot;,
                      labels = c(&amp;quot;Accuracy&amp;quot;, &amp;quot;MCC&amp;quot;, &amp;quot;Sensitivity&amp;quot;, &amp;quot;Specificity&amp;quot;)) +
  theme(legend.position = &amp;quot;bottom&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/handling-imbalanced-data/index.en_files/figure-html/summary-measure2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can see from the plot above that the base model (decision tree) clearly has a low detection rate for the minority class (low specificity). All of the methods are able to increase the specificity, while sacrificing some accuracy and sensitivity. As mentioned earlier, accuracy is not a good metric for this kind of model (i.e., the accuracy paradox). MCC, on the other hand, takes into account all four cells of the confusion matrix: true positives, false positives, true negatives, and false negatives. Hence, MCC is more informative than accuracy (and than the F score, which has not been included in the plot for the sake of simplicity).&lt;/p&gt;
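&lt;p&gt;To make the contrast concrete, here is a toy example (the confusion-matrix numbers are made up purely for illustration, not taken from the models above) of MCC versus accuracy on an imbalanced problem:&lt;/p&gt;

```r
# Toy confusion matrix for an imbalanced problem (illustrative numbers only):
# 90 majority-class cases, 10 minority-class cases.
tp <- 2;  fn <- 8   # minority class: only 2 of 10 detected
tn <- 85; fp <- 5   # majority class: 85 of 90 correct

accuracy <- (tp + tn) / (tp + tn + fp + fn)
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

round(accuracy, 2)  # 0.87 -- looks good despite missing most of the minority class
round(mcc, 2)       # 0.17 -- reflects the poor minority-class detection
```

A high accuracy here is driven almost entirely by the majority class, while MCC stays low because the minority class is mostly missed.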
&lt;p&gt;Based on MCC, specificity, and sensitivity, the downsample approach probably gives the most balanced model. However, this does not mean that downsampling is the best technique overall, as I believe each technique behaves differently from one dataset to another.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;https://themis.tidymodels.org/reference/index.html&#34; class=&#34;uri&#34;&gt;https://themis.tidymodels.org/reference/index.html&lt;/a&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/&#34; class=&#34;uri&#34;&gt;https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/&lt;/a&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6413-7&#34; class=&#34;uri&#34;&gt;https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6413-7&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Exponentially Weighted Average in Deep Learning</title>
      <link>https://tengkuhanis.netlify.app/post/exponentially-weighted-average-in-deep-learning/</link>
      <pubDate>Sun, 09 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/exponentially-weighted-average-in-deep-learning/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/exponentially-weighted-average-in-deep-learning/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;I have been reading about loss functions and optimisers in deep learning for the last couple of days, when I stumbled upon the term Exponentially Weighted Average (EWA). So, in this post I aim to explain my understanding of EWA.&lt;/p&gt;
&lt;div id=&#34;overview-of-ewa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Overview of EWA&lt;/h2&gt;
&lt;p&gt;EWA is an important concept in deep learning and has been used in several optimisers to smooth out the noise in the data.&lt;/p&gt;
&lt;p&gt;Let’s see the formula for EWA:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;formula.png&#34; width=&#34;60%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;V&lt;sub&gt;t&lt;/sub&gt;&lt;/em&gt; is the smoothed value at point &lt;em&gt;t&lt;/em&gt;, while &lt;em&gt;S&lt;sub&gt;t&lt;/sub&gt;&lt;/em&gt; is the data point at point &lt;em&gt;t&lt;/em&gt;. &lt;em&gt;B&lt;/em&gt; here is a hyperparameter that we need to tune in our network. So, the choice of &lt;em&gt;B&lt;/em&gt; determines how many data points we effectively average over when computing &lt;em&gt;V&lt;sub&gt;t&lt;/sub&gt;&lt;/em&gt;, as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;beta.png&#34; width=&#34;80%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
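&lt;p&gt;As a quick side illustration (my own sketch, not from the references), the recursion &lt;em&gt;V&lt;sub&gt;t&lt;/sub&gt;&lt;/em&gt; = &lt;em&gt;B&lt;/em&gt;·&lt;em&gt;V&lt;sub&gt;t−1&lt;/sub&gt;&lt;/em&gt; + (1 − &lt;em&gt;B&lt;/em&gt;)·&lt;em&gt;S&lt;sub&gt;t&lt;/sub&gt;&lt;/em&gt; can be computed directly in R:&lt;/p&gt;

```r
# My own sketch: the EWA recursion in plain R.
ewa <- function(s, beta, v0 = 0) {
  v <- numeric(length(s))
  v_prev <- v0
  for (t in seq_along(s)) {
    v_prev <- beta * v_prev + (1 - beta) * s[t]  # V_t = B*V_(t-1) + (1-B)*S_t
    v[t] <- v_prev
  }
  v
}

s <- c(10, 12, 9, 11)
ewa(s, beta = 0.9)  # heavily smoothed, moves slowly towards the data
ewa(s, beta = 0.1)  # barely smoothed, tracks the data closely
```

A large &lt;em&gt;B&lt;/em&gt; produces a smooth, slow-moving average; a small &lt;em&gt;B&lt;/em&gt; follows the raw data almost exactly.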
&lt;/div&gt;
&lt;div id=&#34;ewa-in-deep-learnings-optimiser&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;EWA in deep learnings’ optimiser&lt;/h2&gt;
&lt;p&gt;So, some of the optimisers that adopt the approach of EWA are (red box indicates the EWA part in each formula):&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Stochastic gradient descent (SGD) with momentum&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The issue with SGD is the presence of noise while searching for the global minimum. So, SGD with momentum integrates the EWA, which reduces this noise and helps the network converge faster.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;SGD-momentum2.png&#34; width=&#34;80%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
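&lt;p&gt;As a minimal sketch (my own toy example, not the reference’s notation), the velocity term in SGD with momentum is an EWA of the past gradients:&lt;/p&gt;

```r
# Sketch of SGD with momentum minimising f(w) = w^2 (gradient: 2w).
# The velocity v is an exponentially weighted average of past gradients.
grad <- function(w) 2 * w

w <- 5; v <- 0
beta <- 0.9; lr <- 0.1
for (step in 1:100) {
  v <- beta * v + (1 - beta) * grad(w)  # EWA of gradients
  w <- w - lr * v                       # parameter update
}
w  # close to the minimum at 0
```

Because the velocity averages over past gradients, short-term oscillations in the gradient largely cancel out while the consistent downhill direction accumulates.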
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Adaptive delta (Adadelta) and Root Mean Square Propagation (RMSprop)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Adadelta and RMSprop were proposed in an attempt to solve the diminishing learning rate issue of the adaptive gradient (Adagrad) optimiser. The use of EWA in both optimisers helps achieve this. The two optimisers have quite similar formulas; attached below is the formula for Adadelta.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;adadelta2.png&#34; width=&#34;80%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Adaptive moment estimation (ADAM)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;ADAM basically combines SGD with momentum and Adadelta. As shown earlier, both of these optimisers use EWA.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;more-details-on-ewa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;More details on EWA&lt;/h2&gt;
&lt;p&gt;Now, let’s go back to EWA. Here is an example of calculating EWA:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;seq1.png&#34; width=&#34;90%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Keep in mind that &lt;em&gt;t&lt;sub&gt;3&lt;/sub&gt;&lt;/em&gt; is the latest time point, preceded by &lt;em&gt;t&lt;sub&gt;2&lt;/sub&gt;&lt;/em&gt; and &lt;em&gt;t&lt;sub&gt;1&lt;/sub&gt;&lt;/em&gt;, respectively. So, if we want to calculate &lt;em&gt;V&lt;sub&gt;3&lt;/sub&gt;&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;seq2.png&#34; width=&#34;90%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, if we want to see how varying the value of &lt;em&gt;B&lt;/em&gt; changes the weights across the equation (while the values of &lt;em&gt;a&lt;sub&gt;1&lt;/sub&gt;…a&lt;sub&gt;n&lt;/sub&gt;&lt;/em&gt; remain constant), we can do so in R:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse) 

func &amp;lt;- function(b) (1 - b) * b^((20:1) - 1)
beta &amp;lt;- seq(0.1, 0.9, by=0.2)

dat &amp;lt;- t(sapply(beta, func)) %&amp;gt;% 
  as.data.frame()
colnames(dat)[1:20] &amp;lt;- 1:20

dat %&amp;gt;%  
  mutate(beta = as_factor(beta)) %&amp;gt;%
  pivot_longer(cols = 1:20, names_to = &amp;quot;data_point&amp;quot;, values_to = &amp;quot;weight&amp;quot;) %&amp;gt;% 
  ggplot(aes(x=as.numeric(data_point), y=weight, color=beta)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:20) +
  labs(title = &amp;quot;Change of Exponentially Weighted Average function&amp;quot;, 
       subtitle = &amp;quot;Time at t20 is the recent time, and t1 is the initial time&amp;quot;) +
  scale_colour_discrete(&amp;quot;Beta:&amp;quot;) +
  xlab(&amp;quot;Time(t)&amp;quot;) +
  ylab(&amp;quot;Weights/Coefficients&amp;quot;) +
  theme_bw()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/exponentially-weighted-average-in-deep-learning/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that time at t&lt;sub&gt;20&lt;/sub&gt; is the recent time, and t&lt;sub&gt;1&lt;/sub&gt; is the initial time. Thus, two main points from the above plot are:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;The EWA function acts in a decaying manner.&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;As beta, &lt;em&gt;B&lt;/em&gt;, increases, the weights spread over more past data points, so less weight is placed on the most recent point (which always receives a weight of 1 − &lt;em&gt;B&lt;/em&gt;).&lt;/li&gt;
&lt;/ol&gt;
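&lt;p&gt;A quick numeric check of this (my own addition): from the recursion, the most recent data point always enters &lt;em&gt;V&lt;sub&gt;t&lt;/sub&gt;&lt;/em&gt; with a weight of exactly 1 − &lt;em&gt;B&lt;/em&gt;:&lt;/p&gt;

```r
# From V_t = beta * V_(t-1) + (1 - beta) * S_t, the latest point S_t
# enters the average with weight (1 - beta).
beta <- c(0.1, 0.5, 0.9)
recent_weight <- 1 - beta
recent_weight  # 0.9 0.5 0.1: the larger beta is, the less the latest point counts
```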
&lt;p&gt;&lt;em&gt;Side note: I have tried to do the plot in plotly, not sure why it did not work&lt;/em&gt; 😕&lt;/p&gt;
&lt;p&gt;References:&lt;br /&gt;
1) &lt;a href=&#34;https://towardsdatascience.com/deep-learning-optimizers-436171c9e23f&#34; class=&#34;uri&#34;&gt;https://towardsdatascience.com/deep-learning-optimizers-436171c9e23f&lt;/a&gt; (all the equations are from this reference)&lt;br /&gt;
2) &lt;a href=&#34;https://youtu.be/NxTFlzBjS-4&#34; class=&#34;uri&#34;&gt;https://youtu.be/NxTFlzBjS-4&lt;/a&gt;&lt;br /&gt;
3) &lt;a href=&#34;https://medium.com/@dhartidhami/exponentially-weighted-averages-5de212b5be46&#34; class=&#34;uri&#34;&gt;https://medium.com/@dhartidhami/exponentially-weighted-averages-5de212b5be46&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Base R vs tidyverse</title>
      <link>https://tengkuhanis.netlify.app/post/2021-05-04-base-r-vs-tidyverse/</link>
      <pubDate>Tue, 04 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/2021-05-04-base-r-vs-tidyverse/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/2021-05-04-base-r-vs-tidyverse/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;First of all, this write-up is meant for beginners in R.&lt;/p&gt;
&lt;p&gt;Things can be done in many ways in R. In fact, R is very flexible in this regard compared to other statistical software. Basic tasks such as selecting a column, slicing rows, or filtering data based on certain conditions can be done using base R functions. However, all of these can also be done using the tidyverse approach.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;Tidyverse&lt;/a&gt; is basically a collection of packages that can be loaded with a single line of code:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The tidyverse is developed by the RStudio team, pioneered by &lt;a href=&#34;http://hadley.nz/&#34;&gt;Hadley Wickham&lt;/a&gt;, which means these packages will be continuously maintained and updated.&lt;/p&gt;
&lt;p&gt;So, without further ado, here are comparisons between the two approaches for some very basic tasks:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Select or deselect a column and a row&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris[1:5, c(&amp;quot;Sepal.Length&amp;quot;, &amp;quot;Sepal.Width&amp;quot;)]
iris[1:5,c(1,2)] # similar to above
iris[1:5, -1]

# Tidyverse
iris %&amp;gt;% 
  select(Sepal.Length, Sepal.Width) %&amp;gt;% 
  slice(1:5)
iris %&amp;gt;% 
  select(-Sepal.Length) %&amp;gt;% 
  slice(1:5)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Filter based on condition&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris[iris$Species == &amp;quot;setosa&amp;quot;, ]

# Tidyverse
iris %&amp;gt;% 
  filter(Species == &amp;quot;setosa&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Mutate a new variable (transmute, by contrast, keeps only the newly created variables)&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris$SL_minus10 &amp;lt;- iris$Sepal.Length - 10

# Tidyverse
iris %&amp;gt;% 
  mutate(SL_minus10 = Sepal.Length - 10)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;4&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Sort variable&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
iris[order(-iris$Sepal.Width),]

# Tidyverse
iris %&amp;gt;% 
  arrange(desc(Sepal.Width))&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;5&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Group by (and get mean for variable Sepal.Width)&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Not really base R
doBy::summaryBy(Sepal.Width~Species, iris, FUN = mean) 

# Tidyverse
iris %&amp;gt;% 
  group_by(Species) %&amp;gt;% 
  summarise(mean_SW = mean(Sepal.Width))&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;6&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Rename variable&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Base R
colnames(iris)[5] &amp;lt;- &amp;quot;hanis&amp;quot;

# Tidyverse
iris %&amp;gt;% 
  rename(hanis = Species)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, that’s it. Overall, the tidyverse gives clarity in understanding the code, as it reads from left to right. In contrast, base R code reads from the inside out, especially for more complicated code.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Loop vs apply in R</title>
      <link>https://tengkuhanis.netlify.app/post/loop-vs-apply-in-r/</link>
      <pubDate>Tue, 04 May 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/loop-vs-apply-in-r/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/loop-vs-apply-in-r/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;I have heard quite a few times that the apply functions are faster than loops in R. Loops are said to be inefficient, though in certain situations a loop is the only way.&lt;/p&gt;
&lt;p&gt;Let’s compare a &lt;code&gt;for&lt;/code&gt; loop and an apply function in R.&lt;/p&gt;
&lt;p&gt;First, create a very big fake dataset: a list of vectors.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(2021)
xlist &amp;lt;- list(col1 = rnorm(10000000), 
              col2 = rnorm(10000000),
              col3 = rnorm(100000000),
              col4 = rnorm(1000000)) # this will take a few seconds&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, calculate the mean of each vector using a &lt;code&gt;for&lt;/code&gt; loop.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ptm &amp;lt;- proc.time() #-- start the clock

mean_loop &amp;lt;- vector(&amp;quot;list&amp;quot;, 0) # place holder for a value
for (i in seq_along(xlist)) {
  mean_loop[[i]] &amp;lt;- mean(xlist[[i]])
}

proc.time() - ptm #-- stop the clock (time in seconds)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    user  system elapsed 
##    0.38    0.00    0.37&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, using &lt;code&gt;lapply()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ptm &amp;lt;- proc.time() #-- start the clock

mean_apply &amp;lt;- lapply(xlist, mean)

proc.time() - ptm #-- stop the clock&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    user  system elapsed 
##    0.34    0.00    0.35&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, &lt;code&gt;lapply()&lt;/code&gt; is a little bit faster. With a very big dataset and a more complicated task, &lt;code&gt;lapply()&lt;/code&gt; is probably the right choice, but for a &amp;quot;normal&amp;quot;-sized dataset, the choice between the two probably does not make much difference.&lt;/p&gt;
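&lt;p&gt;A side note (my own suggestion): a single &lt;code&gt;proc.time()&lt;/code&gt; run is noisy, so for quick checks base R’s &lt;code&gt;system.time()&lt;/code&gt; is a convenient wrapper, and it is worth confirming that both approaches return identical results (a sketch on a smaller list so it runs quickly):&lt;/p&gt;

```r
# Sketch: time both approaches with system.time() and confirm equal results.
set.seed(2021)
xs <- list(a = rnorm(1e5), b = rnorm(1e5), c = rnorm(1e5))

t_loop <- system.time({
  mean_loop <- vector("list", length(xs))
  for (i in seq_along(xs)) mean_loop[[i]] <- mean(xs[[i]])
})["elapsed"]

t_apply <- system.time(mean_apply <- lapply(xs, mean))["elapsed"]

# Whichever is faster on a given run, the computed means are the same.
identical(unname(unlist(mean_loop)), unname(unlist(mean_apply)))  # TRUE
```

For serious benchmarking, running each expression many times and comparing the distributions of timings is more reliable than any single measurement.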
</description>
    </item>
    
  </channel>
</rss>
