<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>missing data | Tengku Hanis</title>
    <link>https://tengkuhanis.netlify.app/tag/missing-data/</link>
      <atom:link href="https://tengkuhanis.netlify.app/tag/missing-data/index.xml" rel="self" type="application/rss+xml" />
    <description>missing data</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>©Tengku Hanis 2020-2025 Made with [blogdown](https://github.com/rstudio/blogdown)</copyright><lastBuildDate>Tue, 04 Jan 2022 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://tengkuhanis.netlify.app/images/icon_hua2ec155b4296a9c9791d015323e16eb5_11927_512x512_fill_lanczos_center_2.png</url>
      <title>missing data</title>
      <link>https://tengkuhanis.netlify.app/tag/missing-data/</link>
    </image>
    
    <item>
      <title>Stepwise selection after multiple imputation</title>
      <link>https://tengkuhanis.netlify.app/post/stepwise-selection-after-multiple-imputation/</link>
      <pubDate>Tue, 04 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/stepwise-selection-after-multiple-imputation/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/stepwise-selection-after-multiple-imputation/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;some-note&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Some note&lt;/h2&gt;
&lt;p&gt;I have previously written two posts about multiple imputation using the &lt;code&gt;mice&lt;/code&gt; package:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/&#34;&gt;A short note on multiple imputation&lt;/a&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/&#34;&gt;Variable selection for imputation model in {mice}&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is probably my last post about multiple imputation using the &lt;code&gt;mice&lt;/code&gt; package.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;stepwise-selection&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Stepwise selection&lt;/h2&gt;
&lt;p&gt;The general steps in the &lt;code&gt;mice&lt;/code&gt; package are:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;mice()&lt;/code&gt; - impute the NAs&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;with()&lt;/code&gt; - run the analysis (lm, glm, etc)&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pool()&lt;/code&gt; - pool the results&lt;/li&gt;
&lt;/ol&gt;
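&lt;p&gt;As a rough illustration of these three steps, here is a minimal sketch. It uses the &lt;code&gt;nhanes&lt;/code&gt; example data shipped with &lt;code&gt;mice&lt;/code&gt;; the model &lt;code&gt;bmi ~ age + chl&lt;/code&gt;, the seed, and the object names are arbitrary choices made only for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(mice)

# Step 1: impute the NAs (5 imputed datasets)
imp &amp;lt;- mice(nhanes, m = 5, printFlag = FALSE, seed = 1)

# Step 2: fit the same model on each imputed dataset
fit &amp;lt;- with(imp, lm(bmi ~ age + chl))

# Step 3: pool the 5 sets of estimates into one set of results
summary(pool(fit))&lt;/code&gt;&lt;/pre&gt;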
&lt;p&gt;For backward and forward selection, we can do it manually after pooling the results in step 3, but we cannot do this for stepwise selection.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://books.google.com.my/books/about/Development_Implementation_and_Evaluatio.html?id=-Y0TywAACAAJ&amp;amp;redir_esc=y&#34;&gt;Brand (1999)&lt;/a&gt; proposed this solution:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Perform stepwise selection separately on each imputed dataset&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Fit a preliminary model that contains all variables that are present in at least half of the models from step 1&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Apply backward elimination to the variables in the preliminary model (variables are removed one at a time if p &amp;gt; 0.05)&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Repeat step 3 until all variables have p values &amp;lt; 0.05&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So, we are going to apply this solution and use the multivariate Wald test (&lt;code&gt;D1()&lt;/code&gt; in the &lt;code&gt;mice&lt;/code&gt; package) for model comparison instead of the pooled likelihood ratio p value.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;example-in-r&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example in R&lt;/h2&gt;
&lt;p&gt;Load the packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(mice)
library(tidyverse)&lt;/code&gt;&lt;/pre&gt;
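&lt;p&gt;Create a dataset with missing values. We are going to use the famous &lt;code&gt;mtcars&lt;/code&gt; dataset, which is already available in R. We set 10% of the predictor values to missing and keep the outcome, &lt;code&gt;mpg&lt;/code&gt;, complete.&lt;/p&gt;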
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(123)
dat &amp;lt;- 
  mtcars %&amp;gt;% 
  mutate(across(c(vs, am), as.factor)) %&amp;gt;% 
  select(-mpg) %&amp;gt;% 
  missForest::prodNA(0.1) %&amp;gt;% 
  bind_cols(mpg = mtcars$mpg)
summary(dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       cyl             disp             hp             drat      
##  Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
##  1st Qu.:4.000   1st Qu.:120.7   1st Qu.:103.0   1st Qu.:3.150  
##  Median :6.000   Median :225.0   Median :123.0   Median :3.715  
##  Mean   :6.148   Mean   :232.8   Mean   :147.4   Mean   :3.642  
##  3rd Qu.:8.000   3rd Qu.:334.0   3rd Qu.:180.0   3rd Qu.:3.920  
##  Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930  
##  NA&amp;#39;s   :5       NA&amp;#39;s   :1       NA&amp;#39;s   :4       NA&amp;#39;s   :2      
##        wt             qsec          vs        am          gear     
##  Min.   :1.513   Min.   :14.50   0   :17   0   :18   Min.   :3.00  
##  1st Qu.:2.429   1st Qu.:16.88   1   :11   1   :10   1st Qu.:3.00  
##  Median :3.203   Median :17.51   NA&amp;#39;s: 4   NA&amp;#39;s: 4   Median :4.00  
##  Mean   :3.112   Mean   :17.75                       Mean   :3.71  
##  3rd Qu.:3.533   3rd Qu.:18.83                       3rd Qu.:4.00  
##  Max.   :5.424   Max.   :22.90                       Max.   :5.00  
##  NA&amp;#39;s   :4       NA&amp;#39;s   :2                           NA&amp;#39;s   :1     
##       carb            mpg       
##  Min.   :1.000   Min.   :10.40  
##  1st Qu.:2.000   1st Qu.:15.43  
##  Median :2.000   Median :19.20  
##  Mean   :2.667   Mean   :20.09  
##  3rd Qu.:4.000   3rd Qu.:22.80  
##  Max.   :6.000   Max.   :33.90  
##  NA&amp;#39;s   :5&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run &lt;code&gt;mice()&lt;/code&gt; on missing data with 10 imputed datasets (&lt;code&gt;m = 10&lt;/code&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;datImp &amp;lt;- mice(dat, m = 10, printFlag = FALSE, seed = 123)
datImp&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Class: mids
## Number of multiple imputations:  10 
## Imputation methods:
##      cyl     disp       hp     drat       wt     qsec       vs       am 
##    &amp;quot;pmm&amp;quot;    &amp;quot;pmm&amp;quot;    &amp;quot;pmm&amp;quot;    &amp;quot;pmm&amp;quot;    &amp;quot;pmm&amp;quot;    &amp;quot;pmm&amp;quot; &amp;quot;logreg&amp;quot; &amp;quot;logreg&amp;quot; 
##     gear     carb      mpg 
##    &amp;quot;pmm&amp;quot;    &amp;quot;pmm&amp;quot;       &amp;quot;&amp;quot; 
## PredictorMatrix:
##      cyl disp hp drat wt qsec vs am gear carb mpg
## cyl    0    1  1    1  1    1  1  1    1    1   1
## disp   1    0  1    1  1    1  1  1    1    1   1
## hp     1    1  0    1  1    1  1  1    1    1   1
## drat   1    1  1    0  1    1  1  1    1    1   1
## wt     1    1  1    1  0    1  1  1    1    1   1
## qsec   1    1  1    1  1    0  1  1    1    1   1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run stepwise selection on each imputed dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sc &amp;lt;- list(upper = ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb, 
           lower = ~ 1)
expr &amp;lt;- expression(f1 &amp;lt;- lm(mpg ~ 1),
                   f2 &amp;lt;- step(f1, scope = sc, trace = 0))
fit &amp;lt;- with(datImp, expr)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we count how many times each variable was selected across the models produced by stepwise selection.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit$analyses %&amp;gt;% 
  map(formula) %&amp;gt;% #get the formula
  map(terms) %&amp;gt;% #get the terms
  map(labels) %&amp;gt;% #get the name of variables
  unlist() %&amp;gt;% 
  table()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## .
##   am carb  cyl disp drat   hp qsec   vs   wt 
##    7    5    3    2    4    5    3    4    7&lt;/code&gt;&lt;/pre&gt;
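&lt;p&gt;The “at least half” rule can also be applied in code. This is a minimal sketch, with the counts copied by hand from the table above; &lt;code&gt;counts&lt;/code&gt;, &lt;code&gt;n_models&lt;/code&gt;, and &lt;code&gt;keep&lt;/code&gt; are illustrative names.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Selection counts copied from the table above
counts &amp;lt;- c(am = 7, carb = 5, cyl = 3, disp = 2, drat = 4,
            hp = 5, qsec = 3, vs = 4, wt = 7)
n_models &amp;lt;- 10  # one stepwise model per imputed dataset

# Keep the variables selected in at least half of the models
keep &amp;lt;- names(counts)[counts &amp;gt;= n_models / 2]
keep&lt;/code&gt;&lt;/pre&gt;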
&lt;p&gt;We are going to select:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;am&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;carb&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;hp&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;wt&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These variables appear in at least half of the models. We have 10 imputed datasets, and therefore 10 models, so a variable must be selected at least 5 times. Next, we fit a preliminary model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit_full1 &amp;lt;- with(datImp, lm(mpg ~ am + carb + hp + wt))
pool(fit_full1) %&amp;gt;% 
  summary()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          term    estimate  std.error statistic       df      p.value
## 1 (Intercept) 33.33683070 3.30280913 10.093478 15.81838 2.688191e-08
## 2         am1  3.06689135 1.94363342  1.577917 13.06329 1.384846e-01
## 3        carb -0.64791214 0.65564816 -0.988201 11.64959 3.431353e-01
## 4          hp -0.03414274 0.01159828 -2.943777 20.47239 7.895170e-03
## 5          wt -2.39586280 1.22218829 -1.960306 13.54830 7.085513e-02&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We exclude the carb variable from the next model as it has the largest non-significant p value.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit_full2 &amp;lt;- with(datImp, lm(mpg ~ am + hp + wt))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we compare the two models using the multivariate Wald test.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;D1(fit_full1, fit_full2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    test statistic df1     df2 dfcom   p.value       riv
##  1 ~~ 2 0.9765411   1 9.21378    27 0.3482934 0.6935655&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The p value is above 0.05, so removing carb does not significantly worsen the model, and we opt for the simpler model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pool(fit_full2) %&amp;gt;% 
  summary()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          term    estimate  std.error statistic       df      p.value
## 1 (Intercept) 33.75666324 3.30083213 10.226713 16.87762 1.195383e-08
## 2         am1  2.50264907 1.79966590  1.390619 15.31418 1.842201e-01
## 3          hp -0.03950216 0.01162689 -3.397482 17.65719 3.280147e-03
## 4          wt -2.75412354 1.15870950 -2.376889 15.03403 3.116779e-02&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We see that the am variable has the largest non-significant p value. So, we exclude this variable from the next model and compare the two latest models using the multivariate Wald test.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit_full3 &amp;lt;- with(datImp, lm(mpg ~ hp + wt))
D1(fit_full2, fit_full3)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    test statistic df1      df2 dfcom   p.value       riv
##  1 ~~ 2   1.93382   1 12.90982    28 0.1878483 0.4392918&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, we opt for the simpler model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pool(fit_full3) %&amp;gt;% 
  summary()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          term    estimate  std.error statistic       df      p.value
## 1 (Intercept) 37.50546490 1.91102857 19.625800 23.65472 4.440892e-16
## 2          hp -0.03263534 0.01042989 -3.129021 21.20234 5.031751e-03
## 3          wt -3.92792051 0.75157304 -5.226266 19.78033 4.238231e-05&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;No non-significant variables remain in the model. Thus, this is our final model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gtsummary::tbl_regression(fit_full3)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;ybehlmrayy&#34; style=&#34;overflow-x:auto;overflow-y:auto;width:auto;height:auto;&#34;&gt;
&lt;style&gt;html {
  font-family: -apple-system, BlinkMacSystemFont, &#39;Segoe UI&#39;, Roboto, Oxygen, Ubuntu, Cantarell, &#39;Helvetica Neue&#39;, &#39;Fira Sans&#39;, &#39;Droid Sans&#39;, Arial, sans-serif;
}

#ybehlmrayy .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#ybehlmrayy .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ybehlmrayy .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#ybehlmrayy .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#ybehlmrayy .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ybehlmrayy .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ybehlmrayy .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#ybehlmrayy .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#ybehlmrayy .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#ybehlmrayy .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#ybehlmrayy .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#ybehlmrayy .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#ybehlmrayy .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#ybehlmrayy .gt_from_md &gt; :first-child {
  margin-top: 0;
}

#ybehlmrayy .gt_from_md &gt; :last-child {
  margin-bottom: 0;
}

#ybehlmrayy .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#ybehlmrayy .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#ybehlmrayy .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ybehlmrayy .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#ybehlmrayy .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ybehlmrayy .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#ybehlmrayy .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#ybehlmrayy .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ybehlmrayy .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ybehlmrayy .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#ybehlmrayy .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ybehlmrayy .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#ybehlmrayy .gt_left {
  text-align: left;
}

#ybehlmrayy .gt_center {
  text-align: center;
}

#ybehlmrayy .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#ybehlmrayy .gt_font_normal {
  font-weight: normal;
}

#ybehlmrayy .gt_font_bold {
  font-weight: bold;
}

#ybehlmrayy .gt_font_italic {
  font-style: italic;
}

#ybehlmrayy .gt_super {
  font-size: 65%;
}

#ybehlmrayy .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 65%;
}
&lt;/style&gt;
&lt;table class=&#34;gt_table&#34;&gt;
  
  &lt;thead class=&#34;gt_col_headings&#34;&gt;
    &lt;tr&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_left&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Characteristic&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody class=&#34;gt_table_body&#34;&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;hp&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.03&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.05, -0.01&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.005&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;wt&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-3.9&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-5.5, -2.4&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&amp;lt;0.001&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
  
  &lt;tfoot&gt;
    &lt;tr class=&#34;gt_footnotes&#34;&gt;
      &lt;td colspan=&#34;4&#34;&gt;
        &lt;p class=&#34;gt_footnote&#34;&gt;
          &lt;sup class=&#34;gt_footnote_marks&#34;&gt;
            &lt;em&gt;1&lt;/em&gt;
          &lt;/sup&gt;
           
          CI = Confidence Interval
          &lt;br /&gt;
        &lt;/p&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tfoot&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;Reference:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://stefvanbuuren.name/fimd/sec-stepwise.html&#34; class=&#34;uri&#34;&gt;https://stefvanbuuren.name/fimd/sec-stepwise.html&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Variable selection for imputation model in {mice}</title>
      <link>https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/</link>
      <pubDate>Mon, 22 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;some-note&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Some note&lt;/h2&gt;
&lt;p&gt;I previously wrote a &lt;a href=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/&#34;&gt;short post&lt;/a&gt; about missing data and multiple imputation with the &lt;code&gt;mice&lt;/code&gt; package. This post builds on that one.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;imputation-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Imputation model&lt;/h2&gt;
&lt;p&gt;The imputation model is the model that we use to impute the missing values. A related term is the complete-data model: the model that we want to fit after imputing the missing values (i.e., the complete-data model is the final model).&lt;/p&gt;
&lt;p&gt;Generally, we should include as many relevant variables as possible in the imputation model. However, this general advice may not be very efficient, as including too many predictors can lead to multicollinearity and computational problems. As a rule of thumb, the number of included variables should be no more than 15-20. &lt;a href=&#34;https://www.jstatsoft.org/article/view/v045i03&#34;&gt;van Buuren and Groothuis-Oudshoorn (2011)&lt;/a&gt; mention that the increase in explained variance in linear regression is negligible after about 15 variables are included.&lt;/p&gt;
&lt;p&gt;There are 4 steps suggested by &lt;a href=&#34;https://stefvanbuuren.name/publications/Flexible%20multivariate%20-%20TNO99054%201999.pdf&#34;&gt;van Buuren &lt;em&gt;et al.&lt;/em&gt; (1999)&lt;/a&gt; for variable selection in the case of big data:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Include all variables that appear in the complete-data model (final model)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This may include the interaction terms as well (passive imputation can be used to specify the interaction terms in &lt;code&gt;mice&lt;/code&gt; package)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include variables that influence the occurrence of the missing data&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This can be assessed by the correlations between the missingness indicators of the incomplete variables and the observed values of the other variables&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include variables that explain a considerable amount of variance&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This can be crudely assessed by a correlation matrix of the variables, looking at how strongly each candidate predictor correlates with the incomplete variables&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove variables that have too many missing values within the subgroup of incomplete cases&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This can be assessed by the proportion of usable cases (PUC): among the cases with missing data in a given variable, the proportion that have observed values on the predictor variable&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All these steps should be done on the key variables only. There is another, more efficient yet more laborious, approach suggested by &lt;a href=&#34;https://stefvanbuuren.name/publications/Flexible%20multiple%20-%20TNO99045%201999.pdf&#34;&gt;Oudshoorn &lt;em&gt;et al.&lt;/em&gt; (1999)&lt;/a&gt;, which takes into account important predictors of the predictors. We are going to focus on the four steps above and not cover the latter approach in this post.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-codes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;R code&lt;/h2&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(mice)
library(corrplot)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(airquality)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      Ozone           Solar.R           Wind             Temp      
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
##  NA&amp;#39;s   :37       NA&amp;#39;s   :7                                       
##      Month            Day      
##  Min.   :5.000   Min.   : 1.0  
##  1st Qu.:6.000   1st Qu.: 8.0  
##  Median :7.000   Median :16.0  
##  Mean   :6.993   Mean   :15.8  
##  3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :9.000   Max.   :31.0  
## &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two variables, Ozone and Solar.R, have missing values (NAs). We can further explore the missing data pattern.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;md.pattern(airquality)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/index.en_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##     Wind Temp Month Day Solar.R Ozone   
## 111    1    1     1   1       1     1  0
## 35     1    1     1   1       1     0  1
## 5      1    1     1   1       0     1  1
## 2      1    1     1   1       0     0  2
##        0    0     0   0       7    37 44&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 2 rows with NAs in both Ozone and Solar.R, 35 rows with NAs only in Ozone, and 5 rows with NAs only in Solar.R. Next, we can check the correlation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(airquality, use = &amp;quot;pairwise.complete.obs&amp;quot;) |&amp;gt;
  corrplot(method = &amp;quot;number&amp;quot;, type = &amp;quot;upper&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/variable-selection-for-imputation-model-in-mice/index.en_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The correlations of Ozone-Temp and Ozone-Wind are the highest in absolute value. Now, let’s compute the correlations between the indicators of whether a value is observed and the variable values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(y = airquality, x = !is.na(airquality), use = &amp;quot;pairwise.complete.obs&amp;quot;) |&amp;gt;
  round(digits = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R  Wind Temp Month   Day
## Ozone      NA   -0.02 -0.05 0.00  0.26 -0.05
## Solar.R     0      NA  0.06 0.11  0.11  0.17
## Wind       NA      NA    NA   NA    NA    NA
## Temp       NA      NA    NA   NA    NA    NA
## Month      NA      NA    NA   NA    NA    NA
## Day        NA      NA    NA   NA    NA    NA&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can ignore the warnings and the NAs, as only Ozone and Solar.R have missing values. The highest correlation is 0.26, between the Ozone indicator and Month: this compares the Month values of cases with observed Ozone against those of cases with missing Ozone. The row variables in the correlation matrix are the indicators (TRUE where the value is observed), and the column variables are the variables with the observed values.&lt;/p&gt;
&lt;p&gt;Lastly, we can calculate the PUC (proportion of usable cases) ‘manually’. &lt;code&gt;md.pairs()&lt;/code&gt; calculates the number of observations per variable pair for each missingness pattern: for example, &lt;code&gt;mr&lt;/code&gt; counts cases where the row variable is missing and the column variable is observed, and &lt;code&gt;mm&lt;/code&gt; counts cases where both are missing.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;var_pair &amp;lt;- md.pairs(airquality)
round(var_pair$mr / (var_pair$mr + var_pair$mm), digits = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R Wind Temp Month Day
## Ozone   0.000   0.946    1    1     1   1
## Solar.R 0.714   0.000    1    1     1   1
## Wind      NaN     NaN  NaN  NaN   NaN NaN
## Temp      NaN     NaN  NaN  NaN   NaN NaN
## Month     NaN     NaN  NaN  NaN   NaN NaN
## Day       NaN     NaN  NaN  NaN   NaN NaN&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A low PUC indicates that the predictor carries little information for imputing the target variable. NaN is shown for variables that have no missing values. The row variables are the target variables to be imputed, and the column variables are the predictors in the imputation model. We can see that, for imputing Solar.R (on the row), Ozone carries somewhat less information (0.714) than Wind, Temp, Month, and Day (all 1). The diagonal elements will always be 0 or NaN. So, from here we can drop predictors with, say, a PUC of 0, as they contain no information to help impute the target variable.&lt;/p&gt;
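&lt;p&gt;To make the definition concrete, the same PUC matrix can be rebuilt in base R from the missingness indicators. This is only a sketch of the calculation, not how &lt;code&gt;mice&lt;/code&gt; implements it; the names &lt;code&gt;mis&lt;/code&gt;, &lt;code&gt;obs&lt;/code&gt;, &lt;code&gt;mr&lt;/code&gt;, &lt;code&gt;mm&lt;/code&gt;, and &lt;code&gt;puc&lt;/code&gt; are illustrative.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mis &amp;lt;- is.na(airquality) * 1   # 1 where a value is missing
obs &amp;lt;- 1 - mis                  # 1 where a value is observed

# mr[i, j]: cases where variable i is missing and variable j is observed
mr &amp;lt;- crossprod(mis, obs)
# mm[i, j]: cases where both variable i and variable j are missing
mm &amp;lt;- crossprod(mis, mis)

# Proportion of usable cases; 0 / 0 gives NaN for complete variables
puc &amp;lt;- mr / (mr + mm)
round(puc, digits = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, Ozone is missing in 37 rows, and Solar.R is observed in 35 of them, which gives 35 / 37 = 0.946.&lt;/p&gt;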
&lt;p&gt;Actually, we have a nice function from &lt;code&gt;mice&lt;/code&gt; that can do what we ‘manually’ did just now.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;quickpred(airquality)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R Wind Temp Month Day
## Ozone       0       1    1    1     1   0
## Solar.R     1       0    0    1     1   1
## Wind        0       0    0    0     0   0
## Temp        0       0    0    0     0   0
## Month       0       0    0    0     0   0
## Day         0       0    0    0     0   0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, the column variables are the predictors, and the row variables are the target variables with NAs. The above matrix is known as the predictor matrix, which is going to be used in the imputation model. A 1 denotes a variable that is included as a predictor, and a 0 one that is excluded. The two main arguments in &lt;code&gt;quickpred()&lt;/code&gt; are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;mincor - a predictor is included in the predictor matrix if its absolute correlation in either of the two correlation matrices computed earlier exceeds this value (default 0.1)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;minpuc - the minimum PUC required for a predictor to be retained; the default is 0, so predictors are retained even if they have no information to help the imputation model&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notice that the variable Day is excluded from the predictors of Ozone. The correlation values are 0 and -0.05 from the first and second correlation matrices, respectively, neither of which exceeds the default threshold of 0.1 in absolute value. That is why Day is excluded. We can observe a similar situation for the variable Wind, which is excluded from the predictors of Solar.R (the correlation coefficients are -0.06 and 0.06). The negative (-) sign does not matter, as we evaluate the absolute values.&lt;/p&gt;
&lt;p&gt;We can change these two arguments as we see fit to select variables for the imputation model. Once we finalise our variable selection, we can do the multiple imputation using &lt;code&gt;mice()&lt;/code&gt;.&lt;/p&gt;
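&lt;p&gt;The selection rule can be sketched in base R. This is a simplified reconstruction built from the two correlation matrices computed earlier, assuming &lt;code&gt;mincor = 0.1&lt;/code&gt; and leaving out the &lt;code&gt;minpuc&lt;/code&gt; filter (which has no effect at its default of 0). It is not the actual source of &lt;code&gt;quickpred()&lt;/code&gt;, and the object names are illustrative.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mincor &amp;lt;- 0.1
obs &amp;lt;- !is.na(airquality)   # TRUE where a value is observed

# Absolute correlations between the variables themselves
v &amp;lt;- abs(cor(airquality, use = &amp;quot;pairwise.complete.obs&amp;quot;))
# Absolute correlations between the indicators (rows) and the values (columns)
u &amp;lt;- suppressWarnings(
  abs(cor(x = obs, y = airquality, use = &amp;quot;pairwise.complete.obs&amp;quot;))
)
v[is.na(v)] &amp;lt;- 0
u[is.na(u)] &amp;lt;- 0

# Keep a predictor if either absolute correlation exceeds mincor
pred &amp;lt;- (pmax(v, u) &amp;gt; mincor) * 1
diag(pred) &amp;lt;- 0                    # a variable never predicts itself
pred[colSums(!obs) == 0, ] &amp;lt;- 0    # complete variables need no imputation model
pred&lt;/code&gt;&lt;/pre&gt;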
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Finalised variable selection
var_sel &amp;lt;- quickpred(airquality)
var_sel&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         Ozone Solar.R Wind Temp Month Day
## Ozone       0       1    1    1     1   0
## Solar.R     1       0    0    1     1   1
## Wind        0       0    0    0     0   0
## Temp        0       0    0    0     0   0
## Month       0       0    0    0     0   0
## Day         0       0    0    0     0   0&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Impute
imp &amp;lt;- mice(airquality, m = 5, predictorMatrix = var_sel, printFlag = FALSE)
imp&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##   Ozone Solar.R    Wind    Temp   Month     Day 
##   &amp;quot;pmm&amp;quot;   &amp;quot;pmm&amp;quot;      &amp;quot;&amp;quot;      &amp;quot;&amp;quot;      &amp;quot;&amp;quot;      &amp;quot;&amp;quot; 
## PredictorMatrix:
##         Ozone Solar.R Wind Temp Month Day
## Ozone       0       1    1    1     1   0
## Solar.R     1       0    0    1     1   1
## Wind        0       0    0    0     0   0
## Temp        0       0    0    0     0   0
## Month       0       0    0    0     0   0
## Day         0       0    0    0     0   0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that &lt;code&gt;mice()&lt;/code&gt; uses the predictor matrix that we provide.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jstatsoft.org/article/view/v045i03&#34; class=&#34;uri&#34;&gt;https://www.jstatsoft.org/article/view/v045i03&lt;/a&gt; - paper written by Stef van Buuren (the code is a bit outdated, but still runnable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://stefvanbuuren.name/fimd/&#34; class=&#34;uri&#34;&gt;https://stefvanbuuren.name/fimd/&lt;/a&gt; - online book written by Stef van Buuren (see sections 6.3.2 and 9.1.6)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>A short note on multiple imputation</title>
      <link>https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/</link>
      <pubDate>Fri, 29 Oct 2021 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/</guid>
      <description>
&lt;script src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;background&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Missing data is quite challenging to deal with. Deleting it may be the easiest solution, but may not be the best solution. Missing data can be categorised into 3 types (&lt;a href=&#34;https://www.jstor.org/stable/2335739&#34;&gt;Rubin, 1976&lt;/a&gt;):&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;MCAR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Missing Completely At Random&lt;/li&gt;
&lt;li&gt;Example: some of the observations are missing because records were lost during a flood&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MAR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Missing At Random&lt;/li&gt;
&lt;li&gt;Example: the income variable is missing because some participants refuse to give their salary information, which they deem very personal&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MNAR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Missing Not At Random&lt;/li&gt;
&lt;li&gt;Example: the weight variable is missing for morbidly obese participants since the scale is unable to weigh them&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Out of the 3 types above, the most problematic is MNAR, though there are methods to deal with this type, for example the &lt;a href=&#34;https://cran.r-project.org/web/packages/miceMNAR/miceMNAR.pdf&#34;&gt;miceMNAR&lt;/a&gt; package in R.&lt;/p&gt;
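&lt;p&gt;To make the three mechanisms more concrete, here is a small simulation sketch in base R. The variables &lt;code&gt;age&lt;/code&gt; and &lt;code&gt;income&lt;/code&gt;, and all the probabilities used, are made up for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(2021)
n &amp;lt;- 1000
age &amp;lt;- rnorm(n, mean = 40, sd = 10)
income &amp;lt;- 2000 + 50 * age + rnorm(n, sd = 300)

# MCAR: every value has the same 20% chance of being missing
income_mcar &amp;lt;- ifelse(runif(n) &amp;lt; 0.2, NA, income)

# MAR: the chance of being missing depends only on the observed age
p_mar &amp;lt;- plogis((age - 40) / 5)
income_mar &amp;lt;- ifelse(runif(n) &amp;lt; p_mar, NA, income)

# MNAR: the chance of being missing depends on the unobserved income itself
p_mnar &amp;lt;- plogis((income - mean(income)) / 300)
income_mnar &amp;lt;- ifelse(runif(n) &amp;lt; p_mnar, NA, income)

# Mean age of rows with observed (FALSE) vs missing (TRUE) income
tapply(age, is.na(income_mcar), mean)
tapply(age, is.na(income_mar), mean)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under MCAR the two group means should be close, while under MAR the rows with missing income should be clearly older. Under MNAR the same kind of check cannot be done on real data, because the values that drive the missingness are exactly the ones we do not observe.&lt;/p&gt;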
&lt;p&gt;There are several approaches in handling missing data:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Listwise-deletion&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Best approach if the amount of missingness is very small&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using mean/median/mode imputation&lt;/li&gt;
&lt;li&gt;This approach is not advisable as it leads to bias due to reduced variance, though the mean is not affected&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Single imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simple imputation above is considered as single imputation as well&lt;/li&gt;
&lt;li&gt;This approach ignores the uncertainty of the imputation and almost always underestimates the variance&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A bit more advanced, and it covers the limitations of the single imputation approach&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, the main assumption of any imputation method is that the missingness is MCAR or MAR.&lt;/p&gt;
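&lt;p&gt;The point about mean imputation in the list above can be checked with a quick base R sketch on simulated data (the numbers are arbitrary): the mean is preserved, but the variance shrinks.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(42)
x &amp;lt;- rnorm(200, mean = 50, sd = 10)
x_miss &amp;lt;- x
x_miss[sample(200, 60)] &amp;lt;- NA   # remove 30% of the values completely at random

# Mean imputation: replace every NA with the observed mean
x_imp &amp;lt;- ifelse(is.na(x_miss), mean(x_miss, na.rm = TRUE), x_miss)

# The mean is unchanged ...
c(observed = mean(x_miss, na.rm = TRUE), imputed = mean(x_imp))
# ... but the variance is smaller after imputation
c(observed = var(x_miss, na.rm = TRUE), imputed = var(x_imp))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The imputed values all sit exactly at the mean, so they add observations without adding any spread.&lt;/p&gt;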
&lt;/div&gt;
&lt;div id=&#34;multiple-imputation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Multiple imputation&lt;/h2&gt;
&lt;p&gt;In short, there are 2 approaches of multiple imputation implemented by packages in R:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Joint modeling (JM) or joint multivariate normal distribution multiple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The main assumption for this method is that the observed data follows a multivariate normal distribution&lt;/li&gt;
&lt;li&gt;A violation of this assumption produces incorrect values, though a slight violation is still okay&lt;/li&gt;
&lt;li&gt;Some packages that implement this method: &lt;code&gt;Amelia&lt;/code&gt; and &lt;code&gt;norm&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fully conditional specification (FCS) or conditional multiple imputation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Also known as multivariate imputation by chained equations (MICE)&lt;/li&gt;
&lt;li&gt;This approach is more flexible, as a distribution is assumed for each variable rather than for the whole dataset&lt;/li&gt;
&lt;li&gt;Some packages that implement this method: &lt;code&gt;mice&lt;/code&gt; and &lt;code&gt;mi&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;In the &lt;code&gt;mice&lt;/code&gt; package, the general steps are:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;code&gt;mice()&lt;/code&gt; - impute the NAs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;with()&lt;/code&gt; - run the analysis (lm, glm, etc)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pool()&lt;/code&gt; - pool the results&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:unnamed-chunk-1&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;Screenshot%202021-11-20%20145517.png&#34; alt=&#34;Main steps in mice package.&#34; width=&#34;90%&#34; height=&#34;90%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Main steps in mice package.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;These are the required packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(mice)
library(VIM)
#library(missForest) # not attached; we only call missForest::prodNA() from this package
library(naniar)
library(niceFunction) #install from github (https://github.com/tengku-hanis/niceFunction)
library(dplyr)
library(gtsummary)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are going to produce some NAs randomly.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(123)
dat &amp;lt;- iris %&amp;gt;% 
  select(-Sepal.Length)%&amp;gt;% 
  missForest::prodNA(0.2) %&amp;gt;%  # randomly insert 20% NAs
  mutate(Sepal.Length = iris$Sepal.Length)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Explore the NAs and the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naniar::miss_var_summary(dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 3
##   variable     n_miss pct_miss
##   &amp;lt;chr&amp;gt;         &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt;
## 1 Petal.Length     38     25.3
## 2 Sepal.Width      33     22  
## 3 Species          28     18.7
## 4 Petal.Width      21     14  
## 5 Sepal.Length      0      0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Some references recommend removing variables with more than 50% NAs. Here, however, we purposely introduced only 20% NAs into our data.&lt;/p&gt;
&lt;p&gt;As a guideline, we can check for MCAR for our NAs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naniar::mcar_test(dat) #p &amp;gt; 0.05, MCAR is indicated&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 1 x 4
##   statistic    df p.value missing.patterns
##       &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;            &amp;lt;int&amp;gt;
## 1      38.8    40   0.522               14&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next step is to evaluate the pattern of missingness in our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;md.pattern(dat, rotate.names = TRUE, plot = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##    Sepal.Length Petal.Width Species Sepal.Width Petal.Length    
## 64            1           1       1           1            1   0
## 21            1           1       1           1            0   1
## 15            1           1       1           0            1   1
## 3             1           1       1           0            0   2
## 14            1           1       0           1            1   1
## 4             1           1       0           1            0   2
## 6             1           1       0           0            1   2
## 2             1           1       0           0            0   3
## 7             1           0       1           1            1   1
## 6             1           0       1           1            0   2
## 4             1           0       1           0            1   2
## 2             1           0       1           0            0   3
## 1             1           0       0           1            1   2
## 1             1           0       0           0            1   3
##               0          21      28          33           38 120&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;aggr(dat, prop = FALSE, numbers = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We have 13 patterns (numbers on the right) of NAs in our data, or 14 if we count the complete cases, which matches the &lt;code&gt;missing.patterns&lt;/code&gt; value reported by &lt;code&gt;mcar_test()&lt;/code&gt; earlier. These 2 functions work well with a small dataset, but with a larger dataset (and many more patterns of NAs), it’s probably quite difficult to assess the pattern.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;matrixplot()&lt;/code&gt; is probably more appropriate for a larger dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;matrixplot(dat)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In terms of the missingness pattern, we can also assess whether the distribution of NAs in Sepal.Width depends on the variable Sepal.Length.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;niceFunction::histNA_byVar(dat, Sepal.Width, Sepal.Length)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As we can see, the distribution and range of the histograms of the NAs (True) and non-NAs (False) are quite similar. Thus, this may indicate that Sepal.Width is at least MAR. However, strictly speaking, we should do this for each pair of numerical variables before jumping to any conclusion.&lt;/p&gt;
&lt;p&gt;Another good thing to assess is the correlation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Data with 1 = NAs, 0 = non-NAs
x &amp;lt;- as.data.frame(abs(is.na(dat))) %&amp;gt;% 
  dplyr::select(-Sepal.Length) #pick variable with NAs only&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Firstly, the correlation between the variables with missing data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(x) %&amp;gt;% 
  corrplot::corrplot()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;There is no high correlation among the variables with NAs. Secondly, let’s see the correlation between the NAs in a variable and the observed values of the other variables.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor(dat %&amp;gt;% mutate(Species = as.numeric(Species)), x, use = &amp;quot;pairwise.complete.obs&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##               Sepal.Width Petal.Length  Petal.Width     Species
## Sepal.Width            NA  0.049158733 -0.065917718  0.09948263
## Petal.Length  0.042075695           NA -0.004572405 -0.17265919
## Petal.Width   0.096195805 -0.003320601           NA -0.11024288
## Species       0.045849046 -0.104143925 -0.081055707          NA
## Sepal.Length -0.006435044 -0.052871701 -0.091024799 -0.08527514&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, there is no high correlation. If we were to interpret this correlation matrix, the rows are the observed variables and the columns represent the missingness. For example, Sepal.Width is more likely to be missing for observations with a high value of Petal.Width (though r = 0.10 indicates that the relationship is very weak).&lt;/p&gt;
&lt;p&gt;Now, we can do multiple imputation. These are the methods in the &lt;code&gt;mice&lt;/code&gt; package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;methods(mice)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] mice.impute.2l.bin       mice.impute.2l.lmer      mice.impute.2l.norm     
##  [4] mice.impute.2l.pan       mice.impute.2lonly.mean  mice.impute.2lonly.norm 
##  [7] mice.impute.2lonly.pmm   mice.impute.cart         mice.impute.jomoImpute  
## [10] mice.impute.lda          mice.impute.logreg       mice.impute.logreg.boot 
## [13] mice.impute.mean         mice.impute.midastouch   mice.impute.mnar.logreg 
## [16] mice.impute.mnar.norm    mice.impute.norm         mice.impute.norm.boot   
## [19] mice.impute.norm.nob     mice.impute.norm.predict mice.impute.panImpute   
## [22] mice.impute.passive      mice.impute.pmm          mice.impute.polr        
## [25] mice.impute.polyreg      mice.impute.quadratic    mice.impute.rf          
## [28] mice.impute.ri           mice.impute.sample       mice.mids               
## [31] mice.theme              
## see &amp;#39;?methods&amp;#39; for accessing help and source code&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By default, mice uses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pmm (predictive mean matching) for numeric data&lt;/li&gt;
&lt;li&gt;logreg (logistic regression imputation) for binary data, factor with 2 levels&lt;/li&gt;
&lt;li&gt;polyreg (polytomous regression imputation) for unordered categorical data (factor &amp;gt; 2 levels)&lt;/li&gt;
&lt;li&gt;polr (proportional odds model) for ordered categorical data (ordered factor &amp;gt; 2 levels)&lt;/li&gt;
&lt;/ul&gt;
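&lt;p&gt;As a small sketch, &lt;code&gt;make.method()&lt;/code&gt; from &lt;code&gt;mice&lt;/code&gt; shows which default method would be assigned to each variable. The example uses &lt;code&gt;nhanes2&lt;/code&gt;, a dataset shipped with &lt;code&gt;mice&lt;/code&gt;, rather than our data, and the switch to &lt;code&gt;norm&lt;/code&gt; is only an illustration of how a default can be overridden.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(mice)

# Default method per variable; complete variables get an empty string
meth &amp;lt;- make.method(nhanes2)
meth

# Override one default, e.g. Bayesian linear regression for bmi
meth[&amp;quot;bmi&amp;quot;] &amp;lt;- &amp;quot;norm&amp;quot;
imp_nh &amp;lt;- mice(nhanes2, method = meth, m = 5, seed = 1, printFlag = FALSE)
imp_nh$method&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same named vector can be passed to &lt;code&gt;mice()&lt;/code&gt; for any dataset, which is handy when the diagnostics suggest that a default method does not suit a particular variable.&lt;/p&gt;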
&lt;p&gt;Let’s run the &lt;code&gt;mice()&lt;/code&gt; function on our data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;imp &amp;lt;- mice(dat, m = 5, seed = 1234, maxit = 5, printFlag = FALSE)
imp&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##  Sepal.Width Petal.Length  Petal.Width      Species Sepal.Length 
##        &amp;quot;pmm&amp;quot;        &amp;quot;pmm&amp;quot;        &amp;quot;pmm&amp;quot;    &amp;quot;polyreg&amp;quot;           &amp;quot;&amp;quot; 
## PredictorMatrix:
##              Sepal.Width Petal.Length Petal.Width Species Sepal.Length
## Sepal.Width            0            1           1       1            1
## Petal.Length           1            0           1       1            1
## Petal.Width            1            1           0       1            1
## Species                1            1           1       0            1
## Sepal.Length           1            1           1       1            0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we can do some diagnostic assessment on the imputed data. This is our imputed data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;imp$imp$Sepal.Width %&amp;gt;% head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      1   2   3   4   5
## 5  3.4 3.4 4.1 3.1 3.5
## 13 3.2 3.1 3.2 3.6 3.1
## 14 3.1 3.2 2.9 3.4 3.0
## 23 3.6 3.2 3.0 3.8 3.1
## 26 4.1 3.0 3.1 3.5 3.0
## 34 3.4 3.7 3.7 3.4 4.4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One important thing to check is the convergence. We are going to increase the number of iterations for this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;imp_conv &amp;lt;- mice.mids(imp, maxit = 30, printFlag = FALSE)
plot(imp_conv)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;672&#34; /&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-16-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The lines in the plot should be intermingled, and no obvious trend should be observed. Our plots above indicate convergence.&lt;/p&gt;
&lt;p&gt;We can also assess density plot of imputed data and the observed data. Blue color is the observed data and red color is the imputed data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;densityplot(imp)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can further assess variable Sepal.Width.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;densityplot(imp, ~ Sepal.Width | .imp)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-18-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Lastly, we can assess the strip plot. The imputed observations (red color) should not be distributed too far from the observed data (blue color).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;stripplot(imp)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/a-short-note-on-multiple-imputation/index.en_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, once we finish the diagnostic checking, we could go back and change the imputation method for Sepal.Width, since its distribution differs quite a lot across the imputations. But we are not going to do that; instead, we are going to do the analysis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# run regression
fit &amp;lt;- with(imp, lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species))
# pool all imputed set
pooled &amp;lt;- pool(fit) 
summary(pooled)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                term   estimate  std.error statistic       df      p.value
## 1       (Intercept)  2.2008307 0.34577321  6.364954 29.02484 5.859560e-07
## 2       Sepal.Width  0.5233500 0.09717217  5.385801 50.89918 1.854832e-06
## 3      Petal.Length  0.7409159 0.09020153  8.214006 12.73722 1.921415e-06
## 4       Petal.Width -0.3623895 0.18562168 -1.952301 22.34517 6.354332e-02
## 5 Speciesversicolor -0.3891112 0.28166528 -1.381467 15.07547 1.872683e-01
## 6  Speciesvirginica -0.5237106 0.42629920 -1.228505 10.82804 2.452897e-01&lt;/code&gt;&lt;/pre&gt;
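&lt;p&gt;Behind the scenes, &lt;code&gt;pool()&lt;/code&gt; combines the fitted models using Rubin’s rules. The sketch below reproduces the pooled estimate and standard error for one coefficient by hand. It uses the &lt;code&gt;nhanes&lt;/code&gt; data shipped with &lt;code&gt;mice&lt;/code&gt; and an arbitrary model, so the numbers are unrelated to our example.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(mice)

imp_nh &amp;lt;- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit_nh &amp;lt;- with(imp_nh, lm(bmi ~ age + chl))

# Estimate and squared standard error of the chl coefficient in each imputed set
est &amp;lt;- sapply(fit_nh$analyses, function(a) coef(a)[[&amp;quot;chl&amp;quot;]])
se2 &amp;lt;- sapply(fit_nh$analyses, function(a) vcov(a)[&amp;quot;chl&amp;quot;, &amp;quot;chl&amp;quot;])
m &amp;lt;- length(est)

pooled_est  &amp;lt;- mean(est)   # pooled estimate: average over the imputed sets
var_within  &amp;lt;- mean(se2)   # within-imputation variance
var_between &amp;lt;- var(est)    # between-imputation variance
var_total   &amp;lt;- var_within + (1 + 1 / m) * var_between

c(estimate = pooled_est, std.error = sqrt(var_total))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These two values should agree with the &lt;code&gt;chl&lt;/code&gt; row of &lt;code&gt;summary(pool(fit_nh))&lt;/code&gt;. The between-imputation term is what single imputation leaves out, which is why it tends to understate the standard errors.&lt;/p&gt;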
&lt;p&gt;Since we have the original dataset without the NAs, we are going to compare the results.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mimpute &amp;lt;- 
  fit %&amp;gt;% 
  tbl_regression() #with mice

noimpute &amp;lt;- 
  dat %&amp;gt;% 
  lm(Sepal.Length ~ ., data = .) %&amp;gt;% 
  tbl_regression() #w/o mice

original &amp;lt;- 
  iris %&amp;gt;% 
  lm(Sepal.Length ~ ., data = .) %&amp;gt;% 
  tbl_regression() #original data

tbl_merge(
  tbls = list(mimpute, noimpute, original), 
  tab_spanner = c(&amp;quot;With MICE&amp;quot;, &amp;quot;Without MICE&amp;quot;, &amp;quot;Original data&amp;quot;)
)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;kofvwjwgme&#34; style=&#34;overflow-x:auto;overflow-y:auto;width:auto;height:auto;&#34;&gt;
&lt;style&gt;html {
  font-family: -apple-system, BlinkMacSystemFont, &#39;Segoe UI&#39;, Roboto, Oxygen, Ubuntu, Cantarell, &#39;Helvetica Neue&#39;, &#39;Fira Sans&#39;, &#39;Droid Sans&#39;, Arial, sans-serif;
}

#kofvwjwgme .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#kofvwjwgme .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#kofvwjwgme .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#kofvwjwgme .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kofvwjwgme .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#kofvwjwgme .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#kofvwjwgme .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#kofvwjwgme .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#kofvwjwgme .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#kofvwjwgme .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#kofvwjwgme .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#kofvwjwgme .gt_from_md &gt; :first-child {
  margin-top: 0;
}

#kofvwjwgme .gt_from_md &gt; :last-child {
  margin-bottom: 0;
}

#kofvwjwgme .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#kofvwjwgme .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#kofvwjwgme .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kofvwjwgme .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#kofvwjwgme .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kofvwjwgme .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#kofvwjwgme .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#kofvwjwgme .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kofvwjwgme .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#kofvwjwgme .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kofvwjwgme .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#kofvwjwgme .gt_left {
  text-align: left;
}

#kofvwjwgme .gt_center {
  text-align: center;
}

#kofvwjwgme .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#kofvwjwgme .gt_font_normal {
  font-weight: normal;
}

#kofvwjwgme .gt_font_bold {
  font-weight: bold;
}

#kofvwjwgme .gt_font_italic {
  font-style: italic;
}

#kofvwjwgme .gt_super {
  font-size: 65%;
}

#kofvwjwgme .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 65%;
}
&lt;/style&gt;
&lt;table class=&#34;gt_table&#34;&gt;
  
  &lt;thead class=&#34;gt_col_headings&#34;&gt;
    &lt;tr&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_left&#34; rowspan=&#34;2&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Characteristic&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_center gt_columns_top_border gt_column_spanner_outer&#34; rowspan=&#34;1&#34; colspan=&#34;3&#34;&gt;
        &lt;span class=&#34;gt_column_spanner&#34;&gt;With MICE&lt;/span&gt;
      &lt;/th&gt;
      &lt;th class=&#34;gt_center gt_columns_top_border gt_column_spanner_outer&#34; rowspan=&#34;1&#34; colspan=&#34;3&#34;&gt;
        &lt;span class=&#34;gt_column_spanner&#34;&gt;Without MICE&lt;/span&gt;
      &lt;/th&gt;
      &lt;th class=&#34;gt_center gt_columns_top_border gt_column_spanner_outer&#34; rowspan=&#34;1&#34; colspan=&#34;3&#34;&gt;
        &lt;span class=&#34;gt_column_spanner&#34;&gt;Original data&lt;/span&gt;
      &lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;Beta&lt;/strong&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;95% CI&lt;/strong&gt;&lt;sup class=&#34;gt_footnote_marks&#34;&gt;1&lt;/sup&gt;&lt;/th&gt;
      &lt;th class=&#34;gt_col_heading gt_columns_bottom_border gt_center&#34; rowspan=&#34;1&#34; colspan=&#34;1&#34;&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody class=&#34;gt_table_body&#34;&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Sepal.Width&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.52&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.33, 0.72&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&amp;lt;0.001&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.48&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.17, 0.79&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.003&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.50&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.33, 0.67&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&amp;lt;0.001&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Petal.Length&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.74&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.55, 0.94&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&amp;lt;0.001&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.71&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.51, 0.90&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&amp;lt;0.001&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.83&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.69, 1.0&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&amp;lt;0.001&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Petal.Width&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.36&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.75, 0.02&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.064&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.35&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.85, 0.14&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.2&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.32&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.61, -0.02&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.039&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34;&gt;Species&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34; style=&#34;text-align: left; text-indent: 10px;&#34;&gt;setosa&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;—&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34; style=&#34;text-align: left; text-indent: 10px;&#34;&gt;versicolor&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.39&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.0, 0.21&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.2&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.42&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.1, 0.30&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.3&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.72&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.2, -0.25&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.003&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td class=&#34;gt_row gt_left&#34; style=&#34;text-align: left; text-indent: 10px;&#34;&gt;virginica&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.52&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.5, 0.42&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.2&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-0.42&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.5, 0.63&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.4&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.0&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;-1.7, -0.36&lt;/td&gt;
&lt;td class=&#34;gt_row gt_center&#34;&gt;0.003&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
  
  &lt;tfoot&gt;
    &lt;tr class=&#34;gt_footnotes&#34;&gt;
      &lt;td colspan=&#34;10&#34;&gt;
        &lt;p class=&#34;gt_footnote&#34;&gt;
          &lt;sup class=&#34;gt_footnote_marks&#34;&gt;
            &lt;em&gt;1&lt;/em&gt;
          &lt;/sup&gt;
           
          CI = Confidence Interval
          &lt;br /&gt;
        &lt;/p&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tfoot&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The results differ between the original dataset (no NAs) and the dataset imputed with mice. Exploring other imputation methods may produce a better result.&lt;/p&gt;
&lt;p&gt;There is a lot more that is not covered in this post, for example &lt;a href=&#34;https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html&#34;&gt;passive imputation and post-processing&lt;/a&gt;. In fact, there is a series of &lt;a href=&#34;https://github.com/amices/mice#vignettes&#34;&gt;vignettes&lt;/a&gt; written by Gerko Vink and Stef van Buuren (both authors of &lt;code&gt;mice&lt;/code&gt;) which provides a good, though quite advanced, tutorial on using &lt;code&gt;mice&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Suggested online books (though I have not really studied either of them yet):&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;https://stefvanbuuren.name/fimd/&#34;&gt;Flexible imputation of missing data&lt;/a&gt; by Stef van Buuren&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://bookdown.org/mwheymans/bookmi/&#34;&gt;Applied missing data analysis with SPSS and (R)Studio&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;References for this post:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;http://www.cs.uni.edu/~jacobson/4772/week11/R_in_Action.pdf&#34;&gt;R in Action, Data analysis and graphics with R&lt;/a&gt; (Chapter 15)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://data.library.virginia.edu/getting-started-with-multiple-imputation-in-r/&#34; class=&#34;uri&#34;&gt;https://data.library.virginia.edu/getting-started-with-multiple-imputation-in-r/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://stats.idre.ucla.edu/r/faq/how-do-i-perform-multiple-imputation-using-predictive-mean-matching-in-r/&#34; class=&#34;uri&#34;&gt;https://stats.idre.ucla.edu/r/faq/how-do-i-perform-multiple-imputation-using-predictive-mean-matching-in-r/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.jstatsoft.org/article/view/v045i03&#34;&gt;mice: Multivariate Imputation by Chained Equations in R&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
