Chapter 4 Missing values

4.1 Checking for missing values

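We want to check whether there are any missing values in our covid_state and confirmed_d3 data sets. The rows that contain missing values are printed below, first from covid_state and then from confirmed_d3.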

##                  state confirmed_Nov_30 death_Nov_30 population confirmed_rate
## 2               Alaska            32418          121     731545     0.04431443
## 9 District of Columbia            21552          680     705749     0.03053777
##    death_rate Biden Trump Biden Support Rate Support Which Party
## 2 0.003732494     0     0                NaN                <NA>
## 9 0.031551596     0     0                NaN                <NA>
##   health_cost_per_captia death_per_100 death_per_1M
## 2                  11064    0.01654034     165.4034
## 9                  11944    0.09635154     963.5154
##   state confirmed_Nov_30 confirmed_Sep_30 confirmed_July_30 confirmed_May_30
## 2    AK            32576             8845              3605              455
## 9  <NA>            21552            15326             12057             8717
##   confirmed_Mar_30 population   Nov_rate   Sep_rate   July_rate     May_rate
## 2              119     731545 0.04453041 0.01209085 0.004927927 0.0006219713
## 9              401     705749 0.03053777 0.02171594 0.017083977 0.0123514167
##     March_rate Biden Trump Biden Support Rate Support Which Party
## 2 0.0001626694     0     0                NaN                <NA>
## 9 0.0005681907     0     0                NaN                <NA>
##   health_cost_per_captia
## 2                  11064
## 9                  11944
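A minimal base-R sketch of this check follows, assuming both data sets are ordinary data frames. The toy covid_state here is not the real data set. It keeps only a few columns and copies values from the printed output. The rule used for Support Which Party (Democratic when the Biden support rate exceeds 0.5) is an assumption inferred from the printed rows, not the chapter's stated code.

```r
# Toy stand-in for covid_state (assumption: only a few columns, values
# copied from the printed output; the real data set has more columns).
covid_state <- data.frame(
  state = c("Alabama", "Alaska", "District of Columbia"),
  Biden = c(849648, 0, 0),
  Trump = c(1441168, 0, 0),
  stringsAsFactors = FALSE
)

# Derived columns; 0 / 0 gives NaN for Alaska and D.C.
covid_state[["Biden Support Rate"]] <- covid_state$Biden /
  (covid_state$Biden + covid_state$Trump)
# Assumed rule: NaN > 0.5 is NA, so ifelse() returns NA for those rows.
covid_state[["Support Which Party"]] <- ifelse(
  covid_state[["Biden Support Rate"]] > 0.5, "Democratic", "Republican"
)

# is.na() is TRUE for both NA and NaN, so this flags every row
# that has at least one missing cell.
has_missing <- rowSums(is.na(covid_state)) > 0
print(covid_state[has_missing, ])

# Number of missing cells per column.
print(colSums(is.na(covid_state)))
```

The same two lines (`rowSums(is.na(...)) > 0` and `colSums(is.na(...))`) can be applied to confirmed_d3 unchanged.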

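Alaska and D.C. have NaN in the Biden Support Rate column and <NA> in the Support Which Party column. Their Biden and Trump vote counts are both 0, and 0/0 evaluates to NaN in R. In confirmed_d3, D.C. is also missing its state abbreviation. The county names in the election data for Alaska and D.C. differ from those in the COVID-19 data, so their election records do not match and are lost when the two data sets are merged.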
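The sketch below illustrates how a key mismatch can turn into these zeros and NaN values. It rests on assumptions the chapter does not state. The county names are hypothetical placeholders, and it assumes the vote counts were merged with a left join and then summed per state with `na.rm = TRUE`.

```r
# Hypothetical county tables whose keys never match (names are placeholders).
covid_county <- data.frame(state = "X", county = c("Area 1", "Area 2"),
                           stringsAsFactors = FALSE)
election_county <- data.frame(county = c("District 1", "District 2"),
                              Biden = c(10, 20), Trump = c(5, 15),
                              stringsAsFactors = FALSE)

# Left join on county: no key matches, so Biden and Trump are NA in every row.
merged <- merge(covid_county, election_county, by = "county", all.x = TRUE)

# Assumed aggregation step: summing only NA values with na.rm = TRUE gives 0.
biden_total <- sum(merged$Biden, na.rm = TRUE)
trump_total <- sum(merged$Trump, na.rm = TRUE)

# 0 / 0 is NaN in R, which matches the Biden Support Rate printed above.
rate <- biden_total / (biden_total + trump_total)
print(rate)
```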

4.2 Fixing the NA values in the data set

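Because we want an election result for every state with no NA values, we calculate the support rate for Alaska and D.C. by summing their Biden and Trump vote counts directly from the election data. The first rows of the corrected covid_state data set are shown below.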

##        state confirmed_Nov_30 death_Nov_30 population confirmed_rate
## 1    Alabama           249524         3578    4903185     0.05089019
## 2     Alaska            32418          121     731545     0.04431443
## 3    Arizona           326817         6639    7278717     0.04490036
## 4   Arkansas           154865         2502    3017804     0.05131712
## 5 California          1230264        19173   39512223     0.03113629
## 6   Colorado           232878         3037    5758736     0.04043908
##    death_rate    Biden   Trump Biden Support Rate Support Which Party
## 1 0.014339302   849648 1441168          0.3708932          Republican
## 2 0.003732494   153405  189892          0.4468580          Republican
## 3 0.020314121  1672143 1661686          0.5015683          Democratic
## 4 0.016156007   420328  757405          0.3568958          Republican
## 5 0.015584460 11109764 6005961          0.6490969          Democratic
## 6 0.013041163  1804352 1364607          0.5693832          Democratic
##   health_cost_per_captia death_per_100 death_per_1M
## 1                   7281    0.07297298     729.7298
## 2                  11064    0.01654034     165.4034
## 3                   6452    0.09121113     912.1113
## 4                   7408    0.08290797     829.0797
## 5                   7549    0.04852423     485.2423
## 6                   6804    0.05273727     527.3727
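A minimal base-R sketch of this fix follows. The county-level split of Alaska's votes is invented for illustration. Only the totals, 153405 and 189892, come from the output above. The column names are assumed to match the printed data. The support rate formula, Biden / (Biden + Trump), reproduces the printed rates. For example, 849648 / (849648 + 1441168) ≈ 0.3708932 for Alabama. The party rule (Democratic when the rate exceeds 0.5) is an assumption inferred from the printed rows.

```r
# Hypothetical county-level election rows for Alaska: the split is invented,
# but the totals equal the printed Alaska counts (153405 and 189892).
election <- data.frame(
  state = c("Alaska", "Alaska"),
  Biden = c(100000, 53405),
  Trump = c(90000, 99892),
  stringsAsFactors = FALSE
)

# Toy covid_state row holding the zero counts left by the failed merge.
covid_state <- data.frame(state = "Alaska", Biden = 0, Trump = 0,
                          stringsAsFactors = FALSE)

# Sum vote counts per state directly in the election data,
# so the county names never need to match.
totals <- aggregate(cbind(Biden, Trump) ~ state, data = election, FUN = sum)

# Overwrite the counts only for the two affected states that have a total.
idx <- match(covid_state$state, totals$state)
to_fix <- covid_state$state %in% c("Alaska", "District of Columbia") & !is.na(idx)
covid_state$Biden[to_fix] <- totals$Biden[idx[to_fix]]
covid_state$Trump[to_fix] <- totals$Trump[idx[to_fix]]

# Recompute the derived columns.
covid_state[["Biden Support Rate"]] <- covid_state$Biden /
  (covid_state$Biden + covid_state$Trump)
covid_state[["Support Which Party"]] <- ifelse(
  covid_state[["Biden Support Rate"]] > 0.5, "Democratic", "Republican"
)

# Confirm that no NA or NaN values remain.
stopifnot(!anyNA(covid_state))
print(covid_state)
```

Summing within the election data means the fix does not depend on the county names lining up, which is why it sidesteps the merge problem for these two states.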