Chapter 4 Missing values
4.1 Checking for missing values
We want to check whether the covid_state and confirmed_d3 data sets contain any missing values.
## state confirmed_Nov_30 death_Nov_30 population confirmed_rate
## 2 Alaska 32418 121 731545 0.04431443
## 9 District of Columbia 21552 680 705749 0.03053777
## death_rate Biden Trump Biden Support Rate Support Which Party
## 2 0.003732494 0 0 NaN <NA>
## 9 0.031551596 0 0 NaN <NA>
## health_cost_per_captia death_per_100 death_per_1M
## 2 11064 0.01654034 165.4034
## 9 11944 0.09635154 963.5154
## state confirmed_Nov_30 confirmed_Sep_30 confirmed_July_30 confirmed_May_30
## 2 AK 32576 8845 3605 455
## 9 <NA> 21552 15326 12057 8717
## confirmed_Mar_30 population Nov_rate Sep_rate July_rate May_rate
## 2 119 731545 0.04453041 0.01209085 0.004927927 0.0006219713
## 9 401 705749 0.03053777 0.02171594 0.017083977 0.0123514167
## March_rate Biden Trump Biden Support Rate Support Which Party
## 2 0.0001626694 0 0 NaN <NA>
## 9 0.0005681907 0 0 NaN <NA>
## health_cost_per_captia
## 2 11064
## 9 11944
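In both data sets, Alaska and D.C. have missing values for the Biden Support Rate (NaN) and Support Which Party (NA) variables, and their Biden and Trump counts are 0. In confirmed_d3, the state value for D.C. is also NA. The county names for D.C. and Alaska in the election data differ from those in the COVID-19 data, so we are probably losing data when merging the two.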
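The check and the suspected cause can be sketched in base R as follows. This is a minimal illustration, not the code used to build covid_state: the toy tables `covid_toy` and `election_toy`, their values, and the mismatched keys ("AK", "DC") are assumptions chosen only to reproduce the symptom.

```r
# Toy stand-ins for the real tables; all values are illustrative only.
covid_toy <- data.frame(
  state = c("Alabama", "Alaska", "District of Columbia"),
  confirmed = c(100, 20, 10),
  stringsAsFactors = FALSE
)
election_toy <- data.frame(
  state = c("Alabama", "AK", "DC"),  # assumed mismatched keys
  Biden = c(40, 9, 8),
  Trump = c(60, 11, 1),
  stringsAsFactors = FALSE
)

# A left join keeps every COVID row; rows without a matching key get NA.
merged_toy <- merge(covid_toy, election_toy, by = "state", all.x = TRUE)

# Rows that contain at least one missing value.
incomplete_rows <- merged_toy[!complete.cases(merged_toy), ]
print(incomplete_rows)

# Number of missing values per column (is.na() is also TRUE for NaN).
na_per_column <- colSums(is.na(merged_toy))
print(na_per_column)

# Keys present in the COVID table but absent from the election table.
unmatched_states <- setdiff(covid_toy$state, election_toy$state)
print(unmatched_states)
```

Listing the unmatched keys before merging shows which names need to be harmonised, rather than discovering the gaps afterwards in the merged table.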
4.2 Fixing the NA values in the data set
Because we want the election result for every state without any NA values, we calculate the support rate for Alaska and D.C. by adding their vote counts from the election data.
## state confirmed_Nov_30 death_Nov_30 population confirmed_rate
## 1 Alabama 249524 3578 4903185 0.05089019
## 2 Alaska 32418 121 731545 0.04431443
## 3 Arizona 326817 6639 7278717 0.04490036
## 4 Arkansas 154865 2502 3017804 0.05131712
## 5 California 1230264 19173 39512223 0.03113629
## 6 Colorado 232878 3037 5758736 0.04043908
## death_rate Biden Trump Biden Support Rate Support Which Party
## 1 0.014339302 849648 1441168 0.3708932 Republican
## 2 0.003732494 153405 189892 0.4468580 Republican
## 3 0.020314121 1672143 1661686 0.5015683 Democratic
## 4 0.016156007 420328 757405 0.3568958 Republican
## 5 0.015584460 11109764 6005961 0.6490969 Democratic
## 6 0.013041163 1804352 1364607 0.5693832 Democratic
## health_cost_per_captia death_per_100 death_per_1M
## 1 7281 0.07297298 729.7298
## 2 11064 0.01654034 165.4034
## 3 6452 0.09121113 912.1113
## 4 7408 0.08290797 829.0797
## 5 7549 0.04852423 485.2423
## 6 6804 0.05273727 527.3727
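One way to carry out such a fix is sketched below. The Alabama and Alaska vote counts are taken from the output above. The helper names `set_votes` and `add_support_columns` are hypothetical, and the 0.5 threshold for Support Which Party is an assumption that happens to agree with the rows shown. The rate itself is Biden / (Biden + Trump), which matches the printed values.

```r
# Two rows based on the output above; Alaska starts with the zero counts
# it had before the fix.
covid_state_toy <- data.frame(
  state = c("Alabama", "Alaska"),
  Biden = c(849648, 0),
  Trump = c(1441168, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: overwrite the vote totals of one state.
set_votes <- function(df, state_name, biden, trump) {
  row <- df$state == state_name
  df$Biden[row] <- biden
  df$Trump[row] <- trump
  df
}

# Hypothetical helper: recompute the two derived election columns.
add_support_columns <- function(df) {
  rate <- df$Biden / (df$Biden + df$Trump)  # 0 / 0 gives NaN
  df[["Biden Support Rate"]] <- rate
  # Assumed rule: a majority of the two-party vote decides the label.
  df[["Support Which Party"]] <- ifelse(rate > 0.5, "Democratic", "Republican")
  df
}

# Before the fix: Alaska has NaN and NA in the derived columns.
before_fix <- add_support_columns(covid_state_toy)
print(before_fix)

# After the fix: insert Alaska's totals, then recompute.
after_fix <- add_support_columns(
  set_votes(covid_state_toy, "Alaska", biden = 153405, trump = 189892)
)
print(after_fix)
```

Recomputing the derived columns from the corrected counts, instead of typing in the rate directly, keeps Biden Support Rate and Support Which Party consistent with the Biden and Trump columns.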