In-class_Exercise 2
Overview
In this exercise, we are to find out if functionally similar water points positively co-occur in Nigeria.
Getting Started
In the code chunk below, p_load() of pacman package is used to install and load the following R packages into R environment:
sf
tidyverse
tmap
spdep
funModeling, to be used for rapid Exploratory Data Analysis
pacman::p_load(sf, tidyverse, tmap, spdep, funModeling)
Importing Geospatial Data
In this in-class exercise, two geospatial data sets will be used:
geo_export
nga_admbnda_adm2_osgof_20190417
Importing water point geospatial data
First, we are going to import the water point geospatial data (i.e. geo_export) by using the code chunk below.
wp <- st_read(dsn = "rawdata",
              layer = "geo_export",
              crs = 4326) %>%
  filter(clean_coun == "Nigeria")
Things to learn from the code chunk above:
- st_read() of sf package is used to import the geo_export shapefile into R environment and save the imported geospatial data into a simple feature data table.
- filter() of dplyr package is used to extract water point records of Nigeria only.
Note: Avoid transforming the coordinate system of only one layer if you plan to use st_intersects() of sf package in a later stage of the geoprocessing. st_intersects() requires both inputs to share the same coordinate reference system, and both data sets in this exercise are imported in the geographic coordinate system WGS 84 (EPSG:4326).
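One way to guard against a coordinate system mismatch before calling st_intersects() is to compare the two layers with st_crs(). The code chunk below is a minimal sketch on made-up toy geometries; pts, poly and pts_proj are hypothetical objects, not the exercise data, and EPSG:3857 is used purely as an example of a projected system.

```r
library(sf)

# Toy layers, both declared as WGS 84 (EPSG:4326); not the exercise data
pts  <- st_sfc(st_point(c(0.5, 0.5)), crs = 4326)
poly <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
  crs = 4326
)

# TRUE when both layers share the same CRS
same_crs <- st_crs(pts) == st_crs(poly)

# Projecting only one of the layers breaks the match
pts_proj <- st_transform(pts, 3857)
mismatch <- st_crs(pts_proj) == st_crs(poly)
```

If the comparison is FALSE, either both layers should be transformed to a common system or neither should be.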
Next, write_rds() of readr package is used to save the extracted sf data table (i.e. wp) into an output file in rds data format. The output file is called wp_nga.rds and it is saved in rawdata sub-folder, which will not be uploaded to Git.
wp_nga <- write_rds(wp,
                    "rawdata/wp_nga.rds")
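The rds round trip can be tried without touching the project folders. The code chunk below is a small sketch that writes a hypothetical toy data frame (toy) to a temporary file and reads it back. write_rds() returns its input invisibly, which is why its result can be assigned to a new object as in the chunk above.

```r
library(readr)

# Hypothetical toy data frame standing in for the extracted water point table
toy <- data.frame(id = 1:3,
                  status = c("Functional", NA, "Abandoned"))

# Write to a temporary file so nothing is created in the project folder
path <- tempfile(fileext = ".rds")
saved <- write_rds(toy, path)

# Read the file back and compare it with the original object
restored <- read_rds(path)
all.equal(toy, restored)
```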
Importing Nigeria LGA boundary data
Now, we are going to import the Local Government Area (LGA) boundary data into R environment by using the code chunk below.
nga <- st_read(dsn = "data",
               layer = "nga_admbnda_adm2_osgof_20190417",
               crs = 4326)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source
`C:\Jacobche\ISSS624\In-class_Ex\data' using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
Things to learn from the code chunk above:
- st_read() of sf package is used to import the nga_admbnda_adm2_osgof_20190417 shapefile into R environment and save the imported geospatial data into a simple feature data table.
Data Wrangling
Recoding NA values into string
In the code chunk below, replace_na() is used to recode all the NA values in status_cle field into Unknown.
wp_nga <- read_rds("rawdata/wp_nga.rds") %>%
  mutate(status_cle = replace_na(status_cle, "Unknown"))
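To see what the recoding does, the same call can be run on a small hypothetical tibble (toy) in place of the water point data. This is only an illustrative sketch.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy table with two missing status values
toy <- tibble(status_cle = c("Functional", NA, "Non-Functional", NA))

# Replace every NA in status_cle with the string "Unknown"
recoded <- toy %>%
  mutate(status_cle = replace_na(status_cle, "Unknown"))

recoded
```

After the recoding, the missing values form their own class and are no longer dropped or shown as NA in frequency tables.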
Exploratory Data Analysis
In the code chunk below, freq() of funModeling package is used to display the distribution of status_cle field in wp_nga.
freq(data = wp_nga,
     input = 'status_cle')
                        status_cle frequency percentage cumulative_perc
1                       Functional     45883      48.29           48.29
2                   Non-Functional     29385      30.93           79.22
3                          Unknown     10656      11.22           90.44
4      Functional but needs repair      4579       4.82           95.26
5 Non-Functional due to dry season      2403       2.53           97.79
6        Functional but not in use      1686       1.77           99.56
7         Abandoned/Decommissioned       234       0.25           99.81
8                        Abandoned       175       0.18           99.99
9 Non functional due to dry season         7       0.01          100.00
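A table with the same columns can also be sketched with dplyr alone. This is an alternative for illustration, not what the exercise uses; toy and freq_tbl are hypothetical names, and the rounding to two decimals is an assumption chosen to resemble the output above.

```r
library(dplyr)

# Hypothetical toy status column
toy <- tibble(status_cle = c("Functional", "Functional",
                             "Non-Functional", "Unknown"))

# Count each class, sort by frequency, then derive the percentage columns
freq_tbl <- toy %>%
  count(status_cle, name = "frequency", sort = TRUE) %>%
  mutate(percentage = round(100 * frequency / sum(frequency), 2),
         cumulative_perc = cumsum(percentage))

freq_tbl
```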
Extracting Water Point Data
In this section, we will extract the water point records by using classes in status_cle field.
Extracting functional water point
In the code chunk below, filter() of dplyr is used to select functional water points.
wpt_functional <- wp_nga %>%
  filter(status_cle %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair"))
Exploratory Data Analysis (functional)
In the code chunk below, freq() of funModeling package is used to display the distribution of status_cle field in wpt_functional.
freq(data = wpt_functional,
     input = 'status_cle')
                   status_cle frequency percentage cumulative_perc
1                  Functional     45883      87.99           87.99
2 Functional but needs repair      4579       8.78           96.77
3   Functional but not in use      1686       3.23          100.00
Extracting non-functional water point
In the code chunk below, filter() of dplyr is used to select non-functional water points.
wpt_nonfunctional <- wp_nga %>%
  filter(status_cle %in%
           c("Abandoned/Decommissioned",
             "Abandoned",
             "Non-Functional",
             "Non functional due to dry season",
             "Non-Functional due to dry season"))
Exploratory Data Analysis (non-functional)
In the code chunk below, freq() of funModeling package is used to display the distribution of status_cle field in wpt_nonfunctional.
freq(data = wpt_nonfunctional,
     input = 'status_cle')
                        status_cle frequency percentage cumulative_perc
1                   Non-Functional     29385      91.25           91.25
2 Non-Functional due to dry season      2403       7.46           98.71
3         Abandoned/Decommissioned       234       0.73           99.44
4                        Abandoned       175       0.54           99.98
5 Non functional due to dry season         7       0.02          100.00
Extracting water point with Unknown class
In the code chunk below, filter() of dplyr is used to select water points with unknown status.
wpt_unknown <- wp_nga %>%
  filter(status_cle == "Unknown")
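A quick sanity check is that the three subsets together account for every record. In the frequency tables above, 52148 functional, 32204 non-functional and 10656 unknown records add up to the 95008 records in wp_nga. The code chunk below sketches the same check on a hypothetical toy table (toy); the class vectors are copied from the filters above, and the other object names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy table covering all three groups
toy <- tibble(status_cle = c("Functional", "Functional but needs repair",
                             "Non-Functional", "Abandoned", "Unknown"))

functional_classes <- c("Functional",
                        "Functional but not in use",
                        "Functional but needs repair")
nonfunctional_classes <- c("Abandoned/Decommissioned",
                           "Abandoned",
                           "Non-Functional",
                           "Non functional due to dry season",
                           "Non-Functional due to dry season")

toy_functional    <- toy %>% filter(status_cle %in% functional_classes)
toy_nonfunctional <- toy %>% filter(status_cle %in% nonfunctional_classes)
toy_unknown       <- toy %>% filter(status_cle == "Unknown")

# Every record should land in exactly one subset
n_subsets <- nrow(toy_functional) + nrow(toy_nonfunctional) + nrow(toy_unknown)
n_subsets == nrow(toy)
```

If the comparison is FALSE, a status class is either missing from the filters or listed in more than one of them.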
Performing Point-in-Polygon Count
The code chunk below performs two operations in one go. First, st_intersects() identifies the water points located inside each LGA. Next, lengths() of Base R counts the number of water points that fall inside each LGA.
nga_wp <- nga %>%
  mutate(`total wpt` = lengths(
    st_intersects(nga, wp_nga))) %>%
  mutate(`wpt functional` = lengths(
    st_intersects(nga, wpt_functional))) %>%
  mutate(`wpt non-functional` = lengths(
    st_intersects(nga, wpt_nonfunctional))) %>%
  mutate(`wpt unknown` = lengths(
    st_intersects(nga, wpt_unknown)))
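The point-in-polygon count can be illustrated on a toy example. In the sketch below, polys, pts and the helper sq() are hypothetical and stand in for the LGA polygons and the water points. st_intersects() returns one list element per polygon holding the indices of the intersecting points, so lengths() gives the count per polygon, including 0 for a polygon with no points.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Three separate toy polygons standing in for LGAs
polys <- st_sf(name = c("A", "B", "C"),
               geometry = st_sfc(sq(0), sq(2), sq(4), crs = 4326))

# Three toy points: two inside A, one inside B, none inside C
pts <- st_sf(geometry = st_sfc(st_point(c(0.2, 0.2)),
                               st_point(c(0.8, 0.5)),
                               st_point(c(2.5, 0.5)),
                               crs = 4326))

# Count the points that fall inside each polygon
polys$n_pts <- lengths(st_intersects(polys, pts))
polys$n_pts
```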
Saving the Analytical Data Table
The code chunk below computes the proportion of functional and non-functional water point at LGA level.
nga_wp <- nga_wp %>%
  mutate(pct_functional = `wpt functional`/`total wpt`) %>%
  mutate(`pct_non-functional` = `wpt non-functional`/`total wpt`) %>%
  select(3:4, 9:10, 18:23)
Things to learn from the code chunk above:
- mutate() of dplyr package is used to derive two fields, namely pct_functional and pct_non-functional.
- To keep the file size small, select() of dplyr is used to retain only fields 3, 4, 9, 10, 18, 19, 20, 21, 22 and 23.
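One detail worth keeping in mind is that an LGA with no water points gives 0/0 in the proportion, which R evaluates to NaN rather than 0. The code chunk below is a small sketch on a hypothetical toy table (toy) that reuses the field names from the chunk above.

```r
library(dplyr)

# Hypothetical toy table; LGA "C" has no water points at all
toy <- tibble(lga = c("A", "B", "C"),
              `total wpt` = c(10, 4, 0),
              `wpt functional` = c(7, 1, 0))

# Same proportion calculation as in the exercise
toy <- toy %>%
  mutate(pct_functional = `wpt functional` / `total wpt`)

# Flags the LGAs whose proportion is undefined because the total is zero
is.nan(toy$pct_functional)
```

Such LGAs would appear as missing values when the proportions are mapped, so it may be useful to identify them before the subsequent analysis.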
Now, we have the tidy sf data table for subsequent analysis. We will save the sf data table into rds format.
write_rds(nga_wp, "data/nga_wp.rds")
Visualising the spatial distribution of water points
The code chunk below uses qtm() of tmap package to plot side-by-side choropleth maps showing the distribution of total, functional, non-functional and unknown water points by LGA in Nigeria.
nga_wp <- read_rds("data/nga_wp.rds")
total <- qtm(nga_wp, "total wpt") +
  tm_layout(scale = 0.7)
wp_functional <- qtm(nga_wp, "wpt functional") +
  tm_layout(scale = 0.7)
wp_nonfunctional <- qtm(nga_wp, "wpt non-functional") +
  tm_layout(scale = 0.6)
unknown <- qtm(nga_wp, "wpt unknown") +
  tm_layout(scale = 0.7)
tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, nrow = 2, ncol = 2)