::p_load(sf, sfdep, tmap, tidyverse, knitr, DT) pacman
Take-Home-Exercise 1: Geospatial Analytics for Public Good
apply appropriate Local Indicators of Spatial Association (GLISA) and Emerging Hot Spot Analysis (EHSA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
Overview
Introduction
The project’s background focuses on leveraging digital data from urban infrastructures, such as public transport and utilities, to understand and analyze human movement patterns in cities. With the increasing use of technologies like GPS and RFID in vehicles, and data collection through smart cards, a vast amount of movement data is available. This data is rich in patterns and structures that can illuminate human behaviors and contribute to more effective urban management and transport services. However, a significant challenge lies in the underutilization of this data. Current practices often limit its use to basic tracking and mapping in Geographic Information Systems (GIS) due to the limited capabilities of these systems in handling complex spatial and spatio-temporal data analysis.
Libraries
sf
- Support for simple features, a standardized way to encode spatial vector data. Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations. Uses by default the ‘s2’ package for spherical geometry operations on ellipsoidal (long/lat) coordinates.
tidyverse
- Loading the core tidyverse packages which will be used for data wrangling and visualisation.tmap
- Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.sfdep - An interface to ‘spdep’ to integrate with ‘sf’ objects and the ‘tidyverse’.
knitr - The R package knitr is a general-purpose literate programming engine, with lightweight API’s designed to give users full control of the output without heavy coding work. It combines many features into one package with slight tweaks motivated from my everyday use of Sweave.
Data Preparation
Apstial data
For the purpose of this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used.
Geospatial data
Two geospatial data will be used in this study, they are:
Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
Importing of data
<- read_csv("Take-home Exercise 1/data/aspatial/origin_destination_bus_202310.csv") bs
Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let use display the bs tibble data table by using the code chunk below.
glimpse(bs)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
A quick check of odbus tibble data frame shows that the values in OROGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type. Hence, the code chunk below is used to convert these data values into character data type.
$ORIGIN_PT_CODE <- as.factor(bs$ORIGIN_PT_CODE)
bs$DESTINATION_PT_CODE <- as.factor(bs$DESTINATION_PT_CODE) bs
Extracting the study data
For the purpose of this exercise, we will extract commuting flows on weekday and between 6 and 9 o’clock.
# Weekday morning peak (6 AM to 9 AM)
<- bs %>%
odbus6_9 filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Weekday afternoon peak (5 PM to 8 PM)
<- bs %>%
odbus17_20 filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Weekend/holiday morning peak (11 AM to 2 PM)
<- bs %>%
odbus11_14 filter(DAY_TYPE == "WEEKEND_OR_HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Weekend/holiday evening peak (4 PM to 7 PM)
<- bs %>%
odbus16_19 filter(DAY_TYPE == "WEEKEND_OR_HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
kable(head(odbus6_9))
ORIGIN_PT_CODE | TRIPS |
---|---|
01012 | 1770 |
01013 | 841 |
01019 | 1530 |
01029 | 2483 |
01039 | 2919 |
01059 | 1734 |
Table below shows the content of bs6_9
datatable(odbus6_9)
datatable(odbus17_20)
datatable(odbus11_14)
datatable(odbus16_19)
We will save the output in rds format for future used.
write_rds(odbus6_9, "Take-home Exercise 1/data/rds/odbus6_9.rds")
write_rds(odbus17_20, "Take-home Exercise 1/data/rds/odbus17_20.rds")
write_rds(odbus11_14, "Take-home Exercise 1/data/rds/odbus11_14.rds")
write_rds(odbus16_19, "Take-home Exercise 1/data/rds/odbus16_19.rds")
The code chunk below will be used to import the save bs6_9.rds into R environment.
<- read_rds("Take-home Exercise 1/data/rds/odbus6_9.rds")
odbus6_9 <- read_rds("Take-home Exercise 1/data/rds/odbus17_20.rds")
odbus17_20 <- read_rds("Take-home Exercise 1/data/rds/odbus11_14.rds")
odbus11_14 <- read_rds("Take-home Exercise 1/data/rds/odbus16_19.rds") odbus16_19
Working with Geospatial Data
For the purpose of this exercise, two geospatial data will be used. They are:
BusStop: This data provides the location of bus stop as at last quarter of 202310.
Hexagon: hexagonal girds usually more visually appealing and use them in most of my mapping projects.
Importing geospatial data
Bus stop:
<- st_read(dsn = "Take-home Exercise 1/data/geospatial",
busstop layer = "BusStop") %>%
st_transform(crs = 3414)
Reading layer `BusStop' from data source
`D:\y1zaoWang\ISSS624\Take-home Exercise 1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
glimpse(busstop)
Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
Hexagon:
= st_make_grid(busstop, c(150, 150), what = "polygons", square = FALSE)
honeycomb_busstop
= st_sf(honeycomb_busstop) %>%
honeycomb_busstop_sf mutate(grid_id = 1:length(lengths(honeycomb_busstop)))
$n_colli = lengths(st_intersects(honeycomb_busstop_sf, busstop))
honeycomb_busstop_sf
= filter(honeycomb_busstop_sf, n_colli > 0) honeycomb_count
Combining both data frame by using left join
As mentioned we want to visualise the spatial patterns on the map of Singapore, this will require us to join the two existing dataframes in order to have a complete data framework.
We will join the two dataframes using “BusStop” in BUS_STOP_N and “Hexagon” in grid_id, both referring to the code of the city.
<- st_intersection(busstop, honeycomb_count) %>%
busstop_hxgn_grid select(BUS_STOP_N, grid_id) %>%
st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
st_intersection()
is used to perform point and polygon overly and the output will be in point sf object.select()
of dplyr package is then use to retain only BUS_STOP_N and grid_id in the sf data frame.five bus stops are excluded in the resultant data frame because they are outside of Singapore bpundary.
<- left_join(busstop_hxgn_grid,odbus6_9, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
wkd6_9_hxgn_grid rename(ORIGIN_BS = BUS_STOP_N,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_BS, ORIGIN_SZ) %>%
summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
<- left_join(busstop_hxgn_grid,odbus17_20, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
wkd17_20_hxgn_grid rename(ORIGIN_BS = BUS_STOP_N,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_BS, ORIGIN_SZ) %>%
summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
<- left_join(busstop_hxgn_grid,odbus16_19, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
wkd16_19_hxgn_grid rename(ORIGIN_BS = BUS_STOP_N,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_BS, ORIGIN_SZ) %>%
summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
<- left_join(busstop_hxgn_grid,odbus11_14, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
wkd11_14_hxgn_grid rename(ORIGIN_BS = BUS_STOP_N,
ORIGIN_SZ = grid_id) %>%
group_by(ORIGIN_BS, ORIGIN_SZ) %>%
summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
glimpse(wkd6_9_hxgn_grid)
Rows: 5,153
Columns: 3
Groups: ORIGIN_BS [5,145]
$ ORIGIN_BS <chr> "01012", "01013", "01019", "01029", "01039", "01059", "01109…
$ ORIGIN_SZ <int> 36068, 36172, 36068, 36274, 36584, 36688, 36689, 36379, 3668…
$ TOT_TRIPS <dbl> 1770, 841, 1530, 2483, 2919, 1734, 200, 8593, 7749, 3687, 57…
head(wkd6_9_hxgn_grid)
# A tibble: 6 × 3
# Groups: ORIGIN_BS [6]
ORIGIN_BS ORIGIN_SZ TOT_TRIPS
<chr> <int> <dbl>
1 01012 36068 1770
2 01013 36172 841
3 01019 36068 1530
4 01029 36274 2483
5 01039 36584 2919
6 01059 36688 1734
check for duplicating records.
<- wkd6_9_hxgn_grid %>%
duplicate group_by_all() %>%
filter(n()>1) %>%
ungroup()
<- unique(wkd6_9_hxgn_grid)
origin_data
<- left_join(honeycomb_count,
origintrip_bs
wkd6_9_hxgn_grid,by = c("grid_id" = "ORIGIN_SZ")) %>%
rename(geometry = "honeycomb_busstop")
glimpse(origintrip_bs)
Rows: 5,153
Columns: 5
$ grid_id <int> 110, 731, 738, 946, 1142, 1251, 1354, 1354, 1361, 1465, 1563…
$ n_colli <int> 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ORIGIN_BS <chr> "25059", "25751", "26379", "26369", "25761", "26389", "25711…
$ TOT_TRIPS <dbl> 103, 52, 78, 53, 185, 301, 251, 815, 60, 9, 760, 64, 257, 39…
$ geometry <POLYGON [m]> POLYGON ((3970.122 27954.34..., POLYGON ((4420.122 2…
head(origintrip_bs)
Simple feature collection with 6 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3895.122 ymin: 27954.34 xmax: 4870.122 ymax: 31245.24
Projected CRS: SVY21 / Singapore TM
grid_id n_colli ORIGIN_BS TOT_TRIPS geometry
1 110 1 25059 103 POLYGON ((3970.122 27954.34...
2 731 1 25751 52 POLYGON ((4420.122 28733.77...
3 738 1 26379 78 POLYGON ((4420.122 30552.42...
4 946 1 26369 53 POLYGON ((4570.122 31072.04...
5 1142 1 25761 185 POLYGON ((4720.122 28473.96...
6 1251 1 26389 301 POLYGON ((4795.122 30162.71...
Plotting a choropleth map
Using the steps you had learned, prepare a choropleth map showing the distribution of passenger trips at planning sub-zone level.
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(origintrip_bs) +
tm_dots(
col = "TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Passenger trips"
+
) tm_layout(
main.title = "Passenger trips generated at planning sub-zone level",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE
+
) tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha = 0.2) +
tm_credits(
"Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA",
position = c("left", "bottom")
)
Credits not supported in view mode.
Compass not supported in view mode.
Clusters of darker blue dots, especially those representing the highest quantile of 7,592 to 357,043 trips, are likely found in commercial or central business districts and around major transportation nodes such as train stations or bus interchanges. The spread and concentration of these clusters can reveal urban density, the layout of transportation infrastructure, and possibly even socioeconomic activity levels.
tmap_mode("view")
tmap mode set to interactive viewing
= tm_shape(honeycomb_count) +
map_honeycomb tm_fill(
col = "n_colli",
palette = "Reds",
style = "cont",
title = "Number of collisions",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.vars = c(
"Number of collisions: " = "n_colli"
),popup.format = list(
n_colli = list(format = "f", digits = 0)
)+
) tm_borders(col = "grey40", lwd = 0.7)
map_honeycomb
The distribution of collisions appears to be somewhat scattered, with no clear pattern of concentration in any particular area. This could imply that the risk of collisions is widespread across the region rather than being localized to specific zones. However, there may be subtle clusters of higher collision frequencies that are not immediately visible due to the scale or the granularity of the data represented.
tmap_mode("view")
tmap mode set to interactive viewing
tmap_options(check.and.fix = TRUE)
tm_shape(origintrip_bs) +
tm_dots(col = "TOT_TRIPS", # Use tm_dots for point data
style = "quantile",
palette = "Blues",
title = "Passenger trips") +
tm_layout(main.title = "Passenger trips generated at planning sub-zone level",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha = 0.2) +
tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA",
position = c("left", "bottom"))
Credits not supported in view mode.
Compass not supported in view mode.
Local Indicators of Spatial Association (LISA) Analysis
Compute LISA of the passengers trips generate by origin at hexagon level.
Display the LISA maps of the passengers trips generate by origin at hexagon level. The maps should only display the significant (i.e. p-value < 0.05)
Deriving contiguity weights: Queen’s method