Take-Home-Exercise 1: Geospatial Analytics for Public Good

Author

WYZ

apply appropriate Local Indicators of Spatial Association (GLISA) and Emerging Hot Spot Analysis (EHSA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

Overview

Introduction

The project’s background focuses on leveraging digital data from urban infrastructures, such as public transport and utilities, to understand and analyze human movement patterns in cities. With the increasing use of technologies like GPS and RFID in vehicles, and data collection through smart cards, a vast amount of movement data is available. This data is rich in patterns and structures that can illuminate human behaviors and contribute to more effective urban management and transport services. However, a significant challenge lies in the underutilization of this data. Current practices often limit its use to basic tracking and mapping in Geographic Information Systems (GIS) due to the limited capabilities of these systems in handling complex spatial and spatio-temporal data analysis.

Libraries

  • sf - Support for simple features, a standardized way to encode spatial vector data. Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations. Uses by default the ‘s2’ package for spherical geometry operations on ellipsoidal (long/lat) coordinates.
  • tidyverse - Loading the core tidyverse packages which will be used for data wrangling and visualisation.

  • tmap - Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.

  • sfdep - An interface to ‘spdep’ to integrate with ‘sf’ objects and the ‘tidyverse’.

  • knitr - The R package knitr is a general-purpose literate programming engine, with lightweight API’s designed to give users full control of the output without heavy coding work. It combines many features into one package with slight tweaks motivated from my everyday use of Sweave.

pacman::p_load(sf, sfdep, tmap, tidyverse, knitr, DT)

Data Preparation

Apstial data

For the purpose of this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used.

Geospatial data

Two geospatial data will be used in this study, they are:

  • Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.

  • hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.

Importing of data

bs <- read_csv("Take-home Exercise 1/data/aspatial/origin_destination_bus_202310.csv")
Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let use display the bs tibble data table by using the code chunk below.

glimpse(bs)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

A quick check of odbus tibble data frame shows that the values in OROGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type. Hence, the code chunk below is used to convert these data values into character data type.

bs$ORIGIN_PT_CODE <- as.factor(bs$ORIGIN_PT_CODE)
bs$DESTINATION_PT_CODE <- as.factor(bs$DESTINATION_PT_CODE) 

Extracting the study data

For the purpose of this exercise, we will extract commuting flows on weekday and between 6 and 9 o’clock.

# Weekday morning peak (6 AM to 9 AM)
odbus6_9 <- bs %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

# Weekday afternoon peak (5 PM to 8 PM)
odbus17_20 <- bs %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

# Weekend/holiday morning peak (11 AM to 2 PM)
odbus11_14 <- bs %>%
  filter(DAY_TYPE == "WEEKEND_OR_HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

# Weekend/holiday evening peak (4 PM to 7 PM)
odbus16_19 <- bs %>%
  filter(DAY_TYPE == "WEEKEND_OR_HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
kable(head(odbus6_9))
ORIGIN_PT_CODE TRIPS
01012 1770
01013 841
01019 1530
01029 2483
01039 2919
01059 1734

Table below shows the content of bs6_9

datatable(odbus6_9)
datatable(odbus17_20)
datatable(odbus11_14)
datatable(odbus16_19)

We will save the output in rds format for future used.

write_rds(odbus6_9, "Take-home Exercise 1/data/rds/odbus6_9.rds")
write_rds(odbus17_20, "Take-home Exercise 1/data/rds/odbus17_20.rds")
write_rds(odbus11_14, "Take-home Exercise 1/data/rds/odbus11_14.rds")
write_rds(odbus16_19, "Take-home Exercise 1/data/rds/odbus16_19.rds")

The code chunk below will be used to import the save bs6_9.rds into R environment.

odbus6_9 <- read_rds("Take-home Exercise 1/data/rds/odbus6_9.rds")
odbus17_20 <- read_rds("Take-home Exercise 1/data/rds/odbus17_20.rds")
odbus11_14 <- read_rds("Take-home Exercise 1/data/rds/odbus11_14.rds")
odbus16_19 <- read_rds("Take-home Exercise 1/data/rds/odbus16_19.rds")

Working with Geospatial Data

For the purpose of this exercise, two geospatial data will be used. They are:

  • BusStop: This data provides the location of bus stop as at last quarter of 202310.

  • Hexagon: hexagonal girds usually more visually appealing and use them in most of my mapping projects.

Importing geospatial data

Bus stop:

busstop <- st_read(dsn = "Take-home Exercise 1/data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
glimpse(busstop)
Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC   <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry   <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…

Hexagon:

honeycomb_busstop = st_make_grid(busstop, c(150, 150), what = "polygons", square = FALSE)

honeycomb_busstop_sf = st_sf(honeycomb_busstop) %>%
  mutate(grid_id = 1:length(lengths(honeycomb_busstop)))

honeycomb_busstop_sf$n_colli = lengths(st_intersects(honeycomb_busstop_sf, busstop))

honeycomb_count = filter(honeycomb_busstop_sf, n_colli > 0)

Combining both data frame by using left join

As mentioned we want to visualise the spatial patterns on the map of Singapore, this will require us to join the two existing dataframes in order to have a complete data framework.

We will join the two dataframes using “BusStop” in BUS_STOP_N and “Hexagon” in grid_id, both referring to the code of the city.

busstop_hxgn_grid <- st_intersection(busstop, honeycomb_count) %>%
  select(BUS_STOP_N, grid_id) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
  • st_intersection() is used to perform point and polygon overly and the output will be in point sf object.

  • select() of dplyr package is then use to retain only BUS_STOP_N and grid_id in the sf data frame.

  • five bus stops are excluded in the resultant data frame because they are outside of Singapore bpundary.

wkd6_9_hxgn_grid <- left_join(busstop_hxgn_grid,odbus6_9, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
  rename(ORIGIN_BS = BUS_STOP_N,
         ORIGIN_SZ = grid_id) %>%
  group_by(ORIGIN_BS, ORIGIN_SZ) %>%
  summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
wkd17_20_hxgn_grid <- left_join(busstop_hxgn_grid,odbus17_20, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
  rename(ORIGIN_BS = BUS_STOP_N,
         ORIGIN_SZ = grid_id) %>%
  group_by(ORIGIN_BS, ORIGIN_SZ) %>%
  summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
wkd16_19_hxgn_grid <- left_join(busstop_hxgn_grid,odbus16_19, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
  rename(ORIGIN_BS = BUS_STOP_N,
         ORIGIN_SZ = grid_id) %>%
  group_by(ORIGIN_BS, ORIGIN_SZ) %>%
  summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
wkd11_14_hxgn_grid <- left_join(busstop_hxgn_grid,odbus11_14, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
  rename(ORIGIN_BS = BUS_STOP_N,
         ORIGIN_SZ = grid_id) %>%
  group_by(ORIGIN_BS, ORIGIN_SZ) %>%
  summarise(TOT_TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_BS'. You can override using the
`.groups` argument.
glimpse(wkd6_9_hxgn_grid)
Rows: 5,153
Columns: 3
Groups: ORIGIN_BS [5,145]
$ ORIGIN_BS <chr> "01012", "01013", "01019", "01029", "01039", "01059", "01109…
$ ORIGIN_SZ <int> 36068, 36172, 36068, 36274, 36584, 36688, 36689, 36379, 3668…
$ TOT_TRIPS <dbl> 1770, 841, 1530, 2483, 2919, 1734, 200, 8593, 7749, 3687, 57…
head(wkd6_9_hxgn_grid)
# A tibble: 6 × 3
# Groups:   ORIGIN_BS [6]
  ORIGIN_BS ORIGIN_SZ TOT_TRIPS
  <chr>         <int>     <dbl>
1 01012         36068      1770
2 01013         36172       841
3 01019         36068      1530
4 01029         36274      2483
5 01039         36584      2919
6 01059         36688      1734

check for duplicating records.

duplicate <- wkd6_9_hxgn_grid %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

origin_data <- unique(wkd6_9_hxgn_grid)

origintrip_bs <- left_join(honeycomb_count, 
                           wkd6_9_hxgn_grid,
                           by = c("grid_id" = "ORIGIN_SZ")) %>%
  rename(geometry = "honeycomb_busstop")

glimpse(origintrip_bs)
Rows: 5,153
Columns: 5
$ grid_id   <int> 110, 731, 738, 946, 1142, 1251, 1354, 1354, 1361, 1465, 1563…
$ n_colli   <int> 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ORIGIN_BS <chr> "25059", "25751", "26379", "26369", "25761", "26389", "25711…
$ TOT_TRIPS <dbl> 103, 52, 78, 53, 185, 301, 251, 815, 60, 9, 760, 64, 257, 39…
$ geometry  <POLYGON [m]> POLYGON ((3970.122 27954.34..., POLYGON ((4420.122 2…
head(origintrip_bs)
Simple feature collection with 6 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3895.122 ymin: 27954.34 xmax: 4870.122 ymax: 31245.24
Projected CRS: SVY21 / Singapore TM
  grid_id n_colli ORIGIN_BS TOT_TRIPS                       geometry
1     110       1     25059       103 POLYGON ((3970.122 27954.34...
2     731       1     25751        52 POLYGON ((4420.122 28733.77...
3     738       1     26379        78 POLYGON ((4420.122 30552.42...
4     946       1     26369        53 POLYGON ((4570.122 31072.04...
5    1142       1     25761       185 POLYGON ((4720.122 28473.96...
6    1251       1     26389       301 POLYGON ((4795.122 30162.71...

Plotting a choropleth map

Using the steps you had learned, prepare a choropleth map showing the distribution of passenger trips at planning sub-zone level.

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(origintrip_bs) +
  tm_dots(
    col = "TOT_TRIPS", 
    style = "quantile", 
    palette = "Blues",
    title = "Passenger trips"
  ) +
  tm_layout(
    main.title = "Passenger trips generated at planning sub-zone level",
    main.title.position = "center",
    main.title.size = 1.2,
    legend.height = 0.45, 
    legend.width = 0.35,
    frame = TRUE
  ) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha = 0.2) +
  tm_credits(
    "Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
    position = c("left", "bottom")
  )
Credits not supported in view mode.
Compass not supported in view mode.

Clusters of darker blue dots, especially those representing the highest quantile of 7,592 to 357,043 trips, are likely found in commercial or central business districts and around major transportation nodes such as train stations or bus interchanges. The spread and concentration of these clusters can reveal urban density, the layout of transportation infrastructure, and possibly even socioeconomic activity levels.

tmap_mode("view")
tmap mode set to interactive viewing
map_honeycomb = tm_shape(honeycomb_count) +
  tm_fill(
    col = "n_colli",
    palette = "Reds",
    style = "cont",
    title = "Number of collisions",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.vars = c(
      "Number of collisions: " = "n_colli"
    ),
    popup.format = list(
      n_colli = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

map_honeycomb

The distribution of collisions appears to be somewhat scattered, with no clear pattern of concentration in any particular area. This could imply that the risk of collisions is widespread across the region rather than being localized to specific zones. However, there may be subtle clusters of higher collision frequencies that are not immediately visible due to the scale or the granularity of the data represented.

tmap_mode("view")
tmap mode set to interactive viewing
tmap_options(check.and.fix = TRUE)

tm_shape(origintrip_bs) +
  tm_dots(col = "TOT_TRIPS",  # Use tm_dots for point data
          style = "quantile",
          palette = "Blues",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated at planning sub-zone level",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha = 0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
Credits not supported in view mode.
Compass not supported in view mode.

Local Indicators of Spatial Association (LISA) Analysis

  • Compute LISA of the passengers trips generate by origin at hexagon level.

  • Display the LISA maps of the passengers trips generate by origin at hexagon level. The maps should only display the significant (i.e. p-value < 0.05)

Deriving contiguity weights: Queen’s method