Take home Ex2

Author

WYZ

Applied Spatial Interaction Models: A case study of Singapore public bus commuter flows

1 Background

The scenario highlights the challenges in urban mobility, specifically understanding commuting patterns and the impact of public transportation changes. Traditional methods like commuter surveys are outdated and inefficient. The focus shifts to leveraging digital infrastructure data, such as GPS and SMART card usage, for more dynamic and insightful analysis.The exercise is motivated by two factors: the underutilization of available open data for policy making and the need for practical research in geospatial data science and analysis (GDSA). The task involves using GDSA to integrate diverse data sources, building spatial interaction models to understand public bus transit patterns. This approach aims to provide more effective tools for urban planning and decision-making.

2 The Data

Open Government Data

For the purpose of this assignment, data from several open government sources will be used:

Passenger Volume by Origin Destination Bus Stops, Bus Stop Location, Train Station and Train Station Exit Point, just to name a few of them, from LTA DataMall.
Master Plan 2019 Subzone Boundary, HDB Property Information, School Directory and Information and other relevant data from Data.gov.sg.
hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.

3 Getting Started

pacman::p_load(sf, sp, tmap, tidyverse, knitr, stplanr, reshape2, performance)
tmap_mode("plot")

tmap mode set to plotting

tmap_style("natural")

tmap style set to "natural"

other available styles are: "white", "gray", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor"

set.seed(1234)

4 Data Preparation

4.1 Importing the OD data

Firstly, we will import the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall by using read_csv() of readr package.

odbus <- read_csv("Take-home Exercise 2/data/aspatial/origin_destination_bus_202310.csv")

Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

kable(head(odbus))

YEAR_MONTH	DAY_TYPE	TIME_PER_HOUR	PT_TYPE	ORIGIN_PT_CODE	DESTINATION_PT_CODE	TOTAL_TRIPS
2023-10	WEEKENDS/HOLIDAY	16	BUS	04168	10051	3
2023-10	WEEKDAY	16	BUS	04168	10051	5
2023-10	WEEKENDS/HOLIDAY	14	BUS	80119	90079	3
2023-10	WEEKDAY	14	BUS	80119	90079	5
2023-10	WEEKDAY	17	BUS	44069	17229	4
2023-10	WEEKENDS/HOLIDAY	17	BUS	20281	20141	1

4.2 Importing Geospatial data into R

For the purpose of this exercise, three geospatial data will be used first. They are:

BusStop: This data provides the location of bus stop as at last quarter of 2023.
MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019.
Hexagon: analytical hexagon data of 375m (this distance is the perpendicular distance between the centre of the hexagon and its edges) to represent the traffic analysis zone (TAZ).

mpsz <- st_read(dsn = "Take-home Exercise 2/data/geospatial",
                layer = "MPSZ-2019") %>%
  select(SUBZONE_N)

Reading layer `MPSZ-2019' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

busstops <- st_read(dsn = "Take-home Exercise 2/data/geospatial",
                    layer = "BusStop")

Reading layer `BusStop' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

write_rds(mpsz, "Take-home Exercise 2/data/rds/mpsz.rds")
write_rds(busstops, "Take-home Exercise 2/data/rds/busstops.rds")

4.3 Creating Hexagon grid

honeycomb <- busstops %>% st_make_grid(cellsize = 750,
                                       what="polygons",
                                       square = FALSE) %>%
  st_sf() %>%
  filter(lengths(st_intersects(geometry, busstops)) > 0)

Now that we have hexagons properly generated, we will assign id for each hexagon to be used as a unique identifier. We will store this id under the HEX_ID column, and can be used in joining data frames.

honeycomb$HEX_ID <- sprintf("H%04d", seq_len(nrow(honeycomb))) %>% as.factor()
kable(head(honeycomb))

geometry	HEX_ID
POLYGON ((3970.122 27348.13…	H0001
POLYGON ((4345.122 27997.65…	H0002
POLYGON ((4345.122 30595.72…	H0003
POLYGON ((4720.122 28647.16…	H0004
POLYGON ((4720.122 29946.2,…	H0005
POLYGON ((4720.122 31245.24…	H0006

write_rds(honeycomb, "Take-home Exercise 2/data/rds/honeycomb.rds")

5 Spatial Interaction Analysis

5.1 Generating the O-D trip data by hexagon level

Filtering the relevant data

We only need the data for the weekend morning peak period, which is from 11 AM - 2 PM on weekends and holidays. As such, we will filter the data for the relevant hours.

od_trips <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter( TIME_PER_HOUR >= 11 &
            TIME_PER_HOUR < 14
          ) %>%
  group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS)) %>%
  rename(
    ORIG_BUS_STOP_N = ORIGIN_PT_CODE,
    DEST_BUS_STOP_N = DESTINATION_PT_CODE
  )

`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.

kable(head(od_trips))

ORIG_BUS_STOP_N	DEST_BUS_STOP_N	TRIPS
01012	01112	204
01012	01113	129
01012	01121	95
01012	01211	91
01012	01311	152
01012	01559	5

rm(odbus)

To connect the trip data to the their corresponding hexagon, we need to create a lookup table. This will serve as a glue in associating the aspatial od_trips data frame to the honeycomb data frame.

This can be done via st_intersection().

bs_hex <- st_intersection(busstops, honeycomb) %>%
  st_drop_geometry() %>%
  select(c(BUS_STOP_N, HEX_ID))

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

kable(head(bs_hex))

	BUS_STOP_N	HEX_ID
3269	25059	H0001
2570	25751	H0002
254	26379	H0003
2403	26369	H0003
2829	25741	H0004
1715	26399	H0005

Joining od_trips and bs_hex

Next, we need to associate each origin bus stop and destination bus stop to their corresponding hexagons.

We can use that by doing inner_join() twice, once for the origin and another for the destination.

od_trips_w_hex <- od_trips %>%
  inner_join(bs_hex,
             by = c("ORIG_BUS_STOP_N" = "BUS_STOP_N")) %>%
  rename(ORIG_HEX_ID = HEX_ID) %>%
  inner_join(bs_hex,
             by = c("DEST_BUS_STOP_N" = "BUS_STOP_N")) %>%
  rename(DEST_HEX_ID = HEX_ID)

Warning in inner_join(., bs_hex, by = c(ORIG_BUS_STOP_N = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 25339 of `x` matches multiple rows in `y`.
ℹ Row 3057 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Warning in inner_join(., bs_hex, by = c(DEST_BUS_STOP_N = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 186 of `x` matches multiple rows in `y`.
ℹ Row 3146 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

kable(head(od_trips_w_hex))

ORIG_BUS_STOP_N	DEST_BUS_STOP_N	TRIPS	ORIG_HEX_ID	DEST_HEX_ID
01012	01112	204	H0518	H0530
01012	01113	129	H0518	H0530
01012	01121	95	H0518	H0553
01012	01211	91	H0518	H0553
01012	01311	152	H0518	H0564
01012	01559	5	H0518	H0553

Aggregating data by hexagon

Next, we will perform aggregations by ORIG_HEX_ID and DEST_HEX_ID to have an aggregated sum of trips by hexagon instead of bus stops.

od_hex <- od_trips_w_hex %>%
  group_by(ORIG_HEX_ID, DEST_HEX_ID) %>%
  summarise(TRIPS = sum(TRIPS))

`summarise()` has grouped output by 'ORIG_HEX_ID'. You can override using the
`.groups` argument.

kable(head(od_hex))

ORIG_HEX_ID	DEST_HEX_ID	TRIPS
H0002	H0016	1
H0002	H0017	2
H0002	H0032	16
H0003	H0005	1
H0003	H0022	56
H0003	H0028	10

Save point:

write_rds(bs_hex, "Take-home Exercise 2/data/rds/bs_hex.rds")
write_rds(od_hex, "Take-home Exercise 2/data/rds/od_hex202310.rds")
write_rds(od_trips, "Take-home Exercise 2/data/rds/od_trips202310.rds")

rm(od_trips_w_hex)

5.3 Generating the flow lines

First, we will generate the flow lines using od2line(). honeycomb will be supplied as the zone as it contains the hexagons we are using as the traffic analysis zones.

invalid_geoms <- which(!st_is_valid(mpsz))

# If there are invalid geometries, fix them
if(length(invalid_geoms) > 0) {
  mpsz[invalid_geoms, ] <- st_make_valid(mpsz[invalid_geoms, ])
}

flowlines <- od_hex %>% od2line(
  honeycomb,
  zone_code = "HEX_ID")

Creating centroids representing desire line start and end points.

write_rds(flowlines, "Take-home Exercise 2/data/rds/flowlines202310.rds")

tm_shape(mpsz) +
  tm_polygons("gray", title = "Singapore Boundary", alpha = 0.5) +
  
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  
  tm_shape(flowlines) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "red",
           scale = c(0.1, 1, 3, 5, 7),
           title.lwd = "# of bus trips",
           alpha = 0.8) +
  
  tm_layout(main.title = "Bus Passenger flow for Weekends/Holidays 11 AM - 2PM (October 2023)",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_grid(alpha = 0.2)

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

write_rds(flowlines, "Take-home Exercise 2/data/rds/flowlines202310.rds")

rm(flowlines)
rm(od_hex)

6 Spatial Interaction Modelling

Next, we will prepare the data needed for spatial interaction modelling. Some of these are straightforward to get, especially those of attractiveness variables. Additional steps are needed for more complex data sets, like those needed for propulsiveness variables. We will derive those in a separate section.

6.1 Attractiveness variables

We will first initiate attractiveness from honeycomb.

attractiveness <- honeycomb

Because not all of the st_cos(x) == st_crs(y) are ture, it’s crucial for all datasets to be in the same CRS before performing spatial operations like intersections. The next steps will all tackle this problem.

BUSSTOP:

attractiveness$BUS_STOP_COUNT <- lengths(
  st_intersects(attractiveness, busstops))

ENTERTAINMENT:

entertn <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "entertn")

Reading layer `entertn' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM

st_crs(attractiveness)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(entertn)

Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

entertn <- st_transform(entertn, st_crs(attractiveness))

attractiveness <- st_transform(attractiveness, st_crs(entertn))

attractiveness$ENTERTN_COUNT <- lengths(st_intersects(attractiveness, entertn))

attractiveness$ENTERTN_COUNT <- lengths(st_intersects(attractiveness, entertn))

F&B:

f_and_b <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "F&B")

Reading layer `F&B' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM

st_crs(attractiveness)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(f_and_b)

Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

f_and_b <- st_transform(f_and_b, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(f_and_b))
attractiveness$F_AND_B_COUNT <- lengths(st_intersects(attractiveness, f_and_b))

attractiveness$F_AND_B_COUNT <- lengths(st_intersects(attractiveness, f_and_b))

Leisure&Recreation:

leis_rec <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "Liesure&Recreation")

Reading layer `Liesure&Recreation' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM

tm_shape(mpsz) +
  tm_polygons("green", title = "Singapore Boundary", alpha = 0.5) +
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  tm_layout(main.title = "Map of Leisure & Recreation Spots in Singapore",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_shape(leis_rec) +
  tm_dots(col = "red", size = 0.005, title = "Leisure & Recreation Spots") +
  tm_grid(alpha = 0.2)

st_crs(attractiveness)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(leis_rec)

Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

leis_rec <- st_transform(leis_rec, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(leis_rec))
attractiveness$LEISURE_COUNT <- lengths(st_intersects(attractiveness, leis_rec))

attractiveness$LEISURE_COUNT <- lengths(st_intersects(attractiveness, leis_rec))

RETAIL:

retail <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "Retails")

Reading layer `Retails' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM

st_crs(attractiveness)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(retail)

Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

retail <- st_transform(retail, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(retail))
attractiveness$RETAIL_COUNT <- lengths(st_intersects(attractiveness, retail))

attractiveness$RETAIL_COUNT <- lengths(st_intersects(attractiveness, retail))

TRAIN STATION EXITS:

train_exits <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "Train_Station_Exit_Layer")

Reading layer `Train_Station_Exit_Layer' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 565 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6134.086 ymin: 27499.7 xmax: 45356.36 ymax: 47865.92
Projected CRS: SVY21

tm_shape(mpsz) +
  tm_polygons("green", title = "Singapore Boundary", alpha = 0.5) +
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  tm_layout(main.title = "Map of Train Station Exits in Singapore",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_shape(train_exits) +
  tm_dots(col = "red", size = 0.005, title = "Train Station Exits") +
  tm_grid(alpha = 0.2)

st_crs(attractiveness)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(train_exits)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

train_exits <- st_transform(train_exits, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(train_exits))
attractiveness$TRAIN_EXITS_COUNT <- lengths(st_intersects(attractiveness, train_exits))

attractiveness$TRAIN_EXITS_COUNT <- lengths(st_intersects(attractiveness, train_exits))

Let’s check if the attractiveness variables have been added correctly.

kable(head(attractiveness))

geometry	HEX_ID	BUS_STOP_COUNT	RETAIL_COUNT
POLYGON ((3970.122 27348.13…	H0001	1	0
POLYGON ((4345.122 27997.65…	H0002	1	0
POLYGON ((4345.122 30595.72…	H0003	2	0
POLYGON ((4720.122 28647.16…	H0004	1	0
POLYGON ((4720.122 29946.2,…	H0005	4	5
POLYGON ((4720.122 31245.24…	H0006	1	0

write_rds(attractiveness, "Take-home Exercise 2/data/rds/attractiveness_no_hdb.rds")
write_rds(train_exits, "Take-home Exercise 2/data/rds/train_exits.rds")

rm(busstops)
rm(entertn)
rm(f_and_b)
rm(leis_rec)
rm(retail)

6.2 Deriving Passengers Alighting from Bus Stop

Using the similar techniques used in Take-home Exercise 1, we will aggregate the trips using inner_join(), group_by, and summarise.

dest_bus_hex <- od_trips %>%
  inner_join(bs_hex,
             by = join_by(DEST_BUS_STOP_N == BUS_STOP_N)) %>%
  group_by(HEX_ID) %>%
  summarise(TRIPS = sum(TRIPS))

Warning in inner_join(., bs_hex, by = join_by(DEST_BUS_STOP_N == BUS_STOP_N)): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 186 of `x` matches multiple rows in `y`.
ℹ Row 3146 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

kable(head(dest_bus_hex))

HEX_ID	TRIPS
H0002	20
H0003	121
H0004	4
H0005	152
H0006	204
H0007	41

write_rds(dest_bus_hex, "Take-home Exercise 2/data/rds/dest_bus_hex202310.rds")

rm(bs_hex)
rm(od_trips)

7 Deriving HDB population

7.1 Importing the data

hdb_vars <- honeycomb

hdb_csv <- read_csv("Take-home Exercise 2/data/aspatial/hdb.csv")

New names:
Rows: 12442 Columns: 37
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(18): blk_no, street, residential, commercial, market_hawker, miscellane... dbl
(19): ...1, max_floor_lvl, year_completed, total_dwelling_units, 1room_s...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

kable(head(hdb_csv))

…1	blk_no	street	max_floor_lvl	year_completed	residential	commercial	market_hawker	miscellaneous	multistorey_carpark	precinct_pavilion	bldg_contract_town	total_dwelling_units	2room_sold	3room_sold	4room_sold	5room_sold	lat	lng	building	addr	postal	SUBZONE_NO	SUBZONE_N	SUBZONE_C	PLN_AREA_N	PLN_AREA_C	REGION_N	REGION_C
0	1	BEACH RD	16	1970	Y	Y	N	N	N	N	KWN	142	1	138	1	2	1.295097	103.8541	RAFFLES HOTEL	1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673	189673	2	CITY HALL	DTSZ02	DOWNTOWN CORE	DT	CENTRAL REGION	CR
1	1	BEDOK STH AVE 1	14	1975	Y	N	N	Y	N	N	BD	206	0	204	0	2	1.320852	103.9337	NIL	1 BEDOK SOUTH AVENUE 1 SINGAPORE 460001	460001	6	BEDOK SOUTH	BDSZ06	BEDOK	BD	EAST REGION	ER
2	1	CANTONMENT RD	2	2010	N	Y	N	N	N	N	CT	0	0	0	0	0	1.275488	103.8414	PINNACLE @ DUXTON	1 CANTONMENT ROAD PINNACLE @ DUXTON SINGAPORE 080001	080001	3	CHINATOWN	OTSZ03	OUTRAM	OT	CENTRAL REGION	CR
3	1	CHAI CHEE RD	15	1982	Y	N	N	N	N	N	BD	102	0	0	10	92	1.327969	103.9227	PING YI GARDENS	1 CHAI CHEE ROAD PING YI GARDENS SINGAPORE 461001	461001	3	KEMBANGAN	BDSZ03	BEDOK	BD	EAST REGION	ER
4	1	CHANGI VILLAGE RD	4	1975	Y	Y	N	N	N	N	PRC	55	0	54	0	1	1.388610	103.9881	OCBC CHANGI VILLAGE ROAD - 7 ELEVEN	1 CHANGI VILLAGE ROAD OCBC CHANGI VILLAGE ROAD - 7 ELEVEN SINGAPORE 500001	500001	1	CHANGI POINT	CHSZ01	CHANGI	CH	EAST REGION	ER
5	1	DELTA AVE	25	1982	Y	N	N	N	N	N	BM	96	0	0	0	96	1.292075	103.8286	NIL	1 DELTA AVENUE SINGAPORE 160001	160001	9	BUKIT HO SWEE	BMSZ09	BUKIT MERAH	BM	CENTRAL REGION	CR

hdb_sf <- hdb_csv %>% st_as_sf(coords = c("lng", "lat"),
                               crs = 4326) %>%
  st_transform(crs = 3414)

tm_shape(mpsz) +
  tm_polygons("green", title = "Singapore Boundary", alpha = 0.5) +
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  tm_layout(main.title = "Map of HDB Blocks in Singapore",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_shape(hdb_sf) +
  tm_dots(col = "red", size = 0.001, title = "HDB Blocks") +
  tm_grid(alpha = 0.2)

7.2 Adding HDB_COUNT

This variable will contain the number of HDB blocks in a zone. We will use the same methods to count the number locations in the zone, by using lengths() and st_intersects().

We will use this as attractiveness variable, and will include all HDB block types (commercial, hawker, residential) as all of them are attractive destinations for eating out, meeting family/friends, and errands.

st_crs(hdb_vars)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(hdb_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

hdb_sf <- st_transform(hdb_sf, st_crs(hdb_vars))
hdb_vars <- st_transform(hdb_vars, st_crs(hdb_sf))
hdb_vars$HDB_COUNT <- lengths(st_intersects(hdb_vars, hdb_sf))

hdb_vars$HDB_COUNT <- lengths(st_intersects(hdb_vars, hdb_sf))

attractiveness <- left_join(attractiveness,
                            st_drop_geometry(hdb_vars))

Joining with `by = join_by(HEX_ID)`

# Using head() to limit the number of rows
kable(head(attractiveness))

HEX_ID	BUS_STOP_COUNT	RETAIL_COUNT	geometry
H0001	1	0	POLYGON ((3970.122 27348.13…
H0002	1	0	POLYGON ((4345.122 27997.65…
H0003	2	0	POLYGON ((4345.122 30595.72…
H0004	1	0	POLYGON ((4720.122 28647.16…
H0005	4	5	POLYGON ((4720.122 29946.2,…
H0006	1	0	POLYGON ((4720.122 31245.24…

Removing unnecessary data

hdb_filtered_sf <- hdb_sf %>%
  filter(residential == "Y") %>%
  select(total_dwelling_units)

Adding HDB_DWELLING_COUNT

While HDB_COUNT can be a population proxy, we need to consider that HDB blocks have different sizes. For example, taller and wider blocks may have more units compared to shorter blocks, and therefore higher population.

hdb_vars <- hdb_vars %>%
  left_join(
    st_intersection(hdb_filtered_sf, hdb_vars) %>%
      st_drop_geometry() %>%
      group_by(HEX_ID) %>%
      summarise(HDB_RESIDENT_COUNT = sum(total_dwelling_units))
  )

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

Joining with `by = join_by(HEX_ID)`

kable(hdb_vars[160:165,])

	HEX_ID	HDB_COUNT	HDB_RESIDENT_COUNT	geometry
160	H0160	0	NA	POLYGON ((17095.12 31894.76…
161	H0161	12	952	POLYGON ((17095.12 33193.8,…
162	H0162	0	NA	POLYGON ((17095.12 34492.84…
163	H0163	29	3114	POLYGON ((17095.12 35791.87…
164	H0164	13	1136	POLYGON ((17095.12 37090.91…
165	H0165	44	3652	POLYGON ((17095.12 39688.99…

write_rds(hdb_vars, "Take-home Exercise 2/data/rds/hdb_vars.rds")
write_rds(attractiveness, "Take-home Exercise 2/data/rds/attractiveness.rds")

rm(hdb_csv)
rm(hdb_sf)
rm(hdb_filtered_sf)

8 Propulsiveness variables

honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")
hdb_vars <- read_rds("Take-home Exercise 2/data/rds/hdb_vars.rds")
dest_bus_hex <- read_rds("Take-home Exercise 2/data/rds/dest_bus_hex202310.rds")

propulsiveness <- honeycomb

HDB:

propulsiveness <- propulsiveness %>%
  left_join(st_drop_geometry(hdb_vars)) %>%
  select(HEX_ID, HDB_RESIDENT_COUNT)

Joining with `by = join_by(HEX_ID)`

BUS:

propulsiveness <- propulsiveness %>%
  left_join(st_drop_geometry(dest_bus_hex)) %>%
  rename(BUS_ALIGHT_COUNT = TRIPS)

Joining with `by = join_by(HEX_ID)`

propulsiveness[is.na(propulsiveness)] <- 0
kable(head(propulsiveness))

HEX_ID	BUS_ALIGHT_COUNT	geometry
H0001	0	POLYGON ((3970.122 27348.13…
H0002	20	POLYGON ((4345.122 27997.65…
H0003	121	POLYGON ((4345.122 30595.72…
H0004	4	POLYGON ((4720.122 28647.16…
H0005	152	POLYGON ((4720.122 29946.2,…
H0006	204	POLYGON ((4720.122 31245.24…

Save point:

write_rds(propulsiveness, "Take-home Exercise 2/data/rds/propulsiveness202310.rds")

rm(dest_bus_hex)
rm(hdb_vars)

9 Generating distance table

honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")

Now that we have the attractive and propulsive forces, we can finally prepare the data for the distance decay component of the model.

9.1Generating distance matrix

We will use spDists() to generate the matrix from our honeycomb, which requires a Spatial data frame. We also need to name the columns and rows to the corresponding HEX_ID of the hexagons.

dist_mat <- spDists(as(honeycomb, "Spatial"),
                    longlat = FALSE)
colnames(dist_mat) <- paste0(honeycomb$HEX_ID)
rownames(dist_mat) <- paste0(honeycomb$HEX_ID)
kable(head(dist_mat, n=c(8, 8)))

	H0001	H0002	H0003	H0004	H0005	H0006	H0007	H0008
H0001	0.000	750.000	3269.174	1500.000	2704.163	3968.627	1299.038	2250.000
H0002	750.000	0.000	2598.076	750.000	1984.313	3269.174	750.000	1500.000
H0003	3269.174	2598.076	0.000	1984.313	750.000	750.000	2704.163	1500.000
H0004	1500.000	750.000	1984.313	0.000	1299.038	2598.076	750.000	750.000
H0005	2704.163	1984.313	750.000	1299.038	0.000	1299.038	1984.313	750.000
H0006	3968.627	3269.174	750.000	2598.076	1299.038	0.000	3269.174	1984.313
H0007	1299.038	750.000	2704.163	750.000	1984.313	3269.174	0.000	1299.038
H0008	2250.000	1500.000	1500.000	750.000	750.000	1984.313	1299.038	0.000

9.2 Generating a pivot table

To generate data with the specifications we defined in Data Outputs, we must generate a pivot table from our distance matrix, dist_mat.

We will use melt(), for this purpose and rename the columns to names we defined in our modelling data shape.

dist_tbl <- melt(dist_mat) %>%
  rename(DISTANCE = value) %>%
  rename(ORIG_HEX_ID = Var1) %>%
  rename(DEST_HEX_ID = Var2)
kable(head(dist_tbl))

ORIG_HEX_ID	DEST_HEX_ID	DISTANCE
H0001	H0001	0.000
H0002	H0001	750.000
H0003	H0001	3269.174
H0004	H0001	1500.000
H0005	H0001	2704.163
H0006	H0001	3968.627

9.3 Setting intra-zonal distances

dist_tbl$DISTANCE[dist_tbl$ORIG_HEX_ID == dist_tbl$DEST_HEX_ID] <- 200
summary(dist_tbl$DISTANCE)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    200    8250   13332   14145   18929   44680

write_rds(dist_tbl, "Take-home Exercise 2/data/rds/dist_tbl.rds")

rm(dist_mat)

10 Four Sim

10.1 Generate SIM_data

honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")
flowlines <- read_rds("Take-home Exercise 2/data/rds/flowlines202310.rds")
dist_tbl <- read_rds("Take-home Exercise 2/data/rds/dist_tbl.rds")
attractiveness <- read_rds("Take-home Exercise 2/data/rds/attractiveness.rds")
propulsiveness <- read_rds("Take-home Exercise 2/data/rds/propulsiveness202310.rds")

Now that we have all the components, we will now generate the geospatial data that contains the following columns:

ORIG_HEX_ID: ID corresponding to the origin zone
DEST_HEX_ID: ID corresponding to the destination zone
DISTANCE: Distance between the (centroids of) origin and destination zones
TRIPS: Number of bus trips between the origin and destination zones
DEST_*_COUNT: Values from [Attractiveness Variables Table (attractiveness)]
ORIG_*_COUNT: Values from [Propulsiveness Variables Table (propulsiveness)]
Geometry containing the flowlines

We will join the tables to generate this data.

Flowlines:

kable(head(flowlines))

ORIG_HEX_ID	DEST_HEX_ID	TRIPS	geometry
H0002	H0016	1	LINESTRING (4345.122 28430….
H0002	H0017	2	LINESTRING (4345.122 28430….
H0002	H0032	16	LINESTRING (4345.122 28430….
H0003	H0005	1	LINESTRING (4345.122 31028….
H0003	H0022	56	LINESTRING (4345.122 31028….
H0003	H0028	10	LINESTRING (4345.122 31028….

SIM_data <- flowlines

Distance:

SIM_data <- SIM_data %>% left_join(dist_tbl)

Joining with `by = join_by(ORIG_HEX_ID, DEST_HEX_ID)`

Propulsive:

SIM_data <- left_join(
  SIM_data,
  propulsiveness %>%
    st_drop_geometry() %>%
    rename_with(~paste("ORIG_", .x, sep = ""))
  )

Joining with `by = join_by(ORIG_HEX_ID)`

Attractive:

SIM_data <- left_join(
  SIM_data,
  attractiveness %>%
    st_drop_geometry() %>%
    rename_with(~paste("DEST_", .x, sep = ""))
  )

Joining with `by = join_by(DEST_HEX_ID)`

We can now clear the data that we have used for “SIM_data” from the environment.

rm(attractiveness)
rm(propulsiveness)
rm(dist_tbl)
rm(flowlines)

10.2 Visualize SIM_data

summary(SIM_data)

  ORIG_HEX_ID     DEST_HEX_ID        TRIPS             DISTANCE    
 H0518  :  307   H0518  :  314   Min.   :    1.00   Min.   :  200  
 H0728  :  291   H0540  :  300   1st Qu.:    3.00   1st Qu.: 2704  
 H0521  :  285   H0530  :  297   Median :   12.00   Median : 5250  
 H0530  :  282   H0728  :  297   Mean   :   96.74   Mean   : 6117  
 H0491  :  278   H0521  :  290   3rd Qu.:   48.00   3rd Qu.: 8518  
 H0651  :  268   H0450  :  280   Max.   :33266.00   Max.   :24784  
 (Other):58135   (Other):58068                                     
 ORIG_HDB_RESIDENT_COUNT ORIG_BUS_ALIGHT_COUNT DEST_BUS_STOP_COUNT
 Min.   :   0            Min.   :     0        Min.   : 1.00      
 1st Qu.:   0            1st Qu.:  3459        1st Qu.: 5.00      
 Median : 932            Median :  8693        Median : 8.00      
 Mean   :1850            Mean   : 14148        Mean   : 7.96      
 3rd Qu.:3351            3rd Qu.: 16757        3rd Qu.:10.00      
 Max.   :7946            Max.   :108111        Max.   :19.00      
                                                                  
 DEST_ENTERTN_COUNT DEST_F_AND_B_COUNT DEST_LEISURE_COUNT DEST_RETAIL_COUNT
 Min.   :0.0000     Min.   :  0.000    Min.   : 0.000     Min.   :   0.00  
 1st Qu.:0.0000     1st Qu.:  0.000    1st Qu.: 0.000     1st Qu.:   7.00  
 Median :0.0000     Median :  0.000    Median : 1.000     Median :  31.00  
 Mean   :0.3442     Mean   :  5.696    Mean   : 2.336     Mean   :  98.31  
 3rd Qu.:0.0000     3rd Qu.:  2.000    3rd Qu.: 3.000     3rd Qu.: 110.00  
 Max.   :9.0000     Max.   :133.000    Max.   :41.000     Max.   :1678.00  
                                                                           
 DEST_TRAIN_EXITS_COUNT DEST_HDB_COUNT            geometry    
 Min.   : 0.000         Min.   :  0.00   LINESTRING   :59846  
 1st Qu.: 0.000         1st Qu.:  0.00   epsg:NA      :    0  
 Median : 0.000         Median : 12.00   +proj=tmer...:    0  
 Mean   : 1.375         Mean   : 20.81                        
 3rd Qu.: 2.000         3rd Qu.: 36.00                        
 Max.   :13.000         Max.   :103.00

Now we will make finishing touches on SIM_data , so that they are compatible with modeling. We need to remove the 0’s as we will apply log function to them, which will result to undefined.

Set them to 0.99.

replace_zeroes <- function(data, col_name) {
  data[[col_name]][data[[col_name]] == 0] <- 0.99
  data
}

summary(SIM_data)

  ORIG_HEX_ID     DEST_HEX_ID        TRIPS             DISTANCE    
 H0518  :  307   H0518  :  314   Min.   :    1.00   Min.   :  200  
 H0728  :  291   H0540  :  300   1st Qu.:    3.00   1st Qu.: 2704  
 H0521  :  285   H0530  :  297   Median :   12.00   Median : 5250  
 H0530  :  282   H0728  :  297   Mean   :   96.74   Mean   : 6117  
 H0491  :  278   H0521  :  290   3rd Qu.:   48.00   3rd Qu.: 8518  
 H0651  :  268   H0450  :  280   Max.   :33266.00   Max.   :24784  
 (Other):58135   (Other):58068                                     
 ORIG_HDB_RESIDENT_COUNT ORIG_BUS_ALIGHT_COUNT DEST_BUS_STOP_COUNT
 Min.   :   0            Min.   :     0        Min.   : 1.00      
 1st Qu.:   0            1st Qu.:  3459        1st Qu.: 5.00      
 Median : 932            Median :  8693        Median : 8.00      
 Mean   :1850            Mean   : 14148        Mean   : 7.96      
 3rd Qu.:3351            3rd Qu.: 16757        3rd Qu.:10.00      
 Max.   :7946            Max.   :108111        Max.   :19.00      
                                                                  
 DEST_ENTERTN_COUNT DEST_F_AND_B_COUNT DEST_LEISURE_COUNT DEST_RETAIL_COUNT
 Min.   :0.0000     Min.   :  0.000    Min.   : 0.000     Min.   :   0.00  
 1st Qu.:0.0000     1st Qu.:  0.000    1st Qu.: 0.000     1st Qu.:   7.00  
 Median :0.0000     Median :  0.000    Median : 1.000     Median :  31.00  
 Mean   :0.3442     Mean   :  5.696    Mean   : 2.336     Mean   :  98.31  
 3rd Qu.:0.0000     3rd Qu.:  2.000    3rd Qu.: 3.000     3rd Qu.: 110.00  
 Max.   :9.0000     Max.   :133.000    Max.   :41.000     Max.   :1678.00  
                                                                           
 DEST_TRAIN_EXITS_COUNT DEST_HDB_COUNT            geometry    
 Min.   : 0.000         Min.   :  0.00   LINESTRING   :59846  
 1st Qu.: 0.000         1st Qu.:  0.00   epsg:NA      :    0  
 Median : 0.000         Median : 12.00   +proj=tmer...:    0  
 Mean   : 1.375         Mean   : 20.81                        
 3rd Qu.: 2.000         3rd Qu.: 36.00                        
 Max.   :13.000         Max.   :103.00

Save point:

write_rds(SIM_data, "Take-home Exercise 2/data/rds/SIM_data202310.rds")

rm(SIM_data)
rm(replace_zeroes)

11 Visualize Spatial Interactions

mpsz <- read_rds("Take-home Exercise 2/data/rds/mpsz.rds")
honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")
flowlines <- read_rds("Take-home Exercise 2/data/rds/flowlines202310.rds")
SIM_data <- read_rds("Take-home Exercise 2/data/rds/SIM_data202310.rds")
propulsiveness <- read_rds("Take-home Exercise 2/data/rds/propulsiveness202310.rds")
attractiveness <- read_rds("Take-home Exercise 2/data/rds/attractiveness.rds")

11.1 Visualize flowlines

flowlines_no_intra <- flowlines %>%
  filter(ORIG_HEX_ID != DEST_HEX_ID)

quantile(flowlines_no_intra$TRIPS,
         probs = c(0.8,0.9,0.95,0.99,1))

     80%      90%      95%      99%     100% 
   67.00   171.00   377.00  1519.14 33266.00

Let’s take a look at the map.

invalid_geoms <- which(!st_is_valid(mpsz))

# If there are invalid geometries, fix them
if(length(invalid_geoms) > 0) {
  mpsz[invalid_geoms, ] <- st_make_valid(mpsz[invalid_geoms, ])
}

tm_shape(mpsz) +
  tm_polygons("grey", title = "Singapore Boundary", alpha = 0.5) +
  
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  
  tm_shape(flowlines_no_intra %>% filter(TRIPS > 376)) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "red",
           scale = c(0.1, 1, 3, 5, 7),
           title.lwd = "# of bus trips",
           alpha = 0.8) +
  
  tm_layout(main.title = "Top 5% Bus Passenger flow for Weekends/Holidays 11 AM - 2PM (October 2023)",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_grid(alpha = 0.2)

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

Insights

It is notable that the thickest flow lines are for relatively short distances, like the bus rides to and from Woodlands Checkpoint to Kranji Station. We can notice these thick lines on busy zones where lines converge as well.

Although not as thick, we can notice longer long lines on the map. This can indicate people more willing to travel longer distances over the weekend for recreation and meeting family and friends. This also means that a location being close by is not the only motivator for people to visit a place. Spatial interaction model can reveal more patterns to this.

12 Plot Chart: TRIPS & DISTANCE

ggplot(SIM_data,
       aes(x = DISTANCE, y = TRIPS)) +
  geom_point() +
  geom_hline(yintercept = 376.25, color = "red", linetype = "dashed") +
  annotate("text", x = 20000,
           y = 600, label = "95th percentile",
           hjust = -0.1, color = "red", size = 3) +
  geom_hline(yintercept = 1510, color = "blue", linetype = "dashed") +
  annotate("text", x = 20000,
           y = 1800, label = "99th percentile",
           hjust = -0.1, color = "blue", size = 3) +
  labs(title = "Number of Trips as a Function of Distance",
       x = "Distance (m)",
       y = "Number of Trips")

Plotting it in a log scale shows a more linear relationship.

ggplot(SIM_data,
       aes(x = log(DISTANCE), y = log(TRIPS))) +
  geom_point() +
  geom_smooth(method = lm)

`geom_smooth()` using formula = 'y ~ x'

Insights

The maximum number of trips exponentially decrease as the distance increases, which means that generally, the farther the distance, the less trips there are.

However, some outliers can be observed like some zone pairs with almost 20km distance between them having close to 99th percentile of TRIP values. In these zone pairs, there could be strong propulsive or attractive forces attracting passengers to ride the bus between those zones.

13 Visualize propulsive forces

plot_propulsive <- function(var_name, title_comp) {
  tm_shape(mpsz) +
  tm_polygons("gray", title = "Singapore Boundary") +
  
  # Adding this layer underneath propulsiveness as we removed 0s. from the map
  # so it won't skew the legend
  tm_shape(honeycomb) +
  tm_polygons(col = "white") +
  
  tm_shape(propulsiveness %>% filter(if_any(var_name, ~. >= 1))) +
  tm_polygons(var_name, palette = "Blues", style = "quantile") +
    
  tm_shape(flowlines_no_intra %>% filter(TRIPS > 376)) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "orange",
           scale = c(0.1, 1, 3, 5, 7, 10),
           title.lwd = "# of bus trips",
           n = 6,
           alpha = 0.5) +
  
  tm_layout(main.title = paste("Top 5% Bus Passenger Flows and", title_comp),
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_scale_bar(bg.color = "white", bg.alpha = 0.7, position = c("right", "top")) +
  tm_compass(type="8star", size = 2, bg.color = "white",
             bg.alpha = 0.5, position = c("right", "top")) +
  tm_grid(alpha = 0.2) +
  tm_credits("*Passenger data from weekend/holidays 11AM - 2PM\n(October 2023)",
             bg.color = "white", bg.alpha = 0.7,
             position = c("left", "bottom"))
}

13.1 HDB Residents

plot_propulsive("HDB_RESIDENT_COUNT", "HDB POPULATION")

Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(var_name)

  # Now:
  data %>% select(all_of(var_name))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

13.2 Transfer from BUS

plot_propulsive("BUS_ALIGHT_COUNT", "Transfer by BUS")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

Insights

Upon visual inspection, HDB population and bus alights from zones correspond closely with our flowlines.

14 Visualize attractive forces

In this section, we plot attractive forces in their own choropleth maps to see if the top 5% flows correspond to areas we hypothesize have high attractive forces.

plot_attractive <- function(var_name, title_comp) {
  tm_shape(mpsz) +
  tm_polygons("gray", title = "Singapore Boundary") +
  
  # Adding this layer underneath attractiveness as we removed 0s. from the map
  # so it won't skew the legend
  tm_shape(honeycomb) +
  tm_polygons(col = "white") +
  
  tm_shape(attractiveness %>% filter(if_any(var_name, ~. >= 1))) +
  tm_polygons(var_name, palette = "Purples", style = "quantile") +
    
  tm_shape(flowlines_no_intra %>% filter(TRIPS > 376)) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "red",
           scale = c(0.1, 1, 3, 5, 7, 10),
           title.lwd = "# of bus trips",
           n = 6,
           alpha = 0.5) +
  
  tm_layout(main.title = paste("Top 5% Bus Passenger Flows and", title_comp),
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_scale_bar(bg.color = "white", bg.alpha = 0.7, position = c("right", "top")) +
  tm_compass(type="8star", size = 2, bg.color = "white",
             bg.alpha = 0.5, position = c("right", "top")) +
  tm_grid(alpha = 0.2) +
  tm_credits("*Passenger data from weekend/holidays 11AM - 2PM\n(October 2023)",
             bg.color = "white", bg.alpha = 0.7,
             position = c("left", "bottom"))
}

14.1 Busstops

plot_attractive("BUS_STOP_COUNT", "Number of Busstops")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.2 HDB

plot_attractive("HDB_COUNT", "Number of HDB")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.3 Entertainment

plot_attractive("ENTERTN_COUNT", "Number of Entertainment Items")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.4 F&B

plot_attractive("F_AND_B_COUNT", "Number of F&B Outlets")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.5 Leisure&Recreation

plot_attractive("LEISURE_COUNT", "Number of Leisure&Recreation Items")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.6 Retail

plot_attractive("RETAIL_COUNT", "Number of Retail")

Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

Insights

There are less location types than the others, like entertainment and leisure locations.

There are also location types that are heavily concentrated in some zones, like the entertainment and F&B, which are heavily concentrated around the Orchard area. Conversely, there are much less HDBs in Orchard area. Even though the flowlines are not thick in this area, there are many flowlines, although thin. This means people are coming from various parts of Singapore.

Save point

write_rds(flowlines_no_intra, "Take-home Exercise 2/data/rds/flowines_no_intra202310.rds")

rm(flowlines)
rm(attractiveness)
rm(propulsiveness)
rm(plot_attractive)
rm(plot_propulsive)