Take home Ex2

Author

WYZ

Applied Spatial Interaction Models: A case study of Singapore public bus commuter flows

1 Background

The scenario highlights the challenges in urban mobility, specifically understanding commuting patterns and the impact of public transportation changes. Traditional methods like commuter surveys are outdated and inefficient. The focus shifts to leveraging digital infrastructure data, such as GPS and SMART card usage, for more dynamic and insightful analysis.The exercise is motivated by two factors: the underutilization of available open data for policy making and the need for practical research in geospatial data science and analysis (GDSA). The task involves using GDSA to integrate diverse data sources, building spatial interaction models to understand public bus transit patterns. This approach aims to provide more effective tools for urban planning and decision-making.

2 The Data

Open Government Data

For the purpose of this assignment, data from several open government sources will be used:

  • Passenger Volume by Origin Destination Bus Stops, Bus Stop Location, Train Station and Train Station Exit Point, just to name a few of them, from LTA DataMall.

  • Master Plan 2019 Subzone Boundary, HDB Property Information, School Directory and Information and other relevant data from Data.gov.sg.

  • hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.

3 Getting Started

pacman::p_load(sf, sp, tmap, tidyverse, knitr, stplanr, reshape2, performance)
tmap_mode("plot")
tmap mode set to plotting
tmap_style("natural")
tmap style set to "natural"
other available styles are: "white", "gray", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor" 
set.seed(1234)

4 Data Preparation

4.1 Importing the OD data

Firstly, we will import the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall by using read_csv() of readr package.

odbus <- read_csv("Take-home Exercise 2/data/aspatial/origin_destination_bus_202310.csv")
Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
kable(head(odbus))
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE TOTAL_TRIPS
2023-10 WEEKENDS/HOLIDAY 16 BUS 04168 10051 3
2023-10 WEEKDAY 16 BUS 04168 10051 5
2023-10 WEEKENDS/HOLIDAY 14 BUS 80119 90079 3
2023-10 WEEKDAY 14 BUS 80119 90079 5
2023-10 WEEKDAY 17 BUS 44069 17229 4
2023-10 WEEKENDS/HOLIDAY 17 BUS 20281 20141 1

4.2 Importing Geospatial data into R

For the purpose of this exercise, three geospatial data will be used first. They are:

  • BusStop: This data provides the location of bus stop as at last quarter of 2023.

  • MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019.

  • Hexagon: analytical hexagon data of 375m (this distance is the perpendicular distance between the centre of the hexagon and its edges) to represent the traffic analysis zone (TAZ).

mpsz <- st_read(dsn = "Take-home Exercise 2/data/geospatial",
                layer = "MPSZ-2019") %>%
  select(SUBZONE_N)
Reading layer `MPSZ-2019' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
busstops <- st_read(dsn = "Take-home Exercise 2/data/geospatial",
                    layer = "BusStop")
Reading layer `BusStop' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
write_rds(mpsz, "Take-home Exercise 2/data/rds/mpsz.rds")
write_rds(busstops, "Take-home Exercise 2/data/rds/busstops.rds")

4.3 Creating Hexagon grid

honeycomb <- busstops %>% st_make_grid(cellsize = 750,
                                       what="polygons",
                                       square = FALSE) %>%
  st_sf() %>%
  filter(lengths(st_intersects(geometry, busstops)) > 0)

Now that we have hexagons properly generated, we will assign id for each hexagon to be used as a unique identifier. We will store this id under the HEX_ID column, and can be used in joining data frames.

honeycomb$HEX_ID <- sprintf("H%04d", seq_len(nrow(honeycomb))) %>% as.factor()
kable(head(honeycomb))
geometry HEX_ID
POLYGON ((3970.122 27348.13… H0001
POLYGON ((4345.122 27997.65… H0002
POLYGON ((4345.122 30595.72… H0003
POLYGON ((4720.122 28647.16… H0004
POLYGON ((4720.122 29946.2,… H0005
POLYGON ((4720.122 31245.24… H0006
write_rds(honeycomb, "Take-home Exercise 2/data/rds/honeycomb.rds")

5 Spatial Interaction Analysis

5.1 Generating the O-D trip data by hexagon level

Filtering the relevant data

We only need the data for the weekend morning peak period, which is from 11 AM - 2 PM on weekends and holidays. As such, we will filter the data for the relevant hours.

od_trips <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter( TIME_PER_HOUR >= 11 &
            TIME_PER_HOUR < 14
          ) %>%
  group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS)) %>%
  rename(
    ORIG_BUS_STOP_N = ORIGIN_PT_CODE,
    DEST_BUS_STOP_N = DESTINATION_PT_CODE
  )
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
kable(head(od_trips))
ORIG_BUS_STOP_N DEST_BUS_STOP_N TRIPS
01012 01112 204
01012 01113 129
01012 01121 95
01012 01211 91
01012 01311 152
01012 01559 5
rm(odbus)

To connect the trip data to the their corresponding hexagon, we need to create a lookup table. This will serve as a glue in associating the aspatial od_trips data frame to the honeycomb data frame.

This can be done via st_intersection().

bs_hex <- st_intersection(busstops, honeycomb) %>%
  st_drop_geometry() %>%
  select(c(BUS_STOP_N, HEX_ID))
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
kable(head(bs_hex))
BUS_STOP_N HEX_ID
3269 25059 H0001
2570 25751 H0002
254 26379 H0003
2403 26369 H0003
2829 25741 H0004
1715 26399 H0005

Joining od_trips and bs_hex

Next, we need to associate each origin bus stop and destination bus stop to their corresponding hexagons.

We can use that by doing inner_join() twice, once for the origin and another for the destination.

od_trips_w_hex <- od_trips %>%
  inner_join(bs_hex,
             by = c("ORIG_BUS_STOP_N" = "BUS_STOP_N")) %>%
  rename(ORIG_HEX_ID = HEX_ID) %>%
  inner_join(bs_hex,
             by = c("DEST_BUS_STOP_N" = "BUS_STOP_N")) %>%
  rename(DEST_HEX_ID = HEX_ID)
Warning in inner_join(., bs_hex, by = c(ORIG_BUS_STOP_N = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 25339 of `x` matches multiple rows in `y`.
ℹ Row 3057 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
Warning in inner_join(., bs_hex, by = c(DEST_BUS_STOP_N = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 186 of `x` matches multiple rows in `y`.
ℹ Row 3146 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
kable(head(od_trips_w_hex))
ORIG_BUS_STOP_N DEST_BUS_STOP_N TRIPS ORIG_HEX_ID DEST_HEX_ID
01012 01112 204 H0518 H0530
01012 01113 129 H0518 H0530
01012 01121 95 H0518 H0553
01012 01211 91 H0518 H0553
01012 01311 152 H0518 H0564
01012 01559 5 H0518 H0553

Aggregating data by hexagon

Next, we will perform aggregations by ORIG_HEX_ID and DEST_HEX_ID to have an aggregated sum of trips by hexagon instead of bus stops.

od_hex <- od_trips_w_hex %>%
  group_by(ORIG_HEX_ID, DEST_HEX_ID) %>%
  summarise(TRIPS = sum(TRIPS))
`summarise()` has grouped output by 'ORIG_HEX_ID'. You can override using the
`.groups` argument.
kable(head(od_hex))
ORIG_HEX_ID DEST_HEX_ID TRIPS
H0002 H0016 1
H0002 H0017 2
H0002 H0032 16
H0003 H0005 1
H0003 H0022 56
H0003 H0028 10

Save point:

write_rds(bs_hex, "Take-home Exercise 2/data/rds/bs_hex.rds")
write_rds(od_hex, "Take-home Exercise 2/data/rds/od_hex202310.rds")
write_rds(od_trips, "Take-home Exercise 2/data/rds/od_trips202310.rds")
rm(od_trips_w_hex)

5.3 Generating the flow lines

First, we will generate the flow lines using od2line(). honeycomb will be supplied as the zone as it contains the hexagons we are using as the traffic analysis zones.

invalid_geoms <- which(!st_is_valid(mpsz))

# If there are invalid geometries, fix them
if(length(invalid_geoms) > 0) {
  mpsz[invalid_geoms, ] <- st_make_valid(mpsz[invalid_geoms, ])
}
flowlines <- od_hex %>% od2line(
  honeycomb,
  zone_code = "HEX_ID")
Creating centroids representing desire line start and end points.
write_rds(flowlines, "Take-home Exercise 2/data/rds/flowlines202310.rds")
tm_shape(mpsz) +
  tm_polygons("gray", title = "Singapore Boundary", alpha = 0.5) +
  
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  
  tm_shape(flowlines) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "red",
           scale = c(0.1, 1, 3, 5, 7),
           title.lwd = "# of bus trips",
           alpha = 0.8) +
  
  tm_layout(main.title = "Bus Passenger flow for Weekends/Holidays 11 AM - 2PM (October 2023)",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_grid(alpha = 0.2)
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

write_rds(flowlines, "Take-home Exercise 2/data/rds/flowlines202310.rds")
rm(flowlines)
rm(od_hex)

6 Spatial Interaction Modelling

Next, we will prepare the data needed for spatial interaction modelling. Some of these are straightforward to get, especially those of attractiveness variables. Additional steps are needed for more complex data sets, like those needed for propulsiveness variables. We will derive those in a separate section.

6.1 Attractiveness variables

We will first initiate attractiveness from honeycomb.

attractiveness <- honeycomb

Because not all of the st_cos(x) == st_crs(y) are ture, it’s crucial for all datasets to be in the same CRS before performing spatial operations like intersections. The next steps will all tackle this problem.

BUSSTOP:

attractiveness$BUS_STOP_COUNT <- lengths(
  st_intersects(attractiveness, busstops))

ENTERTAINMENT:

entertn <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "entertn")
Reading layer `entertn' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM
st_crs(attractiveness)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(entertn)
Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
entertn <- st_transform(entertn, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(entertn))
attractiveness$ENTERTN_COUNT <- lengths(st_intersects(attractiveness, entertn))
attractiveness$ENTERTN_COUNT <- lengths(st_intersects(attractiveness, entertn))

F&B:

f_and_b <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "F&B")
Reading layer `F&B' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM
st_crs(attractiveness)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(f_and_b)
Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
f_and_b <- st_transform(f_and_b, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(f_and_b))
attractiveness$F_AND_B_COUNT <- lengths(st_intersects(attractiveness, f_and_b))
attractiveness$F_AND_B_COUNT <- lengths(st_intersects(attractiveness, f_and_b))

Leisure&Recreation:

leis_rec <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "Liesure&Recreation")
Reading layer `Liesure&Recreation' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM
tm_shape(mpsz) +
  tm_polygons("green", title = "Singapore Boundary", alpha = 0.5) +
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  tm_layout(main.title = "Map of Leisure & Recreation Spots in Singapore",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_shape(leis_rec) +
  tm_dots(col = "red", size = 0.005, title = "Leisure & Recreation Spots") +
  tm_grid(alpha = 0.2)

st_crs(attractiveness)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(leis_rec)
Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
leis_rec <- st_transform(leis_rec, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(leis_rec))
attractiveness$LEISURE_COUNT <- lengths(st_intersects(attractiveness, leis_rec))
attractiveness$LEISURE_COUNT <- lengths(st_intersects(attractiveness, leis_rec))

RETAIL:

retail <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "Retails")
Reading layer `Retails' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM
st_crs(attractiveness)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(retail)
Coordinate Reference System:
  User input: SVY21 / Singapore TM 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
retail <- st_transform(retail, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(retail))
attractiveness$RETAIL_COUNT <- lengths(st_intersects(attractiveness, retail))
attractiveness$RETAIL_COUNT <- lengths(st_intersects(attractiveness, retail))

TRAIN STATION EXITS:

train_exits <- st_read(dsn = "Take-home Exercise 2/data/geospatial", layer = "Train_Station_Exit_Layer")
Reading layer `Train_Station_Exit_Layer' from data source 
  `D:\y1zaoWang\ISSS624\Take-home Exercise 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 565 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6134.086 ymin: 27499.7 xmax: 45356.36 ymax: 47865.92
Projected CRS: SVY21
tm_shape(mpsz) +
  tm_polygons("green", title = "Singapore Boundary", alpha = 0.5) +
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  tm_layout(main.title = "Map of Train Station Exits in Singapore",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_shape(train_exits) +
  tm_dots(col = "red", size = 0.005, title = "Train Station Exits") +
  tm_grid(alpha = 0.2)

st_crs(attractiveness)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(train_exits)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
train_exits <- st_transform(train_exits, st_crs(attractiveness))
attractiveness <- st_transform(attractiveness, st_crs(train_exits))
attractiveness$TRAIN_EXITS_COUNT <- lengths(st_intersects(attractiveness, train_exits))
attractiveness$TRAIN_EXITS_COUNT <- lengths(st_intersects(attractiveness, train_exits))

Let’s check if the attractiveness variables have been added correctly.

kable(head(attractiveness))
geometry HEX_ID BUS_STOP_COUNT ENTERTN_COUNT F_AND_B_COUNT LEISURE_COUNT RETAIL_COUNT TRAIN_EXITS_COUNT
POLYGON ((3970.122 27348.13… H0001 1 0 0 0 0 0
POLYGON ((4345.122 27997.65… H0002 1 0 0 0 0 0
POLYGON ((4345.122 30595.72… H0003 2 0 0 0 0 0
POLYGON ((4720.122 28647.16… H0004 1 0 0 0 0 0
POLYGON ((4720.122 29946.2,… H0005 4 0 0 0 5 0
POLYGON ((4720.122 31245.24… H0006 1 0 0 0 0 0
write_rds(attractiveness, "Take-home Exercise 2/data/rds/attractiveness_no_hdb.rds")
write_rds(train_exits, "Take-home Exercise 2/data/rds/train_exits.rds")
rm(busstops)
rm(entertn)
rm(f_and_b)
rm(leis_rec)
rm(retail)

6.2 Deriving Passengers Alighting from Bus Stop

Using the similar techniques used in Take-home Exercise 1, we will aggregate the trips using inner_join()group_by, and summarise.

dest_bus_hex <- od_trips %>%
  inner_join(bs_hex,
             by = join_by(DEST_BUS_STOP_N == BUS_STOP_N)) %>%
  group_by(HEX_ID) %>%
  summarise(TRIPS = sum(TRIPS))
Warning in inner_join(., bs_hex, by = join_by(DEST_BUS_STOP_N == BUS_STOP_N)): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 186 of `x` matches multiple rows in `y`.
ℹ Row 3146 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
kable(head(dest_bus_hex))
HEX_ID TRIPS
H0002 20
H0003 121
H0004 4
H0005 152
H0006 204
H0007 41
write_rds(dest_bus_hex, "Take-home Exercise 2/data/rds/dest_bus_hex202310.rds")
rm(bs_hex)
rm(od_trips)

7 Deriving HDB population

7.1 Importing the data

hdb_vars <- honeycomb
hdb_csv <- read_csv("Take-home Exercise 2/data/aspatial/hdb.csv")
New names:
Rows: 12442 Columns: 37
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(18): blk_no, street, residential, commercial, market_hawker, miscellane... dbl
(19): ...1, max_floor_lvl, year_completed, total_dwelling_units, 1room_s...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
kable(head(hdb_csv))
…1 blk_no street max_floor_lvl year_completed residential commercial market_hawker miscellaneous multistorey_carpark precinct_pavilion bldg_contract_town total_dwelling_units 1room_sold 2room_sold 3room_sold 4room_sold 5room_sold exec_sold multigen_sold studio_apartment_sold 1room_rental 2room_rental 3room_rental other_room_rental lat lng building addr postal SUBZONE_NO SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N REGION_C
0 1 BEACH RD 16 1970 Y Y N N N N KWN 142 0 1 138 1 2 0 0 0 0 0 0 0 1.295097 103.8541 RAFFLES HOTEL 1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673 189673 2 CITY HALL DTSZ02 DOWNTOWN CORE DT CENTRAL REGION CR
1 1 BEDOK STH AVE 1 14 1975 Y N N Y N N BD 206 0 0 204 0 2 0 0 0 0 0 0 0 1.320852 103.9337 NIL 1 BEDOK SOUTH AVENUE 1 SINGAPORE 460001 460001 6 BEDOK SOUTH BDSZ06 BEDOK BD EAST REGION ER
2 1 CANTONMENT RD 2 2010 N Y N N N N CT 0 0 0 0 0 0 0 0 0 0 0 0 0 1.275488 103.8414 PINNACLE @ DUXTON 1 CANTONMENT ROAD PINNACLE @ DUXTON SINGAPORE 080001 080001 3 CHINATOWN OTSZ03 OUTRAM OT CENTRAL REGION CR
3 1 CHAI CHEE RD 15 1982 Y N N N N N BD 102 0 0 0 10 92 0 0 0 0 0 0 0 1.327969 103.9227 PING YI GARDENS 1 CHAI CHEE ROAD PING YI GARDENS SINGAPORE 461001 461001 3 KEMBANGAN BDSZ03 BEDOK BD EAST REGION ER
4 1 CHANGI VILLAGE RD 4 1975 Y Y N N N N PRC 55 0 0 54 0 1 0 0 0 0 0 0 0 1.388610 103.9881 OCBC CHANGI VILLAGE ROAD - 7 ELEVEN 1 CHANGI VILLAGE ROAD OCBC CHANGI VILLAGE ROAD - 7 ELEVEN SINGAPORE 500001 500001 1 CHANGI POINT CHSZ01 CHANGI CH EAST REGION ER
5 1 DELTA AVE 25 1982 Y N N N N N BM 96 0 0 0 0 96 0 0 0 0 0 0 0 1.292075 103.8286 NIL 1 DELTA AVENUE SINGAPORE 160001 160001 9 BUKIT HO SWEE BMSZ09 BUKIT MERAH BM CENTRAL REGION CR
hdb_sf <- hdb_csv %>% st_as_sf(coords = c("lng", "lat"),
                               crs = 4326) %>%
  st_transform(crs = 3414)
tm_shape(mpsz) +
  tm_polygons("green", title = "Singapore Boundary", alpha = 0.5) +
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  tm_layout(main.title = "Map of HDB Blocks in Singapore",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_shape(hdb_sf) +
  tm_dots(col = "red", size = 0.001, title = "HDB Blocks") +
  tm_grid(alpha = 0.2)

7.2 Adding HDB_COUNT

This variable will contain the number of HDB blocks in a zone. We will use the same methods to count the number locations in the zone, by using lengths() and st_intersects().

We will use this as attractiveness variable, and will include all HDB block types (commercial, hawker, residential) as all of them are attractive destinations for eating out, meeting family/friends, and errands.

st_crs(hdb_vars)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(hdb_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
hdb_sf <- st_transform(hdb_sf, st_crs(hdb_vars))
hdb_vars <- st_transform(hdb_vars, st_crs(hdb_sf))
hdb_vars$HDB_COUNT <- lengths(st_intersects(hdb_vars, hdb_sf))
hdb_vars$HDB_COUNT <- lengths(st_intersects(hdb_vars, hdb_sf))
attractiveness <- left_join(attractiveness,
                            st_drop_geometry(hdb_vars))
Joining with `by = join_by(HEX_ID)`
# Using head() to limit the number of rows
kable(head(attractiveness))
HEX_ID BUS_STOP_COUNT ENTERTN_COUNT F_AND_B_COUNT LEISURE_COUNT RETAIL_COUNT TRAIN_EXITS_COUNT HDB_COUNT geometry
H0001 1 0 0 0 0 0 0 POLYGON ((3970.122 27348.13…
H0002 1 0 0 0 0 0 0 POLYGON ((4345.122 27997.65…
H0003 2 0 0 0 0 0 0 POLYGON ((4345.122 30595.72…
H0004 1 0 0 0 0 0 0 POLYGON ((4720.122 28647.16…
H0005 4 0 0 0 5 0 0 POLYGON ((4720.122 29946.2,…
H0006 1 0 0 0 0 0 0 POLYGON ((4720.122 31245.24…

Removing unnecessary data

hdb_filtered_sf <- hdb_sf %>%
  filter(residential == "Y") %>%
  select(total_dwelling_units)

Adding HDB_DWELLING_COUNT

While HDB_COUNT can be a population proxy, we need to consider that HDB blocks have different sizes. For example, taller and wider blocks may have more units compared to shorter blocks, and therefore higher population.

hdb_vars <- hdb_vars %>%
  left_join(
    st_intersection(hdb_filtered_sf, hdb_vars) %>%
      st_drop_geometry() %>%
      group_by(HEX_ID) %>%
      summarise(HDB_RESIDENT_COUNT = sum(total_dwelling_units))
  )
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Joining with `by = join_by(HEX_ID)`
kable(hdb_vars[160:165,])
HEX_ID HDB_COUNT HDB_RESIDENT_COUNT geometry
160 H0160 0 NA POLYGON ((17095.12 31894.76…
161 H0161 12 952 POLYGON ((17095.12 33193.8,…
162 H0162 0 NA POLYGON ((17095.12 34492.84…
163 H0163 29 3114 POLYGON ((17095.12 35791.87…
164 H0164 13 1136 POLYGON ((17095.12 37090.91…
165 H0165 44 3652 POLYGON ((17095.12 39688.99…
write_rds(hdb_vars, "Take-home Exercise 2/data/rds/hdb_vars.rds")
write_rds(attractiveness, "Take-home Exercise 2/data/rds/attractiveness.rds")
rm(hdb_csv)
rm(hdb_sf)
rm(hdb_filtered_sf)

8 Propulsiveness variables

honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")
hdb_vars <- read_rds("Take-home Exercise 2/data/rds/hdb_vars.rds")
dest_bus_hex <- read_rds("Take-home Exercise 2/data/rds/dest_bus_hex202310.rds")
propulsiveness <- honeycomb

HDB:

propulsiveness <- propulsiveness %>%
  left_join(st_drop_geometry(hdb_vars)) %>%
  select(HEX_ID, HDB_RESIDENT_COUNT)
Joining with `by = join_by(HEX_ID)`

BUS:

propulsiveness <- propulsiveness %>%
  left_join(st_drop_geometry(dest_bus_hex)) %>%
  rename(BUS_ALIGHT_COUNT = TRIPS)
Joining with `by = join_by(HEX_ID)`
propulsiveness[is.na(propulsiveness)] <- 0
kable(head(propulsiveness))
HEX_ID HDB_RESIDENT_COUNT BUS_ALIGHT_COUNT geometry
H0001 0 0 POLYGON ((3970.122 27348.13…
H0002 0 20 POLYGON ((4345.122 27997.65…
H0003 0 121 POLYGON ((4345.122 30595.72…
H0004 0 4 POLYGON ((4720.122 28647.16…
H0005 0 152 POLYGON ((4720.122 29946.2,…
H0006 0 204 POLYGON ((4720.122 31245.24…

Save point:

write_rds(propulsiveness, "Take-home Exercise 2/data/rds/propulsiveness202310.rds")
rm(dest_bus_hex)
rm(hdb_vars)

9 Generating distance table

honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")

Now that we have the attractive and propulsive forces, we can finally prepare the data for the distance decay component of the model.

9.1Generating distance matrix

We will use spDists() to generate the matrix from our honeycomb, which requires a Spatial data frame. We also need to name the columns and rows to the corresponding HEX_ID of the hexagons.

dist_mat <- spDists(as(honeycomb, "Spatial"),
                    longlat = FALSE)
colnames(dist_mat) <- paste0(honeycomb$HEX_ID)
rownames(dist_mat) <- paste0(honeycomb$HEX_ID)
kable(head(dist_mat, n=c(8, 8)))
H0001 H0002 H0003 H0004 H0005 H0006 H0007 H0008
H0001 0.000 750.000 3269.174 1500.000 2704.163 3968.627 1299.038 2250.000
H0002 750.000 0.000 2598.076 750.000 1984.313 3269.174 750.000 1500.000
H0003 3269.174 2598.076 0.000 1984.313 750.000 750.000 2704.163 1500.000
H0004 1500.000 750.000 1984.313 0.000 1299.038 2598.076 750.000 750.000
H0005 2704.163 1984.313 750.000 1299.038 0.000 1299.038 1984.313 750.000
H0006 3968.627 3269.174 750.000 2598.076 1299.038 0.000 3269.174 1984.313
H0007 1299.038 750.000 2704.163 750.000 1984.313 3269.174 0.000 1299.038
H0008 2250.000 1500.000 1500.000 750.000 750.000 1984.313 1299.038 0.000

9.2 Generating a pivot table

To generate data with the specifications we defined in Data Outputs, we must generate a pivot table from our distance matrix, dist_mat.

We will use melt(), for this purpose and rename the columns to names we defined in our modelling data shape.

dist_tbl <- melt(dist_mat) %>%
  rename(DISTANCE = value) %>%
  rename(ORIG_HEX_ID = Var1) %>%
  rename(DEST_HEX_ID = Var2)
kable(head(dist_tbl))
ORIG_HEX_ID DEST_HEX_ID DISTANCE
H0001 H0001 0.000
H0002 H0001 750.000
H0003 H0001 3269.174
H0004 H0001 1500.000
H0005 H0001 2704.163
H0006 H0001 3968.627

9.3 Setting intra-zonal distances

dist_tbl$DISTANCE[dist_tbl$ORIG_HEX_ID == dist_tbl$DEST_HEX_ID] <- 200
summary(dist_tbl$DISTANCE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    200    8250   13332   14145   18929   44680 
write_rds(dist_tbl, "Take-home Exercise 2/data/rds/dist_tbl.rds")
rm(dist_mat)

10 Four Sim

10.1 Generate SIM_data

honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")
flowlines <- read_rds("Take-home Exercise 2/data/rds/flowlines202310.rds")
dist_tbl <- read_rds("Take-home Exercise 2/data/rds/dist_tbl.rds")
attractiveness <- read_rds("Take-home Exercise 2/data/rds/attractiveness.rds")
propulsiveness <- read_rds("Take-home Exercise 2/data/rds/propulsiveness202310.rds")

Now that we have all the components, we will now generate the geospatial data that contains the following columns:

  • ORIG_HEX_ID: ID corresponding to the origin zone

  • DEST_HEX_ID: ID corresponding to the destination zone

  • DISTANCE: Distance between the (centroids of) origin and destination zones

  • TRIPS: Number of bus trips between the origin and destination zones

  • DEST_*_COUNT: Values from [Attractiveness Variables Table (attractiveness)]

  • ORIG_*_COUNT: Values from [Propulsiveness Variables Table (propulsiveness)]

  • Geometry containing the flowlines

We will join the tables to generate this data.

Flowlines:

kable(head(flowlines))
ORIG_HEX_ID DEST_HEX_ID TRIPS geometry
H0002 H0016 1 LINESTRING (4345.122 28430….
H0002 H0017 2 LINESTRING (4345.122 28430….
H0002 H0032 16 LINESTRING (4345.122 28430….
H0003 H0005 1 LINESTRING (4345.122 31028….
H0003 H0022 56 LINESTRING (4345.122 31028….
H0003 H0028 10 LINESTRING (4345.122 31028….
SIM_data <- flowlines

Distance:

SIM_data <- SIM_data %>% left_join(dist_tbl)
Joining with `by = join_by(ORIG_HEX_ID, DEST_HEX_ID)`

Propulsive:

SIM_data <- left_join(
  SIM_data,
  propulsiveness %>%
    st_drop_geometry() %>%
    rename_with(~paste("ORIG_", .x, sep = ""))
  )
Joining with `by = join_by(ORIG_HEX_ID)`

Attractive:

SIM_data <- left_join(
  SIM_data,
  attractiveness %>%
    st_drop_geometry() %>%
    rename_with(~paste("DEST_", .x, sep = ""))
  )
Joining with `by = join_by(DEST_HEX_ID)`

We can now clear the data that we have used for “SIM_data” from the environment.

rm(attractiveness)
rm(propulsiveness)
rm(dist_tbl)
rm(flowlines)

10.2 Visualize SIM_data

summary(SIM_data)
  ORIG_HEX_ID     DEST_HEX_ID        TRIPS             DISTANCE    
 H0518  :  307   H0518  :  314   Min.   :    1.00   Min.   :  200  
 H0728  :  291   H0540  :  300   1st Qu.:    3.00   1st Qu.: 2704  
 H0521  :  285   H0530  :  297   Median :   12.00   Median : 5250  
 H0530  :  282   H0728  :  297   Mean   :   96.74   Mean   : 6117  
 H0491  :  278   H0521  :  290   3rd Qu.:   48.00   3rd Qu.: 8518  
 H0651  :  268   H0450  :  280   Max.   :33266.00   Max.   :24784  
 (Other):58135   (Other):58068                                     
 ORIG_HDB_RESIDENT_COUNT ORIG_BUS_ALIGHT_COUNT DEST_BUS_STOP_COUNT
 Min.   :   0            Min.   :     0        Min.   : 1.00      
 1st Qu.:   0            1st Qu.:  3459        1st Qu.: 5.00      
 Median : 932            Median :  8693        Median : 8.00      
 Mean   :1850            Mean   : 14148        Mean   : 7.96      
 3rd Qu.:3351            3rd Qu.: 16757        3rd Qu.:10.00      
 Max.   :7946            Max.   :108111        Max.   :19.00      
                                                                  
 DEST_ENTERTN_COUNT DEST_F_AND_B_COUNT DEST_LEISURE_COUNT DEST_RETAIL_COUNT
 Min.   :0.0000     Min.   :  0.000    Min.   : 0.000     Min.   :   0.00  
 1st Qu.:0.0000     1st Qu.:  0.000    1st Qu.: 0.000     1st Qu.:   7.00  
 Median :0.0000     Median :  0.000    Median : 1.000     Median :  31.00  
 Mean   :0.3442     Mean   :  5.696    Mean   : 2.336     Mean   :  98.31  
 3rd Qu.:0.0000     3rd Qu.:  2.000    3rd Qu.: 3.000     3rd Qu.: 110.00  
 Max.   :9.0000     Max.   :133.000    Max.   :41.000     Max.   :1678.00  
                                                                           
 DEST_TRAIN_EXITS_COUNT DEST_HDB_COUNT            geometry    
 Min.   : 0.000         Min.   :  0.00   LINESTRING   :59846  
 1st Qu.: 0.000         1st Qu.:  0.00   epsg:NA      :    0  
 Median : 0.000         Median : 12.00   +proj=tmer...:    0  
 Mean   : 1.375         Mean   : 20.81                        
 3rd Qu.: 2.000         3rd Qu.: 36.00                        
 Max.   :13.000         Max.   :103.00                        
                                                              

Now we will make finishing touches on SIM_data , so that they are compatible with modeling. We need to remove the 0’s as we will apply log function to them, which will result to undefined.

Set them to 0.99.

replace_zeroes <- function(data, col_name) {
  data[[col_name]][data[[col_name]] == 0] <- 0.99
  data
}
summary(SIM_data)
  ORIG_HEX_ID     DEST_HEX_ID        TRIPS             DISTANCE    
 H0518  :  307   H0518  :  314   Min.   :    1.00   Min.   :  200  
 H0728  :  291   H0540  :  300   1st Qu.:    3.00   1st Qu.: 2704  
 H0521  :  285   H0530  :  297   Median :   12.00   Median : 5250  
 H0530  :  282   H0728  :  297   Mean   :   96.74   Mean   : 6117  
 H0491  :  278   H0521  :  290   3rd Qu.:   48.00   3rd Qu.: 8518  
 H0651  :  268   H0450  :  280   Max.   :33266.00   Max.   :24784  
 (Other):58135   (Other):58068                                     
 ORIG_HDB_RESIDENT_COUNT ORIG_BUS_ALIGHT_COUNT DEST_BUS_STOP_COUNT
 Min.   :   0            Min.   :     0        Min.   : 1.00      
 1st Qu.:   0            1st Qu.:  3459        1st Qu.: 5.00      
 Median : 932            Median :  8693        Median : 8.00      
 Mean   :1850            Mean   : 14148        Mean   : 7.96      
 3rd Qu.:3351            3rd Qu.: 16757        3rd Qu.:10.00      
 Max.   :7946            Max.   :108111        Max.   :19.00      
                                                                  
 DEST_ENTERTN_COUNT DEST_F_AND_B_COUNT DEST_LEISURE_COUNT DEST_RETAIL_COUNT
 Min.   :0.0000     Min.   :  0.000    Min.   : 0.000     Min.   :   0.00  
 1st Qu.:0.0000     1st Qu.:  0.000    1st Qu.: 0.000     1st Qu.:   7.00  
 Median :0.0000     Median :  0.000    Median : 1.000     Median :  31.00  
 Mean   :0.3442     Mean   :  5.696    Mean   : 2.336     Mean   :  98.31  
 3rd Qu.:0.0000     3rd Qu.:  2.000    3rd Qu.: 3.000     3rd Qu.: 110.00  
 Max.   :9.0000     Max.   :133.000    Max.   :41.000     Max.   :1678.00  
                                                                           
 DEST_TRAIN_EXITS_COUNT DEST_HDB_COUNT            geometry    
 Min.   : 0.000         Min.   :  0.00   LINESTRING   :59846  
 1st Qu.: 0.000         1st Qu.:  0.00   epsg:NA      :    0  
 Median : 0.000         Median : 12.00   +proj=tmer...:    0  
 Mean   : 1.375         Mean   : 20.81                        
 3rd Qu.: 2.000         3rd Qu.: 36.00                        
 Max.   :13.000         Max.   :103.00                        
                                                              

Save point:

write_rds(SIM_data, "Take-home Exercise 2/data/rds/SIM_data202310.rds")
rm(SIM_data)
rm(replace_zeroes)

11 Visualize Spatial Interactions

mpsz <- read_rds("Take-home Exercise 2/data/rds/mpsz.rds")
honeycomb <- read_rds("Take-home Exercise 2/data/rds/honeycomb.rds")
flowlines <- read_rds("Take-home Exercise 2/data/rds/flowlines202310.rds")
SIM_data <- read_rds("Take-home Exercise 2/data/rds/SIM_data202310.rds")
propulsiveness <- read_rds("Take-home Exercise 2/data/rds/propulsiveness202310.rds")
attractiveness <- read_rds("Take-home Exercise 2/data/rds/attractiveness.rds")

11.1 Visualize flowlines

flowlines_no_intra <- flowlines %>%
  filter(ORIG_HEX_ID != DEST_HEX_ID)
quantile(flowlines_no_intra$TRIPS,
         probs = c(0.8,0.9,0.95,0.99,1))
     80%      90%      95%      99%     100% 
   67.00   171.00   377.00  1519.14 33266.00 

Let’s take a look at the map.

invalid_geoms <- which(!st_is_valid(mpsz))

# If there are invalid geometries, fix them
if(length(invalid_geoms) > 0) {
  mpsz[invalid_geoms, ] <- st_make_valid(mpsz[invalid_geoms, ])
}
tm_shape(mpsz) +
  tm_polygons("grey", title = "Singapore Boundary", alpha = 0.5) +
  
  tm_shape(honeycomb) +
  tm_polygons(col = "white", title = "Hexagons", alpha = 1) +
  
  tm_shape(flowlines_no_intra %>% filter(TRIPS > 376)) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "red",
           scale = c(0.1, 1, 3, 5, 7),
           title.lwd = "# of bus trips",
           alpha = 0.8) +
  
  tm_layout(main.title = "Top 5% Bus Passenger flow for Weekends/Holidays 11 AM - 2PM (October 2023)",
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_compass(type="8star", size = 2, bg.color = "white", bg.alpha = 0.5) +
  tm_scale_bar(bg.color = "white", bg.alpha = 0.5) +
  tm_grid(alpha = 0.2)
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

Insights

It is notable that the thickest flow lines are for relatively short distances, like the bus rides to and from Woodlands Checkpoint to Kranji Station. We can notice these thick lines on busy zones where lines converge as well.

Although not as thick, we can notice longer long lines on the map. This can indicate people more willing to travel longer distances over the weekend for recreation and meeting family and friends. This also means that a location being close by is not the only motivator for people to visit a place. Spatial interaction model can reveal more patterns to this.

12 Plot Chart: TRIPS & DISTANCE

ggplot(SIM_data,
       aes(x = DISTANCE, y = TRIPS)) +
  geom_point() +
  geom_hline(yintercept = 376.25, color = "red", linetype = "dashed") +
  annotate("text", x = 20000,
           y = 600, label = "95th percentile",
           hjust = -0.1, color = "red", size = 3) +
  geom_hline(yintercept = 1510, color = "blue", linetype = "dashed") +
  annotate("text", x = 20000,
           y = 1800, label = "99th percentile",
           hjust = -0.1, color = "blue", size = 3) +
  labs(title = "Number of Trips as a Function of Distance",
       x = "Distance (m)",
       y = "Number of Trips")

Plotting it in a log scale shows a more linear relationship.

ggplot(SIM_data,
       aes(x = log(DISTANCE), y = log(TRIPS))) +
  geom_point() +
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'

Insights

The maximum number of trips exponentially decrease as the distance increases, which means that generally, the farther the distance, the less trips there are.

However, some outliers can be observed like some zone pairs with almost 20km distance between them having close to 99th percentile of TRIP values. In these zone pairs, there could be strong propulsive or attractive forces attracting passengers to ride the bus between those zones.

13 Visualize propulsive forces

plot_propulsive <- function(var_name, title_comp) {
  tm_shape(mpsz) +
  tm_polygons("gray", title = "Singapore Boundary") +
  
  # Adding this layer underneath propulsiveness as we removed 0s. from the map
  # so it won't skew the legend
  tm_shape(honeycomb) +
  tm_polygons(col = "white") +
  
  tm_shape(propulsiveness %>% filter(if_any(var_name, ~. >= 1))) +
  tm_polygons(var_name, palette = "Blues", style = "quantile") +
    
  tm_shape(flowlines_no_intra %>% filter(TRIPS > 376)) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "orange",
           scale = c(0.1, 1, 3, 5, 7, 10),
           title.lwd = "# of bus trips",
           n = 6,
           alpha = 0.5) +
  
  tm_layout(main.title = paste("Top 5% Bus Passenger Flows and", title_comp),
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_scale_bar(bg.color = "white", bg.alpha = 0.7, position = c("right", "top")) +
  tm_compass(type="8star", size = 2, bg.color = "white",
             bg.alpha = 0.5, position = c("right", "top")) +
  tm_grid(alpha = 0.2) +
  tm_credits("*Passenger data from weekend/holidays 11AM - 2PM\n(October 2023)",
             bg.color = "white", bg.alpha = 0.7,
             position = c("left", "bottom"))
}

13.1 HDB Residents

plot_propulsive("HDB_RESIDENT_COUNT", "HDB POPULATION")
Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(var_name)

  # Now:
  data %>% select(all_of(var_name))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

13.2 Transfer from BUS

plot_propulsive("BUS_ALIGHT_COUNT", "Transfer by BUS")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

Insights

Upon visual inspection, HDB population and bus alights from zones correspond closely with our flowlines.

14 Visualize attractive forces

In this section, we plot attractive forces in their own choropleth maps to see if the top 5% flows correspond to areas we hypothesize have high attractive forces.

plot_attractive <- function(var_name, title_comp) {
  tm_shape(mpsz) +
  tm_polygons("gray", title = "Singapore Boundary") +
  
  # Adding this layer underneath attractiveness as we removed 0s. from the map
  # so it won't skew the legend
  tm_shape(honeycomb) +
  tm_polygons(col = "white") +
  
  tm_shape(attractiveness %>% filter(if_any(var_name, ~. >= 1))) +
  tm_polygons(var_name, palette = "Purples", style = "quantile") +
    
  tm_shape(flowlines_no_intra %>% filter(TRIPS > 376)) +
  tm_lines(lwd = "TRIPS",
           style = "quantile",
           col = "red",
           scale = c(0.1, 1, 3, 5, 7, 10),
           title.lwd = "# of bus trips",
           n = 6,
           alpha = 0.5) +
  
  tm_layout(main.title = paste("Top 5% Bus Passenger Flows and", title_comp),
            main.title.position = "center",
            main.title.size = 1.0,
            legend.height = 0.35, 
            legend.width = 0.35,
            frame = TRUE) +
  
  tm_scale_bar(bg.color = "white", bg.alpha = 0.7, position = c("right", "top")) +
  tm_compass(type="8star", size = 2, bg.color = "white",
             bg.alpha = 0.5, position = c("right", "top")) +
  tm_grid(alpha = 0.2) +
  tm_credits("*Passenger data from weekend/holidays 11AM - 2PM\n(October 2023)",
             bg.color = "white", bg.alpha = 0.7,
             position = c("left", "bottom"))
}

14.1 Busstops

plot_attractive("BUS_STOP_COUNT", "Number of Busstops")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.2 HDB

plot_attractive("HDB_COUNT", "Number of HDB")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.3 Entertainment

plot_attractive("ENTERTN_COUNT", "Number of Entertainment Items")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.4 F&B

plot_attractive("F_AND_B_COUNT", "Number of F&B Outlets")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.5 Leisure&Recreation

plot_attractive("LEISURE_COUNT", "Number of Leisure&Recreation Items")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

14.6 Retail

plot_attractive("RETAIL_COUNT", "Number of Retail")
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length


Insights

There are less location types than the others, like entertainment and leisure locations.

There are also location types that are heavily concentrated in some zones, like the entertainment and F&B, which are heavily concentrated around the Orchard area. Conversely, there are much less HDBs in Orchard area. Even though the flowlines are not thick in this area, there are many flowlines, although thin. This means people are coming from various parts of Singapore.

Save point

write_rds(flowlines_no_intra, "Take-home Exercise 2/data/rds/flowines_no_intra202310.rds")
rm(flowlines)
rm(attractiveness)
rm(propulsiveness)
rm(plot_attractive)
rm(plot_propulsive)