NJ Transit, the second-largest commuter rail network in the United States, serves passengers traveling between New Jersey and New York City. With the return of office workers, train delays have worsened. However, few interactive applications predict real-time delays on commuters’ daily routes, leaving passengers without immediate insight into potential disruptions. To address this gap, our project analyzes delay performance across NJ Transit’s commuter rail routes and develops an interactive tool, Delay Detective, that offers real-time delay predictions for commuters.
Delay Detective is designed to provide commuters with real-time notifications, such as estimated arrival times, on-time performance, average delays, and passenger feedback. This tool aims to help passengers adjust their travel plans and mitigate the impact of delays before they occur.
To develop the predictive tool, we integrated multiple data sources:
Geospatial Data: We collected NJ Transit rail line and station data (https://njogis-newjersey.opendata.arcgis.com) and harmonized station and line names with those used in the delay records, which allowed us to map train routes and their station locations.
Delay Data: We merged the April 2020 train delay records (https://www.kaggle.com/datasets/pranavbadami/nj-transit-amtrak-nec-performance) with the station information, which associates each delay with specific stations and lets us track performance between station pairs.
Weather and Census Data: We incorporated weather observations from Newark Liberty International Airport (EWR) (https://mesonet.agron.iastate.edu/request/download.phtml) and census data on demographics and commuting patterns (https://data.census.gov/) to understand how external factors such as weather and population might influence delays.
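The name harmonization and coordinate join described in this list can be sketched in base R with a named lookup vector instead of a chain of `ifelse()` calls. This is a minimal illustration, not the project's code: the station table, delay records, and coordinates below are hypothetical toy values, `harmonize()` is a helper introduced here, and only three of the renamings from the code that follows are repeated.

```r
# Hypothetical lookup table: GIS spelling -> spelling used in the delay records.
# Only three of the mappings from the project code are repeated here.
name_map <- c(
  "Hoboken Terminal" = "Hoboken",
  "Princeton Jct."   = "Princeton Junction",
  "Wood-Ridge"       = "Wood Ridge"
)

# Replace names that appear in the lookup table; leave all others unchanged
harmonize <- function(x, map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

# Hypothetical toy station table (rounded, illustrative coordinates)
stations <- data.frame(
  STATION_ID = c("Hoboken Terminal", "Princeton Jct.", "Wood-Ridge", "Newark Penn Station"),
  LATITUDE   = c(40.73, 40.32, 40.84, 40.73),
  LONGITUDE  = c(-74.03, -74.62, -74.08, -74.16),
  stringsAsFactors = FALSE
)
stations$STATION_ID <- harmonize(stations$STATION_ID, name_map)

# Hypothetical toy delay records keyed by origin and destination names
delays <- data.frame(
  from = c("Hoboken", "Princeton Junction", "Wood Ridge"),
  to   = c("Newark Penn Station", "Hoboken", "Hoboken"),
  delay_minutes = c(2, 5, 1),
  stringsAsFactors = FALSE
)

# Left-join origin coordinates, then destination coordinates;
# the clashing LATITUDE/LONGITUDE columns get ".from" / ".to" suffixes
pairs <- merge(delays, stations, by.x = "from", by.y = "STATION_ID", all.x = TRUE)
pairs <- merge(pairs, stations, by.x = "to", by.y = "STATION_ID", all.x = TRUE,
               suffixes = c(".from", ".to"))
print(pairs)
```

A lookup vector keeps every renaming in one place, so adding a station means adding one entry rather than another `ifelse()` line.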
#geometry data
line <- st_read('https://services6.arcgis.com/M0t0HPE53pFK525U/arcgis/rest/services/NJTRANSIT_RAIL_LINES_1/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson')%>%
mutate(LINE_NAME = ifelse(LINE_NAME == 'Bergen County Line','Bergen Co. Line ',LINE_NAME),
LINE_NAME = ifelse(LINE_NAME == 'Montclair-Boonton Line','Montclair-Boonton',LINE_NAME),
LINE_NAME = ifelse(LINE_NAME == 'North Jersey Coast Line','No Jersey Coast',LINE_NAME),
LINE_NAME = ifelse(LINE_NAME == 'Northeast Corridor','Northeast Corrdr',LINE_NAME),
LINE_NAME = ifelse(LINE_NAME == 'Pascack Valley Line','Pascack Valley',LINE_NAME),
LINE_NAME = ifelse(LINE_NAME == 'Princeton Dinky','Princeton Shuttle',LINE_NAME),
LINE_NAME = ifelse(LINE_NAME == 'Raritan Valley Line','Raritan Valley',LINE_NAME))%>%
dplyr::select(LINE_NAME,geometry)
stop <- st_read("https://services6.arcgis.com/M0t0HPE53pFK525U/arcgis/rest/services/NJTransit_Rail_Stations/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson") %>%
mutate(STATION_ID = ifelse(STATION_ID == 'Atlantic City', 'Atlantic City Rail Terminal', STATION_ID),
STATION_ID = ifelse((STATION_ID == 'Middletown')&(COUNTY == 'Orange, NY'), 'Middletown NY', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Princeton Jct.', 'Princeton Junction', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Secaucus Junction Upper Level', 'Secaucus Upper Lvl', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Anderson Street-Hackensack', 'Anderson Street', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Bay Street-Montclair', 'Bay Street', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Broadway', 'Broadway Fair Lawn', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Essex Street-Hackensack', 'Essex Street', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Glen Rock-Boro Hall', 'Glen Rock Boro Hall', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Glen Rock-Main', 'Glen Rock Main Line', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Hoboken Terminal', 'Hoboken', STATION_ID),
STATION_ID = ifelse((STATION_ID == 'Middletown')&(COUNTY == 'Monmouth'), 'Middletown NJ', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Montclair St Univ', 'Montclair State U', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Mountain View-Wayne', 'Mountain View', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Pennsauken Transit Center', 'Pennsauken', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Radburn', 'Radburn Fair Lawn', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Ramsey', 'Ramsey Main St', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Rte 17 Ramsey', 'Ramsey Route 17', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Secaucus Junction Lower Level', 'Secaucus Lower Lvl', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Teterboro-Williams Ave', 'Teterboro', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Watsessing', 'Watsessing Avenue', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Wayne Route 23 Transit Center', 'Wayne-Route 23', STATION_ID),
STATION_ID = ifelse(STATION_ID == 'Wood-Ridge', 'Wood Ridge', STATION_ID),
STATION_ID = ifelse(STATION_ID == '30th Street Station', 'Philadelphia', STATION_ID),
line_intersct = str_count(RAIL_SERVICE, ",") + 1)%>%
dplyr::select(STATION_ID,LATITUDE,LONGITUDE,line_intersct)%>%
st_drop_geometry()
# April 2020 file (2020_04.csv) from the Kaggle NJ Transit + Amtrak (NEC) performance dataset;
# adjust the path to wherever the file is saved locally
delay_df <- read.csv("2020_04.csv") %>%
filter(type == 'NJ Transit') %>%
na.omit()
merged_dataset <- merge(delay_df, stop, by.x = "from", by.y = "STATION_ID",all.x = TRUE)
merged_dataset <- merge(merged_dataset, stop, by.x = "to", by.y = "STATION_ID",all.x = TRUE)
merged_dataset <- merged_dataset%>%
filter(to != 'Mount Arlington')%>%
filter(from != 'Mount Arlington')%>%
rename(from_lat = LATITUDE.x,
from_lon = LONGITUDE.x,
from_inter = line_intersct.x,
to_lat = LATITUDE.y,
to_lon = LONGITUDE.y,
to_inter = line_intersct.y
)%>%
mutate(distance = distHaversine(cbind(from_lon, from_lat), cbind(to_lon, to_lat)),
interval60 = floor_date(ymd_hms(scheduled_time), unit = "hour"),
week = week(interval60),
dotw = wday(interval60, label=TRUE),
time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 19 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 19 ~ "PM Rush"),
weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"))
# weather data: hourly observations at Newark Liberty (EWR) from the Iowa Environmental Mesonet via riem
weather.Panel <-
riem_measures(station = "EWR", date_start = "2020-04-01", date_end = "2020-05-01") %>%
dplyr::select(valid, tmpf, p01i, sknt)%>%
replace(is.na(.), 0) %>%
mutate(interval60 = ymd_h(substr(valid,1,13))) %>%
mutate(week = week(interval60),
dotw = wday(interval60, label=TRUE)) %>%
group_by(interval60) %>%
summarize(Temperature = max(tmpf),
Precipitation = sum(p01i),
Wind_Speed = max(sknt)) %>%
mutate(Temperature = ifelse(Temperature == 0, 42, Temperature))
# census data
NJCensus <-
get_acs(geography = "county subdivision",
variables = c("B01003_001", "B19013_001",
"B02001_002", "B08013_001",
"B08012_001", "B08301_001",
"B08301_010", "B01002_001"),
year = 2019,
state = "NJ",
geometry = TRUE,
output = "wide") %>%
rename(Total_Pop = B01003_001E,
Med_Inc = B19013_001E,
Med_Age = B01002_001E,
White_Pop = B02001_002E,
Travel_Time = B08013_001E,
Num_Commuters = B08012_001E,
Means_of_Transport = B08301_001E,
Total_Public_Trans = B08301_010E) %>%
select(Total_Pop, Med_Inc, White_Pop, Travel_Time,
Num_Commuters, Means_of_Transport, Total_Public_Trans,
Med_Age,
GEOID, geometry) %>%
mutate(Percent_White = White_Pop / Total_Pop,
# aggregate travel time (B08013) divided by workers who commute (B08012) = mean minutes per commuter
Mean_Commute_Time = Travel_Time / Num_Commuters,
Percent_Taking_Public_Trans = Total_Public_Trans / Means_of_Transport)
NJCensus_select <- NJCensus%>%
mutate(bigcity = ifelse(Total_Pop >= 100000, 'big city','small city'))%>%
select(geometry, Total_Pop,bigcity, GEOID)
NJTracts <-
NJCensus %>%
as.data.frame() %>%
distinct(GEOID, .keep_all = TRUE) %>%
select(GEOID, geometry) %>%
st_sf
train_census <- st_join(merged_dataset %>%
filter(is.na(from_lon) == FALSE &
is.na(from_lat) == FALSE &
is.na(to_lat) == FALSE &
is.na(to_lon) == FALSE) %>%
st_as_sf(., coords = c("from_lon", "from_lat"), crs = 4326),
NJTracts %>%
st_transform(crs=4326),
join=st_intersects,
left = TRUE) %>%
rename(From.Tract = GEOID) %>%
mutate(from_lon = unlist(map(geometry, 1)),
from_lat = unlist(map(geometry, 2)))%>%
as.data.frame() %>%
select(-geometry)
train_census <- train_census %>%
st_as_sf(., coords = c("to_lon", "to_lat"), crs = 4326) %>%
st_join(., NJTracts %>%
st_transform(crs=4326),
join=st_intersects,
left = TRUE) %>%
rename(To.Tract = GEOID) %>%
mutate(to_lon = unlist(map(geometry, 1)),
to_lat = unlist(map(geometry, 2)))%>%
as.data.frame() %>%
select(-geometry)
train_dataset <-train_census %>%
left_join(weather.Panel, by ="interval60")
merged_dataset <- train_dataset %>%
left_join(NJCensus_select, by = c("From.Tract" = "GEOID")) %>%
left_join(NJCensus_select, by =c("To.Tract"="GEOID")) %>%
select(-geometry.x,-geometry.y) %>%
rename(From_Total_Pop = Total_Pop.x,
To_Total_Pop = Total_Pop.y,
From_city = bigcity.x,
To_city = bigcity.y)%>%
mutate(From_Total_Pop = ifelse(from == "Philadelphia", 1579075, From_Total_Pop),
To_Total_Pop = ifelse(to == "Philadelphia", 1579075, To_Total_Pop),
From_Total_Pop = ifelse(from == "Middletown NY", 1631993, From_Total_Pop),
To_Total_Pop = ifelse(to == "Middletown NY", 1631993, To_Total_Pop),
From_city = ifelse(from == "Philadelphia", 'big city', From_city),
To_city = ifelse(to == "Philadelphia", 'big city', To_city),
From_city = ifelse(from == "Middletown NY", 'big city', From_city),
To_city = ifelse(to == "Middletown NY", 'big city', To_city))
median_value_f <- median(merged_dataset$From_Total_Pop, na.rm = TRUE)
merged_dataset$From_Total_Pop[is.na(merged_dataset$From_Total_Pop)] <- median_value_f
median_value_t <- median(merged_dataset$To_Total_Pop, na.rm = TRUE)
merged_dataset$To_Total_Pop[is.na(merged_dataset$To_Total_Pop)] <- median_value_t
merged_dataset <- merged_dataset%>%
mutate(From_city = ifelse(From_Total_Pop == 24784, 'big city', From_city),
To_city = ifelse(To_Total_Pop == 24784, 'big city', To_city))
The time-series analysis shows consistent patterns in NJ Transit delays. Mean delays were shorter on weekdays than on weekends, when delays were longer, particularly during the PM rush and overnight periods. Over a typical day, mean delay peaks between 2:00 a.m. and 3:00 a.m., then builds gradually from early morning before leveling off during midday.
Mean delay also rises with stop sequence: stops later in a train’s run see longer delays, which suggests that delays accumulate as a train proceeds along an extended route.
By time-of-day category, delays were longest during the PM rush, followed closely by the overnight period. This ordering held on both weekdays and weekends, although weekend delays were slightly higher overall.
The delay series for April 2020 shows periodic spikes, likely linked to external factors such as weather or operational disruptions. Apart from these spikes, the series follows a recurring daily cycle, indicating a stable temporal pattern in delays.
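The hour-of-day categories used in this analysis can be reproduced with a small classification function and a grouped mean. This sketch uses the same cut points as the `case_when()` in the data-preparation code, but the `trips` table is hypothetical toy data and `time_of_day()` is a helper introduced only for illustration.

```r
# Same cut points as the case_when() in the data-preparation code:
# Overnight (< 7 or > 19), AM Rush [7, 10), Mid-Day [10, 15), PM Rush [15, 19]
time_of_day <- function(hour) {
  ifelse(hour < 7 | hour > 19, "Overnight",
         ifelse(hour < 10, "AM Rush",
                ifelse(hour < 15, "Mid-Day", "PM Rush")))
}

# Hypothetical toy records: scheduled hour and delay in minutes
trips <- data.frame(
  hour = c(2, 8, 9, 12, 16, 18, 22),
  delay_minutes = c(6, 3, 5, 2, 7, 5, 4)
)
trips$time_of_day <- time_of_day(trips$hour)

# Mean delay per period (the quantity delay_time holds for the real data)
mean_by_period <- tapply(trips$delay_minutes, trips$time_of_day, mean)
print(mean_by_period)
```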
delay_time <- merged_dataset %>%
group_by(time_of_day) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop")
delay_day <- merged_dataset %>%
group_by(dotw) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop")
delay_week <- merged_dataset %>%
group_by(weekend) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop")
delay_hour <- merged_dataset %>%
group_by(hour(interval60)) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop") %>%
rename(hour = 'hour(interval60)')
delay_sequence <- merged_dataset %>%
group_by(stop_sequence) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop")
delay_time_week <- merged_dataset %>%
group_by(time_of_day, weekend) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop")
delay_week_hour <- merged_dataset %>%
group_by(weekend, hour(interval60)) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE), .groups = "drop") %>%
rename(hour = 'hour(interval60)')
palette2 <- c("#264653", "#2a9d8f")
library(gridExtra)
grid.arrange(
ggplot(data = delay_day, aes(x = dotw, y = mean_delay)) +
geom_bar(stat = "identity", fill = "#2a9d8f") +
labs(title = "Delay minutes in a week", x = "Day of The Week", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_week, aes(x = weekend, y = mean_delay, fill = weekend)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = palette2) +
labs(title = "Delay comparison in Weekend", x = "Weekend or Weekday", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_hour, aes(x = hour, y = mean_delay)) +
geom_bar(stat = "identity", fill = "#2a9d8f") +
labs(title = "Delay minutes in 24 hours", x = "Hour in a day", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_sequence, aes(x = stop_sequence, y = mean_delay)) +
geom_bar(stat = "identity", fill = "#2a9d8f") +
labs(title = "Delay minutes in each sequence", x = "Stop Sequence", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_time, aes(x = time_of_day, y = mean_delay)) +
geom_bar(stat = "identity", fill = "#2a9d8f") +
labs(title = "Delay minutes by time of day", x = "Time of Day", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_week_hour, aes(x = hour, y = mean_delay, color = weekend)) +
geom_line(linewidth = 1) +
scale_color_manual(values = palette2) +
labs(title = "Delay under weekend and time", x = "Hour", y = "Delay Minutes") +
theme_minimal(),
nrow = 3
)
merged_dataset %>%
dplyr::select(interval60, from, delay_minutes) %>%
gather(Variable, Value, -interval60, -from) %>%
group_by(Variable, interval60) %>%
summarize(Value = mean(Value), .groups = "drop") %>%
ggplot(aes(interval60, Value)) +
geom_line(linewidth = 0.8, colour = "#2a9d8f") +
labs(title = "Delay distribution in A Month", subtitle = "NJ, Apr, 2020",
x = "Day", y = "Mean Delay") +
theme_minimal()
The analysis highlights clear patterns in delays across NJ Transit’s network, pointing to areas that need immediate attention. The Atlantic City Line consistently experiences the longest delays, standing out as a major issue. Other lines, such as the Morristown and Pascack Valley Lines, also show above-average delays, though not as severe.
When it comes to stations, hubs like the Atlantic City Rail Terminal, Pennsauken, and Philadelphia are among the worst performers in terms of delay times. Given their importance in the network, delays at these key locations likely create ripple effects across the system, making them a critical focus for improvement. Interestingly, when comparing delays between larger cities (population over 100,000) and smaller towns, there’s not much difference. This suggests that delays are more tied to operational challenges than to the size or type of city the station serves.
The maps provide further insight. High-delay stations cluster in urban areas and along specific lines such as the Atlantic City Line, while delays show no clear difference by direction: origin and destination stations display similar patterns. This reinforces the idea that systemic issues, rather than isolated factors, are at play. The findings point to a need for targeted action, particularly on the Atlantic City Line and at high-delay stations such as the Atlantic City Rail Terminal. Addressing these problem areas could substantially improve reliability and commuter satisfaction across the network.
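The big-city versus small-city comparison reduces to a population threshold (100,000, as in `NJCensus_select`) and a mean delay computed separately by origin and destination. A base-R sketch on hypothetical toy populations and delays; `classify_city()` is an illustrative helper, not part of the project code.

```r
# Same threshold as NJCensus_select: a population of at least 100,000 is a "big city"
classify_city <- function(pop) ifelse(pop >= 100000, "big city", "small city")

# Hypothetical toy trips with origin/destination populations and delays (minutes)
legs <- data.frame(
  from_pop = c(280000, 25000, 140000, 12000),
  to_pop   = c(25000, 280000, 12000, 140000),
  delay_minutes = c(4, 5, 6, 3)
)
legs$From_city <- classify_city(legs$from_pop)
legs$To_city   <- classify_city(legs$to_pop)

# Mean delay by city type, seen from each end of the trip
by_origin      <- tapply(legs$delay_minutes, legs$From_city, mean)
by_destination <- tapply(legs$delay_minutes, legs$To_city, mean)
print(by_origin)
print(by_destination)
```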
delay_line <- merged_dataset %>%
group_by(line)%>%
summarize(mean_delay = mean(delay_minutes))%>%
arrange(., mean_delay)
delay_from <- merged_dataset %>%
group_by(from)%>%
summarize(mean_delay = mean(delay_minutes))%>%
arrange(., -mean_delay)%>%
head(20)
big_from <- merged_dataset %>%
group_by(From_city)%>%
summarize(mean_delay = mean(delay_minutes))%>%
mutate(status = 'from')%>%
rename(city_type = From_city)
big_to <- merged_dataset %>%
group_by(To_city)%>%
summarize(mean_delay = mean(delay_minutes))%>%
mutate(status = 'to')%>%
rename(city_type = To_city)
grid.arrange(ggplot(data = delay_line, aes(x = line, y = mean_delay, fill = mean_delay)) +
geom_col(position = "dodge")+
labs(title = "Delay minutes comparison in lines", x = "line", y = "Mean Delay") +
scale_fill_gradient(low = "#2a9d8f", high = "#264653") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 15, hjust = 1)) ,
ggplot(data = rbind(big_from,big_to), aes(x = status , y = mean_delay, fill = city_type)) +
geom_col(position = "dodge")+
labs(title = "Delay minutes comparison in big and small city", x = "City Type", y = "Mean Delay") +
scale_fill_manual(values = palette2)+
theme_minimal()+
theme(axis.text.x = element_text(angle = 0 , hjust = 1)),
ggplot(data = delay_from, aes(x = from, y = mean_delay, fill = mean_delay)) +
geom_col(position = "dodge")+
labs(title = "Delay minutes comparison in stations", y = "Mean Delay") +
scale_fill_gradient(low = "#2a9d8f", high = "#264653") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 30, hjust = 1)))
map_from <- merged_dataset %>%
group_by(from) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE)) %>% # na.rm = TRUE ignores missing delay values
left_join(stop, by = c('from' = 'STATION_ID')) %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326) %>%
mutate(status = 'Origin') %>%
rename(station = from)
map_to <- merged_dataset %>%
group_by(to) %>%
summarize(mean_delay = mean(delay_minutes, na.rm = TRUE)) %>% # na.rm = TRUE ignores missing delay values
left_join(stop, by = c('to' = 'STATION_ID')) %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326) %>%
mutate(status = 'Destination') %>%
rename(station = to)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = rbind(map_from, map_to), aes(size = mean_delay, color = mean_delay), alpha = 0.5) +
scale_colour_viridis(direction = -1, discrete = FALSE, option = "D") +
scale_size_continuous(name = "Delay Minutes") +
coord_sf() +
labs(title = "Mean Delay by Station, April 2020") +
facet_grid(~status) +
theme_minimal()
Beyond the temporal and spatial dimensions, we analyzed how external factors such as weather and operational characteristics influence train delays. The April 2020 weather data from Newark (EWR) reveal some notable patterns. Precipitation was minimal for most of the month, but a few significant rainfall events occurred around April 13 and April 20. These spikes may have contributed to delays through wet tracks and reduced visibility. Wind speeds fluctuated considerably throughout the month, with peaks around the same dates; high winds can disrupt operations and cause delays through safety restrictions or infrastructure damage. Temperatures ranged from 40°F to 70°F and followed a consistent daily cycle with no extreme values during the period. Moderate temperatures appear to have a limited effect on delays, although more extreme conditions might have a larger impact.
Among operational factors, the number of rail lines serving a station (its line intersections) and route distance emerged as key contributors to delays. Mean delay rose with the number of intersecting lines and peaked at stations served by four lines, where congestion and signaling complexity likely create bottlenecks. Delays were slightly lower at stations served by six lines, possibly because of better traffic management or infrastructure. Longer travel distances were also positively correlated with delays, suggesting that inefficiencies accumulate along extended routes.
Comparing rainy and dry hours, rainy conditions were associated with slightly higher mean delays. Although the difference was small, it suggests that rain exacerbates operational challenges. Finally, delays trended slightly upward with rising temperature, which may reflect seasonal effects or minor variations in infrastructure performance.
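The hourly weather panel collapses several observations per hour into one row. The project code takes the hourly maximum of temperature (`tmpf`, °F) and wind speed (`sknt`, knots) and the sum of one-hour precipitation (`p01i`, inches). As we understand the Iowa Environmental Mesonet field notes, `p01i` accumulates since the previous hourly reset, so summing several reports within one hour may overstate rainfall; the sketch below therefore takes the maximum of `p01i` instead. That is an assumption worth checking against the IEM documentation, and the observations and the `max_or_na()` helper are hypothetical.

```r
# Hypothetical toy METAR-style observations (several reports per hour)
obs <- data.frame(
  valid = as.POSIXct(c("2020-04-13 09:15", "2020-04-13 09:51",
                       "2020-04-13 10:20", "2020-04-13 10:51"), tz = "UTC"),
  tmpf = c(55, 57, NA, 58),          # air temperature, deg F
  p01i = c(0.05, 0.12, 0.03, 0.10),  # one-hour precipitation, inches
  sknt = c(12, 18, 20, 15)           # wind speed, knots
)

# Max that ignores missing values but returns NA if the whole hour is missing
max_or_na <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

# Truncate each timestamp to its hour, then aggregate within the hour
hour_key <- format(obs$valid, "%Y-%m-%d %H:00", tz = "UTC")
hourly <- data.frame(
  interval60    = sort(unique(hour_key)),
  Temperature   = as.vector(tapply(obs$tmpf, hour_key, max_or_na)),
  Precipitation = as.vector(tapply(obs$p01i, hour_key, max_or_na)),
  Wind_Speed    = as.vector(tapply(obs$sknt, hour_key, max_or_na))
)
print(hourly)
```

Keeping missing temperatures as `NA` (rather than replacing them with 0 and patching afterwards) avoids treating a missing reading as a real 0°F value.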
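Both comparisons in this paragraph are simple to compute: the rain flag treats any non-zero hourly precipitation as rain, as in `delay_rain_week`, and the distance trend is an ordinary least-squares line like the one `geom_smooth(method = "lm")` draws. The toy data below are hypothetical and constructed to be exactly linear, so the fitted slope is known in advance; `distance` is in meters because `distHaversine()` returns meters.

```r
# Hypothetical toy trips: hourly precipitation (in), distance (m), delay (min)
legs <- data.frame(
  Precipitation = c(0, 0, 0.02, 0, 0.10, 0),
  distance      = c(2000, 4000, 6000, 8000, 10000, 12000),
  delay_minutes = c(3.0, 3.5, 4.0, 4.5, 5.0, 5.5)
)

# Rain flag, as in delay_rain_week: any measurable precipitation counts as rain
legs$rain <- ifelse(legs$Precipitation == 0, "NoRain", "Rain")
mean_by_rain <- tapply(legs$delay_minutes, legs$rain, mean)

# Linear trend of delay on distance; rescale the slope to minutes per kilometer
fit <- lm(delay_minutes ~ distance, data = legs)
minutes_per_km <- unname(coef(fit)["distance"]) * 1000
print(mean_by_rain)
print(minutes_per_km)
```

Reporting the slope per kilometer rather than per meter makes the size of the distance effect easier to read.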
grid.arrange(
ggplot(weather.Panel, aes(interval60,Precipitation)) + geom_line(color="#2a9d8f") +
labs(title="Precipitation", x="Date", y="Precipitation (in)") + theme_minimal(),
ggplot(weather.Panel, aes(interval60,Wind_Speed)) + geom_line(color="#2a9d8f") +
labs(title="Wind Speed", x="Date", y="Wind Speed (knots)") + theme_minimal(),
ggplot(weather.Panel, aes(interval60,Temperature)) + geom_line(color="#2a9d8f") +
labs(title="Temperature", x="Date", y="Temperature (°F)") + theme_minimal(),
top="Weather Data - NJ EWR - Apr, 2020")
delay_distance <- merged_dataset %>%
group_by(distance)%>%
summarize(mean_delay = mean(delay_minutes))
delay_intersct <- rbind(
merged_dataset %>%
group_by(from_inter)%>%
summarize(mean_delay = mean(delay_minutes))%>%
mutate(status = 'from')%>%
rename(inter = from_inter),
merged_dataset %>%
group_by(to_inter)%>%
summarize(mean_delay = mean(delay_minutes))%>%
mutate(status = 'to')%>%
rename(inter = to_inter))
delay_rain_week <- merged_dataset %>%
mutate(rain = ifelse(Precipitation == 0,'NoRain','Rain'))%>%
group_by(rain)%>%
summarize(mean_delay = mean(delay_minutes))
delay_temp <- merged_dataset %>%
group_by(Temperature)%>%
summarize(mean_delay = mean(delay_minutes))
# plot mean delay against line intersections, rain, distance, and temperature
grid.arrange(
ggplot(data = delay_intersct, aes(x = inter, y = mean_delay,fill=status)) +
geom_col(position = "dodge") +
scale_fill_manual(values = palette2) +
labs(title = "Delay minutes in each intersection", x = "Num of Intersection", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_rain_week, aes(x = rain, y = mean_delay,fill=rain)) +
geom_col(position = "dodge") +
scale_fill_manual(values = palette2) +
labs(title = "Delay minutes with Rain", x = "Rain", y = "Mean Delay") +
theme_minimal(),
ggplot(data = delay_distance, aes(x = distance, y = mean_delay)) +
geom_line(color = "#2a9d8f") +
labs(title = "The relationship between delay and distance", x = "Distance (m)", y = "Mean Delay") +
geom_smooth(method = "lm", se = TRUE)+
theme_minimal(),
ggplot(data = delay_temp, aes(x = Temperature, y = mean_delay)) +
geom_line(color = "#2a9d8f") +
labs(title = "The relationship between delay and temperature", x = "Temperature (°F)", y = "Mean Delay") +
geom_smooth(method = "lm", se = TRUE)+
theme_minimal())
The animation provides an hour-by-hour breakdown of station delays across NJ Transit’s network, averaged over April 2020. Larger delays are concentrated at stations served by more lines (more line intersections), which act as operational bottlenecks because of heavier traffic and signaling demands. Delays fluctuate considerably over the day, with peaks that likely correspond to the morning and evening rush hours, reflecting the impact of commuter patterns on system performance.
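The intersection measure used throughout is the number of lines serving a station, derived from the comma-separated `RAIL_SERVICE` field with `str_count(RAIL_SERVICE, ",") + 1`. A base-R sketch of the same count and of mean delay by line count, on a hypothetical toy station table (station and line names are placeholders, and `count_commas()` is an illustrative helper):

```r
# Hypothetical toy station table; RAIL_SERVICE lists the lines serving each station
stations <- data.frame(
  STATION_ID   = c("Station A", "Station B", "Station C", "Station D"),
  RAIL_SERVICE = c("Line 1", "Line 1, Line 2", "Line 1, Line 2, Line 3, Line 4", "Line 2"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of str_count(RAIL_SERVICE, ",") + 1
count_commas <- function(x) lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))
stations$line_intersct <- count_commas(stations$RAIL_SERVICE) + 1

# Hypothetical toy delay records at each origin station
delays <- data.frame(
  from = c("Station A", "Station B", "Station C", "Station C", "Station D"),
  delay_minutes = c(2, 4, 7, 9, 3),
  stringsAsFactors = FALSE
)
joined <- merge(delays, stations[, c("STATION_ID", "line_intersct")],
                by.x = "from", by.y = "STATION_ID")

# Mean delay by number of lines serving the origin station
delay_by_lines <- tapply(joined$delay_minutes, joined$line_intersct, mean)
print(delay_by_lines)
```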
delay_stop_time <- merged_dataset %>%
group_by(from,to,hour(interval60))%>%
summarize(mean_delay = mean(delay_minutes))%>%
rename(hour = 'hour(interval60)')%>%
left_join(stop,by=c('from'='STATION_ID'))%>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = delay_stop_time, aes(size = line_intersct,color = mean_delay)) +
scale_colour_viridis(direction = -1,discrete = FALSE, option = "D") +
labs(title = "Station Delay by Hour of Day, Apr 2020",
subtitle = "Hour of day: {current_frame}") +
transition_manual(hour)+ mapTheme()+theme_minimal()
This visualization compares station delays on weekdays and weekends, highlighting variations based on the number of intersecting lines. Stations with more intersections experience consistently higher delays, particularly on weekends, likely due to reduced operational resources or traffic management adjustments. The color gradient and size of the points emphasize the correlation between intersection complexity and average delays.
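The weekday/weekend comparison by intersection count is a two-way grouped mean. This sketch derives the weekend flag from the date with `as.POSIXlt()$wday`, which does not depend on the locale, and tabulates mean delay by origin line count and day type; the records are hypothetical toy data and `weekend_flag()` is an illustrative helper.

```r
# Weekend flag from the calendar date; POSIXlt wday is 0 for Sunday and 6 for Saturday,
# which avoids the locale-dependent names returned by weekdays()
weekend_flag <- function(date) {
  ifelse(as.POSIXlt(date)$wday %in% c(0, 6), "Weekend", "Weekday")
}

# Hypothetical toy records: origin line count, service date, delay in minutes
legs <- data.frame(
  from_inter    = c(1, 1, 4, 4, 4, 1),
  date          = as.Date(c("2020-04-03", "2020-04-04", "2020-04-06",
                            "2020-04-04", "2020-04-05", "2020-04-03")),
  delay_minutes = c(2, 3, 5, 8, 10, 4)
)
legs$weekend <- weekend_flag(legs$date)

# Rows: number of lines at the origin; columns: Weekday / Weekend
delay_grid <- tapply(legs$delay_minutes, list(legs$from_inter, legs$weekend), mean)
print(delay_grid)
```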
delay_intersct_week_f <-merged_dataset %>%
group_by(from_inter,weekend)%>%
summarize(mean_delay = mean(delay_minutes))
inter_stop <- stop%>%
right_join(delay_intersct_week_f,by=c('line_intersct' = 'from_inter'))%>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = inter_stop, aes(size = line_intersct,color = mean_delay)) +
scale_colour_viridis(direction = -1,discrete = FALSE, option = "D") +
labs(title = "Station Delay Intersection Comparison, 2020") +
facet_wrap(~weekend)+ mapTheme()+theme_minimal()
Delays over a week reveal cyclical patterns influenced by the day of the week. Weekends generally show higher average delays at both origin and destination stations, suggesting that limited weekend service capacity might exacerbate delays. The animation highlights that delays are spatially distributed across the network but are more pronounced at major transit hubs.
delay_to_time <- merged_dataset %>%
group_by(to,dotw)%>%
summarize(mean_delay = mean(delay_minutes))%>%
rename(Day = dotw)%>%
left_join(stop,by=c('to'='STATION_ID'))%>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)%>%
mutate(status = 'Destination')%>%
rename(station = to)
delay_from_time <- merged_dataset %>%
group_by(from,dotw)%>%
summarize(mean_delay = mean(delay_minutes))%>%
rename(Day = dotw)%>%
left_join(stop,by=c('from'='STATION_ID'))%>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)%>%
mutate(status = 'Origin')%>%
rename(station = from)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = rbind(delay_from_time,delay_to_time), aes(size =mean_delay, color = mean_delay)) +
scale_colour_viridis(direction = -1,discrete = FALSE, option = "D") +
labs(title = "Station Delay by Day of Week, Apr 2020",
subtitle = "Day of week: {current_frame}") +
facet_wrap(~status)+
transition_manual(Day)+ mapTheme()+theme_minimal()
Over the course of a month, the patterns remain consistent, with weekly cycles and periodic spikes in delays. These spikes could be tied to external factors, such as weather events or temporary operational disruptions. The visualization effectively illustrates the persistence of delays at key stations, reinforcing the need for targeted interventions.
delay_to_time <- merged_dataset %>%
group_by(to,week)%>%
summarize(mean_delay = mean(delay_minutes))%>%
left_join(stop,by=c('to'='STATION_ID'))%>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)%>%
mutate(status = 'Destination')%>%
rename(station = to)
delay_from_time <- merged_dataset %>%
group_by(from,week)%>%
summarize(mean_delay = mean(delay_minutes))%>%
left_join(stop,by=c('from'='STATION_ID'))%>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)%>%
mutate(status = 'Origin')%>%
rename(station = from)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = rbind(delay_from_time,delay_to_time), aes(size =mean_delay, color = mean_delay)) +
scale_colour_viridis(direction = -1,discrete = FALSE, option = "D") +
labs(title = "Station Delay by Week, Apr 2020",
subtitle = "Week of year: {current_frame}") +
facet_wrap(~status)+
transition_manual(week)+ mapTheme()+theme_minimal()
Resource Allocation: Increase operational resources at high-intersection stations, especially during weekends and peak commuting hours.
Infrastructure Upgrades: Enhance signaling and track capacity at major hubs to reduce congestion and improve overall reliability.
Data-Driven Planning: Use delay patterns to optimize scheduling and resource deployment, prioritizing bottleneck stations for improvement.
Overall, the spatio-temporal distribution of delays is consistent with the findings of the separate temporal and spatial analyses. We also observed a station-level spatial lag effect: delays at one station appear to carry over to the stations that follow it. This suggests new candidate predictors for the predictive model.
The exploratory analysis above revealed lag effects in both the spatial and temporal dimensions of delay. Based on these findings, we added temporal and spatial lag variables to the prediction model to strengthen its predictive power. A temporal lag captures how delays recorded at the passenger’s stop in earlier time intervals relate to the delay of the train the passenger is about to board. A spatial lag captures how delays at stops that the same train served earlier in its run relate to the delay at the passenger’s current stop. Because our use case is predicting delays before passengers board, we chose a short lag interval of 15 minutes. This finer interval lets the model incorporate progressively more recent lag variables as boarding time approaches, improving both the accuracy and the practicality of the predictions.
The correlation between current delays and the temporal lag variables generally weakens as the lag interval grows, although not monotonically. The 15-minute lag has by far the strongest correlation (r = 0.55), the 30- and 45-minute lags fall to 0.27 and 0.33, and the 3-hour lag drops to 0.04. Recent delays are therefore the most informative predictors of current delays.
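One way to make a temporal lag match its name exactly is to look up the record at the same station 15 or 30 minutes earlier by clock time, rather than offsetting rows. This is an alternative to the row-offset `dplyr::lag()` calls in this section, not the method used for the results reported here. It assumes at most one row per station and 15-minute interval, so real data would first be averaged by station and interval; `lag_by_clock()` and the toy `panel` are hypothetical.

```r
# Look up the value recorded at the same station exactly `minutes` earlier.
# Assumes at most one row per station and interval (aggregate first otherwise).
lag_by_clock <- function(station, time, value, minutes) {
  key_now  <- paste(station, format(time, "%Y-%m-%d %H:%M", tz = "UTC"))
  key_prev <- paste(station, format(time - minutes * 60, "%Y-%m-%d %H:%M", tz = "UTC"))
  value[match(key_prev, key_now)]  # NA when no record exists at that earlier time
}

# Hypothetical toy panel: two stations, 15-minute intervals, one gap at 08:30
panel <- data.frame(
  from_id = c(1, 1, 1, 2, 2),
  interval15 = as.POSIXct(c("2020-04-06 08:00", "2020-04-06 08:15", "2020-04-06 08:45",
                            "2020-04-06 08:00", "2020-04-06 08:15"), tz = "UTC"),
  delay_minutes = c(2, 4, 6, 1, 3)
)
panel$lag15min <- lag_by_clock(panel$from_id, panel$interval15, panel$delay_minutes, 15)
panel$lag30min <- lag_by_clock(panel$from_id, panel$interval15, panel$delay_minutes, 30)
print(panel)
```

In this toy panel the 08:45 record at station 1 gets no 15-minute lag because nothing was recorded at 08:30, and the first record at station 2 does not inherit a value from station 1; a plain row offset would fill both.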
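The decay of correlation with lag length can be illustrated on a synthetic first-order autoregressive (AR(1)) series, for which the theoretical lag-k autocorrelation is phi^k. This is a toy illustration only: the simulated series is not NJ Transit data, and the real correlations in the table are not monotone (0.27 at 30 minutes versus 0.33 at 45 minutes), so the real delay series is not a simple AR(1) process.

```r
# Synthetic AR(1) series: each value keeps 80% of the previous one plus noise
set.seed(42)
n   <- 5000
phi <- 0.8
x <- numeric(n)
x[1] <- rnorm(1)
for (i in 2:n) x[i] <- phi * x[i - 1] + rnorm(1)

# Correlation between the series and itself shifted by k steps
lag_cor <- function(x, k) cor(x[(k + 1):length(x)], x[1:(length(x) - k)])
cors <- sapply(1:4, function(k) lag_cor(x, k))

# Theoretical autocorrelations are phi^k = 0.80, 0.64, 0.512, 0.4096
print(round(cors, 2))
```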
merged_dataset <- merged_dataset %>%
mutate(interval15 = floor_date(ymd_hms(scheduled_time), unit = "15 mins"))
merged_dataset <-
merged_dataset %>%
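# lag() below runs on the whole sorted table (no group_by), so lag k is the k-th previous row:
# it equals k * 15 minutes only when each station has one record per interval, and the
# first rows of each station take values from the preceding station
arrange(from_id, interval15) %>%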
mutate(lag15min = dplyr::lag(delay_minutes,1),
lag30min = dplyr::lag(delay_minutes,2),
lag45min = dplyr::lag(delay_minutes,3),
lag1h = dplyr::lag(delay_minutes,4),
lag1h15min = dplyr::lag(delay_minutes,5),
lag1h30min = dplyr::lag(delay_minutes,6),
lag1h45min = dplyr::lag(delay_minutes,7),
lag2h = dplyr::lag(delay_minutes,8),
lag2h15min = dplyr::lag(delay_minutes,9),
lag2h30min = dplyr::lag(delay_minutes,10),
lag2h45min = dplyr::lag(delay_minutes,11),
lag3h = dplyr::lag(delay_minutes,12))
as.data.frame(merged_dataset) %>%
group_by(interval15) %>%
summarise_at(vars(starts_with("lag"), "delay_minutes"), mean, na.rm = TRUE) %>%
gather(Variable, Value, -interval15, -delay_minutes) %>%
mutate(Variable = factor(Variable, levels=c("lag15min","lag30min","lag45min","lag1h", "lag1h15min","lag1h30min","lag1h45min","lag2h","lag2h15min","lag2h30min","lag2h45min","lag3h")))%>%
group_by(Variable) %>%
summarize(correlation = round(cor(Value, delay_minutes),2))
## # A tibble: 12 × 2
## Variable correlation
## <fct> <dbl>
## 1 lag15min 0.55
## 2 lag30min 0.27
## 3 lag45min 0.33
## 4 lag1h 0.15
## 5 lag1h15min 0.21
## 6 lag1h30min 0.08
## 7 lag1h45min 0.14
## 8 lag2h 0.2
## 9 lag2h15min 0.19
## 10 lag2h30min 0.01
## 11 lag2h45min 0.15
## 12 lag3h 0.04
merged_dataset_station <-
merged_dataset %>%
arrange(train_id, interval15, stop_sequence) %>%
# lagsNstation: delay recorded N stops earlier on the same train. The first N stops
# of a run have no stop that far upstream, so they are set to 0 instead of picking up
# a value from the previous train's rows (stop_sequence starts at 1 for each run).
mutate(lagsstation = if_else(stop_sequence <= 1, 0, lag(delay_minutes, 1)),
lags2station = if_else(stop_sequence <= 2, 0, lag(delay_minutes, 2)),
lags3station = if_else(stop_sequence <= 3, 0, lag(delay_minutes, 3)),
lags4station = if_else(stop_sequence <= 4, 0, lag(delay_minutes, 4)),
lags5station = if_else(stop_sequence <= 5, 0, lag(delay_minutes, 5)),
lags6station = if_else(stop_sequence <= 6, 0, lag(delay_minutes, 6)),
lags7station = if_else(stop_sequence <= 7, 0, lag(delay_minutes, 7)),
lags8station = if_else(stop_sequence <= 8, 0, lag(delay_minutes, 8))
)
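The `stop_sequence` guard above keeps each spatial lag from reaching into the previous train's rows. An alternative is to compute the lag within each train directly. The base-R sketch below uses a small hypothetical data frame (`stops`) and a hypothetical helper (`station_lag()`); it assumes rows are already sorted by `stop_sequence` within each train and that `train_id` identifies a single run (if IDs repeat across days, the grouping key would also need the date). Like the code above, it fills the first k stops of a run with 0.

```r
# Hypothetical toy data: two train runs, rows sorted by stop_sequence within each run
stops <- data.frame(
  train_id      = c("A", "A", "A", "B", "B"),
  stop_sequence = c(1, 2, 3, 1, 2),
  delay_minutes = c(2, 4, 7, 0, 5)
)

# Delay observed k stops upstream on the same train. The first k stops of a run
# have no such stop and get 0, matching the lagsNstation convention above.
station_lag <- function(delay, train, k) {
  ave(delay, train, FUN = function(v) {
    n <- length(v)
    c(rep(0, min(k, n)), v[seq_len(max(n - k, 0))])
  })
}

stops$lags1station <- station_lag(stops$delay_minutes, stops$train_id, 1)
stops$lags2station <- station_lag(stops$delay_minutes, stops$train_id, 2)
stops
```

In dplyr the same idea reads `group_by(train_id) %>% mutate(lags1station = lag(delay_minutes, 1, default = 0))`, which makes the grouping explicit instead of relying on the `stop_sequence` check.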
selected_columns <- merged_dataset_station[, c("delay_minutes", "lag15min","lag30min","lag45min","lag1h", "lag1h15min","lag1h30min","lag1h45min","lag2h","lag2h15min","lag2h30min","lag2h45min","lag3h","week")]
cor_delay_all_time <- cor(selected_columns, use = "complete.obs")["delay_minutes", -1]%>%
as.data.frame()%>%rename(cor_score = '.')
plotData.lag_time <-
filter(as.data.frame(merged_dataset_station), week == 16) %>%
dplyr::select(lag15min, lag30min, lag45min, lag1h,
lag1h15min, lag1h30min, lag1h45min, lag2h,
lag2h15min, lag2h30min, lag2h45min, lag3h, delay_minutes) %>%
filter(complete.cases(.)) %>%
gather(Variable, Value, -delay_minutes) %>%
mutate(Variable = fct_relevel(Variable, "lag15min", "lag30min", "lag45min", "lag1h",
"lag1h15min", "lag1h30min", "lag1h45min", "lag2h",
"lag2h15min", "lag2h30min", "lag2h45min", "lag3h"))
correlation.lag_time <-
group_by(plotData.lag_time, Variable) %>%
filter(!is.na(Value) & !is.na(delay_minutes)) %>%
summarize(correlation = round(cor(Value, delay_minutes, use = "complete.obs"), 2))
ggplot(plotData.lag_time, aes(Value, delay_minutes)) +
geom_point(size = 0.1) +
geom_text(data = correlation.lag_time, aes(label = paste("r =", round(correlation, 2))),
x = -Inf, y = Inf, vjust = 1.5, hjust = -0.1) +
geom_smooth(method = 'lm', se = FALSE, color = "#2a9d8f") +
facet_wrap(~Variable, ncol = 4, scales = 'free') +
labs(title = "Delay minutes as a function of time lags",
subtitle = "One week (week 16) in April 2020") +
mapTheme() + theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
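The correlations computed above are strongest for the shortest lag (r = 0.55 at 15 minutes) and generally weaken as the lag grows, falling to 0.04 at 3 hours, although not strictly monotonically (the 45-minute lag, at 0.33, correlates more strongly than the 30-minute lag, at 0.27). The scatterplots confirm this pattern. To improve the model's accuracy, the shortest lags that are still available at each prediction horizon are prioritized for real-time delay predictions.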
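To make the lag construction concrete, here is a minimal base-R sketch on synthetic data, not the NJ Transit series. The helper `lag_vec()` is hypothetical and mirrors how `dplyr::lag(delay_minutes, k)` shifts a vector by k 15-minute intervals; an AR(1) series stands in for delays that carry over from one interval to the next. For such a series the lag-k correlation shrinks roughly geometrically with k, an idealized version of the decay in the table above.

```r
# Shift x forward by k positions, padding the start with NA
# (same semantics as dplyr::lag(x, k) for k >= 0)
lag_vec <- function(x, k) {
  stopifnot(k >= 0)
  n <- length(x)
  c(rep(NA, min(k, n)), x[seq_len(max(n - k, 0))])
}

# Synthetic stand-in for a delay series at 15-minute intervals:
# an AR(1) process in which each interval carries over 70% of the previous one
set.seed(1)
delay <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# Correlation between the series and its k-step lag, k = 1 (15 min) ... 12 (3 h)
lag_cor <- sapply(1:12, function(k) cor(delay, lag_vec(delay, k), use = "complete.obs"))
names(lag_cor) <- paste0("lag", 15 * (1:12), "min")
round(lag_cor, 2)
```

When several station pairs are stacked in one data frame, the lag should be taken within each pair (for example with `group_by()` before `mutate()`); otherwise the first rows of one series pick up values from the end of the previous one.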
Incorporating spatial lag variables further enhances the model by accounting for delay propagation between stations, providing a more complete picture of delay dynamics and improving prediction reliability.
The analysis of spatial lag variables reveals that their correlation with delay times decreases as the number of lagged stations increases. This indicates that delays at stations closer in sequence have a stronger influence on current delays compared to those further away. However, even with increasing lagged stations, the overall correlation of spatial lags with delay times remains relatively high, suggesting that delays at upstream stations are significant predictors of downstream delays.
The visualization highlights these findings with a scatterplot for each spatial lag variable over one week in April 2020. The correlation coefficients (the r values in each panel) show that the influence of upstream delays diminishes as the lagged station distance grows. Despite this decline, the trends remain positive, with notable influence extending up to eight stations.
This analysis supports the inclusion of spatial lag variables in the predictive model. The stronger correlations from nearby stations help capture the immediate propagation of delays, while more distant lagged stations provide insight into broader delay patterns along the route. These findings enhance the model's ability to predict delays by incorporating the spatial interdependencies of train operations.
In constructing the 30-minute, 60-minute, and 90-minute delay prediction models, we carefully selected variables from multiple dimensions to capture the most significant factors influencing delays. These variables reflect temporal, spatial, and operational characteristics, providing a comprehensive framework for understanding and predicting delays.
1. Temporal Variables
- Hour: Represents the time of day, capturing rush-hour patterns and off-peak effects.
- Lag Variables: These variables track delays at earlier times, such as:
  - lag45min, lag1h, and lag1h15min (30-minute model focus)
  - lag2h and lag3h (90-minute model focus)

Temporal lags allow the models to account for the propagation of delays over time, with shorter lags being more relevant to immediate predictions.

2. Spatial Variables
- Intersection variables (to_inter and from_inter): Reflect the complexity of rail intersections, which often act as bottlenecks and influence delay times.
- Spatial lags (e.g., lags3station, lags5station, lags6station): Measure the impact of delays at upstream stations on downstream delays, highlighting the role of spatial propagation.

The three models weight these variables differently:
- 30-minute model: Emphasizes shorter lags such as lag45min and lags3station, along with immediate operational and spatial factors. This model is designed for short-term predictions close to boarding times.
- 60-minute model: Drops the 45-minute lag and uses temporal lags from lag1h onward, such as lag1h30min, while shifting its spatial lags further upstream to 5 and 6 stations back (lags5station and lags6station).
- 90-minute model: Relies on the longest lags, from lag1h30min up to lag3h, together with lags6station, to predict delays further away from the expected boarding time.

merged_dataset_model <- merged_dataset_station %>%
filter(line != 'Atl. City Line') %>%
mutate(
hour = hour(interval60),
from = as.factor(from),
to = as.factor(to),
line = as.factor(line)
) %>%
select(-weekend)
factor_levels <- sapply(merged_dataset_model, function(x) if (is.factor(x)) length(levels(x)) else NA)
problematic_factors <- names(factor_levels[factor_levels <= 1 & !is.na(factor_levels)])
if (length(problematic_factors) > 0) {
cat("The following factor variables have only one level and will be removed from the dataset: ", problematic_factors, "\n")
merged_dataset_model <- merged_dataset_model %>% select(-all_of(problematic_factors))
}
delay.Train <- filter(merged_dataset_model, week <= 14)
delay.Test <- filter(merged_dataset_model, week > 14)
reg.30 <- lm(
delay_minutes ~ from + to + hour + Temperature + Precipitation + Wind_Speed + lag45min +
lag1h + lag1h15min + lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min +
lag2h45min + lag3h + lags3station + lags4station + lags5station + lags6station +
stop_sequence + line + to_inter + from_inter,
data = delay.Train
)
reg.60 <- lm(
delay_minutes ~ from + to + hour + Temperature + Precipitation + Wind_Speed +
lag1h + lag1h15min + lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min +
lag2h45min + lag3h + lags5station + lags6station + stop_sequence + line + to_inter + from_inter,
data = delay.Train
)
reg.90 <- lm(
delay_minutes ~ from + to + hour + Temperature + Precipitation + Wind_Speed +
lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min + lag2h45min + lag3h +
lags6station + stop_sequence + line + to_inter + from_inter,
data = delay.Train
)
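The three models share the same station, weather, and operational terms and differ only in which lags they include. A sketch of building the same three formulas from shared pieces with base R's `reformulate()` makes the horizon-specific lag sets easy to compare side by side. The object names `base_terms`, `time_lags`, `horizon_lags`, and `formulas` are hypothetical; the terms themselves are copied from `reg.30`, `reg.60`, and `reg.90` above.

```r
# Terms shared by all three models (copied from reg.30 / reg.60 / reg.90)
base_terms <- c("from", "to", "hour", "Temperature", "Precipitation", "Wind_Speed",
                "stop_sequence", "line", "to_inter", "from_inter")

# Temporal lags from 45 minutes to 3 hours, in 15-minute steps
time_lags <- c("lag45min", "lag1h", "lag1h15min", "lag1h30min", "lag1h45min",
               "lag2h", "lag2h15min", "lag2h30min", "lag2h45min", "lag3h")

# Horizon-specific lags: longer horizons drop the shortest temporal lags
# and keep only the more distant spatial lags
horizon_lags <- list(
  "30" = c(time_lags, "lags3station", "lags4station", "lags5station", "lags6station"),
  "60" = c(time_lags[-1], "lags5station", "lags6station"),
  "90" = c(time_lags[-(1:3)], "lags6station")
)

formulas <- lapply(horizon_lags, function(lags)
  reformulate(c(base_terms, lags), response = "delay_minutes"))

formulas[["90"]]
# With the training data loaded, all three models could then be fit in one call:
# fits <- lapply(formulas, lm, data = delay.Train)
```

Term order differs from the hand-written formulas, which changes only the order of the printed coefficients, not the fitted model.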
summary(reg.30)
##
## Call:
## lm(formula = delay_minutes ~ from + to + hour + Temperature +
## Precipitation + Wind_Speed + lag45min + lag1h + lag1h15min +
## lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min +
## lag2h45min + lag3h + lags3station + lags4station + lags5station +
## lags6station + stop_sequence + line + to_inter + from_inter,
## data = delay.Train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.736 -0.932 -0.131 0.685 145.764
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.789e+01 1.130e+00 15.824 < 2e-16 ***
## fromAllendale -9.186e+00 1.134e+00 -8.102 5.59e-16 ***
## fromAllenhurst 2.471e+00 1.066e+00 2.317 0.020504 *
## fromAnderson Street -9.006e+00 1.586e+00 -5.678 1.38e-08 ***
## fromAsbury Park 1.230e+00 1.123e+00 1.095 0.273664
## fromAvenel 3.452e+00 6.963e-01 4.958 7.17e-07 ***
## fromBasking Ridge -6.690e+00 1.509e+00 -4.433 9.34e-06 ***
## fromBay Head 7.219e+00 1.382e+00 5.224 1.76e-07 ***
## fromBay Street -1.303e+01 1.140e+00 -11.424 < 2e-16 ***
## fromBelmar 3.257e+00 1.249e+00 2.608 0.009118 **
## fromBerkeley Heights -2.081e+00 1.344e+00 -1.549 0.121436
## fromBernardsville -4.365e+00 1.532e+00 -2.850 0.004380 **
## fromBloomfield -1.900e+01 1.075e+00 -17.672 < 2e-16 ***
## fromBoonton -1.220e+00 1.846e+00 -0.661 0.508547
## fromBound Brook 3.400e-01 1.292e+00 0.263 0.792437
## fromBradley Beach 1.253e+00 1.207e+00 1.038 0.299234
## fromBrick Church -3.208e+00 9.604e-01 -3.340 0.000839 ***
## fromBridgewater 3.717e+00 1.310e+00 2.838 0.004538 **
## fromBroadway Fair Lawn -1.098e+01 1.171e+00 -9.373 < 2e-16 ***
## fromCampbell Hall -8.461e+00 1.612e+00 -5.250 1.53e-07 ***
## fromChatham -1.686e+00 1.167e+00 -1.444 0.148726
## fromClifton -7.950e+00 1.195e+00 -6.655 2.88e-11 ***
## fromConvent Station -1.621e+00 1.226e+00 -1.323 0.185967
## fromCranford 1.132e+00 1.051e+00 1.077 0.281348
## fromDelawanna -8.529e+00 1.151e+00 -7.411 1.28e-13 ***
## fromDenville -2.057e-01 1.286e+00 -0.160 0.872945
## fromDover 2.735e-01 1.336e+00 0.205 0.837715
## fromDunellen 1.504e+00 1.269e+00 1.185 0.236021
## fromEast Orange -3.584e+00 9.282e-01 -3.861 0.000113 ***
## fromEdison 3.169e+00 8.849e-01 3.582 0.000342 ***
## fromElberon 3.550e+00 9.342e-01 3.801 0.000145 ***
## fromElizabeth 3.235e+00 8.187e-01 3.952 7.77e-05 ***
## fromEmerson -1.113e+01 1.780e+00 -6.251 4.13e-10 ***
## fromEssex Street -1.264e+01 1.421e+00 -8.899 < 2e-16 ***
## fromFanwood -9.661e-01 1.155e+00 -0.837 0.402735
## fromFar Hills -4.265e+00 1.534e+00 -2.780 0.005432 **
## fromGarfield -9.652e+00 1.157e+00 -8.344 < 2e-16 ***
## fromGarwood 1.311e+00 1.103e+00 1.188 0.234873
## fromGillette -5.447e+00 1.381e+00 -3.945 8.01e-05 ***
## fromGladstone -2.549e+00 1.550e+00 -1.645 0.100012
## fromGlen Ridge -1.542e+01 1.116e+00 -13.818 < 2e-16 ***
## fromGlen Rock Boro Hall -1.027e+01 1.147e+00 -8.957 < 2e-16 ***
## fromGlen Rock Main Line -1.060e+01 1.158e+00 -9.152 < 2e-16 ***
## fromHamilton 2.742e+00 9.580e-01 2.862 0.004210 **
## fromHarriman -7.030e+00 1.480e+00 -4.752 2.02e-06 ***
## fromHawthorne -8.828e+00 1.183e+00 -7.464 8.61e-14 ***
## fromHazlet -5.446e-02 9.227e-01 -0.059 0.952930
## fromHighland Avenue -2.581e+00 1.022e+00 -2.526 0.011531 *
## fromHillsdale -1.033e+01 1.700e+00 -6.076 1.25e-09 ***
## fromHoboken -6.683e+00 9.606e-01 -6.958 3.53e-12 ***
## fromJersey Avenue -3.956e-01 1.875e+00 -0.211 0.832874
## fromKingsland -8.235e+00 1.051e+00 -7.836 4.78e-15 ***
## fromLake Hopatcong -1.739e+01 1.514e+00 -11.485 < 2e-16 ***
## fromLincoln Park -2.997e+00 2.077e+00 -1.443 0.148999
## fromLinden 3.189e+00 7.730e-01 4.126 3.71e-05 ***
## fromLittle Falls -9.882e+00 2.031e+00 -4.867 1.14e-06 ***
## fromLittle Silver 3.734e+00 7.118e-01 5.246 1.56e-07 ***
## fromLong Branch 5.671e+00 8.590e-01 6.602 4.11e-11 ***
## fromLyndhurst -1.067e+01 1.131e+00 -9.437 < 2e-16 ***
## fromLyons -4.731e+00 1.499e+00 -3.156 0.001600 **
## fromMadison -1.322e+00 1.198e+00 -1.103 0.269884
## fromMahwah -8.580e+00 1.157e+00 -7.419 1.21e-13 ***
## fromManasquan 3.372e+00 1.332e+00 2.532 0.011345 *
## fromMaplewood -2.858e+00 1.075e+00 -2.658 0.007861 **
## fromMetropark 3.450e+00 8.074e-01 4.273 1.93e-05 ***
## fromMetuchen 1.625e+00 8.729e-01 1.862 0.062653 .
## fromMiddletown NJ -1.092e-01 5.141e-01 -0.212 0.831752
## fromMiddletown NY -1.133e+01 1.677e+00 -6.757 1.44e-11 ***
## fromMillburn -2.213e+00 1.098e+00 -2.016 0.043828 *
## fromMillington -7.880e+00 1.460e+00 -5.396 6.85e-08 ***
## fromMontclair Heights -1.201e+01 1.874e+00 -6.409 1.48e-10 ***
## fromMontclair State U -6.237e+00 1.866e+00 -3.343 0.000829 ***
## fromMontvale -1.160e+01 1.657e+00 -7.001 2.59e-12 ***
## fromMorris Plains -5.761e-01 1.260e+00 -0.457 0.647547
## fromMorristown -1.890e+00 1.246e+00 -1.517 0.129260
## fromMount Tabor -1.627e-01 1.308e+00 -0.124 0.901045
## fromMountain Avenue -8.276e+00 1.762e+00 -4.696 2.67e-06 ***
## fromMountain Lakes -1.486e+01 1.658e+00 -8.963 < 2e-16 ***
## fromMountain Station -2.866e+00 1.032e+00 -2.776 0.005503 **
## fromMountain View -1.163e+01 2.124e+00 -5.478 4.34e-08 ***
## fromMurray Hill -3.477e+00 1.268e+00 -2.743 0.006096 **
## fromNanuet -8.888e+00 1.650e+00 -5.387 7.21e-08 ***
## fromNetherwood 2.127e-01 1.203e+00 0.177 0.859585
## fromNew Bridge Landing -9.868e+00 1.660e+00 -5.946 2.77e-09 ***
## fromNew Brunswick 1.566e+00 9.250e-01 1.693 0.090424 .
## fromNew Providence -1.238e+00 1.213e+00 -1.020 0.307538
## fromNew York Penn Station 6.182e+00 8.790e-01 7.033 2.06e-12 ***
## fromNewark Airport 1.636e+00 8.160e-01 2.005 0.044972 *
## fromNewark Broad Street -1.386e+00 8.986e-01 -1.542 0.123088
## fromNewark Penn Station 4.812e+00 8.339e-01 5.770 7.99e-09 ***
## fromNorth Branch 4.665e+00 4.488e+00 1.039 0.298651
## fromNorth Elizabeth 2.980e+00 8.238e-01 3.617 0.000298 ***
## fromOradell -1.120e+01 1.754e+00 -6.384 1.75e-10 ***
## fromOrange -3.426e+00 9.811e-01 -3.492 0.000480 ***
## fromOtisville -7.662e+00 1.696e+00 -4.519 6.25e-06 ***
## fromPark Ridge -1.062e+01 1.628e+00 -6.526 6.87e-11 ***
## fromPassaic -1.041e+01 1.197e+00 -8.693 < 2e-16 ***
## fromPaterson -1.050e+01 1.207e+00 -8.701 < 2e-16 ***
## fromPeapack -2.773e+00 1.543e+00 -1.797 0.072340 .
## fromPearl River -1.043e+01 1.496e+00 -6.972 3.18e-12 ***
## fromPerth Amboy 2.276e+00 5.110e-01 4.454 8.48e-06 ***
## fromPlainfield -1.151e+00 1.240e+00 -0.928 0.353403
## fromPlauderville -1.115e+01 1.190e+00 -9.371 < 2e-16 ***
## fromPoint Pleasant Beach 4.199e+00 1.356e+00 3.097 0.001957 **
## fromPort Jervis -9.199e+00 1.748e+00 -5.263 1.42e-07 ***
## fromPrinceton -1.637e+01 1.192e+00 -13.736 < 2e-16 ***
## fromPrinceton Junction 3.384e+00 9.391e-01 3.603 0.000315 ***
## fromRadburn Fair Lawn -1.124e+01 1.191e+00 -9.442 < 2e-16 ***
## fromRahway 2.877e+00 8.120e-01 3.543 0.000397 ***
## fromRamsey Main St -9.599e+00 1.152e+00 -8.332 < 2e-16 ***
## fromRamsey Route 17 -7.823e+00 1.143e+00 -6.845 7.77e-12 ***
## fromRaritan 5.664e+00 1.333e+00 4.250 2.14e-05 ***
## fromRed Bank 3.300e+00 8.900e-01 3.708 0.000209 ***
## fromRidgewood -9.108e+00 1.111e+00 -8.202 2.45e-16 ***
## fromRiver Edge -1.043e+01 1.744e+00 -5.979 2.27e-09 ***
## fromRoselle Park 3.699e+00 9.727e-01 3.802 0.000144 ***
## fromRutherford -7.933e+00 1.068e+00 -7.425 1.15e-13 ***
## fromSalisbury Mills-Cornwall -1.115e+01 1.569e+00 -7.106 1.22e-12 ***
## fromSecaucus Lower Lvl -8.608e+00 1.005e+00 -8.569 < 2e-16 ***
## fromSecaucus Upper Lvl 1.894e+00 8.491e-01 2.230 0.025738 *
## fromShort Hills -2.432e+00 1.117e+00 -2.177 0.029453 *
## fromSloatsburg -8.277e+00 1.277e+00 -6.479 9.37e-11 ***
## fromSomerville 3.315e+00 1.320e+00 2.512 0.012023 *
## fromSouth Amboy 1.574e+00 9.426e-01 1.670 0.094935 .
## fromSouth Orange -2.257e+00 1.044e+00 -2.161 0.030669 *
## fromSpring Lake 2.383e+00 1.300e+00 1.833 0.066868 .
## fromSpring Valley -7.683e+00 1.648e+00 -4.663 3.13e-06 ***
## fromStirling -3.980e+00 1.438e+00 -2.768 0.005643 **
## fromSuffern -8.707e+00 1.158e+00 -7.517 5.74e-14 ***
## fromSummit -1.305e+00 1.116e+00 -1.169 0.242375
## fromTeterboro -1.363e+01 1.705e+00 -7.995 1.34e-15 ***
## fromTowaco -1.427e+01 2.012e+00 -7.090 1.36e-12 ***
## fromTrenton 4.528e+00 9.832e-01 4.606 4.13e-06 ***
## fromTuxedo -1.083e+01 1.406e+00 -7.708 1.31e-14 ***
## fromUnion 4.104e+00 9.728e-01 4.219 2.46e-05 ***
## fromUpper Montclair -1.397e+01 1.675e+00 -8.338 < 2e-16 ***
## fromWaldwick -9.273e+00 1.103e+00 -8.404 < 2e-16 ***
## fromWalnut Street -1.556e+01 1.360e+00 -11.444 < 2e-16 ***
## fromWatchung Avenue -1.041e+01 1.541e+00 -6.753 1.47e-11 ***
## fromWatsessing Avenue -1.620e+01 1.019e+00 -15.896 < 2e-16 ***
## fromWayne-Route 23 -5.732e+00 2.082e+00 -2.753 0.005904 **
## fromWesmont -1.075e+01 1.132e+00 -9.496 < 2e-16 ***
## fromWestfield 2.902e-01 1.101e+00 0.263 0.792185
## fromWestwood -9.593e+00 1.729e+00 -5.549 2.90e-08 ***
## fromWood Ridge -1.168e+01 1.281e+00 -9.119 < 2e-16 ***
## fromWoodbridge 2.172e+00 9.424e-01 2.305 0.021189 *
## fromWoodcliff Lake -1.013e+01 1.943e+00 -5.214 1.86e-07 ***
## toAllendale -7.679e+00 1.271e+00 -6.042 1.54e-09 ***
## toAllenhurst -8.798e-01 1.039e+00 -0.847 0.397229
## toAnderson Street -9.655e+00 1.652e+00 -5.844 5.13e-09 ***
## toAsbury Park 3.805e-01 1.112e+00 0.342 0.732131
## toBasking Ridge 3.348e+00 1.634e+00 2.049 0.040491 *
## toBay Head -6.054e+00 1.351e+00 -4.482 7.43e-06 ***
## toBay Street -5.242e+00 1.266e+00 -4.140 3.49e-05 ***
## toBelmar -1.530e+00 1.235e+00 -1.239 0.215504
## toBerkeley Heights 1.821e+00 1.469e+00 1.240 0.215070
## toBernardsville 4.180e+00 1.641e+00 2.547 0.010868 *
## toBloomfield -1.562e+00 1.198e+00 -1.305 0.192049
## toBoonton -3.504e+00 1.931e+00 -1.815 0.069601 .
## toBound Brook 1.051e+00 1.422e+00 0.739 0.459723
## toBradley Beach -1.791e+00 1.190e+00 -1.505 0.132455
## toBrick Church 9.493e-01 1.111e+00 0.854 0.393038
## toBridgewater 3.308e+00 1.430e+00 2.313 0.020706 *
## toBroadway Fair Lawn -7.013e+00 1.332e+00 -5.266 1.40e-07 ***
## toCampbell Hall -4.684e+00 1.714e+00 -2.734 0.006263 **
## toChatham -1.677e+00 1.296e+00 -1.294 0.195715
## toClifton -6.842e+00 1.328e+00 -5.150 2.61e-07 ***
## toConvent Station -7.877e-01 1.349e+00 -0.584 0.559420
## toCranford 1.666e+00 1.194e+00 1.396 0.162841
## toDelawanna -6.179e+00 1.293e+00 -4.780 1.76e-06 ***
## toDenville -2.777e+00 1.404e+00 -1.978 0.047947 *
## toDover -3.556e+00 1.420e+00 -2.504 0.012272 *
## toDunellen 3.451e+00 1.393e+00 2.478 0.013232 *
## toEast Orange 3.678e-01 1.088e+00 0.338 0.735214
## toEdison -8.680e-01 1.067e+00 -0.813 0.416168
## toElberon -2.471e+00 9.224e-01 -2.678 0.007399 **
## toElizabeth -1.853e+00 9.680e-01 -1.914 0.055602 .
## toEmerson -1.140e+01 1.851e+00 -6.158 7.47e-10 ***
## toEssex Street -9.403e+00 1.516e+00 -6.201 5.67e-10 ***
## toFanwood 2.600e+00 1.307e+00 1.990 0.046647 *
## toFar Hills 2.604e+00 1.651e+00 1.577 0.114783
## toGarfield -6.605e+00 1.305e+00 -5.062 4.17e-07 ***
## toGarwood 2.043e+00 1.394e+00 1.466 0.142673
## toGillette 1.193e+00 1.519e+00 0.785 0.432287
## toGladstone -6.362e-02 1.658e+00 -0.038 0.969394
## toGlen Ridge -4.215e+00 1.256e+00 -3.357 0.000789 ***
## toGlen Rock Boro Hall -7.004e+00 1.303e+00 -5.375 7.73e-08 ***
## toGlen Rock Main Line -8.341e+00 1.288e+00 -6.477 9.52e-11 ***
## toHamilton -3.267e+00 1.102e+00 -2.965 0.003026 **
## toHarriman -6.498e+00 1.593e+00 -4.080 4.52e-05 ***
## toHawthorne -7.108e+00 1.310e+00 -5.424 5.86e-08 ***
## toHazlet 1.088e+00 9.164e-01 1.188 0.234950
## toHighland Avenue -8.957e-02 1.175e+00 -0.076 0.939215
## toHillsdale -1.172e+01 1.776e+00 -6.598 4.22e-11 ***
## toHoboken -9.724e+00 1.097e+00 -8.866 < 2e-16 ***
## toJersey Avenue -3.536e+00 2.403e+00 -1.471 0.141223
## toKingsland -6.579e+00 1.203e+00 -5.469 4.57e-08 ***
## toLake Hopatcong NA NA NA NA
## toLincoln Park -4.835e+00 2.155e+00 -2.244 0.024846 *
## toLinden -2.275e+00 1.001e+00 -2.272 0.023111 *
## toLittle Falls -1.117e+01 2.085e+00 -5.357 8.53e-08 ***
## toLittle Silver -2.935e+00 7.191e-01 -4.081 4.49e-05 ***
## toLong Branch -5.059e+00 8.069e-01 -6.270 3.65e-10 ***
## toLyndhurst -8.766e+00 1.259e+00 -6.962 3.42e-12 ***
## toLyons 3.365e+00 1.612e+00 2.087 0.036869 *
## toMadison -1.088e+00 1.330e+00 -0.818 0.413370
## toMahwah -8.146e+00 1.282e+00 -6.354 2.12e-10 ***
## toManasquan -1.829e+00 1.312e+00 -1.394 0.163220
## toMaplewood -3.028e-01 1.222e+00 -0.248 0.804347
## toMetropark -1.498e+00 1.016e+00 -1.474 0.140472
## toMetuchen -2.888e+00 1.011e+00 -2.856 0.004294 **
## toMiddletown NJ -3.370e-02 5.160e-01 -0.065 0.947927
## toMiddletown NY -7.615e+00 1.743e+00 -4.370 1.25e-05 ***
## toMillburn -3.102e-01 1.246e+00 -0.249 0.803372
## toMillington 1.623e+00 1.592e+00 1.020 0.307919
## toMontclair Heights -1.135e+01 1.967e+00 -5.771 7.95e-09 ***
## toMontclair State U -1.037e+01 1.979e+00 -5.241 1.61e-07 ***
## toMontvale -1.083e+01 1.687e+00 -6.419 1.39e-10 ***
## toMorris Plains -8.864e-01 1.376e+00 -0.644 0.519350
## toMorristown -1.099e+00 1.365e+00 -0.805 0.420810
## toMount Tabor -1.598e+00 1.412e+00 -1.132 0.257804
## toMountain Avenue -5.129e+00 1.979e+00 -2.591 0.009565 **
## toMountain Lakes -1.701e+01 1.740e+00 -9.780 < 2e-16 ***
## toMountain Station -6.433e-01 1.186e+00 -0.542 0.587589
## toMountain View -1.299e+01 2.186e+00 -5.940 2.89e-09 ***
## toMurray Hill -1.290e+00 1.411e+00 -0.914 0.360720
## toNanuet -1.050e+01 1.665e+00 -6.306 2.90e-10 ***
## toNetherwood 3.987e+00 1.329e+00 3.000 0.002699 **
## toNew Bridge Landing -1.174e+01 1.754e+00 -6.696 2.17e-11 ***
## toNew Brunswick -3.305e+00 1.065e+00 -3.104 0.001908 **
## toNew Providence -4.330e-01 1.347e+00 -0.321 0.747951
## toNew York Penn Station -4.158e+00 1.019e+00 -4.080 4.52e-05 ***
## toNewark Airport -2.298e+00 9.830e-01 -2.338 0.019405 *
## toNewark Broad Street 1.382e+00 1.056e+00 1.309 0.190431
## toNewark Penn Station -8.608e-01 1.001e+00 -0.860 0.389804
## toNorth Elizabeth -1.445e+00 1.021e+00 -1.415 0.157075
## toOradell -1.055e+01 1.858e+00 -5.678 1.38e-08 ***
## toOrange -1.261e-01 1.137e+00 -0.111 0.911732
## toOtisville -6.460e+00 1.790e+00 -3.609 0.000308 ***
## toPark Ridge -1.083e+01 1.764e+00 -6.138 8.45e-10 ***
## toPassaic -8.992e+00 1.321e+00 -6.810 9.95e-12 ***
## toPaterson -9.147e+00 1.331e+00 -6.872 6.46e-12 ***
## toPeapack 1.354e+00 1.648e+00 0.822 0.411208
## toPearl River -9.784e+00 1.704e+00 -5.741 9.50e-09 ***
## toPerth Amboy -8.176e-01 5.117e-01 -1.598 0.110107
## toPlainfield 1.244e+00 1.378e+00 0.903 0.366703
## toPlauderville -7.861e+00 1.287e+00 -6.106 1.03e-09 ***
## toPoint Pleasant Beach -4.197e+00 1.343e+00 -3.125 0.001778 **
## toPort Jervis -7.993e+00 1.798e+00 -4.445 8.83e-06 ***
## toPrinceton -1.041e+00 1.256e+00 -0.829 0.407372
## toPrinceton Junction -1.086e+00 1.097e+00 -0.990 0.322131
## toRadburn Fair Lawn -6.976e+00 1.302e+00 -5.359 8.43e-08 ***
## toRahway -3.108e+00 9.247e-01 -3.361 0.000778 ***
## toRamsey Main St -8.123e+00 1.266e+00 -6.418 1.40e-10 ***
## toRamsey Route 17 -7.414e+00 1.276e+00 -5.812 6.23e-09 ***
## toRaritan -7.627e-01 1.447e+00 -0.527 0.598141
## toRed Bank -3.855e-01 8.666e-01 -0.445 0.656434
## toRidgewood -7.372e+00 1.236e+00 -5.963 2.51e-09 ***
## toRiver Edge -1.080e+01 1.813e+00 -5.955 2.63e-09 ***
## toRoselle Park 2.374e+00 1.128e+00 2.105 0.035323 *
## toRutherford -6.622e+00 1.203e+00 -5.503 3.76e-08 ***
## toSalisbury Mills-Cornwall -8.622e+00 1.648e+00 -5.232 1.69e-07 ***
## toSecaucus Lower Lvl -8.973e+00 1.147e+00 -7.821 5.39e-15 ***
## toSecaucus Upper Lvl -3.978e+00 1.013e+00 -3.927 8.64e-05 ***
## toShort Hills -1.412e+00 1.251e+00 -1.128 0.259207
## toSloatsburg -7.192e+00 1.410e+00 -5.100 3.41e-07 ***
## toSomerville -1.931e-01 1.445e+00 -0.134 0.893682
## toSouth Amboy -2.617e-01 9.445e-01 -0.277 0.781713
## toSouth Orange -3.972e-02 1.190e+00 -0.033 0.973379
## toSpring Lake -2.406e+00 1.287e+00 -1.870 0.061490 .
## toSpring Valley -1.214e+01 1.746e+00 -6.958 3.52e-12 ***
## toStirling 3.938e+00 1.556e+00 2.530 0.011401 *
## toSuffern -8.641e+00 1.273e+00 -6.788 1.16e-11 ***
## toSummit -1.825e+00 1.266e+00 -1.442 0.149402
## toTeterboro -9.329e+00 1.780e+00 -5.240 1.61e-07 ***
## toTowaco -1.577e+01 2.085e+00 -7.562 4.06e-14 ***
## toTrenton -2.833e+00 1.112e+00 -2.548 0.010842 *
## toTuxedo -9.292e+00 1.499e+00 -6.198 5.78e-10 ***
## toUnion 2.082e-01 1.126e+00 0.185 0.853306
## toUpper Montclair -1.048e+01 1.795e+00 -5.837 5.36e-09 ***
## toWaldwick -8.089e+00 1.246e+00 -6.493 8.55e-11 ***
## toWalnut Street -9.983e+00 1.530e+00 -6.525 6.89e-11 ***
## toWatchung Avenue -6.179e+00 1.668e+00 -3.704 0.000212 ***
## toWatsessing Avenue 2.105e+00 1.166e+00 1.805 0.071030 .
## toWayne-Route 23 -7.043e+00 2.169e+00 -3.247 0.001169 **
## toWesmont -8.712e+00 1.270e+00 -6.862 6.91e-12 ***
## toWestfield 3.407e+00 1.232e+00 2.766 0.005680 **
## toWestwood -1.042e+01 1.849e+00 -5.635 1.76e-08 ***
## toWood Ridge -4.828e+00 1.347e+00 -3.585 0.000338 ***
## toWoodbridge -1.568e+00 9.464e-01 -1.657 0.097495 .
## toWoodcliff Lake -1.061e+01 2.022e+00 -5.250 1.53e-07 ***
## hour -7.612e-03 5.549e-03 -1.372 0.170116
## Temperature 3.519e-03 4.435e-03 0.793 0.427518
## Precipitation -8.299e-01 8.247e-01 -1.006 0.314298
## Wind_Speed 1.006e-02 4.363e-03 2.306 0.021119 *
## lag45min 3.735e-03 3.032e-03 1.232 0.217983
## lag1h 9.626e-02 2.944e-03 32.699 < 2e-16 ***
## lag1h15min -9.367e-03 2.945e-03 -3.181 0.001469 **
## lag1h30min -1.348e-03 2.927e-03 -0.461 0.645006
## lag1h45min -1.216e-02 2.922e-03 -4.161 3.18e-05 ***
## lag2h 4.375e-04 2.954e-03 0.148 0.882263
## lag2h15min 1.358e-02 2.902e-03 4.678 2.91e-06 ***
## lag2h30min 3.061e-02 2.853e-03 10.729 < 2e-16 ***
## lag2h45min -1.150e-02 2.927e-03 -3.927 8.60e-05 ***
## lag3h -5.684e-03 2.864e-03 -1.985 0.047187 *
## lags3station 9.353e-01 9.187e-03 101.808 < 2e-16 ***
## lags4station 7.218e-02 1.244e-02 5.804 6.54e-09 ***
## lags5station -1.354e-02 1.254e-02 -1.080 0.280048
## lags6station 8.888e-03 1.048e-02 0.848 0.396465
## stop_sequence -6.320e-02 1.071e-02 -5.903 3.61e-09 ***
## lineGladstone Branch -1.516e+01 6.170e-01 -24.576 < 2e-16 ***
## lineMain Line -4.938e-01 1.707e-01 -2.893 0.003817 **
## lineMontclair-Boonton 1.781e-01 4.512e-01 0.395 0.693104
## lineMorristown Line -1.491e+01 5.399e-01 -27.623 < 2e-16 ***
## lineNo Jersey Coast -1.838e+01 5.771e-01 -31.851 < 2e-16 ***
## lineNortheast Corrdr -1.824e+01 5.757e-01 -31.675 < 2e-16 ***
## linePascack Valley 3.344e+00 4.370e-01 7.653 2.02e-14 ***
## linePrinceton Shuttle NA NA NA NA
## lineRaritan Valley -2.127e+01 7.820e-01 -27.194 < 2e-16 ***
## to_inter NA NA NA NA
## from_inter NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.287 on 32560 degrees of freedom
## Multiple R-squared: 0.7663, Adjusted R-squared: 0.764
## F-statistic: 337.8 on 316 and 32560 DF, p-value: < 2.2e-16
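The summary above reports four coefficients "not defined because of singularities": `toLake Hopatcong`, `linePrinceton Shuttle`, `to_inter`, and `from_inter`. A likely cause for the two intersection variables is that they are fixed attributes of each station, so once the `from` and `to` dummies are in the model they carry no independent information. The toy example below uses hypothetical stations and simulated delays to reproduce that situation: `lm()` returns `NA` for the redundant term, and `alias()` from the stats package expresses it in terms of the other columns. Running `alias(reg.30)$Complete` on the real model would show the actual dependencies.

```r
set.seed(42)
# Hypothetical data: 3 destination stations, 20 observations each
toy <- data.frame(to = factor(rep(c("A", "B", "C"), each = 20)))
# Hypothetical station-level attribute (e.g. number of intersecting lines), constant per station
toy$to_inter <- unname(c(A = 2, B = 1, C = 3)[as.character(toy$to)])
toy$delay_minutes <- rnorm(60, mean = 5)

fit <- lm(delay_minutes ~ to + to_inter, data = toy)
coef(fit)            # to_inter is NA: it is a linear combination of the intercept and station dummies
alias(fit)$Complete  # expresses to_inter in terms of (Intercept), toB, and toC
```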
##
## Call:
## lm(formula = delay_minutes ~ from + to + hour + Temperature +
## Precipitation + Wind_Speed + lag1h + lag1h15min + lag1h30min +
## lag1h45min + lag2h + lag2h15min + lag2h30min + lag2h45min +
## lag3h + lags5station + lags6station + stop_sequence + line +
## to_inter + from_inter, data = delay.Train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.699 -1.153 -0.220 0.758 144.660
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.079905 1.446424 13.882 < 2e-16 ***
## fromAllendale -6.896173 1.450683 -4.754 2.01e-06 ***
## fromAllenhurst 4.773301 1.364666 3.498 0.000470 ***
## fromAnderson Street -10.999039 2.029027 -5.421 5.97e-08 ***
## fromAsbury Park 2.296422 1.437293 1.598 0.110110
## fromAvenel 4.169821 0.891263 4.679 2.90e-06 ***
## fromBasking Ridge -7.623134 1.931144 -3.947 7.91e-05 ***
## fromBay Head 7.699917 1.768764 4.353 1.35e-05 ***
## fromBay Street -30.193569 1.450226 -20.820 < 2e-16 ***
## fromBelmar 3.008033 1.598782 1.881 0.059919 .
## fromBerkeley Heights -3.103539 1.719832 -1.805 0.071153 .
## fromBernardsville -4.881400 1.960426 -2.490 0.012780 *
## fromBloomfield -16.770626 1.375944 -12.188 < 2e-16 ***
## fromBoonton -5.785834 2.355741 -2.456 0.014052 *
## fromBound Brook 1.831788 1.653890 1.108 0.268059
## fromBradley Beach 2.767092 1.544715 1.791 0.073250 .
## fromBrick Church -1.591326 1.229045 -1.295 0.195410
## fromBridgewater 5.054644 1.676100 3.016 0.002566 **
## fromBroadway Fair Lawn -11.286927 1.498947 -7.530 5.21e-14 ***
## fromCampbell Hall -7.234870 2.062762 -3.507 0.000453 ***
## fromChatham -2.727113 1.493438 -1.826 0.067850 .
## fromClifton -8.107242 1.528867 -5.303 1.15e-07 ***
## fromConvent Station -1.383942 1.567904 -0.883 0.377421
## fromCranford 2.189223 1.345461 1.627 0.103722
## fromDelawanna -8.192027 1.472678 -5.563 2.68e-08 ***
## fromDenville -0.476940 1.643945 -0.290 0.771727
## fromDover -0.399091 1.706852 -0.234 0.815129
## fromDunellen 0.573791 1.624457 0.353 0.723926
## fromEast Orange -2.025540 1.187595 -1.706 0.088096 .
## fromEdison 3.285224 1.132327 2.901 0.003719 **
## fromElberon 3.502837 1.195588 2.930 0.003394 **
## fromElizabeth 3.894394 1.047910 3.716 0.000202 ***
## fromEmerson -11.751674 2.277397 -5.160 2.48e-07 ***
## fromEssex Street -13.592889 1.818849 -7.473 8.01e-14 ***
## fromFanwood -2.046293 1.477767 -1.385 0.166148
## fromFar Hills -3.660431 1.962876 -1.865 0.062214 .
## fromGarfield -10.014230 1.480288 -6.765 1.36e-11 ***
## fromGarwood 2.543852 1.412254 1.801 0.071669 .
## fromGillette -7.005194 1.766929 -3.965 7.37e-05 ***
## fromGladstone -4.280606 1.982709 -2.159 0.030860 *
## fromGlen Ridge -29.016577 1.422521 -20.398 < 2e-16 ***
## fromGlen Rock Boro Hall -9.570563 1.467421 -6.522 7.04e-11 ***
## fromGlen Rock Main Line -9.651659 1.481540 -6.515 7.39e-11 ***
## fromHamilton 3.948007 1.226009 3.220 0.001282 **
## fromHarriman -6.791314 1.893778 -3.586 0.000336 ***
## fromHawthorne -9.016992 1.513686 -5.957 2.60e-09 ***
## fromHazlet 1.244995 1.180907 1.054 0.291767
## fromHighland Avenue -3.331258 1.307309 -2.548 0.010833 *
## fromHillsdale -12.896299 2.175019 -5.929 3.07e-09 ***
## fromHoboken -6.740427 1.228947 -5.485 4.17e-08 ***
## fromJersey Avenue 1.108862 2.399614 0.462 0.644013
## fromKingsland -7.508281 1.344471 -5.585 2.36e-08 ***
## fromLake Hopatcong -20.326584 1.937361 -10.492 < 2e-16 ***
## fromLincoln Park -10.437196 2.645010 -3.946 7.96e-05 ***
## fromLinden 4.207003 0.989296 4.253 2.12e-05 ***
## fromLittle Falls -17.204743 2.575486 -6.680 2.42e-11 ***
## fromLittle Silver 3.374236 0.910765 3.705 0.000212 ***
## fromLong Branch 5.395900 1.099247 4.909 9.21e-07 ***
## fromLyndhurst -8.049364 1.447328 -5.562 2.70e-08 ***
## fromLyons -5.999410 1.918098 -3.128 0.001763 **
## fromMadison -2.115844 1.532380 -1.381 0.167363
## fromMahwah -7.553852 1.479970 -5.104 3.34e-07 ***
## fromManasquan 3.917177 1.704319 2.298 0.021546 *
## fromMaplewood -3.273603 1.375614 -2.380 0.017331 *
## fromMetropark 3.987350 1.033388 3.859 0.000114 ***
## fromMetuchen 2.645507 1.117108 2.368 0.017882 *
## fromMiddletown NJ 1.209214 0.657617 1.839 0.065956 .
## fromMiddletown NY -9.874067 2.146694 -4.600 4.25e-06 ***
## fromMillburn -2.696032 1.404855 -1.919 0.054983 .
## fromMillington -8.733550 1.867855 -4.676 2.94e-06 ***
## fromMontclair Heights -20.870639 2.391489 -8.727 < 2e-16 ***
## fromMontclair State U -17.807459 2.349290 -7.580 3.55e-14 ***
## fromMontvale -10.111155 2.119774 -4.770 1.85e-06 ***
## fromMorris Plains -0.132204 1.610895 -0.082 0.934592
## fromMorristown -0.524405 1.592855 -0.329 0.741989
## fromMount Tabor -0.168233 1.672098 -0.101 0.919859
## fromMountain Avenue -20.845337 2.240156 -9.305 < 2e-16 ***
## fromMountain Lakes -16.751614 2.118018 -7.909 2.67e-15 ***
## fromMountain Station -3.824236 1.321203 -2.895 0.003800 **
## fromMountain View -17.596273 2.705090 -6.505 7.89e-11 ***
## fromMurray Hill -4.591539 1.622285 -2.830 0.004653 **
## fromNanuet -8.554847 2.111041 -4.052 5.08e-05 ***
## fromNetherwood -1.914095 1.539222 -1.244 0.213675
## fromNew Bridge Landing -12.868358 2.124099 -6.058 1.39e-09 ***
## fromNew Brunswick 3.667159 1.183762 3.098 0.001951 **
## fromNew Providence -2.979221 1.552052 -1.920 0.054925 .
## fromNew York Penn Station 6.505822 1.124914 5.783 7.39e-09 ***
## fromNewark Airport 3.706281 1.043935 3.550 0.000385 ***
## fromNewark Broad Street -0.090087 1.150037 -0.078 0.937563
## fromNewark Penn Station 6.113937 1.067132 5.729 1.02e-08 ***
## fromNorth Branch 6.864123 5.745017 1.195 0.232175
## fromNorth Elizabeth 4.829219 1.054193 4.581 4.65e-06 ***
## fromOradell -13.347502 2.245245 -5.945 2.80e-09 ***
## fromOrange -3.104598 1.255510 -2.473 0.013412 *
## fromOtisville -7.152905 2.170540 -3.295 0.000984 ***
## fromPark Ridge -11.486444 2.083457 -5.513 3.55e-08 ***
## fromPassaic -9.943792 1.532037 -6.491 8.67e-11 ***
## fromPaterson -9.618259 1.544014 -6.229 4.74e-10 ***
## fromPeapack -3.424178 1.974905 -1.734 0.082955 .
## fromPearl River -10.401808 1.914520 -5.433 5.58e-08 ***
## fromPerth Amboy 2.067244 0.654069 3.161 0.001576 **
## fromPlainfield -1.855553 1.586896 -1.169 0.242293
## fromPlauderville -11.826679 1.523010 -7.765 8.38e-15 ***
## fromPoint Pleasant Beach 6.141577 1.735229 3.539 0.000402 ***
## fromPort Jervis -11.127050 2.236801 -4.975 6.57e-07 ***
## fromPrinceton -15.680491 1.525175 -10.281 < 2e-16 ***
## fromPrinceton Junction 4.314170 1.201707 3.590 0.000331 ***
## fromRadburn Fair Lawn -10.855328 1.523815 -7.124 1.07e-12 ***
## fromRahway 3.488274 1.039204 3.357 0.000790 ***
## fromRamsey Main St -7.387665 1.473935 -5.012 5.41e-07 ***
## fromRamsey Route 17 -6.233879 1.462533 -4.262 2.03e-05 ***
## fromRaritan 7.223383 1.705715 4.235 2.29e-05 ***
## fromRed Bank 3.705934 1.139193 3.253 0.001143 **
## fromRidgewood -8.298855 1.421112 -5.840 5.28e-09 ***
## fromRiver Edge -12.552422 2.231604 -5.625 1.87e-08 ***
## fromRoselle Park 4.910148 1.244923 3.944 8.03e-05 ***
## fromRutherford -6.982636 1.366832 -5.109 3.26e-07 ***
## fromSalisbury Mills-Cornwall -9.768869 2.008035 -4.865 1.15e-06 ***
## fromSecaucus Lower Lvl -7.649638 1.285349 -5.951 2.69e-09 ***
## fromSecaucus Upper Lvl 2.954722 1.086518 2.719 0.006543 **
## fromShort Hills -2.912013 1.428947 -2.038 0.041571 *
## fromSloatsburg -6.206151 1.634902 -3.796 0.000147 ***
## fromSomerville 5.555869 1.689517 3.288 0.001009 **
## fromSouth Amboy 1.539943 1.206303 1.277 0.201759
## fromSouth Orange -2.633971 1.336141 -1.971 0.048695 *
## fromSpring Lake 3.719977 1.664053 2.235 0.025392 *
## fromSpring Valley -8.766803 2.107742 -4.159 3.20e-05 ***
## fromStirling -6.559082 1.839954 -3.565 0.000365 ***
## fromSuffern -7.555347 1.482131 -5.098 3.46e-07 ***
## fromSummit -2.615750 1.427677 -1.832 0.066935 .
## fromTeterboro 1.441130 2.176585 0.662 0.507908
## fromTowaco -15.425163 2.568770 -6.005 1.93e-09 ***
## fromTrenton 5.416350 1.258153 4.305 1.67e-05 ***
## fromTuxedo -10.352981 1.798830 -5.755 8.72e-09 ***
## fromUnion 5.519917 1.245008 4.434 9.30e-06 ***
## fromUpper Montclair -24.280662 2.140610 -11.343 < 2e-16 ***
## fromWaldwick -8.638214 1.411978 -6.118 9.60e-10 ***
## fromWalnut Street -28.833622 1.736164 -16.608 < 2e-16 ***
## fromWatchung Avenue -25.837970 1.966995 -13.136 < 2e-16 ***
## fromWatsessing Avenue -14.081476 1.302747 -10.809 < 2e-16 ***
## fromWayne-Route 23 -14.012212 2.643160 -5.301 1.16e-07 ***
## fromWesmont -9.514165 1.448716 -6.567 5.20e-11 ***
## fromWestfield 1.022368 1.409723 0.725 0.468319
## fromWestwood -13.335145 2.212501 -6.027 1.69e-09 ***
## fromWood Ridge -9.018547 1.636998 -5.509 3.63e-08 ***
## fromWoodbridge 2.826316 1.205979 2.344 0.019105 *
## fromWoodcliff Lake -11.826546 2.486686 -4.756 1.98e-06 ***
## toAllendale -10.581308 1.626529 -6.505 7.86e-11 ***
## toAllenhurst -2.480780 1.330133 -1.865 0.062182 .
## toAnderson Street -8.850480 2.114376 -4.186 2.85e-05 ***
## toAsbury Park -3.639023 1.422386 -2.558 0.010520 *
## toBasking Ridge 4.074801 2.091359 1.948 0.051376 .
## toBay Head -8.371792 1.728912 -4.842 1.29e-06 ***
## toBay Street 9.341976 1.615328 5.783 7.39e-09 ***
## toBelmar -1.834259 1.580819 -1.160 0.245926
## toBerkeley Heights 1.105016 1.880142 0.588 0.556718
## toBernardsville 2.097746 2.099971 0.999 0.317831
## toBloomfield 9.572179 1.527212 6.268 3.71e-10 ***
## toBoonton -4.925743 2.467946 -1.996 0.045955 *
## toBound Brook -0.062718 1.819766 -0.034 0.972507
## toBradley Beach -1.811444 1.523615 -1.189 0.234483
## toBrick Church -0.518873 1.422426 -0.365 0.715278
## toBridgewater -0.559100 1.829716 -0.306 0.759937
## toBroadway Fair Lawn -7.337422 1.704678 -4.304 1.68e-05 ***
## toCampbell Hall -8.033811 2.193129 -3.663 0.000250 ***
## toChatham -2.002070 1.658418 -1.207 0.227357
## toClifton -9.327007 1.700113 -5.486 4.14e-08 ***
## toConvent Station -1.941062 1.726214 -1.124 0.260825
## toCranford -0.402599 1.528262 -0.263 0.792217
## toDelawanna -9.154733 1.654143 -5.534 3.15e-08 ***
## toDenville -3.751207 1.795340 -2.089 0.036678 *
## toDover -4.592331 1.815489 -2.530 0.011426 *
## toDunellen 2.784549 1.782699 1.562 0.118301
## toEast Orange -1.977360 1.391877 -1.421 0.155430
## toEdison -3.906249 1.366052 -2.860 0.004246 **
## toElberon -4.162027 1.180571 -3.525 0.000423 ***
## toElizabeth -3.900277 1.238844 -3.148 0.001644 **
## toEmerson -10.517688 2.369464 -4.439 9.07e-06 ***
## toEssex Street -10.271518 1.940209 -5.294 1.20e-07 ***
## toFanwood 2.765408 1.672981 1.653 0.098344 .
## toFar Hills 1.448862 2.112992 0.686 0.492912
## toGarfield -7.572088 1.670152 -4.534 5.81e-06 ***
## toGarwood 0.753482 1.784263 0.422 0.672814
## toGillette 1.729968 1.943836 0.890 0.373485
## toGladstone -0.268335 2.121779 -0.126 0.899363
## toGlen Ridge 11.530875 1.600664 7.204 5.98e-13 ***
## toGlen Rock Boro Hall -8.907628 1.668016 -5.340 9.34e-08 ***
## toGlen Rock Main Line -10.573781 1.648301 -6.415 1.43e-10 ***
## toHamilton -6.245172 1.410002 -4.429 9.49e-06 ***
## toHarriman -7.351996 2.038538 -3.607 0.000311 ***
## toHawthorne -9.432766 1.676958 -5.625 1.87e-08 ***
## toHazlet 0.642588 1.172888 0.548 0.583786
## toHighland Avenue -0.602327 1.503359 -0.401 0.688677
## toHillsdale -11.250757 2.272818 -4.950 7.45e-07 ***
## toHoboken -12.063801 1.403489 -8.596 < 2e-16 ***
## toJersey Avenue -7.468490 3.076145 -2.428 0.015193 *
## toKingsland -9.790070 1.539429 -6.360 2.05e-10 ***
## toLake Hopatcong NA NA NA NA
## toLincoln Park -3.038656 2.749117 -1.105 0.269029
## toLinden -3.689004 1.281617 -2.878 0.004000 **
## toLittle Falls -3.157157 2.642020 -1.195 0.232104
## toLittle Silver -4.517523 0.920339 -4.909 9.22e-07 ***
## toLong Branch -6.683480 1.032645 -6.472 9.80e-11 ***
## toLyndhurst -10.317096 1.611618 -6.402 1.56e-10 ***
## toLyons 5.416379 2.063160 2.625 0.008662 **
## toMadison -1.818336 1.701686 -1.069 0.285281
## toMahwah -11.470339 1.640566 -6.992 2.77e-12 ***
## toManasquan -5.524002 1.678775 -3.290 0.001001 **
## toMaplewood -1.675384 1.564274 -1.071 0.284164
## toMetropark -3.373759 1.300813 -2.594 0.009503 **
## toMetuchen -4.953041 1.294099 -3.827 0.000130 ***
## toMiddletown NJ -1.556433 0.660160 -2.358 0.018396 *
## toMiddletown NY -10.222179 2.230327 -4.583 4.59e-06 ***
## toMillburn -1.275220 1.594397 -0.800 0.423825
## toMillington 2.692561 2.037191 1.322 0.186276
## toMontclair Heights -1.040546 2.478581 -0.420 0.674623
## toMontclair State U -1.963593 2.502645 -0.785 0.432690
## toMontvale -11.582414 2.158729 -5.365 8.13e-08 ***
## toMorris Plains -2.336347 1.759039 -1.328 0.184123
## toMorristown -2.286950 1.745772 -1.310 0.190207
## toMount Tabor -3.022075 1.805570 -1.674 0.094189 .
## toMountain Avenue 4.736240 2.527400 1.874 0.060945 .
## toMountain Lakes -16.153602 2.223172 -7.266 3.79e-13 ***
## toMountain Station -0.928186 1.518092 -0.611 0.540928
## toMountain View -6.200356 2.783581 -2.227 0.025922 *
## toMurray Hill -1.390675 1.806268 -0.770 0.441355
## toNanuet -12.680834 2.129728 -5.954 2.64e-09 ***
## toNetherwood 4.479554 1.701067 2.633 0.008458 **
## toNew Bridge Landing -11.538090 2.244457 -5.141 2.75e-07 ***
## toNew Brunswick -4.177951 1.362492 -3.066 0.002168 **
## toNew Providence -1.005703 1.724296 -0.583 0.559726
## toNew York Penn Station -6.260378 1.304327 -4.800 1.60e-06 ***
## toNewark Airport -4.997722 1.257851 -3.973 7.11e-05 ***
## toNewark Broad Street -1.192935 1.350970 -0.883 0.377232
## toNewark Penn Station -3.885892 1.280809 -3.034 0.002416 **
## toNorth Elizabeth -3.816077 1.307125 -2.919 0.003509 **
## toOradell -11.536692 2.377312 -4.853 1.22e-06 ***
## toOrange -0.702382 1.455645 -0.483 0.629438
## toOtisville -6.833226 2.291287 -2.982 0.002863 **
## toPark Ridge -11.155734 2.257277 -4.942 7.77e-07 ***
## toPassaic -10.485351 1.690155 -6.204 5.58e-10 ***
## toPaterson -10.988045 1.703732 -6.449 1.14e-10 ***
## toPeapack 1.151976 2.108523 0.546 0.584834
## toPearl River -12.951791 2.180557 -5.940 2.88e-09 ***
## toPerth Amboy -2.146716 0.654890 -3.278 0.001047 **
## toPlainfield 2.364518 1.764049 1.340 0.180127
## toPlauderville -8.158539 1.647809 -4.951 7.41e-07 ***
## toPoint Pleasant Beach -6.589060 1.718963 -3.833 0.000127 ***
## toPort Jervis -8.574670 2.301783 -3.725 0.000195 ***
## toPrinceton -4.140648 1.607698 -2.576 0.010014 *
## toPrinceton Junction -4.121210 1.403727 -2.936 0.003328 **
## toRadburn Fair Lawn -8.355362 1.666024 -5.015 5.33e-07 ***
## toRahway -5.329497 1.183299 -4.504 6.69e-06 ***
## toRamsey Main St -11.942395 1.619586 -7.374 1.70e-13 ***
## toRamsey Route 17 -10.561864 1.632487 -6.470 9.95e-11 ***
## toRaritan -4.069492 1.851930 -2.197 0.027997 *
## toRed Bank -1.514164 1.109226 -1.365 0.172242
## toRidgewood -9.838538 1.582133 -6.219 5.08e-10 ***
## toRiver Edge -10.631477 2.320367 -4.582 4.63e-06 ***
## toRoselle Park -0.621306 1.443813 -0.430 0.666964
## toRutherford -9.683772 1.539984 -6.288 3.25e-10 ***
## toSalisbury Mills-Cornwall -9.408679 2.109222 -4.461 8.20e-06 ***
## toSecaucus Lower Lvl -11.294525 1.468154 -7.693 1.48e-14 ***
## toSecaucus Upper Lvl -5.993172 1.296656 -4.622 3.81e-06 ***
## toShort Hills -1.643362 1.601558 -1.026 0.304851
## toSloatsburg -9.002396 1.804812 -4.988 6.13e-07 ***
## toSomerville -3.519027 1.849180 -1.903 0.057047 .
## toSouth Amboy -0.630942 1.208943 -0.522 0.601747
## toSouth Orange -0.762510 1.523292 -0.501 0.616679
## toSpring Lake -3.709998 1.646859 -2.253 0.024280 *
## toSpring Valley -14.190736 2.233403 -6.354 2.13e-10 ***
## toStirling 4.297220 1.991521 2.158 0.030954 *
## toSuffern -11.910252 1.628935 -7.312 2.70e-13 ***
## toSummit -2.204809 1.619785 -1.361 0.173468
## toTeterboro 1.548814 2.274208 0.681 0.495855
## toTowaco -11.731443 2.660228 -4.410 1.04e-05 ***
## toTrenton -5.695212 1.422760 -4.003 6.27e-05 ***
## toTuxedo -11.385632 1.918832 -5.934 2.99e-09 ***
## toUnion -2.885587 1.441157 -2.002 0.045264 *
## toUpper Montclair 2.874191 2.281044 1.260 0.207666
## toWaldwick -10.761762 1.594459 -6.749 1.51e-11 ***
## toWalnut Street 7.996041 1.950540 4.099 4.15e-05 ***
## toWatchung Avenue 6.091052 2.124717 2.867 0.004150 **
## toWatsessing Avenue -2.207994 1.491813 -1.480 0.138863
## toWayne-Route 23 -1.342247 2.760401 -0.486 0.626793
## toWesmont -10.193372 1.624962 -6.273 3.59e-10 ***
## toWestfield 2.810558 1.576657 1.783 0.074660 .
## toWestwood -11.093781 2.365553 -4.690 2.75e-06 ***
## toWood Ridge -9.268548 1.722909 -5.380 7.52e-08 ***
## toWoodbridge -3.157314 1.211278 -2.607 0.009149 **
## toWoodcliff Lake -11.632009 2.587848 -4.495 6.99e-06 ***
## hour -0.011182 0.007102 -1.574 0.115400
## Temperature 0.008258 0.005676 1.455 0.145696
## Precipitation -1.546268 1.055616 -1.465 0.142985
## Wind_Speed 0.023642 0.005582 4.236 2.28e-05 ***
## lag1h 0.078731 0.003749 20.999 < 2e-16 ***
## lag1h15min -0.028437 0.003763 -7.557 4.23e-14 ***
## lag1h30min -0.009220 0.003709 -2.486 0.012937 *
## lag1h45min -0.019057 0.003737 -5.100 3.42e-07 ***
## lag2h -0.015703 0.003777 -4.157 3.23e-05 ***
## lag2h15min 0.008769 0.003709 2.364 0.018075 *
## lag2h30min 0.023949 0.003651 6.560 5.46e-11 ***
## lag2h45min -0.020982 0.003744 -5.605 2.10e-08 ***
## lag3h -0.004864 0.003666 -1.327 0.184581
## lags5station 0.934121 0.011843 78.875 < 2e-16 ***
## lags6station 0.061852 0.013188 4.690 2.74e-06 ***
## stop_sequence -0.128800 0.013693 -9.407 < 2e-16 ***
## lineGladstone Branch -15.614693 0.789714 -19.773 < 2e-16 ***
## lineMain Line -0.391479 0.218457 -1.792 0.073140 .
## lineMontclair-Boonton 0.780355 0.577468 1.351 0.176596
## lineMorristown Line -15.414479 0.691030 -22.307 < 2e-16 ***
## lineNo Jersey Coast -18.795577 0.738671 -25.445 < 2e-16 ***
## lineNortheast Corrdr -18.525681 0.736909 -25.140 < 2e-16 ***
## linePascack Valley 4.383799 0.559269 7.838 4.70e-15 ***
## linePrinceton Shuttle NA NA NA NA
## lineRaritan Valley -21.770549 1.000943 -21.750 < 2e-16 ***
## to_inter NA NA NA NA
## from_inter NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.488 on 32563 degrees of freedom
## Multiple R-squared: 0.617, Adjusted R-squared: 0.6133
## F-statistic: 167.6 on 313 and 32563 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = delay_minutes ~ from + to + hour + Temperature +
## Precipitation + Wind_Speed + lag1h30min + lag1h45min + lag2h +
## lag2h15min + lag2h30min + lag2h45min + lag3h + lags6station +
## stop_sequence + line + to_inter + from_inter, data = delay.Train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.547 -1.255 -0.284 0.739 144.575
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.652377 1.586777 12.385 < 2e-16 ***
## fromAllendale -7.611778 1.591224 -4.784 1.73e-06 ***
## fromAllenhurst 2.972562 1.497245 1.985 0.047113 *
## fromAnderson Street -10.786843 2.226074 -4.846 1.27e-06 ***
## fromAsbury Park 3.035667 1.577093 1.925 0.054257 .
## fromAvenel 2.856701 0.977780 2.922 0.003485 **
## fromBasking Ridge -7.720120 2.118507 -3.644 0.000269 ***
## fromBay Head 7.511630 1.940865 3.870 0.000109 ***
## fromBay Street -37.610846 1.584325 -23.739 < 2e-16 ***
## fromBelmar 4.891505 1.754133 2.789 0.005297 **
## fromBerkeley Heights -7.201310 1.885893 -3.819 0.000135 ***
## fromBernardsville -8.506844 2.150309 -3.956 7.63e-05 ***
## fromBloomfield -21.013402 1.508173 -13.933 < 2e-16 ***
## fromBoonton -9.610425 2.581818 -3.722 0.000198 ***
## fromBound Brook 0.035839 1.814632 0.020 0.984243
## fromBradley Beach 2.370224 1.695012 1.398 0.162017
## fromBrick Church -3.102597 1.347903 -2.302 0.021353 *
## fromBridgewater 4.735968 1.839143 2.575 0.010026 *
## fromBroadway Fair Lawn -11.668243 1.644266 -7.096 1.31e-12 ***
## fromCampbell Hall -7.017348 2.263221 -3.101 0.001933 **
## fromChatham -4.969569 1.637674 -3.035 0.002411 **
## fromClifton -8.991763 1.677128 -5.361 8.31e-08 ***
## fromConvent Station -2.892431 1.719462 -1.682 0.092545 .
## fromCranford 1.077332 1.476245 0.730 0.465531
## fromDelawanna -7.858533 1.615310 -4.865 1.15e-06 ***
## fromDenville -2.194056 1.803074 -1.217 0.223673
## fromDover -2.033405 1.872330 -1.086 0.277474
## fromDunellen 0.765813 1.782464 0.430 0.667462
## fromEast Orange -2.395882 1.302778 -1.839 0.065916 .
## fromEdison 4.385634 1.242315 3.530 0.000416 ***
## fromElberon 4.543671 1.311835 3.464 0.000534 ***
## fromElizabeth 3.869671 1.149783 3.366 0.000765 ***
## fromEmerson -14.584349 2.498640 -5.837 5.37e-09 ***
## fromEssex Street -13.671257 1.995549 -6.851 7.47e-12 ***
## fromFanwood -3.248070 1.621442 -2.003 0.045164 *
## fromFar Hills -5.625211 2.153430 -2.612 0.009000 **
## fromGarfield -9.603251 1.623825 -5.914 3.37e-09 ***
## fromGarwood 1.774894 1.549615 1.145 0.252061
## fromGillette -9.031468 1.937986 -4.660 3.17e-06 ***
## fromGladstone -6.079006 2.175235 -2.795 0.005199 **
## fromGlen Ridge -24.589575 1.556094 -15.802 < 2e-16 ***
## fromGlen Rock Boro Hall -10.539235 1.609607 -6.548 5.93e-11 ***
## fromGlen Rock Main Line -10.910985 1.624925 -6.715 1.91e-11 ***
## fromHamilton 3.347936 1.345146 2.489 0.012819 *
## fromHarriman -5.692516 2.077606 -2.740 0.006148 **
## fromHawthorne -9.752510 1.660340 -5.874 4.30e-09 ***
## fromHazlet 1.473814 1.295800 1.137 0.255389
## fromHighland Avenue -4.765133 1.433764 -3.324 0.000890 ***
## fromHillsdale -14.558493 2.386400 -6.101 1.07e-09 ***
## fromHoboken -6.945622 1.348231 -5.152 2.60e-07 ***
## fromJersey Avenue -0.061130 2.632984 -0.023 0.981477
## fromKingsland -7.039241 1.474634 -4.774 1.82e-06 ***
## fromLake Hopatcong -20.202822 2.125689 -9.504 < 2e-16 ***
## fromLincoln Park -14.903960 2.895234 -5.148 2.65e-07 ***
## fromLinden 3.992932 1.085489 3.678 0.000235 ***
## fromLittle Falls -28.262486 2.809925 -10.058 < 2e-16 ***
## fromLittle Silver 2.875571 0.999343 2.877 0.004011 **
## fromLong Branch 4.434495 1.206098 3.677 0.000237 ***
## fromLyndhurst -9.246095 1.587558 -5.824 5.80e-09 ***
## fromLyons -9.092228 2.103842 -4.322 1.55e-05 ***
## fromMadison -4.909845 1.680343 -2.922 0.003481 **
## fromMahwah -7.939740 1.623272 -4.891 1.01e-06 ***
## fromManasquan 3.108982 1.870114 1.662 0.096431 .
## fromMaplewood -5.638012 1.508401 -3.738 0.000186 ***
## fromMetropark 4.155712 1.133829 3.665 0.000248 ***
## fromMetuchen 1.427626 1.225580 1.165 0.244085
## fromMiddletown NJ 0.841718 0.721586 1.166 0.243427
## fromMiddletown NY -11.274058 2.355257 -4.787 1.70e-06 ***
## fromMillburn -5.981000 1.540324 -3.883 0.000103 ***
## fromMillington -9.790295 2.048930 -4.778 1.78e-06 ***
## fromMontclair Heights -30.256906 2.612435 -11.582 < 2e-16 ***
## fromMontclair State U -31.215272 2.550501 -12.239 < 2e-16 ***
## fromMontvale -11.183573 2.325910 -4.808 1.53e-06 ***
## fromMorris Plains -1.409057 1.766749 -0.798 0.425142
## fromMorristown -2.702172 1.746874 -1.547 0.121906
## fromMount Tabor -1.786073 1.833973 -0.974 0.330122
## fromMountain Avenue -31.697940 2.441350 -12.984 < 2e-16 ***
## fromMountain Lakes -19.231206 2.322076 -8.282 < 2e-16 ***
## fromMountain Station -5.079883 1.448995 -3.506 0.000456 ***
## fromMountain View -27.169416 2.959097 -9.182 < 2e-16 ***
## fromMurray Hill -7.316059 1.779038 -4.112 3.93e-05 ***
## fromNanuet -9.808068 2.316207 -4.235 2.30e-05 ***
## fromNetherwood -2.524495 1.688919 -1.495 0.134992
## fromNew Bridge Landing -17.440314 2.329923 -7.485 7.32e-14 ***
## fromNew Brunswick 2.619450 1.298750 2.017 0.043714 *
## fromNew Providence -5.965967 1.701893 -3.505 0.000456 ***
## fromNew York Penn Station 6.339016 1.234326 5.136 2.83e-07 ***
## fromNewark Airport 3.281836 1.145394 2.865 0.004170 **
## fromNewark Broad Street -0.978014 1.261751 -0.775 0.438272
## fromNewark Penn Station 5.724670 1.170843 4.889 1.02e-06 ***
## fromNorth Branch 5.485548 6.303967 0.870 0.384212
## fromNorth Elizabeth 4.374277 1.156681 3.782 0.000156 ***
## fromOradell -17.203624 2.463145 -6.984 2.92e-12 ***
## fromOrange -4.055002 1.377086 -2.945 0.003236 **
## fromOtisville -7.046399 2.381582 -2.959 0.003092 **
## fromPark Ridge -12.583207 2.285945 -5.505 3.73e-08 ***
## fromPassaic -11.041093 1.680463 -6.570 5.10e-11 ***
## fromPaterson -11.232339 1.693528 -6.633 3.35e-11 ***
## fromPeapack -5.436358 2.166607 -2.509 0.012107 *
## fromPearl River -10.905497 2.100567 -5.192 2.10e-07 ***
## fromPerth Amboy 0.511180 0.717334 0.713 0.476092
## fromPlainfield -3.927622 1.741059 -2.256 0.024085 *
## fromPlauderville -11.810143 1.670876 -7.068 1.60e-12 ***
## fromPoint Pleasant Beach 5.614798 1.904040 2.949 0.003191 **
## fromPort Jervis -11.887706 2.454298 -4.844 1.28e-06 ***
## fromPrinceton -16.100883 1.673169 -9.623 < 2e-16 ***
## fromPrinceton Junction 5.339687 1.318450 4.050 5.13e-05 ***
## fromRadburn Fair Lawn -10.991094 1.671652 -6.575 4.94e-11 ***
## fromRahway 2.692757 1.140223 2.362 0.018202 *
## fromRamsey Main St -8.138384 1.616691 -5.034 4.83e-07 ***
## fromRamsey Route 17 -6.678712 1.604245 -4.163 3.15e-05 ***
## fromRaritan 7.165256 1.871629 3.828 0.000129 ***
## fromRed Bank 3.420471 1.250032 2.736 0.006217 **
## fromRidgewood -8.856707 1.558827 -5.682 1.35e-08 ***
## fromRiver Edge -14.023259 2.448549 -5.727 1.03e-08 ***
## fromRoselle Park 4.417699 1.365976 3.234 0.001221 **
## fromRutherford -7.050769 1.499472 -4.702 2.58e-06 ***
## fromSalisbury Mills-Cornwall -10.682595 2.203111 -4.849 1.25e-06 ***
## fromSecaucus Lower Lvl -8.118453 1.409973 -5.758 8.59e-09 ***
## fromSecaucus Upper Lvl 2.680677 1.192060 2.249 0.024533 *
## fromShort Hills -5.187681 1.566868 -3.311 0.000931 ***
## fromSloatsburg -7.072778 1.793562 -3.943 8.05e-05 ***
## fromSomerville 4.431484 1.853799 2.390 0.016832 *
## fromSouth Amboy 1.755354 1.323676 1.326 0.184809
## fromSouth Orange -5.666046 1.465108 -3.867 0.000110 ***
## fromSpring Lake 3.376588 1.825962 1.849 0.064436 .
## fromSpring Valley -9.929429 2.312310 -4.294 1.76e-05 ***
## fromStirling -10.419173 2.017772 -5.164 2.44e-07 ***
## fromSuffern -8.018273 1.625804 -4.932 8.18e-07 ***
## fromSummit -5.609870 1.565382 -3.584 0.000339 ***
## fromTeterboro 1.376266 2.387868 0.576 0.564378
## fromTowaco -25.070282 2.813001 -8.912 < 2e-16 ***
## fromTrenton 5.424637 1.380443 3.930 8.53e-05 ***
## fromTuxedo -10.642370 1.973506 -5.393 6.99e-08 ***
## fromUnion 4.400130 1.366003 3.221 0.001278 **
## fromUpper Montclair -33.087921 2.344079 -14.116 < 2e-16 ***
## fromWaldwick -8.923018 1.548729 -5.762 8.41e-09 ***
## fromWalnut Street -35.623987 1.902339 -18.726 < 2e-16 ***
## fromWatchung Avenue -35.426148 2.147386 -16.497 < 2e-16 ***
## fromWatsessing Avenue -12.060828 1.428835 -8.441 < 2e-16 ***
## fromWayne-Route 23 -23.193220 2.885965 -8.037 9.55e-16 ***
## fromWesmont -10.281616 1.589443 -6.469 1.00e-10 ***
## fromWestfield -0.009328 1.546775 -0.006 0.995188
## fromWestwood -15.981581 2.427410 -6.584 4.66e-11 ***
## fromWood Ridge -9.129474 1.795876 -5.084 3.72e-07 ***
## fromWoodbridge 2.486637 1.323310 1.879 0.060239 .
## fromWoodcliff Lake -13.817779 2.728490 -5.064 4.12e-07 ***
## toAllendale -9.755795 1.784728 -5.466 4.63e-08 ***
## toAllenhurst -3.578305 1.459440 -2.452 0.014218 *
## toAnderson Street -3.254244 2.318837 -1.403 0.160509
## toAsbury Park -2.032711 1.560609 -1.303 0.192751
## toBasking Ridge 7.376497 2.294062 3.215 0.001304 **
## toBay Head -8.473570 1.897117 -4.467 7.98e-06 ***
## toBay Street 19.003558 1.761511 10.788 < 2e-16 ***
## toBelmar -1.637016 1.734623 -0.944 0.345314
## toBerkeley Heights 3.280639 2.062353 1.591 0.111681
## toBernardsville 3.678038 2.303958 1.596 0.110409
## toBloomfield 7.304490 1.673256 4.365 1.27e-05 ***
## toBoonton 3.317618 2.704401 1.227 0.219926
## toBound Brook -0.112915 1.996818 -0.057 0.954906
## toBradley Beach -2.746220 1.671805 -1.643 0.100461
## toBrick Church -0.039630 1.560551 -0.025 0.979740
## toBridgewater 0.256985 2.007706 0.128 0.898150
## toBroadway Fair Lawn -6.393281 1.870504 -3.418 0.000632 ***
## toCampbell Hall -6.146465 2.406244 -2.554 0.010642 *
## toChatham 1.303502 1.818758 0.717 0.473565
## toClifton -7.763450 1.865377 -4.162 3.16e-05 ***
## toConvent Station 0.081449 1.893444 0.043 0.965689
## toCranford -0.210795 1.676944 -0.126 0.899969
## toDelawanna -7.696072 1.814963 -4.240 2.24e-05 ***
## toDenville -2.181446 1.969511 -1.108 0.268039
## toDover -3.256252 1.991751 -1.635 0.102086
## toDunellen 4.207687 1.956060 2.151 0.031475 *
## toEast Orange -0.789558 1.526801 -0.517 0.605067
## toEdison -2.877491 1.498881 -1.920 0.054897 .
## toElberon -3.341054 1.295381 -2.579 0.009907 **
## toElizabeth -3.720689 1.359372 -2.737 0.006202 **
## toEmerson -6.693972 2.599427 -2.575 0.010023 *
## toEssex Street -9.479021 2.128663 -4.453 8.49e-06 ***
## toFanwood 4.182758 1.835665 2.279 0.022697 *
## toFar Hills 3.119212 2.318162 1.346 0.178456
## toGarfield -6.264464 1.832557 -3.418 0.000631 ***
## toGarwood 1.435401 1.957842 0.733 0.463469
## toGillette 5.464320 2.131892 2.563 0.010378 *
## toGladstone 0.916837 2.328047 0.394 0.693715
## toGlen Ridge 18.069881 1.751883 10.315 < 2e-16 ***
## toGlen Rock Boro Hall -8.273336 1.830280 -4.520 6.20e-06 ***
## toGlen Rock Main Line -9.269055 1.808591 -5.125 2.99e-07 ***
## toHamilton -6.834400 1.547149 -4.417 1.00e-05 ***
## toHarriman -5.887848 2.236723 -2.632 0.008484 **
## toHawthorne -8.319290 1.840006 -4.521 6.17e-06 ***
## toHazlet 0.879173 1.287007 0.683 0.494539
## toHighland Avenue 1.815594 1.649027 1.101 0.270901
## toHillsdale -6.905291 2.493085 -2.770 0.005613 **
## toHoboken -11.499848 1.540015 -7.467 8.39e-14 ***
## toJersey Avenue -7.150308 3.375452 -2.118 0.034155 *
## toKingsland -9.004542 1.689159 -5.331 9.84e-08 ***
## toLake Hopatcong NA NA NA NA
## toLincoln Park 7.575507 3.010159 2.517 0.011853 *
## toLinden -3.318904 1.406298 -2.360 0.018279 *
## toLittle Falls 8.074040 2.881843 2.802 0.005087 **
## toLittle Silver -3.501458 1.009793 -3.467 0.000526 ***
## toLong Branch -6.175512 1.133029 -5.450 5.06e-08 ***
## toLyndhurst -10.971883 1.768358 -6.205 5.55e-10 ***
## toLyons 5.227245 2.263424 2.309 0.020925 *
## toMadison 0.386194 1.866447 0.207 0.836078
## toMahwah -10.719298 1.800136 -5.955 2.63e-09 ***
## toManasquan -5.229198 1.842106 -2.839 0.004532 **
## toMaplewood 1.396641 1.715599 0.814 0.415603
## toMetropark -3.544484 1.427376 -2.483 0.013025 *
## toMetuchen -4.621628 1.419995 -3.255 0.001136 **
## toMiddletown NJ -1.268148 0.724385 -1.751 0.080015 .
## toMiddletown NY -10.107538 2.447245 -4.130 3.63e-05 ***
## toMillburn 0.700379 1.748744 0.401 0.688788
## toMillington 7.087952 2.234282 3.172 0.001513 **
## toMontclair Heights 13.288794 2.692589 4.935 8.04e-07 ***
## toMontclair State U 11.566541 2.724921 4.245 2.19e-05 ***
## toMontvale -10.455769 2.368548 -4.414 1.02e-05 ***
## toMorris Plains -1.021035 1.929619 -0.529 0.596713
## toMorristown -1.227532 1.914963 -0.641 0.521514
## toMount Tabor -1.575352 1.980671 -0.795 0.426409
## toMountain Avenue 16.482947 2.763469 5.965 2.48e-09 ***
## toMountain Lakes -12.864181 2.438462 -5.276 1.33e-07 ***
## toMountain Station 1.618950 1.665132 0.972 0.330926
## toMountain View 1.274346 3.046005 0.418 0.675682
## toMurray Hill 1.034451 1.981296 0.522 0.601598
## toNanuet -11.303097 2.336686 -4.837 1.32e-06 ***
## toNetherwood 6.521293 1.866378 3.494 0.000476 ***
## toNew Bridge Landing -9.568194 2.462622 -3.885 0.000102 ***
## toNew Brunswick -5.769018 1.494907 -3.859 0.000114 ***
## toNew Providence 1.529548 1.891272 0.809 0.418671
## toNew York Penn Station -6.149277 1.431222 -4.297 1.74e-05 ***
## toNewark Airport -4.883048 1.380215 -3.538 0.000404 ***
## toNewark Broad Street -1.100829 1.482303 -0.743 0.457701
## toNewark Penn Station -3.910676 1.405428 -2.783 0.005396 **
## toNorth Elizabeth -3.479168 1.434287 -2.426 0.015284 *
## toOradell -9.417009 2.608465 -3.610 0.000306 ***
## toOrange 0.664266 1.596741 0.416 0.677403
## toOtisville -5.610224 2.513951 -2.232 0.025646 *
## toPark Ridge -9.292570 2.476754 -3.752 0.000176 ***
## toPassaic -9.459356 1.854522 -5.101 3.40e-07 ***
## toPaterson -9.480120 1.869391 -5.071 3.97e-07 ***
## toPeapack 2.577005 2.313370 1.114 0.265304
## toPearl River -11.351370 2.392521 -4.745 2.10e-06 ***
## toPerth Amboy -1.743954 0.718573 -2.427 0.015231 *
## toPlainfield 3.582177 1.935618 1.851 0.064227 .
## toPlauderville -6.866028 1.808047 -3.797 0.000146 ***
## toPoint Pleasant Beach -6.510601 1.886210 -3.452 0.000558 ***
## toPort Jervis -8.005969 2.525559 -3.170 0.001526 **
## toPrinceton -3.555203 1.764089 -2.015 0.043879 *
## toPrinceton Junction -3.544509 1.540262 -2.301 0.021384 *
## toRadburn Fair Lawn -6.916214 1.827998 -3.783 0.000155 ***
## toRahway -4.519163 1.298366 -3.481 0.000501 ***
## toRamsey Main St -11.379160 1.777128 -6.403 1.54e-10 ***
## toRamsey Route 17 -9.983425 1.791271 -5.573 2.52e-08 ***
## toRaritan -4.381285 2.032121 -2.156 0.031090 *
## toRed Bank -1.282470 1.217150 -1.054 0.292044
## toRidgewood -8.659381 1.735964 -4.988 6.12e-07 ***
## toRiver Edge -5.286442 2.545077 -2.077 0.037798 *
## toRoselle Park 0.138639 1.584248 0.088 0.930266
## toRutherford -8.846655 1.689790 -5.235 1.66e-07 ***
## toSalisbury Mills-Cornwall -9.470427 2.314358 -4.092 4.29e-05 ***
## toSecaucus Lower Lvl -10.910489 1.610980 -6.773 1.29e-11 ***
## toSecaucus Upper Lvl -5.917198 1.422776 -4.159 3.21e-05 ***
## toShort Hills 1.208247 1.756521 0.688 0.491543
## toSloatsburg -8.470333 1.980381 -4.277 1.90e-05 ***
## toSomerville -3.870403 2.029102 -1.907 0.056472 .
## toSouth Amboy 1.133281 1.326324 0.854 0.392861
## toSouth Orange 1.175586 1.670815 0.704 0.481687
## toSpring Lake -3.184244 1.807076 -1.762 0.078062 .
## toSpring Valley -12.931184 2.450414 -5.277 1.32e-07 ***
## toStirling 5.815971 2.184671 2.662 0.007768 **
## toSuffern -11.081331 1.787371 -6.200 5.72e-10 ***
## toSummit 0.095349 1.776595 0.054 0.957199
## toTeterboro 2.189791 2.495452 0.878 0.380215
## toTowaco -8.572445 2.915470 -2.940 0.003281 **
## toTrenton -5.713263 1.561158 -3.660 0.000253 ***
## toTuxedo -9.877828 2.105402 -4.692 2.72e-06 ***
## toUnion -2.855849 1.581372 -1.806 0.070938 .
## toUpper Montclair 16.489181 2.481228 6.646 3.07e-11 ***
## toWaldwick -9.902672 1.749534 -5.660 1.52e-08 ***
## toWalnut Street 20.145977 2.112489 9.537 < 2e-16 ***
## toWatchung Avenue 17.767862 2.318313 7.664 1.85e-14 ***
## toWatsessing Avenue 1.014573 1.636275 0.620 0.535229
## toWayne-Route 23 9.595274 3.018638 3.179 0.001481 **
## toWesmont -9.962969 1.783010 -5.588 2.32e-08 ***
## toWestfield 3.442803 1.730035 1.990 0.046597 *
## toWestwood -7.875273 2.595364 -3.034 0.002412 **
## toWood Ridge -9.172549 1.890514 -4.852 1.23e-06 ***
## toWoodbridge -1.542501 1.328902 -1.161 0.245759
## toWoodcliff Lake -9.913165 2.839354 -3.491 0.000481 ***
## hour -0.014150 0.007791 -1.816 0.069352 .
## Temperature 0.012790 0.006227 2.054 0.039998 *
## Precipitation -1.868911 1.158141 -1.614 0.106599
## Wind_Speed 0.033916 0.006117 5.544 2.97e-08 ***
## lag1h30min -0.022311 0.004037 -5.527 3.28e-08 ***
## lag1h45min -0.014112 0.004061 -3.475 0.000511 ***
## lag2h -0.024615 0.004129 -5.961 2.53e-09 ***
## lag2h15min 0.004351 0.004065 1.070 0.284507
## lag2h30min 0.017288 0.003995 4.327 1.51e-05 ***
## lag2h45min -0.024440 0.004104 -5.955 2.62e-09 ***
## lag3h -0.003344 0.004016 -0.833 0.405093
## lags6station 0.966913 0.006704 144.222 < 2e-16 ***
## stop_sequence -0.121743 0.015016 -8.107 5.36e-16 ***
## lineGladstone Branch -14.960498 0.866004 -17.275 < 2e-16 ***
## lineMain Line -0.294316 0.239700 -1.228 0.219512
## lineMontclair-Boonton 0.876499 0.633605 1.383 0.166566
## lineMorristown Line -15.023294 0.757765 -19.826 < 2e-16 ***
## lineNo Jersey Coast -18.349714 0.810060 -22.652 < 2e-16 ***
## lineNortheast Corrdr -18.012343 0.808139 -22.289 < 2e-16 ***
## linePascack Valley 4.579807 0.613669 7.463 8.67e-14 ***
## linePrinceton Shuttle NA NA NA NA
## lineRaritan Valley -21.110345 1.097858 -19.229 < 2e-16 ***
## to_inter NA NA NA NA
## from_inter NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.022 on 32566 degrees of freedom
## Multiple R-squared: 0.5388, Adjusted R-squared: 0.5344
## F-statistic: 122.7 on 310 and 32566 DF, p-value: < 2.2e-16
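Each summary prints residual degrees of freedom next to the number of model terms. The number of training rows a model used can therefore be recovered as residual df + model df + 1, where the extra 1 is the intercept.

The base-R sketch below does two things. First, it checks this identity, and the definition of the residual standard error, on simulated data. The simulation includes an exactly collinear predictor, which mirrors the "not defined because of singularities" rows above. Second, it applies the identity to the printed values.

Both summaries work out to 32,877 rows. The two models were therefore fit on the same number of observations, so their residual standard errors (5.488 vs. 6.022 minutes) can be compared directly. All variable names in the sketch are illustrative.

```r
# Simulated data (illustrative only): x3 is an exact linear combination of
# x1 and x2, so lm() reports its coefficient as NA, like to_inter/from_inter above
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 3)
fit <- lm(y ~ x1 + x2 + x3)

# Residual standard error = sqrt(residual sum of squares / residual df)
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Rows used = residual df + number of estimated (non-NA) coefficients;
# fit$rank excludes aliased terms and counts the intercept
n_used <- df.residual(fit) + fit$rank

# Same identity applied to the residual and model df printed in the two summaries
df_resid  <- c(first_summary = 32563, second_summary = 32566)
df_model  <- c(first_summary = 313,   second_summary = 310)
rows_used <- df_resid + df_model + 1
print(rows_used)
```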
The analysis presents three time series plots comparing predicted and actual train delays across different prediction horizons (30, 60, and 90 minutes) during April 2020.
30-Minute Prediction Model: Tracks actual delay patterns most closely and captures most peak delay events accurately. Accuracy stays consistent throughout the month, with slight underprediction of extreme delay events (>7.5 minutes).
60-Minute Prediction Model: Generally follows the trend of actual delays, but with reduced accuracy and moderately weaker prediction of peak delays. Its predictions are smoother than those of the 30-minute model, yet still capture daily patterns effectively.
90-Minute Prediction Model: Shows patterns similar to the 60-minute model, with a further loss of accuracy. It underestimates larger delay events more markedly while still capturing general delay trends. Some predicted delays fall below zero, which reflects a limitation of the unconstrained linear model.
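The negative predictions noted for the 90-minute model are a known side effect of fitting delays with an unconstrained linear model. One simple post-processing option, not part of our pipeline, is to floor predictions at zero.

Assuming observed delays are never negative (we have not verified this here), flooring can only shrink each absolute error. For an observed delay y ≥ 0 and a prediction p < 0, the error becomes |0 - y| = y, which is smaller than |p - y| = y - p. The base-R sketch below checks this on made-up values.

```r
# Made-up observed delays (minutes) and raw linear-model predictions
observed  <- c(0, 1, 3, 8, 12)
predicted <- c(-2, 0.5, 3.5, 5, 9)

# Floor predictions at zero (assumes delays cannot be negative)
clamped <- pmax(predicted, 0)

mae_raw     <- mean(abs(predicted - observed))
mae_clamped <- mean(abs(clamped - observed))
print(c(raw = mae_raw, clamped = mae_clamped))
```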
# Score a fitted model on the test set: store its predictions under `pred_name`,
# then add absolute error, overall MAE, SD of the absolute error, and
# percentage error (division by a zero observed delay gives +/-Inf, replaced by 0)
score_model <- function(model, pred_name, data = delay.Test) {
  data %>%
    mutate(!!pred_name := predict(model, newdata = data),
           abosulte_error = abs(.data[[pred_name]] - delay_minutes),
           MAE = mean(abosulte_error, na.rm = TRUE),
           sd_AE = sd(abosulte_error, na.rm = TRUE),
           per_error = (.data[[pred_name]] - delay_minutes) / delay_minutes,
           per_error = ifelse(is.infinite(per_error), 0, per_error))
}

delay.Test_5 <- score_model(reg.30, "mod_30")
delay.Test_6 <- score_model(reg.60, "mod_60")
delay.Test_7 <- score_model(reg.90, "mod_90")
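The percentage-error step above replaces ±Inf, which arises whenever the observed delay is 0 minutes, with 0. The base-R sketch below shows what that rule does on made-up values:

- A 1-minute prediction for an on-time train is counted as 0% error.
- A correct prediction of 0 for a 0-minute delay yields NaN (0/0), which `is.infinite()` does not catch. Such values stay missing and need `na.rm = TRUE` in any later average.

```r
# Made-up observed delays (minutes) and model predictions, for illustration
observed  <- c(0, 2, 5, 10, 0)
predicted <- c(1, 2, 4, 6, 0)

abs_error <- abs(predicted - observed)   # 1 0 1 4 0
mae       <- mean(abs_error, na.rm = TRUE)
sd_ae     <- sd(abs_error, na.rm = TRUE)

# Percentage error: an observed delay of 0 gives Inf (or NaN for 0/0)
per_error <- (predicted - observed) / observed
per_error[is.infinite(per_error)] <- 0   # same replacement rule as above

print(c(MAE = mae, SD = sd_ae))
print(per_error)
```

Under this rule, an on-time train contributes 0% error regardless of the prediction. For that reason, the MAE in minutes is likely the more informative of the two measures for this dataset.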
# Hourly mean of observed vs. predicted delay for one prediction horizon
plot_delay_ts <- function(df, pred_col, horizon_label) {
  df %>%
    dplyr::select(interval60, delay_minutes, all_of(pred_col)) %>%
    tidyr::pivot_longer(-interval60, names_to = "Variable", values_to = "Value") %>%
    group_by(Variable, interval60) %>%
    summarize(Value = mean(Value, na.rm = TRUE), .groups = "drop") %>%
    ggplot(aes(interval60, Value, colour = Variable)) +
    geom_line(linewidth = 0.9) +
    labs(title = "Predicted/Observed delay time series",
         subtitle = horizon_label,
         x = "Day", y = "Mean Delay (minutes)") +
    theme_minimal()
}

grid.arrange(
  plot_delay_ts(delay.Test_5, "mod_30", "30-minute-ahead prediction"),
  plot_delay_ts(delay.Test_6, "mod_60", "60-minute-ahead prediction"),
  plot_delay_ts(delay.Test_7, "mod_90", "90-minute-ahead prediction"),
  ncol = 1)
The time series comparison reveals both strengths and limitations of our delay prediction models. The models show consistent performance in capturing regular daily and weekly patterns, with the 30-minute prediction horizon demonstrating the highest accuracy. All models effectively track normal operating conditions but tend to underpredict extreme delay events.
Performance gradually declines as the prediction horizon extends from 30 to 90 minutes, with increasing difficulty in capturing sudden spikes in delays. While the models maintain good accuracy for typical delay patterns, they show a tendency to smooth out extreme variations, particularly noticeable in the longer-range predictions.
These findings suggest the models are most reliable for short-term operational planning under normal conditions, while additional considerations may be needed for predicting extreme delay events or longer time horizons.
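One way to quantify the tendency to underpredict extreme events is to bin observations by their observed delay and compute the mean signed error (predicted minus observed) in each bin; increasingly negative values in the upper bins are the signature of that smoothing. A base-R sketch with made-up values; the bin edges are arbitrary assumptions:

```r
# Hypothetical observed delays and model predictions (minutes), illustration only
observed  <- c(1, 2, 3, 4, 8, 12, 20, 30)
predicted <- c(1.5, 2.5, 3.0, 3.5, 6.0, 8.0, 11.0, 14.0)

# Bin observed delays into right-closed intervals: (-Inf,5], (5,15], (15,Inf]
bins <- cut(observed, breaks = c(-Inf, 5, 15, Inf),
            labels = c("<=5", "5-15", ">15"))

# Mean signed error per bin; negative values mean underprediction
signed_error <- tapply(predicted - observed, bins, mean)
print(signed_error)
```

Applied to the test set, `observed` would be `delay_minutes` and `predicted` one of `mod_30`, `mod_60` or `mod_90`.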
The spatial analysis reveals distinct patterns in prediction accuracy across the NJ Transit network. Urban stations, particularly those serving as intersection points for multiple lines, exhibit higher prediction errors, indicated by darker colors on the map. This increased error rate likely stems from their more complex operational environments and higher passenger volumes. In contrast, coastal and peripheral stations with simpler operations show notably lower prediction errors, as shown by lighter colors. While line intersection points consistently demonstrate elevated error rates across all prediction horizons, the overall spatial pattern of errors remains relatively stable whether predicting 30, 60, or 90 minutes ahead, with only modest increases in error magnitude as the prediction timeframe extends. This stability suggests that the underlying factors influencing prediction accuracy are primarily structural rather than temporal in nature.
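The contrast between intersection and single-line stations can be checked directly by comparing station-level mean errors between the two groups. A base-R sketch on made-up station values; the `is_intersection` flag is a hypothetical stand-in for the `line_intersct` variable used in the map code, whose exact coding is assumed here:

```r
# Hypothetical per-station mean absolute errors (minutes) and a flag marking
# stations where two or more lines meet; values are illustrative only
station <- data.frame(
  station_id      = c("A", "B", "C", "D", "E", "F"),
  mean_ae         = c(2.4, 2.1, 1.2, 1.0, 1.4, 2.6),
  is_intersection = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Average station-level error for intersection vs. single-line stations
by_type <- tapply(station$mean_ae, station$is_intersection, mean)
print(by_type)

# Exact two-sample Wilcoxon rank-sum test as a rough check on the gap
res <- wilcox.test(mean_ae ~ is_intersection, data = station)
print(res$p.value)
```

With only three stations per group the exact test cannot reach conventional significance (its smallest possible two-sided p-value is 0.1), so a station-level comparison needs enough stations in each group to be informative.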
# Station-level mean absolute and percentage error for one model, as sf points
station_errors <- function(df, label) {
  df %>%
    group_by(from) %>%
    summarise(mean_ae = mean(abosulte_error, na.rm = TRUE),
              mean_pe = mean(per_error, na.rm = TRUE)) %>%
    left_join(stop, by = c('from' = 'STATION_ID')) %>%
    st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326) %>%
    mutate(mod = label)
}
temp4 <- rbind(station_errors(delay.Test_5, '30m-Predict'),
               station_errors(delay.Test_6, '60m-Predict'),
               station_errors(delay.Test_7, '90m-Predict'))
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = temp4, aes(color = mean_ae,size = line_intersct)) +
scale_colour_viridis(direction = -1,discrete = FALSE, option = "D") +
labs(title = "MAE Spatial Comparison in 3 Models") +
facet_wrap(~mod)+
mapTheme()+
theme_minimal()
High-error clusters appear in the northeastern region, particularly around major transit hubs
Rural and single-line stations maintain more consistent prediction accuracy
The spatial distribution of errors suggests that network complexity is a stronger determinant of prediction accuracy than geographic location
The evaluation highlights a clear trade-off between prediction accuracy and the temporal range of the model. The 30-minute model consistently outperforms the 60-minute and 90-minute models, achieving the lowest MAE (about 1.5 minutes) and the smallest standard deviation of absolute errors. This reinforces that delay predictions are most reliable in the short term, where recent temporal and spatial lag variables retain stronger predictive power. As the prediction window extends to 60 and 90 minutes, the inherent uncertainty of transit dynamics increases, leading to higher errors. In practice, this favors the short-horizon model for real-time interventions and calls for more caution with longer-range projections.
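The model formulas used for cross-validation drop the shortest lags as the horizon grows: the 60-minute model starts at `lag1h` and the 90-minute model at `lag1h30min`. The underlying rule is that a feature must already be known when the prediction is issued, so a model predicting h minutes ahead can only use lags of at least h minutes. A minimal base-R sketch, assuming delays are aggregated into regular 15-minute intervals for one station (the series, `lag_k`, `eligible_lags` and the `lagNNmin` column names are hypothetical):

```r
# Hypothetical mean delay (minutes) for one station in consecutive 15-minute bins
delay <- c(0, 1, 3, 2, 5, 4, 6, 8, 7, 9)
step_min <- 15

# Shift a series back by k steps, padding the start with NA
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# Build lag columns for 15, 30, ..., 90 minutes
lags_min <- seq(15, 90, by = step_min)
lagged <- as.data.frame(lapply(lags_min / step_min, function(k) lag_k(delay, k)))
names(lagged) <- paste0("lag", lags_min, "min")

# Keep only lags that are observable at least `horizon` minutes in advance
eligible_lags <- function(horizon) names(lagged)[lags_min >= horizon]
print(eligible_lags(60))
```

The source's 30-minute formula starts at `lag45min` rather than at a 30-minute lag, so this rule is a simplification of the feature choices actually made.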
temp2 <- delay.Test_5 %>%
summarise(
MAE = mean(abosulte_error, na.rm = TRUE),
sd_AE = sd(abosulte_error, na.rm = TRUE)
) %>%
mutate(mod = '30m')
temp <- delay.Test_6 %>%
summarise(
MAE = mean(abosulte_error, na.rm = TRUE),
sd_AE = sd(abosulte_error, na.rm = TRUE)
) %>%
mutate(mod = '60m')
temp3 <- delay.Test_7 %>%
summarise(
MAE = mean(abosulte_error, na.rm = TRUE),
sd_AE = sd(abosulte_error, na.rm = TRUE)
) %>%
mutate(mod = '90m')
temp4 <- bind_rows(temp2, temp, temp3)
grid.arrange(
ggplot(temp4, aes(x = mod, y = MAE, colour = mod)) +
geom_point(size = 3) +
labs(title = "MAE Temporal Comparison", x = "Model", y = "MAE") +
theme_minimal(),
ggplot(temp4, aes(x = mod, y = sd_AE, colour = mod)) +
geom_point(size = 3) +
labs(title = "SD of Absolute Error Temporal Comparison", x = "Model", y = "SD of Absolute Error") +
theme_minimal()
)
The spatial MAE analysis reveals that some stations and routes consistently experience higher errors, particularly in the 60-minute and 90-minute models. This indicates that specific lines or regions with more complex conditions—such as frequent intersections or higher passenger volumes—require tailored strategies for improvement. Addressing localized challenges with additional contextual variables or advanced modeling techniques could enhance predictive accuracy in these areas.
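# Rename the misspelled error column (a no-op if it was already renamed)
colnames(delay.Test_6)[colnames(delay.Test_6) == "abosulte_error"] <- "absolute_error"
temp <- delay.Test_6 %>%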
group_by(from) %>%
summarise(mean_ae = mean(absolute_error, na.rm = TRUE),
mean_pe = mean(per_error, na.rm = TRUE)) %>%
left_join(stop, by = c('from' = 'STATION_ID')) %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = temp, aes(color = mean_ae)) +
scale_colour_viridis(direction = -1, discrete = FALSE, option = "D") +
labs(title = "MAE Comparison", subtitle = '60-minute model') +
mapTheme() +
theme_minimal()
Prediction errors vary considerably across transit lines, reflecting the operational and environmental complexities of each route. Busier routes, or those with more variable operating conditions, tend to yield higher errors, especially in the 90-minute model. This suggests that line-specific factors, such as schedule adherence patterns, rail traffic congestion, and infrastructure, should be incorporated into future models to improve their robustness.
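# Standardise the error column name to absolute_error in all three test sets
colnames(delay.Test_5)[colnames(delay.Test_5) == "abosulte_error"] <- "absolute_error"
colnames(delay.Test_6)[colnames(delay.Test_6) == "abosulte_error"] <- "absolute_error"
colnames(delay.Test_7)[colnames(delay.Test_7) == "abosulte_error"] <- "absolute_error"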
temp <- delay.Test_6 %>%
group_by(line) %>%
summarise(mean_ae = mean(absolute_error, na.rm = TRUE),
mean_pe = mean(per_error, na.rm = TRUE)) %>%
left_join(line, by = c('line' = 'LINE_NAME')) %>%
mutate(mod = '60min')
temp2 <- delay.Test_5 %>%
group_by(line) %>%
summarise(mean_ae = mean(absolute_error, na.rm = TRUE),
mean_pe = mean(per_error, na.rm = TRUE)) %>%
left_join(line, by = c('line' = 'LINE_NAME')) %>%
mutate(mod = '30min')
temp3 <- delay.Test_7 %>%
group_by(line) %>%
summarise(mean_ae = mean(absolute_error, na.rm = TRUE),
mean_pe = mean(per_error, na.rm = TRUE)) %>%
left_join(line, by = c('line' = 'LINE_NAME')) %>%
mutate(mod = '90min')
temp4 <- rbind(temp, temp2)
temp4 <- rbind(temp4, temp3)
ggplot() +
geom_sf(data = NJTracts, color = 'grey') +
geom_sf(data = temp4, aes(color = mean_ae, geometry = geometry)) +
facet_wrap(~mod) +
scale_colour_viridis(direction = -1, discrete = FALSE, option = "A") +
labs(title = "Model MAE Comparison by Line", subtitle = '30-, 60- and 90-minute models') +
mapTheme() +
theme_minimal()
The scatter plots comparing observed and predicted delays reveal that the 30-minute model aligns closely with the diagonal line, signifying robust predictions across most scenarios. However, as the prediction horizon extends, the accuracy diminishes, with increasing instances of underprediction and overprediction, particularly for significant delays. This underscores the need for advanced error-handling mechanisms in longer-range models, such as adjusting for extreme delays or incorporating external shocks like weather or accidents.
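The gap between the orange regression line and the diagonal can be summarized by one number: the slope from regressing predicted on observed delays, which is the line `geom_smooth(method = "lm")` draws. A slope of 1 means no systematic shrinkage; a slope below 1 with a positive intercept means small delays are overpredicted and large ones underpredicted. A base-R sketch with made-up values:

```r
# Hypothetical observed and predicted delays (minutes), illustration only
observed  <- c(0, 2, 4, 6, 10, 20)
predicted <- c(1, 2.5, 3.5, 5, 7, 12)

# Regress predictions on observations, as the fitted line in the scatter plot does
fit   <- lm(predicted ~ observed)
slope <- unname(coef(fit)["observed"])

# A slope below 1 means predictions are compressed toward the mean
print(slope)
```

On the real data, the same `lm(pre ~ delay_minutes)` could be fit separately for each level of `mod` in `temp4` to compare the three horizons.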
temp <- delay.Test_5 %>%
dplyr::select(delay_minutes, mod_30) %>%
mutate(mod = '30min') %>%
rename(pre = mod_30)
temp2 <- delay.Test_6 %>%
dplyr::select(delay_minutes, mod_60) %>%
mutate(mod = '60min') %>%
rename(pre = mod_60)
temp3 <- delay.Test_7 %>%
dplyr::select(delay_minutes, mod_90) %>%
mutate(mod = '90min') %>%
rename(pre = mod_90)
temp4 <- rbind(temp, temp2)
temp4 <- rbind(temp4, temp3)
ggplot() +
geom_point(data = temp4, aes(x = delay_minutes, y = pre), color = "#2a9d8f") +
geom_smooth(data = temp4, aes(x = delay_minutes, y = pre), method = "lm", se = FALSE, color = '#f4a261') +
geom_abline(slope = 1, intercept = 0) +
facet_wrap(~mod) +
labs(title = "Observed vs Predicted",
subtitle = 'Model and prediction comparison',
x = "Observed delay minutes",
y = "Predicted delay minutes") +
plotTheme() +
theme_minimal()
## Warning: Removed 87 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 87 rows containing missing values or values outside the scale range
## (`geom_point()`).
Five-fold cross-validation was used to evaluate how well the three models, which predict delays 30, 60, and 90 minutes ahead, generalize to unseen data. The evaluation metrics were Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R², with the cross-fold mean of each metric shown as a dotted line for comparison.
The 30-minute model exhibited the lowest MAE and RMSE, indicating higher accuracy in short-term predictions compared to the other models.
The 60-minute model's errors were moderately higher than those of the 30-minute model, but its longer lead time may still make it suitable for practical use cases.
The 90-minute model showed the highest MAE and RMSE, reflecting a decline in prediction performance as the time window increased.
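What `trainControl(method = "cv", number = 5)` asks caret to do can be reproduced by hand: split the rows into five folds, fit on four, score the held-out fold, and repeat. The base-R sketch below uses simulated data (`sim`, with a made-up stand-in for `lag1h`), not the project data, and reports RMSE, R² and MAE per fold. R² is computed as the squared correlation between predictions and observations, which is caret's default definition; caret's own fold assignment will differ.

```r
# Simulated data, for illustration only: one lagged-delay predictor plus noise
set.seed(825)
n <- 200
sim <- data.frame(lag1h = rexp(n, rate = 0.3))
sim$delay_minutes <- 0.8 * sim$lag1h + rnorm(n, sd = 1.5)

k <- 5
# Randomly assign each row to one of k equal-sized folds
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, score on the held-out fold
cv_metrics <- t(sapply(seq_len(k), function(f) {
  train_df <- sim[folds != f, ]
  test_df  <- sim[folds == f, ]
  fit  <- lm(delay_minutes ~ lag1h, data = train_df)
  pred <- predict(fit, newdata = test_df)
  err  <- pred - test_df$delay_minutes
  c(RMSE     = sqrt(mean(err^2)),
    Rsquared = cor(pred, test_df$delay_minutes)^2,
    MAE      = mean(abs(err)))
}))

print(cv_metrics)            # one row per fold, analogous to reg.cv.30$resample
print(colMeans(cv_metrics))  # cross-fold means, the dotted lines in the histograms
```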
fitControl <- trainControl(method = "cv", number = 5)
set.seed(825)
reg.cv.30 <-
train(delay_minutes ~ from + to + hour + Temperature + Precipitation + Wind_Speed + lag45min + lag1h + lag1h15min + lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min + lag2h45min + lag3h + lags3station + lags4station + lags5station + lags6station + stop_sequence + line + to_inter + from_inter,
data = merged_dataset_model, method = "lm", trControl = fitControl, na.action = na.pass)
## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases
reg.cv.60 <-
train(delay_minutes ~ from + to + hour + Temperature + Precipitation + Wind_Speed + lag1h + lag1h15min + lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min + lag2h45min + lag3h + lags5station + lags6station + stop_sequence + line + to_inter + from_inter,
data = merged_dataset_model, method = "lm", trControl = fitControl, na.action = na.pass)
## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases
reg.cv.90 <-
train(delay_minutes ~ from + to + hour + Temperature + Precipitation + Wind_Speed + lag1h30min + lag1h45min + lag2h + lag2h15min + lag2h30min + lag2h45min + lag3h + lags6station + stop_sequence + line + to_inter + from_inter,
data = merged_dataset_model, method = "lm", trControl = fitControl, na.action = na.pass)
## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases
The histograms of goodness-of-fit metrics for each model demonstrate the variability in performance across folds, with cross-fold means providing a clear benchmark for comparisons.
# Histogram of per-fold CV metrics, with the cross-fold mean as a dotted line
plot_cv <- function(fit, horizon) {
  dplyr::select(fit$resample, -Resample) %>%
    gather(metric, value) %>%
    left_join(gather(fit$results[2:4], metric, mean), by = "metric") %>%
    ggplot(aes(value)) +
    geom_histogram(bins = 35, fill = "#2a9d8f") +
    facet_wrap(~metric) +
    geom_vline(aes(xintercept = mean), colour = "#e76f51", linetype = 3, linewidth = 1.5) +
    scale_x_continuous(limits = c(0, 5)) +
    labs(x = "Goodness of Fit", y = "Count",
         title = paste0("CV Goodness of Fit Metrics: ", horizon, "-minute Model"),
         subtitle = "Across-fold mean represented as dotted lines") +
    theme_minimal()
}
grid.arrange(
  plot_cv(reg.cv.30, 30),
  plot_cv(reg.cv.60, 60),
  plot_cv(reg.cv.90, 90),
  nrow = 3)
The summary table shows that the 30-minute model achieved the lowest cross-validated MAE. Its fold-to-fold standard deviation of MAE was slightly higher than those of the other models (about 0.024, versus 0.023 and 0.019), which may indicate somewhat greater sensitivity to the particular training folds, although the differences are small.
# Mean and standard deviation of the fold-level MAE for one cross-validated model
cv_mae <- function(fit, label) {
  tibble(Model = label,
         MAE = mean(fit$resample$MAE),
         sd = sd(fit$resample$MAE))
}
combined_summary <- bind_rows(
  cv_mae(reg.cv.30, "30 min Model"),
  cv_mae(reg.cv.60, "60 min Model"),
  cv_mae(reg.cv.90, "90 min Model")
)
combined_summary %>%
as.data.frame() %>%
mutate(Model = factor(Model, levels = c("30 min Model", "60 min Model", "90 min Model"))) %>%
kbl(col.names = c('Model', 'Mean Absolute Error', 'Standard Deviation of MAE')) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Model | Mean Absolute Error | Standard Deviation of MAE |
---|---|---|
30 min Model | 1.470051 | 0.0238746 |
60 min Model | 1.799435 | 0.0232207 |
90 min Model | 1.925727 | 0.0187252 |
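To put the MAE gaps in the table in relative terms, a short calculation using the reported values:

```r
# Cross-validated MAE values copied from the table above (minutes)
mae <- c(`30 min` = 1.470051, `60 min` = 1.799435, `90 min` = 1.925727)

# Percentage increase in MAE relative to the 30-minute model
pct_increase <- 100 * (mae / mae["30 min"] - 1)
round(pct_increase, 1)  # roughly 0.0, 22.4 and 31.0 percent
```

In absolute terms, moving from the 30-minute to the 90-minute horizon adds roughly 0.46 minutes (about 27 seconds) of average error.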
For real-time prediction, the 30-minute model is best suited because of its higher accuracy and lower error. The 60-minute model, while less precise, provides a longer lead time, which makes it valuable for operational planning and strategy development. The 90-minute model, despite its higher errors, can still be useful for trend forecasting over extended time windows.
Incorporating non-linear features or advanced models (e.g., Random Forest, XGBoost) to improve long-term prediction performance.
Integrating more real-time data (e.g., traffic flow or real-time weather conditions) to enhance model generalizability.
Addressing anomalies in the training dataset to reduce sensitivity to outliers and extreme values.
Our analysis of NJ Transit delays demonstrates the practical value of predictive modeling in improving public transportation reliability. This project addresses a critical need: providing commuters with accurate, timely delay predictions while helping transit authorities optimize operations.
Our analysis combined multiple data sources (NJ Transit operational data, weather measurements, and census information) to create a robust prediction framework. Through extensive spatial and temporal visualization, we identified critical patterns in delay propagation across the network, particularly around major transit hubs and during peak hours.
The three-tiered modeling approach (30-, 60-, and 90-minute predictions) revealed how delay patterns evolve over different time horizons. Key findings include:
- Strong performance of the 30-minute model (MAE ≈ 1.47)
- Declining but still useful accuracy in longer-range predictions
- Consistent spatial patterns in prediction errors, with higher uncertainty in complex urban areas
This analysis serves multiple stakeholders:
1. Commuters: Real-time predictions improve trip planning and reduce uncertainty.
2. Transit Operators: Systematic insights guide operational resource allocation.
3. Urban Planners: Long-term patterns inform infrastructure investments and policy decisions.
Cross-validation showed that the models perform consistently across folds, with the 30-minute model achieving the lowest error. The spatial visualization of prediction errors helps identify where the model performs best and where improvements are needed.
Future improvements could include:
1. Data Integration:
   - Real-time passenger volume data
   - Infrastructure maintenance schedules
   - More granular weather information
This analysis provides a foundation for improving transit predictability, benefiting both operators and passengers. The methods demonstrated here can be adapted by other transit systems facing similar challenges, making it a valuable contribution to urban transit management.
The success of the 30-minute prediction model, in particular, suggests that focusing on short-term forecasting while considering both spatial and temporal factors provides the most practical value for day-to-day operations. This approach balances accuracy with usefulness, providing actionable insights for both passengers and transit authorities.
Building on our analytical framework, we developed JourneyGenie (https://youtu.be/Zfa5S6JR9GI), a smart travel assistant that turns our delay predictions into actionable insights for NJ Transit commuters. The application draws on the strengths of our predictive models, particularly the accurate 30-minute predictions, to provide real-time travel recommendations.
JourneyGenie integrates multiple data streams with our prediction models to offer:
- Real-time delay predictions up to 30 minutes ahead
- Personalized route recommendations based on historical performance
- Live transit tracking with immediate delay notifications
- Alternative route suggestions during major disruptions
The app demonstrates how our analytical work can be translated into practical solutions that directly benefit commuters. By combining our statistical models with user-friendly interface design, JourneyGenie helps bridge the gap between complex transit data analysis and everyday commuter needs, making public transportation more predictable and manageable for all users.