Take the "A" Train

Esteban is a Senior majoring in Civil Engineering. He has recently caught the “transportation bug” and is excited to begin working on this research project. After Northwestern, Esteban plans to pursue a PhD in Civil Engineering with a focus in Transportation. He hopes to one day assist in designing transit systems that help cities decrease their carbon footprint while also serving the population better. Esteban has been funded by a Summer Undergraduate Research Grant from Northwestern's Office of Undergraduate Research.


For this post I have decided to spend time talking about everyone’s favorite part of traveling: traffic!

Congestion is never fun. Planners and engineers work constantly to try to mitigate it. Interestingly, supporters of light rail advocate for rail as a means to reduce road congestion. How does that work? Well, it all relies on mode shift. Assume we have a constant number of people traveling through a particular corridor, let’s say 100 people. All 100 of these people are driving separately, each in their own car. If we add a train and 30 of those people now ride the train together instead of driving, then we have decreased the number of cars on the road from 100 to 70.

Of course, the above scenario does not account for changes in the total number of people. Also, the addition of the rail line more likely than not takes away road space that used to be another lane. We could easily have a situation where congestion doesn’t get better, but in fact gets worse. However, light rail advocates claim that congestion will go down.

Let’s see if that’s the case.

There are a couple of ways we can look at congestion. First, we will take a look at the Texas A&M Urban Mobility Scorecard. Every year, this research group puts out a report card on the nation’s congestion. The scorecard reports national figures and then looks at major urban areas individually.

One measure of congestion is the Travel Time Index (TTI). This takes the travel time during peak flow (the time when roads are most congested) and divides it by the travel time at free-flow speeds (no congestion). Thus a TTI of 1.2 means that a 10 min trip takes 12 min during peak hours.
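As a minimal sketch of that ratio (the function names here are made up for illustration and are not taken from the scorecard), the 10 min example works out like this:

```python
def travel_time_index(peak_minutes: float, free_flow_minutes: float) -> float:
    """Ratio of peak-period travel time to free-flow travel time."""
    if free_flow_minutes <= 0:
        raise ValueError("free-flow travel time must be positive")
    return peak_minutes / free_flow_minutes


def peak_travel_time(free_flow_minutes: float, tti: float) -> float:
    """Peak-period travel time implied by a given TTI."""
    return free_flow_minutes * tti


# The example from the text: a 10 min free-flow trip that takes 12 min at peak.
print(round(travel_time_index(12, 10), 2))
print(round(peak_travel_time(10, 1.2), 1))
```

A TTI of exactly 1.0 would mean peak travel is no slower than free flow.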

Below is the TTI for Dallas and LA over time.
[Figures: Travel Time Index over time for Dallas and Los Angeles]

In both of these cities the TTI has gone up (as it has across the country). This matches the overarching narrative that congestion is in fact getting worse in the US.

But these measures look at car travel. Let’s focus on how light rail is doing and how it compares with the car.

[Figures: mean travel time to work by mode]

These graphs show the mean travel time to work by mode. These data come from the American Community Survey 1-Year estimates. They were calculated using aggregate travel time by mode divided by the number of commuters by mode. The grey bar for transit includes all transit modes (light rail and bus). We see that the travel time to work by transit is nearly double that of car. This is pretty consistent with national numbers. What is particularly interesting is that Dallas saw a noticeable increase in travel time for transit between 2010 and 2011. This increase coincided with an increase in directional route miles. This is surprising because one would expect an increase in the amount of light rail service to decrease the travel time, not increase it.
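The per-mode calculation is a single division. Here is a small sketch of it; the numbers below are invented placeholders chosen only to show the arithmetic, not actual American Community Survey values, and the function name is hypothetical.

```python
def mean_travel_time(aggregate_minutes: float, commuters: int) -> float:
    """Mean one-way commute in minutes: aggregate travel time / commuters."""
    if commuters <= 0:
        raise ValueError("number of commuters must be positive")
    return aggregate_minutes / commuters


# Placeholder inputs (NOT real ACS data): (aggregate minutes, commuters) by mode.
by_mode = {
    "car": (2_500_000, 100_000),
    "transit": (250_000, 5_000),
}

for mode, (aggregate, commuters) in by_mode.items():
    print(mode, round(mean_travel_time(aggregate, commuters), 1), "min")
```

Because the transit figure pools bus and light rail commuters, a change in either mode moves the transit average.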

Why is that? Well, a possible explanation is that the system is now crowded with more trains and people, thus causing congestion. Another idea is that the new extension is bringing on new riders who switched from some other mode (although this idea is harder to explain since that would mean people switched to a slower mode of travel). And possibly, since these numbers include bus travel as well, there could have been changes in bus service which increased travel time.

While looking at only two cities is not nearly a big enough sample, it does appear that at least for Dallas and Los Angeles, their light rail is not helping with congestion. And for Dallas, light rail may even be making it worse.

America’s Next Top Regression Model

A huge part of any research comes down to modeling. You explore a bunch of variables and you try to see which ones are important predictors of the outcome you care about. In transportation, modeling is extremely important for planners and leaders who are the decision makers for transportation infrastructure. If you want to build a new light rail in your city, you have to be able to show that you can get ridership out of it.

I have spent the last week looking at how to build a model to predict ridership levels. Why is ridership so important? Well, the majority of positive outcomes from public transit are contingent on high ridership and use. You cannot get economic development and congestion relief from a new rail service if no one rides it. And while no transit organization makes a profit from fares, high fare revenue helps keep a transit system functioning without having to rely completely on government subsidies. Ridership, while not the most detailed and fair metric, is often used as a quick and easy indicator of a successful system.

What variables drive ridership? Arguably, everything. While constructing a model, it is worthwhile to consider as many variables as possible to begin with, and then throw out the ones that do not appear to have statistical significance.

Off the bat, we could say some obvious predictors of ridership are the route miles (DRM), revenue vehicle miles (RVM) and service area. These are characteristics of the physical system. But demographics of the host city also impact ridership. Such variables are population, unemployment, travel time to work and the travel time index (TTI). The last two are measures of the congestion in the city. The travel time index is calculated by taking the time it takes to make a trip during peak hours and dividing it by the time it takes to make the same trip at free-flow speeds (i.e. no traffic). For example, an index of 1.20 means that a 10 min trip takes 12 min during rush hour traffic. While that may not seem bad at first, consider that the national average travel time to work is 25.7 min (American Community Survey 2014 Estimate). So even a TTI of 1.20 can be significant.

To begin developing a working model, I tested four variables: service population, unemployment, travel time to work and the travel time index. I predicted that service population would be the dominating variable.

After running a multivariate regression in Excel, I found that unemployment, travel time to work and TTI all had p-values greater than 0.05. In standard practice, this suggests that these variables are not significant to the model. Only service population had a coefficient statistically different from zero, with a p-value of 0.013308. Keeping all the variables in the model, I used this regression to make predictions for 2014 ridership. I compared the model’s predictions with the actual ridership in a scatter plot, with the predicted values on the y-axis and the actual on the x-axis. If the model is accurate, the resulting plot should be a straight line with a 45° angle.


Above is the plot from the first regression model. Not a terrible fit, but this could be a lot better.

After some thought, I decided to throw out all the variables except for service population. I added DRM, RVM and service area to see what happened.

The Excel regression showed that three of the variables were statistically significant to the model. DRM was the only one with a p-value greater than 0.05. However, keeping the DRM coefficient, I again used the model to graph the predicted ridership against the actual ridership. The results are shown in the plot below.


Here we see a much tighter fit of the data. The R² value is high at 0.93 and the slope is 0.945 (almost 1, which would give a 45° slope!). The question becomes whether it is a good idea to keep DRM in the model or reject it because of its p-value. I decided to keep it. When building a regression model, one cannot lose sight of what variables mean and their nature. RVM, as I discussed in my last post, is arguably not a good independent variable because it can easily be adjusted as a response to ridership. In other words, the direction of causality could be reversed. Since DRM is a measure of the physical tracks built, it is safer to assume that ridership responds to changes in DRM. Because of this, I have kept DRM in the model.
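For readers who want to reproduce this kind of check outside of Excel, here is a rough sketch of fitting a line through predicted-versus-actual points and reading off the slope and R². The helper name `fit_line` and the sample values are assumptions for illustration; they are not the data or the regression behind the plots in this post.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation for a one-variable fit
    return slope, intercept, r_squared


# Made-up ridership values (millions of trips), actual on x and predicted on y.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 19.0, 29.0, 41.0]

slope, intercept, r_squared = fit_line(actual, predicted)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A model that predicted every city perfectly would give a slope of 1, an intercept of 0 and an R² of 1, which is the 45° line described above.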

Going forward, I want to keep developing this model, test it against individual cities over a span of years, and see what can be learned from the gap between what the model predicts and what actually happened with light rail ridership.

Static versus Dynamic Analysis

If I have learned one thing this summer, it’s that having a clean-cut way to evaluate anything is impossible. There are so many factors to consider. It’s also very easy to become separated from the context of your research subject. Public transit is not an isolated system. It is very much a living system that is constantly responding to the changes in its environment.

Because of this, I have had to rely on both static and dynamic metrics for assessing these light rail systems. By static, I mean looking at a single year, creating a snapshot of all of the systems and comparing them on one or more measures. The dynamic analysis comes into play when I look at trends for a single city. Both approaches are valuable in distinguishing strong systems from weaker ones.

To begin, I’d like to first make observations, using 2014 as our reference year.


This graph shows annual passenger trips versus revenue vehicle miles (RVM). RVM is essentially the amount of service provided by the transit organization. One would expect that the more service you provide, the more people will ride. And we see that with the linear model. In a very over-simplified approach, we can say that cities above the line are doing better than cities below the line. Places such as Portland and Los Angeles are getting many more riders for the amount of service they’re providing, in contrast with Dallas and Denver, which are getting fewer riders.

Let’s look at another measure.


In this graph, we have annual passenger trips versus directional route miles (DRM) per service area. This is the amount of track over the boundaries of the transit provider. In other words, this is the coverage.

Again, one would expect that more coverage would lead to more ridership, and while in general, we can see this trend, it is not nearly as strong as with RVM. In a lot of ways, this makes sense. You can have the greatest coverage of any transit system, but to do so you may have a lot of routes that go places where few people live. Sometimes it is more effective to have a more localized coverage. We see this in Houston, LA, Seattle, Minneapolis, Denver and Salt Lake City.

Now let’s take a closer look at Los Angeles to try to have a better understanding.


Los Angeles is always an interesting city to look at. This chart shows the growth of ridership, DRM and RVM. I indexed each value using the equation y(i) = (x(i)/x(first))*100. This sets the first value at 100 and expresses every subsequent value as a percentage of the first value. So an index of 200 means that the value is double that of year one. This lets us look at all three metrics on the same scale and from the same starting place. It allows for a quick glance comparison of growth trends.
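The indexing step can be sketched in a few lines. The function name and the sample series are placeholders for illustration, not Los Angeles data.

```python
def index_to_first(values, base=100.0):
    """Apply y(i) = (x(i) / x(first)) * base to every value in a series."""
    if not values:
        raise ValueError("series is empty")
    first = values[0]
    if first == 0:
        raise ValueError("first value must be non-zero to index against it")
    return [v / first * base for v in values]


# Placeholder series: the last value is double the first, so its index is 200.
print(index_to_first([50, 75, 100]))
```

Running the same function over ridership, DRM and RVM puts all three series on one axis, each starting at 100.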

LA has increasing ridership and steadily growing RVM. DRM grew in phases since that was how the light rail in LA was built. One could argue that causality of ridership and RVM goes both ways. Increasing RVM gives more riders, and more riders cause the transit operators to put out more service via RVM. Either way, there is definitely growth happening in LA, which is really good news.

Now I want to look at a city that appeared to not do so well, Denver. In the first graph, Denver is one of the cities giving out a lot of service with RVM, but not getting a lot of riders.


We can see that from 2006 to 2012, there was no increase in DRM. Basically, Denver was not looking to expand its tracks during this time. However, there was a jump in both ridership and RVM from 2006 to 2007. Ridership looks stable from 2007 to 2012, while RVM is jumping around. I think one possible explanation for this is that Denver was trying to increase ridership by putting out more service, but perhaps increased RVM too much and was then adjusting it to meet the actual demand for the light rail. Of course, the issue of causality is up for debate. One could argue that ridership went up and Denver increased service to respond, but again overshot and had to readjust. The latter is probably more realistic.

If we look at 2014, which is the year the first graph uses data from, we can see that the RVM is 16x what it was in 1999, but the ridership is only around 6x from where it was in 1997. It makes sense then that Denver fell below the line in the first plot. The amount of service provided has been increasing much faster than the demand for it has.

It is unclear if that is a bad thing or not. We are not sure what pushed the service increase. But this highlights why you cannot rely solely on a single year snapshot to make an assessment. Understanding light rail performance requires you to look at the story from different angles.

The Winners, the Losers, and the Meh

I’m getting down to the last few weeks of my research project. So far, I’ve been looking at countless measures of performance and service efficiency. However, looking at all these individual measures can be meaningless unless you have a way to compare across them.

For example, Phoenix has the highest route miles per service area (which is to say Phoenix has the best coverage of any system), while Houston has the greatest trips per vehicle revenue mile, an indication that Houston is a very packed system. If I want to create broad categories of “winners”, “losers” and the in-between, then I need a way to compare all the possible metrics on the same scale.

Taking all the metrics I have assessed, I created a “scorecard” using scaled values. I scaled everything between 0 and 1, with 0 replacing the minimum value and 1 replacing the maximum. So for route miles per service area, Phoenix would get a score of 1, since it had the highest value. This scaling allows me to place multiple metrics on the same graph for a comparison.
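This kind of scaling is usually called min-max scaling. Below is a minimal sketch for a single metric; the system names and values are hypothetical stand-ins, not numbers from the scorecard.

```python
def min_max_scale(scores):
    """Rescale a {system: value} mapping so the minimum maps to 0 and the maximum to 1."""
    low = min(scores.values())
    high = max(scores.values())
    if high == low:
        raise ValueError("all values are equal, so the scale is undefined")
    return {name: (value - low) / (high - low) for name, value in scores.items()}


# Hypothetical values for one metric across three systems.
route_miles_per_area = {"System A": 2.0, "System B": 4.0, "System C": 6.0}
print(min_max_scale(route_miles_per_area))
```

One design consequence worth keeping in mind: the scaled score depends on which systems are in the sample, so adding or removing a city can shift everyone else's score.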

Looking at this chart at first gave me a headache. It doesn’t make a lot of sense. I had expected to see clumps of points. I thought that places that did well on one metric would do well on others, and vice versa. You see this clumping a little bit with Baltimore, St. Louis and San Jose. For every measure, their score was on the lower end. But for places like Houston, which got a 1 on three of the measures and a 0 on another, the values are all across the scale. I wanted to represent this data in a different way, to better see the clumping for each city.

I decided to go with a box and whisker plot. The advantage of using this type of graph is it shows the distribution of values. Below is the new chart.


From this graph, one can easily determine that Baltimore, Pittsburgh, and San Jose are among the lesser performing, while Denver, Houston, Los Angeles, Phoenix and Seattle are doing much better. And of course we have a lot of locations that are in the middle.

A big part of this research was to determine which systems appear to be performing better than others. Now, I need to look into construction costs. I want to see if the investments in these systems have been worth it.

Efficiency Measures

Efficiency – every engineer’s favorite word. It’s perhaps the most important criterion to differentiate good engineering designs from bad ones.

How is efficiency measured in transit systems? One metric often used is the cost per vehicle revenue mile. A vehicle revenue mile (VRM) is the distance that a vehicle travels while in revenue service. In other words, if a train is in service collecting fares, the distance it travels counts towards the VRM. You can think of the amount of VRMs as the amount of service being provided. The cost per VRM is simply the operating costs divided by the number of VRMs. More efficient systems will be able to provide more service at a lower cost.

Another metric used is the cost per unlinked passenger trip, which can be thought of as the cost per rider.

With these two metrics in mind, I evaluated 17 cities to see which systems were more efficient than the others. I assessed these systems using 2014 data, which is the most recent data published by the National Transit Database (NTD). You can see the results in the chart below.


The vertical and horizontal lines represent the average values based on this sample. The average cost per VRM is $16.62 and the average cost per rider is $4.32. Cities in the bottom-left corner (highlighted in green) are below average on both metrics, and thus are considered efficient systems by comparison. Systems in the upper-right corner (indicated in red) are the exact opposite: they have costs that are above average on both metrics and are less efficient. Of course, these metrics don’t tell everything, but they serve as a good starting point to help separate successful systems from failures.
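The quadrant logic can be sketched as follows. The three systems and their figures are invented to exercise each branch; they are not NTD values, and the names `classify` and `example_systems` are assumptions for this illustration.

```python
from statistics import mean


def classify(systems):
    """Label each system by comparing its two cost metrics with the sample averages.

    `systems` maps a name to (operating cost, vehicle revenue miles, unlinked trips).
    """
    metrics = {
        name: (cost / vrm, cost / trips)  # (cost per VRM, cost per rider)
        for name, (cost, vrm, trips) in systems.items()
    }
    avg_per_vrm = mean(m[0] for m in metrics.values())
    avg_per_trip = mean(m[1] for m in metrics.values())

    labels = {}
    for name, (per_vrm, per_trip) in metrics.items():
        if per_vrm < avg_per_vrm and per_trip < avg_per_trip:
            labels[name] = "efficient"    # bottom-left: below average on both
        elif per_vrm > avg_per_vrm and per_trip > avg_per_trip:
            labels[name] = "inefficient"  # upper-right: above average on both
        else:
            labels[name] = "mixed"        # everything else
    return labels


# Invented inputs (NOT NTD data): (operating cost in $, VRM, unlinked trips).
example_systems = {
    "System A": (10_000_000, 1_000_000, 5_000_000),
    "System B": (30_000_000, 1_000_000, 3_000_000),
    "System C": (20_000_000, 1_000_000, 10_000_000),
}
print(classify(example_systems))
```

Note that the averages here are simple means across systems, which is one reasonable reading of "average values based on this sample".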

Moving beyond efficiency, another consideration is the productivity or effectiveness of a transit system. One way to look at this is by evaluating the number of unlinked passenger trips per VRM. This, in a way, shows how many people a system is moving. For this evaluation, I ranked the cities from the highest trips per VRM to the lowest. I also indicated the year that service began in each city. The hypothesis is that older light rail systems will do better than newer systems. Why? – Economics.

If we think of transit service like any other good/service, then there are certain places where there is a better market for it than others. You would build your first store in the best location to maximize profit. If you open a second store you would place it in the second best location, and so on. The theory is that the best cities in the United States for light rail transit built it already. The more recent projects from 2000 onward are in less desirable markets. If this is true, we should see the older systems at the top of the rankings and newer projects towards the bottom. The results are shown in the graph below.

The color scheme is as follows:
  • Systems opened on or before 1995: Green
  • Systems opened between 1996-2000: Blue
  • Systems opened between 2001-2005: Purple
  • Systems opened between 2006-2010: Orange

Immediately it seems that the old vs. new theory does not hold up. The best system is Houston which began light rail service in 2004. There appears to be a good mix of newer and older systems across the rankings. While this does not necessarily discount the hypothesis completely, it does suggest that the age of a system may not play as much of a role in its productivity and efficiency. Going forward I want to explore this some more and try to find out, if not age, what characteristics DO impact performance.


A Tale of Two Cities

An important performance measurement for almost anything is the amount of service provided. In transit, this is reflected in ridership. More people riding public transit is generally good news for transit agencies. Ridership is measured in unlinked passenger trips. Effectively we’re talking about the number of boardings. If one person goes on a trip from A to B by first getting on Bus 1 then transferring to Bus 2, that would count as two unlinked passenger trips.
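To make the counting rule concrete, here is a tiny sketch. The journey data and the function name are made up for illustration; each inner list holds the vehicles one person boards on a single journey.

```python
def unlinked_trips(journeys):
    """Count boardings: every vehicle boarded counts as one unlinked passenger trip."""
    return sum(len(legs) for legs in journeys)


# One rider transfers from Bus 1 to Bus 2; a second rider takes Bus 1 only.
journeys = [["Bus 1", "Bus 2"], ["Bus 1"]]

print("journeys:", len(journeys))
print("unlinked passenger trips:", unlinked_trips(journeys))
```

This is why systems with many transfers can report more unlinked trips than the number of people actually traveling.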

Ridership is crucial for many reasons. First, higher ridership can help transit agencies be more cost effective. 20 people riding the 8am bus is better than 5 people because the cost per rider is much lower. More importantly, many of the secondary benefits from public transit (such as reduced traffic congestion and economic development) are dependent on the ridership. If Chicago builds a new subway line but no one rides it, there will still be the same amount of traffic along that corridor.

This past week, I explored ridership trends for all 24 cities. I was able to find the data from the National Transit Database (NTD) and the American Public Transportation Association (APTA). The oldest data I could find were from 1995.

There were a handful of cities that grabbed my attention. I’ve decided to highlight two in this post: Salt Lake City and Minneapolis.

Salt Lake City


Here we have the ridership trends for Salt Lake City. For an easier analysis, I have only included motor bus and light rail as the modes of interest. Salt Lake City introduced their LR in 1999. Ridership grew at a reasonable pace. What’s interesting is that the ridership for bus has stayed relatively constant. This is a good sign. One fear of introducing a new LR system is that it will take riders away from the bus system and will not add any new riders to the overall transit system. We can see here that overall ridership grew, and it is reasonable to say that LR contributed to that growth.



In Minneapolis, we see a completely different story. When LR came on in 2004, there was some increase at first, but the line immediately reached its peak. Notice how bus ridership starts to decline with the introduction of LR. This is a case where LR does not appear to have added anything to the transit system. Now, the last two years saw great growth in LR. In the coming years, maybe this trend will continue?


One cannot simply look at ridership to get a complete story. Dozens of other factors are constantly at play. Population growth, changes in the amount of service provided and fare prices all impact ridership. A simple growth in ridership could mean nothing. But this serves as a good starting place. As I move forward this summer I will be looking at those other factors. What is really driving these ridership trends?

An Evaluation of Rail Transit in the US

Hello there! Thank you for taking the time to check out my blog. This summer, I am researching the fascinating world of transportation. I know what you’re saying, “transportation?” but believe me, it’s much more than traffic lights. Chances are, if you have ever been in a large urban setting you have used some sort of public transportation: a bus, a subway or maybe even a ferryboat. Transportation systems are a huge part of our lives.

The reality is, until teleportation is invented, we still need ways to move people and goods from point A to point B. And as an engineer, I’m always intrigued by how large systems work and, more importantly, whether they are effective. When I think about public transportation, I am curious whether the systems in place are benefiting the people using them.

For the next 8 weeks, I will explore the effectiveness of one specific type of public transportation, light rail transit (LRT). The Transportation Research Board officially defines LRT as, “A metropolitan electric railway system characterized by its ability to operate single cars or short trains along exclusive rights-of-way at ground level…in streets and to board and discharge passengers at track or car floor level.” Essentially, think of a traditional subway, but with shorter and less bulky trains, usually powered by overhead cables or electric tracks. Since the 1980s, cities throughout the US have been investing in LRT systems to meet growing transportation demand.

However, there have been concerns about whether LRT is the right alternative. Supporters of LRT say it will reduce traffic congestion.

Opponents argue that LRT does not bring more transit riders, but rather takes away riders from existing bus systems.

My goal this summer is to assess the claims made by both sides of the argument. I have laid out a list of 23 cities that have built LRT since the 1980s. I want to discover which cities had success with their light rail (meaning they got more overall riders) and which cities failed. The success or failure of an LRT system may be dependent on the characteristics of the host city (such as population density and median income).

As I progress through this project, I will be updating this blog with preliminary results, interesting findings, and maybe even a bad joke or two.

Thanks for reading and please stay tuned for more!