British Trains Are The Envy Of Europe… When It Comes To Data, Anyway

By James O'Malley

When it comes to trains, the conventional wisdom is that Britain doesn’t have much to be proud of. We may have invented the steam engine, but today we can look at France with its extensive high-speed TGV network, or Switzerland with its punctuality… and suddenly our trains instinctively feel a bit… shit, right?

This attitude, though, turns out to be misplaced. We’ve got something nailed that our European friends are still struggling with: data.

At least, that’s according to Mark Holt, the Chief Technical Officer at train ticketing site Trainline (it dropped the “The” a couple of years ago). As one of Britain’s train ticket retailers, Trainline is currently responsible for 125,000 train journeys every single day - and reportedly sells more than 100 tickets every three minutes. A couple of weeks ago Mark was speaking at London’s Tech Leaders Summit, and was kind enough to sit down with me afterwards to talk about some of the clever ways his company uses data, machine learning and so-called “artificial intelligence”.

“A lot of politicians in other countries look at [Britain’s open rail data] with envy and would like to be able to do something similar,” he explains.

Britain, it turns out, has rather a lot of open rail data: the movements of our trains, timetables and other data are more or less freely accessible to anyone who wants to take the data and do interesting things with it. Across the Channel, access is much more restricted - which is a pain for companies like Trainline.

“There is a thing called the MERITS timetable, which is the European timetable for European rail. That’s heavily restricted in terms of access at the moment,” Mark explains. “It doesn’t make any sense... it’s public data, collected by public organisations, and yet it’s not public.”

So are the train companies that operate it acting like a cartel, afraid to give Trainline and others access? “I never used the word ‘cartel’”, Mark laughs.

More seriously, Mark thinks the reason the rest of Europe isn’t doing much with the data is simply that it’s a low priority.

“If you’re operating trains, your number one concern is safety. The second one is probably punctuality, then it’s resource usage, then it’s union relations, then it’s sixteen other things before it comes to customer experience and selling tickets,” he says. “The most important thing in customer experience is ‘is it safe?’, so they’re doing the right thing - their heads are in the right place - for what they’re providing. But they're just not thinking about how could other people make use of this data? The more open API view of the world, ‘just push data out and let people play with it’, I think is just not a place they've reached yet.”

“You know who is the best at this in terms of open data? TfL,” says Mark, referring to Transport for London, which operates London’s public transport. “TfL rock!”

To the uninitiated this might seem like an odd organisation to sing the praises of, but data nerds will know that for several years now TfL has offered up mountains of its data, on things like train timetables and real-time bus data, for developers to access through its APIs.

“In terms of putting data out there and just going ‘play with it, see what you can do’, Citymapper is a great example. [...] It’s been huge for TfL from a business perspective. So many people consuming those APIs and doing interesting things with them, I think it’s fantastic.”
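For anyone curious what “playing with” TfL’s data looks like, here is a minimal Python sketch that reads live arrival predictions from TfL’s Unified API and lists the next few departures at a stop. The endpoint path and the field names (lineName, destinationName, timeToStation) reflect our understanding of the public feed and should be checked against TfL’s own documentation; the function names, the stop ID you pass in and the sample data are hypothetical placeholders.

```python
import json
from urllib.request import urlopen

# Base URL of TfL's Unified API. The path and field names used below are our
# understanding of the public arrivals feed; verify them against TfL's docs.
TFL_API = "https://api.tfl.gov.uk"

def fetch_arrivals(stop_id):
    """Download live arrival predictions for one stop (needs network access)."""
    with urlopen(f"{TFL_API}/StopPoint/{stop_id}/Arrivals") as resp:
        return json.load(resp)

def next_departures(predictions, limit=3):
    """Sort predictions by seconds until arrival and describe the soonest few."""
    soonest = sorted(predictions, key=lambda p: p["timeToStation"])[:limit]
    return [
        f'{p["lineName"]} to {p["destinationName"]} in {p["timeToStation"] // 60} min'
        for p in soonest
    ]

# Offline sample shaped like the feed; the values are invented for illustration,
# so the parsing can be tried without touching the network.
SAMPLE = [
    {"lineName": "73", "destinationName": "Stoke Newington", "timeToStation": 420},
    {"lineName": "390", "destinationName": "Archway", "timeToStation": 95},
]

print(next_departures(SAMPLE))  # ['390 to Archway in 1 min', '73 to Stoke Newington in 7 min']
```

fetch_arrivals needs a network connection and a real stop ID (TfL may also ask heavier users to register for an API key), while next_departures works on any list shaped like the sample, which is the point of an open API: the hard part is the idea, not getting hold of the data.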

Using Big Data To Beat Restrictions

That said, Britain is far from perfect. One restriction Trainline has to work around is that when a customer searches for train tickets, its system is only allowed to ask the National Rail ticketing system - RSP - for four ticket prices at a time, limiting the ticket options the customer can pick from.

To get around this restriction, Trainline uses “big data” style analysis. Because it has years of historic data on both ticket prices and transactions, it has recently launched a Price Prediction feature: you enter the journey you want to make, and Trainline’s systems predict, based on historic data, how the price of a given ticket will rise as the travel date approaches. This lets them factor in, for example, that tickets at Christmas time jump in price more quickly than tickets for a random Tuesday afternoon in September. Even if a user chooses not to buy tickets after doing a search, that search still returns ticketing data that Trainline can then build into its pricing models.
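Trainline hasn’t published how its Price Prediction model works, but the basic idea described above, learning from historic fares how prices rise as the travel date approaches, can be sketched in a few lines of Python. This is a deliberately simple stand-in (the median fare for each lead time, split into peak and off-peak seasons), not Trainline’s method, and the fares, the function names and the peak-season flag are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical historic fare observations for one route:
# (days_before_travel, is_peak_season, fare_in_pounds). Invented for illustration.
HISTORY = [
    (60, False, 20.0), (60, False, 22.0), (30, False, 24.0), (7, False, 35.0), (1, False, 50.0),
    (60, True, 25.0), (30, True, 38.0), (7, True, 65.0), (1, True, 90.0),
]

def build_price_curve(history):
    """Median fare for each (is_peak_season, days_before_travel) bucket."""
    buckets = defaultdict(list)
    for days_before, peak, fare in history:
        buckets[(peak, days_before)].append(fare)
    return {key: median(fares) for key, fares in buckets.items()}

def predict_fare(curve, days_before, peak):
    """Use the bucket whose lead time is closest to `days_before`.

    Raises ValueError (from min) if there is no history for that season.
    """
    lead_times = [d for (p, d) in curve if p == peak]
    nearest = min(lead_times, key=lambda d: abs(d - days_before))
    return curve[(peak, nearest)]

def expected_rise(curve, now_days, later_days, peak):
    """Predicted fare change (pounds) if you wait from `now_days` to `later_days` out."""
    return predict_fare(curve, later_days, peak) - predict_fare(curve, now_days, peak)

curve = build_price_curve(HISTORY)
print(predict_fare(curve, 28, peak=False))       # 24.0
print(expected_rise(curve, 30, 7, peak=False))   # 11.0
print(expected_rise(curve, 30, 7, peak=True))    # 27.0
```

The peak-season split is what captures the Christmas-versus-September effect: the same wait costs more when demand is higher. And because the curve is simply rebuilt from the history list, prices returned by searches that never turn into sales could be appended to HISTORY as extra observations.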

One other recent addition to the company’s armoury is the “BusyBot”. Rather than relying on an official data source, it simply uses the Trainline app to ask customers, once they have sat down on the train, how busy it is: are there spare seats around them? Amazingly, thousands of people do actually respond to this prompt. This crowdsourced data is then fed back to base, enabling Trainline to make travel recommendations based on how busy future trains are predicted to be.
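The BusyBot’s crowdsourcing loop can likewise be sketched, purely hypothetically, as averaging the answers passengers give for each service and recommending the quietest option. The three-point answer scale, the service IDs and the idea of treating past reports as the forecast for a service’s next run are our assumptions, not details Trainline has shared.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical answer scale: 0.0 = plenty of seats, 1.0 = standing room only.
BUSYNESS_SCORE = {"quiet": 0.0, "some_seats": 0.5, "standing": 1.0}

def aggregate_responses(responses):
    """Average the busyness score reported for each (hypothetical) service ID."""
    scores = defaultdict(list)
    for service_id, answer in responses:
        scores[service_id].append(BUSYNESS_SCORE[answer])
    return {service: mean(values) for service, values in scores.items()}

def recommend_quietest(candidate_services, busyness, default=0.5):
    """Pick the candidate with the lowest predicted busyness.

    Services nobody has reported on yet get the neutral `default` score.
    """
    return min(candidate_services, key=lambda s: busyness.get(s, default))

# Invented example responses, keyed by made-up service IDs.
responses = [
    ("0802", "standing"),
    ("0802", "standing"),
    ("0802", "some_seats"),
    ("0832", "quiet"),
    ("0832", "some_seats"),
]
busyness = aggregate_responses(responses)
print(recommend_quietest(["0802", "0832"], busyness))  # 0832
```

Averaging is the simplest possible choice; a real system would presumably also weight recent reports more heavily and account for day of the week, but the principle of turning thousands of small answers into a forecast is the same.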

This is all pretty smart. But wouldn’t it actually be bad for Trainline if National Rail opened up its data further? What if all of its ticketing data were completely transparent, so that anyone - Trainline, its competitors, or you and me - could access the full pricing dataset? Wouldn’t that undermine Trainline’s competitive advantage?

“No, that'd be fine, because it is about how you create customer experience on top of that. I think it’d be wonderful. I'd love to see some real innovation and experience in European rail, it'd be fantastic. Then that would drive us to do more, and then other people would do more and that would be good for everybody.”

So c’mon, European rail operators. It’s time to unlock your data. If you want to be the best, you need to, surprisingly, be more like Britain.