Individuals can be uniquely identified with just four points of location data, a study of mobile phone records shows.
Countless mobile applications make use of location data, and such information is increasingly used to tailor both services and advertisements to individual users.
But a study in Scientific Reports warns that human mobility patterns are unique identifiers, even when data are scarce.
It presents a formula to describe the trade-off between genuine anonymity and the “resolution” of location data.
The growing ubiquity of mobile phones and smartphone applications has ushered in an era in which the companies that operate and distribute them have access to tremendous amounts of user data, some of which is released publicly as “anonymised” or aggregated data sets.
These data are of extraordinary value to advertisers and service providers, but also, for example, to those who plan shopping centres and allocate emergency services, and to a new generation of social scientists.
Yet the spread and development of “location services” has outpaced the development of a clear understanding of how location data impact users’ privacy and anonymity.
For example, sat-nav manufacturers have long used location data from both mobile phones and the sat-navs themselves to improve traffic reporting, calculating how fast users are moving on a given stretch of road.
The data used in such calculations are “anonymised” – no actual mobile numbers or personal details are associated with the data.
But there are some glaring examples of how nominally anonymous data can be linked back to individuals. The most striking came in 2006, when AOL deliberately released a tranche of data containing 20 million anonymised web searches.
The New York Times did a little sleuthing in the data and was able to determine the identity of “searcher 4417749”.
Recent work has increasingly shown that human movement patterns, however random and unpredictable they seem, are actually very limited in scope and can act as a kind of fingerprint identifying whoever is doing the moving.
The new work details just how “low-resolution” these location data can be and still act as a unique identifier of individuals.
Researchers at the Massachusetts Institute of Technology (MIT) and the Catholic University of Louvain studied 15 months’ worth of mobile phone records for 1.5 million individuals.
The location of a given mobile phone can be determined from the antennas within each “cell” of the network, and the team used these locations as recorded hourly over the study period.
From these “mobility traces”, the paths followed by each phone, they found that just four locations, each with a time attached, were enough to uniquely identify most individual users.
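To make the idea concrete, here is a minimal Python sketch of how such a uniqueness test might work, assuming a simplified model in which each user's trace is a set of (antenna, hour) points. It draws p points at random from one user's trace and asks whether any other user's trace also contains all of them; if none does, those p points single the user out. The function names `is_unique` and `unicity`, the toy traces and the random-sampling scheme are illustrative assumptions, not the researchers' published code.

```python
import random

# Hypothetical model: a trace is the set of (antenna_id, hour) points at which
# a user's phone was seen.
Trace = set[tuple[int, int]]


def is_unique(user: str, traces: dict[str, Trace], p: int, rng: random.Random) -> bool:
    """Draw p random points from one user's trace and check whether any
    other user's trace also contains all of them."""
    points = rng.sample(sorted(traces[user]), p)  # sample() needs a sequence, not a set
    matches = [other for other, t in traces.items() if t.issuperset(points)]
    return matches == [user]  # only the user's own trace fits the points


def unicity(traces: dict[str, Trace], p: int, seed: int = 0) -> float:
    """Fraction of users (with at least p points) singled out by p random points."""
    rng = random.Random(seed)
    eligible = [u for u, t in traces.items() if len(t) >= p]
    if not eligible:
        return 0.0
    return sum(is_unique(u, traces, p, rng) for u in eligible) / len(eligible)


# Toy traces, made up for illustration: (antenna_id, hour) pairs.
traces = {
    "alice": {(1, 8), (2, 9), (3, 18), (1, 22)},
    "bob": {(1, 8), (2, 9), (4, 18), (1, 22)},
    "carol": {(5, 8), (2, 9), (3, 18), (6, 22)},
}

print(unicity(traces, p=4))  # -> 1.0
```

In this toy data each user's four points already differ from everyone else's, so all three come out unique at p = 4; on real data the interesting question is how quickly that fraction rises as p grows.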
“In the 1930s, it was shown that you need 12 points to uniquely identify and characterise a fingerprint,” said the study’s lead author Yves-Alexandre de Montjoye of MIT.
“What we did here is the exact same thing but with mobility traces. The way we move and the behaviour is so unique that four points are enough to identify 95% of people,” he told BBC News.
“We think this data is more available than people think. When you think about, for instance, wi-fi or any application you start on your phone, we call up the same kind of mobility data.
“When you share information, you look around you and feel like there are lots of people around – in the shopping centre or a tourist place – so you feel this isn’t sensitive information.”
The team went on to quantify how far the resolution of the data, the precision with which a location is known, would need to be reduced to offer a stronger guarantee of privacy.
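The article does not give the formula itself. As a hedged illustration of the resolution side of the trade-off only, the Python sketch below lowers the resolution of each trace by merging groups of v antennas and windows of h hours, then counts how many users still have a whole trace shared with nobody else. It assumes, hypothetically, that antenna IDs are integers with neighbouring cells numbered consecutively; a real analysis would group antennas geographically. The names `coarsen` and `fraction_unique` and the toy data are made up for illustration.

```python
from collections import Counter

# Hypothetical model: a trace is a set of (antenna_id, hour) points, with
# neighbouring antennas assumed to have consecutive integer IDs.
Trace = set[tuple[int, int]]


def coarsen(trace: Trace, v: int, h: int) -> frozenset[tuple[int, int]]:
    """Lower the resolution: merge antennas into groups of v and hours into windows of h."""
    return frozenset((antenna // v, hour // h) for antenna, hour in trace)


def fraction_unique(traces: dict[str, Trace], v: int, h: int) -> float:
    """Share of users whose whole coarsened trace matches no other user's."""
    if not traces:
        return 0.0
    coarse = {user: coarsen(t, v, h) for user, t in traces.items()}
    counts = Counter(coarse.values())  # how many users share each coarsened trace
    return sum(counts[c] == 1 for c in coarse.values()) / len(coarse)


# Toy traces, made up for illustration: (antenna_id, hour) pairs.
traces = {
    "alice": {(1, 8), (2, 9), (3, 18), (1, 22)},
    "bob": {(1, 8), (2, 9), (4, 18), (1, 22)},
    "carol": {(5, 8), (2, 9), (3, 18), (6, 22)},
}

print(fraction_unique(traces, v=1, h=1))   # full resolution -> 1.0
print(fraction_unique(traces, v=8, h=24))  # one region, one day -> 0.0
```

At full resolution all three toy traces are distinct, but merging every antenna into one region and every hour into one day makes them identical, so none is unique any more; the usefulness of the data falls in the same step, which is the trade-off the researchers describe.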
Co-author Cesar Hidalgo said that the data follow a natural mathematical pattern that could be used as an analytical guide as more location services and high-resolution data become available.
“The idea here is that there is a natural trade-off between the resolution at which you are capturing this information and anonymity, and that this trade-off is just by virtue of resolution and the uniqueness of the pattern,” he told BBC News.
“This is really fundamental in the sense that now we’re operating at high resolution, the trade-off is how useful the data are and if the data can be anonymised at all. A traffic forecasting service wouldn’t work if you had the data within a day; you need that within an hour, within minutes.”
Dr Hidalgo notes that additional information would still be needed to connect a mobility trace to an individual, but that users freely give away some of that information through geo-located tweets, location “check-ins” with applications such as Foursquare and so on.
But the authors say their purpose is to provide a mathematical link, a formula applicable to all mobility data, that quantifies the trade-off between anonymity and utility, and they hope the work sparks debate about how to weigh the benefits of this “Big Data” against individual privacy.
“We really don’t think that we should stop collecting or using this data – there’s way too much to gain for all of us – companies, scientists, and users,” said Mr de Montjoye.
“We’ve really tried hard to not frame this as a ‘Big Brother’ situation, as ‘we know everything about you’. But we show that even if there’s no name or email address it can still be personal data, so we need it to be treated accordingly.”
Delivered by The Daily Sheeple
Contributed by Jason Palmer of bbc.co.uk.