Location data: can they really be anonymous?

Author: Marta Sestelo, Technical Manager of Data Analytics & AI at Gradiant

In the next minute, thousands of applications will ask millions of people to reveal their location. For instance, users can check bus schedules, movie times or local information on their mobile phones, and all of these services depend on their current location. However, these data are extremely sensitive, so they must be processed with adequate safeguards to prevent any misuse of the data from compromising users' privacy.

Privacy: different types

The term “privacy” covers a wide range of concepts and definitions: bodily privacy, which means that your body is your own and concerns protection from physically invasive procedures such as genetic testing; territorial privacy, which concerns limits on intrusion into physical spaces such as workplaces or homes; communication privacy, which focuses on the security of communications such as email or WhatsApp messages; and information privacy, which deals with the rules governing the collection, processing and handling of personal data.

Under these definitions, location privacy can be seen as a special type of information privacy: the right of individuals to determine for themselves when, how and to what extent their location information is known and processed. In short, the central issue in location privacy is an individual's ability to control access to his or her current and past location information.

Dozens of companies that collect information about our location state that the collected data is anonymous and poses no privacy risk, because they do not associate it with any directly identifying information such as names, ID cards or email addresses. However, it is not that difficult to link real people to a set of dots (i.e. locations) on a map. Consider your daily routine: what is the probability that anyone else moves between your home and your office at the same times you do? Recent studies show that four randomly chosen spatio-temporal points are enough to uniquely characterize the movements of 95% of the users in a dataset, and that just two randomly chosen points still characterize more than 50% of them. Mobility traces are therefore, in general, highly unique, so a dataset cannot be considered anonymous merely because it contains only location data.
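To make the uniqueness argument concrete, the following is a minimal sketch of how such a measurement (sometimes called unicity) could be estimated: pick a user, assume an attacker knows p random points of that user's trace, and check whether any other trace also contains all of them. It assumes each trace is a set of hypothetical (cell, hour) points; the toy dataset and the `unicity` function are illustrative only and do not reproduce the methodology of any particular study.

```python
import random

# Hypothetical trace format: a set of (cell_id, hour) points per user.
Trace = set[tuple[str, int]]


def unicity(traces: dict[str, Trace], p: int, trials: int = 1000, seed: int = 0) -> float:
    """Estimate the fraction of users uniquely identified by p random points of their own trace."""
    rng = random.Random(seed)
    # Only users with at least p points can be sampled.
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # The p points the attacker is assumed to know about this user.
        known = rng.sample(sorted(traces[user]), p)
        # Count every trace in the dataset that contains all the known points.
        matches = sum(1 for trace in traces.values() if trace.issuperset(known))
        if matches == 1:  # only the target user fits: the points single them out
            unique += 1
    return unique / trials


# Toy data (hypothetical): three people sharing some homes, offices and a gym.
traces = {
    "alice": {("home_A", 8), ("office_X", 9), ("gym", 19)},
    "bob": {("home_B", 8), ("office_X", 9), ("bar", 21)},
    "carol": {("home_A", 8), ("office_Y", 9), ("gym", 19)},
}

print(unicity(traces, p=1))
print(unicity(traces, p=3))  # → 1.0 (every full trace here is unique)
```

Even in this tiny example, single points are often shared, while a few points combined quickly single out one person, which is the effect the studies above measure at scale.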

Data anonymization

In general, data anonymization is a procedure that protects the privacy of personal data by reducing the risk of identifying the people who appear in a dataset. Several techniques can be applied to anonymize data adequately. Geolocated datasets also need anonymization, even when names or ID cards are no longer present, but the techniques used differ from those applied to more general datasets. One example is cloaking, which has two approaches: spatial and temporal. In spatial cloaking, the precision of an individual's location is adjusted according to the number of other individuals within the same quadrant, while in temporal cloaking, timestamps are coarsened so that a time interval is reported instead of a single point in time.
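The two cloaking approaches can be sketched as follows, under stated assumptions. For spatial cloaking, the sketch assumes a quadtree-style subdivision: starting from the whole area, it keeps splitting into quadrants while the quadrant containing the user still holds at least k people, and reports the last quadrant that did. For temporal cloaking, it simply replaces an exact timestamp (in seconds) with the fixed-width interval containing it. The names `Box`, `spatial_cloak`, `temporal_cloak` and the parameters `k` and `interval` are hypothetical, and this is not a description of the INFINITECH tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def quadrant_of(self, x: float, y: float) -> "Box":
        # Split the box into four equal quadrants and return the one holding (x, y).
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        nx0, nx1 = (self.x0, mx) if x < mx else (mx, self.x1)
        ny0, ny1 = (self.y0, my) if y < my else (my, self.y1)
        return Box(nx0, ny0, nx1, ny1)


def spatial_cloak(user: tuple[float, float], others: list[tuple[float, float]],
                  area: Box, k: int, max_depth: int = 20) -> Box:
    """Return the smallest quadrant around `user` containing at least k people (user included).

    If the whole area already holds fewer than k people, the whole area is returned.
    """
    box = area
    for _ in range(max_depth):
        child = box.quadrant_of(*user)
        count = 1 + sum(1 for p in others if child.contains(*p))
        if count < k:
            break  # refining further would expose the user among fewer than k people
        box = child
    return box


def temporal_cloak(timestamp: int, interval: int) -> tuple[int, int]:
    """Replace an exact timestamp (seconds) with the interval [start, end) containing it."""
    start = timestamp - timestamp % interval
    return start, start + interval


area = Box(0.0, 0.0, 8.0, 8.0)
others = [(1.0, 1.0), (1.5, 0.5), (6.0, 6.0)]
print(spatial_cloak((0.5, 0.5), others, area, k=3))  # → Box(x0=0.0, y0=0.0, x1=2.0, y1=2.0)
print(temporal_cloak(1_000_000, 3600))  # → (997200, 1000800)
```

A larger k or a wider interval gives stronger protection at the cost of less precise data, which is the usual trade-off when choosing these parameters.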

Gradiant is currently working on the H2020 project INFINITECH, and among the solutions the project will provide, this technology research center will develop an anonymization tool that supports location data.