“What would you think if somebody showed up at your door saying: ‘Hey, I have your complete browsing history: every day, every hour, every minute, every click you did on the web for the last month’? How would you think we got it: some shady hacker? No. It was much easier: you can just buy it.”
This was the disturbing introduction journalist Svea Eckert gave when presenting her new research on privacy on the internet. In recent months, Eckert has teamed up with data scientist Andreas Dewes to find out how easy it is to strip away the anonymity the internet supposedly guarantees us. The results are remarkable.
These two German researchers didn’t just discover how easy it is to obtain anonymous browsing data for over three million German citizens. They also found out how easy it is to “de-anonymize” that data. For example, they uncovered the pornographic preferences of a judge and the medication used by a member of the German parliament.
The investigative pair revealed their findings at Def Con, a hacking conference held in Las Vegas. They had obtained a database of three billion URLs that three million German users had visited over the previous month. Some of these users had visited only a few dozen pages, but others had visited hundreds of thousands. For this second group, the journalist and the scientist had an entire online life in their hands. From there, they could glean a great deal about these users’ lifestyles, outlooks…and wrongdoing.
How did they get this info? They set up a fake marketing company, complete with an official website, a LinkedIn page for its supposed CEO and even a job board. On the website, they put generic photos and “typical marketing buzzwords” claiming they had developed a machine-learning algorithm that could make marketing products easier, provided it was fed large amounts of data.
“We wrote and called nearly a hundred companies, and asked if we could have the raw data, the clickstream from people’s lives.” What slowed them down was not companies refusing to admit they had such data, but the fact that many of those companies specialized only in user data from the U.S. and the U.K. The journalist and the scientist wanted user data from Germany.
They got the data they needed for free from a data broker.
A data broker tracks what users do online: it accumulates user data, analyzes it and sells the conclusions to companies.
This data broker helped them because he fell into their trap: he believed that this algorithm existed, and he wanted to try it out.
The data broker offered them supposedly anonymous data on German users. But the journalist and the scientist managed to “de-anonymize” some of this data and identify the specific users behind it.
At the conference, the investigative pair described a few methods for finding users in anonymous data packs.
For example, using the time stamps on the visited pages, they assembled lists of URLs that they knew were related to each other because they belonged to the same user. In other words, they reconstructed somebody’s browsing history.
That’s all fine and dandy, but how did they know the name of the person who had visited these sites? By looking for URLs that revealed social media usernames. For example, if a user has visited the analytics section of his or her Twitter account, a URL containing the Twitter username will show up in that browsing history. Bingo! This way, the researchers knew who the list of related URLs belonged to. Game over for this user’s anonymity.
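The article does not describe the format of the broker’s data, so the following is only a minimal Python sketch under an assumption: each record (a hypothetical `Click`) carries a pseudonymous identifier, a time stamp and a URL. Grouping the records by identifier and sorting each group by time turns a pile of clicks into one person’s browsing history.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Click(NamedTuple):
    pseudonym: str  # hypothetical anonymous user ID; the real data format is not described
    timestamp: int  # Unix time of the page visit
    url: str


def build_histories(clicks: Iterable[Click]) -> Dict[str, List[str]]:
    """Group clicks by pseudonym and return each user's URLs in chronological order."""
    groups: Dict[str, List[Click]] = defaultdict(list)
    for click in clicks:
        groups[click.pseudonym].append(click)
    return {
        pid: [c.url for c in sorted(group, key=lambda c: c.timestamp)]
        for pid, group in groups.items()
    }


clicks = [
    Click("u1", 20, "https://b.example/"),
    Click("u1", 10, "https://a.example/"),
    Click("u2", 5, "https://c.example/"),
]
print(build_histories(clicks))
```

Nothing in such a history names the user yet; that is the job of the next step.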
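As an illustration of the Twitter trick, here is a small Python sketch that scans a browsing history for analytics URLs and pulls out the username. The URL shape `https://analytics.twitter.com/user/<username>/...` is an assumption made for this example, not a detail confirmed by the researchers, and the function name `twitter_usernames` is hypothetical.

```python
import re
from typing import Iterable, Set
from urllib.parse import urlparse

# Assumed path shape: /user/<username>/...; Twitter handles are 1-15 letters, digits or underscores.
ANALYTICS_PATH = re.compile(r"^/user/([A-Za-z0-9_]{1,15})(?:/|$)")


def twitter_usernames(history: Iterable[str]) -> Set[str]:
    """Return every username that appears in an (assumed) Twitter analytics URL."""
    found: Set[str] = set()
    for url in history:
        parts = urlparse(url)
        if parts.hostname != "analytics.twitter.com":
            continue  # only analytics pages are assumed to embed the username
        match = ANALYTICS_PATH.match(parts.path)
        if match:
            found.add(match.group(1))
    return found


history = [
    "https://news.example/politics",
    "https://analytics.twitter.com/user/example_user/home",
]
print(twitter_usernames(history))
```

A single such URL is enough to attach a name to an otherwise anonymous history.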
In other cases, the researchers managed to identify somebody with just 10 URLs. This involved probabilities and comparison work. Think of yourself as a user. How many people in your country work at the same company as you, have an account at the same bank, share your hobbies, read the same favorite site, have the same company telephone number…? Each URL that reveals one of these details narrows the field, and together they add up to the equivalent of a “digital fingerprint” that very few people share.
Just as investigators in detective series like CSI compare unidentified fingerprints against public records to find a suspect, these researchers compared the URLs with information anybody can see, such as social media accounts (which show where somebody works or what their hobbies are), public YouTube playlists…and in this way they found specific users.
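The “fingerprint” idea can be sketched as a simple narrowing process, assuming a hypothetical table `public_links` that maps a URL to the people publicly connected to it (for example through a social media profile or a public playlist). Intersecting those sets across the URLs in one history shrinks the pool of candidates; if only one person is left, that history is identified. This is a simplified stand-in for the researchers’ probabilistic comparison, not their actual method.

```python
from typing import Dict, Iterable, Optional, Set


def narrow_down(history: Iterable[str], public_links: Dict[str, Set[str]]) -> Set[str]:
    """Intersect the sets of people publicly linked to each visited URL."""
    candidates: Optional[Set[str]] = None
    for url in history:
        people = public_links.get(url)
        if people is None:
            continue  # this URL cannot be checked against public data
        candidates = set(people) if candidates is None else candidates & people
    return candidates or set()  # empty if no URL matched or nobody fits them all


# Hypothetical public data: who is visibly linked to each page.
public_links = {
    "https://bank-a.example/login": {"alice", "bob", "carol"},
    "https://company-x.example/intranet": {"alice", "bob"},
    "https://video.example/playlist/123": {"alice"},
}
history = [
    "https://news.example/",
    "https://bank-a.example/login",
    "https://company-x.example/intranet",
    "https://video.example/playlist/123",
]
print(narrow_down(history, public_links))
```

Each additional URL can only shrink the candidate set, which is why a handful of revealing URLs can be enough.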
And where did the info offered by the data broker come from? From various plugins that we install in our browsers. The most ironic culprit is a tool called Web of Trust, which is supposed to protect its users while they browse. This plugin changed its privacy policy so that it could sell user data…but since hardly anybody reads the fine print, its users didn’t even notice.
In conclusion: “Anonymous browsing is nearly impossible,” the researchers stated. They’re not saying that just anybody can find out your browsing data. It’s not that easy to create an imaginary company, contact data brokers and then analyze three billion URLs. But it is feasible and, once again, it shows that online we’re less protected than we think.
Source: The Guardian