Read Online Data and Goliath by Bruce Schneier - Free Book Online

Authors: Bruce Schneier

started all this, and police arrested w0rmer aka Ochoa.
    Maintaining Internet anonymity against a ubiquitous surveillor is nearly impossible.
     If you forget even once to enable your protections, or click on the wrong link, or
     type the wrong thing, you’ve permanently attached your name to whatever anonymous
     provider you’re using. The level of operational security required to maintain privacy
     and anonymity in the face of a focused and determined investigation is beyond the
     resources of even trained government agents. Even a team of highly trained Israeli
     assassins was quickly identified in Dubai, based on surveillance camera footage around
     the city.
    The same is true for large sets of anonymous data. We might naïvely think that there
     are so many of us that it’s easy to hide in the sea of data. Or that most of our data
     is anonymous. That’s not true. Most techniques for anonymizing data don’t work, and
     the data can be de-anonymized with surprisingly little information.
    In 2006, AOL released three months of search data for 657,000 users: 20 million searches
     in all. The idea was that it would be useful for researchers; to protect people’s
     identity, they replaced names with numbers. So, for example, Bruce Schneier might
     be 608429. They were surprised when researchers were able to attach names to numbers
     by correlating different items in individuals’ search history.
    In 2008, Netflix published 10 million movie rankings by 500,000 anonymized customers,
     as part of a challenge for people to come up with better recommendation systems than
     the one the company was using at that time. Researchers were able to de-anonymize
     people by comparing rankings and time stamps with public rankings and time stamps
     in the Internet Movie Database.
    These might seem like special cases, but correlation opportunities pop up more frequently
     than you might think. Someone with access to an anonymous data set of telephone records,
     for example, might partially de-anonymize it by correlating it with a catalog merchant’s
     telephone order database. Or Amazon’s online book reviews could be the key to partially
     de-anonymizing a database of credit card purchase details.
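    The correlation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the pseudonyms, names, titles, and dates are invented, and the function `link` and its `min_overlap` threshold are hypothetical, not the method any of the researchers actually used. It matches each public profile to the one pseudonymous record that shares enough (item, date) pairs with it.

```python
# Hypothetical toy data: a pseudonymized release and a public, named data set.
anonymized = {
    "user_0001": {("Movie A", "2005-03-01"), ("Movie B", "2005-03-04"), ("Movie C", "2005-04-10")},
    "user_0002": {("Movie A", "2005-03-02"), ("Movie D", "2005-05-20")},
}
public = {
    "alice": {("Movie B", "2005-03-04"), ("Movie C", "2005-04-10")},
    "bob": {("Movie D", "2005-05-20")},
}


def link(anonymized, public, min_overlap=2):
    """Map each public name to a pseudonym when exactly one pseudonymous
    record shares at least `min_overlap` (item, date) pairs with it."""
    matches = {}
    for name, known in public.items():
        candidates = [
            pid for pid, items in anonymized.items()
            if len(items & known) >= min_overlap
        ]
        # Only claim a match when it is unambiguous.
        if len(candidates) == 1:
            matches[name] = candidates[0]
    return matches


print(link(anonymized, public))
```

    In this toy data, two shared entries are enough to tie one name to one pseudonym; lowering `min_overlap` links more people but would produce more false matches on realistic data.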
    Using public anonymous data from the 1990 census, computer scientist Latanya Sweeney
     found that 87% of the population in the United States, 216 million of 248 million
     people, could likely be uniquely identified by their five-digit ZIP code combined
     with their gender and date of birth. For about half, just a city, town, or municipality
     name was sufficient. Other researchers reported similar results using 2000 census
     data.
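    The kind of uniqueness measurement described above can be sketched as follows. This is a minimal illustration, not Sweeney's actual procedure: the records are invented, and the function name `unique_fraction` and the field names are assumptions. It counts how many records are the only one carrying their particular combination of attributes.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the share of records whose combination of `keys` is unique."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(records)


# Hypothetical toy records; real analyses use census-scale data.
records = [
    {"zip": "02138", "gender": "F", "dob": "1960-07-31"},
    {"zip": "02138", "gender": "F", "dob": "1960-07-31"},
    {"zip": "02138", "gender": "M", "dob": "1971-01-15"},
    {"zip": "02139", "gender": "F", "dob": "1985-11-02"},
]

print(unique_fraction(records, ["zip", "gender", "dob"]))
print(unique_fraction(records, ["zip"]))
```

    Even in this tiny sample, combining three fields singles out more records than ZIP code alone, which is the effect the census figures illustrate at scale.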
    Google, with its database of users’ Internet searches, could de-anonymize a public
     database of Internet purchases, or zero in on searches of medical terms to de-anonymize
     a public health database. Merchants who maintain detailed customer and purchase information
     could use their data to partially de-anonymize any large search engine’s search data.
     A data broker holding databases of several companies might be able to de-anonymize
     most of the records in those databases.
    Researchers have been able to identify people from their anonymous DNA by comparing
     the data with information from genealogy sites and other sources. Even something like
     Alfred Kinsey’s sex research data from the 1930s and 1940s isn’t safe. Kinsey took
     great pains to preserve the anonymity of his subjects, but in 2013, researcher Raquel
     Hill was able to identify 97% of them.
    It’s counterintuitive, but it takes less data to uniquely identify us than we think.
     Even though we’re all pretty typical, we’re nonetheless distinctive. It turns out
     that if you eliminate the top 100 movies everyone watches, our movie-watching habits
     are all pretty individual. This is also true for our book-reading habits, our Internet-shopping
     habits, our telephone habits, and our web-searching habits. We can be uniquely identified
     by