Anonymized data sets are a joke. And, as a newly published study shows, the joke just so happens to be on you.
From your credit card purchases to your medical records to your online browsing history, companies are sharing and selling so-called de-identified data sets containing a record of your every move. The information is supposedly stripped of any specific details, like your name, that would tie it directly back to you. However, truly anonymizing your personal data is a lot harder than you might think.
So finds a study published today in the journal Nature Communications. Researchers determined that, using their model, "99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes."
While 15 demographic attributes may sound like a lot of data to have on one person, the study puts this number into perspective.
"Modern datasets contain a large number of points per individuals," write the authors. "For instance, the data broker Experian sold [data science and analytics company] Alteryx access to a de-identified dataset containing 248 attributes per household for 120M Americans."
That anonymized data sets can be de-anonymized isn't itself news. In 2017, researchers at the DEF CON hacking conference demonstrated how they were able to legally and freely acquire the apparently anonymous browsing history of 3 million Germans and then quickly de-anonymize portions of it. The researchers were able to uncover, for example, the porn habits of a specific German judge.
Which, ouch.
This new study demonstrates just how little data is actually needed to pinpoint specific people from otherwise sparse data sets. "[Few] attributes are often sufficient to re-identify with high confidence individuals in heavily incomplete datasets," the authors note.
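To see why a handful of attributes goes such a long way, here is a toy sketch in Python. The `RECORDS` table and the `fraction_unique` helper are hypothetical, made up purely for illustration, and this is not the researchers' model, which estimates uniqueness statistically rather than by counting exact matches. It simply measures what share of rows become one of a kind as more attributes are combined.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birth_year, sex, car_make).
# None of this comes from the study; it only illustrates the effect.
RECORDS = [
    ("10001", 1985, "F", "Toyota"),
    ("10001", 1985, "F", "Honda"),
    ("10001", 1990, "M", "Toyota"),
    ("10002", 1985, "F", "Toyota"),
    ("10002", 1985, "M", "Ford"),
    ("10001", 1985, "M", "Ford"),
]


def fraction_unique(records, num_attrs):
    """Share of records whose first `num_attrs` attributes match no other record."""
    keys = [record[:num_attrs] for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Combine attributes one at a time and watch uniqueness climb.
for n in range(1, 5):
    print(f"{n} attribute(s): {fraction_unique(RECORDS, n):.0%} of records unique")
```

In this toy table, ZIP code alone singles out nobody, but once birth year, sex and car make are added, every record is unique, meaning anyone who knows those four facts about a person could pick out their row.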
To drive that point home, Verdict reports that the researchers released an online tool that lets you see just how easy it would be to identify you in a supposedly anonymized data set.
Spoiler: The results are as troubling as you'd expect — something to keep in mind the next time a company's fine print warns that it "might share your anonymous data with third parties."