Besides legal stuff, it's clear that AOL didn't follow what are standard operating procedures for preserving respondent privacy in the social sciences. They thought they were, by not directly releasing account holders' names and accounts, but they were so eager to be helpful that they didn't sanitize the data well. It's always a concern, but one that is routinely dealt with. OK, I gotta stop myself or else I'll go on another rant about computer scientists not knowing any social science -- in this case, methods.

Barry
_____________________________________________________________________
Barry Wellman Professor of Sociology NetLab Director wellman at chass.utoronto.ca http://www.chass.utoronto.ca/~wellman
Centre for Urban & Community Studies University of Toronto 455 Spadina Avenue Toronto Canada M5S 2G8 fax:+1-416-978-7162
You're invited to visit & contribute to the new version of "Updating Cybertimes: It's Time to Bring Our Culture into Cyberspace" http://chass.utoronto.ca/oldnew/cybertimes.php
_____________________________________________________________________
I'm wondering what kind of sanitization method(s) would have worked in this case? Ericka Menchen Trevino Graduate Student Department of Communication University of Illinois at Chicago On 8/29/06, Barry Wellman <wellman@chass.utoronto.ca> wrote:
Besides legal stuff, it's clear that AOL didn't follow what are standard operating procedures for preserving respondent privacy in the social sciences. They thought they were, by not directly releasing account holders' names and accounts, but they were so eager to be helpful that they didn't sanitize the data well. It's always a concern, but one that is routinely dealt with. OK, I gotta stop myself or else I'll go on another rant about computer scientists not knowing any social science -- in this case, methods. Barry
_______________________________________________ The air-l@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
-- http://blog.erickamenchen.net AIM & Skype: erickaakcire
Subject: Re: [Air-l] AOL and research ethics
I'm wondering what kind of sanitization method(s) would have worked in this case?
Data sanitization is very difficult to do, and just about equally difficult to audit or evaluate. It can get to the point where even *single inferences* that can be made from the data are damaging - for example, a user who searches for several (uncommon) proper names *plus* a string of terms regarding some pretty unsavory pornography. How hard would it be, in this case, to trace out their social network and figure out who the person really is? I would guess that you might be able to sanitize this sort of data by removing all of the nouns from it. *smirk* --elijah
Subject: Re: [Air-l] AOL and research ethics
I'm wondering what kind of sanitization method(s) would have worked in this case?
Actually, I think the major issue here was having the search histories linked. Not coincidentally, this is what made the data set most interesting. Other engines have made single searches available to researchers, or to the world. For some time AltaVista provided lists of the most recent searches on a web page, and other engines did the same. Google has a ticker board with running searches at their headquarters, I've heard, and they've also released raw search data to researchers, with no user identification at all. Of course, that data could potentially be revealing as well: if, for example, someone searches for a name and social security number in the same query. But the likelihood of intrusion increases with each new dimension you add to the data. Particularly because the searches were linked internally into per-user histories (and these could then connect to external sources of information), this was a fairly intrusive set of data.

Best,
Alex

--
//
// This email is
// [X] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais
// Social Architect
// http://alex.halavais.net
//
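To make the two points raised in this thread concrete (linked histories, and rare identifying queries), here is a minimal Python sketch, not a description of what AOL or any search engine actually did. The record layout `(user_id, query)`, the sample data, the function name `sanitize`, and the threshold `min_users` are all hypothetical, chosen only for illustration. The sketch keeps a query only if several distinct users issued it, then drops the user identifier and shuffles the result so that queries can no longer be grouped into one person's history. It would not guarantee anonymity, and it throws away exactly the linkage that made the data set interesting to researchers.

```python
import random
from collections import defaultdict

# Hypothetical log records: (user_id, query). Made-up data, assumed layout.
LOG = [
    (101, "cheap flights toronto"),
    (101, "jane q. example"),           # rare proper name: identifying
    (101, "example street landlord"),   # rare query from the same user
    (202, "cheap flights toronto"),
    (202, "weather chicago"),
    (303, "cheap flights toronto"),
    (303, "weather chicago"),
]

def sanitize(records, min_users=2, seed=0):
    """Return unlinked queries that at least `min_users` distinct users issued."""
    # Count distinct users per query, so one user repeating a rare
    # query many times does not make it look common.
    users_per_query = defaultdict(set)
    for user_id, query in records:
        users_per_query[query].add(user_id)

    # Drop rare queries and drop the user identifier entirely.
    kept = [q for _uid, q in records if len(users_per_query[q]) >= min_users]

    # Shuffle so record order cannot be used to regroup queries by user.
    random.Random(seed).shuffle(kept)
    return kept

released = sanitize(LOG)
print(released)
```

In this toy log, the two queries issued only by user 101 are suppressed, and the remaining queries carry no identifier. A single released query could still be revealing on its own, as noted above, so a frequency threshold is at best one layer of protection.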
participants (4)
- Alex Halavais
- Barry Wellman
- elw@stderr.org
- Ericka Menchen Trevino