Subject: Re: [Air-l] AOL and research ethics
I'm wondering what kind of sanitization method(s) would have worked in this case?
Actually, I think the major issue here was that the search histories were linked. Not coincidentally, this is also what made the data set most interesting.

Other engines have made individual searches available to researchers, or to the world. For some time AltaVista provided a web page listing the most recent searches, and other engines did the same. Google has a tickerboard of running searches at its headquarters, I've heard, and it has also released raw search data to researchers with no user identification at all. Of course, that data could potentially be revealing as well: for example, if someone searches for a name and a Social Security number in the same query. But the likelihood of intrusion increases with each new dimension you add to the data. Particularly because the searches were linked internally into individual histories (which could then be connected to external sources of information), this was a fairly intrusive data set.

Best,
Alex

--
//
// This email is
// [X] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais
// Social Architect
// http://alex.halavais.net
//