AoL Search Data Redux

24 Oct 2008

With an eye to nostalgia, I (re)raise the following question: are the
data that were provided by the ill-fated AoL data release fair game
for research? I could *really* use them for a current project; enough
so that I will do the research. I may just never actually tell anyone
about it. Which strikes me as somehow deeply wrong.

For those new to the story, Eszter Hargittai tackled some of the issues here:

http://crookedtimber.org/2006/08/07/the-aol-data-mess/

I did a bit here:

http://alex.halavais.net/aol-data/

A piece by Nate Anderson appeared in Ars Technica:

http://arstechnica.com/news.ars/post/20060823-7578.html

And there were about three dozen messages to AIR-L soon after the release.

Now a couple of years have passed, and I have part of a research
question that can be effectively answered through these data. Without
these data (or similar), answering the question is prohibitively
expensive and requires an approach less likely to yield valid
information.

Although the data were used, in part, in a presentation I saw in
Copenhagen, I haven't seen any published articles making use of the
released data, with the exception of articles on detecting inferences
in data or on the privacy concerns of search engines.

My personal view is that these data are widely and publicly available,
and that their release has already done its harm. My use of the data
would yield no personally identifiable results. Given that my research
does no harm to the subjects, I think a case may be made that the data
are therefore a reasonable target of investigation.

Please tell me why I am wrong, and why my IRB will say I am wrong.

Alex

-- 

//
// This email is
// [X] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais, cyberflâneur
// http://alex.halavais.net
//

Alex Halavais