Hi!

As a researcher who has used search engine transaction logs in research projects for nearly a decade, I think the concerns about the AOL data release are out of proportion to reality. Note, from the example in the NYT story, that even with three months of query data, including geographical data, the reporter wasn't sure that this was the person. (BTW, the reporter, Saul Hansell, obviously didn't mind publishing the lady's queries for the entire world to see, along with her name. I hope he adequately explained to her the ramifications of what she was agreeing to.) It is VERY difficult to identify a particular searcher using query terms alone, which is why researchers have been struggling with personalization for nearly two decades.

In the DOJ vs. Google case, which is mentioned in the story, Google had to provide queries to the DOJ (a statistically significant sample of about 5,000 rather than the larger number the DOJ was asking for). The privacy concerns were weighed against other factors, which is what we, as researchers, should be doing here. There is no other way to get real-world interaction data from a significant sample of Web users unless the search engine companies provide it to academic researchers. Many search engine companies provide, and have provided, this type of data (including Excite, AltaVista, AlltheWeb, Lycos, AOL, Yahoo!, MSN, and Google, among others; they all do it). They post it on their Web pages, provide it to researchers or the government, or sell it to commercial research companies.

Are there potential privacy concerns with such data releases? Yes. Are there potentially great benefits? Yes. A good road ahead for the research community is to work on ways to preserve privacy in such data releases and to provide a balanced voice in these debates.
Best,
Jim

**************************************
Jim Jansen
Email: jjansen@acm.org
URL: http://ist.psu.edu/faculty_pages/jjansen/
Blog: http://jimjansen.blogspot.com/
Phone: 814-865-6459
Fax: 814-865-6426
College of Information Sciences and Technology
The Pennsylvania State University
329F Information Sciences and Technology Building
University Park, PA, 16802, USA
**************************************

________________________________
From: air-l-bounces@listserv.aoir.org on behalf of Jennifer Stromer-Galley
Sent: Wed 8/9/2006 9:48 AM
To: air-l@listserv.aoir.org
Subject: Re: [Air-l] AOL Releases Search Logs from 500,000 Users

The New York Times has an article online today about the AOL release of data. You can find the article at http://www.nytimes.com/2006/08/09/technology/09aol.html (NYTimes.com requires registration).

The article highlights one woman, a 60-year-old from Georgia, whose searches were captured in the three-month period. Much is revealed about her in her search queries . . . It also discusses the release of the data. AOL spokespeople are saying they did not authorize the release; they say an employee acted hastily and without authorization in releasing it. The article also reports that programmers have set up Web sites that let people search the data, which is leading people to find shocking or amusing search histories. Eeeek.

Best,
~Jenny
--
Assistant Professor
Department of Communication, SS 340
University at Albany, SUNY
1400 Washington Ave.
Albany, NY 12222
518-442-4873
jstromer@albany.edu
http://www.albany.edu/~jstromer

_______________________________________________
The air-l@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org