There are many good reasons to anonymize Tweets during the research process (reducing annotator bias, for example) and definitely during the presentation of results (particularly for controversial Tweets). Indeed, the visual presentation of sensational individual Tweets is something ethicists and IRBs might caution against, despite the public nature of the platform. Going further, you have to consider the ethical obligation not to publicly display deleted Tweets, though I don't think this would extend to public figures, like @realdonaldtrump.

Having said that, Tweets have considerably less "meaning" when you hide the Twitter handles. Context is lost, so there is a big trade-off.

DiscoverText has an automated redaction capability that can remove or obscure all the Twitter handles at once. Here is an example of an archive consisting of replies to a Tweet status ID where the start of every Tweet is a Twitter handle: https://drive.google.com/file/d/0B1iEonkdfwKua0lmWndNZTkyWXM/view?usp=sharin...

This (underutilized) functionality is part of a Freedom of Information Act (FOIA) capability including a "dirty word tool" that members of this list helped to create about 5 years ago.

If any member of this list would like to experiment with the redaction tools, just shoot us an email (info@texifter.com) and I will put you in a special sponsored sandbox for redaction experiments, give you a web demo, and we will provide complimentary Gnip and Search API access to play with.

~Stu

On Fri, Apr 14, 2017 at 7:07 AM, Bernhard Rieder <berno.rieder@gmail.com> wrote:
On 14 Apr 2017, at 7:47, Maurice Vergeer <m.vergeer@maw.ru.nl> wrote:
Still, anonymizing is fairly easy when you have the data in a statistical program such as SPSS, R or even Excel: replace the user handles with a unique number (from 1 to N), then remove the user handles from the dataset. Still, I would always advise keeping a secure file with both key variables, the user handles and the new identifier, for future research.
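For anyone working outside SPSS, R or Excel, the same idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `pseudonymize`, the field names `handle` and `user_id`, and the choice to lowercase handles before matching are all made up for the example, not taken from any particular tool.

```python
def pseudonymize(rows, handle_field="handle"):
    """Replace each user handle with a number from 1 to N.

    Numbers are assigned in order of first appearance. Returns the
    anonymized rows plus the handle -> number key, which is the part
    to store in a separate, secure file.
    """
    key = {}          # handle -> numeric identifier (the correspondence table)
    anonymized = []
    for row in rows:
        # Assumption: handles are compared case-insensitively.
        handle = row[handle_field].lower()
        user_id = key.setdefault(handle, len(key) + 1)
        # Copy every field except the handle, then add the new identifier.
        clean = {k: v for k, v in row.items() if k != handle_field}
        clean["user_id"] = user_id
        anonymized.append(clean)
    return anonymized, key


# Hypothetical sample data for illustration only.
sample = [
    {"handle": "@Alice", "text": "first tweet"},
    {"handle": "@bob", "text": "second tweet"},
    {"handle": "@alice", "text": "third tweet"},
]
anonymized_rows, key_table = pseudonymize(sample)
```

The `anonymized_rows` list can be shared with annotators, while `key_table` stays in the secure file so that later research can still link back to the original accounts.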
If you hash the user handle, e.g. with SHA-1 or similar (which is even possible in Excel with a small formula), there is no need to keep a correspondence file, because hashing a string will always yield the same hash, while making reversal virtually impossible (i.e. you cannot get the handle from the hash).
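A minimal Python sketch of the hashing approach, using the standard `hashlib` module, might look like the following. The function names and the normalization step (stripping a leading "@" and lowercasing) are assumptions made for the example. One caveat worth keeping in mind: since handles are public, someone holding the hashes could hash a list of candidate handles and compare, so the sketch also includes an optional keyed variant (HMAC with a secret key), which is an addition for illustration and not part of the suggestion above.

```python
import hashlib
import hmac


def normalize(handle):
    # Assumption: "@RiederB" and "riederb" should map to the same hash.
    return handle.strip().lstrip("@").lower()


def hash_handle(handle):
    """Plain SHA-1 of the normalized handle: same input, same hash."""
    return hashlib.sha1(normalize(handle).encode("utf-8")).hexdigest()


def keyed_hash_handle(handle, secret_key):
    """Keyed variant (HMAC-SHA256); secret_key is a bytes value kept private."""
    message = normalize(handle).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# The same handle always yields the same pseudonym, so no lookup file is needed.
pseudonym_a = hash_handle("@RiederB")
pseudonym_b = hash_handle("riederb")
```

With the keyed variant, the key itself becomes the one thing to store securely, in place of a full correspondence file.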
best, Bernhard
-- Bernhard Rieder | Associate Professor | New Media and Digital Culture University of Amsterdam | Turfdraagsterpad 9 | 1012 XT Amsterdam | The Netherlands http://thepoliticsofsystems.net | http://rieder.polsys.net | https://www.digitalmethods.net | @RiederB
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
-- Dr. Stuart W. Shulman Founder and CEO, Texifter LinkedIn: http://www.linkedin.com/in/stuartwshulman