Re: [Air-L] Platform Data Access Report

1 Jun 2022

Hi Stu,

Deleted data and research data retention are among the most nuanced issues
out there, in part because Member State laws take different positions on
the issue of data retention. That’s one of the main reasons the Code
provides a compendium of select Member State laws.

That said, in general, there are a few things to keep in mind:

First, while GDPR embraces the right to erasure (the “right to be
forgotten” is outdated terminology), it is applicable when a data subject
affirmatively invokes the right. Deleting one’s data from social media does
not mean researchers are automatically required to delete said data from
their own datasets.

The draft Code addresses the right to erasure in two ways. 1) It discusses
transparency and data subject notifications (see Part I). Researchers
(through their institutions) must provide a mechanism by which European data
subjects can assess whether their data is likely to be part of a research
project and potentially request removal. But data subjects must have
appropriate grounds for this request, and those grounds are limited (see
Articles 17 & 19 of the GDPR). Institutional data protection officers (DPOs)
should lead the way in determining what is a valid request. 2) Data
retention/destruction plans are closely related to this question. As part of
the Data Needs and Management Plan described in Part II of the Code,
researchers will need to provide solid justification for how long they will
retain data and the steps by which they will destroy it. That justification
should be tied to considerations of users' rights, as well as considerations
balancing scientific research needs (e.g., study replication). GDPR
recognizes both.

Second, in requiring API users to delete suspended and deleted data,
Twitter’s TOS go well beyond GDPR requirements. The Code cannot speak to
Twitter’s TOS, only to the law. Twitter is free to impose additional
restrictions on researchers.

Third and finally, all of the platforms are skittish about their own
long-term data retention policies. They generally delete data from their
core pipelines after 90 days. We hope that a delegated act under the DSA
(Article 31 of which will compel data access for researchers in certain
circumstances) will help us tackle this. The Code itself hints at, but
doesn’t yet go all in on, the need for platforms to retain data longer than
this 90-day period when it is needed for independent research. But how to
identify such data prospectively is tricky. We’ll need the DSA’s help (and,
honestly, maybe some case law) to spell that out more.

Hope this helps clarify things, and offers greater insight into an issue
that we researchers typically understand quite poorly. Chairing this Working
Group was revelatory for me in so many ways.

Rebekah

On Tue, May 31, 2022 at 19:41 Shulman, Stu <stu@texifter.com> wrote:
...
Fascinating report. The detail is remarkable, insightful and helpful.
Thank you for sharing and for all the work. These reports take incredible
patience. It is well written as well.
I notice deletions are mentioned in the context of holding data but not
(unless I missed it) in the context of user-generated deletions. When a
user generates a social data action (Tweet, RT, reply, etc.) and then later
deletes it, any researcher who may be holding the datapoint must also
delete it or render it inaccessible. Very few do, in my experience. The
"right to be forgotten" is still operational under GDPR, in my
understanding of it, though I'd be happy to be updated. However, if some of
you are holding my deleted Tweets in Europe in a spreadsheet, are you
compliant? The term forgotten is not in the report. The term suspended does
not appear in the report either but is fundamental to compliance with
Twitter research terms. You cannot look at data from suspended accounts.
Many academics are doing Twitter or Reddit data work because of generous
data access options. I would say account suspensions and user deletions are
fairly significant issues that should be kept in focus as systemic and
ethically problematic failures in the current spreadsheet-centric paradigm
for examining social data artifacts.
On Tue, May 31, 2022 at 1:14 PM Charles M. Ess via Air-L <air-l@listserv.aoir.org> wrote:
...
Dear colleagues,
as a quick follow-up: first of all, a tremendous shout-out to Rebekah
for her work as chair of this project. Bringing together representatives
from major platforms, experts in GDPR and related law, NGOs, practicing
researchers (and even an ethicist) into sharp and focused dialogue over
the year-plus leading up to this publication, coupled with the agreements
over various aspects and elements of the Code of Conduct, was an all but
superhuman task. As someone privileged to participate under the Chatham
House Rule, I am allowed to say that there was universal and
enthusiastic consensus affirming Rebekah's extraordinary work in getting
us to this place - a place that one at the outset could reasonably doubt
we would ever see.
Secondly: the draft Code endorses the AoIR Ethical Guidelines 3.0 as
follows:
The research should follow the Ethical Guidelines for Internet Research
of the Association of Internet Researchers (as well as any other
specialized or sector-based guidelines relevant to the research) and be
reviewed and approved before data is requested from a DSO by an
institutional, or appropriate third-party, ethical review board, as
described in Part II of the Code. (p. 27).
In addition, there will be reference to an affiliated document titled
"Best practices and reflection questions for the Code of Conduct." The
document cross-references the 3.0 guidelines with several key issues
raised in the draft Code, and is designed to serve as a springboard for
further ethical reflection on the part of those developing the sorts of
research envisioned and circumscribed therein.
This latter document will also appear soon on the EDMO website, along
with other documents affiliated with the draft Code. Here I would like
to thank especially:
aline shakti franzke (University of Duisburg-Essen);
Stine Lomborg (Copenhagen University);
Elizabeth Buchanan (Marshfield Clinic Research Institute);
Rich Ling (Norwegian Academy of Science and Letters);
and Michael Zimmer (Marquette University)
for their very great help in putting this document together.
A thousand thanks to Rebekah and the Working Group, and I very much look
forward to seeing how all of this unfolds.
All best,
- charles
On 31/05/2022 16:30, Tromble, Rebekah via Air-L wrote:
...
Dear colleagues,
Earlier today the European Digital Media Observatory's Working Group on
Platform-to-Researcher Data Access published its official report
<https://edmo.eu/wp-content/uploads/2022/02/Report-of-the-European-Digital-Me...>.
The multi-stakeholder group has been hard at work for the last year. Our
main charge was to draft a Code of Conduct under Article 40 of the GDPR
that would facilitate better access to data for independent researchers.
This report contains that draft Code.
Among other things, the draft Code lays out a framework for assessing the
level of risk involved in accessing and conducting research with different
types of platform data. It then sets out a number of safeguards that can be
put in place to mitigate different levels of risk, helping to promote
research that is ethical and responsible. (I tweeted more about it here
<https://twitter.com/RebekahKTromble/status/1531611984944316419>.)
Getting to this point has entailed tremendously hard work by everyone
involved, and, as the report itself notes, the work is far from over. But
publishing the report and draft Code represents a major step forward. Though
certain requirements are necessarily tied to specifications under the GDPR,
the general principles and proposed solutions the report offers are
instructive well beyond the European context.
Please feel free to circulate widely. And let me know if you have any
questions, thoughts, etc.
Rebekah
Dr. Rebekah Tromble
Director, Institute for Data, Democracy & Politics, George Washington University |
Associate Professor, School of Media & Public Affairs, George Washington University |
Visiting Researcher, The Alan Turing Institute (London) |
www.rebekahtromble.net
iddp.gwu.edu
_______________________________________________
The Air-L@listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at:
http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers:
http://www.aoir.org/
--
Professor Emeritus
University of Oslo
<http://www.hf.uio.no/imk/english/people/aca/charlees/index.html>
3rd edition of Digital Media Ethics now available:
<http://politybooks.com/bookdetail/?isbn=9781509533428>
--
Dr. Stuart W. Shulman
Founder and CEO, Texifter
Editor Emeritus, *Journal of Information Technology & Politics*