Hello list-
I recently got into a discussion with a colleague about the ethics of using hacked data, specifically the Patreon hacked data (see here: http://arstechnica.com/security/2015/10/gigabytes-of-user-data-from-hack-of-... ).
He and I do crowdfunding work, and had wanted to look at Patreon, but as far as I can tell they have no easy hook into all their projects (for scraping), so, to me this data hack was like a gift! But he said there was no way we could use it. We aren't doing sentiment analysis or anything, we would use aggregated measures like funding levels and then report things like means and maybe a regression, so there would be no identifiable information whatsoever derived from the hacked data in any of our resulting work (we might go to the site and pull some quotes).
I looked at the AoIR ethics guidelines ( http://aoir.org/reports/ethics2.pdf ), and didn't see anything specifically about hacked data (I don't think "hacked" is the best word, but I don't like "stolen" either, but those are different discussions).
One relevant line I noticed was this one: "If access to an online context is publicly available, do members/participants/authors perceive the context to be public?" (p. 8) So, the problem with the data is that it's the entire website, so some was private and some was public, but now it's all public and everyone knows it's public.
To me, I agree that a lot of the data in the data-dump had been intended to be private -- apparently, direct messages are in there -- but we wouldn't use that data (it's not something we're interested in). We'd use data like number of funders and funding levels and then aggregate everything. I see that some of it was meant to be private, but given the entire site was hacked and exported I don't see how currently anyone could have an expectation of privacy any more. I'm not trying to torture the definition, it's just that it was private until it wasn't.
I can see that some academic researchers -- at least those in computer security -- would be interested in this data and should be able to publish in peer reviewed journals about it, in an anonymized manner (probably as an example of "here's a data hack like what we are talking about, here's what hackers released").
I also think that probably every script kiddie has downloaded the data, as has every grey and black market email list spammer, and probably every botnet purveyor (for passwords) and maybe even the hacking arm of the Chinese army and the NSA. My point here is that if we were to use the data in academic research we wouldn't be publicizing it to nefarious people who would misuse it since all of those people already have it. We could maybe help people who want to use crowdfunding some (hopefully!) if we have some results. (I guess I don't see that we would be doing any harm by using it.)
So, what do people think? Did I miss something in the AoIR guidelines? I realize I don't think it's clear either way, or I wouldn't be asking, so probably the answers will point to this as a grey area (so why do I even ask, I am not sure).
But I'm not looking for "You can't use it because it's hacked," because I don't think that explains anything. I could counter that with "It is publicly available found data," because it is, although I don't think that's the best reply either. Both lack nuance.
-Nat
-- Nathaniel Poor, Ph.D. http://natpoor.blogspot.com
Hi Nat, I think that's an interesting question, but as someone unfamiliar with hacking laws I need to ask: is it legal to download/own the data? Best, Tim Tim Laquintano Assistant Professor of English Lafayette College Sent from my iPhone
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
Depends on the country. Legal in Denmark in 2012. Stine Lomborg, in a research study, did use hacked data but decided to get informed consent from every data point. You could do that.
Sent from my mobile device, Please excuse any typos
I think that the data is essentially the stolen property of whoever is running Patreon. Maybe they could have used it to monetize their site, or reduce operating costs, or whatever. Your hands wind up pretty ethically dirty unless you're ONLY looking at data that could have been gathered from Patreon's public APIs/interfaces... that's the purist look at it. The "gray hat" position is probably "the cat is out of the bag, not my problem". The "black hat" position is more of a "what can I leverage out of this breach, and I don't care who it hurts." I wouldn't use it, not even for secops/netsec research.
--e
I think one could look a little at the consequences of what you are doing. Seems you are trying to make money by researching funding data, is that right? I find that unethical, but I find all kinds of data mining unethical. There are reasons to use your same skill sets that could benefit society. Maybe I don't understand what your end result is about.
Peter Timusk peterotimusk@gmail.com I do not speak for my employer or charities or political parties or unions I volunteer with or belong to, unless otherwise noted.
A similar case study might be the history of the ENRON email data set. It went through multiple iterations of availability and takedowns as it was slowly edited over time to remove emails. People still use it as a canonical dataset, but it is certainly still controversial, and was especially so when it was first made available.
---
Alexander Leavitt PhD Candidate USC Annenberg School for Communication & Journalism http://alexleavitt.com Twitter: @alexleavitt <http://twitter.com/alexleavitt>
I was following the 2600 forums after the Ashley Madison hack. People were wanting to explore that data trove, and the consensus after some input from a journalist was that it was legal to obtain the data trove once it was released to the public. But it is illegal to disclose sensitive data such as passwords or credit card numbers. This does not answer the colleague's original question regarding the use of the data in a formal research setting. Perhaps the "grey hat" academic solution would be to anonymize the resulting data, in much the same way we usually treat survey data. That way analysis can be performed and no individual is "injured" as a result of the research.
Jeremiah, Ph.D.
-- -------------------- Jeremiah Spence, Ph.D. Technologist. Analyst. Consultant. jeremiahspence.com jeremiah.spence@gmail.com
I was just having this discussion with a student after a webinar on Ashley Madison that we hosted for students here at the Berkeley I School - thought I'd share some of what I shared with the student in case folks find it useful.

The short response is: it's difficult to give an all-or-nothing sort of answer to the question of using stolen data, as there are a number of considerations we need to fold in. (I won't speak directly to Patreon, but a lot of the context below applies, I think.) In particular, it's important to keep in mind means and ends in a professional context. To illustrate, we can consider two main groups that have an interest in the contents of the AM data dump: academic researchers and journalists.

One of the primary "ends" of journalism, at least ideally, is service to the public interest. With that in mind, we permit or tolerate - from a professional ethical perspective, at least - journalists to mine and explore stolen information so long as it is for reasons that can be justified in terms of public interest. There are lots of precedents here, from Watergate to Snowden to Wikileaks cables and so on. A good example in this particular case is the Gizmodo reporting on Ashley Madison <http://gizmodo.com/the-fembots-of-ashley-madison-1726670394>: exposing the "fembots" (or fake, automated profiles) of Ashley Madison lays bare a deceptive practice that consumers and the FTC have a legitimate interest in knowing about. In another example, there are cases of individuals reporting on public figures that might be included in the database - in the aftermath of the hack, some outlets have reported on the private emails of the CEO of AM parent company Avid Life Media (whose relevance to the hack is obvious) while others broke the news that conservative "family values" cultural figure Josh Duggar was using the service.

I think the case for investigating the CEO here (by, for example, exposing his private emails) is ethically justifiable (though I think the case for exposing Duggar is not as straightforward, but it's a legitimate open question - by way of a counterargument, Dan Savage, in particular, really, really thinks that exposing Duggar is justifiable <http://www.bioethics.net/2015/09/ashley-madison-using-stolen-data/>). At the same time, we would not tolerate journalists targeting and exposing the details of non-public figures in the dataset. That would be bullying or harassment - and definitely unethical.

For academic researchers, our ends aren't *necessarily* public interest (though there can be clear connections to the public interest in some cases, like the West Virginia researchers who exposed VW's cheating software <http://spectrum.ieee.org/cars-that-think/transportation/advanced-cars/how-professors-caught-vw-cheating>). Setting aside romantic notions of progress and "the glory of science," the ends of research can be variously ontological, epistemological, political, etc... Over time, we've decided - in part as a response to past ethical transgressions - that, regardless of the ends of our research, there are certain values we shouldn't compromise in the pursuit of knowledge - chief among them is the value of respect. As the Belmont Report and other research documents make clear, the notion of informed consent is one of the main ways (if not *the* main way) in which we operationalize the value of respect in practice. The challenge that the Ashley Madison data poses for researchers, then, is that those included in the dataset never consented to being a part of research (and, indeed, it could probably be assumed that many of the affected individuals would definitely not agree to disclose many of these intimate details to researchers without certain guarantees of privacy).

So, I'm not sure what the legal status of conducting research on a stolen database might be (I don't have the legal background to answer that question) - but from an ethical perspective, concerns with consent and respect are still absolutely pressing. So, rather than giving a blanket "yes" or "no" from an ethical perspective, I think it is important to consider 1) the kinds of research questions you would want to ask and why, 2) what the relationship of your research might be to your institution's IRB (if you're at the kind of institution that has one, anyway) given that the dataset contains human subjects, and 3) what possible further harms your research might cause if not approached properly. (Plus: even if consent is out of the picture, we still have other important values to which we can appeal - such as beneficence, justice, care, etc...).

-Anna

-----
Anna Lauren Hoffmann
Professional Faculty & Postdoctoral Researcher
School of Information
University of California, Berkeley

On Wed, Oct 7, 2015 at 2:35 PM, Jeremiah Spence <jeremiah.spence@gmail.com> wrote:
I was following the 2600 forums after the Ashley Madison hack. People were wanting to explore that data trove and the consensus after some input from a journalist was that it was legal to obtain the data trove once it was released to the public. But it is illegal to disclose sensitive data such as passwords or credit card numbers.
This does not answer the colleague's original question regarding the use of the data in a formal research setting. Perhaps the "grey hat" academic solution would be to anonymize the resulting data, in much the same way we usually treat survey data. That way analysis can be performed and no individual is "injured" as a result of the research.
Jeremiah, Ph.D.
On Wed, Oct 7, 2015 at 4:29 PM, Alex Leavitt <alexleavitt@gmail.com> wrote:
A similar case study might be the history of the ENRON email data set. It went through multiple iterations of availability and takedowns as it was slowly edited over time to remove emails. People still use it as a canonical dataset, but it is certainly still controversial, and especially was when it was first made available.
---
Alexander Leavitt PhD Candidate USC Annenberg School for Communication & Journalism http://alexleavitt.com Twitter: @alexleavitt <http://twitter.com/alexleavitt>
On Wed, Oct 7, 2015 at 1:54 PM, Peter Timusk <peterotimusk@gmail.com> wrote:
I think one could look a little at the consequences of what you are doing. Seems you are trying to make money by researching funding data, is that right? I find that unethical, but I find all kinds of data mining unethical. There are reasons to use your same skill sets that could benefit society. Maybe I don't understand what your end result is about.
Peter Timusk peterotimusk@gmail.com I do not speak for my employer or charities or political parties or unions I volunteer with or belong to, unless otherwise noted.
Dear all,

we are happy to announce the new issue of gamevironments. games, religion, and stuff. In our first regular issue, #2 (2015), you will find articles, reports about current activities in the field of gaming and religion, an interview, and a game review. Free download at http://www.gamevironments.org/.

Enjoy!

Kerstin Radde-Antweiler & Xenia Zeiler (editors-in-chief)

--
Prof. Dr. Xenia Zeiler
Associate Professor South Asian Studies
P.O. Box 59 (Unioninkatu 38B)
00014 University of Helsinki, Finland
+358 50 4482713
http://tuhat.halvi.helsinki.fi/portal/en/person/zeiler
Call for Papers
gamevironments. games, religion, and stuff
Papers Due: 15.01.2016

The online journal gamevironments, http://www.gamevironments.org/, highlighting video gaming as related to religion, culture, and society, invites paper submissions for its next regular issue.

As the journal’s title suggests, researching video games today is not limited to the established media-centered approaches. On the contrary, the ‘games/gaming’ – ‘environments’ and actor-centered approaches also need to be highlighted. Gamevironments consist of both the technical environments of video games/gaming and the cultural environments of video games/gaming. The journal welcomes contributions applying all approaches and highlighting all fields of investigation related to video games/gaming and religion, culture, and society in the diverse global video game and gaming landscape.

We include different categories of texts in the journal: regular academic articles, interviews (e.g., with games designers), research reports, book reviews and game reviews. For more information, please see http://www.gamevironments.org/?page_id=61.

15.01.2016 Full Chapter Submission
28.02.2016 Review Results Returned
30.04.2016 Revised Chapter Submission
31.06.2016 Online Publication

If you have any questions, for instance if you want to discuss your paper idea or abstract prior to submission, don’t hesitate to contact us: radde@uni-bremen.de.

Kerstin Radde-Antweiler and Xenia Zeiler

--
Prof. Dr. Xenia Zeiler
Associate Professor South Asian Studies
P.O. Box 59 (Unioninkatu 38B)
00014 University of Helsinki, Finland
+358 50 4482713
http://tuhat.halvi.helsinki.fi/portal/en/person/zeiler
I think the Enron data is a special case from the legal perspective, if not the ethical perspective, in that it was made an explicitly public dataset as part of the FERC investigation. I know the ethical questions cannot be reduced to legal access issues, but this piece of it, at least, makes Enron a bit unusual. The same cannot be said of the Ashley Madison data, nor probably even the AoL search data, which was made public only to be "unmade" public. I agree that both the ways in which the data will be used and the effect that use might have on those who generated it are essential to your question. Alex
-- // Alexander Halavais, Sociologist, Semiologist, and Saboteur Extraordinaire // Associate Professor of Social Technologies, Arizona State University // http://alex.halavais.net/bio @halavais
Peter and list -

We are academic researchers. When I said something to the effect of “trying to help crowdfunders do it better” I meant “I am an idealistic ex-professor who still does academic research and I still hope some of my work will make the world a better place and not sit unread in a dusty journal on a shelf,” and since this line of work is about crowdfunding, well, that’s the practice it will improve and inform (we find it won’t work for everyone, so keep alternate and longer-established funding mechanisms like the NEH in the US).

I realize I haven’t been to AoIR in quite some time, but the first one I went to was Toronto, 2003. I’m an academic. Peter, I see we have some shared connections on LinkedIn.

We have two published papers on this topic in journals most of this list will know: NMS and iCS.

Davidson, R, & Poor, N. (2015). The barriers facing artists’ use of crowdfunding platforms: Personality, emotional labor, and going to the well one too many times. New Media & Society, 17(2), 289-307. http://nms.sagepub.com/content/early/2014/11/24/1461444814558916.abstract

Davidson, R, & Poor, N. (Forthcoming). Why sugar daddies are only good for Bar-Mitzvahs: Exploring the limits on repeat crowdfunding. Information, Communication, and Society.

I left Roei out of it since he would think I’m a bit daft for asking (I am 99.5% sure he isn’t on this list…). He is a professor (just got tenure and is on his sabbatical year!); I, though, am not (it’s not my thing). But I’m one of those pesky “independent scholar” people, I’m like a ronin professor who doesn’t teach, which leaves me feeling misunderstood and annoyed with conference registration web pages that require an affiliation. So I don’t have an IRB or tenure committee to worry about, but I want to do good work regardless. Roei, however, does have an IRB and, one day, a promotion committee.

This also means I get to mouth off about academic annoyances on Facebook, quite to the irritation of some of my friends and to my great delight. (Granted, I know some of you who do this and yet are professors….)

-Nat

-------------------------------
Nathaniel Poor, Ph.D.
http://natpoor.blogspot.com/
https://sites.google.com/site/natpoor/
On 10/7/15 10:11 AM, Nathaniel Poor wrote:
I recently got into a discussion with a colleague about the ethics of using hacked data... I can see that some academic researchers -- at least those in computer security -- would be interested in this data and should be able to publish in peer reviewed journals about it, in an anonymized manner (probably as an example of "here's a data hack like what we are talking about, here's what hackers released").
Here are some references on this topic you might look at.

David Dittrich and Erin Kenneally (co-lead authors). The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. http://www.dhs.gov/sites/default/files/publications/CSD-MenloPrinciplesCORE-..., December 2012.

David Dittrich and Erin Kenneally (eds.). Applying Ethical Principles to Information and Communication Technology Research: A Companion to the Department of Homeland Security Menlo Report. http://www.dhs.gov/sites/default/files/publications/CSD-MenloPrinciplesCOMPA..., January 2012.

David Dittrich, Katherine Carpenter, and Manish Karir. An Ethical Examination of the Internet Census 2012 Dataset: A Menlo Report Case Study. Technology and Society Magazine, IEEE, 34(2):40–46, June 2015. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7128817

Ronald Deibert and Masashi Crete-Nishihata. Blurred boundaries: Probing the ethics of cyberspace research. Review of Policy Research, 28(5):531–537, 2011.

David Dittrich and Erin Kenneally (eds.). The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. http://www.cyber.st.dhs.gov/wp-content/uploads/2011/12/MenloPrinciplesCORE-2..., December 2011.

David Dittrich. The Ethics of Social Honeypots. Research Ethics, May 2015. http://rea.sagepub.com/content/early/2015/05/19/1747016115583380.abstract

Serge Egelman, Joseph Bonneau, Sonia Chiasson, David Dittrich, and Stuart Schechter. It’s Not Stealing If You Need It: A Panel on the Ethics of Performing Research Using Public Data of Illicit Origin. J. Blythe (Ed.): FC 2012 Workshops, LNCS 7398, pp. 124–132, 2012. Springer-Verlag Berlin Heidelberg 2012.
Just as a side note, the Carna Botnet (the IEEE pub above) did in fact set a bad precedent for "researchers" who witnessed the exploitation of weak passwords to illegally obtain data, which turned into illegally accessing similar devices in a similar manner to clean them up without the owners' knowledge, involvement, or permission.

"There was also a well-known research botnet called the Internet Census 2012, where some researchers used access to these devices to make measurements of the internet. Curiously, they decided to block access for some malware, too, so it is a kind of precursor, although their main intent was to publish data, and our main intent is to kill malware."

If you ask me, letting researchers have an ethical "pass" on using illegally obtained data is giving a push to both academic researchers and self-proclaimed "researchers" as they head down that slippery slope.

--
Dave Dittrich
dittrich@u.washington.edu
http://staff.washington.edu/dittrich
PGP key: http://staff.washington.edu/dittrich/pgpkey.txt
Fingerprint: 097B 4DCB BF16 E1D8 A06C 7512 A751 C80A D15E E079
Dear all, what a great question, and what helpful responses!

First of all, I appreciate this relatively new case as it helps illuminate the need for continually updating and refreshing the AoIR guidelines. That is, as Nathaniel's careful efforts to make use of the guidelines demonstrate, there's a kind of hole here that clearly needs specific consideration and reflection. (At the same time, Aristotle warned of the impossibility of developing final rules for every new case. In a powerful metaphor (to my mind at least) - guideline and rule-making is somewhat akin to weaving or knitting: every time you weave in a new thread to "cover" a new example or case, you thereby also multiply the holes in your weaving ...)

Secondly, without being able to do justice to the full richness of this discussion, a couple of additional observations. One is that I espy some important cultural differences in the ethical argumentation. Correct me where I'm wrong, but a good portion of the argumentation in favor of using the hacked data turns on efforts to consider the consequences of doing so - including possible consequences to the data subjects as well as to the researchers. So far, so good - but this sort of ethical consequentialism is more prevalent in (but by no means exclusive to) U.S., U.K., and to some degree Australian approaches. (No surprise: the utilitarian philosophers come out of and importantly stamp English-speaking philosophies and cultures in the early 19th century, if not earlier.)

By contrast, the example of Stine Lomborg asking for informed consent nonetheless is, in my mind, an example of the more deontological emphases found especially (but again, by no means exclusively) in northern Europe and Scandinavia. That is, there is a sense of the importance of respecting foundational rights, with less regard to the consequences of doing so (beginning with making the researcher's life that much more complicated - perhaps to the point of scuttling a project). (Again, no surprise: for all their well-deserved criticism, Kant and Habermas (among others) are regularly invoked in ethical discussions here, especially in connecting our ethics with basic democratic norms, rights, and practices.)

While this is clearly painting with a broad brush that screams for a great deal of nuance and counterexample, the contrast, I think, is nonetheless useful in at least two ways. One, it helps more sharply articulate the specific ethical approaches we tend to take up within a given cultural context and tradition, so that we can be clearer about the strengths and limits of those approaches. Two, it helps foreground the ethical difficulty common to much Internet-facilitated research - namely, that our data often draws from and crosses important national and cultural borders, thereby requiring us to pay attention to these culturally-variable emphases insofar as they may apply to a given data set.

In the Stine Lomborg example: her taking the more demanding ethical step of asking for informed consent has the advantage of not only going further to ensure basic rights protections - and this, I'm pretty sure, on both deontological and feminist grounds; in addition, had this been an international project, the stronger ethical approach here would have simultaneously met the comparatively weaker demands of a consequentialist approach.

Lastly, I'm wondering if anyone has developed analogies from biomedical ethics, i.e., of using medical data drawn from clearly illegal and unethical work (most notoriously, Nazi and Japanese experiments, but certainly also the infamous Tuskegee Institute work - when they can be legitimately called that)? Insofar as any such analogies might hold - broadly, a consequentialist would argue that great good can come of using data and information, whatever their source, as long as further foreseeable risks are minimal. Some deontologists might argue differently.

I dunno - I need more coffee - and it might well be that such analogies would turn out to be fruitless. But in the meantime, again, many thanks for this, and I hope we can take this up as part of the ethics panel at AoIR this year: Friday, October 23, from 1.00-2.50 p.m. (just FYI).

Best in the meantime,
- charles

-- Professor in Media Studies
Department of Media and Communication, University of Oslo
Director, Centre for Research in Media Innovations (CeRMI)
Editor, The Journal of Media Innovations <https://www.journals.uio.no/index.php/TJMI/>
President, INSEIT <www.inseit.net>
Postboks 1093 Blindern, 0317 Oslo, Norway
c.m.ess@media.uio.no

On Thu, Oct 8, 2015 at 4:06 AM, Dave Dittrich <dittrich@apl.washington.edu> wrote:
On 10/7/15 10:11 AM, Nathaniel Poor wrote:
I recently got into a discussion with a colleague about the ethics of using hacked data... I can see that some academic researchers -- at least those in computer security -- would be interested in this data and should be able to publish in peer reviewed journals about it, in an anonymized manner (probably as an example of "here's a data hack like what we are talking about, here's what hackers released").
Here are some references on this topic you might look at.
David Dittrich and Erin Kenneally (co-lead authors). The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research.
http://www.dhs.gov/sites/default/files/publications/CSD-MenloPrinciplesCORE-... , December 2012.
David Dittrich and Erin Kenneally (eds.). Applying Ethical Principles to Information and Communication Technology Research: A Companion to the Department of Homeland Security Menlo Report.
http://www.dhs.gov/sites/default/files/publications/CSD-MenloPrinciplesCOMPA... , January 2012.
David Dittrich, Katherine Carpenter, and Manish Karir. An Ethical Examination of the Internet Census 2012 Dataset: A Menlo Report Case Study. Technology and Society Magazine, IEEE, 34(2):40–46, June 2015. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7128817
Ronald Deibert and Masashi Crete-Nishihata. Blurred boundaries: Probing the ethics of cyberspace research. Review of Policy Research, 28(5):531–537, 2011.
David Dittrich and Erin Kenneally (eds.). The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research.
http://www.cyber.st.dhs.gov/wp-content/uploads/2011/12/MenloPrinciplesCORE-2... , December 2011.
David Dittrich. The Ethics of Social Honeypots. Research Ethics, May 2015. http://rea.sagepub.com/content/early/2015/05/19/1747016115583380.abstract
Serge Egelman, Joseph Bonneau, Sonia Chiasson, David Dittrich, and Stuart Schechter. It’s Not Stealing If You Need It: A Panel on the Ethics of Performing Research Using Public Data of Illicit Origin. J. Blythe (Ed.): FC 2012 Workshops, LNCS 7398, pp. 124–132, 2012. Springer-Verlag Berlin Heidelberg 2012.
Just as a side note, the Carna Botnet (the IEEE pub above) did in fact set a bad precedent for "researchers" who witnessed the exploitation of weak passwords to illegally obtain data, which turned into illegally accessing similar devices in a similar manner to clean them up without the owners' knowledge, involvement, or permission.
"There was also a well-known research botnet called the Internet Census 2012, where some researchers used access to these devices to make measurements of the internet. Curiously, they decided to block access for some malware, too, so it is a kind of precursor, although their main intent was to publish data, and our main intent is to kill malware."
If you ask me, letting researchers have an ethical "pass" on using illegally obtained data is giving a push to both academic researchers and self-proclaimed "researchers" as they head down that slippery slope.
-- Dave Dittrich dittrich@u.washington.edu http://staff.washington.edu/dittrich
PGP key: http://staff.washington.edu/dittrich/pgpkey.txt Fingerprint: 097B 4DCB BF16 E1D8 A06C 7512 A751 C80A D15E E079 _______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
On 10/7/2015 10:06 PM, Charles Ess wrote:
Dear all, what a great question, and what helpful responses!
In the Stine Lomborg example: her taking the more demanding ethical step of asking for informed consent has the advantage of not only going further to ensure basic rights protections - and this, I'm pretty sure, on both deontological and feminist grounds; in addition, had this been an international project, the stronger ethical approach here would have simultaneously met the comparatively weaker demands of a consequentialist approach.
For discussion, to what extent would requiring informed consent affect sampling? Are there effective ways to deal with this?
Lastly, I'm wondering if anyone has developed analogies from biomedical ethics, i.e., of using medical data drawn from clearly illegal and unethical work (most notoriously, Nazi and Japanese experiments, but certainly also the infamous Tuskegee Institute work - when they can be legitimately called that)?
The analogy using the Nazi case isn't a good one. The Nazis were not doing anything close to good science, even by the standards of the day. The discussion here presupposes that those using data obtained through hacking would be doing good science. So, ultimately, this is not a proper or useful comparison.

Fred

-- Fred Fuchs - Founder, CEO, & Producer FireSabre Consulting LLC Content Services for Virtual Worlds
Fred: Just a note of clarification. The question isn't whether the research being proposed is like that of the Nazis. It's whether data obtained in unethical ways can be used to an ethical end, or whether (to draw on my Law & Order knowledge) it represents the fruit of a poisonous tree.

I think you could say that, in kind, human experimentation at high altitude or vivisection is a different animal from illegal computer intrusion and theft of data (though criminal penalties in the US would likely be pretty similar). Nonetheless, the basic question is pretty parallel, I think:

http://bioethics.as.nyu.edu/docs/IO/30171/Steinberg.HumanResearch.pdf
https://www.jewishvirtuallibrary.org/jsource/Judaism/naziexp.html

- Alex (Ronin in training.)
-- // Alexander Halavais, Sociologist, Semiologist, and Saboteur Extraordinaire // Associate Professor of Social Technologies, Arizona State University // http://alex.halavais.net/bio @halavais
Just to clarify on the topic of journalists being comfortable using and/or publishing stolen/"hacked" data: in the United States, because of the freedom of the press, journalists and the publications they work for are not liable for publishing information that was originally illegally obtained, provided that the journalist and publication did not themselves behave illegally to obtain the data and that publishing the information is in the public interest. This protection is unlikely to apply to researchers unless it is invoked specifically in the publication context. Some other nations have this protection, but most do not, especially when the publicized (and/or illegally obtained) information violates individual privacy rights.

It is really difficult to find a balance when information that is very useful, illegally obtained, and publicly available could support research that is interesting and potentially benefits society. Ultimately we should be working toward obtaining the kinds of information we want and need with the consent of the organizations and/or individuals who are part of the data sets we want to create. It would be much easier to involve organizations and individuals in research if a level of trust in researchers already existed, so that people know researchers are doing the right thing and protecting the data they acquire. This is still a work-in-progress.

Just my $0.02.

*Katherine
participants (14): Alex Halavais, Alex Leavitt, Anna Lauren Hoffmann, Charles Ess, Dave Dittrich, Elijah Wright, Fred Fuchs, Jeremiah Spence, Katherine Carpenter, live, Nathaniel Poor, Peter Timusk, Tim Laquintano, Xenia Zeiler