Fwd: Facebook data destruction
Hi there,
Remember the guy who announced a forthcoming release to the research community of a huge dataset of publicly available Facebook data? Remember that many of you guys disapproved of this idea for ethical reasons?
Well, no big surprise, he was asked by Facebook to destroy his dataset. But this genuine experiment raises an interesting question:
<quoting Pete Warden> In an ideal world, we might not want any of this data to be public at all. At the moment it's easily available to commercial firms, but not to researchers, which seems like a bad deal all round. </quoting>
http://petewarden.typepad.com/searchbrowser/2010/03/facebook-data-destructio...
_
Christophe Prieur, prieur@liafa.jussieu.fr
Liafa, Université Paris-Diderot
http://liafa.jussieu.fr/~prieur/
[user experience research, social networks, (large) graph algorithms]
Thanks for sharing this, Christophe. (I don't see this quote within the link you provided. Where is it from?)
I'm not surprised that Facebook has requested this action. But contrary to Warden's quote, I see a difference between the data being _available_ to commercial firms (by which he means crawlable by search engine spiders, if I'm reading the post correctly), and having it _released en masse_ to the public in an easily digestible form.
Given the threat of a lawsuit, it appears that the former is allowed by Facebook's TOS, but the latter is not. And, clearly, there is the problem of taking the information outside of its intended context once you aggregate it and release it to the public, as I previously detailed here:
"Why Pete Warden Should Not Release Profile Data on 215 Million Facebook Users"
http://michaelzimmer.org/2010/02/12/why-pete-warden-should-not-release-profi...
Best,
Michael Zimmer
--
Michael Zimmer, PhD
Assistant Professor, School of Information Studies
Associate, Center for Information Policy Research
University of Wisconsin-Milwaukee
e: zimmerm@uwm.edu
w: www.michaelzimmer.org
On Mar 18, 2010, at 5:39 AM, Christophe Prieur wrote:
Hi there,
Remember the guy who announced a forthcoming release to the research community of a huge dataset of publicly available Facebook data? Remember that many of you guys disapproved of this idea for ethical reasons?
Well, no big surprise, he was asked by Facebook to destroy his dataset. But this genuine experiment raises an interesting question:
<quoting Pete Warden> In an ideal world, we might not want any of this data to be public at all. At the moment it's easily available to commercial firms, but not to researchers, which seems like a bad deal all round. </quoting>
http://petewarden.typepad.com/searchbrowser/2010/03/facebook-data-destructio...
_ Christophe Prieur, prieur@liafa.jussieu.fr Liafa, Université Paris-Diderot http://liafa.jussieu.fr/~prieur/ [user experience research, social networks, (large) graph algorithms]
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
On Thu, Mar 18, 2010 at 08:32:41AM -0500, Michael Zimmer wrote:
I'm not surprised that Facebook has requested this action. But contrary to Warden's quote, I see a difference between the data being _available_ to commercial firms (by which he means crawlable by search engine spiders, if I'm reading the post correctly), and having it _released en masse_ to the public in an easily digestible form.
Given the threat of a lawsuit, it appears that the former is allowed by Facebook's TOS, but the latter is not. And, clearly, there is the problem of taking the information outside of its intended context once you aggregate it and release it to the public, as I previously detailed here:
"Why Pete Warden Should Not Release Profile Data on 215 Million Facebook Users"
http://michaelzimmer.org/2010/02/12/why-pete-warden-should-not-release-profi...
While crawling public information for a generic search engine may not violate a sense of contextual integrity, that isn't the only other use of such data. A couple of examples could be a firm using Facebook mining to build up or improve a profile of me to sell to others, or a government using the information to broaden a database on citizens' political leanings. Those certainly go beyond the expectations of usage of a lot of Facebook users.
At present this sort of information is available to anyone with the resources to gather it (which aren't enormous). Of course, even if Facebook tightens this up in the future, they're keen to sell plenty of personal information to others (which doesn't seem to be well understood among users I've spoken to), which does weaken the argument that they have an obligation to reconsider how/what public information is published.
Nick White
I'm not going to sanction Facebook's rather cavalier attitude towards the selling/dissemination of personally identifiable information (though I do agree with Michael that Warden's use violates the ToS and their own does not); however, I think it's important to think about the different mechanisms for accountability in both cases.
With an enormous data set released to the open web, there are no longer any mechanisms for accountability. Don't like that your personal information is out there? Too bad: it's not coming back, and you can be sure that datamining/harvesting businesses would've been the first to download that data (even if academic researchers would, too).
With Facebook dealing in more B2B transactions, there is certainly a high degree of opacity, and I personally would probably disapprove of a lot of their transactions. But at least in this case there are clear lines and mechanisms for accountability, derived both from the ToS (however user-unfriendly they are) and through the legally-binding data use contracts that FB is arranging with other companies. In practice, individuals are still up the creek a bit with regard to their own data on a personal basis, but there's at least a clear line of blame and accountability for potential abuses.
What's potentially more powerful in this context is the possibility of a class-action lawsuit brought on behalf of a huge mass of FB users against any business that flagrantly violates privacy norms and/or against FB itself, a possibility of which I'm sure FB is keenly aware.
Again: not sanctioning their behavior, but keeping this stuff behind a wall does have certain benefits in terms of potential remedies for bad action that become impossible when it's released fully into the wild.
jkd
On Thu, 18 Mar 2010 14:42:49 +0000, Nick White <air-l@njw.me.uk> wrote:
While crawling public information for a generic search engine may not violate a sense of contextual integrity, that isn't the only other use of such data. A couple of examples could be a firm using Facebook mining to build up or improve a profile of me to sell to others, or a government using the information to broaden a database on citizens' political leanings. Those certainly go beyond the expectations of usage of a lot of Facebook users.
At present this sort of information is available to anyone with the resources to gather it (which aren't enormous). Of course, even if Facebook tightens this up in the future, they're keen to sell plenty of personal information to others (which doesn't seem to be well understood among users I've spoken to), which does weaken the argument that they have an obligation to reconsider how/what public information is published.
Nick White
Couldn't access for statistical purposes only have been a possible option? I'm not familiar with handling such massive data, and I assume it could only have made sense as a paid service, but allowing scripts to pull some aggregated data (say, blocking any result that didn't involve 10,000+ accounts) would have respected most privacy concerns, no?
Having those data anywhere, hackable, is a legal risk, and that alone justifies Facebook's threats, but I'm still hoping for academic access to this amazing database.
2010/3/18 jkd <jkd@email.unc.edu>
I'm not going to sanction Facebook's rather cavalier attitude towards the selling/dissemination of personally identifiable information (though I do agree with Michael that Warden's use violates the ToS and their own does not); however, I think it's important to think about the different mechanisms for accountability in both cases.
With an enormous data set released to the open web, there are no longer any mechanisms for accountability. Don't like that your personal information is out there? Too bad: it's not coming back, and you can be sure that datamining/harvesting businesses would've been the first to download that data (even if academic researchers would, too).
With Facebook dealing in more B2B transactions, there is certainly a high degree of opacity, and I personally would probably disapprove of a lot of their transactions. But at least in this case there are clear lines and mechanisms for accountability, derived both from the ToS (however user-unfriendly they are) and through the legally-binding data use contracts that FB is arranging with other companies. In practice, individuals are still up the creek a bit with regard to their own data on a personal basis, but there's at least a clear line of blame and accountability for potential abuses.
What's potentially more powerful in this context is the possibility of a class-action lawsuit brought on behalf of a huge mass of FB users against any business that flagrantly violates privacy norms and/or against FB itself, a possibility of which I'm sure FB is keenly aware.
Again: not sanctioning their behavior, but keeping this stuff behind a wall does have certain benefits in terms of potential remedies for bad action that become impossible when it's released fully into the wild.
jkd
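A minimal sketch of the aggregate-only access suggested above, where a query is answered only if it covers at least 10,000 accounts. The record layout, the function name `aggregate`, and the `GroupTooSmall` exception are hypothetical, chosen purely for illustration; nothing here reflects an actual Facebook interface, and the data is synthetic.

```python
from statistics import mean

# Threshold taken from the message above; the exact value is a policy choice.
MIN_GROUP_SIZE = 10_000


class GroupTooSmall(Exception):
    """Raised when a query would aggregate over too few accounts."""


def aggregate(accounts, predicate, field, min_group_size=MIN_GROUP_SIZE):
    """Return (count, mean of `field`) over the accounts matching `predicate`.

    The query is refused when fewer than `min_group_size` accounts match,
    so callers only ever see statistics about large groups.
    """
    values = [account[field] for account in accounts if predicate(account)]
    if len(values) < min_group_size:
        raise GroupTooSmall(
            f"query matched {len(values)} accounts, minimum is {min_group_size}"
        )
    return len(values), mean(values)


# Synthetic records standing in for profile data (hypothetical fields).
accounts = [
    {"age": 20 + (i % 50), "country": "FR" if i % 2 else "US"}
    for i in range(30_000)
]

# A broad query is answered with aggregate figures only.
count, average_age = aggregate(accounts, lambda a: a["country"] == "FR", "age")

# A narrow query is refused instead of returning a small-group statistic.
try:
    aggregate(accounts, lambda a: a["age"] == 20, "age")
    narrow_query_refused = False
except GroupTooSmall:
    narrow_query_refused = True
```

The caller never receives individual records, only a count and an average, and any query matching fewer accounts than the threshold raises an error rather than returning a result.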
On Thu, Mar 18, 2010 at 08:32:41AM -0500, Michael Zimmer wrote:
(I don't see this quote within the link you provided. Where is it from?)
It's part of a comment in which Pete responds to privacy concerns:
http://petewarden.typepad.com/searchbrowser/2010/03/facebook-data-destructio...
(strangely there seem to be 2 separate comment feeds)
Nick White
participants (5)
- Bertil Hatt
- Christophe Prieur
- jkd
- Michael Zimmer
- Nick White