Re: [Air-L] Twitter Data Sharing Update - Thou Shalt Not Share Collections of Tweets
Stu-
I'm not in full agreement with your starting point that tweets "yearn to be free". I think the nature of the platform (140 character limit, broadcast as one voice among millions of accounts, viewed via a live stream that makes it almost impossible to read every single one) also supports the notion that tweets are meant to be fleeting.
-michael
On May 5, 2011, at 9:35 AM, Stuart Shulman wrote:
Hi Michael,
We did not contest the violation warning and so we took the data down. The policy is another matter. These are tweets that yearn to be free, insofar as tweets can collectively yearn for anything:
In this instance, I find myself liking the Facebook policy, which in an opposite manner sets data free. This may explain, in part, why the "Scraping Facebook" video seems to be getting more views than the one on harvesting Twitter tweets:
http://www.screencast.com/t/iW3rvdYY
~Stu
On Thu, May 5, 2011 at 10:25 AM, Michael Zimmer <zimmerm@uwm.edu> wrote:
It appears the Twitter API doesn't care if you're selling or giving it away, as I.4.a prohibits any attempt to "sell, rent, lease, sublicense, redistribute, or syndicate access to the Twitter API or Twitter Content to any third party without prior written approval from Twitter", as well as noting that "Exporting Twitter Content to a datastore as a service or other cloud based service, however, is not permitted".
http://dev.twitter.com/pages/api_terms
I'm not justifying their terms, but it does appear that you violated them.
-mz
-- Michael Zimmer, PhD Assistant Professor, School of Information Studies Co-Director, Center for Information Policy Research University of Wisconsin-Milwaukee e: zimmerm@uwm.edu w: www.michaelzimmer.org
On May 5, 2011, at 7:24 AM, Stuart Shulman wrote:
Twitter closed down our efforts to share post-Osama bin Laden Twitter data (or any other collections) for research purposes, again citing their TOS & API TOS.
To be clear: we were giving the data away, not selling it. Also, it was not scraped off Twitter. Rather, it was gathered using a Twitter-authorized account and an API that lets us fetch 1500 items at a time.
It is a shame that the now 2 million tweets cannot, for example, be sampled and coded using a crowdsourcing model. Or could they?
I am assuming the provision against sharing data does not extend to individuals who gather it and keep it to themselves or work with it in a research team.
~Stu
--
Stuart Shulman President & CEO Texifter, LLC <http://www.texifter.com/>
Have you tried DiscoverText? http://discovertext.com *Featuring the Facebook Graph & Twitter APIs* _______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
To me it seems the owners of Internet-based services sometimes go too far in claiming rights to authors' content. To vary an old Internet metaphor: the owner of a road should not be entitled to claim royalties on pictures of cars that have driven over it.
Eventually it may boil down to something like this: you may publish collections of tweets, but they will have to be stripped of any service-specific "marks" (e.g. hashtags).
Best --u
At 9:51 -0500 on 5 May 2011, Michael Zimmer wrote:
Stu-
I'm not in full agreement with your starting point that tweets "yearn to be free". I think the nature of the platform (140 character limit, broadcast as one voice among millions of accounts, viewed via a live stream that makes it almost impossible to read every single one) also supports the notion that tweets are meant to be fleeting.
-michael
On May 5, 2011, at 9:35 AM, Stuart Shulman wrote:
Hi Michael,
We did not contest the violation warning and so we took the data down. The policy is another matter. These are tweets that yearn to be free, insofar as tweets can collectively yearn for anything:
In this instance, I find myself liking the Facebook policy, which in an opposite manner sets data free. This may explain, in part, why the "Scraping Facebook" video seems to be getting more views than the one on harvesting Twitter tweets:
http://www.screencast.com/t/iW3rvdYY
~Stu
On Thu, May 5, 2011 at 10:25 AM, Michael Zimmer <zimmerm@uwm.edu> wrote:
It appears the Twitter API doesn't care if you're selling or giving it away, as I.4.a prohibits any attempt to "sell, rent, lease, sublicense, redistribute, or syndicate access to the Twitter API or Twitter Content to any third party without prior written approval from Twitter", as well as noting that "Exporting Twitter Content to a datastore as a service or other cloud based service, however, is not permitted".
http://dev.twitter.com/pages/api_terms
I'm not justifying their terms, but it does appear that you violated them.
-mz
-- Michael Zimmer, PhD Assistant Professor, School of Information Studies Co-Director, Center for Information Policy Research University of Wisconsin-Milwaukee e: zimmerm@uwm.edu w: www.michaelzimmer.org
On May 5, 2011, at 7:24 AM, Stuart Shulman wrote:
Twitter closed down our efforts to share post-Osama bin Laden Twitter data (or any other collections) for research purposes, again citing their TOS & API TOS.
To be clear: we were giving the data away, not selling it. Also, it was not scraped off Twitter. Rather, it was gathered using a Twitter-authorized account and an API that lets us fetch 1500 items at a time.
It is a shame that the now 2 million tweets cannot, for example, be sampled and coded using a crowdsourcing model. Or could they?
I am assuming the provision against sharing data does not extend to individuals who gather it and keep it to themselves or work with it in a research team.
~Stu
--
Stuart Shulman President & CEO Texifter, LLC <http://www.texifter.com/>
Hey all,
So, yes, regardless of whether or not it's "right" of them to block the distribution of raw catalogs of their data, they now completely disallow this activity. Obviously, this makes our job tougher, especially in vetting and reviewing a paper that comes out with Twitter data (particularly since this stuff is so new and really needs that review in order to make compelling arguments). I'm just a fledgling academic, but my understanding is that you generally need the raw data backing assertions in order to really test those assertions. I don't know if there's a similar situation out there where assertions are made but the data backing them cannot be made public.
Either way, I think there are two separate motivations for doing this:
1. The need to make sure data doesn't fall into the wrong hands (particularly spambots, applications that have been blacklisted, and other people or programs that cause harm to their environment and eventual valuation).
2. The need to control really the only real piece of value for Twitter, the potential demographic data. Facebook played it correctly by never opening it up and banking that their in-house work would be able to add value to their system, without relying on open data and programmers leveraging it to basically increase the value of the company. Now that Twitter has scaled up to this size, the benefits of open data are starting to be outweighed by the costs, and I think that's the big deal.
That said, I have talked with the people at Twitter, and they did agree that all analytical results from Twitter data can be let loose. We can imagine a situation where we have some raw set of data, perform a vast battery of analytics on it, then open up those CSVs to the public. Essentially, you could have the same effect as making all the raw data public, just broken up into these sets of analytical results (that is, someone could reverse engineer the analytics to get the catalog).
This seems to be totally within TOS, and I've been working feverishly to get something out that allows us to collectively push data out this way. Basically, if we have a platform that easily allows us to hammer a dataset through 50 analytical processes, the collection process for that dataset is very transparent, and the collector algorithm is respected and understood, then we can sort of mitigate this problem (not the best solution, but a solution nonetheless).
And you're right, Michael, about the LoC 6 month lag time, as far as I have heard as well. Also, the LoC collecting that data and allowing access to that data are entirely separate beasts. I'm sure they'll allow open access, but the details about that, none of which are announced, could turn out to be insurmountable for large-scale research.
Devin
On May 5, 2011, at 8:59 AM, Stephen J Cavrak Jr wrote:
Quoting Michael Zimmer <zimmerm@uwm.edu>:
Stu-
... that tweets are meant to be fleeting.
music once you play it is in the air gone forever ...
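Devin's suggestion earlier in the thread, running a dataset through a battery of analytics and publishing only the resulting CSVs, could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the record fields (`user`, `text`, `hour`), the two example analytics, and the `run_battery` helper are all hypothetical illustrations, not Twitter's schema or anyone's actual tooling. Whether a given output is permitted is a question for the terms themselves, not something the code settles.

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory records; the field names are assumptions for
# illustration only and do not reflect Twitter's actual data format.
TWEETS = [
    {"user": "a", "text": "News tonight #osama", "hour": 3},
    {"user": "b", "text": "Watching #osama coverage #news", "hour": 3},
    {"user": "a", "text": "Still awake", "hour": 4},
]


def hashtag_counts(tweets):
    """Count how often each hashtag appears across all tweets."""
    counts = Counter(
        word.lower()
        for t in tweets
        for word in t["text"].split()
        if word.startswith("#")
    )
    return [("hashtag", "count")] + sorted(counts.items())


def tweets_per_hour(tweets):
    """Count tweets per hour bucket."""
    counts = Counter(t["hour"] for t in tweets)
    return [("hour", "tweets")] + sorted(counts.items())


# The "battery": each entry maps a result name to a function that
# returns a header row followed by aggregate rows.
ANALYTICS = {
    "hashtag_counts": hashtag_counts,
    "tweets_per_hour": tweets_per_hour,
}


def run_battery(tweets, analytics=ANALYTICS):
    """Return {name: CSV text} containing aggregate rows, not tweet text."""
    out = {}
    for name, fn in analytics.items():
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(fn(tweets))
        out[name] = buf.getvalue()
    return out


results = run_battery(TWEETS)
```

Each value in `results` is a small CSV of aggregates that could be written to a file and shared. As Devin notes, a large enough battery of such results may allow the underlying collection to be partly reconstructed, so the choice of which analytics to release is itself a design decision.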
This seems like an opportune time to ask whether there is ever a time when violating the Terms of Service is an ethical practice for researchers. (As a practical issue, clearly Twitter can block access to particular IP blocks that it finds violating its API ToS, or turn off the firehose if you are whitelisted.)
The natural response to this for most people is that it is never ethically permissible to do so. While I recognize that (for example) this would expose you to personal sanctions from the company for violating those terms, and potentially expose your institution, I'm less concerned with the legal implications than I am with the ethical restrictions this places on the researcher. I think there are cases where violation of a set of Terms is ethically permissible, a position I took up at an AoIR preconference workshop last year.
This gives us a concrete example. It seems to me that the tweets themselves are not owned by Twitter, and so what they are restricting is your ability to access these materials programmatically, not your ability to actually hold or redistribute the content. If you "magically" were in possession of a collection of tweets, they would have little say in their redistribution (though the authors might, an issue that I think is separate).
Specifically, Twitter prohibits "scraping" the service, but fails to define this. If I hire a war room to cut and paste tweets, does this violate the policy? It's simply not clear. It seems to me there is a kind of Turing test for scrapers: Twitter would have no way to know (other than asking) whether I was scraping programmatically or had hired a room full of undergrads to cut and paste.
I've gotten away from my original question. There's no question that the courts have thus far sided with ToS as generally being binding. But when is it (or is it ever) ethically either acceptable or necessary to violate a web site's Terms?
Best,
Alex
--
//
// This email is
// [x] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais, ciberflâneur
// http://alex.halavais.net
//
participants (5)
- Alex Halavais
- Devin Gaffney
- Michael Zimmer
- Stephen J Cavrak Jr
- Ulf-Dietrich Reips