Re: [Air-L] Twitter no longer allowing use for scholarship - Update
Hi Andrea & AoIR List

Sure, in the name of transparency, here's the text of the email I sent to the person I know at Twitter, with all the personal pleasantries edited out. I also included an attachment with more detailed needs from the people who emailed me, but as those were off-list messages, I don't feel like I have permission to repost them here. However, if anyone who emailed me wants to repost their message to the list to get a broader discussion going, that's fine with me.

And of course, let me know if I missed anything or misrepresented anyone's needs.

* * *

"I'm going to try to summarize the main feedback. Basically, the type and breadth of research even among the 12 or so projects that I heard about from my outreach are so varied that at least one researcher out there is using every element of a tweet, profiles, and the interface to measure or look at something. For example there's a guy looking at regional variations in linguistic forms on twitter - so he's using geolocation, tagging and the words in individual tweets for analysis. There is a big project looking at crisis response by looking at twitter hashtags, and other projects where people are using twitter data to do network mapping - so they need @reply data, and individual user ids -- names or something else, at a minimum, that give them access to the body of tweets from individuals. One Australian researcher, Jean Burgess, put it well: "There are two main research needs: one is the need for public, (semi)open and reusable archives of tweets, particularly for keywords and hashtags, [but also location data (Amanda add)]; and the other is the more specific needs of particular research groups to track *users* and their social networks, for whatever reason."

For academic researchers and serious scholarly work, researchers can't just use a cool website that pulls up pie charts of stats about users - they need to be able to access the data as directly as possible.

Here's a list of the types of data people are using for their research work:

-geotagging
-location
-institutional tweets
-@replies
-hashtags and post keywords
-collocating a single user's tweets
-RT counts and tracking
-words within tweets
-follower/following data
-number of tweets
-links
-images/avatars
-lists

Sometimes researchers need subsamples of the twitter stream based around:

1. A date and time (e.g. all tweets two days before, during and two days after the Superbowl)
2. Hashtags or keywords
3. Individuals (e.g. top ten most active people in a particular community of practice)
4. Random subsample of the public twitter stream

I'm sure I've missed some items that creative researchers will want to use. I think the big takeaway is that researchers are doing a ton of creative things with Twitter data and would be hugely grateful for the most robust access you can offer them. Also, that as you continue to update Twitter's capabilities, researchers will want to take advantage of the new functionality.

Scholars (myself included) find Twitter fascinating and truly want to do research that examines the ways in which Twitter is being used, which I think helps scholars, helps Twitter, and helps your users.

I hope this is useful. Let me know if there's anything else I can do that would help make the case that Twitter can and should release its data to academic and non-profit researchers."

-Amanda

Amanda Lenhart
Pew Research Center
alenhart@pewinternet.org

-----Original Message-----
From: Andrea Kavanaugh [mailto:kavan@cs.vt.edu]
Sent: Friday, March 11, 2011 5:09 PM
To: Amanda Lenhart
Subject: Re: [Air-L] Twitter no longer allowing use for scholarship - Update

hi Amanda, can you send us what you sent to Twitter as our collective research needs?
Andrea

On Mar 11, 2011, at 12:26 PM, Amanda Lenhart wrote:
Thanks to everyone who wrote with feedback and details about their twitter-oriented research needs. I've pulled together everyone's requests and I've forwarded them to my friend at Twitter. I'll update the list when I hear anything.
Thanks,
Amanda
Amanda Lenhart
Pew Research Center
alenhart@pewinternet.org
_ _ _ _ _ _ _
On Friday, I had a conversation at a conference with someone I know who works at Twitter. We talked about this exact issue. And while Twitter can't change the API back because of other problems the change was fixing, she would very much like to give academics and non-profit researchers access to Twitter data. However, she has to push through a proposal internally to make this happen. She said it would help her make the case if I could tell her what parts of the data set researchers wanted to access.
I offered to ping the AoIR list to get a sense of what people want and need from Twitter to be able to do/continue their research.
Also, one thing my friend did mention -- because Twitter data can never be fully anonymized, there might be some limitations on what kind of analysis you could do - mostly limits on analysis that would reveal information about individuals that they had not made explicit and which might be harmful (e.g. using network analysis to speculate on users' sexual orientation).
So, please email me off-list and I'll compile the types of data requests and send them along to my Twitter friend.
Thanks,
Amanda
Amanda Lenhart
Pew Research Center
alenhart@pewinternet.org
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
This is a wonderful list; thank you for gathering it, sharing it with Twitter, and then doing the same with us here on AoIR. It would be a shame to have something so public be yet so inaccessible.

-----
Jeffrey Keefer
j.keefer@lancaster.ac.uk
Blog: http://silenceandvoice.com
Twitter: http://twitter.com/JeffreyKeefer
Website: http://www.jeffreykeefer.com

On Mar 11, 2011, at 5:44 PM, Amanda Lenhart wrote:
Hey Amanda,

thanks a lot for putting this list together. I missed my chance to contribute, but my needs are well covered by what other colleagues have added to the list (esp. Jean's comments).

My point of view is that the best way to facilitate research is for Twitter to give us as much direct access to the data as possible. Providing visualizations or specific stats (trending topics etc) alone is not sufficient, because our questions are too varied (which isn't to say that providing these things isn't great).

The discussion reminds me a bit of the debate around virtual research environments in the Humanities and Soc. Sciences. Often builders of e-infrastructure assume that research processes are standardized and that accordingly what we need are tools that hide the complexity of the data from us and simplify interaction with the data. The API approach is superior to that, even if Twitter didn't intend for us to use the API for gathering data.

It would be great if Twitter engaged in a continued dialogue with the research community about what we need (and, in return, what we can do with their data that might be interesting to them).

- Cornelius

On Fri, Mar 11, 2011 at 11:44 PM, Amanda Lenhart <alenhart@pewinternet.org> wrote:
--
Dr. Cornelius Puschmann, M.A.
Department for English Language and Linguistics
Heinrich-Heine-Universität Düsseldorf
Building 23.11, Level 1, Room 21
Universitätsstrasse 1
40225 Düsseldorf
Germany
+49 211 81 15927 (office)
Nachwuchsforschergruppe "Wissenschaft und Internet" / Junior Researchers Group "Science and the Internet"
http://nfgwin.uni-duesseldorf.de
Hi List,

I heard back from my friend at Twitter - I've pasted in her response to my message below - in it she notes which information is still available through the API, which is available but limited and what information isn't going to be trackable through the API. She also indicates places where we may need to establish a special link to a twitter staffer to work with academics on particular types of requests. She has annotated the bulleted list I provided and then provided a narrative below the numbered list of subsample types that researchers need.

"I'm figuring out who the stakeholders are around here for issues like this, but in the meantime did want to pass along some information about possible workarounds that exist currently.

-geotagging - you can glean geo data from a given tweet through API
-location - yes, same as above, searching by location is limited (see below)
-institutional tweets - you could track all tweets from a given account through API
-@replies - also trackable via API
-hashtags and post keywords - you can search/track by keyword but it's limited (see below)
-collocating a single user's tweets - same as institutional
-RT counts and tracking - yes. RT counts is an iffy API but it is/will be generally available
-words within tweets - same as keyword
-follower/following data - yes, for the instant when you query the API. this data over time isn't returned by us
-number of tweets - that a user has posted? yes
-links - limited to search restrictions (below)
-images/avatars - yes
-lists - yes

Sometimes researchers need subsamples of the twitter stream based around:

1. A date and time (e.g. all tweets two days before, during and two days after the Superbowl)
2. Hashtags or keywords
3. Individuals (e.g. top ten most active people in a particular community of practice)
4. Random subsample of twitter stream

1 & 2 here are usually impossible for researchers to get, the way they want it, through our API, and this would be the main area we'd have to work to provide resources for, I think.

"Historical" searches for keywords or over past date ranges have to be done through the search function or search API; this is pretty limited to a moving 5-7 day window. Nothing's available once it becomes older than that.

Streaming API is much better at returning this kind of targeted data (there's even a "hose" that will just return tweets with links in them, as mentioned in the first list), like tweets with keywords, tweets from this location, tweets from this list of users -- but it's all in real time. You'd have to set up a connection way in advance to make sure you get everything as it happens, and then go back and perform your analysis on the accumulated corpus.

#3 - If you determine yourself who the relevant individuals are, you can totally get their timelines through the API

#4 - available in real time through Streaming API, but not through historical lookup

Let me know if this helps at all in the meantime!"

So, those of you who are still not able to gather the data you like even with what she suggests is still available above, please let me know. I think the key issue seems to be historic access to tweets, which I'm guessing Twitter can't afford to store given the volume of material passing through their servers. It's not clear from Gnip's (twitter's data reseller) website whether they offer historic access, but it doesn't look like they do, either. I suspect that the US Library of Congress, which is apparently working with Twitter and Gnip to archive twitter feeds may end up being the lone keeper of twitter data that is older than a week. However, they are still figuring out all the technical issues, and haven't begun making this information available publicly, and it may be some time before they do.

Hopefully the above is helpful to some of you. But please do let me know (offlist) those of you who will be unable to collect your data after the end of this month, and I'll take that back to Twitter and we'll go from there.

{Update: Since I wrote this and tried unsuccessfully to post this to the list yesterday, it seems like TwapperKeeper will enable some downloading (into Excel) of historic tweets. http://twapperkeeper.wordpress.com/2011/03/22/save-as-excel-feature-has-been.... And Luca Rosa posted a possible work-around to the list earlier today.}

Thanks,
Amanda

Amanda Lenhart
Senior Research Specialist
Pew Research Center's Internet & American Life Project
alenhart@pewinternet.org
twitter: amanda_lenhart
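[Editorial sketch] The workflow described in the Twitter reply above (collect in real time, then "go back and perform your analysis on the accumulated corpus") can be illustrated with a small Python example. This is only a sketch under stated assumptions, not Twitter's API: it assumes the tweets have already been collected and stored locally as dictionaries, and the field names `created_at`, `user` and `text`, the `subsample` function and the sample data are all hypothetical. It shows the four subsample types from the numbered list (date window, keywords, individuals, random subsample) applied to such a local corpus.

```python
import random
from datetime import datetime, timedelta

# Hypothetical local corpus: tweets already collected and stored as dicts.
# The field names are assumptions for illustration, not Twitter's payload format.
corpus = [
    {"created_at": datetime(2011, 2, 4, 12, 0), "user": "alice",
     "text": "Getting ready for the #superbowl"},
    {"created_at": datetime(2011, 2, 6, 19, 30), "user": "bob",
     "text": "Kickoff! #SuperBowl"},
    {"created_at": datetime(2011, 2, 7, 9, 0), "user": "alice",
     "text": "Back to work today"},
    {"created_at": datetime(2011, 2, 12, 8, 0), "user": "carol",
     "text": "Still thinking about the #superbowl ads"},
]


def subsample(tweets, start=None, end=None, keywords=None, users=None,
              sample_size=None, seed=None):
    """Filter a locally stored corpus by the four criteria in the list above."""
    selected = []
    for tweet in tweets:
        # 1. Date and time window (inclusive on both ends)
        if start is not None and tweet["created_at"] < start:
            continue
        if end is not None and tweet["created_at"] > end:
            continue
        # 2. Hashtags or keywords (case-insensitive substring match)
        text = tweet["text"].lower()
        if keywords and not any(k.lower() in text for k in keywords):
            continue
        # 3. Individuals
        if users and tweet["user"] not in users:
            continue
        selected.append(tweet)
    # 4. Random subsample of whatever passed the filters
    if sample_size is not None and sample_size < len(selected):
        selected = random.Random(seed).sample(selected, sample_size)
    return selected


# Example: two days before an event day through two full days after it.
event_day = datetime(2011, 2, 6)
window_start = event_day - timedelta(days=2)
window_end = event_day + timedelta(days=3)

around_event = subsample(corpus, start=window_start, end=window_end)
tagged = subsample(corpus, keywords=["#superbowl"])
by_alice = subsample(corpus, users={"alice"})
random_two = subsample(corpus, sample_size=2, seed=0)
```

The design point is the one the reply makes: because historical lookup is limited, every filter here runs over data the researcher has already stored, so the date window only works if collection started before the period of interest.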
This is very useful, Amanda; thank you for reaching out to your colleagues and sharing the response.

As one of those people doing historical research of Twitter use and tags and such, I find it unfortunate that this information is no longer readily searchable. This is heightened given how I am still able to locate my first Tweet -- http://twitter.com/#!/JeffreyKeefer/statuses/5886035 -- a full 12,459 Tweets ago. Alas, it seems historical Tweets remain online (for the time being?), though there just may not be a way to access them.

-----
Jeffrey Keefer
j.keefer@lancaster.ac.uk
Blog: http://silenceandvoice.com
Twitter: http://twitter.com/JeffreyKeefer
Website: http://www.jeffreykeefer.com

On Mar 22, 2011, at 3:34 PM, Amanda Lenhart wrote:
participants (3): Amanda Lenhart, Cornelius Puschmann, Jeffrey Keefer