Re: [Air-L] Q: Twitter text logs?
Doron,
I would recommend using the Twitter API directly. Here is the page in the Twitter API Wiki that explains how to correctly call on the API for a search: http://apiwiki.twitter.com/Twitter-Search-API-Method:-search.
When you make such a call into the API, it will return an XML file that can be imported into MS Excel (Windows only-- sorry, fellow Mac users). I wrote a step-by-step blog post on how to do this using the "user timeline" method when we were in the middle of our Twitter research project last semester:
http://reyjunco.tumblr.com/post/219287195/how-to-export-twitter-updates-to-e...
If you want to download a lot of tweets, make sure you pay particular attention to the *rpp* (returns per page) and the *page* parameters.
I have a lot of experience with this as we downloaded and archived almost 3,500 tweets during our 15-week-long study. So, please let me know if you have any questions.
Best,
Rey Junco
-- Dr. Reynol Junco Associate Professor Department of Academic Development and Counseling Director, Disability Services Lock Haven University http://www.reyjunco.com
------------------------------
Message: 5
Date: Tue, 16 Mar 2010 22:02:22 +0200
From: "Friedman Doron" <doronf@idc.ac.il>
To: <air-l@listserv.aoir.org>
Subject: [Air-L] Q: Twitter text logs?
Message-ID: <EAC062C903BCD049AE7C3602A2F373A5AFC1EB@JAMES.idc.ac.il>
Content-Type: text/plain; charset="us-ascii"
Hi,
Can anyone point us to authentic data from Twitter that can be used for research purposes? We are looking for sub-networks of followers, and a set of texts generated by users that belong to the network. Alternatively, status messages from Facebook with the corresponding friendship graphs would be useful as well. Of course, the actual usernames can be encrypted for privacy. We will give full credit to whoever has been able to harvest such data and make it available to the research community.
Thanks!
- doron
====================
Dr. Doron Friedman
Lecturer, The Interdisciplinary Center, Herzliya (Israel) &
Honorary Lecturer, University College London
Mobile: +972-54-4461807
Office: +972-9-9527654
http://www.idc.ac.il/communications/avl
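To make the point about the *rpp* and *page* parameters in Rey's message concrete, here is a rough sketch of how a script might build one request URL per page of results. The base URL is a made-up placeholder, and the `q` query parameter and the `build_page_urls` helper are assumptions for illustration only; the wiki page linked in the message is the place to check the real endpoint, its limits, and whether it is still available.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not a confirmed URL.
BASE_URL = "http://search.example.invalid/search.atom"


def build_page_urls(query, rpp, pages):
    """Return one request URL per results page (pages are 1-indexed).

    rpp  -- number of tweets requested per page
    page -- which page of results to fetch
    The parameter name "q" for the search term is an assumption.
    """
    urls = []
    for page in range(1, pages + 1):
        params = urlencode({"q": query, "rpp": rpp, "page": page})
        urls.append(f"{BASE_URL}?{params}")
    return urls


if __name__ == "__main__":
    # Three pages of up to 100 tweets each for a hypothetical hashtag search.
    for url in build_page_urls("#aoir", rpp=100, pages=3):
        print(url)
```

Each URL could then be fetched in turn (for example with cURL) and the responses saved one file per page, so that a large download is simply a loop over the *page* values.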
I'm using a methodology similar to Reynol's for a project I am doing (see http://blog.ynada.com/179). Here's a very basic step-by-step description. Sorry, I realize that some of this stuff may be difficult to implement without the right technical expertise. Also, you need to be on Linux. :-(

1. Set up a special account to follow the group of users you want to study. This has several advantages (see below). You can seed the account by using search, the public timeline, lists etc. I've seeded http://twitter.com/scientwists mainly via lists.
2. Use a script (e.g. in Bash, Python or Perl) to retrieve the tweets of the people you are following via the Twitter API and cURL. Run this script at regular intervals via a cron job. I've set up an old laptop running 24/7 to do this for me and fetch new material once per hour, since Twitter doesn't reliably return older data.
3. Use XSLT (e.g. xsltproc) to convert the XML into a nice concordance, for example CSV.
4. Use something like NLTK or R to extract hashtags, frequent terms, popular URLs, who the most retweeted user is... and other cool stuff you can think of.

If you can, put everything on Dropbox. That way you get both a backup and live stats from anywhere you work.

Why use a special account rather than just pulling data via the public timeline?

a) You can let people know that you plan on using their tweets.
b) You can allow them to block you, effectively giving them the chance to opt out. Just because their Twitter is public doesn't mean they want to be included in your study.
c) They can get in touch with you, give feedback, ask questions etc.
d) It makes longitudinal research easier, since you have a live database via the account.
e) It is really easy to expand/modify the corpus -- student assistants can easily help you with that without having to do any programming.

HTH,
Cornelius Puschmann

On Wed, Mar 17, 2010 at 12:56 AM, Reynol Junco <rey.junco@gmail.com> wrote:
Doron,
I would recommend using the Twitter API directly. Here is the page in the Twitter API Wiki that explains how to correctly call on the API for a search : http://apiwiki.twitter.com/Twitter-Search-API-Method:-search.
When you make such a call into the API, it will return an XML file that can be imported into MS Excel (Windows only-- sorry, fellow Mac users). I wrote a step-by-step blog post on how to do this using the "user timeline" method when we were in the middle of our Twitter research project last semester:
http://reyjunco.tumblr.com/post/219287195/how-to-export-twitter-updates-to-e...
If you want to download a lot of tweets, make sure you pay particular attention to the *rpp* (returns per page) and the *page* parameters.
I have a lot of experience with this as we downloaded and archived almost 3,500 tweets during our 15-week-long study. So, please let me know if you have any questions.
Best,
Rey Junco
-- Dr. Reynol Junco Associate Professor Department of Academic Development and Counseling Director, Disability Services Lock Haven University http://www.reyjunco.com
------------------------------
Message: 5
Date: Tue, 16 Mar 2010 22:02:22 +0200
From: "Friedman Doron" <doronf@idc.ac.il>
To: <air-l@listserv.aoir.org>
Subject: [Air-L] Q: Twitter text logs?
Message-ID: <EAC062C903BCD049AE7C3602A2F373A5AFC1EB@JAMES.idc.ac.il>
Content-Type: text/plain; charset="us-ascii"
Hi,
Can anyone point us to authentic data from Twitter that can be used for research purposes? We are looking for sub-networks of followers, and a set of texts generated by users that belong to the network. Alternatively, status messages from Facebook with the corresponding friendship graphs would be useful as well. Of course, the actual usernames can be encrypted for privacy. We will give full credit to whoever has been able to harvest such data and make it available to the research community.
Thanks!
- doron
====================
Dr. Doron Friedman
Lecturer, The Interdisciplinary Center, Herzliya (Israel) &
Honorary Lecturer, University College London
Mobile: +972-54-4461807
Office: +972-9-9527654
http://www.idc.ac.il/communications/avl
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Dr. des. Cornelius Puschmann, M.A.
Dept for English Language and Linguistics
University of Düsseldorf, Germany
-and-
University Library Center (hbz), Cologne, Germany
http://google.com/profiles/puschmann
http://ynada.com
http://elanguage.net
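Steps 3 and 4 in Cornelius's message (convert the XML to CSV, then extract hashtags) can be roughed out as follows. Note that this sketch swaps in Python's standard library for the XSLT and NLTK/R tools named in the message, so it is not the workflow described there, only the same idea in one file. The XML layout (`status`, `created_at`, `text`, `user/screen_name`), the sample data, and all function names are assumptions for illustration; the real element names depend on what the API actually returns.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample data; the element names are an assumed layout.
SAMPLE_XML = """<statuses>
  <status>
    <created_at>Wed Mar 17 10:00:00 +0000 2010</created_at>
    <text>Reading about #twitter research #aoir</text>
    <user><screen_name>alice</screen_name></user>
  </status>
  <status>
    <created_at>Wed Mar 17 11:00:00 +0000 2010</created_at>
    <text>RT @alice: Reading about #Twitter research</text>
    <user><screen_name>bob</screen_name></user>
  </status>
</statuses>"""

FIELDS = ["screen_name", "created_at", "text"]


def statuses_to_rows(xml_text):
    """Parse the XML and return one dict per <status> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for status in root.iter("status"):
        rows.append({
            "screen_name": status.findtext("user/screen_name", default=""),
            "created_at": status.findtext("created_at", default=""),
            "text": status.findtext("text", default=""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def count_hashtags(rows):
    """Count hashtags across all tweet texts, ignoring case."""
    counts = Counter()
    for row in rows:
        tags = re.findall(r"#(\w+)", row["text"])
        counts.update(tag.lower() for tag in tags)
    return counts


if __name__ == "__main__":
    rows = statuses_to_rows(SAMPLE_XML)
    print(rows_to_csv(rows))
    print(count_hashtags(rows).most_common(5))
```

In an hourly cron setup like the one described, each fetched XML file would be passed through `statuses_to_rows` and appended to a growing CSV; duplicates between runs would still need to be filtered out, which this sketch does not attempt.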