I have an untagged (plaintext) corpus of tweets from about 27k users (I don't have a word count, but I can check) that I could share in anonymized form (it's originally a conference dataset). Combined with something like TextSTAT (http://www.niederlandistik.fu-berlin.de/textstat/software-en.html), that works pretty well if you don't want to do anything terribly sophisticated.

Best,
Cornelius Puschmann, PhD
University of Duesseldorf

On Fri, Apr 3, 2009 at 9:15 PM, Fenwick Mckelvey <mckelveyf@gmail.com> wrote:
Hi,

The Infoscape Lab has some experience tracking Twitter feeds that I would like to share.

First, Twitter feeds are RSS feeds. As a result, a number of tools exist to archive and analyze content, like the Coding Analysis Toolkit. We use an RSS aggregator called Gregarius (http://gregarius.net/) to collect Twitter feeds and save them in a database. Importantly, Twitter moves faster than blogs. Most RSS aggregators collect on an hourly or daily basis. We have had to manually refresh our aggregator every 10 minutes to catch the flow of tweets in busy periods.

Second, we have used hashtags as a way to sample discussion on Twitter. During our recent Canadian election, we tracked hashtags related to a leadership debate. Hashtags can be tricky because they change rapidly and seem to emerge naturally, so we also relied on a basket of users.
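The collection approach described above can be sketched roughly as follows. This is only an illustration, not the Infoscape Lab's actual setup (which relied on Gregarius). It parses an RSS 2.0 document, skips items already seen on earlier polls, and keeps tweets that carry a tracked hashtag or come from a basket of users. The function names, the feed URL passed to poll(), and the assumption that item titles look like "username: tweet text" are all hypothetical.

```python
import re
import time
import urllib.request
import xml.etree.ElementTree as ET

HASHTAG_RE = re.compile(r"#(\w+)")


def parse_items(rss_xml):
    """Return a list of dicts (guid, author, text) from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Fall back to the link, then the title, if the feed has no guid.
        guid = item.findtext("guid") or item.findtext("link") or title
        # Assumption: titles look like "username: tweet text".
        author, sep, text = title.partition(": ")
        if not sep:
            author, text = "", title
        items.append({"guid": guid, "author": author, "text": text})
    return items


def matches_sample(item, hashtags, users):
    """True if the item uses a tracked hashtag or comes from a tracked user."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(item["text"])}
    return bool(tags & hashtags) or item["author"].lower() in users


def collect_new(rss_xml, seen, hashtags, users):
    """Return sampled items not seen before; record every guid in `seen`."""
    fresh = []
    for item in parse_items(rss_xml):
        if item["guid"] in seen:
            continue
        seen.add(item["guid"])
        if matches_sample(item, hashtags, users):
            fresh.append(item)
    return fresh


def poll(feed_url, hashtags, users, interval_seconds=600):
    """Fetch feed_url repeatedly; feed_url is a placeholder supplied by the caller."""
    seen = set()
    while True:
        with urllib.request.urlopen(feed_url) as response:
            for item in collect_new(response.read(), seen, hashtags, users):
                print(item["author"], item["text"])
        time.sleep(interval_seconds)
```

The default interval of 600 seconds mirrors the 10-minute refresh mentioned above; hashtags and users are expected as lowercase sets, and a shorter interval may be needed in busy periods.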
You can see our Twitter coverage of the Canadian debates here:
http://www.cbc.ca/news/canadavotes/campaign2/ormiston/2008/10/debate_hangove...

We are pretty happy with this time-sensitive sample of Twitter because it captured how people flock to the site during important moments like the debates. If anyone has any more questions about our perspective or method on Twitter, I'd be happy to help.
All the best, Fen
On Thu, Apr 2, 2009 at 6:50 PM, Stuart Shulman <stuart.shulman@gmail.com> wrote:
Very cool. Thanks Andrew! This is a great time to send feedback about BAT. The purpose of the Blog Analysis Toolkit is to establish a socially constructed repository of blog posts that are archived and accessible for research purposes. There are currently about 250 BAT users archiving about 200 blogs. The posts are formatted in one of two ways to allow coding at the document or paragraph level using another free software system, the Coding Analysis Toolkit <http://cat.ucsur.pitt.edu/> (CAT). Once you join the system, you have access to all the archived posts and you can add new blogs to the archiving process.
We have just brought on a new programmer to improve the platform, which is a free by-product of ongoing NSF-funded research. We want to increase its functionality and usability, so AoIR members are strongly encouraged to let us know what you want BAT to do in the future. We face challenges doing some simple things, like getting the comments and the archives. If you know how, perhaps join the BAT development team. The quick-start BAT tutorial is online at:
http://www.screencast.com/t/OcRziCMg
~Stu
On Thu, Apr 2, 2009 at 6:10 PM, Andrew Long <ALong@infoscience.otago.ac.nz> wrote:
Incidentally, I have tried the Blog Analysis Toolkit (see below) and it works fine. Grab the RSS feed from the right-hand side of the Twitter website and set this as the blog URL.
--
Dr. Stuart W. Shulman
Assistant Professor
Department of Political Science
University of Massachusetts Amherst
200 Hicks Way
Amherst, MA 01003
http://people.umass.edu/stu/ stu@polsci.umass.edu 413-545-5375
Editor, Journal of Information Technology and Politics http://www.jitp.net
Director, QDAP-UMass http://www.umass.edu/qdap/
Associate Director, National Center for Digital Government http://www.umass.edu/digitalcenter/
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Fenwick McKelvey
PhD Student in Communication and Culture
Ryerson / York Universities
Research Associate Infoscape Research Lab http://www.infoscapelab.ca
Research Associate VideoCom Research Initiative http://videocom.knet.ca
--
Dr. des. Cornelius Puschmann, M.A.
Department of English Language and Linguistics / University of Düsseldorf, Germany
University Library Center (hbz), Cologne, Germany
+49 211 811 5927 (office)
+49 176 811 78067 (mobile)
+49 211 139 566 84 (home)
www.ynada.com
www.elanguage.net