Advice on ripping a twitter feed
Hi everyone,
I was wondering if anyone had any experience of taking a Twitter feed and downloading its content, so that it could be turned into a text file for analysis?
Using either an individual's Twitter page or Twitter search you can generate an RSS feed (for example: http://search.twitter.com/search.atom?q=%23barackobama is the feed for #BarackObama). But I can't find a way to download it in a usable form.
I did try Stu Shulman's excellent blog capture tool (https://surveyweb2.ucsur.pitt.edu/qblog/page_login.php) but it didn't seem to be able to see the feeds.
Does anyone have any suggestions?
Many thanks, Nick
-- Nick Anstead Lecturer in Politics, School of Political, Social and International Studies, University of East Anglia [r] Arts 3.59 [e] n.anstead@uea.ac.uk [p] 01603 59 2888 [m] 07788 413 443 [b] www.nickanstead.com/blog
2009/4/2 Anstead Nicholas Mr (PSI) <N.Anstead@uea.ac.uk>:
> Hi everyone,
> I was wondering if anyone had any experience of taking a Twitter feed and downloading its content, so that it could be turned into a text file for analysis?
I'd also be interested in that ... I've used Wordle (which works with the RSS feed) on Twitter feeds & had hoped to compare a Wordle with a Treecloud ( http://www.treecloud.org/ & also http://www.daniel-lemire.com/blog/archives/2009/04/01/generate-your-own-tree... ) However, I couldn't work out how to get the text that I needed! Emma -- Emma Duke-Williams: School of Computing/ Faculty eLearning Co-ordinator, University of Portsmouth, UK. Blog: http://userweb.port.ac.uk/~duke-wie/blog/ Twitter: http://twitter.com/emmadw SL: Emmadw Rickenbacker
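One possible way to get from a feed to a plain text file, sketched in Python. This is not something any poster in the thread used. It assumes the feed is Atom (as the search.atom URL in the original post suggests) and that each tweet's text sits in the entry's title element. The sample feed and the names entry_titles and save_as_text are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom documents.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical two-entry feed, invented for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>#barackobama - search results</title>
  <entry>
    <id>tag:example.org,2009:1</id>
    <title>Watching the speech now #barackobama</title>
  </entry>
  <entry>
    <id>tag:example.org,2009:2</id>
    <title>Press conference at noon #barackobama</title>
  </entry>
</feed>"""


def entry_titles(feed_xml):
    """Return the title text of every entry in an Atom document."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        titles.append(title.strip())
    return titles


def save_as_text(titles, path):
    """Write one entry per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(titles) + "\n")


if __name__ == "__main__":
    for line in entry_titles(SAMPLE_FEED):
        print(line)
```

To work from a live feed rather than the sample, the bytes returned by urllib.request.urlopen(url).read() could be passed to entry_titles in the same way. Whether the 2009 search URL still responds is not something this sketch assumes.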
We are looking into the viability of capturing feeds with the Blog Analysis Toolkit...if there is an RSS feed, we should be able to capture it. https://surveyweb2.ucsur.pitt.edu/qblog/page_login.php
-- Dr. Stuart W. Shulman Assistant Professor Department of Political Science University of Massachusetts Amherst 200 Hicks Way Amherst, MA 01003 http://people.umass.edu/stu/ stu@polsci.umass.edu 413-545-5375 Editor, Journal of Information Technology and Politics http://www.jitp.net Director, QDAP-UMass http://www.umass.edu/qdap/ Associate Director, National Center for Digital Government http://www.umass.edu/digitalcenter/
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
Take a look at http://contextminer.org and for real time http://monitter.com
I've had a look at some of the tools that have been suggested:
1: Monitter - I couldn't work out how to get a text file output (actually, I must be pretty dense, as I couldn't even work out how to get a single person's updates, never mind extract them!)
2: Blog Analysis Toolkit. Managed to get a feed fine, though it had the URLs for the different tweets in, so I guess that would cause problems with the treecloud.
3: ContextMiner; seems to be very complicated to set up - I can't quite work out what to do. That said, I could see that it could be useful for something more complex than just getting a text file with a feed from a single person on Twitter - which is what I'm after &, I think, what the original poster was after.
Emma
-- Emma Duke-Williams: School of Computing/ Faculty eLearning Co-ordinator, University of Portsmouth, UK. Blog: http://userweb.port.ac.uk/~duke-wie/blog/ Twitter: http://twitter.com/emmadw SL: Emmadw Rickenbacker
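On the point about URLs in the captured feed getting in the way of a tree cloud: one way to clean the text before analysis is to strip anything that looks like a web address. The following is a minimal Python sketch under stated assumptions. The pattern only catches addresses starting with http:// or https://, and the name strip_urls is invented for illustration; it is not part of the Blog Analysis Toolkit or TreeCloud.

```python
import re

# Matches http:// or https:// followed by any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Remove web addresses from text and tidy up the leftover spacing."""
    without_urls = URL_PATTERN.sub(" ", text)
    # Collapse the runs of whitespace left behind by the removed addresses.
    return " ".join(without_urls.split())


if __name__ == "__main__":
    print(strip_urls("Reading this http://example.org/a?b=1 now"))
```

Punctuation that directly follows an address (a closing bracket, say) is removed along with it, which is usually acceptable for word-cloud style analysis but may not be for close reading.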
This may be worth looking at: http://tweetake.com/ http://code.google.com/p/tweetake/ I don't know what their privacy policy is.
---- School of Interactive Computing Georgia Institute of Technology www.cc.gatech.edu/~yardi
Incidentally, I have tried the Blog Analysis Toolkit (see below) and it works fine. Grab the RSS feed from the right-hand side of the Twitter website and set this as the blog URL.
: -----Original Message-----
: From: Stuart Shulman [mailto:stuart.shulman@gmail.com]
: Sent: Friday, 3 April 2009 3:10 a.m.
: To: air-l@listserv.aoir.org
: Subject: Re: [Air-L] Advice on ripping a twitter feed
:
: We are looking into the viability of capturing feeds with the Blog Analysis Toolkit...if there is an RSS feed, we should be able to capture it.
:
: https://surveyweb2.ucsur.pitt.edu/qblog/page_login.php
Very cool. Thanks Andrew! This is a great time to send feedback about BAT.
The purpose of the Blog Analysis Toolkit is to establish a socially-constructed repository of blog posts that are archived and accessible for research purposes. There are about 250 BAT users at the present moment archiving about 200 blogs. The posts are formatted in one of two ways to allow coding at the document or paragraph level using another free software system, the Coding Analysis Toolkit <http://cat.ucsur.pitt.edu/> (CAT). Once you join the system you have access to all the archived posts and you can add new blogs to the archiving process.
A new programmer has just started working with us to improve the platform, which is a free by-product of ongoing NSF-funded research. We want to increase its functionality and usability, so AoIR members are strongly encouraged to let us know what you want BAT to do in the future. We face challenges doing some simple things, like getting the comments and the archives. If you know how, perhaps join the BAT development team.
The quick-start BAT tutorial is online at: http://www.screencast.com/t/OcRziCMg
~Stu
Hi,
The Infoscape Lab has some experience tracking Twitter feeds that I would like to share.
First, Twitter feeds are RSS feeds. As a result, a number of tools exist to archive and analyse content, like the Coding Analysis Toolkit. We use an RSS aggregator called Gregarius (http://gregarius.net/) to collect Twitter feeds and save them in a database. Importantly, Twitter moves faster than blogs. Most RSS aggregators collect on an hourly or daily basis. We have had to manually refresh our aggregator every 10 minutes to catch the flow of tweets in busy periods.
Second, we have used hashtags as a way to sample discussion in Twitter. During our recent Canadian election, we tracked hashtags related to a leadership debate. Hashtags can be tricky because they change rapidly and seem to emerge naturally, so we also relied on a basket of users.
You can see our Twitter coverage of the Canadian debates here: http://www.cbc.ca/news/canadavotes/campaign2/ormiston/2008/10/debate_hangove....
We are pretty happy with this time-sensitive sample of Twitter because it captured how people flock to the site during important moments like the debates. If anyone has any more questions about our perspective or method on Twitter, I'd be happy to help.
All the best, Fen
-- Fenwick McKelvey PhD Student in Communication and Culture Ryerson / York Universities Research Associate Infoscape Research Lab http://www.infoscapelab.ca Research Associate VideoCom Research Initiative http://videocom.knet.ca
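The sampling approach Fen describes (frequent polling, plus a filter on tracked hashtags and a basket of users) can be sketched roughly as follows. This is a hedged Python illustration, not the Infoscape Lab's Gregarius setup. The class name TweetArchive, the (entry_id, author, text) tuple layout, and the sample data are all assumptions made for the example. Because a feed polled every ten minutes returns many of the same items again, the sketch remembers entry ids so that each tweet is stored once.

```python
import re

# A hashtag is taken to be "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lower-cased hashtags (without '#') found in text."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


class TweetArchive:
    """Keeps tweets that match tracked hashtags or users, once each."""

    def __init__(self, tracked_tags, tracked_users):
        self.tracked_tags = {t.lower().lstrip("#") for t in tracked_tags}
        self.tracked_users = {u.lower() for u in tracked_users}
        self.seen_ids = set()
        self.kept = []

    def add_poll(self, entries):
        """Store new matching entries from one poll; return how many were added.

        entries is an iterable of (entry_id, author, text) tuples.
        """
        added = 0
        for entry_id, author, text in entries:
            if entry_id in self.seen_ids:
                continue  # already handled in an earlier poll
            self.seen_ids.add(entry_id)
            by_user = author.lower() in self.tracked_users
            by_tag = bool(hashtags(text) & self.tracked_tags)
            if by_user or by_tag:
                self.kept.append((entry_id, author, text))
                added += 1
        return added


if __name__ == "__main__":
    archive = TweetArchive(["#debate"], ["alice"])
    first = [("1", "alice", "hello"), ("2", "bob", "watching #Debate")]
    second = [("2", "bob", "watching #Debate"), ("4", "dave", "#debate tonight")]
    print(archive.add_poll(first), archive.add_poll(second))
```

In a real collector, add_poll would be called from a loop that fetches the feed and then waits (for example with time.sleep(600) for a ten-minute interval), and kept would be written to a database rather than held in memory.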
I have an untagged (plaintext) corpus of tweets from about 27k users (I don't have a word count, but I can check) that I could share in anonymized form (it's originally a conference dataset). Together with something like TextSTAT (http://www.niederlandistik.fu-berlin.de/textstat/software-en.html) that works pretty well if you don't want to do anything terribly sophisticated.
Best, Cornelius Puschmann, PhD University of Duesseldorf
-- Dr. des. Cornelius Puschmann, M.A. Department of English Language and Linguistics / University of Düsseldorf, Germany University Library Center (hbz), Cologne, Germany +49 211 811 5927 (office) +49 176 811 78067 (mobile) +49 211 139 566 84 (home) www.ynada.com www.elanguage.net
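For a plaintext corpus like the one described, a word count and a frequency list are the simplest starting point. The sketch below is a minimal Python illustration and does not reproduce what any particular concordance tool does. It assumes one tweet per line, treats a word as a run of letters with an optional internal apostrophe, and ignores numbers. The function name word_frequencies and the sample lines are invented for the example.

```python
import re
from collections import Counter

# A word: one or more letters, optionally followed by an apostrophe and
# more letters (so "don't" stays whole). Digits and underscores are excluded.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def word_frequencies(lines):
    """Count lower-cased words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in WORD.findall(line))
    return counts


if __name__ == "__main__":
    sample = ["The debate, the DEBATE!", "Don't miss the debate 2009"]
    counts = word_frequencies(sample)
    print(sum(counts.values()), "words in total")
    for word, n in counts.most_common(3):
        print(word, n)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, which keeps memory use low for a large corpus.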
You could try http://tweetbackup.com to grab all tweets for a given user and then export these as text, HTML or XML.
participants (8)
- Andrew Long
- Anstead Nicholas Mr (PSI)
- Cornelius Puschmann
- Emma Duke-Williams
- Fenwick Mckelvey
- paul jones
- Sarita Yardi
- Stuart Shulman