Re: [Air-L] Academic replacements for TwapperKeeper.com?
Hi Deen,

I've used both services and they are very good, but you're better off setting up a server and running a few apps on your own. I'm running a modified version of yTK (yourTwapperKeeper) on a MAMP server at the University of São Paulo (99.9% uptime). We've changed yourTwapperKeeper considerably. There are a few supplementary scripts that map the users' network after hashtag archiving. It takes a lot of requests to get that done, but it's been working fine. If you can get your IP address whitelisted, then you're good to map the network.

For the time being, I'd tell you to work on yTK until it does what you need. Let me know if you hear of any other way to work this out.

[]s
Marco
___________________
Marco Toledo Bastos
Postdoctoral Fellow
FiloCom - ECA / USP (0055) 11 7102-4756
FAMe - UniFrankfurt (0049) 151-58768326
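As a rough illustration of the post-archiving network mapping mentioned above (this is not Marco's script, which is not shown in the thread), the sketch below builds a mention network from tweets that have already been archived. It swaps in @-mention edges, which need no extra API requests, for whatever lookups the modified yTK scripts actually perform. The `from_user` and `text` field names and the sample tweets are assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: each archived tweet is a dict with 'from_user' and 'text' keys.
# The lookbehind keeps e-mail addresses such as a@b from counting as mentions.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")

def mention_edges(tweets):
    """Count directed (author -> mentioned user) edges in an archive."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["from_user"].lower()
        for name in MENTION_RE.findall(tweet["text"]):
            target = name.lower()
            if target != author:  # skip self-mentions
                edges[(author, target)] += 1
    return edges

# Hypothetical sample archive, not real data.
sample = [
    {"from_user": "alice", "text": "@bob have you seen this? #aoir"},
    {"from_user": "bob", "text": "RT @alice: @bob have you seen this? #aoir"},
    {"from_user": "carol", "text": "Agreed with @Alice #aoir"},
]

for (source, target), weight in sorted(mention_edges(sample).items()):
    print(f"{source} -> {target} (weight {weight})")
```

The resulting weighted edge list can be loaded into most network analysis tools. A follower or friend network would need one or more API requests per user instead, which is where whitelisting starts to matter.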
Date: Wed, 23 Feb 2011 11:37:08 -0800
From: Deen Freelon <dfreelon@u.washington.edu>
To: air-l@listserv.aoir.org
Subject: Re: [Air-L] Academic replacements for TwapperKeeper.com?
Message-ID: <4D6561E3.6040505@u.washington.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed
I would also be curious to know what others have been using or plan to use for harvesting Twitter data. I've used both TwapperKeeper and 140kit, and found that the latter is quite good for hashtag archiving, but not as good at keyword archiving. Further, 140kit has a max scrape time of one week, although I believe that is manually renewable. Finally, both TK and 140kit can be quite slow and even unavailable at times, and, as we've just seen, they may shut down at any time.
All of this has made me quite wary of relying on externally managed "clouds" for data collection. That is why I intend to set up my own Twitter harvesting operation for use within my own department, as many CS researchers do, and would encourage others with the necessary means and knowledge to do the same. Much valuable data can be collected even within the default API query limits, though I'll certainly ask Twitter to put me on the whitelist. Running one's own archiving operation is fairly cheap, and since you're only archiving your own data, you aren't hamstrung by hundreds of other jobs running simultaneously.
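To give a sense of what staying inside a query limit can look like in practice, here is a minimal sketch of a client-side throttle that spaces requests evenly over a period. The figure of 150 requests per hour, the `Throttle` class, and the `fetch_page` placeholder are all assumptions for illustration; substitute whatever limit applies to your account and a real API call.

```python
import time

class Throttle:
    """Space calls evenly so at most `limit` happen per `period` seconds."""

    def __init__(self, limit, period=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = period / limit   # minimum gap between calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None         # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.interval

def fetch_page(page):
    # Hypothetical placeholder for a real API request.
    return f"results for page {page}"

def harvest(pages, throttle):
    """Fetch each page, pausing as needed to respect the throttle."""
    results = []
    for page in pages:
        throttle.wait()
        results.append(fetch_page(page))
    return results

# An assumed limit of 150 requests per hour would be Throttle(limit=150),
# i.e. one request every 24 seconds. The demo uses a tiny period instead
# so that it finishes in well under a second.
throttle = Throttle(limit=10, period=1.0)
print(harvest(range(3), throttle))
```

Even spacing is the simplest policy; it wastes nothing over a long-running harvest, though a burst-then-wait scheme may suit short jobs better.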
If there's any interest in learning how to set up small-scale Twitter scrapes, let me know and I'll write something up when I have the time.

Best,
~DEEN
We have been collecting tweets on DiscoverText. For example, we got quite a few leading up to the State of the Union:
http://discovertext.com/sotu.aspx

These tweets are available for use and re-use in their raw and de-duplicated forms. You can try DiscoverText for free for 30 days:
https://discovertext.com/registration.aspx

Log in with your Facebook credentials if you also want to scrape public Facebook pages, but first we advise that you read what you have to agree to in order to use Facebook credentials:
http://help.discovertext.com/Using%20Facebook%20to%20Login.ashx

So far I have collected over 500,000 comments from the official White House Facebook page. Inside DT you can remove duplicates, run searches that create sub-set "buckets", and assign groups of coders. We also feature tools for measuring inter-rater reliability and adjudicating the validity of coding.

Several universities have purchased enterprise site licenses:
https://discovertext.com/RegisterEduSelect.aspx

If your university is on this list, you can get a license key for 2011 today. If your university wants to be on the list, have your IT purchasing agent contact me.

~Stu

--
Stuart Shulman
President & CEO
Texifter, LLC <http://www.texifter.com/>
Have you tried DiscoverText? http://discovertext.com
*Featuring the Facebook Graph & Twitter APIs*
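The message above does not say which reliability measures DiscoverText computes. As a general illustration of what an inter-rater reliability statistic does, the sketch below implements Cohen's kappa for two coders who each assigned one code to the same items. The function name and the sample codes are made up for the example.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(codes_a)
    # Share of items on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical codes for six comments, not real data.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

Here the coders agree on five of six items, but half of that agreement would be expected by chance, so kappa comes out lower than the raw agreement rate.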
Participants (2): Marco Toledo Bastos, Stuart Shulman