Hi Deen,

I've used both services and they are very good, but you're better off setting up a server and running a few apps on your own. I'm running a modified version of yTK (yourTwapperKeeper) on a MAMP server at the University of São Paulo (99.9% uptime). We've changed yourTwapperKeeper considerably, and there are a few supplementary scripts that map the users' network after hashtag archiving. It takes a lot of requests to get it done, but it's been working fine. If you can get your IP address whitelisted, then you're good to map the network. For the time being, I'd suggest working on yTK until it does what you need. Let me know if you hear of any other way to work this out.

[]s
Marco
___________________
Marco Toledo Bastos
Postdoctoral Fellow
FiloCom - ECA / USP (0055) 11 7102-4756
FAMe - UniFrankfurt (0049) 151-58768326

Date: Wed, 23 Feb 2011 11:37:08 -0800
From: Deen Freelon <dfreelon@u.washington.edu>
To: air-l@listserv.aoir.org
Subject: Re: [Air-L] Academic replacements for TwapperKeeper.com?
Message-ID: <4D6561E3.6040505@u.washington.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed

I would also be curious to know what others have been using or plan to use for harvesting Twitter data. I've used both TwapperKeeper and 140kit, and found that the latter is quite good for hashtag archiving, but not as good at keyword archiving. Further, 140kit has a maximum scrape time of one week, although I believe that can be renewed manually. Finally, both TK and 140kit can be quite slow and even unavailable at times, and, as we've just seen, they may shut down at any time.
All of this has made me quite wary of relying on externally managed "clouds" for data collection. That is why I intend to set up my own Twitter harvesting operation for use within my own department, as many CS researchers do, and would encourage others with the necessary means and knowledge to do the same. Much valuable data can be collected even within the default API query limits, though I'll certainly ask Twitter to put me on the whitelist. Running one's own archiving operation is fairly cheap, and since you're only archiving your own data, you aren't hamstrung by hundreds of other jobs running simultaneously.
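
To make the point about query limits concrete, here is a minimal Python sketch of pacing requests so that a harvesting script stays under an hourly cap. Everything in it is an assumption for illustration: the names `RequestPacer`, `harvest`, `hours_needed`, and `fetch` are hypothetical, `fetch` stands in for whatever function actually calls the API, and the figure of 150 requests per hour in the usage note is only an example value, not one stated in this thread.

```python
import time


class RequestPacer:
    """Spaces calls evenly so at most `max_requests` happen per window."""

    def __init__(self, max_requests, window_seconds=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        if max_requests <= 0:
            raise ValueError("max_requests must be positive")
        # Minimum gap between two consecutive requests, in seconds.
        self.interval = window_seconds / max_requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request slot, then reserve the one after it."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.interval


def hours_needed(num_requests, max_requests_per_hour):
    """Rough wall-clock estimate for a batch of requests at a given cap."""
    return num_requests / max_requests_per_hour


def harvest(queries, fetch, pacer):
    """Call the hypothetical `fetch(query)` once per query, paced by `pacer`."""
    results = []
    for query in queries:
        pacer.wait()
        results.append(fetch(query))
    return results
```

With an assumed cap of 150 requests per hour, `RequestPacer(150)` leaves 24 seconds between calls, and `hours_needed(3000, 150)` estimates 20 hours for a 3,000-request job. Even spacing is a simple design choice here; a script could instead send requests in bursts and pause until the window resets.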
If there's any interest in learning how to set up small-scale Twitter scrapes, let me know and I'll write something up when I have the time.

Best,
~DEEN