Re: [Air-L] Academic replacements for TwapperKeeper.com?

24 Feb 2011

Hi Deen,

I've used both services and they are very good, but you're better off
setting up a server and running a few apps on your own. I'm running a
modified version of yTK through a MAMP server at the University of São Paulo
(99.9% uptime). We've changed yourTwapperKeeper considerably. There are a
few supplementary scripts that map the users' network once a hashtag has
been archived. It takes a lot of requests to get that done, but it's been
working fine. If you can get your IP address whitelisted, then you're good
to map the network. For the time being I'd suggest working on yTK until it
does what you need.
Let me know if you hear of any other way to work this out.
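
To give a rough sense of why whitelisting matters for network mapping, here
is a back-of-the-envelope sketch in Python. The function name, the
one-request-per-page model, and the hourly limits passed in are illustrative
assumptions, not figures taken from yTK or from our scripts; plug in whatever
limits apply to your own account.

```python
import math

def hours_to_map_network(n_users, pages_per_user, requests_per_hour):
    """Estimate the hours needed to fetch follower lists for n_users.

    Assumes one API request per page of followers per user, which is a
    simplification; the real cost depends on the endpoint you call.
    """
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    total_requests = n_users * pages_per_user
    return math.ceil(total_requests / requests_per_hour)

# Hypothetical archive: 5,000 users from one hashtag, 2 pages each.
default_hours = hours_to_map_network(5000, 2, 150)        # assumed default limit
whitelisted_hours = hours_to_map_network(5000, 2, 20000)  # assumed whitelisted limit
print(default_hours, whitelisted_hours)
```

Under these assumed limits the same job goes from days to about an hour,
which is why the request budget, not the server, tends to be the bottleneck.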

[]s
Marco
___________________
Marco Toledo Bastos
Postdoctoral Fellow
FiloCom - ECA / USP
(0055) 11 7102-4756
FAMe - UniFrankfurt
(0049) 151-58768326
...
Date: Wed, 23 Feb 2011 11:37:08 -0800
From: Deen Freelon <dfreelon@u.washington.edu>
To: air-l@listserv.aoir.org
Subject: Re: [Air-L] Academic replacements for TwapperKeeper.com?
Message-ID: <4D6561E3.6040505@u.washington.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed

I would also be curious to know what others have been using or plan to
use for harvesting Twitter data. I've used both TwapperKeeper and
140kit, and found that the latter is quite good for hashtag archiving,
but not as good at keyword archiving. Further, 140kit has a maximum scrape
time of one week, although I believe that can be renewed manually.
Finally, both TK and 140kit can be quite slow and even unavailable at
times, and as we've just seen they may shut down at any time.

All of this has made me quite wary of relying on externally managed
"clouds" for data collection. That is why I intend to set up my own
Twitter harvesting operation for use within my own department, as many
CS researchers do, and would encourage others with the necessary means
and knowledge to do the same. Much valuable data can be collected even
within the default API query limits, though I'll certainly ask Twitter
to put me on the whitelist. Running one's own archiving operation is
fairly cheap, and since you're only archiving your own data, you aren't
hamstrung by hundreds of other jobs running simultaneously.
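
As a minimal sketch of what such a self-run archive could look like, the
Python below polls a fetch function a fixed number of times, pauses between
requests to stay inside a request budget, and stores only tweets it has not
seen before. Everything here is an assumption for illustration:
`fetch_page` is a hypothetical stand-in for whatever search call you end up
using, and the table layout and pause length are placeholders rather than
anything TwapperKeeper or 140kit actually does.

```python
import sqlite3
import time

def archive_tweets(conn, fetch_page, max_requests, pause_seconds, sleep=time.sleep):
    """Poll fetch_page up to max_requests times and store unseen tweets.

    fetch_page(since_id) is a hypothetical callable returning a list of
    (tweet_id, user, text) tuples with ids greater than since_id.
    Returns the number of rows newly added to the archive.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(id INTEGER PRIMARY KEY, user TEXT, text TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    # Resume from the newest tweet already archived, if any.
    since_id = conn.execute("SELECT MAX(id) FROM tweets").fetchone()[0] or 0
    for request_number in range(max_requests):
        for tweet in fetch_page(since_id):
            # INSERT OR IGNORE skips ids that are already in the table.
            conn.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?)", tweet)
            since_id = max(since_id, tweet[0])
        conn.commit()
        if request_number < max_requests - 1:
            sleep(pause_seconds)  # spacing between requests keeps us under the limit
    after = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    return after - before

# Usage with a fake fetcher, so the sketch runs without any network access.
def fake_fetch(since_id):
    sample = [(1, "alice", "#tag first"), (2, "bob", "#tag second")]
    return [t for t in sample if t[0] > since_id]

conn = sqlite3.connect(":memory:")
print(archive_tweets(conn, fake_fetch, max_requests=2, pause_seconds=0))
```

In real use the pause would be set to roughly 3600 divided by your hourly
request allowance, and the database would live in a file rather than in
memory.
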
If there's any interest in learning how to set up small-scale Twitter
scrapes, let me know and I'll write something up when I have the time.

Best, ~DEEN