Re: [Air-L] Academic replacements for TwapperKeeper.com?
Thanks for the information about 140kit.com, I will definitely check it out. I'm still wondering whether a more permanent solution can be found (funding drying up in May doesn't sound too promising).
I have a simple set of Bash scripts, run from cron, that pull data from the API at regular intervals; perhaps I should just go with that.
@Deen: you won't get whitelisted unless Twitter have changed their policy. I've been turned down twice on the grounds that whitelisting is available for applications only, not for academic research.
Best,
Cornelius
On 23.02.2011 at 20:37, "Deen Freelon" <dfreelon@u.washington.edu> wrote:
I would also be curious to know what others have been using or plan to use for harvesting Twitter data. I've used both TwapperKeeper and 140kit, and found that the latter is quite good for hashtag archiving, but not as good at keyword archiving. Further, 140kit has a max scrape time of one week, although I believe that is manually renewable. Finally, both TK and 140kit can be quite slow and even unavailable at times, and as we've just seen, they may shut down at any time.
All of this has made me quite wary of relying on externally managed "clouds" for data collection. That is why I intend to set up my own Twitter harvesting operation for use within my own department, as many CS researchers do, and would encourage others with the necessary means and knowledge to do the same. Much valuable data can be collected even within the default API query limits, though I'll certainly ask Twitter to put me on the whitelist. Running one's own archiving operation is fairly cheap, and since you're only archiving your own data, you aren't hamstrung by hundreds of other jobs running simultaneously.
If there's any interest in learning how to set up small-scale Twitter scrapes, let me know and I'll write something up when I have the time.
Best,
~DEEN
On 2/23/11 11:18 AM, Matt Munley wrote:
Cornelius, How well would something like 140kit (htt...
-- Deen Freelon Ph.D. Candidate, Dept. of Communication University of Washington dfreelon@uw.edu http://dfreelon.org/ _______________________________________________ The Air-L@listserv.aoir.org mailing list is provi...
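For anyone weighing the cron-script route mentioned in this thread, here is a minimal sketch of one polling run, written in Python rather than Bash. It is not anyone's actual script from this discussion. The `fetch` callable is a hypothetical stand-in for whatever API request you make, and the assumption that every record is a dict with a unique "id" key is mine. The only point it illustrates is that repeated runs at regular intervals will return overlapping results, so each run should append only the records it has not stored yet.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_seen_ids(archive: Path) -> set:
    """Collect the ids already stored in the JSON-lines archive file."""
    if not archive.exists():
        return set()
    with archive.open(encoding="utf-8") as handle:
        return {json.loads(line)["id"] for line in handle if line.strip()}


def harvest_once(fetch: Callable[[], Iterable[dict]], archive: Path) -> int:
    """Run one polling pass: append unseen records, return how many were added.

    `fetch` is a hypothetical callable that returns the latest batch of
    records; each record is assumed to carry a unique "id" key.
    """
    seen = load_seen_ids(archive)
    added = 0
    with archive.open("a", encoding="utf-8") as handle:
        for record in fetch():
            if record["id"] in seen:
                continue  # already archived by an earlier run
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            seen.add(record["id"])
            added += 1
    return added
```

A scheduler such as cron would call `harvest_once` once per interval with a real fetch function. Keeping the archive as one JSON record per line means a run that gets interrupted loses at most the line being written.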
I wrote to Deen separately, but here's the deal on whitelists: about a week ago they decided to turn down, wholesale, any requests for whitelisting IPs, academic or otherwise. Those who have them are those who have them. They did not specify any shift in policy for whitelisted accts, but AFAIK whitelisting an acct does not do much for data collection via terms (Search) or via a username (REST).
In other words, those days are over. They handed out a bunch of whitelistings back in the day, then stopped doing it entirely, then finally changed the policy instead of writing e-mails saying no every time.
"Funding" here is $15/month. I went to a school with 500 students and no endowment, so it was a big deal there, but not really in the larger scope of things. I will pay the $15 a month from then on to continue running the website. The analytical machines are hosted at Berkman for free, so it's not really a scary thing.
Either way, best of luck; it's getting pretty hard to do data collection, apparently.
Also, I updated our TOS to reflect our stance on everyone running to the hills for exports. You can grab it from our front page: http://bit.ly/ddarvF
Devin
On Feb 23, 2011, at 3:29 PM, Cornelius Puschmann wrote:
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
People with access to a server can install their own instance of TwapperKeeper. But any public service replicating the functionality of the TwapperKeeper website would likely experience the same crackdown.
I do wonder, though, whether it's coincidental that the crackdown on TK occurred just a couple of weeks after the owner decided to introduce "pro" accounts to cope with demand.
More discussion here: http://readwriteweb.com/archives/twitter_puts_the_smack_down_on_another_popu...
On 24/02/2011, at 9:29, "Cornelius Puschmann" <cornelius.puschmann@uni-duesseldorf.de> wrote:
participants (3)
- Cornelius Puschmann
- Devin Gaffney
- Jean Burgess