Hello Andrew.

I use Web Content Extractor from newprosoft.com in my research, and it works quite well.

Regards,
Yohanan Ouaknine
Graduate student, Knowledge Management, Bar Ilan University, Israel

On Wed, Oct 5, 2011 at 8:24 PM, Andrew Schrock <aschrock@usc.edu> wrote:
Has anybody successfully scraped a Google discussion group? I found a script online, but it's thrown off by the fact that you now have to log in to view any groups.
Google is getting squirrely about spammers scraping their data, so this may be a big roadblock. I'm looking at authorization with the Google PHP library, but I'm not sure it will get me into Groups; it all seems app-focused (e.g., for adding items to a Google Calendar).
I'd much appreciate any ideas that don't involve adding 6,000-some messages to my analysis software by hand :/
Best,
Andrew
Andrew Schrock
USC Annenberg Doctoral Student
aschrock@usc.edu
714.330.6545
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
-- 
*יוחנן ועקנין* Yohanan Ouaknine <http://www.ois.co.il/>
yohanan.ouaknine@ois.co.il
http://il.linkedin.com/in/yohananouaknine