Scraping Google discussion groups?
Has anybody successfully scraped a Google discussion group? I found a script online, but it's thrown off by the fact that you now have to log in to view any groups.

Google is getting squirrely about spammers scraping their data, so this may be a big roadblock. I'm looking at authorization with the Google PHP lib, but I'm not sure it will get me to Groups; it all seems app-focused (e.g., for adding items to a Google Calendar).

I'd much appreciate any ideas that don't involve adding 6000-some messages to my analysis software by hand :/

best
Andrew

Andrew Schrock
USC Annenberg Doctoral Student
aschrock@usc.edu
714.330.6545
One option would be to save the pages, either manually or with a crawler, and then scrape the data out of the local copies. It's more annoying than going directly from online to database, but it's still better than the alternative.

~DEEN

On 10/5/11 2:24 PM, Andrew Schrock wrote:
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Deen Freelon
Acting Assistant Professor
American University School of Communication
dfreelon@gmail.com
http://dfreelon.org/
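Deen's suggestion (save the pages first, then scrape them locally) can be sketched in Python roughly as follows. This is a minimal sketch under loudly stated assumptions: the pages have already been saved as `.html` files in one folder, each message body sits in an element whose `class` includes a hypothetical `message-body` marker (Google's real markup is different and has to be checked in the saved files), and the HTML nests properly. Only the standard library (`html.parser`, `csv`, `pathlib`) is used.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path

# HYPOTHETICAL: the class name that marks a message body in the saved pages.
# Inspect your own saved HTML and change this; Google's real markup differs.
MESSAGE_CLASS = "message-body"

# Void elements never get a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MessageExtractor(HTMLParser):
    """Collects the text of every element whose class list contains message_class.

    Assumes properly nested HTML (every non-void start tag has a matching end tag).
    """

    def __init__(self, message_class=MESSAGE_CLASS):
        super().__init__()
        self.message_class = message_class
        self.depth = 0      # nesting depth inside the current message element (0 = outside)
        self.buffer = []    # text fragments of the message being read
        self.messages = []  # finished, whitespace-normalized messages

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self.depth:
                self.buffer.append("\n")  # keep line breaks as word separators
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.message_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = " ".join("".join(self.buffer).split())
            if text:
                self.messages.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_from_dir(page_dir, out_csv):
    """Parse every saved .html file in page_dir and write one CSV row per message."""
    rows = []
    for path in sorted(Path(page_dir).glob("*.html")):
        parser = MessageExtractor()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        rows.extend((path.name, i, text) for i, text in enumerate(parser.messages))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "index", "text"])
        writer.writerows(rows)
    return len(rows)
```

For example, `extract_from_dir("saved_pages", "messages.csv")` (both paths hypothetical) would write one row per message, ready to import into analysis software. Working from saved copies also keeps the login problem out of the parsing code: you only need to be logged in while the pages are being saved.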
Hello Andrew. I use Web Content Extractor from newprosoft.com in my research, and it works quite well.

Regards,
Yohanan Ouaknine
Graduate student, Knowledge Management, Bar Ilan University, Israel

On Wed, Oct 5, 2011 at 8:24 PM, Andrew Schrock <aschrock@usc.edu> wrote:
--
יוחנן ועקנין
Yohanan Ouaknine
http://www.ois.co.il/
050-6279777
yohanan.ouaknine@ois.co.il
http://il.linkedin.com/in/yohananouaknine
Thanks Yohanan, Shawn, Jeremy and Deen for your helpful suggestions. I had been thinking too much about custom coding using their API and not enough about using existing scraping software.

best
Andrew

On Oct 5, 2011, at 11:40 AM, יוחנן ועקנין wrote:
Andrew Schrock
USC Annenberg Doctoral Student
aschrock@usc.edu
714.330.6545
participants (3)
- Andrew Schrock
- Deen Freelon
- יוחנן ועקנין