On Wed, 3 Oct 2007, Richard Smith wrote:
Rather than fight the system, why not do your research from *inside* facebook? Many people are already building social networking analysis tools as facebook apps, some with amazing visualization:
http://sfu.facebook.com/apps/application.php?id=2895690559&b&ref=pd
If one of these tools doesn't do what you need, contact the developers (many of them are students) and see if you can realize your objectives using the facebook API.
It isn't necessarily the case that you will come into conflict with the terms of service and it isn't necessarily the case that your research question can't be answered by a facebook app. I'd venture to guess that you can do what you want, a lot easier than you imagine, by working with facebook and facebook developers rather than trying to "scrape" or "spider" anything.
To a certain extent, this is true. However, Facebook exercises strict control over the "storability" of data from the Facebook API. Profile component data may not be stored for longer than 24 hours. Essentially, the Facebook side of the interaction is not storable under the TOS, which certainly limits its usefulness for social networks research. What is unclear is how one might use "derivative" data, i.e. metadata about the profile that may be technically storable.

While the Facebook API is fairly simple to use (I've developed a few apps), the limitations imposed by the company have kept me from pursuing it as a research vehicle. Sometimes I wonder where we'd be if Larry and Sergey had followed all of the TOSes of the sites they were scraping when they designed Backrub... alas.

-Fred
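To make the 24-hour storage limit concrete, here is a rough sketch of a store that refuses to return, and eventually deletes, anything older than a fixed age. The class name `ExpiringStore`, its methods, and the key format are hypothetical and are not part of the Facebook API; this only illustrates time-bounded retention in general, and whether it would actually satisfy the TOS is a separate question.

```python
import time

RETENTION_SECONDS = 24 * 60 * 60  # assumed limit: the 24 hours mentioned above


class ExpiringStore:
    """Hypothetical in-memory store that drops records past a maximum age."""

    def __init__(self, max_age=RETENTION_SECONDS, clock=time.time):
        self._max_age = max_age
        self._clock = clock   # injectable clock so expiry can be tested
        self._items = {}      # key -> (stored_at, value)

    def put(self, key, value):
        """Store a value and remember when it was stored."""
        self._items[key] = (self._clock(), value)

    def get(self, key):
        """Return the value, or None if it is missing or too old."""
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._max_age:
            del self._items[key]  # too old: discard instead of returning it
            return None
        return value

    def purge(self):
        """Delete every expired record and return how many were removed."""
        now = self._clock()
        expired = [k for k, (t, _) in self._items.items()
                   if now - t >= self._max_age]
        for k in expired:
            del self._items[k]
        return len(expired)
```

A research script would call `put` right after each API fetch and run `purge` on a schedule. Anything computed from the records would have to live elsewhere, which is exactly where the open question about "derivative" data comes in.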
...r
On 3-Oct-07, at 2:37 PM, elw@stderr.org wrote:
I have always been curious about the TOS on this. If I set up a group of people to click and record each page, I'm in the clear. So, what if it's a bookmark file they are clicking from? What if the outbound links are automatically filtered and collated? What if my browser is pre-fetching pages? I guess the question is: at what point does it become automated?
I expect that one of the real goals of that point of the TOS is to prevent someone from slurping out all of 'their' (our) data and using it to set up a competing SNS. Maybe not in quite those terms - but effectively.
I would love, love, love for folks to have better access to the innards of a few of these sites, so that butt-ugly hacks to extract data from them without offending anyone or breaking TOS on sites cease to be necessary....
It seems to me that there should be a kind of Turing Test for scraping and crawling: if you can't tell from the server side that it's not a human, then it should be considered a human.
I know, that's not a practical proposal, but I just *wish* that was how it was handled.
I wish it too. It would make so many things so much easier.
--elijah
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Fred Stutzman
919-260-8508
ibiblio.org/fred
fred@metalab.unc.edu
Co-Founder and Developer, ClaimID.com
Ph.D. Student, Teaching and Research Fellow, SILS UNC-Chapel Hill