Help with Facebook Research
Hello,
I am planning a survey of Facebook members at NJIT, where I am a PhD student. I would like to write a web crawler or similar program to identify through Facebook who is part of the NJIT network. I have seen other papers discuss this technique, but I need more specific details on how to accomplish it. Any ideas?
Many thanks,
Cathy Dwyer, Lecturer Seidenberg School of Computer Science and Information Systems Pace University http://csis.pace.edu/~dwyer Office location: CS/IS Faculty Offices 163 Williams Street #225 212-346-1728
Hi Catherine,
Before investing too much time in the technical specifics, I would suggest you contact Facebook and confirm they will not shut down your access once they identify you are crawling them. This has been an issue for my research team here at MSU and others I've spoken with.
Nicole
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
Dear Cathy,
Following up on Nicole's observations, I've encountered similar policies. After some Facebook research was released last month FB Privacy Counsel Chris Kelly followed up with me and stated: "Such sharing (or the scraping of data by researchers) is not allowable under our current policy." Sorry about this.
-Fred
-- Fred Stutzman 919-260-8508 ibiblio.org/fred fred@metalab.unc.edu Co-Founder and Developer, ClaimID.com Ph.D. Student, Teaching and Research Fellow, SILS UNC-Chapel Hill
I have always been curious about the TOS on this. If I set up a group of people to click and record each page, I'm in the clear. So, what if it's a bookmark file they are clicking from? What if the outbound links are automatically filtered and collated? What if my browser is pre-fetching pages? I guess the question is: at what point does it become automated?
It seems to me that there should be a kind of Turing Test for scraping and crawling: if you can't tell from the server side that it's not a human, then it should be considered a human.
I know, that's not a practical proposal, but I just *wish* that was how it was handled.
Alex
--
//
// This email is
// [X] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais
// Social Architect
// http://alex.halavais.net
//
I have always been curious about the TOS on this. If I set up a group of people to click and record each page, I'm in the clear. So, what if it's a bookmark file they are clicking from? What if the outbound links are automatically filtered and collated? What if my browser is pre-fetching pages? I guess the question is: at what point does it become automated.
I expect that one of the real goals of that point of the TOS is to prevent someone from slurping out all of 'their' (our) data and using it to set up a competing SNS. Maybe not in quite those terms - but effectively. I would love, love, love for folks to have better access to the innards of a few of these sites, so that butt-ugly hacks to extract data from them without offending anyone or breaking TOS on sites cease to be necessary....
It seems to me that there should be a kind of Turing Test for scraping and crawling: if you can't tell from the server side that it's not a human, then it should be considered a human.
I know, that's not a practical proposal, but I just *wish* that was how it was handled.
I wish it too. It would make so many things so much easier. --elijah
Rather than fight the system, why not do your research from *inside* Facebook? Many people are already building social network analysis tools as Facebook apps, some with amazing visualization:
http://sfu.facebook.com/apps/application.php?id=2895690559&b&ref=pd
If one of these tools doesn't do what you need, contact the developers (many of them are students) and see if you can realize your objectives using the Facebook API.
It isn't necessarily the case that you will come into conflict with the terms of service, and it isn't necessarily the case that your research question can't be answered by a Facebook app. I'd venture to guess that you can do what you want, a lot more easily than you imagine, by working with Facebook and Facebook developers rather than trying to "scrape" or "spider" anything.
...r
On Wed, 3 Oct 2007, Richard Smith wrote:
Rather than fight the system, why not do your research from *inside* facebook. Many people are already building social networking analysis tools as facebook aps, some with amazing visualization:
http://sfu.facebook.com/apps/application.php?id=2895690559&b&ref=pd
If one of these tools doesn't do what you need, contact the developers (many of them are students) and see if you can realize your objectives using the facebook API.
It isn't necessarily the case that you will come into conflict with the terms of service and it isn't necessarily the case that your research question can't be answered by a facebook app. I'd venture to guess that you can do what you want, a lot easier than you imagine, by working with facebook and facebook developers rather than trying to "scrape" or "spider" anything.
To a certain extent, this is true. However, Facebook exercises strict control over the "storability" of data from the Facebook API. You are not allowed to store profile component data for longer than 24 hours. Essentially, the Facebook side of the interaction is not storable under the TOS, which certainly limits its usefulness for social networks research. What is unclear is how one might use "derivative" data - i.e. metadata about the profile that may be technically storable.
While the Facebook API is fairly simple to use (I've developed a few apps), the limitations enacted by the company have kept me from pursuing it as a research vehicle.
Sometimes I wonder where we'd be if Larry and Sergey had followed all of the TOSes of the sites they were scraping when they designed Backrub... alas.
-Fred
I am planning a survey of Facebook members at NJIT, where I am a PhD student. I would like to write a web crawl or similar program to identify through Facebook who is part of the NJIT network. I have seen other papers discuss this technique, but I need more specific details as to how to accomplish it. Any ideas?
The basic sketch of the technique is this:
1) identify a starting point [initial URL]
2a) programmatically collect all linked pages (in effect, use a regex that matches "a href=")...
2b) ...that match criteria you specify
3) recurse
however....
Web crawls of Facebook are against the Terms of Service of the site. You might be able to work something out using the Facebook API, rather than by crawling. It will take some work.
Your campus IRB will be highly unlikely to approve a project that explicitly violates the site's TOS; the TOS exists to give both Facebook and the other users of the site some notion of what sort of privacy exposure they are likely to be surrendering.
--elijah
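For readers who want to see the generic crawl steps sketched in the message above in concrete form, here is a minimal Python illustration. It is only a sketch under stated assumptions: the `PAGES` dictionary, the URLs in it, and the names `extract_links` and `crawl` are all hypothetical, the "site" is an in-memory stand-in so nothing is fetched from any real service, and step 3 ("recurse") is done iteratively with a queue rather than by literal recursion. As noted in the thread, running a crawler against Facebook itself would violate its Terms of Service.

```python
import re
from collections import deque

# Hypothetical in-memory "site" (URL -> HTML). It stands in for fetched pages
# so the sketch runs offline and touches no real service.
PAGES = {
    "/start": '<a href="/people/a">A</a> <a href="/about">About</a>',
    "/people/a": '<a href="/people/b">B</a> <a href="/start">home</a>',
    "/people/b": '<a href="/people/a">A</a>',
    "/about": '<a href="/start">home</a>',
}

# Step 2a: a regex that matches the href value of anchor tags.
HREF_RE = re.compile(r'<a\s+[^>]*?href="([^"]*)"', re.IGNORECASE)


def extract_links(html):
    """Return every href value found in anchor tags, in document order."""
    return HREF_RE.findall(html)


def crawl(start, fetch, wanted, max_pages=100):
    """Visit pages breadth-first from `start`.

    fetch(url) returns the page HTML or None; wanted(url) is the
    step-2b criterion deciding which links to follow.
    """
    seen = {start}            # step 1: the starting point
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:      # page missing: skip it
            continue
        visited.append(url)
        for link in extract_links(html):
            if link not in seen and wanted(link):
                seen.add(link)
                queue.append(link)   # step 3: repeat for each new page
    return visited


print(crawl("/start", PAGES.get, lambda u: u.startswith("/people/")))
```

The `seen` set keeps the crawl from looping on pages that link back to each other, and `max_pages` puts a hard cap on how much is collected.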
Are you doing this to identify participants? Could an easier option be to distribute a campus-wide email offering the survey to Facebook users?
-Ellie
participants (7)
- Alex Halavais
- Catherine Dwyer
- E.W.
- elw@stderr.org
- Fred Stutzman
- Nicole B Ellison
- Richard Smith