I am planning a survey of Facebook members at NJIT, where I am a PhD student. I would like to write a web crawler or similar program to identify, through Facebook, who is part of the NJIT network. I have seen other papers discuss this technique, but I need more specific details on how to accomplish it. Any ideas?
The basic sketch of the technique is this:
1) identify a starting point [initial URL]
2a) programmatically collect all linked pages (in effect, use a regex that matches "a href=")...
2b) ...that match criteria you specify
3) recurse as needed.

However, web crawls of Facebook are against the site's Terms of Service. You might be able to work something out using the Facebook API rather than by crawling, but it will take some work. Your campus IRB is highly unlikely to approve a project that explicitly violates the site's TOS; the TOS exists to give both Facebook and the other users of the site some notion of what sort of privacy exposure they are likely to be surrendering. --elijah
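
To make the crawl steps above concrete, here is a minimal Python sketch of the general technique only. Everything named in it is hypothetical and chosen for illustration: the functions `extract_links` and `crawl`, the `fetch` and `matches` parameters, the `PAGES` dictionary, and the example.edu URLs. It reads pages from an in-memory dictionary instead of the network, so it touches no live site, and it is not meant to be pointed at Facebook, where crawling is against the TOS as noted above.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Step 2a: a simple regex that pulls the target out of <a ... href="..."> tags.
# A regex is only a rough approximation; a real HTML parser is more robust.
HREF_RE = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(base_url, html):
    """Return absolute URLs for every anchor found in the page."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def crawl(start_url, fetch, matches, max_depth=2):
    """Breadth-first crawl starting at start_url.

    fetch(url)   -> page HTML as a string, or None if unavailable (assumed interface)
    matches(url) -> True if the link meets your criteria (step 2b)
    """
    seen = {start_url}                 # avoid revisiting pages
    queue = deque([(start_url, 0)])    # step 1: the starting point
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth >= max_depth:         # step 3: stop descending at the depth limit
            continue
        for link in extract_links(url, html):
            if link not in seen and matches(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory "site" standing in for real pages.
PAGES = {
    "http://example.edu/": '<a href="/a">A</a> <a href="http://other.example/x">X</a>',
    "http://example.edu/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.edu/b": '<a href="/c">C</a>',
    "http://example.edu/c": "no links here",
}

result = crawl(
    "http://example.edu/",
    fetch=PAGES.get,
    matches=lambda u: u.startswith("http://example.edu/"),
    max_depth=2,
)
print(result)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from how pages are retrieved, so the same loop could sit on top of an approved API client instead of raw page requests.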