On Mon, 6 Jun 2005 Gail Tailor wrote:
I have been following this discussion and have not yet seen anyone raise a question relating to the legality of using this approach to capture web pages in support of academic research, in particular those web pages that are clearly identified as being copyrighted. Are you contacting the owner of the site to ask permission to create a copy of their documents for current and future use? If not, how are you justifying the use of this approach to duplicate the data? Fair use laws?
Gail-- Here's my $.02 in response to your questions about fair use and copyright regarding Web pages... My understanding of the U.S. Digital Millennium Copyright Act is that all Web pages are copyrighted upon posting to the Web (which means that every instance of use of a Web browser can be argued to be illegal, since browsers "copy" Web pages as they display them). Clearly some interpretations of the DMCA are technically untenable.
That said, copying Web pages (and storing the copies) for the purpose of academic research is clearly within the U.S. fair use doctrine. However, the applicability of the fair use doctrine to re-presenting copied Web pages on the Web or in print in the context of academic research has been interpreted in a range of ways by various U.S. universities, libraries, and academic publishers over the last five+ years-- resulting in a range of protocols regarding the means (e.g. opt-in or opt-out) and timing (e.g. before/during collection or before/during display) of notification of site producers.
The Internet Archive (a non-profit organization) is an example of a liberal interpretation of the DMCA: it has taken the stance that previously produced Web resources should be preserved and available in the public domain, and generally operates on a post-display opt-out mechanism (see http://web.archive.org). In each of the Web collections in which our WebArchivist.org research group has participated, a different protocol has been employed in response to the nature of the collection and the policies of participating institutions.
I highly commend the precedent that AOIR member Laura Gurak and Yale University Press set in Laura's book *Cyberliteracy*. A 2-page appendix explains the U.S. fair use doctrine and the rationale underlying Laura's (and the Press's) decision *not* to seek permissions from site producers for the screenshots included in this book.
How are you using the data after it has been captured? Are you extracting data
to support point in time studies? Longitudinal studies?
My collaborators and I have employed Web-based data (e.g. archived Web pages) and metadata (codes and other kinds of annotations associated with Web pages) in several kinds of analyses, both point-in-time and longitudinal. Some of our publications about Web-based research are available at http://webarchivist.org/resources.htm. You might be particularly interested in our presentation on the "Ethics of Web Archiving" from the 2003 Internet Research conference, which is also available via that page.
I am also using this as an opportunity to advance research practices in a
manner that calls attention to the dynamic nature of Internet web sites and some of the inherent issues in taking this approach to collecting data in comparison to some of the more traditional methods that might be used when working with paper documents.
I share your interest in these issues, and would be happy to correspond further with you about this offlist.
-Kirsten
***************************************
Kirsten A. Foot, PhD
Assistant Professor, Communication
Co-Director, WebArchivist.org
University of Washington
Box 353740
Seattle, WA 98195-3740
206-543-4837
kfoot@u.washington.edu