On Unix-like systems,

    cat * | stripHTML | wc

will give you line, word, and byte counts for the pages in the current directory with the HTML removed (use wc -w if you only want the word count). stripHTML is not a standard command; it stands for whatever filter you prefer. There are many ways to write it, depending on your coding preference. One of the simplest is just a regex that deletes tags, though that generally assumes well-formed markup, so you might want to run tidy to fix the code first and then strip it... heh.

jeremy hunsinger
Information Ethics Fellow, Center for Information Policy Research,
School of Information Studies, University of Wisconsin-Milwaukee
(www.cipr.uwm.edu)

wiki.tmttlt.com
www.tmttlt.com

() ascii ribbon campaign - against html mail
/\                       - against microsoft attachments

http://www.stswiki.org/ sts wiki
http://cfp.learning-inquiry.info/ Learning Inquiry-the journal
http://transdisciplinarystudies.tmttlt.com/ Transdisciplinary Studies:the book series
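
For illustration, here is a minimal sketch of one possible stripHTML filter for the pipeline above. It is not the method from the post: instead of a regex or tidy, it swaps in Python's standard html.parser module, which tolerates sloppy markup reasonably well. The file name striphtml.py and the names TextExtractor and strip_html are made up for this example, and the script assumes the pages decode with your locale's default encoding.

```python
#!/usr/bin/env python3
"""striphtml.py (hypothetical name): read HTML on stdin, write its text to stdout."""
import sys
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(markup):
    """Return the text content of markup, with each tag acting as a space."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Joining with spaces keeps "<p>one</p><p>two</p>" as two words, at the
    # cost of splitting a word that has a tag in the middle of it.
    return " ".join(parser.parts)


# Act as a filter only when input is piped in, not when run interactively.
if __name__ == "__main__" and not sys.stdin.isatty():
    sys.stdout.write(strip_html(sys.stdin.read()) + "\n")
```

Assuming the script is saved as striphtml.py, the pipeline would become: cat * | python3 striphtml.py | wc -w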