Has anyone heard of a linguistic study based on a corpus of newsgroup texts? I am trying to compile a corpus of texts from Japanese, English and Danish newsgroups covering the five-year period 1999-2003, in order to make a contrastive study of word-formation and neologisms in these three languages. However, much to my dismay, I have recently discovered that the average retention time of NNTP servers is somewhere between two weeks and a month, so posts from that period are no longer available on the servers. If any scholar out there has compiled such a corpus, I would very much like to hear about it.

Thanks in advance!

Jakob Halskov, MA, Ph.D. student
--
Institute of Computational Linguistics
Copenhagen Business School
Bernhard Bangs Allé 17B
2000 Frederiksberg
Denmark
--------------------------
Phone (office) +45 38153137