[Air-l] Any linguistic studies based on a newsgroup corpus?

13 Jan 2004

Has anyone heard of a linguistic study based on a corpus of newsgroup texts?

I am trying to compile a corpus of texts from Japanese, English and Danish 
newsgroups in the 5-year period 1999-2003 in order to make a contrastive 
study of word-formation and neologisms in these three languages. However, much to 
my dismay I have recently discovered that the average retention time of NNTP 
servers is somewhere between 2 weeks and a month!

If there is some scholar out there who has compiled a corpus I should very 
much like to know.

Thanks in advance!

Jakob Halskov, MA, Ph.D. student
-- 
Institute of Computational Linguistics
Copenhagen Business School
Bernhard Bangs Allé 17B
2000 Frederiksberg
Denmark
--------------------------
Phone (office) + 45 38153137