Hey all,

Part of the research I am doing requires that I identify threads on a listserv for analysis. A thread consists of emails that form a series of responses to an initial email. Of course, the easiest way to do this is to sort emails by subject line. However, as you might know, this is incomplete: some participants change the subject for a variety of reasons while still remaining in the same thread. One could instead analyze the information in the email headers (e.g., the In-Reply-To and References fields) to identify threads, but in my case this data is not always available. Alternatively, one could manually scan through the text of the emails, which is very time-consuming for a large email corpus.

Therefore, what I need is a method (preferably automated) that can identify email threads from the text of the emails themselves. I can imagine software that clusters emails based on semantic similarity, so that the clusters could be equated with threads, but I haven't been able to find any yet.

The units of analysis I have described are fairly common and, I imagine, so is my problem. Perhaps people on this list can point me to existing methods, software, or papers that have already addressed this issue?

Thanks,
Dhanaraj

Dhanaraj Thakur
Ph.D. Candidate
School of Public Policy
Georgia Institute of Technology