question on how to identify email threads on listserv
hey all,

Part of the research I am doing requires that I identify threads on a listserv for analysis. Threads consist of emails that are a series of responses to an initial email.

Of course, the easiest way to do this is to sort emails by subject line. However, as you might know, this is incomplete: some participants change the subject for a variety of reasons while still remaining in the same thread. One could instead analyze the information in the email headers to identify threads, but in my case this data is not always available. Alternatively, one could manually scan through the text of the emails, which is very time-consuming for a large email corpus.

Therefore, what I need is a method (preferably automated) that can identify email threads by looking at the texts of the emails. I can imagine software that does this by clustering emails based on semantic similarity, so that the clusters could be equated with threads, but I haven't been able to find any yet...

The units of analysis I have described are fairly common and, I imagine, so is my problem. Perhaps people on this list can point me to existing methods/software/papers that have already addressed this issue?

Thanks, Dhanaraj

Dhanaraj Thakur Ph.D. Candidate School of Public Policy Georgia Institute of Technology
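The two automatic approaches the post mentions, subject-line sorting and header analysis, can be combined so that headers are used when they exist and the subject is the fallback. What follows is a minimal sketch in Python, assuming the archive has already been parsed into `email.message.Message` objects in chronological order and that every message has a Message-ID. The function names and the assumption that list tags look like `[Air-L]` are illustrative, not from any existing tool. It only covers the header/subject baseline; it does not do the text-similarity clustering the post asks for.

```python
import re
from email import message_from_string
from email.message import Message

# Illustrative pattern: strips leading "Re:", "Fw:"/"Fwd:" and bracketed list tags like "[Air-L]".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Return the subject in lower case with reply prefixes and list tags removed."""
    return _PREFIX.sub("", subject or "").strip().lower()


def thread_messages(messages: list[Message]) -> dict[str, list[str]]:
    """Group Message-IDs into threads keyed by the Message-ID of each thread's first message.

    Assumes `messages` is in chronological order and every message has a Message-ID.
    """
    root_of: dict[str, str] = {}       # Message-ID -> thread root
    subject_root: dict[str, str] = {}  # normalized subject -> thread root
    threads: dict[str, list[str]] = {}
    for msg in messages:
        mid = msg["Message-ID"]
        # References lists ancestors oldest-first; In-Reply-To names the direct parent.
        parents = (msg.get("References", "") + " " + msg.get("In-Reply-To", "")).split()
        known = [p for p in parents if p in root_of]
        subject = normalize_subject(msg.get("Subject", ""))
        if known:
            root = root_of[known[-1]]         # header evidence takes priority
        elif subject and subject in subject_root:
            root = subject_root[subject]      # fall back to a matching subject
        else:
            root = mid                        # start a new thread
        root_of[mid] = root
        if subject:
            subject_root.setdefault(subject, root)
        threads.setdefault(root, []).append(mid)
    return threads


if __name__ == "__main__":
    raw = [
        "Message-ID: <a>\nSubject: [Air-L] question on threads\n\nFirst post",
        "Message-ID: <b>\nSubject: Re: [Air-L] question on threads\n\nReply",
        "Message-ID: <c>\nIn-Reply-To: <b>\nSubject: changed topic\n\nReply, new subject",
    ]
    print(thread_messages([message_from_string(r) for r in raw]))
    # {'<a>': ['<a>', '<b>', '<c>']}
```

Recording the new subject of a header-linked message (as `<c>` does here) lets later replies that carry only the changed subject still find the thread, which partly addresses the subject-change problem when some headers survive.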
Hi Dhanaraj,

I would try to export the messages to data files (e.g., HTML or ODT) and then run software for content/text analysis (also called computer-assisted qualitative data analysis software). You can find a lot of options here: http://courses.washington.edu/socw580/contentsoftware.shtml

A review: http://people.iq.harvard.edu/~wlowe/Publications/rev.pdf

Resources related to content analysis and text analysis: http://www.content-analysis.de/software/qualitative-analysis

I am sure you can find others using a search engine. Let us know which one worked for you.

Michael Lee michael@nexodigital.net MSc Candidate School of Communication University of Costa Rica
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
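Exporting the messages to data files is the first step of this suggestion. Below is a minimal sketch in Python, assuming (this is an assumption, not something stated in the thread) that the list archive is available as an mbox file. It writes each message's From/Date/Subject headers and first plain-text part to a separate UTF-8 `.txt` file that text-analysis tools can import. The function name `export_mbox` and the `msg_00000.txt` naming scheme are hypothetical.

```python
import mailbox
from pathlib import Path


def export_mbox(mbox_path: str, out_dir: str) -> int:
    """Write each message in an mbox file to its own UTF-8 .txt file; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for i, msg in enumerate(mailbox.mbox(mbox_path)):
        body = ""
        # Keep the first text/plain part; HTML parts and attachments are skipped.
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                try:
                    body = payload.decode(charset, errors="replace")
                except LookupError:  # unknown charset label in the message
                    body = payload.decode("utf-8", errors="replace")
                break
        header = f"From: {msg['From']}\nDate: {msg['Date']}\nSubject: {msg['Subject']}\n\n"
        (out / f"msg_{i:05d}.txt").write_text(header + body, encoding="utf-8")
        count += 1
    return count
```

Keeping the headers at the top of each file means the subject and date remain available to whatever analysis software is used afterwards.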
Hi,

The appendix of Jones, Q., Rafaeli, S., and Ravid, G. (2004), 'Information Overload and the Message Dynamics of Online Interaction Spaces: A Theoretical Model and Empirical Exploration', Information Systems Research, Vol. 15, No. 2, June 2004, pp. 194-210 (http://www.ravid.org/gilad/isr.pdf), describes the approach we took to tackle this issue. Mainly, we use a weighted sum with a threshold.

Gilad
-- "The Mind is not a vessel to be filled, but a fire to be kindled." -- Plutarch, On Listening to Lectures Gilad Ravid, Ph.D. Department of Industrial Engineering and Management Ben Gurion University of the Negev, Israel http://www.ravid.org/gilad Office: +972-8-6472772 Mobile: +972-54-4905391 Skype: giladravid
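The appendix cited above documents the actual procedure, and the sketch below is not a reproduction of it. It only illustrates the general "weighted sum with threshold" idea in Python: each message is scored against every earlier message as a weighted sum of subject-word overlap, body-word overlap, and time proximity, and it joins the best-scoring message's thread if that score clears a threshold; otherwise it starts a new thread. The features, the weights (`WEIGHTS`), the threshold (`THRESHOLD`), and the seven-day window are all hypothetical and would need calibrating against a hand-coded sample of threads.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    subject: str
    body: str
    sent: datetime


def jaccard(a: str, b: str) -> float:
    """Overlap of the two texts' lower-cased word sets, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical weights and threshold; not taken from the cited paper.
WEIGHTS = {"subject": 0.5, "body": 0.3, "time": 0.2}
THRESHOLD = 0.4


def link_score(reply: Post, candidate: Post, max_gap_days: float = 7.0) -> float:
    """Weighted sum of subject overlap, body overlap, and time proximity."""
    gap = (reply.sent - candidate.sent).total_seconds() / 86400
    time_score = max(0.0, 1 - gap / max_gap_days) if gap >= 0 else 0.0
    return (WEIGHTS["subject"] * jaccard(reply.subject, candidate.subject)
            + WEIGHTS["body"] * jaccard(reply.body, candidate.body)
            + WEIGHTS["time"] * time_score)


def assign_threads(posts: list[Post]) -> list[int]:
    """Return a thread index for each post; posts are assumed sorted by send time."""
    thread_of: list[int] = []
    next_id = 0
    for i, post in enumerate(posts):
        # Score this post against every earlier post and keep the best match.
        best_score, best_j = max(
            ((link_score(post, posts[j]), j) for j in range(i)), default=(0.0, -1))
        if best_score >= THRESHOLD:
            thread_of.append(thread_of[best_j])  # join the best match's thread
        else:
            thread_of.append(next_id)            # start a new thread
            next_id += 1
    return thread_of
```

With these weights, subject overlap alone clears the threshold only when the word sets are nearly identical (Jaccard of at least 0.8), so a message whose subject was changed has to be carried by body overlap and timing; adjusting the weights shifts that balance.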
You may also want to look at the tools Anatoliy Gruzd has developed for analyzing threaded texts. See www.textanalytics.net or https://www3.isrl.illinois.edu/~agruzd2/icta_web/

/Caroline

---- Original message ----
Date: Thu, 18 Jun 2009 19:16:45 -0400 From: Dhanaraj Thakur <dthakur@gatech.edu> Subject: [Air-L] question on how to identify email threads on listserv To: <air-l@listserv.aoir.org>
Caroline Haythornthwaite Professor, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 501 East Daniel St., Champaign IL 61820 haythorn@illinois.edu OR haythorn@uiuc.edu
participants (4)
- Caroline Haythornthwaite
- Dhanaraj Thakur
- Gilad Ravid
- Michael Lee