Email Analysis Software
Like many of you, I frequently analyze relatively large collections of email messages (e.g., up to 100,000 messages sent to an email list). To do this I have used a hodge-podge of different programs, none of which I am completely happy with. I have used both quantitative and qualitative approaches and would be interested in hearing from you about software that works well for either approach. The only requirement is that it handles lots of messages well.

Below I have listed some of the basic things I've done with various tools. I'd love to see you all add to the list. Thanks ahead of time.

Tool: Mailbag Assistant (PC only, $40; see http://www.fookes.com/mailbag/index.php)
- View messages from a large corpus (90,000 messages) quickly (much faster than traditional email clients)
- Create complex searches (using regular expressions) that can be saved and re-run on different subsets of data
- Export messages (or just message headers) into a database format (e.g., export header information into an MS Access database)
- Run some basic built-in queries (e.g., number of messages per month, contributors, most frequently used words)

Unfortunately, the version I used (as of a year ago) did not let you tag individual messages or easily pull out a random sample of messages.

Tool: Custom-built programs that help multiple coders tag messages shown in a web browser.
Unfortunately, I could not find an existing program that did this, so I have used two different custom programs for the same purpose (each with slightly different functionality). Neither was built to be easily reused for other rating scenarios :(, so I am interested in finding a more general-purpose rating support tool. The most important functionality included:
- Displays a message (randomly selected from a corpus of messages that all get rated) in a web browser, along with a set of predefined codes with check boxes that can be marked off
- Supports multiple raters and calculates inter-rater reliability statistics (Cohen's kappa)
- Highlights words of interest to the coders on the web display
- Includes some analysis features: in analysis mode you can click on any code to display all messages coded into it by either rater (or by just one rater), display messages where the raters disagreed, find messages coded into multiple groups, etc. It also shows overall summary stats, such as the number of messages coded into each group.

Any thoughts on programs that do these things, or more generally on tools that are useful for working with email (e.g., visualization of messages), would be greatly appreciated by me and probably by many other list members. Thanks ahead of time.

Derek Hansen
Assistant Professor
iSchool at Maryland
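For readers checking the kappa figures reported by a custom rating tool, the statistic itself is short to compute: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of messages on which two raters agree and p_e is the agreement expected by chance from each rater's label frequencies. The following is a minimal sketch in Python, assuming each rater assigns exactly one label per message (for a single checkbox code, 1 = checked and 0 = not checked). The function name `cohen_kappa` and the sample ratings are hypothetical and are not taken from either of the tools described above.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each gave one label per message.

    Hypothetical helper: assumes both sequences list the same messages
    in the same order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of messages")
    n = len(rater_a)
    # Observed agreement: share of messages where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every message; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: two coders decide whether one code applies to 10 messages.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohen_kappa(coder_1, coder_2), 3))  # 0.583
```

Here the coders agree on 8 of 10 messages (p_o = 0.8), but because each checked the code on 4 messages, chance alone predicts p_e = 0.52, so kappa is (0.8 - 0.52) / 0.48, about 0.583. For a tool with several checkbox codes, this would typically be computed once per code.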
Hi, Derek:

Because of the demands of electronic discovery, there are actually a large number of software packages designed to sort through email, search it, and provide (usually *very*) basic tagging. Many of these are run as services, often with heavy support and with the heavy fees that large law firms can afford. So the problem isn't so much finding software that can handle large numbers (tens of millions) of emails, but rather finding software that can do what you want it to.

Searching the literature on electronic discovery will likely yield legal rather than technical articles. However, most texts on computer and network forensics now discuss handling large collections of email. On the legal side, there are search and query systems (like http://www.metalincs.com/), and on the forensics side, there are tools that do much the same thing, but toward different ends. These may provide a start:
http://portal.acm.org/citation.cfm?id=1113074
http://portal.acm.org/citation.cfm?id=1065226.1065291

- Alex

--
//
// This email is
// [X] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais, cyberflâneur
// http://alex.halavais.net
//
Participants (2): Alex Halavais, Derek Hansen