Coding Analysis Toolkit - A new resource for researchers with large text annotation tasks
Members of the list may be interested in this narrated slide show about the Coding Analysis Toolkit developed by QDAP. It is viewable through a web browser at: http://www.qdap.pitt.edu/catshow.htm

The Coding Analysis Toolkit, colloquially known as "CAT", was developed in the summer of 2007. It was designed by QDAP Director Dr. Stuart Shulman and created in collaboration with Mark Hoy, a Senior Programmer in the Carnegie Mellon University School of Computer Science. It is maintained by UCSUR Technology Director James Lefcakis. CAT is hosted on UCSUR servers and made available on the web at: http://cat.ucsur.pitt.edu/. The system consists of a web-based suite of tools custom built from the ground up to facilitate efficient and effective analysis of text datasets that have been coded using the commercial off-the-shelf package ATLAS.ti (www.atlasti.com).

The Coding Analysis Toolkit was designed to use keystrokes and automation to clarify and speed up the validation or consensus adjudication process. Special attention was paid during the design process to the need to eliminate the role of the computer mouse, thereby streamlining the physical and mental tasks in the coding analysis process. We anticipate that CAT will open new avenues for researchers interested in measuring and accurately reporting coder validity and reliability, as well as for those practicing consensus-based adjudication. The availability of CAT can improve the practice of qualitative data analysis at the University of Pittsburgh and beyond.

Currently about 50 beta testers located in several countries have accounts on CAT. They have been given free access to the system for the rest of the calendar year. Systematic user feedback will be gathered via a beta tester web survey and will shape the future development of CAT.
The capabilities of CAT and its reliability as a software tool may be sufficiently robust to merit commercial licensing to users starting in 2008.

The CAT system allows a user to register for an account, log on, upload exported coded results from ATLAS.ti into the system, and run comparisons of inter-rater reliability measured using Fleiss' Kappa and Krippendorff's Alpha. The user can also choose to perform a code-by-code comparison of the data, revealing tables of quotations where coders agree, disagree, or overlap. For any comparison, the user can view the data on the screen or download it as a rich-text file (.rtf).

The website and its database reside on a Windows 2003 UCSUR server, and the website is programmed in HTML, ASP.NET 2.0, and JavaScript.

CAT's core functionality allows for the adjudication of coded items by an "expert" user who holds a sub-account attached to the primary account holder of the system. An expert user can log onto the system to validate the codes assigned to items in a dataset. While the expert user is validating codes, the system keeps track of which codes are valid and which coders assigned those codes. This information is used to keep a historical track record of coders for assessing coder accuracy over time. It also allows the account holder to see a rank-ordered list of the coders most likely to produce valid observations, report the overall validity scores by code, coder, or entire project, and end up with a 'clean' dataset consisting of only valid observations.

In a newly developed CAT module, which is being beta tested internally during the fall 2007 semester by five QDAP coders, the project manager is able to upload raw datasets and have users code those datasets directly through the CAT interface.
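For readers unfamiliar with the inter-rater reliability measures mentioned above, here is a minimal sketch of how Fleiss' Kappa can be computed from a set of coded items. It is not CAT's own code (CAT is built on ASP.NET); Python is used purely for illustration, and the function name `fleiss_kappa`, the input layout, and the sample codes are all assumptions made for this example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' Kappa for a list of items.

    `ratings` is a hypothetical input layout: one inner list per item,
    holding the code each coder assigned to that item. Every item must
    be coded by the same number of coders (at least two).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two coders per item are required")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of coders")

    # Per-item tallies of how many coders chose each code.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing coder pairs.
    per_item = [
        (sum(c * c for c in tally.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for tally in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall share of each code.
    totals = Counter()
    for tally in counts:
        totals.update(tally)
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_expected == 1.0:
        # Only one code was ever used, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Illustrative data only: three coders, four items, made-up codes.
sample = [
    ["support", "support", "support"],
    ["oppose", "oppose", "support"],
    ["oppose", "oppose", "oppose"],
    ["neutral", "support", "neutral"],
]
print(round(fleiss_kappa(sample), 3))
```

A value near 1 indicates agreement well beyond chance, a value near 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Krippendorff's Alpha, the other measure CAT reports, additionally handles missing codes and is not covered by this sketch.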
As is the case with the original adjudication toolkit, this new module features automated loading of discrete quotations and requires only keystrokes, instead of mouse clicks and drags, to apply the codes to the text. We estimate that coding tasks using CAT are completed 2-3 times as fast as identical coding tasks conducted using ATLAS.ti. While this high-speed "mouse-less" coding module would poorly serve many traditional qualitative research approaches, it is ideally suited to annotation tasks routinely generated by computer scientists.

~Stu

--
Dr. Stuart W. Shulman
Director, Sara Fine Institute
School of Information Sciences
Director, Qualitative Data Analysis Program
University Center for Social and Urban Research
http://qdap.ucsur.pitt.edu
University of Pittsburgh
121 University Place, Suite 600
Pittsburgh, PA 15260
412.624.3776 (v)
412.624.4810 (f)
http://shulman.ucsur.pitt.edu
Editor, Journal of Information Technology and Politics
http://www.jitp.net