Charles, given that there are myriad sentiment tools out there, ranging from traditional presence/absence dictionaries through value-based dictionaries to statistical and now neural approaches, across both commercial and academic systems, the most important methodological consideration is how well a specific tool's performance characteristics align with:

- the medium (in your case, a newspaper);
- the language and grammar (the formality and editorial structure of the paper, and how closely they match the tool's training dataset);
- the era (many tools are updated only periodically and/or were trained on content from a specific period, and can show severe mismatches even for mainstream media);
- the domain (most tools are not domain-adapted, which can cause severe problems in certain domains: if a news source refers to "the Democratic Party" but to just "Republicans", the word "party", which general-purpose lexicons tend to read as positive, systematically yields a more positive score for the former);
- and, most importantly, the definition of the specific emotion (i.e., there is no universal "anxiety" score).

Typically this involves reviewing the validation studies for each candidate tool and comparing them along these dimensions, though the definition of the specific measure is a methodologically important piece that is often missed.

With GDELT, we've run 40 common tools, totaling a few thousand dimensions, over roughly a billion global news articles (https://blog.gdeltproject.org/?s=gcam), including multilingual versions of some of the tools to test how various translation approaches affect emotional recovery. So you can compare dimensions and see, for example, how different "anxiety" dimensions behave for the outlets and domains most similar to yours; for some applications, the differences in their response curves can themselves be quite an informative signal. All of the scores are open data, so you can get a sense of how they respond.

Kalev

On Thu, Sep 5, 2019 at 6:52 AM Charles M. Ess <c.m.ess@media.uio.no> wrote:
Dear colleagues,
One of our students wants to analyze the emotional content in the comment fields of a major newspaper vis-a-vis specific hot-button issues.
She has a good tool (I think) for scraping the data, but she is stymied over the choice of an emotion analysis tool. She has looked at Senpy (http://senpy.gsi.upm.es/#test) and Twinword (https://www.twinword.com/api/emotion-analysis.php); the latter seems the more accurate, but it is also expensive. She has recently discovered the DepecheMood emotion lexicons (Staiano, J., & Guerini, M. (2014). DepecheMood: a lexicon for emotion analysis from crowd-annotated news. arXiv preprint arXiv:1405.1605), but these do not clearly explain their emotional categories: awe, indifference, sadness, amusement, annoyance, joy, fear, and anger.
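To make the dictionary distinction concrete, here is a minimal Python sketch contrasting a presence/absence dictionary with a value-based lexicon in the style of DepecheMood, which assigns each word a weight per emotion. The two lexicons, their weights, and the function names `presence_scores` and `value_scores` are hypothetical, invented for illustration; they are not DepecheMood's actual entries or any library's API. The sample headlines echo the "party" problem raised in the reply above: two phrasings that differ only in naming "the Democratic Party" versus "Republicans" get different "joy" scores purely because of the word "party".

```python
import re

# Hypothetical toy lexicons for illustration only; these are NOT
# DepecheMood's real entries or weights, nor any published dictionary.
PRESENCE_LEXICON = {
    "joy": {"party", "celebrate", "happy"},
    "fear": {"threat", "crisis", "afraid"},
}

# Value-based lexicon: word -> {emotion: weight}. Weights are invented.
VALUE_LEXICON = {
    "party": {"joy": 0.6, "fear": 0.1},
    "celebrate": {"joy": 0.8, "fear": 0.05},
    "threat": {"joy": 0.05, "fear": 0.7},
    "crisis": {"joy": 0.05, "fear": 0.6},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def presence_scores(text, lexicon=PRESENCE_LEXICON):
    """Share of all tokens that appear in each emotion's word list."""
    tokens = tokenize(text)
    if not tokens:
        return {emotion: 0.0 for emotion in lexicon}
    return {
        emotion: sum(tok in words for tok in tokens) / len(tokens)
        for emotion, words in lexicon.items()
    }


def value_scores(text, lexicon=VALUE_LEXICON):
    """Mean per-emotion weight over the tokens found in the lexicon."""
    emotions = sorted({e for weights in lexicon.values() for e in weights})
    hits = [lexicon[tok] for tok in tokenize(text) if tok in lexicon]
    if not hits:
        return {e: 0.0 for e in emotions}
    return {e: sum(w.get(e, 0.0) for w in hits) / len(hits) for e in emotions}


if __name__ == "__main__":
    # Two headlines that differ only in how the party is named.
    for headline in ("The Democratic Party faces a threat",
                     "Republicans face a threat"):
        print(headline)
        print("  presence:", presence_scores(headline))
        print("  value:   ", value_scores(headline))
```

In both schemes an emotion label means only whatever the lexicon's authors or annotators associated with it, so unclear category definitions affect what the scores measure, not just how they are presented.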
For my part, I am entirely clueless. Any suggestions that she might pursue would be greatly appreciated.
best,
- charles ess

--
Professor in Media Studies
Department of Media and Communication
University of Oslo
<http://www.hf.uio.no/imk/english/people/aca/charlees/index.html>
Postboks 1093 Blindern
0317 Oslo, Norway
c.m.ess@media.uio.no

_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org