Content Analysis of Historical Tweet Set
All,

I am working on a qualitative content analysis of a historical tweet set from CrisisNLP (Imran et al., 2016): http://crisisnlp.qcri.org/lrec2016/lrec2016.html

I am using the California Earthquake dataset. The tweets have been stripped down to the day/time, the tweet ID, and the content of the tweet. The rest of the Twitter information is discarded.

I am using NVivo, which is known for its power for content analysis. However, I am finding NVivo unwieldy for a dataset of this size (~250,000 tweets). I wanted each unique tweet to function as its own case, but NVivo would crash every time. I have 18 GB of RAM and a RAID array. I do not have a server, although I could get one.

I am working and coding side by side in Excel and in NVivo with my data in 10 large sections of .csv files, instead of individual cases, and this is working (but laborious).

QUESTION: Do you have any suggestions for software for large-scale content analysis of tweets? I do not need SNA capabilities.

Thank you very much,
Fran Hodgkins
Doctoral Candidate (currently suffering through Chapter 4)
Grand Canyon University
USA
Have you looked at MAXQDA? I am happily using it for a smaller video dataset. I noticed that the most recent upgrade increased its Twitter-handling capacity, though I do not have the details to hand. It may not be powerful enough, but it has a free 14-day trial (and has the great benefit of being easy to use).
Hi Fran

I’ve done some work around analysing tweets (and other text from social media) using R. I’ve put together a walkthrough video, sample data set, and the relevant code in this blog post: http://harkive.org/h17-text-analysis/ - you’d be most welcome to use some or all of those resources.

Don’t worry if you’ve not used R before - the script I’ve provided in that post should work if you create a copy of your dataset and change the column names to match the sample dataset I’ve provided. I’ve not used R with a dataset of the size you’re dealing with, so I can’t tell you how well it / your computer will handle things. Batches might be an idea, then, as suggested below, certainly if you want to try things out.

The script eventually runs into some Topic Modelling and Sentiment Analysis, but you can run through it section by section until you reach the end of the initial exploratory stage (word frequencies and so on). This might help you make some sense of what’s in the dataset, and will help you weed out any unwanted elements.

Happy to help if you want to run with any of the above - I’d be intrigued by what the script I wrote came up with using a different type of data.

Kind regards
Craig

Dr Craig Hamilton
School of Media
3rd Floor, The Parkside Building
Birmingham City University
Birmingham, B5
07740 358162
t: @craigfots
e: craig.hamilton@bcu.ac.uk

_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
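The batch idea and the exploratory word-frequency stage mentioned in Craig's reply can be sketched roughly as follows. This is not Craig's R script; it is a minimal Python illustration using only the standard library, and the column name `tweet_text`, the batch size, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter
from itertools import islice

# Simple ASCII tokenizer (an assumption): keeps letters, digits, # and @.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize(text):
    """Lowercase a tweet and split it into simple word-like tokens."""
    return TOKEN_RE.findall(text.lower())


def read_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def word_frequencies(rows, text_column="tweet_text", batch_size=10_000):
    """Count tokens batch by batch so only one batch is held in memory."""
    counts = Counter()
    for batch in read_batches(rows, batch_size):
        for row in batch:
            counts.update(tokenize(row[text_column]))
    return counts


if __name__ == "__main__":
    # In-memory stand-in for one .csv section (hypothetical data). With a
    # real file, pass csv.DictReader over the opened file instead.
    sample = io.StringIO(
        "tweet_id,tweet_text\n"
        "1,Earthquake felt in Napa\n"
        "2,Big earthquake this morning\n"
    )
    freqs = word_frequencies(csv.DictReader(sample), batch_size=1)
    for word, n in freqs.most_common(5):
        print(word, n)
```

Because `csv.DictReader` reads rows lazily, memory use should depend on the batch size and the vocabulary rather than on the total number of tweets, which is the point of batching here.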
Hi Craig and AIR team,

Along these lines, I was wondering if anyone would be able to point me at some methods that could help me analyse Twitter data in Spanish? I’m trying to gauge the ‘emotional’ language used by Chavistas in the (very recent) Venezuelan election by looking at a set of 100 Twitter accounts that are dedicated to publishing content in favour of Maduro (the current president), and 100 Twitter accounts that are dedicated to publishing opposition content, as a control. I thought I might be able to use your script as well, Craig? Do you think it might be worth a try somehow in Spanish?

I am able to download all the tweets using “Twint”: https://github.com/haccer/twint I’m attaching the link as this might be of great use to some researchers out there. It has a module that can be run from a Python console, or it can be run with commands directly from the terminal. It is extremely fast and reliable. It can also download by hashtags or search queries, and download all followers of an account or the accounts that the user follows.

Aside from something very basic such as a word count of emotional words (love, hate, etc.), using a list that I will create in Spanish myself, I’m wondering if there are other interesting methods that could be applied to this set. I have also thought of choosing the 50 tweets that have been retweeted the most by Chavistas and doing a “manual” coding of these, and trying to correlate popularity with emotional density (measured by emotional words / overall words in a tweet). But I’m curious to see if there are other ideas, perhaps related to topics and their visualisation? Or any ideas for training an algorithm to code by emotional theme/topic for the rest of the dataset? Or ideas about looking at the dataset historically?

Really any help, methodological ideas, visualisation ideas are greatly, greatly appreciated!

Many thanks everyone,
Warmly,
Parvathi

________
Parvathi Subbiah
PhD Candidate, Gates Cambridge Scholar
Department of Politics and International Studies
Centre for Latin American Studies
University of Cambridge
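The emotional-density measure described above (emotional words divided by overall words in a tweet) and its correlation with popularity could be sketched as below. This is only an illustration in Python with the standard library: the five-word lexicon, the sample tweets, and the retweet counts are invented placeholders, and Pearson correlation is one assumed choice among several (a rank correlation may suit skewed retweet counts better).

```python
import re

# Hypothetical seed lexicon; the real list would be the researcher's own.
EMOTION_WORDS = {"amor", "odio", "miedo", "esperanza", "rabia"}

# Matches runs of Unicode letters, so accented Spanish words stay whole.
WORD_RE = re.compile(r"[^\W\d_]+")


def emotional_density(text, lexicon=EMOTION_WORDS):
    """Return emotional words / overall words for one tweet (0.0 if empty)."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in lexicon)
    return hits / len(words)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n == 0 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one variable is constant, so correlation is undefined
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented (text, retweet count) pairs, for illustration only.
    tweets = [
        ("Amor y esperanza para Venezuela", 120),
        ("Resultados de la jornada electoral", 15),
        ("Sin miedo, con esperanza", 80),
    ]
    densities = [emotional_density(text) for text, _ in tweets]
    retweets = [count for _, count in tweets]
    print(densities)
    print(pearson(densities, retweets))
```

Hashtags and mentions are counted as ordinary words here because the tokenizer drops the `#` and `@` characters; whether that is desirable is a coding decision for the study, not something the sketch settles.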
Hi Parvathi,

I analyzed a couple hundred thousand tweets using NodeXL Pro, a fair portion of which were in Spanish. I was looking at culturally distinct notions of temporality, and generated an exhaustive word list for sentiment analysis in English, Spanish, and Korean.

Hope this is helpful.

Best,
Diana

Diana L. Ascher, PhD, MBA
Department of Information Studies
University of California, Los Angeles
290 Charles E. Young Drive North
Los Angeles, CA 90095
dianaascher@ucla.edu
@dianaascher
Dear Parvathi,

Thanks for the email. By all means please do use the script. It should certainly work with the initial Document Term Matrix construction providing you amend the vector of Stop Words to Spanish, although you might encounter problems when words are stemmed to their roots. If you can get over those hurdles then the Topic Modelling element *should* work as this is based on the numbers in the DTM, rather than the words themselves. The Sentiment Analysis part will require sourcing Spanish rather than English reference libraries. I’ve not had a need to search for those, but I’d be surprised if such things didn’t exist.

Rather than clogging up this list, perhaps you could email directly if you require further assistance. Fran, likewise, feel free to drop me a line. The same offer extends to any other AOIR members, of course.

Kind regards
Craig
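To make the Document Term Matrix step in Craig's reply concrete, here is a small Python sketch (standard library only) of building a DTM after removing Spanish stop words. It is not the R script from the blog post: the stop-word set is a short illustrative subset rather than a complete list, the sample sentences are invented, and stemming is deliberately left out because Craig flags it as a likely trouble spot for Spanish.

```python
import re
from collections import Counter

# Short illustrative subset of Spanish stop words (an assumption, not a
# complete list; a real analysis would use a fuller reference list).
SPANISH_STOP_WORDS = {
    "el", "la", "los", "las", "de", "del", "y", "en", "que",
    "a", "un", "una", "por", "para", "con", "no", "es",
}

# Matches runs of Unicode letters, so accented words are kept intact.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text, stop_words=SPANISH_STOP_WORDS):
    """Lowercase, split into words, and drop stop words (no stemming)."""
    return [w for w in WORD_RE.findall(text.lower()) if w not in stop_words]


def document_term_matrix(documents, stop_words=SPANISH_STOP_WORDS):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    counts = [Counter(tokenize(doc, stop_words)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    # Invented example documents, for illustration only.
    docs = ["El pueblo vota por la paz", "La paz y el pueblo"]
    vocabulary, matrix = document_term_matrix(docs)
    print(vocabulary)
    for row in matrix:
        print(row)
```

The resulting matrix holds only counts, which is why, as Craig notes, a topic model fitted to it should not depend on the language of the words themselves.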
On Thu, May 24 2018, Parvathi Subbiah wrote:
Along these lines, I was wondering if anyone would be able to point me at some methods that could help me analyse twitter data in Spanish? [...] Or ideas about looking at the dataset historically?
Really any help, methodological ideas, visualisation ideas are greatly, greatly appreciated! Many thanks everyone,
There was a group of researchers working on the Twitter activity of the 15M (Indignados, aka Occupy in Spain, etc.) movement. They had interesting approaches to time series data. Here is their blog: https://datanalysis15m.wordpress.com/

For me, the most interesting contribution at the conference advertised in the first entry was Miguel Aguilera's pink noise analysis: https://tecnopolitica.net/sites/default/files/miguelaguilera.pdf

All the code should be online, and I guess they should have tips for Spanish-language Twitter analysis...

Hope this helps,

--
Maxigas, kiberpunk
FA00 8129 13E9 2617 C614 0901 7879 63BC 287E D166
Lecturer in Critical Digital Media Practice
Centre for Science Studies
Department of Sociology
Lancaster University
https://relay70.metatron.ai/
Unix is a Registered Bell of AT&T Trademark Laboratories. - Donn Seeley
ascii ribbon campaign - stop html mail - www.asciiribbon.org
Participants (6): Craig Hamilton, Diana Ascher, f hodgkins, maxigas, Melissa Bliss, Parvathi Subbiah