Hi Fran,

I’ve done some work on analysing tweets (and other text from social media) using R. I’ve put together a walkthrough video, a sample dataset, and the relevant code in this blog post:

http://harkive.org/h17-text-analysis/

You’d be most welcome to use some or all of those resources. Don’t worry if you’ve not used R before - the script I’ve provided in that post should work if you create a copy of your dataset and change the column names to match the sample dataset I’ve provided.

I’ve not used R with a dataset of the size you’re dealing with, so I can’t tell you how well it, or your computer, will handle things. Working in batches, as suggested below, might be a good idea, certainly while you are trying things out.

The script eventually moves on to some topic modelling and sentiment analysis, but you can run through it section by section until you reach the end of the initial exploratory stage (word frequencies and so on). This might help you make some sense of what’s in the dataset, and will help you weed out any unwanted elements.

Happy to help if you want to run with any of the above - I’d be intrigued to see what the script I wrote comes up with on a different type of data.

Kind regards
Craig

Dr Craig Hamilton
School of Media
3rd Floor, The Parkside Building
Birmingham City University
Birmingham, B5
07740 358162
t: @craigfots
e: craig.hamilton@bcu.ac.uk

On 23 May 2018, at 03:22, f hodgkins <frances.hodgkins@gmail.com> wrote:

All-

I am working on a qualitative content analysis of a historical tweet set from CrisisNLP, from Imran et al. (2016):

http://crisisnlp.qcri.org/lrec2016/lrec2016.html

I am using the California Earthquake dataset. The Tweets have been stripped down to the Day/Time, the Tweet ID, and the content of the Tweet. The rest of the Twitter information is discarded.
I am using NVivo, which is known for its power in content analysis. However, I am finding NVivo unwieldy for a dataset of this size (~250,000 tweets). I wanted each unique Tweet to function as its own case, but NVivo would crash every time. I have 18 GB of RAM and a RAID array. I do not have a server, although I could get one. I am working and coding side by side in Excel and in NVivo with my data in 10 large sections of .csv files, instead of individual cases, and this is working (but laborious).

QUESTION: Do you have any suggestions for software for large-scale content analysis of Tweets? I do not need SNA capabilities.

Thank you very much,
Fran Hodgkins
Doctoral Candidate (currently suffering through Chapter 4)
Grand Canyon University
USA

_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/