Greetings,

I am beginning a project in which we have four private smoking cessation groups (3 months of data each) to analyze, and we are also writing a grant to collect more data. I have seen some of the work on sentiment analysis of Twitter, but I am interested in developing a system that is better tailored to our data (e.g., being "smokefree" has a specific meaning in this context).

We are therefore considering content coding a number of tweets, either by having researchers code them or by having smokers code them, for positive and negative valence and perhaps for some discrete emotions (e.g., sadness or hope). How many coded tweets would we need to train a simple machine learning system on our dataset (for example, one of the many options available in R)? And what are the best out-of-the-box programs to use?

I know a bit about content analysis and about smoking cessation, but not much about machine learning, so please bear in mind that you are dealing with a novice. If anyone who does know what they are doing would be interested in collaborating on the project, they would be very welcome.

Ashley
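For illustration, here is a minimal sketch of the kind of "simple machine learning system" described above: a multinomial Naive Bayes classifier trained on hand-coded tweets. Naive Bayes is just one common baseline, not a recommendation for this particular dataset. The sketch uses only the Python standard library so it runs without installing anything; the example tweets, their valence codes, and the names `tokenize`, `NaiveBayes`, and `accuracy` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes, so "#Smokefree" -> "smokefree"
    # (digits and punctuation such as '#' are dropped).
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of coded tweets per label
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior for this label
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Share of held-out coded tweets whose predicted label matches the human code.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical hand-coded tweets (text, valence); real data would come from the groups.
coded = [
    ("Day 10 smokefree and feeling proud", "positive"),
    ("Still smokefree, cravings are easing", "positive"),
    ("So hopeful today, one week without a cigarette", "positive"),
    ("Relapsed last night, feeling awful", "negative"),
    ("Cravings are terrible and I feel so sad", "negative"),
    ("Bought a pack again, I hate this", "negative"),
]
texts, labels = zip(*coded)
model = NaiveBayes().fit(texts, labels)

held_out_texts = ["Feeling proud to be #smokefree", "Relapsed again and feeling sad"]
held_out_labels = ["positive", "negative"]
print(accuracy(model, held_out_texts, held_out_labels))  # prints 1.0
```

Because the tokenizer lowercases text and drops the `#`, "#smokefree" and "Smokefree" count as the same word, which is one simple way to let a domain-specific term like "smokefree" carry its own weight. How many coded tweets are enough is usually settled empirically rather than by a fixed number: hold out some coded tweets, train on progressively larger subsets, and check where held-out accuracy stops improving. Six toy tweets say nothing about real performance.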