Lots of useful responses so far. I just wanted to add that we've dealt with a similar question in attempting to move from qualitative human coding to natural language processing. It was useful for us to think about the relationship between the phenomenon of interest and the unit of analysis, i.e., do you have good theoretical (or prior empirical) reasons to believe that the differences between men and women that you are interested in vary with the number of words? If you are talking about individual word choice, then the number of words that you sample ought to be relevant. If, though, the phenomenon that you are interested in is at a different level of analysis, say paragraph level or post level, then your sampling reasoning should match that. Perhaps the differences are in post openings or closings?

Once you nail the unit-of-analysis question, you need to ask what you know about the population distribution of the phenomenon you are interested in. For example, if it shows up only once in every (approx.) 1,000 words, then you'll need to sample enough 1,000-word units to ensure that you have enough possible places where it might have shown up (i.e., something like 300 x 1,000 words) for the inferential logic to work.

It's also possible that you don't yet know the patterns of difference; those might be what you are seeking to discover, although that would seem to call for a qualitative phase. In that case a logic of sufficiency (i.e., I've now seen enough examples and I'm not seeing any new types; usually called "exhaustion", in reference to concepts, not the coder!) might help you determine when to stop coding. Of course, such a strategy means that the claims you can make are different (i.e., this is a theory-generating, not a theory-testing, methodology). Once that process is done you'll have a better idea of the likely population distribution of your phenomenon, which will then give you insight into what sample size you'd need to test your theory.
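To make the arithmetic in the 1,000-word example concrete, here is a rough Python sketch. The function names are hypothetical. The Poisson model (occurrences independent, at a constant rate) is my own simplifying assumption and may not hold for real blog text. The sketch only counts how many places the phenomenon could turn up; it is not a power analysis.

```python
import math


def expected_occurrences(n_words: int, words_per_occurrence: int) -> float:
    """Expected count if the phenomenon appears about once per `words_per_occurrence` words."""
    return n_words / words_per_occurrence


def words_needed(target_occurrences: int, words_per_occurrence: int) -> int:
    """Words to sample so that the expected count reaches `target_occurrences`."""
    return target_occurrences * words_per_occurrence


def prob_at_least(k: int, expected: float) -> float:
    """P(X >= k) for X ~ Poisson(expected). The Poisson model is an assumption."""
    below_k = sum(
        math.exp(-expected) * expected**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    # A 50,000-word sample per group, phenomenon roughly once per 1,000 words.
    print(expected_occurrences(50_000, 1_000))  # 50.0 expected occurrences
    # Sample size for about 300 expected occurrences (the "300 x 1,000" case).
    print(words_needed(300, 1_000))  # 300000 words
    # Chance of observing at least 30 occurrences in the 50,000-word sample.
    print(prob_at_least(30, expected_occurrences(50_000, 1_000)))
```

If the phenomenon lives at the post or paragraph level instead, the same reasoning applies with posts or paragraphs as the unit rather than words.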
Cheers,
James
<credibility information redacted ;>

On 17 Aug 2009, at 8:29 PM, Karyn Hollis wrote:
Hi All--

This is a newbie question. I am planning to do a quantitative data analysis to study blogs for gender differences in CMC. Are there any rules for the size of samples? Would comparing male to female blog texts of a total of 50,000 words each be enough to claim statistical significance for any differences I find?

Thanks for any advice,
Karyn Hollis
Villanova University
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/