Re: [Air-L] Text Sample Size?

18 Aug 2009

Lots of useful responses so far. Just wanted to add that we've dealt
with a similar question in attempting to move from qualitative human
coding to natural language processing. It was useful for us to think
about the relationship between the phenomenon of interest and the
units of analysis.

That is, do you have good theoretical (or prior empirical) reasons to
believe that the differences between men and women that you are
interested in vary with the number of words? If you are talking about
individual word choice, then the number of words that you sample ought
to be relevant. If, though, the phenomenon that you are interested in
is at a different level of analysis, say paragraph level or post
level, then your sampling reasoning should match that. Perhaps the
differences are in post openings or closings?

Once you nail the unit-of-analysis question, you need to ask what
you know about the population distribution of the phenomenon you are
interested in. For example, if it shows up only about once in every
1,000 words, then you'll need to sample enough 1,000-word units to
ensure that you have enough possible places where it might have shown
up (i.e., something like 300 x 1,000 words) for the inferential logic
to work.
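
To make the arithmetic concrete, here is a rough Python sketch. It
assumes the phenomenon turns up about once per fixed-size chunk of
words, and it reads the "300" above as a target number of chunks (and
so of expected occurrences); that reading, and the function and
parameter names, are illustrative assumptions rather than a rule.

```python
def words_needed(words_per_occurrence: int, target_occurrences: int) -> int:
    """Words to sample so the expected number of occurrences reaches the target.

    Assumes roughly one occurrence per `words_per_occurrence` words.
    """
    if words_per_occurrence < 1 or target_occurrences < 1:
        raise ValueError("both arguments must be positive integers")
    return words_per_occurrence * target_occurrences


def expected_occurrences(n_words: int, words_per_occurrence: int) -> float:
    """Expected number of occurrences in a sample of `n_words` words."""
    if words_per_occurrence < 1:
        raise ValueError("words_per_occurrence must be a positive integer")
    return n_words / words_per_occurrence


# One occurrence per ~1,000 words, aiming for ~300 places it could show up.
print(words_needed(1000, 300))            # 300000
# At the same assumed rate, a 50,000-word sample gives about 50 occurrences.
print(expected_occurrences(50_000, 1000))  # 50.0
```

Under those assumptions a 50,000-word sample per group would hold only
about 50 expected occurrences, so whether that is enough depends on how
rare the thing you are counting really is.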

It's also possible that you don't yet know the patterns of difference;
those might be what you are seeking to discover, although that would
seem to call for a qualitative phase. In that case a logic of
sufficiency (i.e., I've now seen enough examples and I'm not seeing any
new types; usually called "exhaustion", in reference to concepts, not
the coder!) might help you determine when to stop coding. Of course
such a strategy means that the claims you can make are different (i.e.,
this is a theory-generating, not a theory-testing, methodology). Once
that process is done you'll have a better idea of the likely
population distribution of your phenomenon, which will then give you
insight into what sample size you'd need to test your theory.
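
The "stop when no new types appear" logic can be sketched in Python
roughly as follows. The `patience` threshold (how many items in a row
must add nothing new before you stop) and the function name are
hypothetical choices for illustration; the right threshold is a
judgment call for your own project.

```python
from typing import Hashable, Iterable, Optional


def saturation_point(codes: Iterable[Hashable], patience: int) -> Optional[int]:
    """Return how many items had been coded when `patience` consecutive
    items added no new type, or None if that point is never reached."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    seen = set()
    since_new = 0  # consecutive items that introduced no new type
    for count, code in enumerate(codes, start=1):
        if code in seen:
            since_new += 1
        else:
            seen.add(code)
            since_new = 0
        if since_new >= patience:
            return count
    return None


# Hypothetical codes assigned to successive posts, in coding order.
coded = ["a", "b", "a", "c", "a", "b", "b", "a"]
print(saturation_point(coded, patience=3))  # 7
```

Here the last new type ("c") appears at item 4, and items 5 to 7 add
nothing new, so with a patience of 3 coding would stop after item 7.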

Cheers,
James
<credibility information redacted ;>

On 17 Aug 2009, at 8:29 PM, Karyn Hollis wrote:
...
Hi All--
  This is a newbie question.  I am planning to do a quantitative data
  analysis to study blogs for gender differences in CMC.  Are there any
  rules for the size of samples?  Would comparing male to female blog
  texts of a total of 50,000 words each be enough to claim statistical
  significance for any differences I find?
  Thanks for any advice,
  Karyn Hollis
  Villanova University
_______________________________________________
The Air-L@listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org

James Howison