Alex, I believe you are right. There is no rule-of-thumb answer to the question 'how many observations do I need to achieve statistical significance'. But if you know a bit more about your planned analyses in advance, you may be able to estimate sample size using power tables. See:

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Power calculations are useful only if your data also meet other criteria, though, and those need to be considered before you apply inferential statistics at all. One of the problems with web-based data is the relative ease of collecting 'large numbers' of responses, words, observations, etc. People often think that large numbers mean they can 'find statistical significance'. What matters is the way you are sampling those units and how you have defined the larger population about which you hope to infer, along with other elements such as the expected differences between groups and effect sizes, as Alex and Peter have already mentioned.

Although a lot of this is standard methods textbook content, it's surprising how many published articles use statistical inference in situations where the assumptions for it aren't met. Indeed, I'm still trying to get my head around it. Colleagues of mine have said things like 'it's not a random sample and I don't want to generalise my results to a larger population as I know I cannot, but I can still use statistical tests to test variables within my data, right?' Given these things get published, I'm confused myself. Then again, what is theoretically correct and what gets published aren't necessarily the same thing...

Some answers and more questions for you!

Monica

Monica Barratt
http://db.ndri.curtin.edu.au/student.asp?persid=650&typeid=1

2009/8/18 Alex Halavais <alex@halavais.net>
Karyn & Peter,
I'm hoping someone out there will correct me. I think you are looking for something like a rule of thumb, and I suspect that doesn't exist.
There are two questions. The first is how many blogs/bloggers you need to sample in order to generalize to all bloggers. I'm guessing that's not your question. (Although given the issues of arriving at a representative sample, it is not a trivial one.)
I think the question you are asking is (a) how many different bloggers you will need to sample in order to have the power necessary to demonstrate a significant difference between groups, and (b) how much text from each of these bloggers you will need.
Of course, that question hinges in part on the distribution of differences within your groups. That, in turn, is dependent on precisely how you are measuring such differences. (And we'll leave aside, for the moment, the question of whether those differences make a difference--i.e., the validity of whatever measure you choose to use.)
If you are using a metric that has been used in the past to show gender differences, you may be able to use whatever differences they found--within groups and between them--to estimate your own sample needs. In practice, though, if that literature exists, you would probably just use the same sample size.
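As a rough illustration of that kind of estimate, here is a minimal sketch in Python. It assumes a two-sided comparison of two group means, a standardized effect size (Cohen's d) taken from earlier literature, and a normal approximation rather than the exact t-based tables, so it tends to come out a unit or so below published power tables. The function name n_per_group and the example effect sizes are illustrative, not taken from any study discussed in this thread. Note that the unit being counted is bloggers, not words.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate bloggers needed in EACH group to detect effect size d.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
    for a two-sided test comparing two independent group means.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the test
    z_power = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Conventional "small", "medium", "large" effect sizes (assumed values).
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d={d}: about {n_per_group(d)} bloggers per group")
```

The main point the numbers make is how quickly the required sample grows as the expected difference shrinks; none of this helps if the sample is not drawn in a way that supports inference in the first place.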
So, that is my non-answer.
- Alex
--
//
// This email is
// [x] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais, ciberflâneur
// http://alex.halavais.net
//
On Mon, Aug 17, 2009 at 10:16 PM, Peter Timusk<ptimusk@sympatico.ca> wrote:
I have no idea of samples of words. I do know samples of persons.
A sample of persons below, say, 300 is suspect, especially if not random. I am reading a few books in Internet studies that argue against previous studies by claiming the sample is too small and not random.
You can claim some things with samples as small as 12, but the more items you want to measure, the larger your sample should be, IMHO.
A sample is best random. Some would argue a sample is only a sample if random. You can sample randomly and still choose roughly equal men and women.
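One way to sample randomly while still ending up with equal numbers of men and women is to draw separately within each group (stratified random sampling). The sketch below assumes you already have a sampling frame, i.e. a list of blogs with the author's gender recorded; the frame, the identifiers and the function name stratified_sample are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, key, per_stratum, seed=0):
    """Draw the same number of units at random from each stratum."""
    rng = random.Random(seed)             # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)    # group units by stratum (e.g. gender)
    sample = {}
    for name, members in strata.items():
        if len(members) < per_stratum:
            raise ValueError(f"stratum {name!r} has only {len(members)} units")
        sample[name] = rng.sample(members, per_stratum)  # without replacement
    return sample


# Hypothetical frame: 300 blogs, one third by men and two thirds by women.
frame = [(f"blog{i}", "M" if i % 3 == 0 else "F") for i in range(300)]
picked = stratified_sample(frame, key=lambda unit: unit[1], per_stratum=50)
print({gender: len(blogs) for gender, blogs in picked.items()})
```

This only gives a random sample of the frame itself; whether the frame represents all bloggers is a separate question.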
Can you randomize your samples in some ways?
The Canadian Internet Use Survey has had a sample of more than 20,000 persons.
All that I am saying is probably to be found in most undergraduate statistics books.
You would need to ask text analysts about how to sample texts.
I follow gender and computers so would be interested in your results or what you are looking for.
Peter
On 17-Aug-09, at 8:29 PM, Karyn Hollis wrote:
Hi All--

This is a newbie question. I am planning to do a quantitative data analysis to study blogs for gender differences in CMC. Are there any rules for the size of samples? Would comparing male to female blog texts of a total of 50,000 words each be enough to claim statistical significance for any differences I find?

Thanks for any advice,

Karyn Hollis
Villanova University
Peter Timusk statistical computer programmer ptimusk@sympatico.ca address 701-151 Parkdale Avenue Ottawa, Ontario Canada K1Y 4V8 Phone 613-729-8328
May all your numbers be quality numbers... even if they are only average numbers.
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/