Re: [Air-L] Text Sample Size?

18 Aug 2009

Alex, I believe you are right. There is no rule-of-thumb answer to the question
'how many observations do I need to reach statistical significance'. But if you
know a bit more about your planned analyses in advance, you may be able to
estimate sample size using power tables.

See
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
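
For a rough sense of what those power tables encode, here is a minimal Python
sketch. It assumes a two-group comparison of means with equal group sizes and a
two-sided test, and it uses the normal approximation rather than Cohen's exact
tables, so its answers can come out slightly lower than his. The function name
and defaults are illustrative and are not taken from either reference.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is Cohen's d (difference in means / pooled SD).
    Uses the normal approximation, not Cohen's exact tables.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Cohen's conventional small / medium / large effect sizes
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} effect (d={d}): about {n_per_group(d)} per group")
```

The thing to notice is how steeply the required n rises as the expected effect
shrinks. The calculation says nothing about whether the units were sampled in a
way that supports inference in the first place.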

Power calculations are only useful, though, if your data also meet the other
criteria that need to be considered before applying inferential statistics.

One of the problems with web-based data is the relative ease of collecting
'large numbers' of responses, words, observations, etc. People often think
that large numbers means they can 'find statistical significance'. What
matters is the way you are sampling those units and how you have defined the
larger population about which you hope to infer - and other elements such as
the expected differences between groups and effect sizes, as Alex and Peter
have already mentioned.

Although a lot of this is standard methods textbook content, it's surprising
how many published articles use statistical inference in situations where
assumptions for it aren't met. Indeed, I'm still trying to get my head
around it. Colleagues of mine have said things like 'it's not a random
sample and I don't want to generalise my results to a larger population as I
know I cannot, but I can still use statistical tests to test variables
within my data, right?' Given these things get published, I'm confused
myself. Then again, what is theoretically correct and what gets published
aren't necessarily the same thing...

Some answers and more questions for you!

Monica

Monica Barratt
http://db.ndri.curtin.edu.au/student.asp?persid=650&typeid=1

2009/8/18 Alex Halavais <alex@halavais.net>
...
Karyn & Peter,
I'm hoping someone out there will correct me. I think you are looking
for something like a rule of thumb, and I suspect that doesn't exist.
There are two questions. The first is how many blogs/bloggers you need
to sample in order to generalize to all bloggers. I'm guessing that's
not your question. (Although given the issues of arriving at a
representative sample, it is not a trivial one.)
I think the question you are asking is (a) how many different bloggers
you will need to sample in order to have the power necessary to
demonstrate a significant difference between groups, and (b) how much
text from each of these bloggers you will need.
Of course, that question hinges in part on the distribution of
differences within your groups. That, in turn, is dependent on
precisely how you are measuring such differences. (And we'll leave
aside, for the moment, the question of whether those differences make
a difference--i.e., the validity of whatever measure you choose to
use.)
If you are using a metric that has been used in the past to show
gender differences, you may be able to use whatever differences they
found--in group and between--to estimate your own sample needs. In
practice, though, if that literature exists--you probably just use the
same sample size.
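
As a sketch of that estimate: if an earlier study reports each group's mean,
standard deviation and size on the same metric, those can be combined into a
standardized difference (Cohen's d), which is the effect size that power tables
take as input. The numbers below are made up for illustration and do not come
from any actual study; the function name is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled within-group SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Invented summary statistics for two groups of bloggers on some metric
d = cohens_d(mean_a=10.0, sd_a=2.0, n_a=50, mean_b=9.0, sd_b=2.0, n_b=50)
print(f"Cohen's d = {d:.2f}")
```

The between-group difference goes in the numerator and the in-group spread in
the denominator, so a noisy measure needs a larger sample to show the same raw
difference.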
So, that is my non-answer.
- Alex
--
//
// This email is
// [x] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
//
// Alexander C. Halavais, ciberflâneur
// http://alex.halavais.net
//
On Mon, Aug 17, 2009 at 10:16 PM, Peter Timusk<ptimusk@sympatico.ca>
wrote:
...
I have no idea of samples of words. I do know samples of persons.
A sample of persons below say 300 is suspect especially if not random. I am
reading a few books in Internet studies that argue against previous studies
by claiming the sample is too small and not random.
You can claim some things with samples as small as 12 but the more items you
want to measure the larger your sample should be IMHO.
A sample is best random. Some would argue a sample is only a sample if
random. You can sample randomly and still choose roughly equal men and women.
Can you randomize your samples in some ways?
The Canadian Internet Use Survey has had a sample of more than 20,000 persons.
All that I am saying is probably to be found in most undergraduate
statistics books.
You would need to ask text analysts about how to sample texts.
I follow gender and computers so would be interested in your results or what
you are looking for.
Peter
On 17-Aug-09, at 8:29 PM, Karyn Hollis wrote:
...
Hi All--
 This is a newbie question.  I am planning to do a quantitative data
 analysis to study blogs for gender differences in CMC.  Are there any
 rules for the size of samples?  Would comparing male to female blog
 texts of a total of 50,000 words each be enough to claim statistical
 significance for any differences I find?
 Thanks for any advice,
 Karyn Hollis
 Villanova University
Peter Timusk statistical computer programmer
ptimusk@sympatico.ca
address 701-151 Parkdale Avenue
Ottawa, Ontario Canada K1Y 4V8
Phone 613-729-8328
May all your numbers be quality numbers... even if they are only average
numbers.
_______________________________________________
The Air-L@listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at:
http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers:
http://www.aoir.org/