[Air-L] A question for researchers interested in the basics of statistical inference

3 Sep 2009

Hi everyone,

I'm currently writing up my thesis which has the working title 'Researching
the forums: Illicit drug use in a networked world'. I conducted an online
survey using a purposive (nonprobability) sample of illicit drug users who
used internet message boards (forums) to discuss or read about drugs.
Originally I intended to conduct inferential statistics on this sample of
915, as this is the general practice in many other papers I had read. After
some more thought though, I'm leaning away from that.

Following is my thinking about this issue. I would really appreciate some
feedback on this from anyone with an interest in this area (non-experts
welcome too!).

*My understanding of the sampling and statistical inference in my thesis
work*

There are two types of samples: probability and nonprobability. Probability
samples occur when each individual from the population of interest has a
known, non-zero chance of being included in the sample (an equal chance
under simple random selection). In contrast, nonprobability samples contain
self-selected individuals from a population of interest. Not everyone has a
chance of participating, and the chances are unknown, so we can't calculate
the relationship between the sample and the population of interest.
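
A minimal sketch of this difference, using only the Python standard library.
Everything in it is an assumption made for illustration: the population is
simulated, the "score" variable is hypothetical, and the rule that people
with higher scores are more likely to volunteer is invented. It is not a
model of my survey data.

```python
import random
import statistics


def compare_samples(seed=0, pop_size=100_000, n=915):
    """Return (population mean, random-sample mean, self-selected-sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population: one numeric score per person.
    population = [rng.gauss(10, 3) for _ in range(pop_size)]

    # Probability sample: everyone has the same known chance, n / pop_size.
    random_sample = rng.sample(population, n)

    # Self-selected sample: the chance of taking part rises with the score.
    # In real research this selection rule is unknown; it is invented here.
    # Drawn with replacement for simplicity.
    weights = [max(x, 0.0) ** 2 for x in population]
    volunteers = rng.choices(population, weights=weights, k=n)

    return (
        statistics.mean(population),
        statistics.mean(random_sample),
        statistics.mean(volunteers),
    )


pop_mean, random_mean, volunteer_mean = compare_samples()
print(f"population {pop_mean:.2f}, random {random_mean:.2f}, "
      f"self-selected {volunteer_mean:.2f}")
```

Under these assumptions the random sample lands close to the population
mean, while the self-selected sample drifts away from it by an amount that
depends entirely on the (normally unknown) selection rule.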

Probability samples of illicit drug users are rare. This is because to
draw a probability sample, the researcher needs to have a defined
population, such as a list of students or phone numbers of households.
Illicit drug use is a rare behaviour on a population level (excluding
perhaps, ever use of cannabis) and it is unlikely that a list of drug users
will exist given the illegality of the behaviour and reluctance to
self-identify on such a list.

Inferential statistics are not compatible with nonprobability samples. A
core assumption of the use of inferential statistics is that individuals are
randomly selected from the population of interest. Without this randomness,
the logic of inferential statistics does not hold.
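
One way to see this is to check how often a nominal 95% confidence interval
actually contains the population mean. The sketch below is a simulation
under stated assumptions, not an analysis of real data: the population, the
volunteering rule (higher scores are more likely to take part) and the
normal-approximation interval (mean plus or minus 1.96 standard errors) are
all choices made for illustration.

```python
import itertools
import random
import statistics


def ci_coverage(reps=200, n=200, pop_size=20_000, seed=1):
    """Share of 95% intervals that contain the true mean, per sampling method."""
    rng = random.Random(seed)
    population = [rng.gauss(10, 3) for _ in range(pop_size)]
    true_mean = statistics.mean(population)

    # Invented self-selection rule: chance of taking part grows with the score.
    cum_weights = list(
        itertools.accumulate(max(x, 0.0) ** 2 for x in population)
    )

    def covers(sample):
        # Normal-approximation 95% interval around the sample mean.
        mean = statistics.mean(sample)
        half_width = 1.96 * statistics.stdev(sample) / len(sample) ** 0.5
        return mean - half_width <= true_mean <= mean + half_width

    random_hits = sum(
        covers(rng.sample(population, n)) for _ in range(reps)
    )
    volunteer_hits = sum(
        covers(rng.choices(population, cum_weights=cum_weights, k=n))
        for _ in range(reps)
    )
    return random_hits / reps, volunteer_hits / reps


random_cov, volunteer_cov = ci_coverage()
print(f"coverage: random {random_cov:.2f}, self-selected {volunteer_cov:.2f}")
```

With random selection the interval should behave roughly as advertised.
With the invented self-selection rule it almost never contains the true
mean, even though the formula and the sample size are identical.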

Inferential statistics can be further categorised into parametric and
nonparametric statistical methods. These types of inferential statistics
are chosen depending upon the distribution of the variables to be analysed;
e.g. parametric statistics for continuous, normally distributed variables
and nonparametric statistics for non-normal or categorical/ordinal variables.

Nonparametric or distribution-free statistics are still inferential, so they
too are incompatible with nonprobability samples.

Descriptive statistics can still be applied to nonprobability samples to
describe the relationships between variables in the dataset. What should
not be done is 'significance testing', as the aim of this testing is to
determine whether a relationship is strong enough or a difference is large
enough, given the sample size, to be representative of a difference in the
population. This assumes that the sample has a known relationship to the
population, so such testing is meaningless when applied to a nonprobability
sample.
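
As a sketch of what purely descriptive reporting might look like, the
snippet below summarises a tiny dataset and the strength of one relationship
without attaching any p-value. The variable names (`age`, `posts_per_week`)
and all of the values are made up for illustration and are not taken from my
survey.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient, used here only as a description."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Made-up toy values for six hypothetical respondents.
age = [19, 22, 25, 31, 38, 44]
posts_per_week = [14, 12, 9, 7, 4, 2]

# Report what the sample looks like; make no claim about any population.
summary = {
    "n": len(age),
    "median_age": statistics.median(age),
    "mean_posts": statistics.mean(posts_per_week),
    "r_age_posts": pearson_r(age, posts_per_week),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The coefficient describes the respondents in hand. Whether it says anything
about forum users in general is exactly the step that a nonprobability
sample cannot support.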

There are still good reasons to use a nonprobability sample. There are
simply situations when probability samples are impossible to obtain or just
too expensive (arguably this applies to my population of interest).
Nonprobability samples are also useful in exploratory or preliminary studies
(also relevant to me). The trick is not to apply inappropriate statistical
tests to data collected in this way.

Why is it then that we see inferential statistics routinely conducted upon
nonprobability samples, especially in the drug studies field? Is it
something about making our research appear more scientific with the addition
of a p < .05? Is it ignorance? Or do I have it wrong myself? Are there times
when inferential statistics, e.g. a t-test or a correlation coefficient, can
be applied to nonprobability samples? Are there any exceptions to this rule?

-- 
Monica Barratt
BSc(Psych); PhD in progress...
National Drug Research Institute
Melbourne, Victoria, Australia
http://preview.tinyurl.com/lwyyzq