Hi Monica,

Congrats on your thesis. I will take a stab at your questions. I think what might be problematic is the conception of inference. In inferential statistics, the base definition of inference is drawing conclusions about a larger population from a sampled data set. In these cases, the gold-standard sampling method is simple random sampling with replacement (SRSWR), though population-level inferential statistics are commonly computed on SRS samples, cluster samples, multi-stage samples, and so on.

To produce unbiased estimates, inferential statistical methods have a set of assumptions. OLS, for example, has a number of assumptions: the IVs' variation is not random, no multicollinearity, homoskedasticity, and residuals with a mean of zero. Now, many of these assumptions are met when the sample that produced the data was a probability sample. If the estimates are unbiased, you can calculate variance, standard errors and confidence intervals for the population. Importantly, a sample does not require a random draw for valid inferential estimation. If a purposive sample can meet the assumptions of an inferential model, you can certainly produce unbiased estimates. However, gauging the degree of unbiasedness in a purposive sample is difficult, so it is unwise to assume true unbiasedness.

Let me focus on your question regarding the propriety of using inferential techniques on purposive samples. A different, and complementary, use of inferential statistics is to draw inferences about relations in data: for example, to test differences between groups or the relations between many variables in an analysis. In this case, the inference is not population-level; rather, it describes the relations in the data at hand. In these cases, we cannot argue that our estimates are representative and unbiased, but many models are robust enough, and have enough diagnostic features, that we can generally gauge the validity of the measures. In these cases, if we realize and report the limitations of the model, it is appropriate to use them.
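As a rough illustration of the "diagnostic features" point, here is a minimal Python sketch of two crude residual checks for a one-predictor OLS fit. It is only a sketch under stated assumptions: the function names (make_demo_data, fit_simple_ols, residual_checks) and the simulated data are hypothetical, and the split-half variance ratio is an informal look at homoskedasticity, not a formal test.

```python
import random
from statistics import mean, pvariance


def make_demo_data(seed=1, n=500):
    """Hypothetical demo data: y = 2 + 3x + noise with constant variance."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def fit_simple_ols(x, y):
    """Closed-form least squares for one predictor plus an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def residual_checks(x, y):
    """Return the slope, the residual mean, and a crude variance ratio."""
    intercept, slope = fit_simple_ols(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Sort residuals by x and compare their spread in the lower and upper half.
    pairs = sorted(zip(x, resid))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return {
        "slope": slope,
        # With an intercept this is zero by construction (a sanity check only).
        "resid_mean": mean(resid),
        # Values far from 1 hint at heteroskedasticity.
        "variance_ratio": pvariance(high) / pvariance(low),
    }


if __name__ == "__main__":
    x, y = make_demo_data()
    print(residual_checks(x, y))
```

Note that none of these checks says anything about how the sample relates to a wider population; they only describe how well the model's assumptions hold in the data at hand.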
Now, the second part of your question dealt with parametric and non-parametric methods. In statistics, "parametric" is used to describe how the population fits to the parameters of a distribution. In most cases, we are concerned with the normal distribution. When the distribution of a population is non-parametric, it does not fit a particular distribution. Often this happens in cases where our sample is quite small. In this case a nonparametric method would apply. However, as populations grow larger, they tend to fit into distributions, and distribution-appropriate methods would apply.

The application of inferential methods to non-probability samples is appropriate if the inferences are to be drawn within the sample, and the characteristics of the distributions reasonably meet the criteria of the method. You generally should be careful when making claims that reach outside the sample (its representativeness) or about the degree of unbiasedness, but you can use these techniques to make inferential estimates regarding the data at hand.

Finally, with regard to confidence levels, in a between-means comparison such as a t-test, we are comparing the hypothetical distributions of the groups, and the significance test provides our intervals for comparison.

Thanks,
Fred

On Sep 2, 2009, at 10:12 PM, Monica Barratt wrote:
Hi everyone
I'm currently writing up my thesis which has the working title 'Researching the forums: Illicit drug use in a networked world'. I conducted an online survey using a purposive (nonprobability) sample of illicit drug users who used internet message boards (forums) to discuss or read about drugs. Originally I intended to conduct inferential statistics on this sample of 915, as this is the general practice in many other papers I had read. After some more thought though, I'm leaning away from that.
Following is my thinking about this issue. I would really appreciate some feedback on this from anyone with an interest in this area (non-experts welcome too!)
*My understanding of the sampling and statistical inference in my thesis work*
There are two types of samples: probability and nonprobability. Probability samples occur when each individual from the population of interest has an equal (non-zero) chance of being included in the sample (random selection). In contrast, nonprobability samples contain self-selected individuals from a population of interest - not everyone has a chance of participating, so we can't calculate the relationship between the sample and the population of interest.
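The difference between the two kinds of sample can be sketched with a small simulation. This is a hedged illustration only: the population (normal scores with mean 50 and SD 10), the logistic self-selection rule, and the function name simulate are all invented for the example and do not describe any real survey.

```python
import math
import random


def simulate(seed=0, pop_size=100_000, n=1_000):
    """Compare a simple random sample with a self-selected sample.

    All numbers here are made-up assumptions for illustration.
    """
    rng = random.Random(seed)
    population = [rng.gauss(50, 10) for _ in range(pop_size)]
    pop_mean = sum(population) / pop_size

    # Probability sample: every unit has the same chance of selection
    # (drawn without replacement here).
    srs = rng.sample(population, n)
    srs_mean = sum(srs) / len(srs)

    # Self-selected sample: the chance of taking part rises with the score,
    # and in a real study the researcher would not know this rule.
    volunteers = [
        x for x in population
        if rng.random() < 1 / (1 + math.exp(-(x - 50) / 5))
    ]
    vol_mean = sum(volunteers) / len(volunteers)
    return pop_mean, srs_mean, vol_mean


if __name__ == "__main__":
    pop_mean, srs_mean, vol_mean = simulate()
    print(f"population mean:    {pop_mean:.2f}")
    print(f"random sample mean: {srs_mean:.2f}")
    print(f"self-selected mean: {vol_mean:.2f}")
```

In this setup the random sample's mean lands close to the population mean, while the self-selected mean is pulled upward even though it is based on many more respondents. A larger nonprobability sample does not remove the selection bias.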
Probability samples of illicit drug users are rare. This is because to conduct a probability sample, the researcher needs to have a defined population, such as a list of students or phone numbers of households. Illicit drug use is a rare behaviour on a population level (excluding, perhaps, ever use of cannabis) and it is unlikely that a list of drug users will exist given the illegality of the behaviour and reluctance to self-identify on such a list.
Inferential statistics are not compatible with nonprobability samples. A core assumption of the use of inferential statistics is that individuals are randomly selected from the population of interest. Without this randomness, the logic of inferential statistics does not hold.
Inferential statistics can be further categorised into parametric and non-parametric statistical methods. These types of inferential statistics are chosen depending upon the distribution of the variables to be analysed; e.g. parametric statistics for continuous normal variables and nonparametric statistics for nonnormal or categorical/ordinal variables.
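As one concrete example of a nonparametric method, here is a minimal Python sketch of the Mann-Whitney U test, which compares two groups using ranks rather than assuming normality. It is a simplified sketch, not a reference implementation: the function names are hypothetical, the p-value uses a normal approximation with no tie or continuity correction, and for real analyses an established statistics package would be the safer choice.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p-value from a normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # Simplification: variance without tie correction.
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p


if __name__ == "__main__":
    # Hypothetical ordinal ratings for two groups.
    group_a = [1, 2, 2, 3, 3, 4]
    group_b = [2, 3, 4, 4, 5, 5]
    print(mann_whitney_u(group_a, group_b))
```

The test is distribution-free, but it is still inferential: its p-value is only interpretable under an assumption about how the data were generated or sampled.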
Nonparametric or distribution free statistics are still inferential. So they too are incompatible with nonprobability samples.
Descriptive statistics can still be applied to nonprobability samples to determine the relationships between variables in the dataset. What should not be done is 'significance testing' as the aim of this testing is to determine whether a relationship is strong enough or a difference is large enough, given the sample size, to be representative of a difference in the population. This assumes that the sample has a known relationship to the population. This is meaningless when applied to a nonprobability sample.
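The dependence of significance testing on sample size can be sketched with a two-sample z-test. This is an illustrative sketch only: the function name two_sample_z, the equal group sizes, the known common standard deviation, and the numbers in the demo are all assumptions made for the example.

```python
from statistics import NormalDist


def two_sample_z(diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in group means.

    Assumes two equal-sized groups with a known common standard deviation.
    """
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Same observed difference (2 points, SD 10) at two sample sizes.
    for n in (20, 450):
        z, p = two_sample_z(diff=2, sd=10, n_per_group=n)
        print(f"n per group = {n}: z = {z:.2f}, p = {p:.4f}")
```

The observed difference is identical in both runs; only the sample size changes whether it crosses the conventional p < .05 threshold. The calculation itself assumes the standard error reflects random sampling variation, which is the assumption in question for a nonprobability sample.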
There are still good reasons to conduct a nonprobability sample. There are simply situations when probability samples are impossible to obtain or just too expensive (arguably this applies to my population of interest). They are also useful in exploratory or preliminary studies (also relevant to me). The trick is not to apply inappropriate statistical tests to data collected in this way.
Why is it then that we see probability statistics routinely conducted upon nonprobability samples, especially in the drug studies field? Is it something about making our research appear more scientific with the addition of a p < .05? Is it ignorance? Or do I have it wrong myself? Are there times when inferential statistics, e.g. a t-test or a correlation coefficient, can be applied to nonprobability samples? Are there any exceptions to this rule?
--
Monica Barratt
BSc(Psych); PhD in progress...
National Drug Research Institute
Melbourne, Victoria, Australia
http://preview.tinyurl.com/lwyyzq
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Fred Stutzman
Ph.D. Student and Teaching Fellow
School of Information and Library Science, UNC-Chapel Hill
fred@fredstutzman.com | (919) 260-8508 | http://fredstutzman.com/