Hi everyone,

I'm currently writing up my thesis, which has the working title 'Researching the forums: Illicit drug use in a networked world'. I conducted an online survey using a purposive (nonprobability) sample of illicit drug users who used internet message boards (forums) to discuss or read about drugs. Originally I intended to conduct inferential statistics on this sample of 915, as this is the general practice in many other papers I had read. After some more thought, though, I'm leaning away from that. My thinking on this issue follows. I would really appreciate feedback from anyone with an interest in this area (non-experts welcome too!).

*My understanding of the sampling and statistical inference in my thesis work*

There are two types of samples: probability and nonprobability. Probability samples occur when each individual from the population of interest has an equal (non-zero) chance of being included in the sample (random selection). In contrast, nonprobability samples contain self-selected individuals from a population of interest. Not everyone has a chance of participating, so we can't calculate the relationship between the sample and the population of interest.

Probability samples of illicit drug users are rare. This is because, to draw a probability sample, the researcher needs a defined population, such as a list of students or phone numbers of households. Illicit drug use is a rare behaviour at the population level (excluding, perhaps, ever use of cannabis), and it is unlikely that a list of drug users will exist, given the illegality of the behaviour and people's reluctance to self-identify on such a list.

Inferential statistics are not compatible with nonprobability samples. A core assumption of inferential statistics is that individuals are randomly selected from the population of interest. Without this randomness, the logic of inferential statistics does not hold.
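To make that last point concrete, here is a minimal simulation sketch in Python. Everything in it is a made-up assumption for illustration, not data from the survey: the population, the variable (a score such as days of use per month), and the self-selection mechanism (people with higher scores being more likely to volunteer). It checks how often a 95% confidence interval for the mean actually contains the true population mean under random selection versus self-selection.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical population: one score per person (e.g. days of use per month).
population = [rng.gammavariate(2.0, 3.0) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# Assumed self-selection mechanism: chance of volunteering rises with the score.
volunteer_weights = [x + 0.1 for x in population]


def random_sample(n=500):
    """Simple random sample: everyone has the same chance of inclusion."""
    return rng.sample(population, n)


def self_selected_sample(n=500):
    """Volunteer sample (drawn with replacement, for simplicity)."""
    return rng.choices(population, weights=volunteer_weights, k=n)


def mean_ci(sample, z=1.96):
    """Normal-approximation 95% confidence interval for the mean."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


def coverage(draw, repeats=200):
    """Share of repeated samples whose interval contains the true mean."""
    hits = 0
    for _ in range(repeats):
        low, high = mean_ci(draw())
        hits += low <= true_mean <= high
    return hits / repeats


random_cov = coverage(random_sample)
biased_cov = coverage(self_selected_sample)
print(f"true mean: {true_mean:.2f}")
print(f"CI coverage, random samples:        {random_cov:.2f}")
print(f"CI coverage, self-selected samples: {biased_cov:.2f}")
```

Under these assumptions the intervals from random samples should contain the true mean about 95% of the time, while the intervals from self-selected samples should almost never contain it, even though both are computed with the same formula. How badly a real volunteer sample behaves depends on a selection mechanism that is usually unknown, which is the core of the problem.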
Inferential statistics can be further categorised into parametric and nonparametric methods. The choice between them depends on the distribution of the variables to be analysed: e.g. parametric statistics for continuous, normally distributed variables and nonparametric statistics for non-normal or categorical/ordinal variables. Nonparametric (distribution-free) statistics are still inferential, so they too are incompatible with nonprobability samples.

Descriptive statistics can still be applied to nonprobability samples to describe the relationships between variables in the dataset. What should not be done is 'significance testing', as the aim of this testing is to determine whether a relationship is strong enough, or a difference is large enough, given the sample size, to be representative of a difference in the population. This assumes that the sample has a known relationship to the population, so it is meaningless when applied to a nonprobability sample.

There are still good reasons to use a nonprobability sample. There are situations where probability samples are impossible to obtain or simply too expensive (arguably this applies to my population of interest). Nonprobability samples are also useful in exploratory or preliminary studies (also relevant to me). The trick is not to apply inappropriate statistical tests to data collected in this way.

Why is it, then, that we see inferential statistics routinely applied to nonprobability samples, especially in the drug studies field? Is it something about making our research appear more scientific with the addition of a p < .05? Is it ignorance? Or do I have it wrong myself? Are there times when inferential statistics, e.g. a t-test or a correlation coefficient, can be applied to nonprobability samples? Are there any exceptions to this rule?

--
Monica Barratt
BSc(Psych); PhD in progress...
National Drug Research Institute
Melbourne, Victoria, Australia
http://preview.tinyurl.com/lwyyzq