A question for researchers interested in the basics of statistical inference
Hi everyone

I'm currently writing up my thesis, which has the working title 'Researching the forums: Illicit drug use in a networked world'. I conducted an online survey using a purposive (nonprobability) sample of illicit drug users who used internet message boards (forums) to discuss or read about drugs. Originally I intended to conduct inferential statistics on this sample of 915, as this is the general practice in many other papers I had read. After some more thought though, I'm leaning away from that.

Following is my thinking about this issue. I would really appreciate some feedback on this from anyone with an interest in this area (non-experts welcome too!)

*My understanding of the sampling and statistical inference in my thesis work*

There are two types of samples: probability and nonprobability. Probability samples occur when each individual from the population of interest has an equal (non-zero) chance of being included in the sample (random selection). In contrast, nonprobability samples contain self-selected individuals from a population of interest - not everyone has a chance of participating, so we can't calculate the relationship between the sample and the population of interest.

Probability samples of illicit drug users are rare. This is because, to conduct a probability sample, the researcher needs to have a defined population, such as a list of students or phone numbers of households. Illicit drug use is a rare behaviour on a population level (excluding, perhaps, ever use of cannabis) and it is unlikely that a list of drug users will exist, given the illegality of the behaviour and reluctance to self-identify on such a list.

Inferential statistics are not compatible with nonprobability samples. A core assumption of the use of inferential statistics is that individuals are randomly selected from the population of interest. Without this randomness, the logic of inferential statistics does not hold.

Inferential statistics can be further categorised into parametric and non-parametric statistical methods. These types of inferential statistics are chosen depending upon the distribution of the variables to be analysed; e.g. parametric statistics for continuous normal variables and nonparametric statistics for nonnormal or categorical/ordinal variables.

Nonparametric or distribution-free statistics are still inferential, so they too are incompatible with nonprobability samples.

Descriptive statistics can still be applied to nonprobability samples to determine the relationships between variables in the dataset. What should not be done is 'significance testing', as the aim of this testing is to determine whether a relationship is strong enough or a difference is large enough, given the sample size, to be representative of a difference in the population. This assumes that the sample has a known relationship to the population, so it is meaningless when applied to a nonprobability sample.

There are still good reasons to conduct a nonprobability sample. There are simply situations when probability samples are impossible to obtain or just too expensive (arguably this applies to my population of interest). They are also useful in exploratory or preliminary studies (also relevant to me). The trick is not to apply inappropriate statistical tests to data collected in this way.

Why is it then that we see probability statistics routinely conducted upon nonprobability samples, especially in the drug studies field? Is it something about making our research appear more scientific with the addition of a p < .05? Is it ignorance? Or do I have it wrong myself? Are there times when inferential statistics, e.g. a t-test or a correlation coefficient, can be applied to nonprobability samples? Are there any exceptions to this rule?

-- Monica Barratt BSc(Psych); PhD in progress... National Drug Research Institute Melbourne, Victoria, Australia http://preview.tinyurl.com/lwyyzq
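A minimal simulation sketch of the concern raised in the post above, under made-up assumptions: it builds a hypothetical population, then compares how often a conventional 95% confidence interval for the mean contains the true population mean when the sample is drawn at random versus when participation depends on the score itself. The population, the sample size and the selection mechanism (`selection_weights`) are all illustrative assumptions, not anything taken from the survey discussed here.

```python
import math
import random
import statistics

def mean_ci(sample, z=1.96):
    """Normal-approximation 95% confidence interval for a mean."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

def coverage(population, draw, true_mean, reps=200):
    """Share of repeated samples whose CI contains the true population mean."""
    hits = 0
    for _ in range(reps):
        low, high = mean_ci(draw(population))
        hits += low <= true_mean <= high
    return hits / reps

rng = random.Random(42)
# Hypothetical population of scores; the numbers are purely illustrative.
population = [rng.gauss(10, 3) for _ in range(20000)]
true_mean = statistics.fmean(population)

# Assumed selection mechanism: higher scorers are more likely to take part.
selection_weights = [math.exp(0.2 * x) for x in population]

def simple_random_sample(pop, n=300):
    # Every member has the same chance of inclusion (without replacement).
    return rng.sample(pop, n)

def self_selected_sample(pop, n=300):
    # Inclusion chance depends on the score (drawn with replacement).
    return rng.choices(pop, weights=selection_weights, k=n)

srs_coverage = coverage(population, simple_random_sample, true_mean)
self_coverage = coverage(population, self_selected_sample, true_mean)
print(f"CI coverage, random sample:        {srs_coverage:.2f}")
print(f"CI coverage, self-selected sample: {self_coverage:.2f}")
```

In this toy setup the interval formula is identical in both cases; only the selection mechanism differs. In a real purposive sample that mechanism is unknown, which is the difficulty the post describes.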
Monica,

There are three issues here:

1. the representativeness of your sample with respect to your target population (illicit drug users who used internet message boards)
2. the analysis that you can do with your sample, and
3. what, given 1, you can say from the results of 2 about the target population

1. is tricky - you don't have any data (or do you? Does anyone else?) on the constitution of the target population. So you can't say how biased your sample is. It may not be biased - you may have a representative sample along key dimensions for your analysis. But how do you tell? This is a problem for all surveys of 'sensitive' issues. The usual resolution is to weight your sample so that on key dimensions it is representative of the target population. But this can only be done if you know the constitution of the target population with respect to these dimensions....

2. the analysis. You can certainly do things like regression analysis with your sample. The 'statistical significance' simply tells you how confident you can be that any statistical effect is real (i.e. non random) for the sample.

3. Your problem is that, given 1, you may not be able to make claims from the results of 2 about the population - merely what you found in your sample. If the sample is biased but you can weight it to account for this bias (see 1), then your analysis results for the sample can be claimed to be true of the target population.

Others may have different views :-)

Ben

On 3 Sep 2009, at 03:12, Monica Barratt wrote:
> Hi everyone
> I'm currently writing up my thesis which has the working title 'Researching the forums: Illicit drug use in a networked world'. I conducted an online survey using a purposive (nonprobability) sample of illicit drug users who used internet message boards (forums) to discuss or read about drugs. Originally I intended to conduct inferential statistics on this sample of 915, as this is the general practice in many other papers I had read. After some more thought though, I'm leaning away from that.
---- Dr Ben Anderson Sociology @ Essex http://www.essex.ac.uk/sociology/staff/profile.aspx?ID=118 Centre for Research on Economic Sociology and Innovation http://cresi.wordpress.com
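The weighting Ben mentions under point 1 can be sketched roughly as follows. This is a simple post-stratification example, assuming a single grouping variable whose population shares are known; the group labels, shares and responses are all invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Per-group weight = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

def weighted_mean(values, groups, weights):
    """Mean of values where each respondent carries their group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight

# Hypothetical sample: group "a" makes up 80% of respondents, but is assumed
# to make up only 60% of the target population. All numbers are made up.
groups = ["a"] * 80 + ["b"] * 20
# 1 = reports some behaviour, 0 = does not (50% in group a, 25% in group b).
values = [1] * 40 + [0] * 40 + [1] * 5 + [0] * 15
population_shares = {"a": 0.6, "b": 0.4}

weights = poststratification_weights(groups, population_shares)
unweighted = sum(values) / len(values)
weighted = weighted_mean(values, groups, weights)
print(f"unweighted estimate: {unweighted:.2f}")
print(f"weighted estimate:   {weighted:.2f}")
```

The sketch only works because `population_shares` is known. As Ben notes, that is exactly the information that tends to be missing for hidden populations.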
and something I forgot to say...
> 2. the analysis. You can certainly do things like regression analysis with your sample. The 'statistical significance' simply tells you how confident you can be that any statistical effect is real (i.e. non random) for the sample.
in most situations it's recommended to look at confidence intervals not just p values because they tell you the estimated maximum and minimum size of the effect/difference given your chosen probability level. Just because an effect is statistically significant doesn't mean it's either large or important :-) http://www.bmj.com/cgi/content/abstract/292/6522/746 is a good place to start. Ben
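A small sketch of the point about confidence intervals versus p values, using simulated data: with a very large sample, a tiny difference can be 'statistically significant' while the interval shows it is small. The group sizes, means and standard deviation are arbitrary assumptions, and the interval uses a normal approximation rather than any particular package's t-test.

```python
import math
import random
import statistics
from statistics import NormalDist

def diff_in_means(a, b, level=0.95):
    """Difference in means with a normal-approximation CI and two-sided p value."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value

rng = random.Random(1)
# Hypothetical scores for two large groups whose true means differ by only 0.3
# on a scale with a standard deviation of 5.
group_a = [rng.gauss(50.3, 5) for _ in range(20000)]
group_b = [rng.gauss(50.0, 5) for _ in range(20000)]

diff, (low, high), p = diff_in_means(group_a, group_b)
print(f"difference = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f}), p = {p:.2g}")
```

Reading the interval alongside the p value makes it clear whether the plausible range of the difference is large enough to matter in practice.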
Hi Monica,

Congrats on your thesis. I will take a stab at your questions.

I think what might be problematic is the conception of inference. In inferential statistics, the basic definition of inference is drawing conclusions about a larger population from a sampled data set. In these cases, the gold-standard sampling method is simple random sampling with replacement (SRSWR), though population-inferential statistics are commonly computed on SRS samples, cluster samples, multi-stage samples, and so on.

To produce unbiased estimates, inferential statistical methods have a set of assumptions. OLS, for example, has a number of assumptions - the IVs' variation is not random, no multicollinearity, homoskedasticity, mean of residuals is zero. Now, many of these assumptions are met when the sample that produced the data was a probability sample. If the estimates are unbiased, you can calculate variance, standard errors and confidence intervals for the population.

Importantly, a sample does not require a random draw for valid inferential estimation. If a purposive sample can meet the assumptions of an inferential model, you can certainly produce unbiased estimates. However, gauging the degree of unbiasedness in a purposive sample is difficult, so it is unwise to assume true unbiasedness.

Let me focus on your question regarding the propriety of using inferential techniques on purposive samples. A different, and complementary, use of inferential statistics is to draw inferences about relations in data: for example, to test differences between groups or the relations between many variables in an analysis. In this case, the inference is not population-level; rather, it describes the relations in the population at hand. In these cases, we cannot argue that our estimates are representative and unbiased, but many models are robust enough, and have enough diagnostic features, that we can generally gauge the validity of the measures. In these cases, if we realize and report the limitations of the model, it is appropriate to use them.

Now, the second part of your question dealt with parametric and non-parametric methods. In statistics, "parametric" is used to describe how the population fits the parameters of a distribution. In most cases, we are concerned with the normal distribution. When the distribution of a population is non-parametric, it does not fit a particular distribution. Often this happens in cases where our sample is quite small. In this case a nonparametric method would apply. However, as populations grow larger, they tend to fit into distributions, and distribution-appropriate methods would apply.

The application of inferential methods to non-probability samples is appropriate if the inferences are to be drawn within the sample, and the characteristics of the distributions reasonably meet the criteria of the method. You generally should be careful when making claims outside of the sample (its representativeness) or about the degree of unbiasedness, but you can use these techniques to make inferential estimates regarding the data at hand.

Finally, with regard to confidence levels, in a between-means comparison such as a t-test, we are comparing the hypothetical distributions of the groups, and the significance test provides our intervals for comparison.

Thanks,
Fred

On Sep 2, 2009, at 10:12 PM, Monica Barratt wrote:
> Hi everyone
> I'm currently writing up my thesis which has the working title 'Researching the forums: Illicit drug use in a networked world'. I conducted an online survey using a purposive (nonprobability) sample of illicit drug users who used internet message boards (forums) to discuss or read about drugs. Originally I intended to conduct inferential statistics on this sample of 915, as this is the general practice in many other papers I had read. After some more thought though, I'm leaning away from that.
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
-- Fred Stutzman Ph.D. Student and Teaching Fellow School of Information and Library Science, UNC-Chapel Hill fred@fredstutzman.com | (919) 260-8508 | http://fredstutzman.com/
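One technique sometimes used for the kind of within-sample inference Fred describes is a permutation (randomization) test. This is offered only as an illustration and is not something Fred names in his message. It asks how often a difference at least as large as the observed one would arise if the group labels were shuffled among the respondents at hand, and it makes no claim about any wider population. The scores below are invented, and the helper name `permutation_test` is hypothetical.

```python
import random
import statistics

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random within the observed data.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical scores for two groups within a single purposive sample.
group_a = [19, 20, 21, 18, 22, 20, 19, 21, 20, 20]
group_b = [24, 26, 25, 23, 27, 25, 24, 26, 25, 25]

observed, p = permutation_test(group_a, group_b)
print(f"observed difference = {observed:.1f}, permutation p = {p:.4f}")
```

The resulting p value describes only how unusual the observed split is among relabellings of this particular data set, which matches the "data at hand" framing. It does not address how the respondents came to be in the sample.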
Thanks again for all the comments. I have snatched a handful of time so will have a go at continuing the discussion...

These are my current thoughts and reactions that I'm sharing with you. I would be most grateful for further feedback, and especially any references/texts to back up your ideas, so I can do some further reading to assist my understanding of these critical issues.

In response to Ben Anderson (3/9):

You mention the 3 issues:
(1) how representative is my sample?
(2) what analysis can I do with my sample?
(3) can I say anything about the larger population from my analyses?

As to the representativeness, it's not possible to know how representative my sample is of the target group: Australians who have recently used 'party drugs' (ecstasy, methamphetamine, etc.) AND who use online forums / internet message boards. This subgroup of the wider population who have recently used party drugs is likely to differ on key characteristics, and I can discuss those, but I can't know the biases of this sample in comparison to the whole target group because they have never been systematically studied.

As for the analysis, my concern is that, although many papers are published in this field where analyses of association (e.g. regression) or difference (e.g. ANOVA) are conducted on samples like mine (not randomly selected), psych and stats textbooks stress the critical assumption of having a sampling frame if one wants to apply the logic of probability. As I understand it, a simple test like a t-test is asking, e.g., is the difference between scores for (say) males and females large enough that I can reject the null hypothesis that males and females in the wider population do not differ on this specific score, with 95% (or 99%) certainty (assuming the sample is randomly selected from the wider population). If I am conducting an exploratory study where little is known about the wider population and I can't randomly sample from it, what use is this t-test? Surely I should just give the two different scores for males and females. As an exploratory study, this information is still useful, as it provides something to start with for further research.

A reading I found incredibly useful was:

Berk, R. A., & Freedman, D. (2003). Statistical assumptions as empirical commitments. In T. G. Blomberg & S. Cohen (Eds.), Law, punishment, and social control: Essays in honor of Sheldon Messinger (2nd ed., pp. 235-254). New York: Aldine. (which is available through Google Books to read)

One relevant quote is:

"the moment that conventional statistical inferences are made from convenience samples, substantive assumptions are made about how the social world operates. Conventional statistical inferences (e.g., formulas for the standard error of a mean, t-tests) depend on the assumption of random sampling. This is not a matter of debate or opinion; it is a matter of mathematical necessity. When applied to convenience samples, the random sampling assumption is not a mere technicality or a minor revision on the periphery; the assumption becomes an integral part of the theory"

... rest of article well worth the read!

In response to Ben Anderson (7/9):

Re confidence intervals. I've definitely read many times cautions about applying confidence intervals to purposive/non-random samples. These margins of error are based on specific statistical assumptions that don't appear to hold true for non-systematic samples. So, yes, I agree we need to move away from just looking at the p value to reporting confidence intervals and effect sizes - but reporting these for convenience samples does not appear to be sound.

An article I read that seeks to "compare the characteristics of a self-selected, convenience sample of men who have sex with men (MSM) recruited through the internet with MSM drawn from a national probability survey in Great Britain" provides an example of this thinking. They calculate confidence intervals for the probability sample but do not present CIs for the convenience sample, stating "CIs for the internet percentages were narrow, and are not presented here because they add little to the interpretation of the data from this non-probability sample." I intend to do a similar analysis to compare my sample with a probability sample of Australian 'party drug' users.

Evans, A. R., Wiggins, R. D., Mercer, C. H., Bolding, G. J., & Elford, J. (2007). Men who have sex with men in Great Britain: Comparison of a self-selected internet sample with a national probability sample. Sexually Transmitted Infections, 83(3), 200-205.

In response to jeremy hunsinger:

"Then you have to say... do i want to make inferences about my sample as population, my sample as representative of a larger population, or my sample as representative of the world. each of those three questions will take you toward slightly different answers."

This is helpful as a way of thinking about the issue for me. I have no desire to make inferences about the world or the larger population from my sample, which was always about recruiting a specific sub-population of drug users that had not been studied before.

"However, if your sample was large enough. you could treat it as a population and subsample it to make inferences amongst its differences."

The sample is N=837 so it is "large enough", but I'm still stumped about what meaning significance tests would have in this case. Rest assured, I have conducted many statistical tests on the data to explore it, and most of them are 'significant' associations or differences, but I'm stuck on how to interpret them.

...continued part 2 (email was too long!)
continued from previous message...

In response to Fred Stutzman:

I won't re-quote half of your response, although it was all very useful to me. I would be most grateful for any text/reference you can provide that discusses the use of inferential statistics on a purposive sample - "to draw inference about relations in data". Again, maybe I am missing a piece of the puzzle, but why use inferential statistics to draw inferences about the relations between variables when we can just measure them, if we are assuming the sample is the population?

E.g. I ask of my data, are monthly+ ecstasy users younger than occasional ecstasy users? And I find regular users are, for example, mean age 20 compared to mean age 25 for occasional users. Because I don't know the bias in my sample, I don't know if this will be wildly different with another sample of this population, so I could just report the two means and note the difference. Although it makes my research look more sophisticated with the p value (and even confidence intervals around those means), I'm not convinced that it makes sense to include them!

In response to Peter Timusk:

Sorry if that came across as absolute in my original post - of course there are examples of probability samples of drug users. These are generally the exception rather than the norm, though. And I would not seek to generalise to a wider population of drug users from my sample - I'm really just interested in understanding them as a population, knowing they would differ in substantive ways from more general populations of people who use drugs.

As for the size of the sample, again, I'm concerned that the size of the sample is less of an issue than the way it is sampled. Having a larger sample with unknown bias is still a sample with unknown bias!

Thanks again (and sorry for the rather long post)

Monica
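The purely descriptive reporting Monica describes (give the two means and note the difference) might look something like the sketch below. The ages are invented so that the group means match the 20 versus 25 example in her message, and the standardised difference (Cohen's d) is added here only as a descriptive way to express the size of the gap, not as a significance test. The function name `describe_difference` is hypothetical.

```python
import math
import statistics

def describe_difference(a, b):
    """Descriptive summary of two groups: means, raw difference, Cohen's d."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Pooled variance, used only to express the gap in standard-deviation units.
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
    return {
        "mean_a": mean_a,
        "mean_b": mean_b,
        "difference": mean_a - mean_b,
        "cohens_d": (mean_a - mean_b) / math.sqrt(pooled_var),
    }

# Hypothetical ages; not real survey data.
regular_users = [18, 19, 20, 20, 21, 22]
occasional_users = [22, 24, 25, 25, 26, 28]

summary = describe_difference(regular_users, occasional_users)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Nothing in this summary depends on how the respondents were selected, so it says what was found in the sample without implying anything about a wider population.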