Question re Size of Data Set...
Hello All:

This is my debut post to this excellent list after a long time spent reading lots of good stuff from others.

I have a question re the size of my data set for my dissertation project: Is my data set too large, too small, or just right? I'd very much appreciate any insight/ideas/feedback anyone has about this. My advisor and I aren't sure about this issue, and I haven't been able to discern much about it from the studies I've read.

I realize my question is meaningless without knowing something about my project, so here's some background: I'm taking a close look at one message board community, one devoted to discussion of a particular college basketball team. I've got all sorts of things I'm interested in, but my central research question concerns the ways that people teach each other, and learn from one another, the conventions for discourse on the message board. (I'm also interested in potential emerging genres of writing, the influence of sports fandom on online literacy practices, and perhaps issues related to gender, which I realize is a pretty general thing to say, but I'll keep it at that for now.)

I've got two main sources of data: (1) archived threads/posts from the message board, and (2) online questionnaires that participants/members filled out. My question concerns source (1), the archival data.

I have tons of data archived. I used one of those "site-sucker" programs to grab all the discussions on the message board over about an 8-month period. Given that this message board is a pretty busy one and that I'm using a ground-theory approach to the data analysis, I chose to sample a smaller set of the overall data. I used an "event sampling" method and, with input from posters on the message board, chose 5 "big" events around which to sample discussion. I then also chose 5 other events from the archived months that none of the posters who offered their sense of the "big" events had listed. I didn't, though, choose just the threads related to those "big" and non-big events; rather, I used the events as anchoring moments in time and sampled ALL discussion that occurred on those dates, plus one day before and one day after. This resulted in such a large data set that I ended up using only 3 "big" events and 3 non-big ones, sampling those dates and the days immediately around them.

What I'm left with now is about 4000 individual .html pages. Some have fairly detailed threads of discussion with sizable individual posts; many others, of course, have cursory, short sentences that look more like "chat." This is a lot of stuff to wade through, yet it represents only 18 days of life on this message board (6 events, each with a 3-day window). Thus far I've been going through the data in separate "passes," looking for answers to particular aspects of my research question, and it's a daunting thing. I know research takes a lot of work and time, but I thought it wise to get feedback to see if I'm going overboard here.

So does my sample sound reasonable? I'm well aware that the way I sample will directly affect the kinds of conclusions I can draw and the level of rigor folks see in my work. Any thoughts? Good sources re this kind of methodology? I've got Virtual Methods, edited by Hine, among other sources, and haven't seen anything yet re sample size. Maybe I missed it somehow?

many thanks,
Matthew Pearson
mdpearson@wisc.edu
PhD Candidate, University of Wisconsin Department of English--Composition and Rhetoric; Research Assistant, UC-Irvine Writing Project; & Man on the Street
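The event-sampling design described above can be sketched in a few lines of Python, mainly to make the arithmetic explicit: each anchor event contributes its own date plus one day before and one day after, so 6 events yield 18 sampled days only when no two windows overlap. The anchor dates below are hypothetical placeholders, not the actual events from the study, and the function names are illustrative only.

```python
from datetime import date, timedelta


def sampling_window(anchor, days_before=1, days_after=1):
    """Return the set of calendar days sampled around one anchor event."""
    return {anchor + timedelta(days=offset)
            for offset in range(-days_before, days_after + 1)}


def sampled_days(anchors, days_before=1, days_after=1):
    """Union of all windows, sorted; days shared by overlapping windows count once."""
    days = set()
    for anchor in anchors:
        days |= sampling_window(anchor, days_before, days_after)
    return sorted(days)


# Hypothetical anchor dates for illustration only (not the study's real events).
big_events = [date(2006, 11, 18), date(2006, 12, 9), date(2007, 1, 6)]
non_big_events = [date(2006, 11, 28), date(2006, 12, 20), date(2007, 1, 16)]

days = sampled_days(big_events + non_big_events)
print(len(days))  # 18, because none of these six 3-day windows overlap
```

Because the windows are merged as a set, two events only two days apart would share a day, and the total number of sampled days would then be smaller than events × 3.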
Hey Matthew, I'm a PhD student too and I'm going through exactly the same problem. For me the question isn't so much about the amount of data as about how you are going to use it. How deeply are you going to analyze the text, and how much discussion do you want to generate from it?
From those two questions, I'd want to ask whether you're doing any discourse analysis in your ethnographic research (I'm only assuming you're doing an ethnography of the message board because you mentioned Hine). If you are, then the analysis could get quite deep, and all those "chat"-type posts will be useful. You might want to check out Ron Scollon and Suzie Wong Scollon's book, Nexus Analysis, to help you out there.
From the looks of it, in my small postgrad opinion, you could do a lot with the sample you have. I would just go with what you have.
paul teusner
fishers, surfers and casters
http://teusner.org/

-----Original Message-----
From: air-l-bounces@listserv.aoir.org [mailto:air-l-bounces@listserv.aoir.org] On Behalf Of Matthew Pearson
Sent: Thursday, 25 January 2007 10:32
To: air-l@listserv.aoir.org
Subject: [Air-l] Question re Size of Data Set...

_______________________________________________
The air-l@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
Hi Matthew

Doesn't this seem like a 'how long is a piece of string' question! However, to a great extent, the answer lies in your methodology. You say you're using a "ground-theory approach"; I presume you mean grounded theory (darn that keyboard)... In that case the key issue is that you are looking for saturation. It is impossible to predict in advance when this will occur, so the correct answer to your question is 'don't know': you need to begin your analysis. I think anyone who's done GTM gets a gut feel for whether what they have is enough once they start looking at and becoming immersed in the data.

You've identified a subset of the message board postings (that's a good start, as it's at least more manageable), and categories, processes and attributes will emerge from your analysis. If you find these becoming saturated with the data you have, you're (probably) doing OK. If you get to the end and you're still discovering new stuff, then I'd suggest you go back and extend the data set.

As for good sources, your issue here is methodology, less so the internet (but it is interesting to see how GTM is used in similar studies) - I would suggest you get clear about GTM, which version you are using and why and this will help you no end. One of the best discipline areas for writings on grounded theory is actually nursing research (I kid you not!).

Good luck
Andy
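Andy's point about saturation is ultimately a judgment call made while immersed in the data, but some analysts also keep a simple running tally of how many new codes each batch of data produces. The Python sketch below is a hypothetical bookkeeping aid along those lines, not a formal test of saturation in grounded theory; the code labels, the idea of a "batch" (for example, one sampled day of pages), and the two-quiet-batches threshold are all assumptions for illustration.

```python
def new_codes_per_batch(batches):
    """For each batch, count the code labels not seen in any earlier batch.

    `batches` is a list; each item is an iterable of code labels the analyst
    assigned while reading that portion of the data (hypothetical structure).
    """
    seen = set()
    counts = []
    for batch in batches:
        fresh = set(batch) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def looks_saturated(counts, quiet_batches=2):
    """Heuristic only: True if the last `quiet_batches` batches added no new codes."""
    if quiet_batches < 1:
        raise ValueError("quiet_batches must be at least 1")
    if len(counts) < quiet_batches:
        return False
    return all(c == 0 for c in counts[-quiet_batches:])


# Hypothetical code labels for illustration only.
batches = [
    ["newbie correction", "in-joke", "thread etiquette"],
    ["in-joke", "quoting convention", "newbie correction"],
    ["thread etiquette", "in-joke"],
    ["quoting convention", "newbie correction"],
]
counts = new_codes_per_batch(batches)
print(counts)                   # [3, 1, 0, 0]
print(looks_saturated(counts))  # True
```

A run of zeros only shows that no new labels appeared; whether the existing categories are still developing their properties still has to be judged by reading the data.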
Not clear about why this would be surprising. Did you think that nurses don't do research, or that they would use another method? M-H On 26/01/2007, at 9:13 AM, Andy Williamson wrote:
As for good sources, your issue here is methodology, less so the internet (but it is interesting to see how GTM is used in similar studies) - I would suggest you get clear about GTM, which version you are using and why and this will help you no end. One of the best discipline areas for writings on grounded theory is actually nursing research (I kid you not!).
Hi Mary

It wasn't surprising to me, as my wife is a nurse-researcher (and so are many of our friends!). I was being slightly tongue-in-cheek... But what does surprise me (pleasantly, I should add) is that one particular discipline can gain such strong expertise in a particular area. Nursing is perhaps one of the strongest qualitative research disciplines, yet it is one that researchers in my discipline (IT) would willingly overlook as irrelevant. I've had the challenging experience of trying to convince new researchers that the best book for them to read is called 'Nursing research', as it is relevant, very well written and accessible! It's just a timely reminder to keep looking beyond our own disciplines :-)

Andy

-----Original Message-----
From: Mary-Helen Ward [mailto:mhward@usyd.edu.au]
Sent: Friday, January 26 2007 11:29
To: air-l@listserv.aoir.org; andy@wairua.co.nz
Cc: mdpearson@wisc.edu
Subject: Re: [Air-l] Question re Size of Data Set...
Hi Matthew

You don't mention what method you have used to analyse your data. Do you have some kind of database set up to enable you to make links, organise, and otherwise see the data in new ways, rather than just reading it? A good software program can really help you see patterns in big wodges of text. NVivo is a well-known one, but I'm sure people here can point you to others.

M-H
Thanks to all who responded to my query on- and off-list. I appreciate your time and insights. Not that I'm not up for hearing more about this, but I think it's safe to say that I've got a lot of good suggestions, directions, and next-steps to get to work on... thanks again, Matthew Pearson mdpearson@wisc.edu PhD Candidate, University of Wisconsin Department of English--Composition and Rhetoric; UC-Irvine Writing Project Research Assistant; & Man on the Street
participants (4)
- Andy Williamson
- Mary-Helen Ward
- Matthew Pearson
- Paul Teusner