Gini coefficients for measuring participation?
One possibility that would not involve writing a program would be to calculate the distribution of messages in the same way that economists calculate the distribution of wealth in a country. They create an index of population (0% of the people to 100% of the people) and an index of the wealth (0% of the wealth to 100% of the wealth). In a perfectly egalitarian country 10% of the people have 10% of the wealth, 50% of the people have 50% of the wealth, etc., and you can graph this as a straight line with a slope of 1. In an inegalitarian country the richest 10% of the people might have 40% of the wealth and the richest 50% might have 80% of the wealth, and when you graph all of the points you get a curve of inequality. There is a basic calculus method for calculating the area of the graph between the perfectly egalitarian distribution (the slope-1 line) and the inegalitarian distribution (the weird curve); that area, taken as a share of the whole area under the egalitarian line, is called the Gini coefficient. Scandinavian countries have a small Gini coefficient and Brazil has a big Gini coefficient. So you might take a sample day or week, count the messages, and then label each message by the author's name. Say you have 1000 messages but only 100 authors. Graph this: 10% of the authors write XX% of the messages, 20% of the authors write XXX% of the messages, and so on up to 100% of the authors writing 100% of the messages. The area of the graph between the egalitarian distribution of messages and your sample gives you a measure of content inequality - how much of the content is generated by an elite group. If the metric is low your group is pretty egalitarian, if the metric is high you have a clique generating most of the content. The metric would be most meaningful if you could compare it to another group's metric, so if you did this over several sample periods, you could say whether the distribution of message posts was getting more or less egalitarian / elitist as the list was evolving.
would make purdy graphs, and could be used for almost any sample of content & authors where you can attribute 100% of the content to 100% of the authors. p. Philip N. Howard Assistant Professor Department of Communication University of Washington http://faculty.washington.edu/pnhoward/
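For anyone who would rather script the procedure Philip describes, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the author names are made up, the function names `lorenz_points` and `gini` are hypothetical, and authors are sorted from least to most active (the usual Lorenz-curve convention) rather than from the top down.

```python
from collections import Counter


def lorenz_points(counts):
    """Cumulative (share of authors, share of messages), least active authors first."""
    ordered = sorted(counts)
    total = sum(ordered)
    if not ordered or total == 0:
        raise ValueError("need at least one author with at least one message")
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((i / len(ordered), running / total))
    return points


def gini(counts):
    """Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    pts = lorenz_points(counts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area


# Hypothetical sample: one author name per message in the sample period.
authors = ["ann"] * 6 + ["bob"] * 3 + ["cy"] * 1
messages_per_author = list(Counter(authors).values())
print(gini(messages_per_author))
```

A value near 0 means the messages are spread evenly across authors; a value near 1 means a handful of authors write almost everything. Running the same function on several sample periods gives the comparison over time suggested above.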
Just a brief question here concerning the following item:
If the metric is low your group is pretty egalitarian, if the metric is high you have a clique generating most of the content. The metric would be most meaningful if you could compare it to another group's metric, so if you did this over several sample periods, you could say whether the distribution of message posts was getting more or less egalitarian / elitist as the list was evolving.
Is the definition of egalitarianism and elitism being used here purely a quantitative concept? Or, would one argue that the actual amount of messages generated by a single individual determines the fact that the person has become part of a dominant elite in a newsgroup? I am not sure about the conclusions one could draw from numbers alone, it seems very open ended. One may be extremely lonely and thus seeking attention in posting a large number of messages. One may be a determined spammer, and be reviled by the newsgroup, a case where a large number of messages actually influences the group's perception of one's 'low status' within the group. In very limited experience of my own, those actually seen as the dominant elite members of a list often lurk, rarely responding or posting, and often ignoring the many 'vocal' attempts by some members to drag them into debates. When these elite members do post, the messages often show careless indifference in construction, multiple spelling errors, strange accidental capitalizations midway through a word (almost as if to say, "I'm not used to typing, my secretary normally does that."). What do you think? Cheers, Max. Dr. Maximilian C. Forte Assistant Professor Dept. of Anthropology and Sociology University College of Cape Breton 1250 Grand Lake Road P.O. Box 5300 Sydney, NS B1P-6L2, Canada E-mail: max_forte@uccb.ca Faculty Web page: http://faculty.uccb.ns.ca/mforte/ Office B.273 Telephone: 902-563-1947
The Gini coefficient is a nice way to summarize the overall participation in the online community. However, I would caution you against drawing conclusions about egalitarianism/elitism based on just the number itself. In my work, I have seen extremely high Gini numbers (very few people contributing most of the messages to the list) - and it may not be necessarily "bad." Two things to consider:
- what is your underlying theory about this group and how they operate. one may think that equal participation is a "good" thing. but the group may have been structured to provide "expert" help and thus you would see high GINI numbers. so you need to really think about what the group is for - interview some key and peripheral members and then build a theory about GINI.
- there are also practicalities to consider. a low GINI number in a list serv may mean that there is just a ton of noise. imagine being on a list where every one wanted to chime in and give their $0.02. it would get pretty noisy, pretty fast. and maybe not very effective.
since all of us are on this list - it may be fun to figure out our Gini number!!! list admins? K
-- =============================================== Karim R. Lakhani MIT Sloan School of Management & The Boston Consulting Group, Strategy Practice Initiative e-mail: karim.lakhani@sloan.mit.edu | lakhani.karim@bcg.com voice: 617-851-1224 fax: 617-344-0403 http://spoudaiospaizen.net/ http://opensource.mit.edu | http://freesoftware.mit.edu http://userinnovation.mit.edu
Dear all, "Karim R. Lakhani" wrote: (...)
- there are also practicalities to consider. a low GINI number in a list serv may mean that there is just a ton of noise. imagine being on a list where every one wanted to chime in and give their $0.02. it would get pretty noisy, pretty fast. and maybe not very effective.
As I read this post, I remembered having read that a number of people have been working on power law theories describing generally the distribution of links in a network, i.e. a rather uneven distribution, with many nodes having small numbers of edges, and a very few having very large numbers of edges connected to them. Thinking of Usenet discussions as networks (nodes=actors, edges=messages), one would expect the distribution of messages/authors to follow such a power law. (Interestingly, physicists relate the occurrence of power law-like distributions to the phenomenon of self-organisation, confirming the consideration by Karim.) Now, I was wondering whether anybody has thought about analyzing the distributions of messages in newsgroups etc. in such a way as to find out about the type of distribution? Until now, I only know of some Gini/Lorenz-curve analyses. Greetings, Steffen --------------------------- Steffen Albrecht TU Hamburg-Harburg AB 1-11 Schwarzenbergstr. 95 21071 Hamburg Germany Tel. +49 40 42878-3680 Fax: +49 40 42878-2635 eMail: steffen.albrecht@tuhh.de www: http://www.tu-harburg.de/tbg
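One rough way to take a first look at Steffen's question is sketched below in Python. It is an assumption-laden illustration, not a test of whether a power law actually fits: `powerlaw_alpha` uses the standard continuous maximum-likelihood estimate of the exponent, alpha = 1 + n / sum(ln(x_i / x_min)), which is only an approximation for integer message counts, and `ccdf` simply tabulates the share of authors with at least k messages so the tail can be plotted on log-log axes. Both function names and the sample counts are hypothetical.

```python
import math


def powerlaw_alpha(counts, x_min=1):
    """Continuous MLE of the power-law exponent for counts >= x_min (approximate for integers)."""
    tail = [c for c in counts if c >= x_min]
    log_sum = sum(math.log(c / x_min) for c in tail)
    if log_sum == 0:
        raise ValueError("all counts equal x_min; the exponent cannot be estimated")
    return 1 + len(tail) / log_sum


def ccdf(counts):
    """List of (k, fraction of authors with at least k messages) for each observed k."""
    n = len(counts)
    return [(k, sum(1 for c in counts if c >= k) / n) for k in sorted(set(counts))]


# Hypothetical messages-per-author counts for one sample period.
counts = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]
print(powerlaw_alpha(counts))
for k, share in ccdf(counts):
    print(k, share)
```

If the points from `ccdf` fall roughly on a straight line when both axes are logarithmic, a power law is at least plausible; a curved plot suggests some other heavy-tailed distribution.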
This is a very interesting way to operationalize the measurement of participation. In my work (asynchronous collaboration by document annotation) I have operationalized participation as total word count (4 or more letters). That is easy for me because I capture the word count in a log file that also contains the participant's identification. I can see the need to develop a normalizing coefficient that would consider the relative prolixity of each participant. Some participants post huge adjectival annotations, others many small and terse messages. Message counts are thus pretty meaningless. The measurement of quality is of course in great need of research.
_______________________________________________ Air-l mailing list Air-l@aoir.org http://www.aoir.org/mailman/listinfo/air-l
-- Charlie Hendricksen, PhD Research Collaboration Architect "Information technology structures human relationships." Dissertation link: http://faculty.washington.edu/bkn/public/pubs/diss.html DocReview link: http://purl.oclc.org/DocReview/get
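The word-count measure Charlie mentions (words of 4 or more letters, tallied per participant) could be sketched in Python as follows. This is a guess at the general idea, not his actual log format or code: the `(participant, text)` pairs, the `participation` function, and the ASCII-letters-only word pattern are all assumptions made for illustration. Words per message is included as one simple stand-in for the "prolixity" normalization he raises.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of 4 or more ASCII letters.
WORD = re.compile(r"[A-Za-z]{4,}")


def participation(log):
    """Map each participant to (messages, words of 4+ letters, words per message)."""
    messages = defaultdict(int)
    words = defaultdict(int)
    for who, text in log:
        messages[who] += 1
        words[who] += len(WORD.findall(text))
    return {who: (messages[who], words[who], words[who] / messages[who])
            for who in messages}


# Hypothetical log entries: (participant id, annotation text).
log = [
    ("p1", "A long and rather adjectival annotation about the document"),
    ("p2", "ok"),
    ("p2", "agreed, see above"),
]
for who, stats in participation(log).items():
    print(who, stats)
```

The per-participant word totals can be fed into a Gini calculation in place of message counts, which addresses the concern that message counts alone say little when some people write long annotations and others many terse ones.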
participants (5)
- Charles Hendricksen
- Karim R. Lakhani
- Maximilian C. Forte
- Philip Howard
- Steffen Albrecht