AoIR communal data-database

Nancy Baym

17 Nov 2001

7:53 p.m.

Hello,

I am developing an 'available research data' section of the member website. The basic goals of this are to provide members with easy access to existing data sources that they can incorporate into their own analyses, to provide members with easy access to tools that will help them conduct research, and to foster collaborative analyses by providing a mechanism for people with shared interests to network around specific materials. I need your help to identify and generate resources to include, and welcome any help in building or maintaining this communal resource. I envision the following sorts of materials [these will be either links or, with permission, archived copies on aoir's server]:

(1) data sets that are already available online (e.g. HomeNet data, Pew Internet data...). If you have any suggestions of such data sets that we should include, please email me a brief description and a url for the materials.

(2) data collecting tools that are available online such as the recently mentioned NetMiner. If you know of any such tools for analysing the internet, please email me a brief description and a url.

(3) online archives such as webarchive.org, and the recent election and 9-11 archives built by aoir members. If you know of good archives, email me the urls and a blurb.

(4) data sets contributed by our members either because you are done with them and want to share, or because you are looking for collaborators to help you analyze the materials (it seems a characteristic of a lot of net research that we end up with a lot more data than we can analyze). If you might be interested in sharing your data, please email me a description of what you have (don't send me data sets yet, I'm not that far along). Depending on how free you want your materials to be, you could choose between giving aoir a copy of the data to archive (with info about it) or you could give us a description of the data, the sort of work you're looking for others to do with it, and contact information for those who want to follow up with you (i.e. you hang on to the data files). Our intention is that access to such private resources contributed by aoir members would be limited to aoir members.

(5) what would be useful to include that I am leaving out?

If you have any other suggestions regarding the shape of this venture or the specifics of what it will include or how it will be done, please email them to me or, better yet, raise them for discussion on air-l. This is in planning stages, and is open to good ideas about how to make it better. This will only be worthwhile if it's something we use, and we'll only use it if it's got things worth using there, so please let me know what you're aware of or what you would be willing to contribute. There will be plenty of work to go around in building and maintaining this resource, so I welcome anyone's participation.

Thank you in advance for your input,

Nancy

_________________________________________________________
Nancy Baym nbaym@ku.edu http://www.ku.edu/home/nbaym
Communication Studies, University of Kansas
102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA
VP, Association of Internet Researchers: http://aoir.org

Charlie Hendricksen

17 Nov

8:14 p.m.

New subject: [Air-l] AoIR communal data-database

Friends,

In point 4 of Nancy's proposal for a data repository there is this statement: "Our intention is that access to such private resources contributed by aoir members would be limited to aoir members." I see no reasonable justification for restricting access and would not participate in the venture if such restrictions are adopted.

The question of metadata raises a difficult barrier to building the proposed repository. Data is pretty much useless without metadata. The amount of work required to obtain useful metadata is likely to exceed what a volunteer effort can support.

Nancy Baym wrote:

...

Hello,

I am developing an 'available research data' section of the member website. The basic goals of this are to provide members with easy access to existing data sources that they can incorporate into their own analyses, to provide members with easy access to tools that will help them conduct research, and to foster collaborative analyses by providing a mechanism for people with shared interests to network around specific materials. I need your help to identify and generate resources to include, and welcome any help in building or maintaining this communal resource. I envision the following sorts of materials [these will be either links or, with permission, archived copies on aoir's server]:

(1) data sets that are already available online (e.g. HomeNet data, Pew Internet data...). If you have any suggestions of such data sets that we should include, please email me a brief description and a url for the materials.

(2) data collecting tools that are available online such as the recently mentioned NetMiner. If you know of any such tools for analysing the internet, please email me a brief description and a url.

(3) online archives such as webarchive.org, and the recent election and 9-11 archives built by aoir members. If you know of good archives, email me the urls and a blurb.

(4) data sets contributed by our members either because you are done with them and want to share, or because you are looking for collaborators to help you analyze the materials (it seems a characteristic of a lot of net research that we end up with a lot more data than we can analyze). If you might be interested in sharing your data, please email me a description of what you have (don't send me data sets yet, I'm not that far along). Depending on how free you want your materials to be, you could choose between giving aoir a copy of the data to archive (with info about it) or you could give us a description of the data, the sort of work you're looking for others to do with it, and contact information for those who want to follow up with you (i.e. you hang on to the data files). Our intention is that access to such private resources contributed by aoir members would be limited to aoir members.

(5) what would be useful to include that I am leaving out?

If you have any other suggestions regarding the shape of this venture or the specifics of what it will include or how it will be done, please email them to me or, better yet, raise them for discussion on air-l. This is in planning stages, and is open to good ideas about how to make it better. This will only be worthwhile if it's something we use, and we'll only use it if it's got things worth using there, so please let me know what you're aware of or what you would be willing to contribute. There will be plenty of work to go around in building and maintaining this resource, so I welcome anyone's participation.

Thank you in advance for your input,

Nancy

_________________________________________________________ Nancy Baym nbaym@ku.edu http://www.ku.edu/home/nbaym Communication Studies, University of Kansas 102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA VP, Association of Internet Researchers: http://aoir.org

_______________________________________________ Air-l mailing list Air-l@aoir.org http://www.aoir.org/mailman/listinfo/air-l

-- Charlie Hendricksen veritas@u.washington.edu "Information technology structures human relationships." "Models relate concepts."

jeremy hunsinger

8:49 p.m.

New subject: [Air-l] AoIR communal data-database

I'm not sure what level of metadata you are talking about here... collecting a description of the study, the authors, the coding, etc. should all be public in the codebook for the study... if it is not then the study probably wouldn't be useful to others in any case. perhaps I am on the wrong track here?

...

The question of metadata raises a difficult barrier to building the proposed repository. Data is pretty much useless without metadata. The amount of work required to obtain useful metadata is likely to exceed what a volunteer effort can support.

...

jeremy hunsinger on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy

Charlie Hendricksen

18 Nov

8:05 p.m.

New subject: [Air-l] AoIR communal data-database

Yes, the "codebook" for the study should have all the metadata necessary. But are the codebooks searchable? If the repository is of any size at all, then it needs to be searchable. Would you like to read all the codebooks in order to see if there was any data you could use? If the codebooks are disassembled and placed in a database that allows searching then the repository is very useful. My guess is that codebooks are idiosyncratic and of wildly varying quality. This means that the metadata would be incomplete in many cases.

This raises the issue of what the metadata should include.

jeremy hunsinger wrote:

...

I'm not sure what level of metadata you are talking about here... collecting a description of the study, the authors, the coding, etc. should all be public in the codebook for the study... if it is not then the study probably wouldn't be useful to others in any case. perhaps I am on the wrong track here?

...
The question of metadata raises a difficult barrier to building the proposed repository. Data is pretty much useless without metadata. The amount of work required to obtain useful metadata is likely to exceed what a volunteer effort can support.

...
jeremy hunsinger on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy

-- Charlie Hendricksen veritas@u.washington.edu "Information technology structures human relationships." "Models relate concepts."
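
The point about disassembling codebooks into a searchable database can be illustrated with a rough sketch. This is only one possible reading of the idea, not anything proposed or built in the thread: the `Codebook` record, its fields, and the sample entries are all hypothetical, and a real repository would likely need richer fields and ranking.

```python
from collections import defaultdict
from dataclasses import dataclass
import re


@dataclass
class Codebook:
    """Hypothetical study-level codebook entry (field names are assumptions)."""
    study_id: str
    title: str
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(codebooks: list[Codebook]) -> dict[str, set[str]]:
    """Map each word to the ids of the studies whose codebook mentions it."""
    index: dict[str, set[str]] = defaultdict(set)
    for cb in codebooks:
        for word in tokenize(cb.title + " " + cb.description):
            index[word].add(cb.study_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of studies whose codebook contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# Made-up sample entries, for illustration only.
codebooks = [
    Codebook("s1", "Chat room survey", "Coded transcripts of chat room talk"),
    Codebook("s2", "Newsgroup study", "Survey of newsgroup posters"),
]
index = build_index(codebooks)
print(search(index, "survey"))
print(search(index, "chat survey"))
```

With an index like this, a member could check whether any catalogued study mentions a topic without reading every codebook. How useful the results are would still depend on the quality of the submitted descriptions, which is the concern raised above.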

jeremy hunsinger

8:18 p.m.

New subject: [Air-l] AoIR communal data-database

yes, but I don't think one needs to have metadata at the level of variables. People might want that data, then they should see if the study might have that information, download the study and look for it themselves. I think one needs to have it at the level of the study. I'm assuming that this will all be in a database eventually, so categories such as the openarchives.org metadata set would be best, it is a standard, it describes unique objects like a study, etc.

the lack of exact and complete metadata has not hindered the development of such projects in the past, i guess in the end it is always a balance between the practical and the ideal situations.

On Sunday, November 18, 2001, at 03:05 PM, Charlie Hendricksen wrote:

...

Yes, the "codebook" for the study should have all the metadata necessary. But are the codebooks searchable? If the repository is of any size at all, then it needs to be searchable. Would you like to read all the codebooks in order to see if there was any data you could use? If the codebooks are disassembled and placed in a database that allows searching then the repository is very useful. My guess is that codebooks are idiosyncratic and of wildly varying quality. This means that the metadata would be incomplete in many cases.

This raises the issue of what the metadata should include.

jeremy hunsinger on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy

Charlie Hendricksen

8:35 p.m.

New subject: [Air-l] AoIR communal data-database

OK, there is a proposal: the metadata should be based on the openarchives.org database. Now, in total ignorance of that metadata set, let me say that that metadata set might be so extensive that the data providers would be discouraged from submitting their metadata. There is an exquisite balance between the work required to input metadata and the rewards for participating in such a project. In my experience that balance goes to simplicity of an order that makes the metadata marginally useful. I would argue for a custom metadata set -- that argument based on experience with the hopelessly complex metadata standards of the Federal Geographic Data Committee.

jeremy hunsinger wrote:

...

yes, but I don't think one needs to have metadata at the level of variables. People might want that data, then they should see if the study might have that information, download the study and look for it themselves. I think one needs to have it at the level of the study. I'm assuming that this will all be in a database eventually, so categories such as the openarchives.org metadata set would be best, it is a standard, it describes unique objects like a study, etc.

the lack of exact and complete metadata has not hindered the development of such projects in the past, i guess in the end it is always a balance between the practical and the ideal situations.

On Sunday, November 18, 2001, at 03:05 PM, Charlie Hendricksen wrote:

...
Yes, the "codebook" for the study should have all the metadata necessary. But are the codebooks searchable? If the repository is of any size at all, then it needs to be searchable. Would you like to read all the codebooks in order to see if there was any data you could use? If the codebooks are disassembled and placed in a database that allows searching then the repository is very useful. My guess is that codebooks are idiosyncratic and of wildly varying quality. This means that the metadata would be incomplete in many cases.

This raises the issue of what the metadata should include.

jeremy hunsinger on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy

-- Charlie Hendricksen veritas@u.washington.edu "Information technology structures human relationships." "Models relate concepts."

Suzie Allard

9:41 p.m.

New subject: [Air-l] AoIR communal data-database

I thought a quick note on openarchives.org might help in the discussion.

The openarchives.org metadata set is designed to promote low-barrier interoperability; therefore its design, while elegant, is simple. It is also designed to be flexible and to allow communities to add elements to create metadata formats that are unique to their discipline (or, in our case, multidisciplines!). I believe that we could use the openarchives set to document that a dataset in the collection may already be classified with an existing metadata format, such as one specified by the Federal Geographic Data Committee, so it could then be searched on those elements in a secondary action.

The suggestion of using the openarchives approach sounds promising because the openarchives.org concept promotes the use and integration of materials that may be resident in depositories in a distributed environment, by using a data-provider/service-provider model and defining the mechanism for metadata harvesting. It has its basis in Dublin Core, which is pretty simple to use and therefore promotes self-archiving by authors.

Despite its name, openarchives.org does not promote any policy about wide accessibility of content; that decision is left to the data-providers, so AoIR could make the choice of unlimited/limited access.

This AoIR project sounds very interesting and useful too!

Suzie

...

From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
Reply-To: air-l@aoir.org
Date: Sun, 18 Nov 2001 13:35:03 -0700
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database

OK, there is a proposal: the metadata should be based on the openarchives.org database. Now, in total ignorance of that metadata set, let me say that that metadata set might be so extensive that the data providers would be discouraged from submitting their metadata. There is an exquisite balance between the work required to input metadata and the rewards for participating in such a project. In my experience that balance goes to simplicity of an order that makes the metadata marginally useful. I would argue for a custom metadata set -- that argument based on experience with the hopelessly complex metadata standards of the Federal Geographic Data Committee.

jeremy hunsinger wrote:

...
yes, but I don't think one needs to have metadata at the level of variables. People might want that data, then they should see if the study might have that information, download the study and look for it themselves. I think one needs to have it at the level of the study. I'm assuming that this will all be in a database eventually, so categories such as the openarchives.org metadata set would be best, it is a standard, it describes unique objects like a study, etc.

the lack of exact and complete metadata has not hindered the development of such projects in the past, i guess in the end it is always a balance between the practical and the ideal situations.

On Sunday, November 18, 2001, at 03:05 PM, Charlie Hendricksen wrote:

...
Yes, the "codebook" for the study should have all the metadata necessary. But are the codebooks searchable? If the repository is of any size at all, then it needs to be searchable. Would you like to read all the codebooks in order to see if there was any data you could use? If the codebooks are disassembled and placed in a database that allows searching then the repository is very useful. My guess is that codebooks are idiosyncratic and of wildly varying quality. This means that the metadata would be incomplete in many cases.

This raises the issue of what the metadata should include.

jeremy hunsinger on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy

-- Charlie Hendricksen veritas@u.washington.edu

"Information technology structures human relationships." "Models relate concepts."

************************************************************ Suzie Allard, KY Opportunity Fellow University of Kentucky (859)257-3771 College of Communications and Information Studies School of Library and Information Science 520 King Library South, Lexington, KY 40506-0039 e-mail: slalla0@uky.edu homepage: http://sac.uky.edu/~slalla0/ ***********************************************************
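
The layering described above, a simple Dublin Core base plus community-specific elements, might look roughly like the following. This is a sketch under stated assumptions: the fifteen element names are the Dublin Core element set, but the `aoir:` extension names, the helper functions, and the sample record are invented for illustration and are not part of any openarchives.org or AoIR specification.

```python
# The fifteen elements of the simple Dublin Core element set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical community extension; these names are assumptions.
AOIR_EXTENSION_ELEMENTS = frozenset({"aoir:metadataFormat", "aoir:access"})


def unknown_elements(record: dict[str, list[str]]) -> list[str]:
    """Return, sorted, element names outside Dublin Core and the extension."""
    allowed = DUBLIN_CORE_ELEMENTS | AOIR_EXTENSION_ELEMENTS
    return sorted(name for name in record if name not in allowed)


def split_record(record: dict[str, list[str]]):
    """Separate the Dublin Core part from the community-specific elements."""
    core = {k: v for k, v in record.items() if k in DUBLIN_CORE_ELEMENTS}
    extra = {k: v for k, v in record.items() if k in AOIR_EXTENSION_ELEMENTS}
    return core, extra


# Made-up sample record; Dublin Core elements are repeatable, hence lists.
record = {
    "title": ["Example study of an online community"],
    "creator": ["A. Researcher"],
    "type": ["Dataset"],
    "rights": ["Contact the data holder before reuse"],
    "aoir:metadataFormat": ["FGDC"],   # points at a richer existing format
    "aoir:access": ["members-only"],   # access choice left to the provider
}
core, extra = split_record(record)
print(sorted(core))
print(extra)
print(unknown_elements(record))
```

A general harvester could rely on the `core` part alone, while a discipline-aware search could use the extension elements as the secondary step mentioned above.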

Nancy Baym

17 Nov

11:22 p.m.

New subject: [Air-l] AoIR communal data-database

...

Friends,

In point 4 of Nancy's proposal for a data repository there is this statement: "Our intention is that access to such private resources contributed by aoir members would be limited to aoir members." I see no reasonable justification for restricting access and would not participate in the venture if such restrictions are adopted.

My assumption was that people would prefer to limit the access to their data, otherwise it would fall under that first category of data already available on the web. Personally, if I were going to make data I'd collected available, I'd like to know that there was a limited set of people who would have access to that, and that I could get that list on the member website. However, the level of access is certainly open for discussion and I'd be inclined to defer to the will of the people who were willing to share their data through a resource like this. If they want it available to all, then that's fine.

The issue of how much of what aoir does under its auspices should be available to all and how much should be available only to members is a tricky one and there are arguments on both sides. It's a matter of ongoing discussion with every idea we come up with. Speaking only for myself, my train of logic goes like this --> do we distinguish between members and nonmembers? if we don't what does membership mean? if membership doesn't mean anything then why join? if no one joins there's no budget, eventually no conferences, eventually no association. While I believe that aoir should not be an exclusive little clique, I do think it's important to provide benefits for members that are better than the benefits of not being a member. It's not like membership is hard to come by.

Regarding metadata, I concur with Jeremy. If we're talking about data that are incomprehensible without being in on the research program or that needs a lot of sophisticated metastuff that's more than a codebook and explanation could provide, then it's probably not appropriate for this. On the other hand, there is a lot of data available already on the web that's being used just like this (e.g. Pew's data).

Regarding whether this is too big to be sustained by volunteers, maybe a volunteer effort can't sustain this. If this is not something people would find adequately valuable to participate in, then it won't work. On the other hand, all of AoIR thus far would seem to be a lot more than a volunteer effort could sustain, and it seems to be working pretty well because people have cared enough to volunteer their energies.

Nancy

_________________________________________________________
Nancy Baym nbaym@ku.edu http://www.ku.edu/home/nbaym
Communication Studies, University of Kansas
102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA
VP, Association of Internet Researchers: http://aoir.org

Charlie Hendricksen

18 Nov

8:24 p.m.

New subject: [Air-l] AoIR communal data-database

I have replied to Nancy's response below inline.

Nancy Baym wrote:

...

...
Friends,

In point 4 of Nancy's proposal for a data repository there is this statement: "Our intention is that access to such private resources contributed by aoir members would be limited to aoir members." I see no reasonable justification for restricting access and would not participate in the venture if such restrictions are adopted.

My assumption was that people would prefer to limit the access to their data, otherwise it would fall under that first category of data already available on the web. Personally, if I were going to make data I'd collected available, I'd like to know that there was a limited set of people who would have access to that, and that I could get that list on the member website. However, the level of access is certainly open for discussion and I'd be inclined to defer to the will of the people who were willing to share their data through a resource like this. If they want it available to all, then that's fine.

Yes! The use of data for other uses such as meta analysis should usually be contingent on the agreement of the original author. I think that the data repository would be most useful as a catalog of available datasets. If the metadata pointed to a dataset that might be of use, then the new user could contact the holder of the data and arrange for use. It might be useful (but perhaps embarrassing) to allow annotation of the metadata.

...

The issue of how much of what aoir does under its auspices should be available to all and how much should be available only to members is a tricky one and there are arguments on both sides. It's a matter of ongoing discussion with every idea we come up with. Speaking only for myself, my train of logic goes like this --> do we distinguish between members and nonmembers? if we don't what does membership mean? if membership doesn't mean anything then why join? if no one joins there's no budget, eventually no conferences, eventually no association. While I believe that aoir should not be an exclusive little clique, I do think it's important to provide benefits for members that are better than the benefits of not being a member. It's not like membership is hard to come by.

If the repository is principally a repository of metadata for available data, I see no reason why AoIR would lose anything by making it public. You might even attract new members.

...

Regarding metadata, I concur with Jeremy. If we're talking about data that are incomprehensible without being in on the research program or that needs a lot of sophisticated metastuff that's more than a codebook and explanation could provide, then it's probably not appropriate for this. On the other hand, there is a lot of data available already on the web that's being used just like this (e.g. Pew's data).

A metadata repository need only be as sophisticated as is needed to eliminate hopefully optimistic requests, and to attract requests for data that is likely to be useful. In other words it needs to eliminate obviously false positives and obviously false negatives. Having been involved in a metadata cataloging process, I appreciate the existence of overly sophisticated "metastuff" (beautiful term!).

...

Regarding whether this is too big to be sustained by volunteers, maybe a volunteer effort can't sustain this. If this is not something people would find adequately valuable to participate in, then it won't work. On the other hand, all of AoIR thus far would seem to be a lot more than a volunteer effort could sustain, and it seems to be working pretty well because people have cared enough to volunteer their energies.

Well, I hope that the authors of data can find the time to submit metadata. The existence of a well designed metadata core, along with tools to submit that metadata, and review it before publication, is important. Similarly, tools that allow exploration of the metadata database are essential to the dissemination of that product. I have some tools and an unpublished paper that may be useful.

...

Nancy

_________________________________________________________ Nancy Baym nbaym@ku.edu http://www.ku.edu/home/nbaym Communication Studies, University of Kansas 102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA VP, Association of Internet Researchers: http://aoir.org

-- Charlie Hendricksen veritas@u.washington.edu "Information technology structures human relationships." "Models relate concepts."
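
The idea of a small metadata core with tools to submit and review entries before publication can be sketched as a simple completeness check. This is an illustration only: the thread never settles on which fields the core should contain, so `REQUIRED_FIELDS`, the `Submission` record, and the sample entries are assumptions, and an automated check like this would at most complement review by a person.

```python
from dataclasses import dataclass, field

# Assumed minimal "metadata core"; the actual fields were never decided.
REQUIRED_FIELDS = ("title", "description", "contact")


@dataclass
class Submission:
    """Hypothetical metadata submission awaiting review."""
    metadata: dict[str, str]
    status: str = "pending"  # becomes "published" or "rejected" after review
    problems: list[str] = field(default_factory=list)


def review(submission: Submission) -> Submission:
    """Publish a submission only if every required field is non-empty."""
    submission.problems = [
        name for name in REQUIRED_FIELDS
        if not submission.metadata.get(name, "").strip()
    ]
    submission.status = "rejected" if submission.problems else "published"
    return submission


# Made-up sample submissions, for illustration only.
good = review(Submission({
    "title": "Example study",
    "description": "Made-up sample description",
    "contact": "holder@example.org",
}))
bad = review(Submission({"title": "Untitled", "description": "  "}))
print(good.status)
print(bad.status, bad.problems)
```

Keeping the required list short reflects the trade-off discussed in the thread: the more fields a contributor must fill in, the less likely they are to submit anything at all.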

8 comments, 4 participants:

Charlie Hendricksen
jeremy hunsinger
Nancy Baym
Suzie Allard