A note about searching. Data does not need to have metadata to be searchable; however, it certainly helps. The web consists of data that has little useful metadata content, and it is very searchable, as the search engines have shown. There are automatic methods for generating metadata that can be particularly applicable to a domain or appropriate for a particular view. See, for example, researchindex.com, where the author, title, and citation metadata are automatically generated.

Lee Giles

air-l-request@aoir.org wrote:
Today's Topics:
1. Re: AoIR communal data-database (Charlie Hendricksen)
2. Re: AoIR communal data-database (jeremy hunsinger)
3. Re: AoIR communal data-database (Charlie Hendricksen)
4. Re: AoIR communal data-database (Charlie Hendricksen)
Message: 1
Date: Sun, 18 Nov 2001 13:05:23 -0700
From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database
Reply-To: air-l@aoir.org
Yes, the "codebook" for the study should have all the metadata necessary. But are the codebooks searchable? If the repository is of any size at all, then it needs to be searchable. Would you like to read all the codebooks in order to see if there was any data you could use? If the codebooks are disassembled and placed in a database that allows searching then the repository is very useful. My guess is that codebooks are idiosyncratic and of wildly varying quality. This means that the metadata would be incomplete in many cases.
This raises the issue of what the metadata should include.
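As a rough sketch of what "disassembled and placed in a database that allows searching" could look like: the example below assumes each codebook has already been broken into (study, variable, description) rows. The table name, the function names, and the sample entries are all hypothetical and serve only as illustration; a real repository would need a richer schema.

```python
import sqlite3

# Hypothetical codebook rows: (study, variable name, variable description).
SAMPLE_ENTRIES = [
    ("Study A", "q12_email_hours", "Hours per week spent reading email"),
    ("Study A", "age", "Respondent age in years"),
    ("Study B", "chat_use", "Frequency of chat room use"),
]

def build_index(entries):
    """Load disassembled codebook rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE codebook_entry (study TEXT, variable TEXT, description TEXT)"
    )
    conn.executemany("INSERT INTO codebook_entry VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn

def search(conn, term):
    """Return the studies whose variable names or descriptions contain term.

    Uses a plain substring match via LIKE, which is case-insensitive for
    ASCII text in SQLite. The characters % and _ in term act as wildcards.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT DISTINCT study FROM codebook_entry "
        "WHERE variable LIKE ? OR description LIKE ? ORDER BY study",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

With something like this, a prospective user would query the index for a term and read only the codebooks of the studies that come back, rather than every codebook in the repository.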
jeremy hunsinger wrote:
I'm not sure what level of metadata you are talking about here... collecting a description of the study, the authors, the coding, etc. should all be public in the codebook for the study... if it is not then the study probably wouldn't be useful to others in any case. perhaps I am on the wrong track here?
The question of metadata raises a difficult barrier to building the proposed repository. Data is pretty much useless without metadata. The amount of work required to obtain useful metadata is likely to exceed what a volunteer effort can support.
jeremy hunsinger on the ibook www.cddc.vt.edu www.cddc.vt.edu/jeremy
_______________________________________________ Air-l mailing list Air-l@aoir.org http://www.aoir.org/mailman/listinfo/air-l
-- Charlie Hendricksen veritas@u.washington.edu
"Information technology structures human relationships." "Models relate concepts."
--__--__--
Message: 2
Date: Sun, 18 Nov 2001 15:18:41 -0500
Subject: Re: [Air-l] AoIR communal data-database
From: jeremy hunsinger <jhuns@vt.edu>
To: air-l@aoir.org
Reply-To: air-l@aoir.org
yes, but I don't think one needs to have metadata at the level of variables. If people want that data, they should see whether the study might have that information, download the study, and look for it themselves. I think one needs to have it at the level of the study. I'm assuming that this will all be in a database eventually, so categories such as the openarchives.org metadata set would be best: it is a standard, it describes unique objects like a study, etc.
the lack of exact and complete metadata has not hindered the development of such projects in the past. i guess in the end it is always a balance between the practical and the ideal situations.
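To make "metadata at the level of the study" a little more concrete, here is a minimal sketch. It assumes that the openarchives.org metadata set means the fifteen unqualified Dublin Core elements, which the Open Archives protocol uses as its baseline format. The function names and the example values are hypothetical.

```python
# The fifteen unqualified Dublin Core elements.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def make_study_record(**fields):
    """Build a study-level record, accepting only Dublin Core element names.

    Empty values are dropped, so an incomplete record is still allowed.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {name: value for name, value in fields.items() if value}

def missing_elements(record):
    """List the Dublin Core elements the record leaves unfilled."""
    return [name for name in DC_ELEMENTS if name not in record]

# Hypothetical example: a record with only a few elements filled in.
record = make_study_record(
    title="Survey of mailing list participation",
    creator="A. Researcher",
    date="2001",
)
print(missing_elements(record))
```

The record is accepted even though most elements are empty, which matches the point about balancing the practical against the ideal: the missing elements can be reported without blocking the contribution.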
--__--__--
Message: 3
Date: Sun, 18 Nov 2001 13:24:14 -0700
From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database
Reply-To: air-l@aoir.org
I have replied to Nancy's response below inline.
Nancy Baym wrote:
Friends,
In point 4 of Nancy's proposal for a data repository there is this statement: "Our intention is that access to such private resources contributed by aoir members would be limited to aoir members." I see no reasonable justification for restricting access and would not participate in the venture if such restrictions are adopted.
My assumption was that people would prefer to limit the access to their data, otherwise it would fall under that first category of data already available on the web. Personally, if I were going to make data I'd collected available, I'd like to know that there was a limited set of people who would have access to that, and that I could get that list on the member website. However, the level of access is certainly open for discussion and I'd be inclined to defer to the will of the people who were willing to share their data through a resource like this. If they want it available to all, then that's fine.
Yes! The use of data for other uses such as meta analysis should usually be contingent on the agreement of the original author. I think that the data repository would be most useful as a catalog of available datasets. If the metadata pointed to a dataset that might be of use, then the new user could contact the holder of the data and arrange for use. It might be useful (but perhaps embarrassing) to allow annotation of the metadata.
The issue of how much of what aoir does under its auspices should be available to all and how much should be available only to members is a tricky one and there are arguments on both sides. It's a matter of ongoing discussion with every idea we come up with. Speaking only for myself, my train of logic goes like this --> do we distinguish between members and nonmembers? if we don't what does membership mean? if membership doesn't mean anything then why join? if no one joins there's no budget, eventually no conferences, eventually no association. While I believe that aoir should not be an exclusive little clique, I do think it's important to provide benefits for members that are better than the benefits of not being a member. It's not like membership is hard to come by.
If the repository is principally a repository of metadata for available data, I see no reason why AoIR would lose anything by making it public. You might even attract new members.
Regarding metadata, I concur with Jeremy. If we're talking about data that are incomprehensible without being in on the research program or that needs a lot of sophisticated metastuff that's more than a codebook and explanation could provide, then it's probably not appropriate for this. On the other hand, there is a lot of data available already on the web that's being used just like this (e.g. Pew's data).
A metadata repository need only be as sophisticated as is needed to eliminate hopefully optimistic requests, and to attract requests for data that is likely to be useful. In other words it needs to eliminate obviously false positives and obviously false negatives. Having been involved in a metadata cataloging process, I appreciate the existence of overly sophisticated "metastuff" (beautiful term!).
Regarding whether this is too big to be sustained by volunteers, maybe a volunteer effort can't sustain this. If this is not something people would find adequately valuable to participate in, then it won't work. On the other hand, all of AoIR thus far would seem to be a lot more than a volunteer effort could sustain, and it seems to be working pretty well because people have cared enough to volunteer their energies.
Well, I hope that the authors of data can find the time to submit metadata. The existence of a well designed metadata core, along with tools to submit that metadata, and review it before publication, is important. Similarly, tools that allow exploration of the metadata database are essential to the dissemination of that product. I have some tools and an unpublished paper that may be useful.
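One way to picture the submit-then-review step, purely as a sketch and not a description of the tools mentioned above: a submission is checked against a small required core and held as pending until a reviewer approves it. The required fields, the status names, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical metadata core; the actual core would have to be agreed on.
REQUIRED_FIELDS = ("title", "creator", "description", "contact")

@dataclass
class Submission:
    metadata: dict
    status: str = "pending"
    problems: list = field(default_factory=list)

def submit(metadata):
    """Check a submission against the core; incomplete ones are rejected."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if not str(metadata.get(name, "")).strip()
    ]
    status = "rejected" if problems else "pending"
    return Submission(metadata=dict(metadata), status=status, problems=problems)

def approve(submission):
    """Publish a submission after review; only pending ones can be approved."""
    if submission.status != "pending":
        raise ValueError(f"cannot approve a {submission.status} submission")
    submission.status = "published"
    return submission
```

Keeping the required core this small reflects the concern that authors will only find the time to submit metadata if doing so is quick.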
Nancy
_________________________________________________________ Nancy Baym nbaym@ku.edu http://www.ku.edu/home/nbaym Communication Studies, University of Kansas 102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA VP, Association of Internet Researchers: http://aoir.org
--__--__--
Message: 4
Date: Sun, 18 Nov 2001 13:35:03 -0700
From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database
Reply-To: air-l@aoir.org
OK, there is a proposal: the metadata should be based on the openarchives.org metadata set. Now, in total ignorance of that metadata set, let me say that it might be so extensive that the data providers would be discouraged from submitting their metadata. There is an exquisite balance between the work required to input metadata and the rewards for participating in such a project. In my experience that balance goes to simplicity of an order that makes the metadata marginally useful. I would argue for a custom metadata set, an argument based on experience with the hopelessly complex metadata standards of the Federal Geographic Data Committee.
--
C. Lee Giles, David Reese Professor
School of Information Sciences and Technology and Computer Science and Engineering
The Pennsylvania State University
001 Thomas Bldg, University Park, PA, 16801, USA
814 865 7884; FAX: 6426
http://ist.psu.edu/giles