[Air-l] Re: Air-l digest, Vol 1 #217 - 19 msgs

19 Nov 2001


      A note about searching. Data does not need to have metadata
to be searchable; however, it certainly helps.
The web consists of data that has little useful metadata
content, and it is very searchable, as the search engines have shown.
There are automatic methods for generating metadata, and they can be
tailored to a particular domain or to a particular view. See, for
example, researchindex.com, where the author, title, and citation
metadata are generated automatically.
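To illustrate why unannotated text is still searchable, here is a toy inverted index, the basic structure behind full-text search. This is a minimal sketch under simplifying assumptions (whitespace tokenization, lowercase matching, every query term required). The documents and their ids are hypothetical, and real search engines do considerably more, such as ranking and stemming.

```python
from collections import defaultdict

PUNCT = ".,;:!?\"'()"

def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            token = raw.strip(PUNCT)
            if token:
                index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = [t.strip(PUNCT) for t in query.lower().split() if t.strip(PUNCT)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents: raw text only, no metadata at all.
docs = {
    "study-a": "Survey of Usenet posting behavior, 1999.",
    "study-b": "Interviews with MUD players about online identity.",
    "study-c": "Usenet newsgroup interviews and posting logs.",
}
index = build_index(docs)
print(sorted(search(index, "usenet posting")))  # ['study-a', 'study-c']
```

Metadata helps where raw text does not: for example, filtering by author or date, which a plain term match cannot do reliably.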

Lee Giles


air-l-request@aoir.org wrote:
...
Today's Topics:
   1. Re: AoIR communal data-database (Charlie Hendricksen)
   2. Re: AoIR communal data-database (jeremy hunsinger)
   3. Re: AoIR communal data-database (Charlie Hendricksen)
   4. Re: AoIR communal data-database (Charlie Hendricksen)
Message: 1
Date: Sun, 18 Nov 2001 13:05:23 -0700
From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database
Reply-To: air-l@aoir.org
Yes, the "codebook" for the study should have all the metadata
necessary.  But are the codebooks searchable?  If the repository is of
any size at all, then it needs to be searchable.  Would you like to
read all the codebooks in order to see if there was any data you could
use?  If the codebooks are disassembled and placed in a database that
allows searching, then the repository is very useful.  My guess is that
codebooks are idiosyncratic and of wildly varying quality.  This means
that the metadata would be incomplete in many cases.
This raises the issue of what the metadata should include.
jeremy hunsinger wrote:
...
I'm not sure what level of metadata you are talking about here...
collecting a description of the study, the authors, the coding, etc.
should all be public in the codebook for the study... if it is not, then
the study probably wouldn't be useful to others in any case.  Perhaps I
am on the wrong track here?
...
The question of metadata raises a difficult barrier to building
the proposed repository.  Data is pretty much useless without
metadata.  The amount of work required to obtain useful metadata is
likely to exceed what a volunteer effort can support.
...
jeremy hunsinger
on the ibook
www.cddc.vt.edu
www.cddc.vt.edu/jeremy
_______________________________________________
Air-l mailing list
Air-l@aoir.org
http://www.aoir.org/mailman/listinfo/air-l
--
            Charlie Hendricksen   veritas@u.washington.edu
"Information technology structures human relationships."
                            "Models relate concepts."
--__--__--
Message: 2
Date: Sun, 18 Nov 2001 15:18:41 -0500
Subject: Re: [Air-l] AoIR communal data-database
From: jeremy hunsinger <jhuns@vt.edu>
To: air-l@aoir.org
Reply-To: air-l@aoir.org
Yes, but I don't think one needs to have metadata at the level of
variables.  If people want that data, they should check whether the
study has that information, then download the study and look for it
themselves.  I think one needs to have metadata at the level of the study.
I'm assuming that this will all be in a database eventually, so
categories such as the openarchives.org metadata set would be best: it
is a standard, and it describes unique objects such as a study.
The lack of exact and complete metadata has not hindered the development
of such projects in the past; I guess in the end it is always a balance
between the practical and the ideal.
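A study-level record of the kind described here might look like the sketch below. It assumes the 15 elements of unqualified Dublin Core, the baseline format (`oai_dc`) that the Open Archives Initiative protocol requires repositories to support. The example study, its values, and the `StudyRecord`/`find` helpers are hypothetical illustrations, not part of any existing repository.

```python
from dataclasses import dataclass, field

# The 15 elements of the (unqualified) Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

@dataclass
class StudyRecord:
    """Hypothetical study-level catalog entry; every element is optional and repeatable."""
    elements: dict = field(default_factory=dict)

    def add(self, name, value):
        # Reject anything outside the Dublin Core set (e.g. per-variable fields).
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        self.elements.setdefault(name, []).append(value)

    def matches(self, name, term):
        """Case-insensitive substring match against one element's values."""
        term = term.lower()
        return any(term in v.lower() for v in self.elements.get(name, []))

def find(records, name, term):
    """Return the records whose element `name` contains `term`."""
    return [r for r in records if r.matches(name, term)]

# Hypothetical record: describes the study as a whole, not its variables.
study = StudyRecord()
study.add("title", "Survey of Usenet posting behavior")
study.add("creator", "Example, A.")
study.add("subject", "Usenet")
study.add("subject", "online community")
study.add("type", "Dataset")
study.add("rights", "Contact the creator before reuse")

print(len(find([study], "subject", "community")))  # 1
```

Keeping every element optional keeps the burden on contributors low; variable-level detail stays in the codebook that the record points to.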
On Sunday, November 18, 2001, at 03:05 PM, Charlie Hendricksen wrote:
...
--__--__--
Message: 3
Date: Sun, 18 Nov 2001 13:24:14 -0700
From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database
Reply-To: air-l@aoir.org
I have replied inline to Nancy's response below.
Nancy Baym wrote:
...
...
Friends,
In point 4 of Nancy's proposal for a data repository there is this
statement: "Our intention is that access to such private resources
contributed by aoir members
would be limited to aoir members."  I see no reasonable justification
for restricting access and would not participate in the venture if
such restrictions are adopted.
My assumption was that people would prefer to limit the access to
their data, otherwise it would fall under that first category of data
already available on the web. Personally, if I were going to make
data I'd collected available, I'd like to know that there was a
limited set of people who would have access to that, and that I could
get that list on the member website. However, the level of access is
certainly open for discussion and I'd be inclined to defer to the
will of the people who were willing to share their data through a
resource like this. If they want it available to all, then that's
fine.
Yes! The use of data for other uses such as meta analysis should
usually be contingent on the agreement of the original author.  I
think that the data repository would be most useful as a catalog of
available datasets.  If the metadata pointed to a dataset that might
be of use, then the new user could contact the holder of the data and
arrange for use.  It might be useful (but perhaps embarrassing) to
allow annotation of the metadata.
...
The issue of how much of what aoir does under its auspices should be
available to all and how much should be available only to members is
a tricky one and there are arguments on both sides. It's a matter of
ongoing discussion with every idea we come up with. Speaking only for
myself, my train of logic goes like this: Do we distinguish
between members and nonmembers? If we don't, what does membership
mean? If membership doesn't mean anything, then why join? If no one
joins, there's no budget, eventually no conferences, eventually no
association. While I believe that aoir should not be an exclusive
little clique, I do think it's important to provide benefits for
members that are better than the benefits of not being a member. It's
not like membership is hard to come by.
If the repository is principally a repository of metadata for
available data, I see no reason why AoIR would lose anything by making
it public.  You might even attract new members.
...
Regarding metadata, I concur with Jeremy. If we're talking about data
that are incomprehensible without being in on the research program or
that need a lot of sophisticated metastuff that's more than a
codebook and explanation could provide, then it's probably not
appropriate for this. On the other hand, there is a lot of data
available already on the web that's being used just like this (e.g.
Pew's data).
A metadata repository need only be as sophisticated as is needed to
eliminate hopelessly optimistic requests and to attract requests for
data that is likely to be useful.  In other words it needs to
eliminate obviously false positives and obviously false negatives.
Having been involved in a metadata cataloging process, I appreciate
the existence of overly sophisticated "metastuff" (beautiful term!).
...
Regarding whether this is too big to be sustained by volunteers,
maybe a volunteer effort can't sustain this. If this is not something
people would find adequately valuable to participate in, then it
won't work. On the other hand, all of AoIR thus far would seem to be
a lot more than a volunteer effort could sustain, and it seems to be
working pretty well because people have cared enough to volunteer
their energies.
Well, I hope that the authors of data can find the time to submit
metadata.  The existence of a well designed metadata core, along with
tools to submit that metadata, and review it before publication, is
important.  Similarly, tools that allow exploration of the metadata
database are essential to the dissemination of that product.  I have
some tools and an unpublished paper that may be useful.
...
Nancy
_________________________________________________________
Nancy Baym
nbaym@ku.edu
http://www.ku.edu/home/nbaym
Communication Studies, University of Kansas
102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA
VP, Association of Internet Researchers: http://aoir.org
--__--__--
Message: 4
Date: Sun, 18 Nov 2001 13:35:03 -0700
From: Charlie Hendricksen <veritas@u.washington.edu>
Organization: Department of Geography, University of Washington
To: air-l@aoir.org
Subject: Re: [Air-l] AoIR communal data-database
Reply-To: air-l@aoir.org
OK, there is a proposal: the metadata should be based on the
openarchives.org database.  Now, in total ignorance of that metadata
set, let me say that that metadata set might be so extensive that the
data providers would be discouraged from submitting their metadata.
There is an exquisite balance between the work required to input
metadata and the rewards for participating in such a project.  In my
experience that balance goes to simplicity of an order that makes the
metadata marginally useful.  I would argue for a custom metadata set,
an argument based on my experience with the hopelessly complex
metadata standards of the Federal Geographic Data Committee.
jeremy hunsinger wrote:
...
--
C. Lee Giles, David Reese Professor
School of Information Sciences and Technology
and Computer Science and Engineering
The Pennsylvania State University
001 Thomas Bldg,
University Park, PA, 16801, USA
814 865 7884; FAX: 6426
http://ist.psu.edu/giles