Elijah,

As I get deeper into this problem, I am slowly coming to terms with the fact that the data I am after could only be generated at great expense. Primetrica comes the closest, but as I say, their datasets don't provide all the data points necessary for network analysis, i.e. a square matrix of the form

     A  B  C  D
  A  -  y  y  y
  B  y  -  y  y
  C  y  y  -  y
  D  y  y  y  -

where y is a measure of data flow from the city in row i to the city in column j. Too bad, because data like this could really open up some interesting problems.

Thanks for your observations,
Justin

Message: 15
Date: Thu, 20 May 2004 10:29:28 -0500 (CDT)
From: elijah wright <elw@stderr.org>
To: air-l@aoir.org
Subject: Re: [Air-l] Data
Reply-To: air-l@aoir.org
I am interested in hearing any thoughts you have on a data problem that I have, that I am sure many of you have approached, and which is, of course, a result of the structure of the Internet itself. In my ideal world, I would be able to build a relational database of data traffic between the largest cities worldwide.
Social problem the first - the information you'd most like to have is closely guarded by the involved companies. They keep it secret so that other companies can't deduce all of their peering agreements and thereby figure out how best to 'take advantage' of network position for profit. This is a pretty common problem for a decentralized network, in my experience.
The data I have found shows gross data traffic between nodes. That figure includes transit traffic, for example traffic originating in a third city and destined for a fourth, but it gives no separate estimate of how much traffic actually originates in city 3 and is destined for city 4. This means that the data doesn't relate every node in the city system to every other in terms of inbound and outbound network traffic.
right - the nodes which are most easily measured/evaluated (the network hubs) don't actually act as termination points for a whole lot of traffic. they're just points in the system as a whole, with peers that serve endpoints but are not backbone nodes themselves.
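To make the gross-versus-origin/destination distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the four cities, the demand figures, and the assumption that each flow follows one fixed path along a chain A-B-C-D. It only illustrates why a per-link total cannot be read as city-to-city traffic.

```python
# Hypothetical origin-destination demands, i.e. what the ideal matrix would hold.
# City names and volumes are invented for illustration.
od_demand = {("A", "D"): 5.0, ("B", "C"): 2.0, ("A", "B"): 1.0}

# Assumed routing: each demand follows one fixed path through a chain A-B-C-D.
paths = {
    ("A", "D"): ["A", "B", "C", "D"],
    ("B", "C"): ["B", "C"],
    ("A", "B"): ["A", "B"],
}

def link_loads(od_demand, paths):
    """Sum every demand onto each directed link its path crosses."""
    loads = {}
    for pair, volume in od_demand.items():
        path = paths[pair]
        for hop in zip(path, path[1:]):
            loads[hop] = loads.get(hop, 0.0) + volume
    return loads

loads = link_loads(od_demand, paths)
for (src, dst), total in sorted(loads.items()):
    own = od_demand.get((src, dst), 0.0)
    print(f"{src}->{dst}: gross {total}, of which {own} starts at {src} and ends at {dst}")
```

In this toy case the B->C link carries more gross traffic than the B-to-C demand alone, because the A-to-D flow passes through it. Recovering od_demand from loads alone is generally not possible without knowing the routing.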
Have you approached this problem? Do you have any thoughts on how currently available data can be patched for network analysis, or how such a relational database could be built in the future?
a graph-like structure is good for this, IMHO. something like this:

  sourcenode  destnode  measurement  eval.date
  sourcenode  destnode  measurement  eval.date
  sourcenode  destnode  measurement  eval.date

ad nauseam. you may need some more values, depending on what it is that you're wanting to do. but that general form (spreadsheet-like) is one of the simpler structures to store in a database, and reformatting those tables into something that tools like UCINet or Pajek can display is not such a terrible task.

elijah

_____________________________________
Justin Rosenthal
MA Candidate - Social Science
University of Chicago
jrr@uchicago.edu
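As a rough illustration of the sourcenode / destnode / measurement / eval.date layout suggested in the thread, the Python sketch below pivots such rows into the square city-by-city matrix and writes a simple Pajek-style listing. The rows, city labels, and function names are made up for the example, and the output covers only the basic "*Vertices" / "*Arcs" sections of a Pajek .net file, so treat it as a starting point rather than a complete exporter.

```python
# Hypothetical edge-list rows: (sourcenode, destnode, measurement, eval_date).
# The cities and numbers are invented for illustration.
rows = [
    ("A", "B", 10.0, "2004-05-01"),
    ("B", "A", 7.5, "2004-05-01"),
    ("A", "C", 3.0, "2004-05-01"),
    ("C", "D", 1.2, "2004-05-01"),
]

def to_matrix(rows):
    """Pivot edge-list rows into a square matrix; missing pairs stay 0.0."""
    nodes = sorted({r[0] for r in rows} | {r[1] for r in rows})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for src, dst, value, _date in rows:
        matrix[index[src]][index[dst]] += value
    return nodes, matrix

def to_pajek(rows):
    """Render the rows as a basic Pajek-style network listing (1-based vertex ids)."""
    nodes, _ = to_matrix(rows)
    index = {name: i + 1 for i, name in enumerate(nodes)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[name]} "{name}"' for name in nodes]
    lines.append("*Arcs")
    lines += [f"{index[s]} {index[d]} {v}" for s, d, v, _date in rows]
    return "\n".join(lines)

nodes, matrix = to_matrix(rows)
pajek_text = to_pajek(rows)
print(nodes)
for name, row in zip(nodes, matrix):
    print(name, row)
print(pajek_text)
```

Keeping the long, one-row-per-pair form as the stored table and deriving the matrix on demand means that unobserved pairs simply have no row, which keeps "not measured" separate from "measured as zero" until the pivot step.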