Justin, It's quite a problem. GISes, and tools like UCINET or even plain old spreadsheets, are great for manipulating the data -- once you have it. Some thoughts which may be of help:
In my ideal world, I would be able to build a relational database of data traffic between the largest cities worldwide. The data I have found shows gross data traffic between nodes, which includes traffic originated in third-party cities and destined for fourth-party cities, for example, and which does not provide an estimate of the traffic originated in 3 and destined for 4. This means that the data doesn't relate every node in the city system to every other in terms of network traffic inbound and outbound.
The TeleGeography research group (now part of PriMetrica, Inc.) has this kind of data for major cities. The non-provider-specific datasets they create on city-to-city Internet *bandwidth* display the characteristics you mention. But their city-to-city Internet *traffic* data are end-to-end. Elijah is right to point out that collating, tabulating, and verifying this data is very difficult. That's probably why the dataset exists only for major cities, and why TeleGeography began the traffic work only after a few years on the Internet bandwidth side.
Do you have any thoughts on how currently available data can be patched for network analysis, or how such a relational database could be built in the future?
If you go to TeleGeography, you'll probably want to ask for older data -- the most recent stuff is quite expensive.

If you prefer to construct your own, you'll probably want to do something attainable. An interesting approach might be to combine DNS lookups with Web link analysis. In other words, something like:

1) come up with a list of cities you want to test for;
2) find some set of Web sites for each of those cities, for example by looking up the registered addresses for the "top x" Web sites according to Nielsen//NetRatings or ComScore or Alexa or whoever else you judge least bad;
3) tabulate city-to-city links between these Web sites.

Based on the argument that these Web sites probably represented some very high percentage of all Web usage (the ratings service you went with would make some claim here), it seems to me you'd have something useable.

Now, that would obviously give you links between cities to which Web sites are registered -- not between the cities in which Web sites are hosted. If you thought the latter was more relevant, you'd probably want to make step 2 a bit fancier, involving DNS to IP to geolocation using one of many techniques.

As to whether or not doing all this is less costly time-wise than paying for what someone else has used their time to do ... that's another story. It's a not-insignificant but interesting programming challenge, anyway; the devil would be in the tweaking.

A final thought: some (particularly George Barnett's group at SUNY Buffalo) have done a fair bit of work with country-to-country telephone traffic in this vein. Because international PSTN traffic has been collected for a much longer time, that kind of data (again, ITU or TeleGeography) is much more -- and much more cheaply -- obtainable.

cheers
Bram
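For illustration, step 3 of the approach above might be sketched roughly as follows in Python. This is only a minimal sketch under stated assumptions: the site-to-city mapping from step 2 and the list of hyperlinks are taken as already collected, and the function names (`host_of`, `tabulate_city_links`) and the `.example` hostnames are hypothetical, not part of anyone's actual method.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased hostname of a URL, minus a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def tabulate_city_links(site_city, links, include_same_city=False):
    """Count directed city-to-city links.

    site_city: dict mapping hostname -> city (assumed output of step 2).
    links:     iterable of (source_url, target_url) hyperlink pairs.
    Links touching a site with no known city are skipped.
    """
    counts = Counter()
    for src_url, dst_url in links:
        src_city = site_city.get(host_of(src_url))
        dst_city = site_city.get(host_of(dst_url))
        if src_city is None or dst_city is None:
            continue  # one end is outside the sampled set of sites
        if src_city == dst_city and not include_same_city:
            continue  # intra-city links say nothing about inter-city ties
        counts[(src_city, dst_city)] += 1
    return counts


# Hypothetical sample data, for demonstration only.
site_city = {
    "a.example": "London",
    "b.example": "Tokyo",
    "c.example": "London",
}
links = [
    ("http://www.a.example/x", "http://b.example/"),
    ("http://a.example/", "http://b.example/y"),
    ("http://b.example/", "http://c.example/"),
    ("http://a.example/", "http://c.example/"),        # same city, skipped
    ("http://a.example/", "http://unknown.example/"),  # unknown site, skipped
]

for (src, dst), n in sorted(tabulate_city_links(site_city, links).items()):
    print(src, "->", dst, n)
```

The resulting counts form a directed, weighted edge list that could be loaded into UCINET or a spreadsheet. For the "hosted" rather than "registered" variant, `site_city` would instead be built from DNS-to-IP-to-geolocation lookups, which this sketch leaves out.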
Bram Dov Abramson wrote:
If you prefer to construct your own, you'll probably want to do something attainable. An interesting approach might be to combine DNS lookups with Web link analysis. In other words, something like:
1) come up with a list of cities you want to test for;
2) find some set of Web sites for each of those cities, for example by looking up the registered addresses for the "top x" Web sites according to Nielsen//NetRatings or ComScore or Alexa or whoever else you judge least bad;
3) tabulate city-to-city links between these Web sites.
That sounds pretty familiar. I did this--in a very limited way--for several "global cities" about six years ago. Here's the paper:

http://alex.halavais.net/research/99-informational_city.pdf

There's a pretty visualization of the network in the 14th slide in here (fair warning, the file is ~1.4 MB):

http://alex.halavais.net/news/archives/nerdi.pdf

I don't want to discourage anyone, but I think George Barnett, my colleague here at SUNY Buffalo, has the right idea in making the best of already available (if expensive) data sets. Gathering the data is definitely NOT half the fun :).

I'm working now, along with one of my students (Jia Lin), on a much more extensive analysis of cities in the US. Hopefully we'll have some preliminary results shortly.

Alex