Re: [Air-L] Government website harvesting

17 Jan 2017

I've been scraping media (hearing videos and associated PDFs/transcripts)
and metadata from a handful of legislative committees. This has been
complicated by the fact that they use proprietary Flash-based streaming
servers (AdobeHDS, provided by Akamai), which requires sniffing
authentication keys and re-assembling many ~1 sec video fragments back into
a whole.  They also block Tor exit nodes -- ironic for the Senate Select
Committee on Intelligence...
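
For anyone curious what the re-assembly step involves, here is a minimal sketch in Python. It is not the actual scraper and it makes several assumptions: the fragments are already downloaded and unencrypted, their names follow the usual HDS "SegN-FragM" pattern, and each fragment is a sequence of length-prefixed boxes whose "mdat" payload holds raw FLV tags. The function names (fragment_key, iter_boxes, reassemble) are made up for illustration. Key sniffing, timestamp repair and duplicate-tag filtering are all left out.

```python
import re
import struct

# 9-byte FLV header (version 1, audio+video flags) plus the 4-byte PreviousTagSize0.
FLV_HEADER = b"FLV\x01\x05\x00\x00\x00\x09\x00\x00\x00\x00"


def fragment_key(name):
    """Sort key for names like 'streamSeg1-Frag12' -> (1, 12) (assumed naming)."""
    match = re.search(r"Seg(\d+)-Frag(\d+)", name)
    if match is None:
        raise ValueError("unexpected fragment name: %r" % name)
    return int(match.group(1)), int(match.group(2))


def iter_boxes(data):
    """Yield (box_type, payload) for each top-level box in a fragment."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the box type.
            if pos + 16 > len(data):
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - pos
        if size < header_len:
            raise ValueError("corrupt box at offset %d" % pos)
        yield box_type, data[pos + header_len:pos + size]
        pos += size


def reassemble(fragments):
    """Join the mdat payloads of all fragments, in order, behind an FLV header.

    `fragments` maps fragment name -> raw bytes of that fragment.
    """
    out = bytearray(FLV_HEADER)
    for name in sorted(fragments, key=fragment_key):
        for box_type, payload in iter_boxes(fragments[name]):
            if box_type == b"mdat":
                out += payload
    return bytes(out)
```

Sorting on the parsed numbers rather than on the raw names matters because "Frag10" sorts before "Frag2" as a string.
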
-a

On Mon, Jan 16, 2017 at 4:45 PM, Ed Summers <ehs@pobox.com> wrote:
...
There is also the #DataRescue effort that seems to be a loosely knit group
of activists that includes some folks from the Internet Archive.
https://envirodatagov.org/
https://github.com/edgi-govdata-archiving/
http://www.ppehlab.org/blogposts/2017/1/15/datarescue-philly-builds-datarefuge
Apologies if it was mentioned already...
//Ed
_______________________________________________
The Air-L@listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at:
http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers:
http://www.aoir.org/

abram stern (aphid)