I've been scraping media (hearing videos and associated PDFs/transcripts) and metadata from a handful of legislative committees. This has been complicated by the fact that they use proprietary Flash-based streaming servers (AdobeHDS, provided by Akamai), which requires sniffing authentication keys and reassembling many ~1 sec video fragments back into a whole. They also block Tor exit nodes -- ironic for the Senate Select Committee on Intelligence...

-a

On Mon, Jan 16, 2017 at 4:45 PM, Ed Summers <ehs@pobox.com> wrote:
There is also the #DataRescue effort that seems to be a loosely knit group of activists that includes some folks from the Internet Archive.
https://envirodatagov.org/
https://github.com/edgi-govdata-archiving/
http://www.ppehlab.org/blogposts/2017/1/15/datarescue-philly-builds-datarefuge
Apologies if it was mentioned already...
//Ed
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/