Rainer, if you're interested in tagged imagery in general rather than Instagram imagery specifically, the GDELT Visual Global Knowledge Graph (VGKG) dataset may be of particular interest. It consists of more than 150 million images drawn from worldwide news coverage over the last 9 months, each passed through Google's Cloud Vision API deep learning service. Each record includes the URL of the image, the URL of the article it appeared in, a set of tags categorizing both the objects and activities depicted in the image, full OCR (including OCR of script and logographic languages), identification of major worldwide commercial and NGO logos, an estimate of the level of violence, facial detection (but not recognition) with an estimate of the facial sentiment of each human face, and the estimated location where the image was taken (based purely on visual analysis): http://blog.gdeltproject.org/announcing-the-new-gdelt-visual-global-knowledg...

The full dataset is rather large, at around 850GB, and each record is encoded as a JSON blob that can be quite large for highly detailed images, so the collection requires a bit of expertise to work with. However, there is a great third-party R package that can process the data into a more easily workable format (https://github.com/abresler/gdeltr2).

Within the next few weeks, additional fields will encode all EXIF, IPTC and XMP metadata embedded in each image (around 10% of news imagery includes expanded metadata such as publisher-assigned keywords and textual descriptions), and three perceptual hashes (Average Hash, Perceptual Hash and Difference Hash) are being added to allow visual similarity comparison and search: http://blog.gdeltproject.org/vgkg-adds-exif-support/ http://blog.gdeltproject.org/vgkg-adds-perceptual-hashing-image-similarity-s...
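To give a feel for how perceptual-hash similarity search works, here is a minimal Python sketch of one of the three hashes, the Difference Hash (dHash). It assumes the image has already been decoded into a 2D list of grayscale values; in practice an imaging library such as Pillow would handle decoding and resizing. The helper names (`resize_box`, `dhash`, `hamming`) and the box-filter resize are hypothetical illustrations, not GDELT's implementation, and bit conventions differ between dHash implementations, so values from this sketch will not necessarily match the hashes in the VGKG records. Two images are compared by the Hamming distance between their 64-bit hashes; a small distance suggests near-duplicates.

```python
from typing import List

Grid = List[List[float]]  # grayscale image: rows of pixel intensities


def resize_box(img: Grid, out_w: int, out_h: int) -> Grid:
    """Downsample by averaging each block of source pixels (a simple box filter).

    Hypothetical helper; inputs smaller than out_w x out_h just repeat pixels.
    """
    in_h, in_w = len(img), len(img[0])
    out: Grid = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def dhash(img: Grid, hash_size: int = 8) -> int:
    """Difference hash: shrink to (hash_size+1) x hash_size, then set one bit per
    horizontally adjacent pixel pair (1 if the left pixel is brighter)."""
    small = resize_box(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Synthetic 36x32 horizontal gradient (brightness increases left to right).
    gradient = [[4.0 * x for x in range(36)] for _ in range(32)]
    mirrored = [row[::-1] for row in gradient]
    brighter = [[v + 10.0 for v in row] for row in gradient]
    h = dhash(gradient)
    print(hamming(h, dhash(brighter)))  # 0: uniform brightening keeps every left/right ordering
    print(hamming(h, dhash(mirrored)))  # 64: mirroring flips every comparison
```

Because dHash only records whether each pixel is brighter than its neighbor, it tolerates global brightness changes and rescaling, which is what makes it useful for finding the same news photo republished at different sizes or qualities.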
The dataset currently updates every 15 minutes, but within the next month it will switch to updating every minute, so if you're interested in realtime visual analysis, this dataset may be of great interest: http://blog.gdeltproject.org/visual-gkg-to-be-first-gdelt-gen-3-release/

In addition, through a partnership with the Internet Archive, the URLs of all images in this collection are sent to the Internet Archive each day, which preserves each image and its corresponding article in its permanent primary archive that powers the Wayback Machine: http://blog.gdeltproject.org/gdelt-internet-archives-collaboration-to-archiv...

Finally, if you're interested in historical imagery, you might take a look at the Internet Archive Book Images Collection I built several years ago with the Internet Archive, extracting the images from more than 600 million pages of public domain books dating back 500 years from over 1,000 libraries worldwide. The image files, book-level metadata, and the text immediately surrounding each image as it appeared on the page are all available: http://blogs.loc.gov/thesignal/2014/12/unlocking-the-imagery-of-500-years-of... http://www.bbc.com/news/technology-28976849 http://blog.gdeltproject.org/500-years-of-the-images-of-the-worlds-books-now...

Hope this helps!
Kalev
http://blog.gdeltproject.org/
http://kalevleetaru.com/

On Sat, Sep 17, 2016 at 11:27 AM, Rainer Hillrichs <hillrichs@uni-mannheim.de> wrote:
Dear all,
I searched the list and the web but couldn't find anything: I'm looking for a tool that collects Instagram images, websites, and data associated with a specific tag. Basically, I want to type in a tag and end up with a folder full of images, websites, and a table with data (e.g. user name, date posted, URL, other tags). I already suspect that is a lot to ask for ;-) Even a simpler tool would be a good start, as long as I don't end up having to save individual images and websites and type/copy stuff into a table by hand.
Suggestions very much appreciated! Rainer
-- 
Dr. Rainer Hillrichs
Universität Mannheim
https://uni-mannheim.academia.edu/RainerHillrichs
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/