the best way to archive web material?
Hello, and apologies if this has been asked recently or seems a bit basic!
Does anyone have a recommendation for software to archive web material? I am heading a project to study political activism on the Russian internet, and we need to store a range of different types of web pages across time. I can't get my PC to store even a small amount with full images. My research partner in Ukraine can, but she has a Mac (not an option available at my university right now). I have a small budget to buy some software, although freeware suggestions are always appreciated. I want the archive to be complete so that we can work with it, share it with other researchers, go back to it as necessary, etc., so I really want full graphics. Optimally, it would also be something that could do automatic crawls and downloads, although as we tend to focus on relatively short periods of intense interest around particular issues/events, we don't need a long-term crawl system.
Suggestions from this clever and useful list are most welcome, although at the moment this list is making me sad that I am not in Sweden to meet people at exciting venues and hear what I am sure is some great work (:
Sincerely
Sarah
Sarah Oates Professor of Political Communication School of Social and Political Sciences Adam Smith Building University of Glasgow Glasgow G12 8RT
Email: sarah.oates@glasgow.ac.uk Website: www.media-politics.com <http://www.media-politics.com/> Telephone: (0)141 330 5124 The University of Glasgow, charity number SC004401
Dear Sarah
I am using Zotero, which is a free add-on to Firefox: http://www.zotero.org/
Good thing about it: it takes captures of webpages as they are at any particular moment and records the URL, date of access, etc. (Zotero was originally developed as a tool to create and share bibliographies). Files are easy to organise into folders and subfolders, and I think there is an option to have your archive stored on the Zotero site so that it can be shared (I haven't explored this, as I work alone on my project).
Not so good thing: it can't download videos, so you will need to download those separately.
I am sure there are other, better ways, so I look forward to other responses.
Adi
--
Dr. Adi Kuntsman Leverhulme Early Career Fellow Research Institute for Cosmopolitan Cultures The University of Manchester Second Floor, Arthur Lewis Building, room 2.007 Oxford Road, Manchester M13 9PL, UK http://www.socialsciences.manchester.ac.uk/ricc/index.html http://adi.kuntsman.googlepages.com
________________________________ From: Sarah Oates <s.oates@lbss.gla.ac.uk> To: air-l@listserv.aoir.org Sent: Thu, October 21, 2010 3:25:47 PM Subject: [Air-L] the best way to archive web material?
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
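Adi's description of a Zotero capture (a copy of the page plus its URL and date of access) can be approximated outside the browser. The sketch below is not how Zotero stores its snapshots; it is a minimal Python example, assuming you already have the page's bytes (for instance from `urllib.request.urlopen(url).read()`), that writes the HTML next to a small JSON record. The function name `save_snapshot` and the file-naming scheme are hypothetical, and it saves only the HTML, not images or video.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_snapshot(url: str, content: bytes, archive_dir: Path,
                  accessed: Optional[datetime] = None) -> Path:
    """Hypothetical helper: store one capture of `url` plus a JSON metadata record."""
    # Normalise the access time to UTC so the "Z" in the file name is truthful.
    accessed = (accessed or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = accessed.strftime("%Y%m%dT%H%M%SZ")
    # A short hash of the URL groups all captures of one page together.
    url_key = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    archive_dir.mkdir(parents=True, exist_ok=True)
    page_path = archive_dir / f"{url_key}_{stamp}.html"
    page_path.write_bytes(content)
    # Sidecar record with the URL and date of access, in the spirit of Zotero's metadata.
    meta = {"url": url, "accessed": accessed.isoformat(), "bytes": len(content)}
    page_path.with_suffix(".json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return page_path
```

Because each file name combines a hash of the URL with a UTC timestamp, repeated captures of the same page sort chronologically, which suits collecting the same pages across time.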
Sarah
I use WebCite http://www.webcitation.org/ and Evernote http://www.evernote.com/.
Cheers
WL
On 22/10/2010, at 1:33 AM, Adi Kuntsman wrote:
I've been using HTTrack http://www.httrack.com/ (suggested by Jeremy) for a while. Unfortunately, it sometimes breaks off the crawling process at the very beginning. I'm not sure why it does so, but I suppose it is related to the structure of the website, or of the portion of the website, you are trying to download for offline browsing.
I've switched to the HTML Spider in Free Download Manager http://www.freedownloadmanager.org/ and haven't faced any problems since then. Adjusting the crawl settings in this spider (depth, including/excluding images, including/excluding files, etc.) is much easier than adjusting them in HTTrack.
In an early release I used, HTTrack was silently fetching the whole of Yahoo for me =)
/Sari
On Thu, Oct 21, 2010 at 11:16 PM, WL Wong <wwon8281@uni.sydney.edu.au> wrote:
-- -----BEGIN PGP PUBLIC KEY BLOCK----- Version: PGP Desktop 9.5.0 (Build 1202) mQCNBEgtLgoBBACqQYBgYCY40SblWGbTcrvwCngPrjx2CNtcfR/ATvZ4mbF/xHgy SzV6+XRs76hgAv0K2AG+i4UjDwRRJfb8HPe8DVtsyOQNPFtZO9Gk700aD7MndwlF m7HrGwc5uBfnH6iUws1o/Z1J7i+5fUfk3mew/b3532WxLvDi+QUSxlsKdQARAQAB tCRTYXJpIEhhaiBIdXNzZWluIDxhbmd5am9vQHlhaG9vLmNvbT6JAPIEEAECAFwF AkgtL4UwFIAAAAAAIAAHcHJlZmVycmVkLWVtYWlsLWVuY29kaW5nQHBncC5jb21w Z3BtaW1lCAsJBwgDAgEKAhkBBRsDAAAABBYDAgEFHgEAAAAHFQgCCgkDAQAKCRCy i48IPBmZbZoNA/0ckC3rWxoe/Jf66+YauicNtH8zZmr9Y7dypV+yZm/vrkAtffcY 1VKMhj9YMpqwzylP/nomuG211bWoGhMzAb7CAho1tS3KXtUNZzLj1U5hvRtWfrWc dipwY3YJbnaFdkzIi9xj3HMZ4BKHQZtBKjwru6HafQF2smokS8yjxTKELA== =9/vk -----END PGP PUBLIC KEY BLOCK-----
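The crawl settings Sari mentions (depth, and whether to include images or other files), and the risk of a crawler wandering across an entire portal, can be illustrated with a small breadth-first crawler. This is a Python sketch under stated assumptions, not how HTTrack or Free Download Manager work internally: `crawl`, `LinkParser`, and the image-extension list are hypothetical, and the page-fetching function is passed in so the logic can be tried without network access. Restricting links to the starting host is one simple guard against silently fetching a whole site like Yahoo.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse

# Extensions treated as images when include_images is False (an assumption;
# real tools let you configure such lists).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class LinkParser(HTMLParser):
    """Collects link targets from <a href> and <img src> attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and ((tag == "a" and name == "href") or (tag == "img" and name == "src")):
                self.links.append(value)


def crawl(start: str, fetch: Callable[[str], Optional[str]],
          max_depth: int = 1, include_images: bool = False) -> Dict[str, int]:
    """Hypothetical crawler: returns every URL visited, mapped to its link depth."""
    host = urlparse(start).netloc
    depth_of: Dict[str, int] = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        html = fetch(url)  # None means "not HTML" or "fetch failed"
        # Every queued URL is fetched, but links are only followed from
        # pages shallower than max_depth.
        if html is None or depth >= max_depth:
            continue
        parser = LinkParser()
        parser.feed(html)
        for raw in parser.links:
            link, _fragment = urldefrag(urljoin(url, raw))
            if urlparse(link).netloc != host:
                continue  # stay on the starting site
            if not include_images and link.lower().endswith(IMAGE_EXTENSIONS):
                continue
            if link not in depth_of:
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of
```

In real use, `fetch` might wrap `urllib.request.urlopen`, return `None` for non-HTML responses, save each response to disk, and consult robots.txt first.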
HTTrack tends to respect robots.txt, which will prevent spidering, and that may have been the issue there, but it may have been something else; I'd have to look at the site :) Researchers should respect robots.txt too, I think, though archivists need respect it only to a lesser degree. There are discussions of robots.txt in the list archives that, as I recall, go on for some pages.
Mostly in my work I'm interested in text, and some design, and images/videos within that text, but mostly text. The reason I like HTTrack and wget (wget is mostly what I use) is that they grab what I need. Sometimes you might need more. danah asked me off-list about JavaScript-based sites, and those do cause major issues, as do sites that encrypt their HTML through JavaScript, but for most sites you can just grab the rendered HTML.
I archive my data sets primarily so that others can use and verify them if they want, not for my own analysis; you aren't really doing science unless you make the data accessible and analyzable by people whose opinions may differ. Granted, I don't usually release them until something is published from them, except for the Wikipedia data, which I never published anything from and just put online.
On Oct 21, 2010, at 6:33 PM, Sari wrote:
Jeremy Hunsinger Center for Digital Discourse and Culture Virginia Tech Words are things; and a small drop of ink, falling like dew upon a thought, produces that which makes thousands, perhaps millions, think. --Byron
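Jeremy's point that HTTrack honours robots.txt (GNU wget also obeys it by default during recursive downloads) suggests checking a site's rules before assuming a crawler is broken. Python's standard `urllib.robotparser` module can do this; the sketch below parses robots.txt text supplied as a string, so nothing is fetched. The helper name `allowed_urls` and the user-agent string `research-archiver` are hypothetical.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, urls: Iterable[str],
                 user_agent: str = "research-archiver") -> List[str]:
    """Hypothetical helper: keep only the URLs robots.txt permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules already in memory
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```

Against a live site you would instead call `set_url("http://example.org/robots.txt")` and then `read()` on the parser before calling `can_fetch`.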
Hi Sarah,
I have used wget successfully:
http://gnuwin32.sourceforge.net/packages/wget.htm (download)
http://www.gnu.org/software/wget/ (general website)
http://www.gnu.org/software/wget/manual/html_node/index.html (complete manual)
It is free and open source. However, it runs from the command line, which can be a factor for some people.
--Joe
------------------------------------------------------------ Joseph John Williams Associate Professor of Rhetoric and Writing University of Arkansas at Little Rock
-----Original Message----- From: air-l-bounces@listserv.aoir.org [mailto:air-l-bounces@listserv.aoir.org] On Behalf Of Sarah Oates Sent: Thursday, October 21, 2010 9:26 AM To: air-l@listserv.aoir.org Subject: [Air-L] the best way to archive web material?
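Since, as Joe notes, the command line can be a hurdle, it may help to see which wget options correspond to "save this part of the site, with its images, for offline reading". The Python sketch below only assembles the argument list; the helper name and default values are assumptions, and option availability depends on the wget version (`--warc-file` needs wget 1.14 or later, and releases before 1.12 call `--adjust-extension` `--html-extension`).

```python
from typing import List, Optional


def wget_mirror_args(url: str, depth: int = 2, wait_seconds: int = 1,
                     warc_file: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build a wget command line for an offline copy."""
    args = [
        "wget",
        "--recursive",             # follow links...
        f"--level={depth}",        # ...but only this many hops from the start page
        "--page-requisites",       # also fetch images, CSS etc. needed to display pages
        "--convert-links",         # rewrite links so the copy browses offline
        "--adjust-extension",      # save HTML pages with a .html suffix
        "--no-parent",             # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to spare the server
    ]
    if warc_file:
        # wget 1.14+: additionally record everything in a WARC archive file.
        args.append(f"--warc-file={warc_file}")
    args.append(url)
    return args
```

`subprocess.run(wget_mirror_args("http://example.org/blog/"), check=True)` would then run it, provided a `wget` executable is on the PATH.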
On Thursday, October 21, 2010, Sarah Oates wrote:
Does anyone have a recommendation for software to archive web material?
When I was trying to pull various sources together, including the Nupedia email archives, which could only be found on the Wayback Machine, I used wget as I describe here: http://reagle.org/joseph/blog/method/nupedia-l-archives
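Joseph's post describes his own wget procedure for the Nupedia archives; the sketch below does not reproduce it. It only shows the Wayback Machine's address pattern, `/web/<YYYYMMDDhhmmss>/<original URL>`, which redirects to the capture nearest that timestamp, so a list of such addresses can be handed to wget or another downloader. The helper name `wayback_url` is hypothetical.

```python
from datetime import datetime


def wayback_url(original_url: str, when: datetime) -> str:
    """Hypothetical helper: Wayback Machine address for original_url near `when`."""
    # 14-digit timestamp: year, month, day, hour, minute, second.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{original_url}"
```

For example, `[wayback_url(u, when) for u in page_list]` gives one address per page for a chosen moment, which fits the short, event-focused periods Sarah describes.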
Thank you, this is really useful! And thanks to everyone for the other suggestions as well; it's just amazing how people share knowledge here ... it makes being an academic fun again.
Sarah
________________________________ From: Joseph Reagle [mailto:joseph.2008@reagle.org] Sent: Thu 21/10/2010 15:40 To: air-l@listserv.aoir.org Cc: Sarah Oates Subject: Re: [Air-L] the best way to archive web material?
Hi Sarah,
I'm not sure the program does everything you need, but I have been using a Firefox add-on called ScrapBook for a while to archive the web:
http://amb.vis.ne.jp/mozilla/scrapbook/
You can save and manage collections of websites/pages, and it is quite powerful in my view, as it has a few useful management tools. You can easily back up collections and transfer them between ScrapBook installations (so you can share the data with colleagues).
I am not sure it does automatic crawling, but you might find something that can work with it. You can download it and play with it for a while :) and later decide if it is worth using.
I hope this is of some help
S.
-- My institutional page (not always updated with latest achievements) http://www.nuim.ie/nirsa/people/postdocs/stefano_de_paoli.shtml === New Fibreculture Paper on Cheating in MMORPGs: http://sixteen.fibreculturejournal.org/the-assemblage-of-cheating-how-to-stu...
-- My institutional page (not always updated with latest achievements) http://www.nuim.ie/nirsa/people/postdocs/stefano_de_paoli.shtml === Fibreculture Paper on Cheating in MMORPGs: http://sixteen.fibreculturejournal.org/the-assemblage-of-cheating-how-to-stu... === New on the Social Construction of GIS http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V9K-517GVGD-1&_us...
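Stefano mentions backing up collections and moving them to colleagues, and Sarah wants an archive she can share. One generic way (not a ScrapBook feature) to let a recipient confirm that a copied archive folder arrived intact is a SHA-256 manifest. The Python sketch below assumes the archive is an ordinary directory tree; `build_manifest` and `verify_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Hypothetical helper: map each file path (relative to root) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Hypothetical helper: list files that are missing or whose contents changed."""
    current = build_manifest(root)
    return [name for name, digest in manifest.items() if current.get(name) != digest]
```

The manifest could be saved as JSON inside the shared folder; a colleague would rerun `verify_manifest` on their copy and expect an empty list.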
Hi Sarah,
Like Adi, I have also been using Zotero pretty extensively as part of my research, not as it was intended, as a bibliographic reference tool, but as a tool to take snapshots of web pages as I experienced them, organize them, annotate them, and tag them for later use. I have not experimented with archiving these online, for various reasons, but there is a way to sync with an online database of your materials. My experience has been mostly positive, but there are some things worth considering before you dive in too deeply. Feel free to contact me directly if you have any questions.
Regards,
Dan
------------------------------------ Dan Perkel PhD Candidate School of Information, Berkeley Center for New Media UC Berkeley http://people.ischool.berkeley.edu/~dperkel
On Thu, Oct 21, 2010 at 8:08 AM, Stefano De Paoli <Stefano.DePaoli@nuim.ie> wrote:
I'd be happy to give you and your team a free http://www.Dedoose.com account if you think it might be helpful. It's a very powerful web-based research tool.
~ Jason
-----Original Message----- From: air-l-bounces@listserv.aoir.org [mailto:air-l-bounces@listserv.aoir.org] On Behalf Of Sarah Oates Sent: Thursday, October 21, 2010 7:26 AM To: air-l@listserv.aoir.org Subject: [Air-L] the best way to archive web material?
participants (10)
- Adi Kuntsman
- Dan Perkel
- Jason Taylor
- jeremy hunsinger
- Joe Williams
- Joseph Reagle
- Sarah Oates
- Sari
- Stefano De Paoli
- WL Wong