I would like to archive an entire blog (and ideally download it) for analysis. I've looked at WebCite and Zotero, but neither seems to have this capability. Does anyone know of another way?

Collette Sosnowy M.A., Ph.D. Candidate
Environmental Psychology Program
The Graduate Center of the City University of New York
WordPress has a feature for this: http://en.blog.wordpress.com/2006/06/12/xml-import-export/

If it is a WordPress blog, you can ask the owner to create a bulk export in XML.

Better still is the new offering from GNIP: http://blog.gnip.com/gnip-and-automattic-make-whole-new-universe-of-data-ava...

The future is bright for getting big collections.

~Stu

--
Dr. Stuart W. Shulman people.umass.edu/stu
Editor Emeritus, JITP jitp.net <http://www.jitp.net>
Director, QDAP-UMass umass.edu/qdap <http://www.umass.edu/qdap>
Founder and CEO, Texifter texifter.com <http://www.texifter.com>
LinkedIn: linkedin.com/pub/stuart-shulman/10/351/899 <http://www.linkedin.com/pub/stuart-shulman/10/351/899>
Twitter: twitter.com/#!/StuartWShulman <http://twitter.com/#%21/StuartWShulman>

_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
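For a WordPress blog whose owner has supplied such an XML export, the posts can be pulled out with a short script. The following is a minimal Python sketch, not WordPress's own tooling. It assumes the export is an RSS-style file in which each post is an "item" element with its body in a "content:encoded" element; the function name read_posts and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used for <content:encoded> in RSS-style exports.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def read_posts(xml_text):
    """Return one dict per <item> found in the export text."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            "body": item.findtext(CONTENT_NS + "encoded", default=""),
        })
    return posts


# Hypothetical, hand-written sample standing in for a real export file.
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/first-post/</link>
      <pubDate>Thu, 19 Jan 2012 21:31:00 +0000</pubDate>
      <content:encoded><![CDATA[<p>Hello, world.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
"""

posts = read_posts(SAMPLE)
for post in posts:
    print(post["date"], "|", post["title"], "|", post["link"])
```

For a real export saved to disk, the text could be read from the file first and passed to read_posts. Post bodies come back as HTML, so they may still need converting to plain text before analysis.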
Hi,

If you are archiving a blog and do not have access to its export functions, I would use 'wget'. It has options for fetching everything recursively, no matter how deep the site structure is. http://en.wikipedia.org/wiki/Wget

/Jarkko
**************************** Jarkko Moilanen (+358 45 8877 150) M.Soc.Sc. (Political Science) PhD Student, Information studies, University of Tampere Blog: http://blog.ossoil.com/ ------------------------- Founder of Hackerspace 5w, Finland, Tampere - http://5w.fi Founder of MeeGo Network Finland, http://meegonetwork.fi Founder of Open Coral - http://open-coral.org Founder of Finnish Biohacker community, http://biohakkeri.fi ****************************
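As a rough illustration of the wget suggestion, the Python sketch below assembles a recursive wget command. The flags are standard GNU Wget options, but the particular combination, the function name build_wget_command, the one-second wait, and the URL and directory names are assumptions made for this example, not settings recommended in the thread.

```python
import shlex


def build_wget_command(blog_url, target_dir="blog-archive", wait_seconds=1):
    """Build (but do not run) a wget command that mirrors a site."""
    return [
        "wget",
        "--mirror",            # recursive download, unlimited depth
        "--page-requisites",   # also fetch images and stylesheets
        "--convert-links",     # rewrite links so pages work offline
        "--adjust-extension",  # save pages with an .html extension
        "--no-parent",         # never climb above the starting URL
        f"--wait={wait_seconds}",            # pause between requests
        f"--directory-prefix={target_dir}",  # where to save the files
        blog_url,
    ]


# Hypothetical blog address, used only to show the resulting command.
command = build_wget_command("http://blog.example.org/")
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run(command, check=True) on a machine where wget is installed. Pausing between requests is a courtesy to the blog's server.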
Hi All,

I used SiteSucker (http://www.sitesucker.us/home.html) to download entire blogs. It worked very well for both Blogspot and WordPress blogs, but excess files had to be cleaned up and deleted before analysis.

I ran into issues trying to find content analysis software that would allow me to code HTML files. If anyone has suggestions for software for qualitative analysis of websites and/or downloaded HTML files, I'd love to hear about it!

Best,
Wendy

Wendy M. Christensen, Ph.D.
Visiting Assistant Professor
Department of Sociology and Anthropology
Bowdoin College
wchriste@bowdoin.edu
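One possible workaround for analysis software that will not accept HTML is to convert the downloaded pages to plain text first. The sketch below uses only Python's standard library; the TextExtractor class, the html_to_text function, the list of block-level tags, and the sample page are all assumptions made for illustration, and real blog pages would also contain navigation and sidebar text that this does not filter out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}
    BLOCK = {"title", "p", "div", "li", "br", "tr",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # end of a block: start a new line

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the page text, one line per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace and drop empty lines.
    cleaned = [" ".join(line.split()) for line in lines]
    return "\n".join(line for line in cleaned if line)


# Hypothetical page standing in for a downloaded blog post.
sample = (
    "<html><head><title>Post</title><style>p{color:red}</style></head>"
    "<body><h1>My day</h1><p>It was <b>fine</b>.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

The resulting text files can then be imported into most qualitative coding packages, at the cost of losing layout, images, and links.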
Hi All,

I use Web Content Extractor from Newprosoft (http://www.newprosoft.com/web-content-extractor.htm). The good thing is that I can extract specific fields (such as the date, name, user, or the whole post), add some manipulations through JavaScript code, and export the whole dataset to Excel.

Regards,
Yohanan Ouaknine
MA student, Information Studies (Knowledge Management), Bar Ilan University (Israel)
-- יוחנן ועקנין Yohanan Ouaknine 050-6279777 yohanan.ouaknine@ois.co.il http://il.linkedin.com/in/yohananouaknine
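The same fields-to-spreadsheet idea can be approximated without commercial software by writing extracted fields to a CSV file, which Excel opens directly. This Python sketch is not how Web Content Extractor works; the field names, the write_posts_csv function, and the two sample records are invented for illustration.

```python
import csv
import io

# Assumed column layout; adjust to whatever fields were extracted.
FIELDS = ["date", "author", "text"]


def write_posts_csv(posts, out_file):
    """Write one row per post to an open text file or buffer."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        writer.writerow(post)


# Hypothetical records standing in for fields pulled from a blog.
posts = [
    {"date": "2012-01-19", "author": "alice", "text": "First post, with a comma."},
    {"date": "2012-01-20", "author": "bob", "text": "Second post."},
]

buffer = io.StringIO()
write_posts_csv(posts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file rather than an in-memory string, the function can be given a file opened with open("posts.csv", "w", newline="", encoding="utf-8-sig"). The utf-8-sig encoding helps Excel recognize non-ASCII characters.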
Hello Collette,

The open source website copier HTTrack (http://www.httrack.com/) is a good option for archiving websites, though it doesn't do much in the way of analysis. It saves downloaded pages as static, linked HTML files, which you can then traverse with your browser or easily access with other analytical tools.

~Nicholas

Nicholas Taylor | Information Technology Specialist | Library of Congress Web Archiving <http://www.loc.gov/webarchiving/>
Phone: (202) 707-3940 | E-mail: ntay@loc.gov | Twitter: @nullhandle <http://twitter.com/nullhandle>
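Because a site copier leaves an ordinary folder of static files, a first analytical step can be a simple inventory of what was saved. The sketch below is a generic directory walk in Python, not a feature of HTTrack; the inventory function and the small temporary folder it is run on are assumptions made so the example is self-contained.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory(mirror_dir):
    """Count saved files by extension and list the HTML pages."""
    root = Path(mirror_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    by_type = Counter(p.suffix.lower() or "(none)" for p in files)
    pages = sorted(p for p in files if p.suffix.lower() in {".html", ".htm"})
    return by_type, pages


# Build a tiny stand-in for a mirrored blog so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "blog.example.org"
    (site / "2012").mkdir(parents=True)
    (site / "index.html").write_text("<h1>Home</h1>")
    (site / "2012" / "post.html").write_text("<p>Post</p>")
    (site / "style.css").write_text("p{}")

    by_type, pages = inventory(site)
    page_names = [p.relative_to(site).as_posix() for p in pages]

print(dict(by_type))
print(page_names)
```

Pointing inventory at the real download folder gives a quick view of how many pages there are and which other file types could be set aside before analysis.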
Hi,

I always use MetaProducts Offline Explorer Pro to archive websites or blogs. Unlike HTTrack it is not free, but it gives you more options. The license fee is about 75.00 euros.

~Uta

--------------------------------------------------
Dr. Uta Russmann
Post-Doc Researcher
AUTNES - Innsbruck
Austrian National Election Study
Institut für Politikwissenschaft
UNIVERSITÄT INNSBRUCK
ICT-Technologiepark, 1.OG Nord
Technikerstraße 21 a, A-6020 Innsbruck
Phone +43 (0) 512 / 507 - 38208
Fax +43 (0) 512 / 507 - 38299
E-mail Uta.Russmann@uibk.ac.at
participants (7)
- "Rußmann, Uta"
- C Sosnowy
- Jarkko Moilanen
- Stuart Shulman
- Taylor, Nicholas A.
- Wendy Christensen
- יוחנן ועקנין