archiving entire blogs - Air-L - lists.aoir.org

C Sosnowy

20 Jan 2012

2:31 a.m.

I would like to be able to archive an entire blog (and ideally be able to download it) for analysis. I've looked at WebCite and Zotero, but neither seems to have this capability. Does anyone know of another way?

Collette Sosnowy, M.A., Ph.D. Candidate
Environmental Psychology Program
The Graduate Center of the City University of New York

Stuart Shulman

20 Jan 2012

8:16 a.m.

WordPress has a feature for this: http://en.blog.wordpress.com/2006/06/12/xml-import-export/

If it is a WordPress blog, you can ask the owner to create a bulk export in XML.

Better still is the new offering from GNIP: http://blog.gnip.com/gnip-and-automattic-make-whole-new-universe-of-data-ava...

The future is bright for getting big collections.

~Stu
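
Once the owner has sent a bulk export, the posts can be pulled out of the XML for analysis. The following is only a minimal sketch, assuming the export is an RSS-style file in which each entry is an `item` element with the body in `content:encoded`; the function name `read_posts` and the `SAMPLE` document are made up for illustration, and a real export may also list pages and attachments as items.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used for the full post body in RSS-style exports.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Tiny hand-written stand-in for an export file (illustration only).
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Hello world</title>
      <link>http://blog.example.org/hello-world/</link>
      <pubDate>Fri, 20 Jan 2012 08:16:00 +0000</pubDate>
      <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def read_posts(xml_text):
    """Return one dict per <item> with its title, link, date, and body."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            "body": item.findtext(CONTENT_NS + "encoded", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in read_posts(SAMPLE):
        print(post["date"], post["title"])
```

For a real file, read it from disk and pass the text to `read_posts`; the resulting list can then be written out in whatever form the analysis software accepts.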

_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org

Join the Association of Internet Researchers: http://www.aoir.org/

--
Dr. Stuart W. Shulman (people.umass.edu/stu)
Editor Emeritus, JITP (http://www.jitp.net)
Director, QDAP-UMass (http://www.umass.edu/qdap)
Founder and CEO, Texifter (http://www.texifter.com)
LinkedIn: http://www.linkedin.com/pub/stuart-shulman/10/351/899
Twitter: http://twitter.com/#!/StuartWShulman

Jarkko Moilanen

8:40 a.m.

hi, Quoting Stuart Shulman <stuart.shulman@gmail.com>:

WORDPRESS has a feature for this:

http://en.blog.wordpress.com/2006/06/12/xml-import-export/

If it is a WORDPRESS blog, you can ask the owner to create a bulk export in XML.

If you are archiving a blog whose export functions you don't have access to, I would use 'wget'. Its recursive download options can fetch everything, no matter how deep the structure is. http://en.wikipedia.org/wiki/Wget

/Jarkko
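
A recursive wget run of the kind described above could look roughly like this. It is a sketch rather than a recommended recipe: the helper names `build_wget_command` and `mirror_blog`, the example URL, and the one-second pause are assumptions for illustration, and the options are the standard GNU wget long options for mirroring a site.

```python
import subprocess


def build_wget_command(blog_url, dest_dir, wait_seconds=1):
    """Assemble a wget call that mirrors a blog for offline reading."""
    return [
        "wget",
        "--mirror",            # recursive download with unlimited depth
        "--convert-links",     # rewrite links so the local copy is browsable
        "--adjust-extension",  # save pages with an .html extension
        "--page-requisites",   # also fetch images and stylesheets
        "--no-parent",         # stay below the starting URL
        f"--wait={wait_seconds}",           # pause between requests
        f"--directory-prefix={dest_dir}",   # where to store the copy
        blog_url,
    ]


def mirror_blog(blog_url, dest_dir):
    """Run wget; raises CalledProcessError if wget reports a failure."""
    subprocess.run(build_wget_command(blog_url, dest_dir), check=True)


if __name__ == "__main__":
    # Hypothetical address; print the command instead of running it.
    print(" ".join(build_wget_command("http://blog.example.org/", "archive")))
```

Building the command separately from running it makes it easy to inspect the options first; the pause between requests is there to avoid hammering the blog's server.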

Jarkko Moilanen (+358 45 8877 150)
M.Soc.Sc. (Political Science)
PhD Student, Information studies, University of Tampere
Blog: http://blog.ossoil.com/
Founder of Hackerspace 5w, Finland, Tampere - http://5w.fi
Founder of MeeGo Network Finland - http://meegonetwork.fi
Founder of Open Coral - http://open-coral.org
Founder of Finnish Biohacker community - http://biohakkeri.fi

Wendy Christensen

2:05 p.m.

Hi All,

I used SiteSucker to download entire blogs (http://www.sitesucker.us/home.html). It worked very well for both Blogspot and WordPress blogs, but excess files had to be cleaned up and deleted before analysis.

I ran into issues trying to find content analysis software that would allow me to code HTML files. If anyone has suggestions for software for qualitative analysis of websites and/or downloaded HTML files, I'd love to hear about it!

Best, Wendy

Wendy M. Christensen, Ph.D.
Visiting Assistant Professor
Department of Sociology and Anthropology
Bowdoin College
wchriste@bowdoin.edu
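
One possible workaround for software that cannot code HTML is to convert the downloaded pages to plain text first. The sketch below uses only Python's standard `html.parser`; the class and function names are invented for illustration, and it makes the simplifying assumption that dropping `script` and `style` content and keeping every other text fragment on its own line is good enough for coding.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML page, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Some text</p></body></html>"
    print(html_to_text(page))
```

Because each fragment lands on its own line, text inside inline tags such as links or bold runs is split from its sentence; joining fragments with spaces instead is a reasonable alternative if sentence integrity matters more than structure.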

יוחנן ועקנין

2:13 p.m.

Hi All,

I use Web Content Extractor from Newprosoft (http://www.newprosoft.com/web-content-extractor.htm). The good thing is that I can extract specific fields (like date, name, user, or the whole post), apply some manipulations through JavaScript code, and export the whole dataset to Excel.

Regards,
Yohanan Ouaknine
MA student, Information studies (Knowledge management), Bar Ilan University (Israel)

--
יוחנן ועקנין
Yohanan Ouaknine
050-6279777
yohanan.ouaknine@ois.co.il
http://il.linkedin.com/in/yohananouaknine

Taylor, Nicholas A.

23 Jan 2012

2:30 p.m.

Hello Collette,

the open source website copier HTTrack (http://www.httrack.com/) is a good option for archiving websites, though it doesn't do much in the way of analysis. It saves downloaded pages as static, linked HTML files, which you can then traverse with your browser or easily access with other analytical tools.

~Nicholas

Nicholas Taylor | Information Technology Specialist | Library of Congress Web Archiving (http://www.loc.gov/webarchiving/)
Phone: (202) 707-3940 | E-mail: ntay@loc.gov | Twitter: @nullhandle (http://twitter.com/nullhandle)
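
As a rough idea of how other tools might pick up such a folder of static files, the sketch below walks a mirror directory and lists every page with its title. The function name `list_pages` is made up, and it assumes the pages were saved with an `.html` extension and carry a `<title>` element; it is not tied to any particular copier's folder layout.

```python
import re
from pathlib import Path

# Simple pattern for the page title; adequate for an inventory,
# not a full HTML parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def list_pages(mirror_dir):
    """Return (relative path, title) for every .html file under mirror_dir."""
    pages = []
    for path in sorted(Path(mirror_dir).rglob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        match = TITLE_RE.search(html)
        title = match.group(1).strip() if match else ""
        pages.append((str(path.relative_to(mirror_dir)), title))
    return pages


if __name__ == "__main__":
    # "archive" is a placeholder for the folder the copier wrote to.
    for rel_path, title in list_pages("archive"):
        print(rel_path, "|", title)
```

An inventory like this is a convenient first check that the whole blog was captured before any cleaning or coding begins.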

"Rußmann, Uta"

2:41 p.m.

Hi,

I always use Offline Explorer Pro (MetaProducts Offline Explorer Pro) to archive websites or blogs. It's not free like HTTrack, but it gives you more options. The license fee is about EUR 75.00.

~Uta

--
Dr. Uta Russmann
Post-Doc Researcher
AUTNES - Innsbruck, Austrian National Election Study
Institut für Politikwissenschaft, Universität Innsbruck
ICT-Technologiepark, 1.OG Nord, Technikerstraße 21 a, A-6020 Innsbruck
Phone: +43 (0) 512 / 507 - 38208
Fax: +43 (0) 512 / 507 - 38299
E-mail: Uta.Russmann@uibk.ac.at

participants (7)

"Rußmann, Uta"
C Sosnowy
Jarkko Moilanen
Stuart Shulman
Taylor, Nicholas A.
Wendy Christensen
יוחנן ועקנין