Chinese language social media data mining tools
Hello clever AOIR folks

Asking for postgrad students: any recommendations of social media data mining tools that work on Chinese social media platforms / with Chinese languages?

Thanks!

Helen

--
Professor Helen Kennedy, Chair in Digital Society
Department of Sociological Studies / Faculty of Social Sciences
Elmfield, Northumberland Road
Sheffield S10 2TU
T: 0114 2226488
E: h.kennedy@sheffield.ac.uk
LATEST ARTICLE: 'The Feeling of Numbers: emotions in everyday engagements with data and their visualisation' <http://journals.sagepub.com/doi/abs/10.1177/0038038516674675?journalCode=soca>, *Sociology*, 2017.
Richard Rogers's 2015 book *Digital Methods* (MIT Press) may have some suggestions regarding Chinese digital studies.
https://www.amazon.com/Digital-Methods-Press-Richard-Rogers/dp/026252824X/re...

_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
CALL FOR PARTICIPATION
====================================================================================
1st LAGOS SUMMER SCHOOL IN DIGITAL HUMANITIES
(An International Workshop in Digital Humanities & Digital Scholarship)
July 10-15, 2017
University of Lagos, Lagos, Nigeria

Theme: THE HUMANITIES IN THE AGE OF TECHNOLOGY: A MULTIDISCIPLINARY EXPLORATION OF DIGITAL TOOLS FOR INNOVATION AND DEVELOPMENT

The Digital Humanities Research Unit, University of Lagos, Nigeria, in collaboration with Global Outlook::Digital Humanities (GO::DH) & The Force11 Future Commons Working Group, is pleased to announce the programme of the 1st Lagos Summer School in Digital Humanities (LSSDH-2017), which will take place at the University of Lagos, Lagos, Nigeria from July 10-15, 2017.

LSSDH-2017 is organised to raise and train a new generation of scholars and researchers in the Humanities in the Global South, one that is equipped with digital tools to conduct cutting-edge research and compete on the global stage. It also creates a multidisciplinary forum for debate, conversation and discussion on the role of digital technologies in research, society and development.

This 2017 International Workshop in Digital Humanities & Digital Scholarship focuses primarily on approaches, methods and tools in Digital Humanities. With renowned experts and scholars drawn from Canada, USA and Germany, it offers 10 workshop streams designed for doctoral students, junior scientists and early career scholars and researchers. It is primarily designed to train and equip beginners in Digital Humanities, that is, participants without a prior background in the new field. It also includes lectures, project presentations and roundtable discussions.

Workshop attendees will also have the opportunity to present their own research and discuss with colleagues and experts how to utilise different kinds of digital approaches and tools for their research projects.
They will also learn some basic principles in digital scholarship (scholarly writing). Although priority will be given to doctoral students and junior scholars, the Lagos Summer School in Digital Humanities equally offers training to anyone willing to explore a range of opportunities in Digital Humanities, including academics at all career stages, postgraduate students, project managers, and people who work in IT, libraries, and cultural studies.

Workshop Topics
- A Basic Introduction to Digital Humanities: Concepts and Tools
- Text Analysis and Lexical Computing: Exploring Digital Tools and Methods
- Digital Humanities Project Ideation and Development in Africa
- Basic Programming for the Humanities
- The Scholarly Commons: Principles and Practices in Digital Scholarship

Public Lectures
The Summer Institute provides the space for a lecture to be delivered every day by experienced scholars and experts. The lecture is directed not only at the participants of the Summer School but also at other academics, students and the interested public. These lectures are designed to provide a theoretical and practical framework for the exploration and discussion of the relevance and benefits of Digital Humanities. The lectures will also present possible windows for collaborations between the Sciences and the Humanities for innovation and development.

Project Presentations
Doctoral students, young scholars and researchers are encouraged to present their research projects or any ongoing projects related to the theme(s) of the Summer School / Digital Humanities. The session also offers a great opportunity for software developers and corporate organisations to present their products, software, and applications that are relevant to the academic, research and policy communities. This session is also aimed at demonstrating some of the possibilities of the implementation of technology-based research in the Humanities within the larger academic and social contexts.
Roundtable Discussion
The Summer School will also provide a platform for scholars within the intersecting disciplines and corporate organisations, i.e. Humanities, Engineering, Computer Science as well as ICT-based and Telecommunication companies, to discuss important topics and collaborative projects in Digital Humanities.

How to apply
In order to guarantee a space, interested participants should send a brief personal statement and CV to dhunilag@gmail.com or apply on the Registration page. Applications will be accepted until 31st May, 2017. Please note that the limited slots available will be allocated on a first-come-first-served basis by the LOC. It is also compulsory that you bring your own device (e.g. laptop, tablet) when coming for the workshops.

For more information, email the convener, Tunde Opeibi (dhunilag@gmail.com; bopeibi@unilag.edu.ng), or Ayodeji Adedara (adedejiadedara@gmail.com) or Makanjuola Ogunleye (dhunilag@gmail.com; lssdh.unilag@gmail.com).

Keynote Speaker & Lead Workshop Facilitator
Professor Dan O’Donnell
Department of English and University Library, University of Lethbridge, Canada
Founding Chair, Global Outlook::Digital Humanities (GO::DH)
Vice President, Force 11

Other Confirmed International Guest Speakers/Workshop Facilitators
- Professor Ronald J. Stephens, Purdue University, Indianapolis
- Dr Presley Ifukor, University of Muenster, Germany / Government of Alberta, Canada
- Dr Paige Morgan, University of Miami, Florida

Lead Paper Presenters
- Professor Segun Awonusi, University of Lagos
- Professor Duro Oni, University of Lagos
- Professor Rotimi Taiwo, Obafemi Awolowo University, Ile-Ife
- Professor Olusoji Ilori, Faculty of Science, University of Lagos

Further information is available at: www.dhunilag.com

Tunde Opeibi, PhD, Convener of the Lagos Summer School in Digital Humanities
I am not aware of any ready-to-use data mining tools. You will probably need to develop the tool yourself. For a project such as Weiboscope [1], we need to gather data from the API first and then do the data analysis.

The tricky part about the Chinese language is that it is not space-delimited, so one cannot tokenize a sentence into words the way one can in English (or in other space-delimited languages such as French or German). This can be partially solved with text segmenters such as the Stanford NLP toolkits or Jieba.

[1] Fu, Chan, Chau. Assessing Censorship on Microblogs in China. https://hub.hku.hk/bitstream/10722/183851/1/content.pdf?accept=1
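The segmentation problem raised in the Weiboscope reply can be illustrated with a toy sketch. This is not how Jieba or the Stanford segmenter work internally (those rely on statistical models and far larger dictionaries); it is a plain forward-maximum-matching pass over a tiny, made-up vocabulary. The names `segment`, `TOY_VOCAB` and `max_word_len` are assumptions introduced only for illustration.

```python
# Toy illustration only: a hand-made vocabulary, not a real Chinese lexicon.
TOY_VOCAB = {"喜欢", "社交", "媒体", "社交媒体"}


def segment(text, vocab, max_word_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_word_len
    characters) found in vocab; fall back to a single character otherwise.
    """
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


sentence = "我喜欢社交媒体"  # "I like social media"

# Whitespace splitting leaves the sentence as one unusable token.
whitespace_tokens = sentence.split()

# Dictionary-based matching recovers word boundaries.
dictionary_tokens = segment(sentence, TOY_VOCAB)
```

Greedy matching of this kind misjudges ambiguous strings and unknown words (names, slang, new hashtags), which is one reason the reply describes segmenters as only a partial solution.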
While there are a few options, mainly R packages, for getting data from Chinese social media such as Weibo (its API is similar to Twitter's, as you would expect), if your intention is to do text mining on the content, that part is a great pain in the neck. I have been battling with this for a while, but in short:

- You need a stemmer/tokenizer that works with Mandarin. The best was ICTCLAS, developed by the Chinese Academy of Sciences. There are still a few open source versions circulating online, but the most up-to-date one has become proprietary.
- Once you have the corpus pre-processed, you can use NLP packages for topic modelling, but there are some segmentation issues to take care of.

You can't really use out-of-the-box solutions such as WordStat or the like because they just produce nonsense. Overall, not a walk in the park for sure.

Good luck,
GV

--
Giuseppe A. Veltri
work email: giuseppe.veltri@unitn.it
Twitter: @gaveltri
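As a rough sketch of the step between segmentation and topic modelling mentioned in Giuseppe Veltri's reply: most topic-modelling packages expect per-document term counts, which can be built from posts that a segmenter has already split and joined with spaces. The function names, the sample posts and the stop-word set below are all hypothetical, and only the Python standard library is used; no particular topic-modelling package is assumed.

```python
from collections import Counter


def build_doc_term_counts(segmented_docs, stopwords=frozenset()):
    """Count terms per document.

    Each document is assumed to be a string whose words were already
    separated by spaces by an upstream segmenter.
    """
    doc_counts = []
    for doc in segmented_docs:
        tokens = [tok for tok in doc.split() if tok not in stopwords]
        doc_counts.append(Counter(tokens))
    return doc_counts


def vocabulary(doc_counts):
    """Sorted list of every distinct term across all documents."""
    return sorted(set().union(*doc_counts))


# Hypothetical pre-segmented posts and a one-word stop list.
posts = ["我 喜欢 社交 媒体", "社交 媒体 数据 分析"]
stopwords = frozenset({"我"})

counts = build_doc_term_counts(posts, stopwords)
terms = vocabulary(counts)
```

Any segmentation errors upstream carry straight through into these counts, so it is worth spot-checking the token list before fitting a model.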
It may be worth looking at: https://api.anacode.de/landing/

Best,
S

--
Stefania Vicari
Senior Lecturer in Digital Sociology
Programme Manager for the MA Digital Media and Society
Department of Sociological Studies
The University of Sheffield
Elmfield, Northumberland Road
Sheffield S10 2TU
Webpage: http://www.sheffield.ac.uk/socstudies/staff/staff-profiles/stefania-vicari
Email: s.vicari@sheffield.ac.uk
Twitter: @stefaniavicari <https://twitter.com/stefaniavicari>
Recent paper: Vicari, S. & Cappai, F. (2016) Health Activism and the Logic of Connective Action <http://www.tandfonline.com/doi/full/10.1080/1369118X.2016.1154587>. *Information, Communication & Society* 19(11): 1653-1671.
As part of my PhD, I did a lot of research based on data collected from both Weibo and Twitter. Finding few existing, functional tools, I wrote custom Python code to download and process various sorts of data from both Twitter and Weibo, including a script to tokenize Weibo posts.

Seeing this thread brings up an issue I have been thinking about: how the community of Internet researchers works with code. Other academics I know who work in the sciences share all their code online (GitHub etc.), have a practice of working together to debug it, and receive academic credit when their code is used by others. I’ve seen very little of this in social science research.

Are there any Internet researchers who share code they have created who could advise as to what their practices are in this regard? Is there any sort of standard among Internet researchers (and should there be) for sharing code created for academic purposes with other academics?

Gillian Bolsover
Researcher
Oxford Internet Institute
University of Oxford
PGP Key: 17EC60B3
Hi Gillian

Sorry to hijack the conversation a bit, but just to say that this is VERY interesting to a separate network/mailing list that I administer called PaSS (Programming-as-Social-Science): https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=PASS

In my experience, there are large communities of social science researchers who do internet/digital-type research but who have, for various reasons, been slow to take up actual coding skills. So the idea of PaSS has been to kickstart discussions around precisely these topics, as an interdisciplinary network for researchers interested in programming both as a research device and as an object of study, particularly around the methodological innovations happening through social science uses of digital data.

All of this is to say that anyone who is interested in exploring these issues further should feel free to subscribe to PaSS via the link above (and Gillian, sending an email out there too might add some bonus material to what is going on within this thread)!

Best
Phil Brooker
I can imagine why it's hard to find good data mining tools in social science. One reason is the sampling issue. Sample representativeness is essential to most social science research (as in probability-based phone surveys or multi-stage stratified sampling). Because sampling schemes vary across studies, customized code is often preferred.

We did random sampling on Weibo and Twitter. The methodology is described in the following two PLOS ONE papers. Hope this helps.

Fu, KW, & Chau, M. (2013). Reality Check for the Chinese Microblog Space: A Random Sampling Approach. PLoS ONE, 8(3), e58356. doi:10.1371/journal.pone.0058356

Liang, H, & Fu, KW. (2015). Testing Propositions Derived from Twitter Studies: Generalization and Replication in Computational Social Science. PLoS ONE, 10(8), e0134270. doi:10.1371/journal.pone.0134270

King-wa Fu
Associate Professor, Journalism and Media Studies Centre, The University of Hong Kong
Visiting Associate Professor 2016-2017, MIT Media Lab (Fulbright Scholar)
website: https://sites.google.com/site/fukingwa/
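To make the sampling point in King-wa Fu's reply concrete, here is a minimal sketch of a reproducible simple random sample drawn from a list of account IDs. It is a generic illustration, not the procedure used in the two PLOS ONE papers; the function name, the seed value and the ID range are assumptions, and a real study would still have to build the sampling frame and handle invalid or deleted accounts.

```python
import random


def draw_simple_random_sample(account_ids, sample_size, seed=2017):
    """Draw a simple random sample without replacement.

    A fixed seed makes the draw reproducible, so the same sample can be
    regenerated when the analysis is replicated.
    """
    unique_ids = sorted(set(account_ids))  # de-duplicate, stable order
    if sample_size > len(unique_ids):
        raise ValueError("sample_size exceeds the number of unique IDs")
    rng = random.Random(seed)  # local generator; global state untouched
    return rng.sample(unique_ids, sample_size)


# Hypothetical sampling frame: 1,000 made-up numeric account IDs.
frame = list(range(1_000_000, 1_001_000))
sample = draw_simple_random_sample(frame, sample_size=50)
```

Recording the seed alongside the collected data is a small step toward the kind of code sharing and replication discussed elsewhere in this thread.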
participants (9)
- C.H.
- Gillian Bolsover
- Giuseppe A. Veltri
- Helen Kennedy
- kwfu
- Phillip Brooker
- Presley Ifukor
- Stefania Vicari
- Thomas Ball