While there are a few options, mainly R packages, for getting data from Chinese social media such as Weibo (its API is similar to Twitter's, as you would expect), text mining the content is a great pain in the neck. I have been battling with this for a while, but in short:

- You need a stemmer/tokenizer that works with Mandarin. The best was ICTCLAS, developed by the Chinese Academy of Sciences. There are still a few open source versions circulating online, but the most up-to-date one has become proprietary.
- Once you have the corpus pre-processed, you can use NLP packages for topic modelling, but there are some segmentation issues to take care of. You can't really use out-of-the-box solutions such as WordStat or the like because they just produce nonsense.

Overall, not a walk in the park for sure.

Good luck,
GV

On Tue, 2017-05-09 at 16:58 +0100, Helen Kennedy wrote:
Hello clever AOIR folks
Asking for postgrad students: any recommendations of social media data mining tools that work on Chinese social media platforms / with Chinese languages?
Thanks!
Helen
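
For the students: a minimal sketch of why a Mandarin-aware tokenizer is needed at all. Chinese text has no spaces between words, so word boundaries have to be inferred before any topic modelling can happen. The example below uses greedy forward maximum matching against a tiny vocabulary. Both the `segment` function and the vocabulary are hypothetical and purely for illustration; this is not how ICTCLAS works, and real segmenters rely on much larger dictionaries and statistical models.

```python
def segment(text, vocab, max_len=4):
    """Split text into tokens by greedy forward maximum matching.

    vocab is a set of known words (a toy, hypothetical list here).
    max_len is the longest candidate word, in characters, to try.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the vocabulary
        # matches; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy vocabulary, assumed for illustration only.
vocab = {"社交", "媒体", "社交媒体", "数据", "挖掘"}

# "Social media data mining", written without spaces as usual.
print(segment("社交媒体数据挖掘", vocab))
```

With this vocabulary the sentence comes out as three tokens: 社交媒体, 数据 and 挖掘. Words missing from the vocabulary fall apart into single characters, which is one way the segmentation issues mentioned above end up polluting topic models.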
--
----------------
Giuseppe A. Veltri
work email: giuseppe.veltri@unitn.it
Twitter: @gaveltri
ResearchGate