Re: [Air-L] Chinese language social media data mining tools

9 May 2017

There are a few options, mainly R packages, for getting data from
Chinese social media such as Weibo (its API is similar to Twitter's,
as you would expect). If your intention is to do text mining on the
content, however, that part is a great pain in the neck. I have been
battling with this for a while, but in short:
- You need a tokenizer (word segmenter) that works with Mandarin,
since written Chinese does not put spaces between words. The best was
ICTCLAS, developed by the Chinese Academy of Sciences. There are still
a few open source versions circulating online, but the most up-to-date
one has become proprietary.
- Once you have the corpus pre-processed, then you can use NLP packages
for topic modelling, but there are some segmentation issues to take
care of. You can't really use out-of-the-box solutions such as WordStat
or the like, because they just produce nonsense.
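
To make the segmentation point concrete, here is a minimal Python
sketch. It is not ICTCLAS and not something I would use on real data:
it uses forward maximum matching, a simple dictionary-based technique,
with a made-up toy vocabulary. The names TOY_VOCAB, segment and
whitespace_tokens are invented for illustration only.

```python
# Toy forward-maximum-matching segmenter (illustrative only, not ICTCLAS).
# TOY_VOCAB is a hypothetical mini-lexicon, not a real Chinese dictionary.
TOY_VOCAB = {"我", "喜欢", "社交", "媒体", "社交媒体", "数据", "挖掘", "数据挖掘"}


def segment(text, vocab=TOY_VOCAB, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring (up to max_len
    characters) found in vocab; fall back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


def whitespace_tokens(text):
    """What a naive space-based tokenizer does with the same text."""
    return text.split()


sentence = "我喜欢社交媒体数据挖掘"  # "I like social media data mining"
print(whitespace_tokens(sentence))  # ['我喜欢社交媒体数据挖掘']
print(segment(sentence))            # ['我', '喜欢', '社交媒体', '数据挖掘']
```

The whitespace version returns the whole sentence as one "word", which
is roughly why tools built for space-delimited languages produce
nonsense here. A real segmenter also has to deal with ambiguity and
out-of-vocabulary words, which this sketch ignores.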

Overall, not a walk in the park for sure.

Good luck,
GV
On Tue, 2017-05-09 at 16:58 +0100, Helen Kennedy wrote:
...
Hello clever AOIR folks
Asking for postgrad students: any recommendations of social media data
mining tools that work on Chinese social media platforms / with Chinese
languages?
Thanks!
Helen
-- 
----------------
Giuseppe A. Veltri
work email: giuseppe.veltri@unitn.it
Twitter: @gaveltri