Re: [Air-L] Twitter data collection tools
You could look at R's twitteR package. See https://cran.r-project.org/web/packages/twitteR/twitteR.pdf
But then you would need to know how to use R. It is not that difficult, but it is yet another program to learn. One benefit: your data is directly ready for analysis.
Hope that helps,
Maurice

On Wed, Nov 11, 2015 at 3:59 AM, Gohar F. Khan <gohar.feroz@gmail.com> wrote:
Hello list members:
I am looking for tools which can help extract all possible Twitter statistics (such as number of tweets, followers, followings, mentions, re-tweets, favorites) for a list of Twitter handles (around 120 accounts). *In particular, I am looking for a tool that can take the IDs as a single file and provide the desired statistics for each ID.*
The Webometrics Analyst has this functionality, but unfortunately it only provides followers and followings data. I am also familiar with several other tools, including the ones mentioned in Dean Freelon's curated list <https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqD...>, but none of these can extract all the information I need. Some tools provide more statistics, but they work with one ID at a time.
I will greatly appreciate any suggestions.
Thank you,
--
Gohar Feroz Khan, PhD
Adjunct Faculty & Research Adviser
Korea Advanced Institute of Science & Technology (KAIST)
Global Information and Telecommunication Technology Program (ITTP)
291 Daehak-ro, Yuseong-gu, Daejeon, South Korea.
------------------------------------------------------------------------------
Check out my new book on social media analytics <http://7layersanalytics.com/>!
Please consider submitting your work to the social media analytics track at PACIS 2016 <http://www.pacis2016.org/Page/Index/71>.
Social Identities: || Blog <http://gfkhan.wordpress.com> || Twitter <https://twitter.com/gfkhan> || LinkedIn <https://www.linkedin.com/pub/gohar-feroz-khan/7/62b/42> || Research Centre <http://centreforsocialtech.com/> ||
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Maurice Vergeer
To contact me, see http://mauricevergeer.nl/node/5
To see my publications, see http://mauricevergeer.nl/node/1
In addition to looking at tools, you might want to consider looking at the Twitter API documentation to get a sense of what data and metadata are made available by Twitter vs. what data you think you need. Here, for example, are the lists of all fields related to tweets: https://dev.twitter.com/overview/api/tweets; users: https://dev.twitter.com/overview/api/users; and "entities in objects," such as hashtags: https://dev.twitter.com/overview/api/entities-in-twitter-objects. I believe there may also be some limits placed on how far back in time you can request data via the API. So any tool that provides Twitter data, unless it's doing something really unique, is going to be limited by what's available in these field lists and by the constraint of when it can or did start pulling data for the accounts you want to analyze.

As others have mentioned, then, there's no way to use the API to say, e.g., "give me all tweets ever by user X". Instead, you (or your tool/service) would have to make repeated requests until you felt you had obtained all tweets, possibly on a go-forward basis from whenever the time cutoff is. To actually do this, I can second the suggestion for twitteR; using e.g. this tutorial, you can pretty easily pull anything the API provides, in batch, and accumulate what you need over time in a native R data structure (or just write it to CSV, etc.): http://www.r-bloggers.com/getting-started-with-twitter-in-r/. With your input of IDs, you would just have to loop through it in R.

You may also want to check out the Twitter support in such generic data mining/processing tools as RapidMiner (http://docs.rapidminer.com/studio/how-to/cloud-connectivity/twitter.html), Talend (http://www.datalytyx.com/twitter-sentiment-analysis-using-talend/), KNIME (https://www.knime.org/blog/knime-twitter-nodes), or Pentaho (http://www.patlaf.com/query-twitter-api-with-pentaho-pdi-kettle/).
These are generic data integration/extraction/manipulation/analysis tools designed to help build data flows visually and in batch. R wins, as it often does, for simplicity and control reasons, but because these other tools are more visual and are designed specifically for batch processing, they may also be worth exploring.

Cheers,
Cory Salveson
http://corysalveson.com | @argotechnica <https://twitter.com/argotechnica>
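The "take a file of IDs and loop through it" idea from this thread can be sketched roughly as follows. This is a minimal illustration in Python rather than R, not a finished tool. The count fields are the ones listed on the Twitter user object documentation linked above; `fetch_user` is a hypothetical stand-in for whatever client actually calls the API, and the sample numbers are made up. Mentions and retweets are not on the user object and would have to be derived from tweet data instead.

```python
import csv
import io

# Count fields found on a Twitter REST API user object.
STAT_FIELDS = ["statuses_count", "followers_count", "friends_count", "favourites_count"]


def read_handles(text):
    """Parse one handle per line, skipping blank lines and a leading '@'."""
    return [line.strip().lstrip("@") for line in text.splitlines() if line.strip()]


def collect_stats(handles, fetch_user):
    """Call fetch_user (hypothetical: handle -> user-object dict) once per handle."""
    rows = []
    for handle in handles:
        user = fetch_user(handle)
        row = {"screen_name": handle}
        for field in STAT_FIELDS:
            row[field] = user.get(field)  # None if the field is missing
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["screen_name"] + STAT_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Stand-in for a real API client; the numbers are invented for illustration.
FAKE_USERS = {
    "alice": {"statuses_count": 120, "followers_count": 45,
              "friends_count": 30, "favourites_count": 7},
    "bob": {"statuses_count": 3, "followers_count": 1, "friends_count": 2},
}

handles = read_handles("@alice\n\nbob\n")
rows = collect_stats(handles, lambda h: FAKE_USERS[h])
print(to_csv(rows))
```

In practice `fetch_user` would wrap an authenticated API call and would need to respect rate limits; with around 120 accounts, one request per handle is a small batch.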
As others have pointed out, no matter what tool you use, Twitter data is dynamic and will change basically as soon as you collect it. It also has API limitations that are sometimes clear and sometimes not. Not being able to pull a complete timeline for users with thousands of tweets is a technical limit for sure.

Some of the variables you mention, retweets for instance, can be roughly calculated from the user and tweet data the API does make available. I just added issues to my own repo about retweets and plain text parsing. Those will likely get attention next week or later.

Just like any data collection, what you actually need and how to calculate it depends on what questions you want to answer.
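As one possible reading of "roughly calculated from the tweet data", the sketch below tallies retweet, favorite and mention counts from a list of tweet objects for a single account. It is an illustration only, not the approach used in any particular repo. It assumes the tweet object fields `retweet_count`, `favorite_count`, `retweeted_status` and `entities.user_mentions` from the API documentation, and it assumes that a tweet carrying `retweeted_status` is the account retweeting someone else, so its engagement counts are not credited to the account. The function name and sample data are hypothetical.

```python
def summarize_timeline(tweets):
    """Roughly tally retweet, favorite and mention activity for one account."""
    summary = {
        "tweets": 0,
        "retweets_made": 0,
        "retweets_received": 0,
        "favorites_received": 0,
        "mentions_made": 0,
    }
    for tweet in tweets:
        summary["tweets"] += 1
        if "retweeted_status" in tweet:
            # Assumption: this is a retweet of someone else's tweet, so its
            # counts describe the original and are skipped here.
            summary["retweets_made"] += 1
            continue
        summary["retweets_received"] += tweet.get("retweet_count", 0)
        summary["favorites_received"] += tweet.get("favorite_count", 0)
        mentions = tweet.get("entities", {}).get("user_mentions", [])
        summary["mentions_made"] += len(mentions)
    return summary


# Made-up sample timeline: two original tweets and one retweet.
sample = [
    {"retweet_count": 4, "favorite_count": 2,
     "entities": {"user_mentions": [{"screen_name": "bob"}]}},
    {"retweet_count": 10, "retweeted_status": {"id": 1}},
    {"retweet_count": 1, "favorite_count": 0},
]
summary = summarize_timeline(sample)
print(summary)
```

The totals only cover whatever tweets were actually retrieved, so for accounts with long timelines they are a lower bound, and they will drift as soon as the data changes.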
participants (3)
- Cory Salveson
- Libby Hemphill
- Maurice Vergeer