Text/Data Mining Software Suggestions: for YouTube, Facebook & Instagram?

Cristina Migliaccio

6 Nov 2020

1:59 p.m.

Dear Colleagues,

Advance apologies if this question has been addressed (as I am certain it has been) in some previous forum/email. Does an easy-to-use text/data mining software/platform exist that works across these 3 social media platforms: YouTube, Facebook & Instagram?

I would like to collect data on alphabetic features but also paralinguistic features such as likes, shares, etc.

Any suggestions whatsoever for a text/data mining beginner would be greatly appreciated (videos, lectures to this end also appreciated!)

Warm thanks,
Cristina Migliaccio


Alexandre Leroux

9 Nov 2020

8:21 a.m.

Facepager for FB and YT: it has a user interface and decent documentation. There are scrapers for Instagram, but those don't comply with the platform's terms of use and, afaik, are terminal-only.

-- Alexandre Leroux, Ph.D. candidate, Group for research on Ethnic Relations, Migrations and Equality (GERME), Université Libre de Bruxelles (ULB), alleroux@ulb.ac.be

_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org

Join the Association of Internet Researchers: http://www.aoir.org/

Brooke Criswell

10 Nov 2020

5:22 a.m.

Facebook and Instagram are strict: according to their terms and conditions, they don't allow any data scraping. Your best bet is to propose your study to a researcher at Facebook.

Bernhard Rieder

11:35 a.m.

Dear colleagues,

I would like to disagree with Brooke here. Facebook data can still be accessed through non-scraping, API-based access, most importantly via the awesome Facepager.

For Instagram, scraping is indeed the go-to technique (instaloader works very well), and I would like to defend the idea that ToS should not hinder researchers if the social relevance of the topic warrants it. Adhering to corporate policy is not the gold standard for what independent research should strive for, in my view. Proposing topics to people at Facebook may be a strategy for certain topics, but for anything that does not fit within the narrow interests of the platform, this will most likely go nowhere.

For YouTube, you can also check out the YouTube Data Tools that I have been maintaining here: https://tools.digitalmethods.net/netvizz/youtube/

All the best, Bernhard
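For the "likes, shares, etc." in the original question, here is a minimal sketch of how such counts can be read from the YouTube Data API v3 `videos` endpoint with `part=statistics`. It is only an illustration, not how Facepager or YouTube Data Tools work internally. It assumes you have your own API key; the function names `build_stats_url` and `extract_counts` and the sample payload (with invented values) are hypothetical. No network request is made here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_stats_url(video_ids, api_key):
    """Build a request URL asking for the statistics of the given videos."""
    query = urlencode({
        "part": "statistics",
        "id": ",".join(video_ids),
        "key": api_key,
    })
    return f"{API_ENDPOINT}?{query}"


def extract_counts(response):
    """Turn a decoded JSON response into {video_id: {field: int or None}}."""
    counts = {}
    for item in response.get("items", []):
        stats = item.get("statistics", {})
        counts[item["id"]] = {
            # The API returns counts as strings; a field can be missing
            # (for example when likes are hidden), so fall back to None.
            field: int(stats[field]) if field in stats else None
            for field in ("viewCount", "likeCount", "commentCount")
        }
    return counts


# Hypothetical payload in the shape the endpoint returns (values invented).
sample = {
    "items": [
        {"id": "abc123",
         "statistics": {"viewCount": "1500", "likeCount": "42", "commentCount": "7"}},
        {"id": "def456",
         "statistics": {"viewCount": "90"}},
    ]
}

if __name__ == "__main__":
    print(build_stats_url(["abc123", "def456"], "YOUR_API_KEY"))
    print(extract_counts(sample))
```

The parsing is kept separate from the request so that the same function can be applied to responses saved by whatever tool actually does the collecting.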


Stuart Shulman

12:43 p.m.

I feel that "if the social relevance of the topic warrants it" is not an argument university human subjects review panels can use to defend techniques that break laws and violate privacy. Although I try my best to support students and faculty doing socially relevant studies, I draw a red line when it comes to knowingly violating platform rules. I have watched this debate evolve for a decade and sought carefully to engineer compliance into our tools.

Meanwhile, concerned academics are writing papers about the API lockdowns as if their own disregard for laws and norms (e.g., the right to be forgotten) is not one of the root causes of the lockdowns. Not the sole cause, as there are strategic financial drivers impelling lockdowns, among other legal landmines. However, if you are one of the many thousands of academics sitting on spreadsheets of social data you have never checked for deletions before doing analysis and have no plans to delete all of that data at some point, I would suggest you are failing a basic test for ethical and legal research practices.

While some well-funded research groups are working hard, for example, to use the compliance stream with stored Twitter data, the vast majority appear to treat stored social data as if it were exactly the same as a stamp collection or a scrapbook of newspaper clippings you might store forever and claim complete ownership over. It is not. By pursuing research without the same level of ethical and legal compliance routinely required for interviews and focus groups (e.g., we de-identify data, destroy the recordings, and delete the transcripts after the research is complete), the anarchic world of scraping, storing, and mining the personal data of millions of people mimics the very things we like the least about the platforms.

Whilst we may deem our own research to be warranted irrespective of any or all laws and norms, anyone from any perspective could use that argument to study anything using any method, no matter how invasive, insensitive, or harmful to the research subjects.

After a decade on Twitter as @stuartwshulman, I recently deleted my account. Many of you reading this post may have stored some Tweets I wrote about politics, sports, and growing garlic as well as my family, dogs, and close friends. Please be advised I request that you delete 100% of my Tweets from your databases. If you can comply and actually do so, good work, as you are well ahead of the curve. If you cannot or will not, you should report to the research compliance office at your university (especially in Europe) and explain why you cannot find and delete that data for me, and every other former Twitter user like me who may not have thought of using a listserv post to flag this issue related to my right to be forgotten.

Dr. Stuart Shulman

U.S. Soccer Federation C-Licensed Coach
Northampton High School Boys Varsity Coach
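The deletion check described in the message above can be sketched in a few lines. This is a minimal illustration, not a compliance tool. It assumes a hypothetical dataset stored as a list of dictionaries with `post_id` and `author` fields. It also assumes that the IDs of deleted posts and the handles of accounts requesting removal have already been obtained by some other means, which depends on the platform and is not shown. The function name `purge_records` and the sample data are made up for the example.

```python
def purge_records(records, deleted_post_ids=frozenset(), removed_authors=frozenset()):
    """Return only the records that may still be kept.

    A record is dropped if its post was deleted on the platform or if its
    author has asked for removal. Handles are compared case-insensitively
    and without a leading "@".
    """
    removed = {handle.lower().lstrip("@") for handle in removed_authors}
    kept = []
    for record in records:
        author = record["author"].lower().lstrip("@")
        if record["post_id"] in deleted_post_ids or author in removed:
            continue  # drop this record
        kept.append(record)
    return kept


# Hypothetical stored data (all values invented).
stored = [
    {"post_id": "1", "author": "@alice", "text": "first post"},
    {"post_id": "2", "author": "@alice", "text": "since deleted"},
    {"post_id": "3", "author": "@Bob", "text": "account removed"},
    {"post_id": "4", "author": "@carol", "text": "still public"},
]

if __name__ == "__main__":
    cleaned = purge_records(stored, deleted_post_ids={"2"}, removed_authors={"bob"})
    print([record["post_id"] for record in cleaned])
```

Running the purge before every analysis, rather than once at collection time, is what keeps a stored dataset in step with later deletions.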

Bernhard Rieder

1:21 p.m.

Dear Stuart,

Thanks for your reply, but I feel that this is a bit disingenuous. I should certainly have expanded on my "if the topic warrants it", but I think that there are many ways to balance our need for further understanding of platforms and platform life with ethical research practices in ways that include scraping. The AoIR ethics guidelines already provide a nuanced guide to how to weigh different concerns - and respect for user privacy is not the only moral value that drives our decision-making. And not everything is a slippery slope that needs to be dealt with in absolute terms.

I would also like to remind you that the legality of scraping is by no means as clear-cut as you present it here, neither in the US nor in Europe. Even public institutions like the Conseil Supérieur de l’Audiovisuel in France are publishing studies based on scraping (https://www.csa.fr/Informer/Toutes-les-actualites/Actualites/Pourquoi-et-com...). With respect to the GDPR, I would recommend having a look at the recent opinion statement by the European Data Protection Supervisor (https://edps.europa.eu/sites/edp/files/publication/20-01-06_opinion_research...), which highlights the "special regime" that research is afforded within that regulation.

While my comment was certainly too quick and underdeveloped, let's not pretend that this is a clear-cut affair; how we weigh the different considerations that come into play will necessarily involve compromise and balancing of different value concerns. Given the enormous amount of power platforms wield in our societies, and not simply through their capture of data but already by the sheer fact of mass connectivity, I find it dangerous to exclude certain research methods from the outset.

All the best, Bernhard


Brooke Criswell

3:53 p.m.

My apologies. I was just passing along what I have been told because of privacy settings within Facebook and Instagram. I have been told specifically by Facebook that there is no "legal" way to scrape comments or things like that. As for likes, shares, etc., I have no idea. So I was just passing that along. I am by no means an expert in all of the ways and was not aware of other ways like Facepager. I just know Facebook is very strict with their data, especially because of the privacy policy and the settings people can individually make. I have been told Facebook closed off their API except when working in collaborations or when specifically accepted to get data from their research team.

Very sorry if I gave wrong information. This is just what I have learned and been told, and I would never want anyone to get into trouble or collect items they weren't technically supposed to.

Best of luck, and if you do find anything, please share!

Take care all.

Bernhard Rieder

5:11 p.m.

Hi again,

No need to apologize, Brooke, we are all in a situation that is marred by insecurity, opacity, and conflicting information. My apologies if my comments came off too strong, also to Stuart.

With regard to Facepager, the tool made it through Facebook's app review (Jakob, I'll have to ping you sometime soon to ask how you did it), which means that its functionalities were audited by the company, giving some legal security. This does not, of course, eliminate ethical questions.

With regard to Instagram, what baffles me is that scraping via instaloader actually works better than data retrieval via the API ever did, which means that there is some level of acquiescence. One can easily get up to hundreds of thousands of posts for a given hashtag.

What Mirko is saying about university support is super important, but I also want to highlight the great work by AlgorithmWatch and colleagues Jef Ausloos, Paddy Leerssen and Pim ten Thije on legal frameworks for more robust data access, for example here: https://www.ivir.nl/publicaties/download/GoverningPlatforms_IViR_study_June2...

This may be naive, but I have the hope that the upcoming EU Digital Services Act will have some provisions for academic research, or at least some clarifications. The current situation is creating serious chilling effects for research, without protecting data subjects from the most predatory practices, since scraping works so well (in technical terms) in many cases - or not at all in others. Commissioner Vestager has sent some positive signals in that direction.

The reason why I am very hesitant about taking ToS as legal gospel is a) that courts have ruled otherwise when it comes to scraping and b) that I find the idea that platform companies can dictate what we are able to know about platforms, how they operate, and what happens on them highly problematic and worth fighting against. Jeanette Hofmann and I have a paper on that front forthcoming very soon ;-)

All the best, Bernhard


Brooke Criswell

5:19 p.m.

Bernhard,

Do you have links to any of those court cases that you could send me? I would really love to learn more about this subject, especially as an early career researcher. And are the laws very different in Europe compared to the US? (I am in the US).

This is all so interesting to me!

On Tue, Nov 10, 2020, 11:11 AM Bernhard Rieder <berno.rieder@gmail.com> wrote:


Hi again,

No need to apologize, Brooke, we are all in a situation that is marred by insecurity, opacity, and conflicting information. My apologies if my comments came off too strong, also to Stuart.

With regard to Facepager, the tool made it through Facebook's app review (Jakob, I'll have to ping you sometime soon to ask how you did it), which means that its functionalities were audited by the company, giving some legal security. This does not, of course, eliminate ethical questions.

With regard to Instagram, what baffles me is that scraping via instaloader actually works better than data retrieval via the API ever did, which means that there is some level of acquiescence. One can easily get up to 100s of 1000s of posts for a given hashtag.

What Mirko is saying about university support is super important, but I also want to highlight the great work by AlgorithmWatch and colleagues Jef Ausloos, Paddy Leerssen and Pim ten Thije on legal frameworks for more robust data access, for example here: https://www.ivir.nl/publicaties/download/GoverningPlatforms_IViR_study_June2...

This may be naive, but I have the hope that the upcoming EU Digital Services Act will have some provisions for academic research, or at least some clarifications. The current situation is creating serious chilling effects for research, without protecting data subjects from the most predatory practices, since scraping works so well (in technical terms) in many cases - or not at all in others. Commissioner Vestager has sent some positive signals in that direction.

The reason why I am very hesitant about taking ToS as legal gospel is a) that courts have ruled otherwise when it comes to scraping and b) because I find the idea that platform companies can dictate what we are able to know about platforms, how they operate and what happens on them highly problematic and worth fighting against. Jeanette Hofmann and I have a paper on that front coming forth very soon ,-)

All the best, Bernhard

On 10 Nov 2020, at 15:53, Brooke Criswell <bcriswell@email.fielding.edu> wrote:

My apologies. I was just passing along what I have been told because of privacy settings within Facebook and Instagram. I have been told specifically by Facebook there is no "legal" way to scrape comments or different things like that. Now likes and shares etc, I have no idea. So I was just passing that along. I am by no means an expert in all of the ways and was not aware of other ways like Facepager. I just know Facebook is very strict with their data especially because of the privacy policy and settings people can individually make. I have been told Facebook closed off their API except for when working in collaborations or specifically accepted to get data from their research team.

Very sorry if I gave wrong information. This is just what I have learned and been told and would never want anyone to get into trouble or collect items they weren't technically supposed to.

Best of luck and if you do find anything please share!

Take care all.

On Tue, Nov 10, 2020, 5:35 AM Bernhard Rieder <berno.rieder@gmail.com> wrote:

Dear colleagues,

I would like to disagree with Brooke here. Facebook data can still be accessed through the API rather than by scraping, most notably via the awesome Facepager.

For Instagram, scraping is indeed the go-to technique (instaloader works very well) and I would like to defend the idea that ToS should not hinder researchers if the social relevance of the topic warrants it. Adhering to corporate policy is not the gold standard for what independent research should strive for, in my view. Proposing topics to people at Facebook may be a strategy for certain topics, but for anything that does not fit within the narrow interests of the platform, this will most likely go nowhere.

For YouTube, you can also check out the YouTube Data Tools that I have been maintaining here: https://tools.digitalmethods.net/netvizz/youtube/

All the best, Bernhard
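For readers who would rather script the YouTube part than use a web tool, the kind of collection asked about at the top of the thread (comment text plus counts such as likes and replies) can be sketched against the public YouTube Data API v3 `commentThreads.list` endpoint. This is only a minimal sketch, not a description of how YouTube Data Tools or Facepager work internally. It assumes you have your own API key, and the function names (`build_request_url`, `flatten_comments`, `fetch_all_comments`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_request_url(video_id, api_key, page_token=None):
    """Build a commentThreads.list URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,          # largest page size the API accepts
        "textFormat": "plainText",  # ask for plain text instead of HTML
        "key": api_key,             # assumption: your own API key
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def flatten_comments(response):
    """Turn one API response (a dict) into flat rows of text and counts."""
    rows = []
    for item in response.get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        rows.append({
            "thread_id": item["id"],
            "text": top.get("textDisplay", ""),
            "likes": top.get("likeCount", 0),
            "replies": item["snippet"].get("totalReplyCount", 0),
            "published": top.get("publishedAt"),
        })
    return rows


def fetch_all_comments(video_id, api_key):
    """Follow nextPageToken until all pages of top-level comments are read."""
    rows, token = [], None
    while True:
        url = build_request_url(video_id, api_key, token)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        rows.extend(flatten_comments(page))
        token = page.get("nextPageToken")
        if not token:
            return rows
```

The resulting rows can be written out with Python's `csv.DictWriter` for later text analysis. Whether collecting and storing this data is appropriate for a given project remains subject to the legal and ethical questions discussed in this thread.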


Bernhard Rieder

5:32 p.m.

Brooke,

I am no legal expert myself in any form or function, but here is a case in the US that made the rounds some time ago: https://arstechnica.com/tech-policy/2019/09/web-scraping-doesnt-violate-anti...

This may also get interesting: https://www.wsj.com/articles/facebook-seeks-shutdown-of-nyu-research-project...

The problem is that the legal situation is simply not 100% clear, neither in the US, nor in Europe.

Best, Bernhard


Brooke Criswell

5:35 p.m.

That's wild to me. Thanks for the discussion, information, and links! Appreciate it all.

Schaefer, M.T. (Mirko)

6:06 p.m.

Hi all,

there is already some documentation about platforms trying to stifle research: the cases Bernhard mentioned, and there were also very informative links in the replies to my request about "Legal challenges for researchers" on this list (28 October).

Also note the case of Spotify trying to prevent the publication of the Spotify Teardown book by our colleagues in Sweden (I had no idea that Rolling Stone covered this: https://www.rollingstone.com/pro/features/spotify-teardown-book-streaming-mu...).

I am still looking for examples of GDPR challenges for the kind of research that is represented on this list, and I am still very much interested in the extent to which universities are supporting researchers in dealing with these issues.

Cheers, mirko

Jakob Jünger

4:43 p.m.

Dear all, balancing ethics, science and tos is complex and from my point of view has to be evaluated with regard to each specific case. The different scientific ethics guidelines and local laws are quite helpful. For example, in Germany, there are very specific regulations for science in the GDPR and in the copyright law. Further guidelines are published by the so called RatSWD. The platforms' terms allow some but not all use cases. On this basis, for example, compiling a textmining corpus for the analysis of language features is justified in many cases. From my experience, the universities' data protection officers are very helpful with setting up the needed protocols to balance privacy and research interests. Anyway, I don't know enough about the situation in different countries to give advice. That being said, if webscraping is an option, you will find some basic Instagram presets in the wiki of Facepager (Presets -> Load a preset). Best regards Jakob Am 10.11.2020 um 12:35 schrieb Bernhard Rieder:

...

Dear colleagues,

I would like to disagree with Brooke here. Facebook data can still be accessed through non-scraping based API-access, most importantly the awesome Facepager.

For Instagram, scraping is indeed the go-to technique (instaloader works very well) and I would like to defend the idea that ToS should not hinder researchers if the social relevance of the topic warrants it. Adhering to corporate policy is not the gold standard for what independent research should strive for, in my view. Proposing topics to people at Facebook may be a strategy for certain topics, but for anything that does not fit within the narrow interests of the platform, this will most likely go nowhere.

For YouTube, you can also check out the YouTube Data Tools that I have been maintaining here: https://tools.digitalmethods.net/netvizz/youtube/

All the best, Bernhard

...
On 10 Nov 2020, at 05:22, Brooke Criswell via Air-L <air-l@listserv.aoir.org> wrote:

Facebook and Instagram are strict and according to terms and conditions they don't allow any data scraping.

Best try is to propose your study to a researcher at Facebook

On Mon, Nov 9, 2020, 2:21 AM Alexandre Leroux <alleroux@ulb.ac.be> wrote:

...
Facepager works for FB and YT; it has a user interface and decent documentation.

There are scrapers for Instagram, but those don't comply with the platform's terms of use and, afaik, are terminal-only.

On 6/11/20 14:59, Cristina Migliaccio wrote:

...
Dear Colleagues,

Advance apologies if this question has been addressed (as I am certain it has been) in some previous forum/email---does an easy to use text/data mining software/platform exist that works across these 3 social media platforms: YouTube, Facebook & Instagram?

I would like to collect data on alphabetic features but also paralinguistic features such as likes, shares, etc.

Any suggestions whatsoever for a text/data mining beginner would be greatly appreciated (videos, lectures to this end also appreciated!)

Warm thanks- Cristina Migliaccio

-- Alexandre Leroux Ph.D candidate Group for research on Ethnic Relations, Migrations and Equality (GERME) Université Libre de Bruxelles (ULB) alleroux@ulb.ac.be

Ganiat.Kazeem

9:18 p.m.

Actually, mining data may present research with ethical issues such as right of use of information, privacy, right of access, etc., so it is much more than skipping past a corporate policy. Although this is not always the case, some policies are firmly rooted in service level agreements between users and service providers. Facebook being a free platform does not mean that information about users' activities, especially things like posts, comments, likes, etc. (things people do because they are addressing their comment to a specific person, persons or organisation), should become public. In principle, I assume it would be far easier to first recruit Facebook users who are willing to share their user data in this way.

-----Original Message-----
From: Air-L <air-l-bounces@listserv.aoir.org> On Behalf Of Bernhard Rieder
Sent: 10 November 2020 11:35
To: Brooke Criswell <bcriswell@email.fielding.edu>
Cc: Air-L@listserv.aoir.org
Subject: Re: [Air-L] Text/Data Mining Software Suggestions: for YouTube, Facebook & Instagram?

Dear colleagues,

I would like to disagree with Brooke here. Facebook data can still be accessed through non-scraping based API-access, most importantly the awesome Facepager.

For Instagram, scraping is indeed the go-to technique (instaloader works very well) and I would like to defend the idea that ToS should not hinder researchers if the social relevance of the topic warrants it. Adhering to corporate policy is not the gold standard for what independent research should strive for, in my view. Proposing topics to people at Facebook may be a strategy for certain topics, but for anything that does not fit within the narrow interests of the platform, this will most likely go nowhere.

For YouTube, you can also check out the YouTube Data Tools that I have been maintaining here: https://tools.digitalmethods.net/netvizz/youtube/

All the best, Bernhard


participants (8)

Alexandre Leroux
Bernhard Rieder
Brooke Criswell
Cristina Migliaccio
Ganiat.Kazeem
Jakob Jünger
Schaefer, M.T. (Mirko)
Stuart Shulman