Thanks to all for the helpful suggestions!
On Sep 7, 2023, at 06:39, kalev leetaru via Air-L <air-l@listserv.aoir.org> wrote:
As a general note, keep in mind the massive difference between newer non-deterministic ASRs like Whisper, which prioritize "fluency over fidelity", and classical and modern deterministic large-model ASRs. In short, with models like Whisper the transcription will vary every time you run the model, and it will often deviate from what the speaker actually said when the utterance is statistically unlikely, because these models are designed to essentially rewrite utterances into more readable and understandable text rather than capture what was actually said. For a live video conference where understandability trumps accuracy, that might be just fine, but for interview transcriptions to be used for content analysis, and even for providing the most accurate transcriptions for assistive needs, keep those issues in mind. Also remember that models like Whisper can hallucinate and truncate *badly*, and recommended mitigations often don't work on many content types:
https://blog.gdeltproject.org/a-deep-dive-exploration-applying-openais-whisp...
https://blog.gdeltproject.org/a-deep-dive-exploration-applying-openais-whisp...
https://blog.gdeltproject.org/testing-the-new-openai-whisper-asr-large-v2-mo...
https://blog.gdeltproject.org/openais-whisper-asr-how-the-nato-threat-to-put...
https://blog.gdeltproject.org/experiments-with-whisper-asr-model-parameters-...
https://blog.gdeltproject.org/ais-pivot-to-fluency-existential-non-determini...
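One rough way to see the run-to-run variation described above is to transcribe the same recording several times and compare the outputs word by word. The sketch below uses only Python's standard library. The three sample transcripts are invented for illustration, and the function names are hypothetical. It measures how much the runs disagree, not which one is correct.

```python
from difflib import SequenceMatcher
from itertools import combinations


def tokenize(text):
    # Lowercase and strip surrounding punctuation so only wording differences count.
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def word_agreement(a, b):
    # Similarity ratio in [0, 1]; 1.0 means the two word sequences are identical.
    return SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False).ratio()


def differing_spans(a, b):
    # Return (text_in_a, text_in_b) for every stretch where the runs disagree.
    ta, tb = tokenize(a), tokenize(b)
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    return [
        (" ".join(ta[i1:i2]), " ".join(tb[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def min_pairwise_agreement(transcripts):
    # Worst-case agreement across all pairs of runs.
    return min(word_agreement(a, b) for a, b in combinations(transcripts, 2))


# Invented example: three hypothetical runs over the same audio segment.
runs = [
    "We um we never really discussed the budget with them.",
    "We never really discussed the budget with them.",
    "We never really discussed the budget with him.",
]

print(min_pairwise_agreement(runs))
print(differing_spans(runs[1], runs[2]))
```

Segments where the runs disagree are reasonable candidates for checking against the audio by hand before the transcript is used for content analysis.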
The same holds for translation - keep in mind that the newer generative translation models like Meta's Seamless have a number of critical departures from what we think of as traditional NMT:
https://blog.gdeltproject.org/experiments-with-speech-transcription-translat...
And keep in mind that if you're using LLMs to summarize or otherwise distill transcripts, hallucination will manifest in highly unpredictable ways based on the alignment of the source topics and the training data:
https://blog.gdeltproject.org/hallucination-in-summarization-when-chatgpt-ha...
Embedding models can be used for mitigation for monolingual tasks:
https://blog.gdeltproject.org/using-embedding-ranking-to-combat-llm-hallucin...
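As a minimal sketch of the general idea (not a reproduction of the method in the linked post), each sentence of an LLM summary can be scored by its best similarity to any sentence in the source transcript, so the least-supported sentences surface first for manual review. The `embed` function here is a deliberately crude word-count stand-in so the example runs without dependencies. In practice it would be replaced with a real sentence-embedding model. All names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    # Stand-in "embedding": a bag of word counts. This is an assumption for
    # illustration only; swap in a real sentence-embedding model in practice.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_support(summary_sentences, source_sentences):
    # Score each summary sentence by its closest source sentence, then sort
    # ascending so the least-supported sentences come first.
    source_vecs = [embed(s) for s in source_sentences]
    scored = []
    for sentence in summary_sentences:
        vec = embed(sentence)
        best = max((cosine(vec, sv) for sv in source_vecs), default=0.0)
        scored.append((best, sentence))
    return sorted(scored, key=lambda pair: pair[0])


# Invented example data.
source = [
    "The committee approved the budget on Tuesday.",
    "Two members voted against the proposal.",
]
summary = [
    "The committee approved the budget on Tuesday.",
    "The chair resigned after the vote.",
]

for score, sentence in rank_by_support(summary, source):
    print(f"{score:.2f}  {sentence}")
```

A low score only flags a sentence for review. It does not prove the sentence was hallucinated, and a high score does not prove it is faithful.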
But keep in mind that in multilingual tasks, even bitext models and the most recent LLM embedders will still favor stilted NMT translations over human translations, so they are counter-productive for multilingual mitigation contexts:
https://blog.gdeltproject.org/authoritative-human-vs-nmt-llm-translation-emb...
Kalev
On Wed, Sep 6, 2023 at 7:06 PM Hamlet Lopez via Air-L <air-l@listserv.aoir.org> wrote:
Hi Natalie,
I won't talk about the ethics involved here. But you can get high quality automatic transcription of your interviews for free, from your personal computer, if you are not afraid of console programs. You can use open source software.
Check https://github.com/ggerganov/whisper.cpp
I use whisper.cpp for interviews in Spanish, with excellent results. The English language models are even better.
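For anyone who has not used console programs much, here is a hedged sketch of what a whisper.cpp invocation can look like, written as a small Python helper that only builds and prints the command. The binary name `./main` (newer builds may name it differently), the model path, and the audio file name are assumptions for illustration. The flags `-m`, `-f`, `-l`, and `-otxt` are whisper.cpp options for model, input file, language, and plain-text output. The input is expected to be a 16 kHz WAV file, so other formats need converting first.

```python
import shlex


def whisper_cpp_command(audio_wav, model_path, language="es", binary="./main"):
    # Build the argument list for a whisper.cpp run.
    # binary, model_path, and audio_wav are placeholders (assumptions);
    # adjust them to match the local build and files.
    return [
        binary,
        "-m", model_path,   # path to a downloaded ggml model file
        "-f", audio_wav,    # 16 kHz WAV input
        "-l", language,     # spoken language, e.g. "es" or "en"
        "-otxt",            # also write the transcript to a .txt file
    ]


cmd = whisper_cpp_command("interview01.wav", "models/ggml-base.bin")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal from the whisper.cpp directory, or the list can be passed to `subprocess.run(cmd, check=True)` to run it from Python.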
Best wishes,
Hamlet
On 9/6/23, Natalie Rock via Air-L <air-l@listserv.aoir.org> wrote:
Ethics mavens, your insights would be appreciated: should I be concerned about this note sent from Rev.com? I use Rev to create automated transcripts of research interviews. Does this note imply my transcripts might be used for AI training?
Begin forwarded message:
From: Your Team at Rev <yourteam@rev.com>
Date: September 6, 2023 at 14:08:15 PDT
In your Terms of Service and Data Processing Agreement with Rev, we commit to notifying you before any new third-party sub-processor starts processing applicable data. We’re sending this message because we’re planning to appoint a new third-party sub-processor for Rev to support an upcoming new feature.
OpenAI - OpenAI is a company focused on AI research and deployment. Their API platform provides models, tools, and technologies for AI development. These services are utilized to enhance the efficiency, scalability, and flexibility of both human and Automatic Speech Recognition (ASR) services. This new third-party sub-processor has been verified to ensure it meets Rev security and privacy standards and will meet Rev’s data processing terms. No customer owned data will be provided to this third party until the sub-processor notification period has passed.
_______________________________________________ The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
-- Dr.C. Hamlet López García, Researcher, Instituto Cubano de Investigación Cultural "Juan Marinello"; Professor, Universidad de la Habana