Talk @ 1729: “Audio classification with Hugging Face transformers”

Julien Simon - Jul 28 '22 - Dev Community

In this video, I show how to fine-tune a state-of-the-art Conformer model for audio keyword classification, and build a Gradio Space to showcase it. I also quickly test the model with distorted audio to see how resilient it is.

Dataset: https://huggingface.co/datasets/speech_commands

Base model: https://huggingface.co/facebook/wav2vec2-conformer-rel-pos-large

Fine-tuned model: https://huggingface.co/juliensimon/wav2vec2-conformer-rel-pos-large-finetuned-speech-commands

Space: https://huggingface.co/spaces/juliensimon/keyword-spotting

Notebook: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/keyword-spotting
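
The description above does not say which kind of distortion is used in the video, so the following is only a minimal sketch of one possible robustness check, assuming additive white Gaussian noise at a chosen signal-to-noise ratio. The function name `add_white_noise`, the 440 Hz test tone, and the 10 dB setting are hypothetical choices for illustration and are not taken from the notebook.

```python
import numpy as np


def add_white_noise(waveform, snr_db, seed=None):
    """Return a copy of `waveform` with Gaussian noise added at the given SNR (in dB).

    Hypothetical helper: additive white noise is an assumed distortion,
    not necessarily the one used in the video.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(waveform, dtype=np.float32)
    signal_power = float(np.mean(signal ** 2))
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape).astype(np.float32)
    return signal + noise


# Stand-in for a one-second clip sampled at 16 kHz: a plain 440 Hz tone.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# Lower SNR values mean heavier distortion.
noisy = add_white_noise(clean, snr_db=10.0, seed=0)
```

In practice, `clean` would be replaced by a real clip from the dataset, and the clean and noisy versions would both be sent to the fine-tuned model (for example through the `audio-classification` pipeline in transformers) to compare the predicted keywords as the SNR decreases.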
