
Deep Learning for speech proficiency scoring?

Hi!
I have a dataset of more than 100 recordings, each approximately 10 minutes long, of patients undergoing a speech evaluation interview with a speech therapist. I also have access to the resulting score of each interview, on a scale from 1 to 10. I want to train an initial deep learning network that predicts a patient's score from this dataset. My question is this: is it better to label the entire interview with a 9 if that is the score given to that patient, or would it be better to run some sort of speech2text function over the entire interview, so that each interview yields a set of pairs, each consisting of a word and the score for the entire interview? When the network is asked to score a new interview, it would then run speech2text on that file and match each word with its closest matches.
Best,
Joel
  1 Comment
Brian Hemmat on 3 May 2021
Hi Joel,
When you suggest pairs of words and scores, by 'word' you mean either the raw audio or some set of acoustic features, and not the text, correct? speech2text might be useful for segmenting the audio, but I don't think its output will retain valuable information such as whether or not stuttering is present.
At what time scale or part of speech does whatever you are evaluating show up? Are you looking for articulation disorders? Fluency disorders? The time scale you need to feed to your system will depend on what you are evaluating. Whatever the rating represents, my suspicion is that you will want to segment the audio into 5-20 second clips, with each segment having the same label as the whole recording.
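The segmentation idea above could be sketched roughly as follows. This is only an illustration in plain Python, not a tested pipeline: the function name `segment_with_label`, the 10-second clip length, the non-overlapping windows, and the choice to drop a trailing partial clip are all assumptions made for the example, and the recording is a toy list of samples rather than real audio.

```python
from typing import List, Sequence, Tuple


def segment_with_label(
    samples: Sequence[float],
    sample_rate: int,
    score: int,
    clip_seconds: float = 10.0,
) -> List[Tuple[Sequence[float], int]]:
    """Split one recording into fixed-length clips that all inherit its score."""
    clip_len = int(round(clip_seconds * sample_rate))
    if clip_len <= 0:
        raise ValueError("clip_seconds * sample_rate must be at least one sample")
    clips = []
    # Step through the recording in non-overlapping windows; a trailing
    # remainder shorter than one clip is dropped (an assumption of this sketch).
    for start in range(0, len(samples) - clip_len + 1, clip_len):
        clips.append((samples[start:start + clip_len], score))
    return clips


# Toy example: 25 s of silence at 8 kHz, with an interview score of 9.
sample_rate = 8000
recording = [0.0] * (25 * sample_rate)
pairs = segment_with_label(recording, sample_rate, score=9, clip_seconds=10.0)
print(len(pairs))  # 2 full clips; the final 5 s remainder is dropped
```

Each resulting (clip, score) pair would then be one training example, so a 10-minute interview yields dozens of examples rather than one. Whether 5, 10, or 20 seconds works best presumably depends on what the rating measures.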


