Artificial intelligence surgical papers
Google conducted an exploratory study showing that speech enhancement technology, in particular noise suppression, can also be applied to cochlear implants to improve users' understanding of sound in noisy environments.

A cochlear implant is an electronic device that is surgically placed in the inner ear of a hearing-impaired person; an external audio processing unit sends electrical currents that stimulate the auditory nerve. Although the brain can interpret these current pulses as audible speech, the listening experience varies greatly with the user's environment, and noisy environments are especially difficult. Modern cochlear implants rely on the external audio processing unit to compute the pulse signals that drive the electrodes, and a long-standing challenge in the field has been finding a good way to process sound and convert it into appropriate electrode pulses.
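As a rough illustration of that last step, the sketch below maps per-band envelope values into each electrode's usable current range, in the spirit of continuous-interleaved-sampling strategies. The function name, the [0, 1] envelope assumption, and all current levels are hypothetical, not taken from the article or from any real device.

```python
# Hedged, highly simplified sketch of "sound -> electrode pulses":
# map each band's compressed envelope (assumed to lie in [0, 1]) into
# that electrode's usable current range, between its threshold level
# (quietest perceptible) and comfort level (loudest comfortable).
# All numbers are illustrative, not from a real device.
import numpy as np

def envelopes_to_pulse_levels(envelopes, threshold=100.0, comfort=200.0):
    """envelopes: (n_electrodes, n_frames) array -> current levels."""
    env = np.clip(envelopes, 0.0, 1.0)
    # Real devices fit this mapping per electrode, per user, during
    # clinical programming; a linear map is the simplest stand-in.
    return threshold + env * (comfort - threshold)

pulse_levels = envelopes_to_pulse_levels(np.random.rand(16, 100))
```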

To address this problem, scientists from industry and academia held a cochlear implant hackathon to brainstorm solutions, where Google proposed using the Conv-TasNet speech enhancement model in cochlear implants to suppress non-speech sounds so that users can hear human voices more clearly. The researchers decomposed the audio into 16 overlapping frequency bands, corresponding to the 16 electrodes in the cochlear implant. However, because the dynamic range of sound easily spans multiple orders of magnitude, far beyond the range that electrical stimulation can convey, the researchers had to compress the dynamic range of each band.
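A minimal Python sketch of such a band decomposition and compression step is shown below. It is not the researchers' actual pipeline; the band edges, filter order, overlap factor, and logarithmic compression constant are all assumptions chosen for illustration.

```python
# Split audio into 16 overlapping, log-spaced frequency bands and
# log-compress each band's envelope so that a dynamic range spanning
# several orders of magnitude fits into a narrow [0, 1] output range.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def band_envelopes(audio, sample_rate, n_bands=16,
                   f_lo=100.0, f_hi=7000.0, overlap=0.2):
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced edges
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        width = hi - lo
        # Widen each band so neighbors overlap, staying below Nyquist.
        lo_ov = max(1.0, lo - overlap * width)
        hi_ov = min(0.49 * sample_rate, hi + overlap * width)
        sos = butter(4, [lo_ov, hi_ov], btype="bandpass",
                     fs=sample_rate, output="sos")
        band = sosfiltfilt(sos, audio)
        env = np.abs(hilbert(band))                # amplitude envelope
        # Compress roughly three orders of magnitude into [0, 1].
        env = np.log1p(1000.0 * env) / np.log1p(1000.0)
        envelopes.append(env)
    return np.stack(envelopes)                     # (n_bands, n_samples)
```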

Cochlear implant users have varied preferences, and the overall experience depends on how users rate listening to many types of audio, including music. The researchers note that although music is an important sound type for users, it is also a particularly difficult category to handle: because Google's speech enhancement network is trained to suppress non-speech sounds, including both noise and music, extra measures were needed to keep the network from suppressing music. To achieve this, the researchers used the open-source YAMNet classifier to distinguish speech from non-speech sounds and adjusted the mix of original and enhanced audio in real time, ensuring that enough music passes through to remain audible to users.
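One plausible way to wire this up, assuming the TF Hub release of YAMNet, is sketched below. The blending rule that trades off the "Music" and "Speech" scores is an illustrative assumption, not the study's published method.

```python
# Use YAMNet to estimate how much music is present, then blend the
# original audio back in proportionally so music is not suppressed.
import csv
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")
with open(yamnet.class_map_path().numpy().decode()) as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]
MUSIC, SPEECH = class_names.index("Music"), class_names.index("Speech")

def mix_enhanced(original, enhanced):
    """Blend enhanced (speech-only) and original 16 kHz mono audio."""
    scores, _, _ = yamnet(tf.constant(original, tf.float32))
    mean = scores.numpy().mean(axis=0)   # average scores over frames
    # The more music YAMNet hears relative to speech, the more of the
    # original signal we keep (a real system would do this per frame).
    keep = float(np.clip(mean[MUSIC] / (mean[MUSIC] + mean[SPEECH] + 1e-8),
                         0.0, 1.0))
    return keep * original + (1.0 - keep) * enhanced
```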

The researchers implemented the non-speech suppression module with the Conv-TasNet model, which can separate different sound sources. First, the raw audio waveform is converted into a feature representation the neural network can work with, and the sounds are separated based on this feature analysis. The model processes the features and generates two masks, one for speech and one for noise; each mask represents how much speech or noise is present at each point. Multiplying the masks with the analysis features, followed by some conversion calculations, yields audio with the speech and noise separated. The researchers note that the Conv-TasNet model has low latency and can produce its estimates of the separated speech and noise immediately.
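The masking idea can be shown in a toy PyTorch sketch. This is nowhere near the full Conv-TasNet architecture, whose separator is a deep stack of dilated convolutions; all layer sizes here are illustrative.

```python
# Toy version of Conv-TasNet-style masking for two sources (speech, noise):
# encode the waveform into features, predict one mask per source, multiply
# masks with the features, and decode each product back into a waveform.
import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    def __init__(self, n_filters=256, kernel=16, stride=8, n_sources=2):
        super().__init__()
        self.n_sources = n_sources
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride)
        self.separator = nn.Sequential(      # stand-in for the real
            nn.Conv1d(n_filters, n_filters, 3, padding=1),  # deep dilated
            nn.ReLU(),                                      # conv stack
            nn.Conv1d(n_filters, n_filters * n_sources, 1))
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride)

    def forward(self, wav):                  # wav: (batch, 1, time)
        feats = torch.relu(self.encoder(wav))
        masks = torch.sigmoid(self.separator(feats))
        masks = masks.view(wav.size(0), self.n_sources, feats.size(1), -1)
        # Element-wise masking, then decode each source back to audio.
        return torch.stack([self.decoder(masks[:, s] * feats)
                            for s in range(self.n_sources)], dim=1)

speech, noise = TinyTasNet()(torch.randn(1, 1, 16000)).unbind(dim=1)
```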

In blind listening tests, the results allowed listeners to understand the speech content without too much background noise, although there is still considerable room for improvement in speech intelligibility. In addition, because this research is still exploratory, the researchers used a model with 2.9 million parameters; that model is too large to run on today's cochlear implants and serves only to demonstrate the future potential of the technology.