Speech Intelligibility Measurement on the basis of ITU-T Recommendation P.863
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent Systems Laboratory.
2012 (English). Independent thesis, advanced level (degree of Master), 20 credits / 30 HE credits. Student thesis (degree project)
Alternative title
Speech Intelligibility Measurement on the basis of ITU-T Recommendation P.863 (English)
Abstract [en]

Objective speech intelligibility measurement techniques such as the AI (Articulation Index) and the AI-based STI (Speech Transmission Index) fail to assess speech intelligibility in modern telecommunication networks, which use several non-linear processing steps to enhance speech. Moreover, these techniques do not allow prediction of intelligibility scores for individual CVC (Consonant Vowel Consonant) words. The ITU-T P.863 standard [1], which was developed for assessing speech quality, is used as a starting point to develop a simple new model for predicting the subjective speech intelligibility of individual CVC words. Subjective intelligibility measurements were carried out for a large set of speech degradations. The subjective test uses single CVC word presentations in an eight-alternative closed response set experiment. Subjects assess individual degraded CVC words, and the average rate of correct recognition is used as the intelligibility score for a particular CVC word. The first subjective database uses CVC words that vary in the first consonant, i.e. /C/ous (represented as "kæʊs" in the International Phonetic Alphabet). This database is used for developing the objective model, while a new database based on VC (Vowel Consonant) words that vary in the second consonant (a/C/, e.g. aH, aL) is used for validating the model.
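
As a rough illustration of the scoring described above, the following sketch computes a per-item intelligibility score as the fraction of correct responses. The response format (pairs of presented item and chosen alternative) and the item labels are assumptions made for this example and are not taken from the thesis.

```python
from collections import defaultdict


def intelligibility_scores(responses):
    """Return the fraction of correct responses for each presented item.

    `responses` is an iterable of (presented, chosen) pairs, one pair per
    subject judgement. This layout is a hypothetical one for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for presented, chosen in responses:
        total[presented] += 1
        if chosen == presented:
            correct[presented] += 1
    return {item: correct[item] / total[item] for item in total}


# Hypothetical judgements: four subjects hear item "A", two hear item "B".
responses = [
    ("A", "A"), ("A", "C"), ("A", "A"), ("A", "A"),
    ("B", "B"), ("B", "D"),
]
scores = intelligibility_scores(responses)
print(scores)
```

In an eight-alternative closed response set, pure guessing would give an expected score of 1/8, so scores near 0.125 indicate that the item was effectively unintelligible.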

ITU-T P.863 shows very poor results, with a correlation of 0.30 on the first subjective database. As a first extension to make P.863 suited for intelligibility prediction, the speech material is restructured to meet the temporal structure requirements (speech + silence + speech) set for standard P.863 measurements. The restructuring is done by concatenating every original and degraded CVC word with itself. Using P.863 on the restructured first subjective database, whose speech material meets the temporal requirements, gives no significant improvement in correlation (0.34). In this thesis a simple model based on P.863 is developed for assessing the intelligibility of individual CVC words. The model uses a linear combination of a simple time clipping indicator (missing speech parts) and a “Good frame count” indicator, which is based on the local perceptual (frame-by-frame) signal-to-noise ratio. Using this model on the restructured first database, a reasonably good correlation of 0.81 is seen between subjective scores and the model output values. For the validation database, a correlation of around 0.76 is obtained. Further validation on an existing database at TNO, which uses time clipping degradation only, shows an excellent correlation of 0.98.
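
The steps above can be sketched in a few lines: restructuring a word into a speech + silence + speech signal, combining two indicators linearly, and correlating the result with subjective scores. This is a minimal sketch under stated assumptions, not the thesis implementation. The gap length, the weights, the bias term, and all data values are hypothetical, and the two indicators are taken as given numbers rather than derived from P.863 internals.

```python
import math


def restructure(samples, gap_samples):
    """Concatenate a word with itself, separated by a silent gap.

    The gap length is an assumption; the abstract does not state it.
    """
    return list(samples) + [0.0] * gap_samples + list(samples)


def model_output(clipping, good_frames, w_clip, w_good, bias):
    """Linear combination of the two indicators (weights are hypothetical)."""
    return bias + w_clip * clipping + w_good * good_frames


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# A two-sample "word" with a three-sample silent gap (toy values).
restructured = restructure([0.1, -0.2], gap_samples=3)

# Hypothetical indicator values and subjective scores for three items.
clipping = [0.0, 0.5, 1.0]
good_frames = [10, 6, 2]
subjective = [0.95, 0.50, 0.25]

predicted = [
    model_output(c, g, w_clip=-0.3, w_good=0.05, bias=0.4)
    for c, g in zip(clipping, good_frames)
]
r = pearson(predicted, subjective)
print(restructured, predicted, r)
```

In practice the weights would be fitted on the development database and then held fixed when computing the correlation on the validation database.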

Although a reasonably good correlation is seen on the first database and the validation database, it is too low for reliable measurements. Further validation and development are required. Nevertheless, the results show that a perception-based technique that uses internal representations of signals can be used for predicting subjective intelligibility scores of individual CVC words.

Place, publisher, year, edition, pages
2012. p. 66
Keywords [en]
Speech Intelligibility, POLQA
National subject category
Engineering and Technology, Signal Processing
Identifiers
URN: urn:nbn:se:hh:diva-20023
Local ID: IDE1271
OAI: oai:DiVA.org:hh-20023
DiVA, id: diva2:571613
External cooperation
TNO, The Netherlands
Subject / course
Computer Engineering
Presentation
2012-11-06, Halmstad, 15:15 (English)
Uppsök
Technology
Supervisors
Examiners
Available from: 2012-11-30 Created: 2012-11-23 Last updated: 2012-11-30 Bibliographically approved

Open Access in DiVA

Speech Intelligibility Measurement on the basis of ITU-T Recommendation P.863 (1357 kB), 1906 downloads
File information
File name: FULLTEXT01.pdf
File size: 1357 kB
Checksum (SHA-512):
1b76a5c9b4818bc599b01cdcec92a7d5556b1114497f5e971e736595a3cdf0e3df0164ef85fcb2a8d23a79758faddaa8e7af85b9e9ba0fc6e76e20bee38e12af
Type: fulltext
MIME type: application/pdf

Search further in DiVA

By author/editor
GHIMIRE, SWATANTRA
Total: 1914 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 802 hits