Abstract
Speaker Recognition (SR) uses a person's voice to identify them. Due to their high performance and their ability to compensate for session/channel inconsistencies, i-vectors have recently gained popularity as input features for SR systems. Additional speaker-specific perceptual cues can be derived from behaviors and learned characteristics, such as vocabulary selection, accent, intonation style, and emotional aspects. Humans also improve recognition precision by judging how similar a speaker's sound signature is to those of known speakers. We therefore need a new feature vector representation that compares a target speaker's speech to a set of reference speakers (a codebook/dictionary). The speaker's utterance is encoded as a cosine-distance feature (CDF) vector, and SVMs are used as back-end classifiers (CDF-SVM). An SVM classifier with an intersection kernel best captures the acoustic similarities between target and reference speakers. Reference speakers that are acoustically similar to the target are the most important for speaker discrimination. Making the CDF sparse, by keeping only a few large values that correspond to the most similar reference speakers and setting all other elements to 0, further improves discriminative power. On the core short condition of the NIST 2008 SRE database, CDF-SVM outperforms SR systems that use i-vectors.
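The following is a minimal, hypothetical sketch of the CDF idea described in the abstract: a target utterance's i-vector is encoded as its cosine similarities to a dictionary of reference-speaker i-vectors, only the k largest entries are kept (sparsification), and an SVM with an intersection kernel is trained on the resulting vectors. Function names, dimensions, and data here are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of CDF encoding with sparsification and an
# intersection-kernel SVM back-end. Not the authors' implementation.
import numpy as np
from sklearn.svm import SVC

def cdf_encode(ivector, reference_dict, k=10):
    """Cosine-distance feature (CDF) encoding of one i-vector.

    ivector        : (d,) i-vector of the target utterance
    reference_dict : (n_refs, d) i-vectors of the reference speakers
    k              : number of largest entries kept (sparsification)
    """
    # Cosine similarity of the target i-vector to every reference speaker.
    num = reference_dict @ ivector
    den = np.linalg.norm(reference_dict, axis=1) * np.linalg.norm(ivector)
    sims = num / np.maximum(den, 1e-12)

    # Keep only the k most similar reference speakers; zero everything else.
    cdf = np.zeros_like(sims)
    top = np.argsort(sims)[-k:]
    cdf[top] = sims[top]
    return cdf

def intersection_kernel(X, Y):
    """Intersection kernel: K(x, y) = sum_i min(x_i, y_i)."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=-1)

# Illustrative usage with random data standing in for real i-vectors.
rng = np.random.default_rng(0)
refs = rng.normal(size=(200, 400))      # 200 reference speakers, 400-dim i-vectors
train_iv = rng.normal(size=(50, 400))   # 50 training utterances
labels = rng.integers(0, 2, size=50)    # binary target / non-target labels

X_train = np.stack([cdf_encode(iv, refs) for iv in train_iv])
svm = SVC(kernel=intersection_kernel).fit(X_train, labels)
```

The sparsification step mirrors the abstract's claim that discriminative power comes mainly from the few reference speakers most similar to the target, while the intersection kernel compares two utterances through the overlap of their similarity profiles.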