Automatic Speech Recognition (ASR) is the task of using computers to automatically convert speech into text. In practical applications, speech recognition is usually combined with technologies such as natural language understanding, natural language generation, and speech synthesis to provide a natural and fluent voice-based method of human-computer interaction.


Topic: Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

Abstract: The out-of-vocabulary (OOV) word problem is typical for any speech recognition system: hybrid systems are usually constructed to recognize a fixed set of words and rarely contain all the words that will be encountered during system deployment. One popular approach to covering OOV words is to use subword units instead of words. Such a system can potentially recognize any previously unseen word, provided that word can be built from its current subword units, but it can also recognize non-existent words. Another popular approach is to modify the HMM part of the system so that it can be easily and efficiently extended with the custom set of words we want to add. In this paper, we explore different existing methods of this kind at the level of graph construction and search. We also propose a novel vocabulary expansion technique that addresses some common internal subroutine problems related to recognition-graph processing.
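The subword-based approach described above can be illustrated with a minimal sketch: if a word can be segmented into units from the system's inventory, it is recognizable even if it was never seen as a whole word; if not, the system cannot emit it. The unit inventory and example words below are invented for illustration and are not from the paper.

```python
# Toy subword inventory (hypothetical; a real system would learn these,
# e.g. with BPE, from training data).
SUBWORD_UNITS = {"speech", "re", "cog", "ni", "tion", "un", "s", "ed"}

def segment(word, units=SUBWORD_UNITS):
    """Greedy longest-match-first segmentation of a word into subword units.

    Returns the list of units, or None if the word cannot be built from
    the inventory -- mirroring how a subword system can only output words
    composable from its current units.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible unit starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in units:
                pieces.append(word[i:j])
                i = j
                break
        else:
            return None  # no unit covers position i: word is not composable
    return pieces
```

An unseen word like "recognition" segments into `["re", "cog", "ni", "tion"]` and is therefore recognizable, while a word whose characters cannot all be covered by units returns `None`. This also shows the paper's caveat: any composable unit sequence, including a non-existent word, can be produced.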


Latest Content

We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions. We perform experiments on the standard LibriSpeech dataset, and leverage additional unlabeled data from LibriVox through pseudo-labeling. We show that while Transformer-based acoustic models have superior performance with the supervised dataset alone, semi-supervision improves all models across architectures and loss functions and bridges much of the performance gaps between them. In doing so, we reach a new state-of-the-art for end-to-end acoustic models decoded with an external language model in the standard supervised learning setting, and a new absolute state-of-the-art with semi-supervised training. Finally, we study the effect of leveraging different amounts of unlabeled audio, propose several ways of evaluating the characteristics of unlabeled audio which improve acoustic modeling, and show that acoustic models trained with more audio rely less on external language models.
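The pseudo-labeling loop studied above can be sketched in a few lines: train a seed model on labeled data, use it to label the unlabeled pool, then retrain on the union. The toy 1-D nearest-centroid "model" and synthetic data below are stand-ins for an acoustic model and audio, chosen only to keep the sketch self-contained.

```python
def train(examples):
    """Fit one centroid per class label (stand-in for model training)."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Assign the label of the nearest centroid."""
    return min(model, key=lambda y: abs(model[y] - x))

# Synthetic data: a small labeled set and a larger unlabeled pool.
labeled = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b")]
unlabeled = [0.15, 0.3, 0.85, 0.95]

# Step 1: train a seed model on the labeled data alone.
seed = train(labeled)
# Step 2: pseudo-label the unlabeled pool with the seed model.
pseudo = [(x, predict(seed, x)) for x in unlabeled]
# Step 3: retrain on labeled + pseudo-labeled data combined.
model = train(labeled + pseudo)
```

In practice (as in the experiments above) the pseudo-labels would come from decoding unlabeled audio, often filtered by confidence, and the retrained model can iterate this loop; the three-step structure is the same.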

