About: Recovery of Rare Words in Lecture Speech     Goto   Sponge   NotDistinct   Permalink

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

AttributesValues
rdf:type
Description
  • The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, each of which might appear a few times only in particular documents. In most documents, however, these rare words are not seen at all. The first type of words is typically included in the language model of an automatic speech recognizer (ASR) and is thus widely referred to as invocabulary (IV). Words of the second type are missing in the language model and thus are called out-of-vocabulary (OOV). However, these words usually carry important information. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks and describe them as sequences of sub-words.We detected about one third of all OOV words, and were able to recover the correct spelling for 26.2% of all detections by using a phoneme-to-grapheme (P2G) conversion trained on the recognition dictionary. By omitting detections corresponding to
  • The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, each of which might appear a few times only in particular documents. In most documents, however, these rare words are not seen at all. The first type of words is typically included in the language model of an automatic speech recognizer (ASR) and is thus widely referred to as invocabulary (IV). Words of the second type are missing in the language model and thus are called out-of-vocabulary (OOV). However, these words usually carry important information. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks and describe them as sequences of sub-words.We detected about one third of all OOV words, and were able to recover the correct spelling for 26.2% of all detections by using a phoneme-to-grapheme (P2G) conversion trained on the recognition dictionary. By omitting detections corresponding to (en)
Title
  • Recovery of Rare Words in Lecture Speech
  • Recovery of Rare Words in Lecture Speech (en)
skos:prefLabel
  • Recovery of Rare Words in Lecture Speech
  • Recovery of Rare Words in Lecture Speech (en)
skos:notation
  • RIV/00216305:26230/10:PU89608!RIV11-GA0-26230___
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(GA102/08/0707), S, Z(MSM0021630528)
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 284237
http://linked.open...ai/riv/idVysledku
  • RIV/00216305:26230/10:PU89608
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • speech, rare words, recognizer, detect OOV words, sub-words, lectures (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [D085C7340192]
http://linked.open...v/mistoKonaniAkce
  • Brno
http://linked.open...i/riv/mistoVydani
  • Brno
http://linked.open...i/riv/nazevZdroje
  • Proc. Text, Speech and Dialogue 2010
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Burget, Lukáš
  • Kombrink, Stefan
  • Hannemann, Mirko
  • Heřmanský, Hynek
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
http://linked.open...n/vavai/riv/zamer
number of pages
http://purl.org/ne...btex#hasPublisher
  • Springer-Verlag
https://schema.org/isbn
  • 978-3-642-15759-2
http://localhost/t...ganizacniJednotka
  • 26230
Faceted Search & Find service v1.16.118 as of Jun 21 2024


Alternative Linked Data Documents: ODE     Content Formats:   [cxml] [csv]     RDF   [text] [turtle] [ld+json] [rdf+json] [rdf+xml]     ODATA   [atom+xml] [odata+json]     Microdata   [microdata+json] [html]    About   
This material is Open Knowledge   W3C Semantic Web Technology [RDF Data] Valid XHTML + RDFa
OpenLink Virtuoso version 07.20.3240 as of Jun 21 2024, on Linux (x86_64-pc-linux-gnu), Single-Server Edition (126 GB total memory, 112 GB memory in use)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software