Attributes | Values |
---|
rdf:type
| |
Description
| - This paper deals with the task of adaptation of an existing Czech largevocabulary speech recognition (LVCSR) system to the language used in previous historical epochs (before 1990). The goal is to fit its lexicon and language model (LM) so that the system could be employed for the automatic transcription of old spoken documents in the Czech Radio archive. The main problem is the lack of texts (in electronic form) from the 1945-1990 period. The only available and large enough source is digitized copies of Rud´e Pr´avo, the newspaper of the former Communist party of Czechoslovakia, the actual ruling body in the state. The newspaper has been scanned and converted into text via an OCR software. However, the amount of OCR errors is very high and so we have to apply several text pre-processing techniques to get a corpus suitable for the lexicon and language model ’downdating’ (i.e. adaptation to the past). The proposed techniques helped us a) to reduce the number of out-of-vocabulary strings from 8.5 to 6.4 millions, b) to identify 6.7 thousand history-conditioned word candidates to be added to the lexicon and c) to build a more appropriate LM. The adapted LVCSR system was evaluated on broadcast news from 1969-1989 where its word-error-rate decreased from 17.05 to 14.33%.
- This paper deals with the task of adaptation of an existing Czech largevocabulary speech recognition (LVCSR) system to the language used in previous historical epochs (before 1990). The goal is to fit its lexicon and language model (LM) so that the system could be employed for the automatic transcription of old spoken documents in the Czech Radio archive. The main problem is the lack of texts (in electronic form) from the 1945-1990 period. The only available and large enough source is digitized copies of Rud´e Pr´avo, the newspaper of the former Communist party of Czechoslovakia, the actual ruling body in the state. The newspaper has been scanned and converted into text via an OCR software. However, the amount of OCR errors is very high and so we have to apply several text pre-processing techniques to get a corpus suitable for the lexicon and language model ’downdating’ (i.e. adaptation to the past). The proposed techniques helped us a) to reduce the number of out-of-vocabulary strings from 8.5 to 6.4 millions, b) to identify 6.7 thousand history-conditioned word candidates to be added to the lexicon and c) to build a more appropriate LM. The adapted LVCSR system was evaluated on broadcast news from 1969-1989 where its word-error-rate decreased from 17.05 to 14.33%. (en)
|
Title
| - Downdating lexicon and language model for automatic transcription of Czech historical spoken documents
- Downdating lexicon and language model for automatic transcription of Czech historical spoken documents (en)
|
skos:prefLabel
| - Downdating lexicon and language model for automatic transcription of Czech historical spoken documents
- Downdating lexicon and language model for automatic transcription of Czech historical spoken documents (en)
|
skos:notation
| - RIV/46747885:24220/13:#0002791!RIV14-MK0-24220___
|
http://linked.open...avai/riv/aktivita
| |
http://linked.open...avai/riv/aktivity
| |
http://linked.open...vai/riv/dodaniDat
| |
http://linked.open...aciTvurceVysledku
| |
http://linked.open.../riv/druhVysledku
| |
http://linked.open...iv/duvernostUdaju
| |
http://linked.open...titaPredkladatele
| |
http://linked.open...dnocenehoVysledku
| |
http://linked.open...ai/riv/idVysledku
| - RIV/46747885:24220/13:#0002791
|
http://linked.open...riv/jazykVysledku
| |
http://linked.open.../riv/klicovaSlova
| - historical speech recognition; oral archives; lexicon (en)
|
http://linked.open.../riv/klicoveSlovo
| |
http://linked.open...ontrolniKodProRIV
| |
http://linked.open...v/mistoKonaniAkce
| |
http://linked.open...i/riv/mistoVydani
| |
http://linked.open...i/riv/nazevZdroje
| - 16th International Conference, TSD 2013
|
http://linked.open...in/vavai/riv/obor
| |
http://linked.open...ichTvurcuVysledku
| |
http://linked.open...cetTvurcuVysledku
| |
http://linked.open...vavai/riv/projekt
| |
http://linked.open...UplatneniVysledku
| |
http://linked.open...iv/tvurceVysledku
| - Chaloupka, Josef
- Málek, Jiří
- Nouza, Jan
- Červa, Petr
|
http://linked.open...vavai/riv/typAkce
| |
http://linked.open.../riv/zahajeniAkce
| |
issn
| |
number of pages
| |
http://bibframe.org/vocab/doi
| - 10.1007/978-3-642-40585-3_26
|
http://purl.org/ne...btex#hasPublisher
| - Springer-Verlag Berlin Heidelber
|
https://schema.org/isbn
| |
http://localhost/t...ganizacniJednotka
| |