About: Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes   Values
rdf:type
rdfs:seeAlso
Description
  • We present a dataset of telephone conversations in English and Czech, developed to train acoustic models for automatic speech recognition (ASR) in spoken dialogue systems (SDSs). The data comprise 45 hours of speech in English and over 18 hours in Czech. All audio data and a large part of the transcriptions were collected using crowdsourcing; the rest was transcribed by hired transcribers. We release the data together with scripts for data re-processing and building acoustic models using the HTK and Kaldi ASR toolkits. We publish the trained models described in this paper as well. The data are released under the CC-BY-SA 3.0 license; the scripts are licensed under Apache 2.0. In the paper, we report on the methodology of collecting the data, on the size and properties of the data, and on the scripts and their use. We verify the usability of the datasets by training and evaluating acoustic models using the presented data and scripts. (en)
Title
  • Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license (en)
skos:prefLabel
  • Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license (en)
skos:notation
  • RIV/00216208:11320/14:10289384!RIV15-MSM-11320___
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(LK11221), P(LM2010013), S
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 17484
http://linked.open...ai/riv/idVysledku
  • RIV/00216208:11320/14:10289384
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • license; under; shared; corpus; speech; telephone; czech; english; free (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [DF60C11C61C3]
http://linked.open...v/mistoKonaniAkce
  • Reykjavík, Iceland
http://linked.open...i/riv/mistoVydani
  • Reykjavík, Iceland
http://linked.open...i/riv/nazevZdroje
  • Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Dušek, Ondřej
  • Jurčíček, Filip
  • Korvas, Matěj
  • Plátek, Ondřej
  • Žilka, Lukáš
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
number of pages
http://purl.org/ne...btex#hasPublisher
  • European Language Resources Association
https://schema.org/isbn
  • 978-2-9517408-8-4
http://localhost/t...ganizacniJednotka
  • 11320