About: Czech spontaneous speech corpus with structural metadata

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

AttributesValues
rdf:type
Description
  • This paper describes a Czech spontaneous speech corpus consisting of radio talk show recordings. As the first complete non-English MDE corpus, it has been annotated with structural metadata information beyond the words that is critical to both increasing transcript readability and allowing application of downstream NLP methods. Metadata annotation involves partitioning verbatim transcripts into syntactic/semantic units (SUs) that function to express a complete idea; and identifying fillers and edit disfluencies. Annotation guidelines for English metadata developed by Linguistic Data Consortium were taken as the starting point, with changes applied to accommodate specific phenomena of Czech. In addition to the necessary language-dependent modifications, we further propose some language-independent modifications including limited prosodic labeling at SU boundaries. (en)
  • Tento článek popisuje český korpus spontánní řeči skládající se z nahrávek rozhlasových diskusních pořadů. Jako první kompletní neanglický MDE korpus byl anotován strukturálními metadaty, která zvyšují čitelnost přepisů člověkem a umožňují i další automatické zpracování. Anotace zahrnuje rozdělení přepisů do syntakticko-sémantických jednotek a identifikaci výplní a neplynulostí. Mimo modifikací nutných pouze pro češtinu také navrhujeme některé modifikace nezávislé na jazyku, jako je například limitované prozodické značkování na hranicích syntakticko-sémantických jednotek. (cs)
Title
  • Czech spontaneous speech corpus with structural metadata (en)
  • Český korpus spontánní řeči s anotací strukturálních metadat (cs)
skos:prefLabel
  • Czech spontaneous speech corpus with structural metadata (en)
  • Český korpus spontánní řeči s anotací strukturálních metadat (cs)
skos:notation
  • RIV/49777513:23520/05:00000284!RIV07-MSM-23520___
http://linked.open.../vavai/riv/strany
  • 1165
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(LC536), Z(MSM 235200004)
http://linked.open...iv/cisloPeriodika
  • 0
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 516830
http://linked.open...ai/riv/idVysledku
  • RIV/49777513:23520/05:00000284
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • SUs; structural metadata; spontaneous speech; disfluencies; fillers (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...odStatuVydavatele
  • PT - Portuguese Republic
http://linked.open...ontrolniKodProRIV
  • [98A7D988FAA6]
http://linked.open...i/riv/nazevZdroje
  • Eurospeech
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...v/svazekPeriodika
  • 2005
http://linked.open...iv/tvurceVysledku
  • Psutka, Josef
  • Kolář, Jáchym
  • Švec, Jan
  • Kozlíková, Dagmar
  • Strassel, Stephanie
  • Walker, Christopher
http://linked.open...n/vavai/riv/zamer
issn
  • 1018-4074
number of pages
http://localhost/t...ganizacniJednotka
  • 23520
is http://linked.open...avai/riv/vysledek of