HTML Microdata document

This HTML5 document contains 47 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix	IRI
n5	http://linked.opendata.cz/ontology/domain/vavai/riv/typAkce/
dcterms	http://purl.org/dc/terms/
n12	http://localhost/temp/predkladatel/
n6	http://purl.org/net/nknouf/ns/bibtex#
n16	http://linked.opendata.cz/resource/domain/vavai/projekt/
n14	http://linked.opendata.cz/resource/domain/vavai/riv/tvurce/
n13	http://linked.opendata.cz/ontology/domain/vavai/
n20	https://schema.org/
s	http://schema.org/
skos	http://www.w3.org/2004/02/skos/core#
n3	http://linked.opendata.cz/ontology/domain/vavai/riv/
n2	http://linked.opendata.cz/resource/domain/vavai/vysledek/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
n21	http://linked.opendata.cz/resource/domain/vavai/vysledek/RIV%2F49777513%3A23520%2F08%3A00500384%21RIV09-MSM-23520___/
n7	http://linked.opendata.cz/ontology/domain/vavai/riv/klicoveSlovo/
n9	http://linked.opendata.cz/ontology/domain/vavai/riv/duvernostUdaju/
xsdh	http://www.w3.org/2001/XMLSchema#
n19	http://linked.opendata.cz/ontology/domain/vavai/riv/aktivita/
n15	http://linked.opendata.cz/ontology/domain/vavai/riv/jazykVysledku/
n18	http://linked.opendata.cz/ontology/domain/vavai/riv/obor/
n17	http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/
n10	http://reference.data.gov.uk/id/gregorian-year/
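The prefixed names used in the statements (for example n17:D or skos:Concept) are shorthand for full IRIs: the prefix is replaced by the IRI in the table above and the local part is appended. A minimal sketch of that expansion, assuming a hand-copied subset of the table; the PREFIXES dictionary and the expand helper are hypothetical names for illustration and are not part of this document or of any Microdata processor:

```python
# A few rows copied from the prefix table; extend as needed.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "n3": "http://linked.opendata.cz/ontology/domain/vavai/riv/",
    "n17": "http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'n17:D' into a full IRI (hypothetical helper)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    # The local part is appended unchanged, including any percent-encoding.
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    print(expand("n17:D"))
    print(expand("skos:Concept"))
```

The local part is deliberately left untouched, so percent-encoded names keep their encoding in the resulting IRI.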

Statements

Subject Item: n2:RIV%2F49777513%3A23520%2F08%3A00500384%21RIV09-MSM-23520___
rdf:type: skos:Concept n13:Vysledek
dcterms:description: Structural metadata extraction (MDE) research aims to develop techniques for automatic conversion of raw speech recognition output to forms that are more useful to humans and to downstream automatic processes. It may be achieved by inserting boundaries of syntactic/semantic units to the flow of speech, labeling non-content words like filled pauses and discourse markers for optional removal, and identifying sections of disfluent speech. This paper compares two Czech MDE speech corpora, one in the domain of broadcast news and the other in the domain of broadcast conversations. A variety of statistics about fillers, edit disfluencies, and syntactic/semantic units are presented. In addition, it is reported that disfluent portions of speech show differences in the distribution of parts of speech (POS) of their content in comparison with the general POS distribution.
In structural metadata extraction (MDE) tasks, the goal is to develop techniques for automatically converting the unstructured output of an automatic speech recognizer into a form that is more readable and better suited to subsequent processing. This can be achieved by inserting boundaries of syntactic units and by marking filler words and corrected words for optional removal. This article compares two Czech MDE speech corpora, one in the domain of news and the other in the domain of live broadcast discussions. A range of statistics on filler words and phrases, edit disfluencies, and syntactic-semantic units is presented. Among other things, we report statistics showing that disfluent portions of speech have a significantly different distribution of parts of speech than the corpus as a whole. The two Czech corpora described are compared not only with each other but also with available English corpora.
dcterms:title: Structural metadata annotation of speech corpora: Comparing broadcast news and broadcast conversations; Structural metadata annotation in speech corpora: A comparison of radio news and radio discussions
skos:prefLabel: Structural metadata annotation in speech corpora: A comparison of radio news and radio discussions; Structural metadata annotation of speech corpora: Comparing broadcast news and broadcast conversations
skos:notation: RIV/49777513:23520/08:00500384!RIV09-MSM-23520___
n3:aktivita: n19:P
n3:aktivity: P(2C06020), P(ME 909)
n3:dodaniDat: n10:2009
n3:domaciTvurceVysledku: n14:2328372 n14:8780943
n3:druhVysledku: n17:D
n3:duvernostUdaju: n9:S
n3:entitaPredkladatele: n21:predkladatel
n3:idSjednocenehoVysledku: 397781
n3:idVysledku: RIV/49777513:23520/08:00500384
n3:jazykVysledku: n15:eng
n3:klicovaSlova: structural metadata; MDE; disfluencies; fillers; sentence segmentation
n3:klicoveSlovo: n7:sentence%20segmentation n7:MDE n7:structural%20metadata n7:disfluencies n7:fillers
n3:kontrolniKodProRIV: [794C887F4EB7]
n3:mistoKonaniAkce: Marrakech
n3:mistoVydani: Paris
n3:nazevZdroje: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
n3:obor: n18:JD
n3:pocetDomacichTvurcuVysledku: 2
n3:pocetTvurcuVysledku: 2
n3:projekt: n16:2C06020 n16:ME%20909
n3:rokUplatneniVysledku: n10:2008
n3:tvurceVysledku: Kolář, Jáchym; Švec, Jan
n3:typAkce: n5:WRD
n3:zahajeniAkce: 2008-06-01+02:00
s:numberOfPages: 6
n6:hasPublisher: ELRA
n20:isbn: 2-9517408-4-0
n12:organizacniJednotka: 23520
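
The local name of the subject item and the local names of the n7: keyword IRIs are percent-encoded, while skos:notation and n3:klicovaSlova carry the same strings in plain form. A short sketch of the decoding, using only the Python standard library; the variable names are illustrative, and the observation that n3:idVysledku equals the part of the notation before the "!" is drawn from this one record only, not from a documented rule:

```python
from urllib.parse import unquote

# Local name of the subject item, copied from the "Subject Item" line.
subject_local = "RIV%2F49777513%3A23520%2F08%3A00500384%21RIV09-MSM-23520___"

# Percent-decoding turns %2F, %3A and %21 back into "/", ":" and "!".
notation = unquote(subject_local)

# In this record, the text before "!" matches the n3:idVysledku value.
result_id = notation.split("!", 1)[0]

# Keyword IRIs use %20 for spaces in their local names.
keyword_locals = ["sentence%20segmentation", "MDE", "structural%20metadata"]
keywords = [unquote(k) for k in keyword_locals]

if __name__ == "__main__":
    print(notation)
    print(result_id)
    print(keywords)
```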