About: Multimodal Phoneme Recognition of Meeting Data

An entity of type http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within data space linked.opendata.cz, associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
Description	Rozpoznávání fonémů z meetingových dat pomocí audio-vizuálních parametrů (cs) [Phoneme recognition from meeting data using audio-visual parameters]; This paper describes experiments in automatic recognition of context-independent phoneme strings from meeting data using audio-visual features. Visual features are known to improve accuracy and noise robustness of automatic speech recognizers. However, many problems appear when the data provided is not "visually clean", such as data without limited variation in the speaker's frontal pose, lighting conditions, background, etc. The goal of this work was to test whether visual information can be helpful for recognition of phonemes using neural nets. While the audio part is fixed and uses standard Mel filter-bank energies, different features describing the video were tested: average brightness, DCT coefficients extracted from region-of-interest (ROI), optical flow analysis and lip-position features. The recognition was evaluated on a sub-set of IDIAP meeting room data. We have seen a small improvement when compared to purely audio recognition, but further work needs to be done especially concerning the d (en)
Title	Multimodal Phoneme Recognition of Meeting Data (en); Multimodální rozpoznávání fonémů na meeting datech (cs)
skos:prefLabel	Multimodal Phoneme Recognition of Meeting Data (en); Multimodální rozpoznávání fonémů na meeting datech (cs)
skos:notation	RIV/00216305:26230/04:PU49308!RIV06-GA0-26230___
http://linked.open.../vavai/riv/strany	379-384
http://linked.open...avai/riv/aktivita	P Z
http://linked.open...avai/riv/aktivity	P(GA102/02/0124), P(GP102/02/D108), Z(MSM 262200012)
http://linked.open...iv/cisloPeriodika	3206
http://linked.open...vai/riv/dodaniDat	2006
http://linked.open...aciTvurceVysledku	Černocký, Jan; Motlíček, Petr
http://linked.open.../riv/druhVysledku	J - Article in a scholarly periodical
http://linked.open...iv/duvernostUdaju	S - Complete and true data not subject to protection under special legal regulations
http://linked.open...titaPredkladatele	Vysoké učení technické v Brně / Fakulta informačních technologií
http://linked.open...dnocenehoVysledku	575027
http://linked.open...ai/riv/idVysledku	RIV/00216305:26230/04:PU49308
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	speech processing, audio-video processing, feature extraction, pattern recognition (en)
http://linked.open.../riv/klicoveSlovo	audio-video processing, feature extraction, pattern recognition, speech processing
http://linked.open...odStatuVydavatele	DE - Federal Republic of Germany
http://linked.open...ontrolniKodProRIV	[22284505B5A2]
http://linked.open...i/riv/nazevZdroje	Lecture Notes in Computer Science (IF 0.513)
http://linked.open...in/vavai/riv/obor	JC
http://linked.open...ichTvurcuVysledku	2 (xsd:int)
http://linked.open...cetTvurcuVysledku	2 (xsd:int)
http://linked.open...vavai/riv/projekt	Voice technologies for support of information society; Data-driven and anthropic speech coding and recognition
http://linked.open...UplatneniVysledku	2004
http://linked.open...v/svazekPeriodika	2004
http://linked.open...iv/tvurceVysledku	Černocký, Jan; Motlíček, Petr
http://linked.open...n/vavai/riv/zamer	http://linked.opendata.cz/resource/domain/vavai/zamer/MSM%20262200012
issn	0302-9743
number of pages	6 (xsd:int)
http://localhost/t...ganizacniJednotka	26230
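
The abstract in the Description field lists average brightness and DCT coefficients extracted from a region of interest (ROI) among the tested video features. The record gives no implementation details, so the following is only an illustrative sketch of how such features might be computed for one grayscale frame. The function names, the `(top, left, height, width)` ROI layout, the choice of keeping a 3 x 3 block of low-order coefficients, and the synthetic frame are all assumptions made for the example, not the authors' method.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (row k = frequency k)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def roi_features(frame, roi, n_coeffs=3):
    """Return (average brightness, low-order 2-D DCT coefficients) of an ROI.

    frame: grayscale image as a list of rows of pixel values.
    roi: hypothetical (top, left, height, width) tuple.
    n_coeffs: side of the low-frequency block to keep (assumed value).
    """
    top, left, h, w = roi
    patch = [row[left:left + w] for row in frame[top:top + h]]
    brightness = sum(map(sum, patch)) / (h * w)
    ch, cw = dct_matrix(h), dct_matrix(w)
    feats = []
    # Separable 2-D DCT, evaluated only for the lowest spatial frequencies.
    for u in range(n_coeffs):
        for v in range(n_coeffs):
            feats.append(sum(ch[u][y] * cw[v][x] * patch[y][x]
                             for y in range(h) for x in range(w)))
    return brightness, feats

# Synthetic 48 x 64 frame standing in for a real video frame.
frame = [[(3 * y + 5 * x) % 256 for x in range(64)] for y in range(48)]
brightness, dct_feats = roi_features(frame, roi=(8, 16, 16, 32))
```

With an orthonormal DCT the first coefficient equals the average brightness scaled by the square root of the ROI area, so the two features are not independent; a real system would likely drop one of them or normalize the coefficients.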
