About: Avoiding overfit by restricted model search in tree-based EEG classification



An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
rdfs:seeAlso	http://2011.isiproceedings.org/papers/950644.pdf
Description	This work follows up on previous studies in which the EEG frequency spectra of a group of experimental subjects were analyzed in order to find sufficiently accurate classifiers discriminating somnolence (sleepiness) from other brain states. The classifiers considered were complex models whose building blocks were classification forests constructed by the Random Forests method. Since EEG signals are highly individual, it is necessary to tailor a separate classifier to each subject. Earlier studies have shown, however, that a model (classification forest) based exclusively on the subject's own data may be improved when combined (through a weighted average of votes for classes) with a model derived from the data of a well-selected subset S of other subjects. Different strategies of searching for a proper set S were experimentally compared and a “winning” strategy was chosen. The starting point of the present study was a strange and undesirable behavior of the best models from the previous research stage, observed when the size of the forests (number of trees) was varied (originally, the default of 500 trees per forest was used): some of the models deteriorated with growing forest size. Such phenomena often result from overfitting due to an overly extensive model search, but the tendency to overfit is not, as a rule, a property of the components of the models, i.e. of the forests. This gave rise to the hypothesis that combining bigger forests is more prone to overfit than combining smaller ones. If so, the bigger the forests being combined, the more the model search should be restricted (i.e. the smaller the number of candidate models should be kept). The hypothesis is supported by a computational experiment whose results are reported: after restrictions were applied to the size of set S (and to the values of some numerical parameters of the models, too), the trend of model deterioration with growing forest size vanished or, at least, was considerably attenuated. (en)
Title	Avoiding overfit by restricted model search in tree-based EEG classification (en)
skos:prefLabel	Avoiding overfit by restricted model search in tree-based EEG classification (en)
skos:notation	RIV/67985807:_____/12:00390576!RIV13-AV0-67985807
http://linked.open...avai/riv/aktivita	P Z
http://linked.open...avai/riv/aktivity	P(ME 949), Z(AV0Z10300504)
http://linked.open...vai/riv/dodaniDat	2013
http://linked.open...aciTvurceVysledku	Klaschka, Jan
http://linked.open.../riv/druhVysledku	D - Article in proceedings
http://linked.open...iv/duvernostUdaju	S - Complete and true data not subject to protection under special legal regulations
http://linked.open...titaPredkladatele	Ústav informatiky AV ČR, v. v. i.
http://linked.open...dnocenehoVysledku	124328
http://linked.open...ai/riv/idVysledku	RIV/67985807:_____/12:00390576
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	model search; electroencephalography; classification trees and forests; random forests (en)
http://linked.open.../riv/klicoveSlovo	electroencephalography; classification trees and forests; model search; random forests
http://linked.open...ontrolniKodProRIV	[37427F3448C0]
http://linked.open...v/mistoKonaniAkce	Dublin
http://linked.open...i/riv/mistoVydani	The Hague
http://linked.open...i/riv/nazevZdroje	Proceedings of the 58th World Statistics Congress 2011
http://linked.open...in/vavai/riv/obor	BB
http://linked.open...ichTvurcuVysledku	1 (xsd:int)
http://linked.open...cetTvurcuVysledku	1 (xsd:int)
http://linked.open...vavai/riv/projekt	http://linked.opendata.cz/resource/domain/vavai/projekt/ME%20949
http://linked.open...UplatneniVysledku	2012
http://linked.open...iv/tvurceVysledku	Klaschka, Jan
http://linked.open...vavai/riv/typAkce	WRD - Worldwide
http://linked.open.../riv/zahajeniAkce	2011-08-21 (xsd:date)
http://linked.open...n/vavai/riv/zamer	Informatics for the information society: models, algorithms, applications
number of pages	6 (xsd:int)
http://purl.org/ne...btex#hasPublisher	International Statistical Institute
https://schema.org/isbn	978-90-73592-33-9
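
The Description above mentions two mechanisms that a small example may make more concrete: combining a subject's own forest with a forest built from other subjects through a weighted average of class votes, and restricting the search over the subset S by limiting its size. The following is only a minimal sketch under stated assumptions, not the author's implementation. The function names, the example vote fractions, the weight of 0.5, and the exhaustive enumeration of subsets are all hypothetical; the record does not say which search strategies or parameter values were actually used.

```python
from itertools import combinations


def combine_votes(own_votes, other_votes, weight):
    """Weighted average of per-class vote fractions from two forests.

    `weight` is the share given to the subject's own forest (0..1).
    Both arguments map a class label to that forest's vote fraction.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if own_votes.keys() != other_votes.keys():
        raise ValueError("both forests must vote on the same classes")
    return {
        label: weight * own_votes[label] + (1.0 - weight) * other_votes[label]
        for label in own_votes
    }


def predict(own_votes, other_votes, weight):
    """Return the class with the highest combined vote share."""
    combined = combine_votes(own_votes, other_votes, weight)
    return max(combined, key=combined.get)


def candidate_subsets(other_subjects, max_size):
    """Enumerate non-empty subsets S of other subjects with |S| <= max_size.

    Exhaustive enumeration is an illustrative stand-in for a search strategy.
    """
    for size in range(1, max_size + 1):
        yield from combinations(other_subjects, size)


# Hypothetical vote fractions for a single EEG sample.
own = {"somnolence": 0.40, "other": 0.60}
others = {"somnolence": 0.80, "other": 0.20}
label = predict(own, others, weight=0.5)

# Capping |S| shrinks the pool of candidate models that the search can pick from.
subjects = ["A", "B", "C", "D"]
n_restricted = sum(1 for _ in candidate_subsets(subjects, max_size=2))
n_full = sum(1 for _ in candidate_subsets(subjects, max_size=4))
```

With four other subjects, capping the size of S at 2 leaves 10 candidate subsets instead of 15. A smaller candidate pool is the kind of restriction the Description associates with less overfit when bigger forests are combined.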
