About: Avoiding overfit by restricted model search in tree-based EEG classification

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes   Values
rdf:type
rdfs:seeAlso
Description
  • This work follows up on previous studies in which EEG frequency spectra of a group of experimental subjects were analyzed in order to find sufficiently accurate classifiers discriminating somnolence (sleepiness) from other brain states. The classifiers considered were complex models whose building blocks were classification forests constructed by the Random Forests method. Since EEG signals are highly individual, a separate classifier must be tailored to each subject. Earlier studies have shown, however, that a model (classification forest) based exclusively on the subject's own data may be improved when it is combined (through a weighted average of votes for classes) with a model derived from the data of a well-selected subset S of other subjects. Different strategies of searching for a suitable set S were compared experimentally, and a “winning” strategy was chosen. The starting point of the present study was a strange and undesirable behavior of the best models from the previous research stage, observed when the size of the forests (number of trees) was varied (originally, the default of 500 trees per forest was used): some of the models deteriorated as the forest size grew. Such phenomena often result from overfitting caused by too extensive a model search, but a tendency to overfit is not, as a rule, a property of the components of the models, i.e. of the forests. This gave rise to the hypothesis that combining bigger forests is more prone to overfitting than combining smaller ones. If so, the bigger the forests being combined, the more restricted the model search should be (i.e. the smaller the number of candidate models). The hypothesis is supported by a computational experiment whose results are reported: after restrictions were applied to the size of set S (and to the values of some numerical parameters of the models), the trend of model deterioration with growing forest size vanished or, at least, was considerably attenuated. (en)
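The two ideas in the description, combining forests through a weighted average of class votes and restricting the model search by capping the size of set S, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the functions `combine_votes`, `predict`, and `candidate_models`, the vote fractions, the subject labels, and the weight grid are hypothetical and do not come from the record, which does not state the actual search strategy or parameter values.

```python
from itertools import combinations


def combine_votes(own_votes, other_votes, weight):
    """Weighted average of per-class vote fractions from two forests.

    own_votes / other_votes: dicts mapping class label -> fraction of trees
    voting for that class. weight: share given to the subject's own forest
    (hypothetical parameter name).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    classes = set(own_votes) | set(other_votes)
    return {
        c: weight * own_votes.get(c, 0.0)
        + (1.0 - weight) * other_votes.get(c, 0.0)
        for c in classes
    }


def predict(combined):
    """Return the class with the largest combined vote share."""
    return max(combined, key=combined.get)


def candidate_models(other_subjects, weights, max_size):
    """Enumerate (S, weight) candidates with 1 <= |S| <= max_size.

    Capping max_size is one simple way to restrict the model search;
    the record does not say this is the exact restriction used.
    """
    for size in range(1, max_size + 1):
        for subset in combinations(other_subjects, size):
            for w in weights:
                yield subset, w


# Illustrative (made-up) vote fractions for one EEG segment.
own = {"somnolence": 0.40, "other": 0.60}
others = {"somnolence": 0.80, "other": 0.20}
combined = combine_votes(own, others, weight=0.5)
print(predict(combined))

# Size of the search space with and without a cap on |S|.
subjects = ["A", "B", "C", "D", "E"]
weights = [0.25, 0.5, 0.75]
unrestricted = sum(1 for _ in candidate_models(subjects, weights, max_size=5))
restricted = sum(1 for _ in candidate_models(subjects, weights, max_size=2))
print(unrestricted, restricted)
```

With five other subjects and three weights, capping |S| at 2 cuts the number of candidate models from 93 to 45. Fewer candidates leave less room for a model to be selected merely because it fits the validation data by chance, which is the overfitting mechanism the description refers to.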
Title
  • Avoiding overfit by restricted model search in tree-based EEG classification
skos:prefLabel
  • Avoiding overfit by restricted model search in tree-based EEG classification
skos:notation
  • RIV/67985807:_____/12:00390576!RIV13-AV0-67985807
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(ME 949), Z(AV0Z10300504)
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 124328
http://linked.open...ai/riv/idVysledku
  • RIV/67985807:_____/12:00390576
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • model search; electroencephalography; classification trees and forests; random forests (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [37427F3448C0]
http://linked.open...v/mistoKonaniAkce
  • Dublin
http://linked.open...i/riv/mistoVydani
  • The Hague
http://linked.open...i/riv/nazevZdroje
  • Proceedings of the 58th World Statistics Congress 2011
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Klaschka, Jan
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
http://linked.open...n/vavai/riv/zamer
number of pages
http://purl.org/ne...btex#hasPublisher
  • International Statistical Institute
https://schema.org/isbn
  • 978-90-73592-33-9