Attributes | Values |
---|
rdf:type
| |
rdfs:seeAlso
| |
Description
| - This work follows up on previous studies in which EEG frequency spectra of a group of experimental subjects were analyzed in order to find sufficiently accurate classifiers discriminating somnolence (sleepiness) from other brain states. The classifiers considered were complex models whose building blocks were classification forests constructed by the Random Forests method. Since EEG signals are highly individual, it is necessary to tailor a separate classifier for each subject. Earlier studies have shown, however, that a model (classification forest) based exclusively on the subject's own data may be improved when combined (through a weighted average of votes for classes) with a model derived from the data of a well-selected subset S of other subjects. Different strategies of searching for a proper set S were experimentally compared and a “winning” strategy was chosen. The starting point of the present study was a strange and undesirable behavior of the best models from the previous research stage, observed when the size of the forests (number of trees) was varied (originally, the default of 500 trees per forest was used): some of the models deteriorated with growing forest size. Such phenomena often result from overfitting due to a too extensive model search, but the tendency to overfit is not, as a rule, a property of the components of the models, i.e. of the forests. This has given rise to the hypothesis that combining bigger forests is more prone to overfitting than combining smaller ones. If so, the bigger the forests being combined, the more restricted the model search should be (i.e. the smaller the number of candidate models should be kept). The hypothesis is supported by a computational experiment whose results are reported: after restrictions were applied to the size of set S (and to the values of some numerical parameters of the models, too), the trend of model deterioration with growing forest size vanished or, at least, was considerably attenuated. (en)
|
Title
| - Avoiding overfit by restricted model search in tree-based EEG classification (en)
|
skos:prefLabel
| - Avoiding overfit by restricted model search in tree-based EEG classification (en)
|
skos:notation
| - RIV/67985807:_____/12:00390576!RIV13-AV0-67985807
|
http://linked.open...avai/riv/aktivita
| |
http://linked.open...avai/riv/aktivity
| - P(ME 949), Z(AV0Z10300504)
|
http://linked.open...vai/riv/dodaniDat
| |
http://linked.open...aciTvurceVysledku
| |
http://linked.open.../riv/druhVysledku
| |
http://linked.open...iv/duvernostUdaju
| |
http://linked.open...titaPredkladatele
| |
http://linked.open...dnocenehoVysledku
| |
http://linked.open...ai/riv/idVysledku
| - RIV/67985807:_____/12:00390576
|
http://linked.open...riv/jazykVysledku
| |
http://linked.open.../riv/klicovaSlova
| - model search; electroencephalography; classification trees and forests; random forests (en)
|
http://linked.open.../riv/klicoveSlovo
| |
http://linked.open...ontrolniKodProRIV
| |
http://linked.open...v/mistoKonaniAkce
| |
http://linked.open...i/riv/mistoVydani
| |
http://linked.open...i/riv/nazevZdroje
| - Proceedings of the 58th World Statistics Congress 2011
|
http://linked.open...in/vavai/riv/obor
| |
http://linked.open...ichTvurcuVysledku
| |
http://linked.open...cetTvurcuVysledku
| |
http://linked.open...vavai/riv/projekt
| |
http://linked.open...UplatneniVysledku
| |
http://linked.open...iv/tvurceVysledku
| |
http://linked.open...vavai/riv/typAkce
| |
http://linked.open.../riv/zahajeniAkce
| |
http://linked.open...n/vavai/riv/zamer
| |
number of pages
| |
http://purl.org/ne...btex#hasPublisher
| - International Statistical Institute
|
https://schema.org/isbn
| |
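
The description above mentions combining a subject's own forest with a forest built from a subset S of other subjects "through a weighted average of votes for classes". The record does not give the exact formula, so the following is only a minimal sketch of one plausible reading: each forest contributes the fraction of its trees voting for each class, and the two vote profiles are mixed with a weight `w`. The names `combine_votes`, `predict_class`, `own_votes`, `other_votes`, and `w` are hypothetical and do not come from the paper.

```python
from typing import Dict


def combine_votes(own_votes: Dict[str, float],
                  other_votes: Dict[str, float],
                  w: float) -> Dict[str, float]:
    """Weighted average of per-class vote fractions from two forests.

    own_votes   -- vote fractions of the forest grown on the subject's own data
    other_votes -- vote fractions of the forest grown on the subset S of other subjects
    w           -- weight of the subject's own forest, assumed to lie in [0, 1]
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    classes = set(own_votes) | set(other_votes)
    # A class missing from one profile is treated as having received no votes there.
    return {
        c: w * own_votes.get(c, 0.0) + (1.0 - w) * other_votes.get(c, 0.0)
        for c in classes
    }


def predict_class(combined: Dict[str, float]) -> str:
    """Return the class with the largest combined vote fraction."""
    return max(combined, key=combined.get)


# Illustrative (made-up) vote fractions for a single EEG segment.
own = {"somnolence": 0.4, "other": 0.6}
others = {"somnolence": 0.8, "other": 0.2}

equal_mix = combine_votes(own, others, w=0.5)
own_heavy = combine_votes(own, others, w=0.9)
```

In this reading, the weight `w` and the composition of S are the quantities tuned during model search, which is where the abstract locates the risk of overfitting; restricting the candidate values of `w` and the size of S corresponds to the restricted search it describes.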