About: Evaluation of the Document Classification Approaches

An Entity of Type: http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space: linked.opendata.cz, associated with source document(s)

Attributes   Values
rdf:type
Description
  • This paper deals with one-class automatic document classification. Five feature selection methods and three classifiers are evaluated on a Czech corpus in order to build an efficient Czech document classification system. Lemmatization and POS tagging are used for a precise representation of the Czech documents. We demonstrate that POS tag filtering is very important, while lemmatization plays a marginal role in classification. We also show that maximum entropy and support vector machines are very robust to the feature vector size and significantly outperform the naive Bayes classifier in terms of classification accuracy. The best classification accuracy is about 90%, which is sufficient for an application for the Czech News Agency, our commercial partner. (en)
Title
  • Evaluation of the Document Classification Approaches (en)
skos:prefLabel
  • Evaluation of the Document Classification Approaches (en)
skos:notation
  • RIV/49777513:23520/13:43919522!RIV14-MSM-23520___
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(ED1.1.00/02.0090), S
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 73677
http://linked.open...ai/riv/idVysledku
  • RIV/49777513:23520/13:43919522
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • support vector machines; part-of-speech tagging; maximum entropy; lemmatization; Czech news agency; automatic document classification (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [7439CE2D5803]
http://linked.open...v/mistoKonaniAkce
  • Milkow
http://linked.open...i/riv/mistoVydani
  • Cham
http://linked.open...i/riv/nazevZdroje
  • Proceedings of the 8th international conference on computer recognition systems CORES 2013
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Král, Pavel
  • Hrala, Michal
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
issn
  • 2194-5357
number of pages
http://bibframe.org/vocab/doi
  • 10.1007/978-3-319-00969-8_86
http://purl.org/ne...btex#hasPublisher
  • Springer-Verlag
https://schema.org/isbn
  • 978-3-319-00968-1
http://localhost/t...ganizacniJednotka
  • 23520
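
The Description above mentions POS tag filtering, feature selection, and a naive Bayes classifier. The following is a minimal sketch of how such a pipeline could be wired together. It is not the authors' implementation: the toy corpus, the POS tag set (`N`, `V`, `A`, `J`, `R`), the `KEEP_POS` filter, and the choice of document frequency as the selection criterion are all assumptions made for illustration, and the record does not say which five feature selection methods were evaluated.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document is a list of (lemma, POS tag) pairs
# plus a topic label. Tags and labels are invented for illustration only.
TRAIN = [
    ([("vláda", "N"), ("schválit", "V"), ("nový", "A"), ("zákon", "N"), ("a", "J")], "politics"),
    ([("parlament", "N"), ("projednat", "V"), ("zákon", "N"), ("v", "R")], "politics"),
    ([("tým", "N"), ("vyhrát", "V"), ("zápas", "N"), ("a", "J")], "sport"),
    ([("hráč", "N"), ("vstřelit", "V"), ("gól", "N"), ("v", "R"), ("zápas", "N")], "sport"),
]

KEEP_POS = {"N", "V", "A"}  # assumed filter: nouns, verbs, adjectives


def pos_filter(doc):
    """Keep only the lemmas whose POS tag is in KEEP_POS."""
    return [lemma for lemma, pos in doc if pos in KEEP_POS]


def select_features(docs, k):
    """Pick the k lemmas with the highest document frequency."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Sort by (-frequency, lemma) so that ties are broken deterministically.
    ranked = sorted(df, key=lambda w: (-df[w], w))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Train a multinomial naive Bayes model with add-one smoothing."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(w for w in doc if w in vocab)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_like = {
            w: math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, doc):
    """Return the label with the highest log-posterior score."""
    scores = {
        label: prior + sum(like[w] for w in doc if w in like)
        for label, (prior, like) in model.items()
    }
    return max(scores, key=scores.get)


docs = [pos_filter(doc) for doc, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = select_features(docs, k=4)
model = train_nb(docs, labels, vocab)

new_doc = [("soud", "N"), ("zrušit", "V"), ("zákon", "N"), ("a", "J")]
print(predict(model, pos_filter(new_doc)))
```

Swapping `select_features` for another criterion, or `train_nb` for a maximum entropy or SVM learner, would leave the rest of the sketch unchanged, which is the kind of comparison the Description reports.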