About: Yet Another Language Identifier     Goto   Sponge   NotDistinct   Permalink

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

AttributesValues
rdf:type
rdfs:seeAlso
Description
  • Language identification of written text has been studied for several decades. Despite this fact, most of the research is focused on a few most spoken languages, whereas the minor ones are ignored. The identification of a larger number of languages brings new difficulties that do not occur for a few languages. These difficulties are causing decreased accuracy. The objective of this paper is to investigate the sources of such degradation. In order to isolate the impact of individual factors, 5 different algorithms and 3 different number of languages are used. The Support Vector Machine algorithm achieved an accuracy of 98% for 90 languages and the YALI algorithm based on a scoring function had an accuracy of 95.4%. The YALI algorithm has slightly lower accuracy but classifies around 17 times faster and its training is more than 4000 times faster. Three different data sets with various number of languages and sample sizes were prepared to overcome the lack of standardized data sets. These data sets are n
  • Language identification of written text has been studied for several decades. Despite this fact, most of the research is focused on a few most spoken languages, whereas the minor ones are ignored. The identification of a larger number of languages brings new difficulties that do not occur for a few languages. These difficulties are causing decreased accuracy. The objective of this paper is to investigate the sources of such degradation. In order to isolate the impact of individual factors, 5 different algorithms and 3 different number of languages are used. The Support Vector Machine algorithm achieved an accuracy of 98% for 90 languages and the YALI algorithm based on a scoring function had an accuracy of 95.4%. The YALI algorithm has slightly lower accuracy but classifies around 17 times faster and its training is more than 4000 times faster. Three different data sets with various number of languages and sample sizes were prepared to overcome the lack of standardized data sets. These data sets are n (en)
Title
  • Yet Another Language Identifier
  • Yet Another Language Identifier (en)
skos:prefLabel
  • Yet Another Language Identifier
  • Yet Another Language Identifier (en)
skos:notation
  • RIV/00216208:11320/12:10130078!RIV13-MSM-11320___
http://linked.open...avai/predkladatel
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(7E11042)
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 181240
http://linked.open...ai/riv/idVysledku
  • RIV/00216208:11320/12:10130078
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • identifier; language; another (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [89EF398C3C1F]
http://linked.open...v/mistoKonaniAkce
  • Avignon, France
http://linked.open...i/riv/mistoVydani
  • Avignon, France
http://linked.open...i/riv/nazevZdroje
  • Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Majliš, Martin
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
number of pages
http://purl.org/ne...btex#hasPublisher
  • Association for Computational Linguistics
https://schema.org/isbn
  • 978-1-937284-19-0
http://localhost/t...ganizacniJednotka
  • 11320
is http://linked.open...avai/riv/vysledek of
Faceted Search & Find service v1.16.118 as of Jun 21 2024


Alternative Linked Data Documents: ODE     Content Formats:   [cxml] [csv]     RDF   [text] [turtle] [ld+json] [rdf+json] [rdf+xml]     ODATA   [atom+xml] [odata+json]     Microdata   [microdata+json] [html]    About   
This material is Open Knowledge   W3C Semantic Web Technology [RDF Data] Valid XHTML + RDFa
OpenLink Virtuoso version 07.20.3240 as of Jun 21 2024, on Linux (x86_64-pc-linux-gnu), Single-Server Edition (126 GB total memory, 58 GB memory in use)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software