About: Language Identification on the Web: Extending the Dictionary Method

An Entity of Type: http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space: linked.opendata.cz, associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
Description	Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effective algorithms based on character $n$-grams are in use, mainly with identification based on Markov Processes or on character $n$-gram profiles. In this paper we investigate the limitations of these approaches when applied to real-world web pages. The challenges to be overcome include language identification on very short texts, correctly handling texts of unknown language and texts comprised of multiple languages. We propose and evaluate a new method, which constructs language models based on word relevance and addresses these limitations. We also extend our method to allow us to efficiently and automatically segment the input text into blocks of individual languages, in case of multiple-language documents. (en)
Title	Language Identification on the Web: Extending the Dictionary Method (en)
skos:prefLabel	Language Identification on the Web: Extending the Dictionary Method (en)
skos:notation	RIV/00216224:14330/09:00067120!RIV14-MSM-14330___
http://linked.open...avai/riv/aktivita	P S
http://linked.open...avai/riv/aktivity	P(LC536), S
http://linked.open...vai/riv/dodaniDat	2014
http://linked.open...aciTvurceVysledku	Řehůřek, Radim
http://linked.open.../riv/druhVysledku	D - Article in proceedings
http://linked.open...iv/duvernostUdaju	S - Complete and accurate data not subject to protection under special legal regulations
http://linked.open...titaPredkladatele	Masarykova univerzita / Fakulta informatiky
http://linked.open...dnocenehoVysledku	323178
http://linked.open...ai/riv/idVysledku	RIV/00216224:14330/09:00067120
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	machine learning; language segmentation; language identification (en)
http://linked.open.../riv/klicoveSlovo	language segmentation; language identification; machine learning
http://linked.open...ontrolniKodProRIV	[A5273553D9CC]
http://linked.open...v/mistoKonaniAkce	Mexico City, Mexico
http://linked.open...i/riv/mistoVydani	Mexico City, Mexico
http://linked.open...i/riv/nazevZdroje	Computational Linguistics and Intelligent Text Processing, 10th International Conference, CICLing 2009, Proceedings.
http://linked.open...in/vavai/riv/obor	IN
http://linked.open...ichTvurcuVysledku	1 (xsd:int)
http://linked.open...cetTvurcuVysledku	2 (xsd:int)
http://linked.open...vavai/riv/projekt	Integrated center for natural language processing
http://linked.open...UplatneniVysledku	2009
http://linked.open...iv/tvurceVysledku	Řehůřek, Radim; Kolkus, Milan
http://linked.open...vavai/riv/typAkce	WRD - Worldwide
http://linked.open...ain/vavai/riv/wos	000265681200029
http://linked.open.../riv/zahajeniAkce	2009-03-01 (xsd:date)
issn	0302-9743
number of pages	12 (xsd:int)
http://bibframe.org/vocab/doi	10.1007/978-3-642-00382-0_29
http://purl.org/ne...btex#hasPublisher	Springer-Verlag
https://schema.org/isbn	9783642003813
http://localhost/t...ganizacniJednotka	14330
