About: Compression of Semistructured Documents

An Entity of Type: http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space: linked.opendata.cz, associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
Description	EGOTHOR is a search engine that indexes the Web and lets users search Web documents. Its hit list contains the URL and title of each hit, along with a snippet that briefly shows the match. A snippet can almost always be assembled by an algorithm that has full knowledge of the original document (usually an HTML page). This implies that the search engine must store the full text of every document as part of the index. Such a requirement calls for a suitable compression algorithm that reduces the space demand. One solution is to use common compression methods such as gzip or bzip2, but it may be preferable to develop a new method that takes advantage of the document structure or, rather, the textual character of the documents. Specialized algorithms for text compression and methods for compressing XML documents already exist. The aim of this paper is to integrate the two approaches to achieve an optimal compression ratio. (en)
Title	Compression of Semistructured Documents (en) Komprese semistrukturovaných dokumentů (cs)
skos:prefLabel	Compression of Semistructured Documents (en) Komprese semistrukturovaných dokumentů (cs)
skos:notation	RIV/00216208:11320/07:00005175!RIV08-AV0-11320___
http://linked.open.../vavai/riv/strany	11;17
http://linked.open...avai/riv/aktivita	P Z
http://linked.open...avai/riv/aktivity	P(1ET100300419), P(1ET100300517), Z(MSM0021620838)
http://linked.open...iv/cisloPeriodika	1
http://linked.open...vai/riv/dodaniDat	2008
http://linked.open...aciTvurceVysledku	Žemlička, Michal; Lánský, Jan; Galamboš, Leo
http://linked.open.../riv/druhVysledku	J - Article in a scholarly periodical
http://linked.open...iv/duvernostUdaju	S - Complete and true data not subject to protection under special legal regulations
http://linked.open...titaPredkladatele	Univerzita Karlova v Praze / Matematicko-fyzikální fakulta
http://linked.open...dnocenehoVysledku	414579
http://linked.open...ai/riv/idVysledku	RIV/00216208:11320/07:00005175
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	Compression; Semistructured; Documents (en)
http://linked.open.../riv/klicoveSlovo	Compression Semistructured Documents
http://linked.open...odStatuVydavatele	GB - United Kingdom of Great Britain and Northern Ireland
http://linked.open...ontrolniKodProRIV	[8AB77E37AFD1]
http://linked.open...i/riv/nazevZdroje	International Journal of Information Technology
http://linked.open...in/vavai/riv/obor	JC
http://linked.open...ichTvurcuVysledku	3 (xsd:int)
http://linked.open...cetTvurcuVysledku	4 (xsd:int)
http://linked.open...vavai/riv/projekt	Methods for Intelligent Systems and Their Applications in Datamining and Natural Language Processing; Intelligent Models, Algorithms, Methods and Tools for the Semantic Web (realization)
http://linked.open...UplatneniVysledku	2007
http://linked.open...v/svazekPeriodika	4
http://linked.open...iv/tvurceVysledku	Žemlička, Michal; Lánský, Jan; Galamboš, Leo
http://linked.open...n/vavai/riv/zamer	Modern Methods, Structures and Systems of Computer Science
issn	1305-2403
number of pages	7 (xsd:int)
http://localhost/t...ganizacniJednotka	11320
is http://linked.open...avai/riv/vysledek of	Compression of Semistructured Documents Compression of Semistructured Documents Compression of Semistructured Documents
