"RIV/00216224:14330/11:00056803!RIV12-MSM-14330___" . . "P(LC536), S" . . "D\u00EDky vyvinut\u00ED tohoto n\u00E1stroje je dosahov\u00E1no \u00FA\u010Dinn\u00E9ho odstran\u011Bn\u00ED duplicitn\u00EDch \u010D\u00E1st\u00ED textov\u00FDch dokument\u016F ve velk\u00FDch textov\u00FDch korpusech sestavovan\u00FDch v Centru zpracov\u00E1n\u00ED p\u0159irozen\u00E9ho jazyka na Fakult\u011B informatiky Masarykovy univerzity. V p\u0159\u00EDpad\u011B nasazen\u00ED m\u00E9n\u011B \u00FA\u010Dinn\u00E9ho n\u00E1stroje by v\u00FDsledn\u00E9 korpusy nedosahovaly po\u017Eadovan\u00FDch kvalit a nebylo by mo\u017En\u00E9 prov\u00E1d\u011Bt \u00FAsp\u011B\u0161n\u011B jazykov\u00E9 anal\u00FDzy v r\u00E1mci Centra, ani poskytovat kvalitn\u00ED data v r\u00E1mci spolupr\u00E1ce s pr\u016Fmyslov\u00FDmi partnery Fakulty informatiky MU. Zdrojov\u00FD k\u00F3d, dokumentace a dal\u0161\u00ED materi\u00E1ly jsou udr\u017Eov\u00E1ny v anglick\u00E9m jazyce, \u010D\u00EDm\u017E je umo\u017En\u011Bna univerz\u00E1ln\u00ED p\u0159\u00EDstupnost n\u00E1stroje. Software byl (v podob\u011B instala\u010Dn\u00EDho bal\u00EDku pro Python) sta\u017Een celkem 23 kr\u00E1t (viz http://code.google.com/p/onion/downloads/list, nav\u0161t\u00EDveno 12. 4. 2012) a d\u00E1le zp\u0159\u00EDstupn\u011Bn v podob\u011B kompletn\u00EDho zdrojov\u00E9ho k\u00F3du a v\u0161ech natr\u00E9novan\u00FDch model\u016F. Lze tedy usuzovat, \u017Ee je testov\u00E1n nebo nasazen dal\u0161\u00EDmi u\u017Eivateli krom\u011B Masarykovy univerzity." . . "onion"@en . . "[B4282612B5C0]" . "http://nlp.fi.muni.cz/projects/onion/" . . "218358" . . "onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts. The tool has been implemented in Python, licensed under New BSD License and made an open source software (available for download including the source code at http://code.google.com/p/onion/). 
It is being successfully used for cleaning large textual corpora at the Natural Language Processing Centre at the Faculty of Informatics, Masaryk University, Brno, and by its industry partners. The research leading to this piece of software was published in the author's Ph.D. thesis \"Removing Boilerplate and Duplicate Content from Web Corpora\". The deduplication algorithm is based on comparing word n-grams of the text." . . "onion" . . "14330" . . "onion" . "http://nlp.fi.muni.cz/projects/onion/" . . "deduplication; corpora; text deduplication; n-gram deduplication; n-gram model"@en . "onion"@en . . . . "1"^^ . "onion" . . . . . . . "Pomik\u00E1lek, Jan" . . "1"^^ . . "RIV/00216224:14330/11:00056803" . . "onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts. The tool has been implemented in Python, licensed under New BSD License and made an open source software (available for download including the source code at http://code.google.com/p/onion/). It is being successfully used for cleaning large textual corpora at the Natural Language Processing Centre at the Faculty of Informatics, Masaryk University, Brno, and by its industry partners. The research leading to this piece of software was published in the author's Ph.D. thesis \"Removing Boilerplate and Duplicate Content from Web Corpora\". The deduplication algorithm is based on comparing word n-grams of the text."@en . "Software k odstra\u0148ov\u00E1n\u00ED duplicitn\u00EDch \u010D\u00E1st\u00ED v rozs\u00E1hl\u00FDch souborech textov\u00FDch dokument\u016F. Implementace v jazyce Python. Licence: New BSD License. Odpov\u011Bdn\u00E1 osoba pro jedn\u00E1n\u00ED: doc. PhDr. Karel Pala, CSc.; email: pala@fi.muni.cz; telefon: 549495616; adresa: Karel Pala, Fakulta informatiky Masarykovy univerzity, Botanick\u00E1 68a, 602 00 Brno." .