Attributes | Values |
---|
rdf:type
| |
rdfs:seeAlso
| |
Description
| - The source code is the backend of a web application. It is written in Python 2.7; a GUI is not necessary because the script is run automatically at a specified time interval by a phone or PC. The front end can be built separately (JS, PHP, MySQL); for example: http://space-walk.info/phd/pages/cz/realmining.php. The main goal is to analyze big data volumes from websites in real time and visualise them through a selected API. In this case, the Plot.ly and Google services are used together with MySQL, PHP, and JavaScript to handle processing and visualisation. The code includes basic classes for handling HTML structure. MLStripper(HTMLParser): strips the HTML structure (tags, JS, etc.). ParseIt: analyses the target websites; uses the Counter collection and the BeautifulSoup library to transform HTML into classes, which allows elegant attribute handling; saves data into associative arrays (dictionaries), sometimes in a multidimensional structure; filters words using the bad word dictionaries. Badwords: manipulates the bad word dictionaries; its use is optional, as stopwords.txt is usually good enough. PublishResults: uses the Plot.ly service as an API to visualise graphs; app_cfg.py must be set up to access MySQL and the API account. SpecificAnalyzes: searches the contexts of top words based on parameter values (distance). Crimes: searches through the set of words associated with a specific crime; if a match occurs, the JSON dictionary of towns is scanned and the matching town is returned (only for towns with more than 5,000 inhabitants); JSON is used because of the complicated structure of the Czech language.
|
Title
| - Real-time big data webmining and data processing
|
skos:prefLabel
| - Real-time big data webmining and data processing
|
skos:notation
| - RIV/00216275:25410/13:39896075!RIV14-MSM-25410___
|
http://linked.open...avai/predkladatel
| |
http://linked.open...avai/riv/aktivita
| |
http://linked.open...avai/riv/aktivity
| |
http://linked.open...vai/riv/dodaniDat
| |
http://linked.open...aciTvurceVysledku
| |
http://linked.open.../riv/druhVysledku
| |
http://linked.open...iv/duvernostUdaju
| |
http://linked.open...onomickeParametry
| - Fast retrieval of the required information from a large volume of data; multidisciplinary
|
http://linked.open...titaPredkladatele
| |
http://linked.open...dnocenehoVysledku
| |
http://linked.open...ai/riv/idVysledku
| - RIV/00216275:25410/13:39896075
|
http://linked.open...terniIdentifikace
| |
http://linked.open...riv/jazykVysledku
| |
http://linked.open.../riv/klicovaSlova
| - python, webmining, big data (en)
|
http://linked.open.../riv/klicoveSlovo
| |
http://linked.open...ontrolniKodProRIV
| |
http://linked.open.../licencniPoplatek
| |
http://linked.open...in/vavai/riv/obor
| |
http://linked.open...ichTvurcuVysledku
| |
http://linked.open...cetTvurcuVysledku
| |
http://linked.open...UplatneniVysledku
| |
http://linked.open...echnickeParametry
| - python 2.7, beautiful soup
|
http://linked.open...iv/tvurceVysledku
| |
http://linked.open...avai/riv/vlastnik
| |
http://linked.open...itiJinymSubjektem
| |
http://localhost/t...ganizacniJednotka
| |
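
The Description entry above outlines a pipeline: strip the HTML, count words, filter them against a stop-word list, and search the context of a word within a given distance. The following is a minimal sketch of those steps, not the original source code. It assumes Python 3 and the standard library only (the original targets Python 2.7 and also uses BeautifulSoup). Only the class name MLStripper comes from the description; the helpers strip_tags, count_words, and word_contexts, their parameters, and the sample HTML are hypothetical names chosen for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collect text content while skipping tags, scripts, and styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a script or style element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        return " ".join(self._chunks)


def strip_tags(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    stripper = MLStripper()
    stripper.feed(html)
    stripper.close()
    return stripper.get_text()


def count_words(text, stop_words=frozenset()):
    """Hypothetical helper: count lowercase words, ignoring stop words."""
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def word_contexts(text, target, distance=2):
    """Hypothetical helper: words within `distance` positions of `target`."""
    words = re.findall(r"\w+", text.lower())
    return [
        words[max(0, i - distance): i + distance + 1]
        for i, word in enumerate(words)
        if word == target
    ]


if __name__ == "__main__":
    # Sample input invented for this sketch.
    page = (
        "<html><head><script>var x = 1;</script></head>"
        "<body><p>Theft reported in Prague. The theft was minor.</p></body></html>"
    )
    text = strip_tags(page)
    print(count_words(text, stop_words={"the", "in", "was"}).most_common(3))
    print(word_contexts(text, "theft", distance=1))
```

In practice the stop-word set would presumably be loaded from a file such as the stopwords.txt mentioned in the description; how the original code reads that file is not stated, so the sketch simply takes a set as an argument.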