
Data mining

Data mining is a collective term used to denote a set of methods for the detection within data of previously unknown, non-trivial, practically useful and accessible information, which can then be interpreted as necessary for the purposes of making decisions in various spheres of human activity.

Data mining methods rest on all sorts of classification, modelling and forecasting techniques that use decision trees, artificial neural networks, genetic algorithms, evolutionary programming, associative memory and fuzzy logic. They also often involve probability and statistical analysis.

One of the most important purposes of data mining methods is to visualise the results of calculations, which makes possible the use of data mining tools by people who lack special mathematical skills.

Problem Statement

Initially, the task is set as follows:

  • A fairly large database exists;
  • Some degree of “hidden knowledge” is assumed to exist somewhere within it;
  • Methods must be developed for detecting knowledge buried within significant volumes of raw data.

In the current conditions of global competition, it is precisely the patterns found (the knowledge) that can serve as a source of additional competitive advantage.

What does "hidden knowledge" mean?

Hidden knowledge is information that is:

  • Previously unknown: knowledge that must be new (rather than confirming some previously received information);
  • Non-trivial: knowledge that cannot simply be observed (by direct visual analysis of the data or by calculating simple statistical characteristics);
  • Practically useful: knowledge that is of value to the researcher or consumer;
  • Accessible for interpretation: knowledge that is easy to present in a user-friendly form and easily explained in terms of the subject area.

These requirements largely determine the essence of data mining methods, as well as the form and proportion in which data mining technology draws on database management systems, statistical analysis methods and artificial intelligence methods.

Data mining and artificial intelligence

The knowledge extracted by data mining methods is usually presented in the form of regularities (patterns) such as:

  • association rules;
  • decision trees;
  • clusters;
  • mathematical functions.
The algorithms for finding such regularities are at the intersection of the following areas: Artificial Intelligence, Mathematical Statistics, Mathematical Programming, Visualisation, OLAP.
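To make one of these pattern types concrete, here is a minimal sketch of how an association rule such as "bread → milk" is commonly scored by its support and confidence. The shopping-basket data and the function names are assumptions made up for this illustration; real data mining tools use far more efficient algorithms.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies in this data set
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical toy data: each set is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

In this toy data the rule applies to half of the baskets, and two of the three baskets with bread also contain milk.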
Additional Terms
Barcode
A barcode is graphic information applied to the surface, marking or packaging of products so that the requisite information can be read by technical means. It consists of a sequence of black and white stripes or other geometric shapes.

Fields of application

  • Acceleration of document flow in banking and other payment systems;
  • Minimisation of data-reading errors through process automation;
  • Identification of employees (corporate barcode);
  • Organisation of time recording systems;
  • Unification of forms for collecting different types of data (medicine, statistics, etc.);
  • Simplification of warehouse inventory;
  • Control over the availability and promotion of goods in stores, ensuring their safety, etc.

Practical use

Historically, the EAN/UPC code is the one most commonly used in trade. The US UPC system was developed first; it encodes a product in 12 digits and became so popular that European countries took notice. However, the entire range of codes was already in use for goods from the USA and Canada, and companies could register only in the USA. The developers of the European EAN-13 system therefore faced a serious task: to extend the range of codes and set up a registration system independent of the USA, while ensuring maximum compatibility with UPC. The solution was to add a thirteenth digit in the leftmost position (it is usually printed as an Arabic numeral to the left of the barcode) while still using 12 digit patterns, just as in UPC. This preserved backward compatibility: UPC became the subset of EAN-13 codes whose first digit is 0.

Logical structure

From the point of view of encoding, the EAN-13 code can be conditionally divided into 5 zones:

  • Prefix of the national GS1 organisation (3 digits);
  • Manufacturer's registration number (4-6 digits);
  • Product code (3-5 digits);
  • Check digit (1 digit);
  • Additional field (an optional barcode field; sometimes a ">" sign is printed as a "free zone indicator").

How do computer terminals identify the different parts of the code? They don't, and they don't need to. What matters is the unique code as a whole, and it is this whole code that is stored in the database of a trading enterprise. The exception is codes starting with the digit 2, in which an enterprise can encode its own logic for the product.

Barcodes are widely used to automate the trade sector, especially by big retailers. All identifying attributes, such as the ID, product name and price, can be set up to be retrieved by the equipment from the barcode.
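The text mentions the check digit without saying how it is derived. As a sketch: in the standard EAN-13 scheme, the first 12 digits are weighted alternately by 1 and 3 from the left, and the check digit brings the weighted sum up to a multiple of 10. The function names below are illustrative, and the two codes are simply sample numbers.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the 13th (check) digit from the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 decimal digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Check the length, the character set and the check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


def upc_a_to_ean13(upc: str) -> str:
    """A 12-digit UPC-A code is the EAN-13 code with a leading 0."""
    if len(upc) != 12 or not (upc.isascii() and upc.isdigit()):
        raise ValueError("expected exactly 12 decimal digits")
    return "0" + upc


sample_ok = is_valid_ean13("4006381333931")
converted = upc_a_to_ean13("036000291452")
```

Because the added leading digit is 0, it contributes nothing to the weighted sum, so a valid UPC-A code stays valid after conversion.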
Command Line Interface
Command line interface (CLI) is a kind of textual interface between a person and a computer in which instructions are given mainly by typing text strings (commands) on the keyboard; on UNIX systems a mouse can also be used. It is also known as the console user interface.

The command-line interface, often called the command-line user interface, is contrasted with menu-based program control systems and with the various implementations of the graphical user interface (GUI). The output format in a command-line interface is not regulated; usually it is plain text, but it can also be graphic, audio, etc.

Advantages

  • Small memory consumption compared to a menu system.
  • Modern software has a large number of commands, many of which are rarely needed. For this reason even some programs with a graphical interface include a command line: typing a command (provided the user knows it) is much faster than, for example, navigating through the menus.
  • A natural extension of the command line interface is the batch interface. In essence, it is a sequence of commands written to a plain text file, which the program can then execute, with the same effect (in most cases) as if the commands had been entered one by one on the command line. Examples are .bat files in DOS and Windows and shell scripts in Unix systems.
  • If a program is fully or almost fully controlled by commands from the command line interface and supports a batch interface, a skilful combination of the command line with a graphical interface gives the user very powerful capabilities.

Disadvantages

  • The command-line interface is not user-friendly for those who first learned to use a computer in graphical mode, because it offers almost no discoverability.
  • The need to learn command syntax and memorise abbreviations is complicated by the fact that each command can have its own notation.
  • Without auto-completion, entering long names and special characters from the keyboard can be difficult.
  • No analogue input. For example, adjusting the volume with an on-screen slider lets you set the appropriate level faster than a command like aumix -v 90.

Usage

Historically, the main areas of application of the command line interface were computer terminals in the 1960s-1980s and operating systems such as MS-DOS, Unix and Apple DOS. Today it is used in chats, computer games and program testing.
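The relationship between the command line and the batch interface can be sketched as follows: a single function executes one command, and the batch runner feeds it lines from a text file. The `set` and `get` commands and all the names here are hypothetical and exist only for this illustration.

```python
import io


def execute(line, state):
    """Run one command line against `state` and return its text output."""
    parts = line.split()
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    if name == "set" and len(args) == 2:
        state[args[0]] = args[1]
        return f"{args[0]} = {args[1]}"
    if name == "get" and len(args) == 1:
        return state.get(args[0], "(unset)")
    return f"unrecognised command or arguments: {line.strip()}"


def run_batch(stream, state):
    """Execute every non-empty, non-comment line of a text stream in order."""
    outputs = []
    for line in stream:
        if line.strip() and not line.lstrip().startswith("#"):
            outputs.append(execute(line, state))
    return outputs


# A batch "file" held in memory; a real one would be opened with open().
batch_file = io.StringIO("set volume 90\n# comments are skipped\nget volume\n")
state = {}
batch_output = run_batch(batch_file, state)
```

Typing the same two commands one by one through `execute` leaves `state` in the same condition, which is the equivalence the batch interface relies on.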
Content Management Application (CMA)
Content Management Application (CMA) is a computer program used to support the joint process of creating, managing and editing content.

The main functions of the CMA:

  • Provide tools for creating content and organizing collaboration;
  • Manage content: storage, versioning, adherence, document flow management, etc.;
  • Publish content;
  • Present information in a form convenient for navigation and search.

A content management application can hold a variety of data: documents, photos, scientific data, phone numbers and so on. Such an application is often used to store, review, manage and publish documentation. Version control is one of its main advantages when content is accessed by a group of people.

Kinds of Apps

In general, CMAs are divided into:

  • Enterprise Content Management Application (ECMA);
  • Web Content Management Application (WCMA).

Because CMAs have a deep internal classification by subject area, the term CMA has replaced WCMS and become a synonym for a website management system. Such CMAs let you manage the textual and graphical content of a site. They give the user a convenient interface for working with content and handy tools for publishing and storing information, and they automate the process of placing data in databases and outputting it as HTML.

There are many ready-made solutions for content management, including free ones. The available applications can be divided into 3 types according to the way they work:

  • Page generation on request. Applications of this type work on the principle “Editing module → Database → Presentation module”. On request, the presentation module generates a page with content based on the information in the database; the editing module modifies the information in the database. The server re-creates pages on every request, which puts additional load on system resources. The load can be reduced many times over by the caching tools available in modern web servers.
  • Page generation when editing. Systems of this type create a set of static pages whenever the content of the site is changed. This method sacrifices interactivity between the visitor and the contents of the site.
  • Mixed type. It combines the advantages of the previous two types. It can be implemented by caching: the presentation module generates the page once, and afterwards the page is loaded much faster from the cache. The cache can be updated automatically, either after a certain period of time or when certain sections of the site are changed, or manually at the administrator's command. Another approach is to save certain data blocks when the site is edited and to assemble the page from these blocks when a user requests it.
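The caching variant of the mixed type can be sketched roughly as below. This is an illustration under stated assumptions, not how any particular CMA is built: `PageCache`, its `render` callback and the injectable `clock` are hypothetical names, and a real system would also deal with concurrency and storage limits.

```python
import time


class PageCache:
    """Generate a page once, then serve it from the cache until it expires."""

    def __init__(self, render, ttl_seconds, clock=time.monotonic):
        self._render = render      # stands in for the presentation module
        self._ttl = ttl_seconds    # automatic refresh after this many seconds
        self._clock = clock        # injectable so the sketch is testable
        self._entries = {}         # path -> (html, time stored)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # cache hit: no page generation needed
        html = self._render(path)  # cache miss or expired: regenerate
        self._entries[path] = (html, now)
        return html

    def invalidate(self, path=None):
        """Drop one page (for example after an edit) or the whole cache."""
        if path is None:
            self._entries.clear()
        else:
            self._entries.pop(path, None)


render_calls = []


def render_page(path):
    render_calls.append(path)
    return f"<h1>{path}</h1> build {len(render_calls)}"


fake_now = [0.0]
cache = PageCache(render_page, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("/about")
second = cache.get("/about")
```

Here `invalidate` covers both the "section changed" and the "administrator command" cases from the text, while the time-to-live covers the periodic refresh.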
Additional Terms of Data management
Net Promoter Score (NPS)
Net Promoter Score (NPS) is an index that measures customer loyalty to a product or company and is used to assess readiness for repeat purchases.

How It Works

Measuring the NPS loyalty index involves several steps:

  • Consumers are asked the question “What is the probability that you would recommend the company/product/brand to your friends/acquaintances/colleagues?” and answer on a scale from 0 to 10, where 0 corresponds to “I will not recommend it in any way” and 10 to “I will surely recommend it”.
  • Based on the scores obtained, all consumers are divided into 3 groups: 9-10 points are product/brand promoters, 7-8 points are passives, 0-6 points are detractors.
  • The NPS index itself is calculated: NPS = % promoters - % detractors.

As a result, the loyalty score falls on a scale from -100 to 100. If all customers are willing to recommend the product, the score will be about 90 to 100; if they are not willing to recommend it, the NPS will drop to about -90 to -100.

The NPS trade mark was registered for the marketing tool that automates the calculation described above.

History

Frederick Reichheld is considered the founder of the method. He first presented it in the article “One Number You Need to Grow”, published in the Harvard Business Review in December 2003. In 2006 he released the book “The Ultimate Question: Driving Good Profits and True Growth”, in which he continued his arguments about loyalty, profitability and company growth.

In 2001 Reichheld studied more than 400 American companies; the main task was to measure the influence of customer loyalty (measured by NPS) on a company's growth rate. The main finding was that the average NPS across the market was 16%, whereas for companies such as eBay and Amazon it was 75%. Reichheld does not claim that the relationship is present everywhere: it is absent altogether in monopolistic markets. However, industries such as passenger air travel, insurance and car rental became prime examples of the relationship. This is unsurprising, since these companies are service providers, where customer satisfaction and loyalty depend on the level of customer service. As a result, many companies have adopted the method, including Apple, American Express, eBay, Amazon, Allianz, P&G, Intuit, Philips, etc.

For certain industries, especially software, it has been shown that detractors often stay with the company while passives leave. This seems to happen where the barriers to switching are relatively high.

Faced with criticism of the Net Promoter Score, proponents of the approach stated that the statistical analysis presented only proved that the "recommend" question was similar to other indicators in predictive capacity, but failed to address the real issue, which is the core of the argument presented by Reichheld. Proponents also argue that analysis of third-party data is not as good as analysis carried out by a company on its own set of customers, and that the practical benefits of the method (a concept that is simple to communicate, a short survey, the ability to follow up with customers) outweigh any statistical disadvantage. They also allow any other questions to be used in a Net Promoter system, as long as the questions reliably classify customers as promoters, passives and detractors.
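The calculation from the "How It Works" steps can be sketched in a few lines. The function name and the survey answers below are invented for illustration; only the 0-10 scale, the three groups and the formula come from the description above.

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count towards the total but towards neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey: 5 promoters, 2 passives, 3 detractors.
answers = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
score = net_promoter_score(answers)
```

With these answers, 50% of respondents are promoters and 30% are detractors, so the score is 20.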
5G
5G is the fifth generation of mobile communication technology, based on the IMT-2020 standard. Internet access speed in a 5G network is predicted to be around 10 Gbit/s. 5G reduces signal delay to one millisecond, against 10 milliseconds on 4G networks and 100 milliseconds on 3G.

New generations of mobile communication appear about every 10 years. The interval is spent on developing the technology and standards and on upgrading infrastructure.

It is expected that 5G network capacity will be enough to serve more than 1 million devices per 1 km² at an average speed of 100 Mbit/s.

Who Deals with 5G Networks in the World Today

5G technologies are being worked on by:

  • research laboratories (for example, the 5G Lab Germany laboratory at the Dresden Technical University);
  • mobile operators (British Vodafone, American Verizon and AT&T, Japanese NTT DoCoMo, Swedish Telia, etc.);
  • telecom equipment suppliers (Swedish Ericsson, Chinese Huawei, Finnish Nokia, South Korean Samsung, etc.).

5G Applications

5G applications and services require significantly better mobile Internet connection characteristics than existing commercial LTE networks can provide. It is expected that 5G networks will connect many devices capable of establishing billions of connections, which will make it possible to create new services in the Tactile Internet (transmission of touch), IT and telecom, the automotive industry (self-driving cars), the entertainment industry, education, agriculture and many other fields. 5G networks will also improve the quality of existing services that involve large volumes of traffic.

Launch of the World's First 5G Network

On October 1, 2018, Verizon announced the launch of the world's first commercial fifth-generation (5G) network. The operator deployed it in four US cities: Sacramento, Houston, Los Angeles and Indianapolis. The company officially declared Houston resident Clayton Harris "the first customer of the 5G network in the world"; the network provides an average speed of 300 Mbit/s and a maximum of 940 Mbit/s.
Node.js
Node.js is a server platform for running JavaScript on the V8 engine. JavaScript traditionally runs on the client side; Node lets commands written in JS be executed on the server. With Node, front-end programmers can write full-fledged software applications. Node can call commands from JavaScript code, work with external libraries and act as a web server.

Node Advantages

Node is easier to scale. When thousands of users connect to the server at the same time, Node works asynchronously, that is, it sets priorities and allocates resources more intelligently. Java, for example, allocates a separate thread for each connection.

Features

  • Asynchronous, event-driven scripts. All APIs of the Node.js library are asynchronous, that is, non-blocking. In essence, this means that a Node-based server never waits for an API to return data. After a call, the server moves on to the next API call, and the Node.js event notification mechanism delivers the response from the previous call.
  • Very fast. Being built on the Google Chrome V8 JavaScript engine, the Node.js library executes code very quickly.
  • Single-threaded but easily scalable. Node.js uses a single-threaded model with an event loop. The event mechanism helps the server respond in a non-blocking way and provides high scalability, unlike traditional servers that create a limited number of threads for processing requests. Node runs a single-threaded program, and that same program can serve many more requests than traditional servers such as the Apache HTTP Server.
  • No buffering. Node.js apps do not buffer data; they simply output data in parts.

Where is Node.js used?

Node.js has established itself as an ideal technological solution in the following areas:

  • Input/output applications
  • Streaming apps
  • Data-intensive real-time applications (DIRT)
  • JSON API based applications

Node is successfully used by such large companies as eBay, Microsoft, PayPal, General Electric, Uber, GoDaddy, Wikipins and Yahoo!.
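The single-threaded event loop model can be illustrated by analogy. The sketch below is not Node.js code: it uses Python's asyncio, which follows the same general idea of one thread switching between requests while they wait on I/O. The request names and delays are invented, and `asyncio.sleep` merely stands in for a non-blocking I/O call.

```python
import asyncio


async def handle_request(name, io_delay, log):
    """Simulate one request that spends most of its time waiting on I/O."""
    log.append(f"{name} started")
    await asyncio.sleep(io_delay)  # the loop is free to run other requests
    log.append(f"{name} finished")


async def serve(requests):
    """Handle all requests concurrently on a single thread."""
    log = []
    await asyncio.gather(*(handle_request(n, d, log) for n, d in requests))
    return log


# Hypothetical requests: (name, seconds of simulated I/O wait).
log = asyncio.run(serve([("a", 0.06), ("b", 0.02), ("c", 0.04)]))
```

All three requests start before any of them finishes, and they complete in order of their wait times rather than their arrival order, even though no extra threads are created.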