
Search Engine

A search engine (SE) is a computer system designed to search for information.

The best-known search engine applications are web services for searching text or graphic information on the World Wide Web. There are also systems that search for files on FTP servers, products in online stores, and information in newsgroups.

To search for information with a search engine, the user formulates a search query. The job of the search engine is to find documents containing either the specified keywords or words related to the user's request. The search engine then generates a search results page. Some engines also extract information from suitable databases and resource directories on the Internet.

By their search and maintenance methods, search engines fall into four types: systems using search robots, human-curated systems, hybrid systems and metasearch engines. The architecture usually includes:

  • A search robot (crawler) that collects information from websites and other documents;
  • An index that provides fast lookup of the accumulated information; and
  • A search interface - the user-facing part of the system, usually with a graphical user interface.

How does a search engine work?

As a rule, these systems operate in stages:

  1. The crawler retrieves the content;
  2. The indexer builds an index database of structured, searchable data;
  3. The search engine provides functionality for querying the indexed data.

To keep the collected information up to date, the search engine repeats this indexing cycle.

Search engines work by storing information about web pages, which they obtain from the pages' HTML code and URLs (Uniform Resource Locators).

A crawler is a program that automatically follows all the links found on a page and extracts them. Starting from these references, or from a predefined list of addresses, the crawler searches for new documents not yet known to the search engine. The site owner can exclude certain pages using robots.txt, which can prevent the indexing of specific files, pages or directories of the site. The search engine analyzes the content of each page for further indexing; words can be extracted from headers, page text or special fields - meta tags.

One crawler looks for new URLs by scanning links across the Internet, while another robot visits each of the new pages to analyse its content and add it to the index database.
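
As an illustration of this crawl loop, here is a minimal sketch in Python using only the standard library. It is not any specific engine's implementation: the start URL is hypothetical, and a production crawler would also respect robots.txt and politeness limits.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=10):
    """Breadth-first crawl: fetch a page, extract its links, queue the new ones."""
    seen, queue, pages = set(), deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # skip pages that cannot be fetched
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))  # resolve relative links
    return pages


# Example with a hypothetical start URL:
# pages = crawl("https://example.com", max_pages=5)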

The indexer is a module that analyzes a page, first breaking it into parts using its own lexical and morphological algorithms. All the elements of the web page are isolated and analyzed separately. Data about web pages is stored in the index database for use in subsequent queries.
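
A minimal sketch of what the index database conceptually contains is the classic inverted index, mapping each word to the pages that contain it. The Python below is an illustrative simplification with toy data, not the structure of any real engine.

import re
from collections import defaultdict


def tokenize(text):
    """Very simple lexical analysis: lowercase alphanumeric words only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: dict mapping URL -> page text. Returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Toy example:
index = build_index({
    "https://example.com/a": "search engines build an index of pages",
    "https://example.com/b": "a crawler collects pages for the index",
})
print(search(index, "index pages"))  # both URLs match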

Additional Terms
Keyword
is a word in a text that gives a concise description of the document's content and allows the user to better identify its subject matter. On the Web, keywords are used mainly for searching and are the main way of organizing content.

Keywords in Text Analysis

In text analysis, including when building an index in search engines, keywords are the especially important and representative words of a text; taken together, they can give the reader a high-level description of its content. Keywords (KW) are characterized by the following traits:

  • Frequency - the most common words denote a feature of the object, state or effect;
  • Significance - they are represented by significant vocabulary, sufficiently generalized in semantics, degree of abstraction and style;
  • Interrelation - they are connected with each other by a network of semantic links and intersections of meaning.

If keywords are repeated too often in a text, search engines may regard this as spam and not promote the given page. The KW set defines the index of words, their frequency and predictability.

Keywords in the Markup of Web Pages

In HTML, keywords can be specified with a dedicated meta element. This way of specifying keywords opens up opportunities for abuse, so only some search engines use this meta tag as a ranking factor, while others do not. Historically the element was overused in SEO and is now ignored by the leading search engines: Google, for example, often ignores the keywords in the tag because of past abuse. However, the tag is still used by other user agents (for example, web browsers when searching bookmarks).

In XHTML microformats, the keywords describing a document are presented as a list of links, each of which should lead to a page listing the documents that share this keyword. This somewhat reduces the possibility of abuse, since each link should lead to real content. For such keywords the term "tags" is more often used, and at the code level they are implemented with the rel-tag microformat.
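
To make the markup side concrete, here is a short Python sketch (standard library only) that extracts the title and the keywords meta tag from a page; the HTML snippet and class name are invented for the example.

from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Extracts the <title> text and the content of <meta name="keywords">."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() == "keywords":
            self.keywords = [k.strip() for k in attrs.get("content", "").split(",")]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# Hypothetical page markup for illustration:
html = """<html><head>
  <title>Search Engine Basics</title>
  <meta name="keywords" content="search engine, crawler, index">
</head><body>...</body></html>"""

parser = MetaKeywordParser()
parser.feed(html)
print(parser.title)     # Search Engine Basics
print(parser.keywords)  # ['search engine', 'crawler', 'index']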
Metadata
discloses information about the characteristics and properties of an entity, allowing such entities to be found and managed automatically in large information flows.

The Difference Between Data and Metadata

It is usually impossible to divide a document unambiguously into data and metadata, because something can be both at once. Thus, the title of an article can be treated both as metadata (the metadata element "title") and as actual data (since the title is part of the text itself). According to the usual definition, metadata is a set of structured information. You can also create metadata for metadata, for example for output to special devices or for reading descriptions aloud with text-to-speech software.

Classification of Metadata

Metadata can be classified by:

  • Content. Metadata can describe either the resource itself (for example, the name and size of a file) or the content of the resource (for example, "this video shows how to play football").
  • The resource as a whole or its parts. Metadata can refer to a resource as a whole or to parts of it. For example, "Title" (the movie name) refers to the whole movie, while "Scene description" is separate for each episode of the film.
  • Logical inference. Metadata can be divided into three layers: the bottom layer is raw data; the middle layer is metadata describing that raw data; and the top layer is metadata that allows logical conclusions to be drawn from the second layer.

The three most commonly used metadata classes are:

  • Internal metadata, which describes the structure or constituent parts of a thing, for example the format and size of a file.
  • Administrative metadata, required for information processing, such as information about the author, the editor, the date of publication, etc.
  • Descriptive metadata, which describes the nature of a thing and its attributes, for example a set of related categories or links to other subjects related to the item in question.

In search engine optimisation, SEO experts concentrate on a specific subset of metadata - HTML tags such as <title>, the meta description, <h1> and meta keywords. These are particular examples of metadata.
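
As a simple illustration of the three classes, here is a hypothetical Python record for a single document; all field names and values are invented for the example.

# Hypothetical metadata record for one document, grouped by class.
document_metadata = {
    "internal": {          # structure and constituent parts
        "format": "PDF",
        "size_bytes": 245_760,
        "pages": 12,
    },
    "administrative": {    # information needed to process the document
        "author": "J. Smith",
        "editor": "A. Jones",
        "published": "2018-09-27",
    },
    "descriptive": {       # the nature and attributes of the document
        "title": "How Search Engines Work",
        "categories": ["search", "indexing"],
        "related": ["crawler", "inverted index"],
    },
}

# Descriptive metadata is what a search engine would typically index:
print(document_metadata["descriptive"]["title"])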
SERP
A search engine results page (SERP) is a web page generated by the search engine in response to a user's query.

One part of the search results is organic: a list of pages found and indexed by the search engine that are closely relevant to the query, whose ranking and display are not affected by paid advertising. A typical results page is divided into several parts: the top positions are occupied by AdWords ads marked with a special "Advertisement" label; below them come the natural, or organic, search results; and on the right there may be additional content, such as Google Maps results or images related to the search. Results are usually ordered by descending relevance to the search query according to the ranking algorithms used by the search system. Other types of sorting can also be provided, for example by document date.

Overview of the SERP Structure

Modern search engines show several areas on a results page:

  • Organic search results - the main part of the results;
  • Contextual ads (paid links) - small fragments of text placed in the search results on a fee basis; this is one of the main ways a search engine is monetized;
  • Shortcuts (one-boxes, wizards, etc.) - the area in front of the main results, where the engine can put a ready answer to the query, useful information or links, or suggest correcting typos in the request;
  • Related queries - reformulations and refinements of the entered query, and similar requests;
  • Controls - the field for entering a search query with automatic suggestions (auto-completion), and links to the next, previous and other pages of results.

Search results are usually web pages, but many systems can also index and link to files in formats such as .pdf, .doc and .ppt, as well as pages with Flash animation (.swf). Some systems have introduced so-called universal search, where the results may be mixed: a single query can return pictures, videos, news and maps. With the implementation of structured data on a site, extended ("rich") snippets appear in the SERP more and more often; they occupy about 20-30% of Google's first page.
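
The assembly of a results page can be sketched as "paid listings on top, organic results sorted by descending relevance". The Python below is a deliberately simplified illustration with invented data, not how any real engine ranks pages.

def build_serp(organic_results, ads, page_size=10):
    """organic_results: list of (url, relevance_score); ads: list of ad URLs."""
    ranked = sorted(organic_results, key=lambda item: item[1], reverse=True)
    page = [{"url": url, "type": "ad"} for url in ads[:3]]          # top positions
    page += [{"url": url, "type": "organic"} for url, _ in ranked]  # natural results
    return page[:page_size]


serp = build_serp(
    organic_results=[("https://example.com/a", 0.82), ("https://example.com/b", 0.91)],
    ads=["https://example.com/sponsored"],
)
for entry in serp:
    print(entry["type"], entry["url"])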
Additional SEO Terms
Net Promoter Score (NPS)
Net Promoter Score (NPS) is an index that measures customer loyalty to a product or company and is used to assess readiness for repeat purchases.

How It Works

Measuring the NPS loyalty index involves several steps:

  1. Consumers are asked "How likely is it that you would recommend the company/product/brand to your friends/acquaintances/colleagues?" on a scale from 0 to 10, where 0 means "I will not recommend it in any way" and 10 means "I will certainly recommend it".
  2. Based on the scores obtained, consumers are divided into three groups: 9-10 points - promoters of the product/brand; 7-8 points - passives; 0-6 points - detractors.
  3. The NPS index itself is calculated: NPS = % promoters - % detractors.

The result is a loyalty score on a scale from -100 to 100. If all customers are willing to recommend the product, the score approaches 90-100; if none of them are, the NPS drops towards -90 to -100 points. NPS was registered as a trademark for the marketing tool that automates the calculation of this metric.

History

Frederick Reichheld is considered the founder of the method; he first announced it in the article "One Number You Need to Grow", published in the Harvard Business Review in December 2003. In 2006 he released a book entitled "The Ultimate Question: Driving Good Profits and True Growth", in which he continued his arguments on loyalty, profitability and company growth. In 2010, Reichheld conducted research in more than 400 American companies, where the main task was to measure the influence of customer loyalty (measured by NPS) on a company's growth rate. The main conclusion was that the average NPS across industries was 16%, while for companies such as eBay and Amazon it was 75%. Reichheld does not claim that this relationship is present everywhere; it is absent altogether in monopolistic markets. However, industries such as passenger air travel, insurance and car rental have become prime examples of the interconnection. This is unsurprising, since these companies are service providers, where customer satisfaction and loyalty depend on the level of customer service. As a result, many companies have adopted this technique, including Apple, American Express, eBay, Amazon, Allianz, P&G, Intuit and Philips.

For certain industries, especially software, it has been shown that detractors often stay with the company while passives leave, which seems to reflect relatively high switching barriers. Faced with criticism of the Net Promoter Score, proponents of the approach stated that the proposed statistical analysis only proved that the "recommend" question was similar to other indicators in predictive capacity, but failed to address the real problem, which is the core of the argument presented by Reichheld. Proponents of the method also argue that third-party data analysis is not as good as a company analyzing its own set of customers, and that the practical benefits of the method (a simple concept to communicate, a short survey, the ability to follow up with customers) outweigh any statistical disadvantages of the approach. They also allow other questions to be used within the net promoter system, as long as they make it possible to reliably classify customers as promoters, passives and detractors.
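
As a worked example of the formula above, here is a small Python sketch that classifies survey answers and computes the NPS; the sample scores are invented.

def net_promoter_score(scores):
    """scores: iterable of 0-10 survey answers. Returns NPS in the range -100..100."""
    scores = list(scores)
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)   # 9-10 points
    detractors = sum(1 for s in scores if s <= 6)  # 0-6 points
    return 100.0 * (promoters - detractors) / len(scores)


# Invented example: 5 promoters, 3 passives, 2 detractors out of 10 responses.
answers = [10, 9, 9, 10, 9, 8, 7, 8, 4, 6]
print(net_promoter_score(answers))  # 30.0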
Headless browser
is a web browser that is controlled from the command line or programmatically, without a traditional graphical interface. Headless browsers can automate the control of a web page in an environment similar to popular browsers. They are particularly useful for testing web pages, because they correctly interpret HTML, style sheets and JavaScript execution (including AJAX) - functionality that is not always available with simpler testing tools. In 2009, Google began using headless browsers to help its search engine index AJAX sites.

Headless Browser Use Cases

Headless browsers can be used for:

  • Automating web app tests;
  • Taking web page screenshots;
  • Running automated tests for JavaScript libraries;
  • Web scraping for data extraction;
  • Automating website interaction.

Malicious Use Cases

Headless browsers can also be used to:

  • Perform DDoS attacks against websites;
  • Artificially inflate view counts;
  • Automatically scan sites for data that can be abused, for example confidential identifiers.

List of Headless Browsers

Here is a list of browsers offering headless functionality:

  • PhantomJS - a headless browser using the WebKit engine for rendering pages and JavaScriptCore for JavaScript execution. PhantomJS was originally developed in 2010.
  • HTMLUnit - a headless browser written in Java. HTMLUnit uses Rhino for JavaScript.
  • TrifleJS - a scriptable version of the Internet Explorer browser that uses the Trident rendering engine and the V8 JavaScript engine. TrifleJS uses the same API as PhantomJS and works by using the WebBrowser object of the .NET framework to control the version of IE installed on the machine.
  • Splash - has an HTTP API, Lua scripting and an IPython IDE. Splash is written in Python and uses the WebKit rendering engine.
  • Weboob - a Python library.

Emulated Headless Browsers

These tools emulate a browser environment:

  • Zombie.js - a simulated browsing environment for Node.js;
  • Env.js - a simulated browsing environment written in JavaScript for the Rhino engine.

While they support common browsing functions (HTML parsing, XHR, cookie support, etc.), they cannot render and have limited support for DOM events. They usually run faster than a typical browser, but are unable to correctly interpret many sites.
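
To show how a headless browser is driven from code, here is a hedged Python sketch using Selenium with Chrome in headless mode. Selenium and headless Chrome are assumptions about the environment (neither appears in the list above), and the URL is only an example.

# Requires: pip install selenium, plus a local Chrome/Chromedriver install (assumption).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")        # run without a graphical interface
options.add_argument("--window-size=1280,800")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")     # example URL
    print(driver.title)                   # the page is fully rendered, JS included
    driver.save_screenshot("page.png")    # one of the typical headless use cases
finally:
    driver.quit()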
Frame
is a unit of digital data transmission in computer and telecommunication networks. In a packet-switched system, a frame is a simple container for a single network packet. In other telecommunication systems, frames are repeating structures that support time-division multiplexing.

The word has several related meanings:

  • Frame (HTML) - in web design, the display of several HTML documents within one web page;
  • Frame rate - the number of images displayed on the screen per unit of time, usually expressed in FPS (frames per second);
  • Frame (GUI) - a container that holds other widgets in a graphical user interface.

A frame typically includes a synchronization feature - a sequence of bits or symbols that lets the receiver detect the beginning and end of the payload data in the bit stream. If a receiver connects to the system in the middle of a transmission, it ignores the data until it detects a new frame synchronization sequence.

Packet Switching

In the OSI model of a computer network, a frame is the data unit of the link layer. The frame is the result of the last encapsulation step before the data is transferred by the physical layer, and each frame is separated from the next by an interframe gap. A frame is a series of bits, usually consisting of frame synchronization, a packet payload and a frame check sequence. Examples include Point-to-Point Protocol (PPP) frames, Fibre Channel frames, Ethernet frames and V.42 modem frames.

Typically, several frames of different sizes are nested within each other. For example, when using the PPP protocol over an asynchronous serial line, the eight bits of each byte are framed by a start bit and a stop bit, the payload bytes of a network packet are framed by a header and footer, and several packets can be delimited with boundary bytes.

Time-Division Multiplexing

In telecommunications, particularly time-division multiplexing (TDM) and its time-division multiple access (TDMA) variants, a frame is a cyclically repeated block of data consisting of a fixed number of time slots, each slot being the time allotted to a logical TDM channel or a TDMA transmitter. In this context, a frame is usually an entity at the physical layer. TDM examples are SONET/SDH and the circuit-switched B channels of ISDN, while TDMA examples are the circuit-switched channels used in early cellular voice services. A frame is also the relevant entity in time-division duplexing, where the handset transmits during certain time slots and receives during others.
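
To illustrate the "synchronization, payload, frame check sequence" structure described above, here is a toy Python sketch; the field layout is invented for demonstration and does not correspond to any real protocol.

import struct
import zlib

SYNC = b"\x7e\x7e"  # invented synchronization marker


def build_frame(payload: bytes) -> bytes:
    """Toy frame: sync marker, 2-byte payload length, payload, CRC-32 check sequence."""
    header = SYNC + struct.pack("!H", len(payload))
    fcs = struct.pack("!I", zlib.crc32(header + payload))
    return header + payload + fcs


def parse_frame(frame: bytes) -> bytes:
    """Validate the sync marker and check sequence, then return the payload."""
    if not frame.startswith(SYNC):
        raise ValueError("missing frame synchronization")
    (length,) = struct.unpack("!H", frame[2:4])
    payload, fcs = frame[4:4 + length], frame[4 + length:]
    if struct.unpack("!I", fcs)[0] != zlib.crc32(frame[:4 + length]):
        raise ValueError("frame check sequence mismatch")
    return payload


frame = build_frame(b"hello, link layer")
print(parse_frame(frame))  # b'hello, link layer'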