About MagiBot


MagiBot

MagiBot (project name Matarael, hereinafter referred to as MagiBot) is magi.com’s web crawling program, also known as a “spider”. “Crawling” refers to MagiBot’s process of retrieving and/or updating webpages.

We (“we” may refer to MagiBot, other programs of the Magi System, magi.com, or simply Peak Labs Limited) use multiple computers to perform the crawling process, during which MagiBot decides, according to certain algorithms, which websites to crawl, how frequently, and in what order.

Restricting or blocking MagiBot from content on your site

If you wish to restrict or block MagiBot’s crawling, besides modifying your site’s settings or webpages, you can also guide MagiBot’s crawling through robots.txt.

MagiBot follows the robots rules you provide (including but not limited to robots.txt, X-Robots-Tag headers, rel attributes, etc.), and gives precedence to the rules assigned to the user agents magibot and/or matarael (case insensitive).
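
For example, a minimal robots.txt (the path shown is a placeholder) that blocks MagiBot from one directory while leaving other crawlers unrestricted could look like this:

    # Rules MagiBot gives precedence to (matches the user agents magibot and matarael)
    User-agent: magibot
    Disallow: /private/

    # Rules for all other crawlers
    User-agent: *
    Disallow: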

MagiBot also supports directives such as noindex, nofollow, nosnippet, and noarchive, which limit how information is indexed and how it is presented in search results.
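
As an illustration, these directives can be declared either in a page’s HTML or in its HTTP response headers (which directives, if any, a page needs depends on its content):

    <meta name="robots" content="noindex, nofollow">

    X-Robots-Tag: nosnippet, noarchive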

There are a few rare situations in which MagiBot may send requests to paths on the disallow list, but no information retrieved by these requests will be indexed or used in magi.com or other parts of the Magi System. Some search engines use anchor information to generate a nosnippet-style presentation for paths that are disallowed; we will handle such behavior prudently.

MagiBot manages Keep-Alive connections according to its crawling plans. If your server logs show aborted or reset sockets, the connections were usually dropped deliberately by MagiBot rather than by your server.

Protocols and standards supported by MagiBot

MagiBot supports most current protocols and standards, including but not limited to IDN, IPv6, SLD, and structured data from schema.org and Facebook’s OGP marked up as JSON-LD, Microdata, RDF/RDFa, etc., and their derivatives. Although Magi can extract knowledge and concepts from plain text, we still advise site owners to mark up entities and information in a structured way in order to optimize their presentation in search results as well as on social networks.
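
As a sketch (the values below are placeholders), a page about an article could embed schema.org structured data as JSON-LD like this:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Example headline",
      "author": { "@type": "Person", "name": "Example Author" },
      "datePublished": "2019-01-01"
    }
    </script>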

Crawling of information in applications

MagiBot is able to crawl the contents of smartphone applications. Since there are few precedents, we will act prudently under the corresponding user agreements. Crawling and the imitation of user behavior can be prohibited by either:

  1. Declaring robots: 'noindex' in the manifest
  2. Declaring X-Robots-Tag: noindex in the headers of responses from your APIs (see the example below)
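
For instance, an API response that should not be crawled or indexed could carry a header like the following (a sketch; how the header is added depends on your server or framework):

    HTTP/1.1 200 OK
    Content-Type: application/json
    X-Robots-Tag: noindex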