MagiBot (project name Matarael, hereinafter referred to as MagiBot) is Magi’s web crawling program (also known as a “spider”). “Crawling” may refer to MagiBot’s process of extracting and/or updating webpages.
We (“we” may refer to MagiBot, other programs of the Magi project, or simply Peak Labs Limited) use multiple computers to perform the crawling process, during which MagiBot decides, according to certain algorithms, which websites to crawl, how often, and in what order.
Restricting or blocking MagiBot from the contents of your site
If you want to restrict or block MagiBot’s crawling, besides modifying the system settings or webpages of your site, you can also guide MagiBot’s crawling through robots.txt.
MagiBot will follow given robots rules (including but not limited to rel, etc.), and will give precedence to the rules assigned to User-Agents matarael (case insensitive).
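As a rough illustration of how a group addressed to matarael can take precedence over the catch-all group, the sketch below runs a hypothetical robots.txt through Python's standard urllib.robotparser. This is not MagiBot's own parser, and its behavior may differ in detail; the paths, the example.com URLs, and the helper name is_allowed are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for every crawler, one for matarael.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: matarael
Disallow: /drafts/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the stdlib parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Matarael", "OtherBot"):
        for path in ("/drafts/a.html", "/private/b.html"):
            print(agent, path, is_allowed(agent, "https://example.com" + path))
```

In this sketch the matarael group is matched regardless of letter case, and once it matches, only its own rules are applied; the catch-all group is consulted only for crawlers that have no dedicated group.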
MagiBot also supports tags such as noarchive, etc., which limit the indexing of information and the presentation of search results.
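Site owners who want to check which robots directives a page actually exposes could use something like the following minimal sketch. It assumes the directives are published in a standard meta element with name="robots" and a comma-separated content attribute; the class and function names are hypothetical, and the sketch says nothing about how MagiBot itself reads these tags.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html: str) -> Set[str]:
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noarchive, nosnippet"></head>'
    print(sorted(robots_directives(page)))
```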
There are a few rare situations in which MagiBot sends requests to path(s) in the disallow list, but no information retrieved from these requests will be indexed or used in other parts of Magi. Some search engines use anchor information to generate an equivalent presentation of nosnippet contents for path(s) that are disallowed. We will handle such behavior prudently.
Keep-Alive according to the crawling plans. If your server’s log records aborted or reset sockets, those connections were usually dropped voluntarily by MagiBot rather than by your server.
Protocols and standards supported by MagiBot
MagiBot supports most current protocols and standards, including but not limited to SLD, structured information of schema.org and Facebook’s OGP tagged in forms of RDF/RDFa, etc., and their derivations. Although Magi is able to extract knowledge and concepts from plain text, we still advise site owners to tag entities and information in a structured way in order to optimize their presentation in search results as well as on social networks.
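As one example of structured tagging, the sketch below renders Open Graph (OGP) meta tags for a page. The four properties shown (title, type, url, image) are the basic ones the Open Graph protocol asks for; the page values and the helper name og_meta_tags are hypothetical, and how MagiBot weighs these tags is not described here.

```python
from html import escape
from typing import Dict


def og_meta_tags(properties: Dict[str, str]) -> str:
    """Render one <meta property="og:..."> tag per entry, HTML-escaped."""
    lines = []
    for key, value in properties.items():
        lines.append(
            '<meta property="og:{}" content="{}" />'.format(
                escape(key, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical page metadata, for illustration only.
    print(
        og_meta_tags(
            {
                "title": "Tom & Jerry",
                "type": "website",
                "url": "https://example.com/tom-and-jerry",
                "image": "https://example.com/cover.png",
            }
        )
    )
```

The resulting tags belong in the head element of the page they describe.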
Crawling of information in applications
MagiBot has the ability to crawl contents of smartphone applications. Since there are few precedents, we will act prudently under the corresponding user agreements. The crawling and imitation of user behavior can be prohibited by either:
robots: 'noindex' in the manifest
x-robots-tag = 'noindex' in the headers of responses from your APIs
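For the header-based option, a minimal sketch of an API endpoint that attaches the header to every response might look like this. It assumes a plain WSGI application; the JSON body and the function name app are placeholders, and the manifest option is not covered because its exact format depends on your platform.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Hypothetical API endpoint that marks its responses as noindex."""
    body = b'{"status": "ok"}'
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # Ask crawlers not to index this API response.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Serve on localhost:8000 for a quick manual check.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

HTTP header names are case-insensitive, so X-Robots-Tag and x-robots-tag are equivalent on the wire.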