A distributed crawling system that complies with standards.
Ready to use and battle-tested.
Rather than relying on bloated headless browsers, we designed Matarael from
the ground up around IETF RFCs and W3C standards. The result is highly
efficient stream-based crawling and thorough compliance support: Matarael
follows robots.txt, X-Robots-Tag, and any other variation of the Robots
Exclusion Protocol.
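
To illustrate what honoring robots.txt and X-Robots-Tag involves, here is a minimal Python sketch built only on the standard library. It is not Matarael's implementation or API: the function names (`allowed_by_robots_txt`, `x_robots_directives`, `may_index`) and the `ExampleBot` user agent are hypothetical, and the header parsing is deliberately simplified.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots_txt(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def x_robots_directives(header_value):
    """Split an X-Robots-Tag header value into lowercase directives.

    Simplification (assumption): user-agent-scoped values such as
    "googlebot: noindex" and directives whose values contain commas
    are not handled here.
    """
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}


def may_index(header_value):
    """Return False when the header forbids indexing ("noindex" or "none")."""
    return not ({"noindex", "none"} & x_robots_directives(header_value))


# Hypothetical robots.txt content used only for this example.
robots = "User-agent: *\nDisallow: /private/\n"
public_ok = allowed_by_robots_txt(robots, "ExampleBot", "https://example.com/public/page")
private_ok = allowed_by_robots_txt(robots, "ExampleBot", "https://example.com/private/page")
```

A crawler would typically run the robots.txt check before issuing a request and the X-Robots-Tag check after receiving the response headers, since the two signals live at different stages of a fetch.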
Matarael also supports distributed web graph construction and MapReduce analysis on various storage backends.
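
As a rough picture of what a MapReduce-style analysis over a web graph looks like, the following Python sketch counts inbound links per page in a single process. It is an assumption-laden toy, not Matarael's interface: `map_outlinks`, `reduce_counts`, and the three-page graph are all made up for illustration, and a real deployment would shard both steps across workers and a storage backend.

```python
from collections import defaultdict
from itertools import chain


def map_outlinks(page):
    """Map step: emit a (target, 1) pair for every outgoing link of one page."""
    _source, targets = page
    return [(target, 1) for target in targets]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical graph: each entry is (page, list of pages it links to).
pages = [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])]

# In-degree of every page, computed by chaining the map outputs into the reducer.
in_degree = reduce_counts(chain.from_iterable(map_outlinks(p) for p in pages))
```

Because the map step looks at one page at a time and the reduce step only needs pairs that share a key, both can run in parallel on partitions of the graph.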