Matarael
A distributed crawling system that complies with standards.
Ready-to-use and battle-tested
Rather than relying on bloated headless browsers, we designed Matarael
from the ground up around numerous IETF RFCs and W3C standards. The
result is highly efficient stream-based crawling with thorough support
for advanced features such as JavaScript. We place significant
importance on compliance: Matarael follows robots.txt, X-Robots-Tag, and
other variations of the Robots Exclusion Protocol.
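
To illustrate what this kind of compliance check involves, here is a minimal sketch in Python that uses only the standard library. It is not Matarael's API. The function names, the `ExampleBot` user agent, and the sample robots.txt body are all hypothetical. It also assumes a simplified X-Robots-Tag value with no per-agent prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def may_index(headers: dict[str, str]) -> bool:
    """Return False if an X-Robots-Tag header forbids indexing.

    Assumption: the header carries a plain comma-separated directive list;
    per-agent prefixes such as "somebot: noindex" are not handled here.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-robots-tag", "")
    directives = {part.strip().lower() for part in value.split(",")}
    return not ({"noindex", "none"} & directives)


parser = build_parser(ROBOTS_TXT)
allowed = may_fetch(parser, "ExampleBot", "https://example.com/public/page")
blocked = may_fetch(parser, "ExampleBot", "https://example.com/private/data")
```

A crawler would typically run the robots.txt check before requesting a page and the X-Robots-Tag check after receiving the response headers.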
Matarael also supports distributed Web Graph construction and Map-Reduce
analysis on various storage backends.
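
As a rough illustration of Map-Reduce analysis over a Web Graph, the sketch below computes each page's in-degree from a small adjacency list. It runs in a single process and is not Matarael's implementation. The function names and the sample graph are hypothetical, and a real deployment would spread the map and reduce phases across workers and a storage backend.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical web graph: page -> list of outgoing links found on that page.
PAGES = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "a"],  # duplicate links on one page count once
}


def map_page(url, outlinks):
    """Map phase: emit (target, 1) for every distinct link on a page."""
    return [(target, 1) for target in set(outlinks)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one target page."""
    return key, sum(values)


def in_degrees(pages):
    """Run map, shuffle, and reduce to get the in-degree of each linked page."""
    mapped = chain.from_iterable(
        map_page(url, links) for url, links in pages.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


degrees = in_degrees(PAGES)
```

The same map, shuffle, and reduce shape carries over to other graph statistics, such as out-degree or per-host link counts, by changing what the map phase emits.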
Products built upon Matarael
Also available in:
Chinese (中文)