A standards-compliant distributed crawling system.

Ready-to-use and battle-tested

Instead of relying on bloated headless browsers, we designed Matarael from the ground up to follow numerous IETF RFCs and W3C standards. The result is highly efficient, stream-based crawling with thorough support for advanced features such as JavaScript. We place significant importance on compliance: Matarael honors robots.txt, the X-Robots-Tag header, and other variants of the Robots Exclusion Protocol.
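To illustrate what these two compliance checks involve, here is a minimal sketch in Python. It is not Matarael's API: it uses the standard-library `urllib.robotparser` for robots.txt and a simplified, hypothetical X-Robots-Tag parser. The crawler name `ExampleBot`, the robots.txt body, and the helper names are all assumptions for illustration. robots.txt is checked before a request is sent, while X-Robots-Tag can only be checked once the response headers arrive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; not Matarael's actual user agent.
USER_AGENT = "ExampleBot"

# Hypothetical robots.txt body. A real crawler fetches /robots.txt for each
# host before requesting other URLs on that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# X-Robots-Tag directives that carry a value after a colon, so their
# "name:" prefix must not be mistaken for a crawler name.
VALUED_DIRECTIVES = {"unavailable_after", "max-snippet",
                     "max-image-preview", "max-video-preview"}


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def x_robots_directives(header_value: str, user_agent: str) -> set[str]:
    """Return directive names in one X-Robots-Tag value that apply to user_agent.

    Simplified (assumption): a header value is either global
    ("noindex, nofollow") or scoped to one crawler
    ("otherbot: noindex, nofollow"); directive values are discarded.
    """
    value = header_value.strip()
    head, sep, rest = value.partition(":")
    if sep and head.strip().lower() not in VALUED_DIRECTIVES:
        # Crawler-scoped value: ignore it unless it names our crawler.
        if head.strip().lower() != user_agent.lower():
            return set()
        value = rest
    return {
        part.split(":", 1)[0].strip().lower()
        for part in value.split(",")
        if part.strip()
    }


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """robots.txt check, done before the request is sent."""
    return parser.can_fetch(USER_AGENT, url)


def may_index(x_robots_tag: str) -> bool:
    """X-Robots-Tag check, done after the response headers arrive."""
    return not ({"noindex", "none"} & x_robots_directives(x_robots_tag, USER_AGENT))


if __name__ == "__main__":
    robots = build_robots_parser(ROBOTS_TXT)
    print(may_fetch(robots, "https://example.com/private/report.html"))  # False
    print(may_fetch(robots, "https://example.com/blog/post.html"))       # True
    print(may_index("otherbot: noindex"))                                # True
    print(may_index("examplebot: noindex, nofollow"))                    # False
```

Keeping the two checks separate reflects where each signal lives: robots.txt is a per-host file consulted up front, whereas X-Robots-Tag is a per-response header that can still forbid indexing a page that was allowed to be fetched.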

Matarael also supports distributed Web Graph construction and Map-Reduce analysis on various storage backends.
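As a rough picture of Map-Reduce analysis over a web graph, the sketch below computes each page's in-degree (how many distinct pages link to it) with explicit map, shuffle, and reduce phases. It runs in a single process and is not Matarael's interface. The crawl records, the function names, and the choice of in-degree as the analysis are hypothetical. A distributed deployment would run each phase across workers and read from and write to a storage backend.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical crawl output: (source page, outgoing links) records.
CRAWL = [
    ("https://a.example/", ["https://b.example/", "https://c.example/"]),
    ("https://b.example/", ["https://c.example/", "https://c.example/"]),
    ("https://c.example/", ["https://a.example/"]),
]


def map_links(page: str, links: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map phase: emit (target, 1) once per distinct outgoing edge."""
    for target in set(links):
        yield target, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group emitted values by key (the link target)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_in_degree(target: str, counts: list[int]) -> tuple[str, int]:
    """Reduce phase: sum the counts for one target page."""
    return target, sum(counts)


def in_degrees(records: Iterable[tuple[str, list[str]]]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_links(page, links) for page, links in records)
    return dict(reduce_in_degree(t, c) for t, c in shuffle(mapped).items())


if __name__ == "__main__":
    for page, degree in sorted(in_degrees(CRAWL).items()):
        print(page, degree)
```

Because the map and reduce functions only see one record or one key at a time, they can be partitioned across machines without changing their logic; only the shuffle step requires moving data between workers.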

Products built upon Matarael
