What Is Magi?
Magi is a machine-learning-based information extraction and retrieval system developed by Peak Labs. Magi can summarize knowledge from natural language text in any field into structured data, and provide human users as well as other AI systems with an interpretable, retrievable, and traceable knowledge system that automatically gathers and amends information through lifelong learning.
What Can Magi Do?
If you arrived here from magi.com, then congratulations! You have found a fraction of Magi. This search-engine-like website is the public version of Magi. Unlike other search engines, magi.com not only collects the enormous amount of text on the Internet but also works to understand the knowledge underlying that text.
Try searching for something you are interested in, or ask a question directly. magi.com will strive to provide you with highly aggregated and structured knowledge results.*
Every piece of information is colored according to its credibility. You can click on it to reveal the sources from which Magi learned that knowledge.
We have also developed a full-scale search engine for Magi from scratch, which presents normal search results from the entire Internet. So even when no structured results are available, you still get something.*
Moreover, the learning process mentioned above runs 24/7 and is unsupervised. Knowledge in breaking news can be learned in around 5 minutes. The credibility of learned knowledge is continuously observed and re-evaluated as the number of sources that can be cross-referenced increases, and accidentally learned mistakes are automatically corrected accordingly.
* As of now, magi.com provides mostly Chinese results. Local laws and regulations may apply.
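The idea of re-evaluating credibility as more cross-referenceable sources appear can be illustrated with a minimal sketch. This is not Magi's actual scoring method, which is not described here. The `Fact` class, the per-source reliability values, and the noisy-OR combination rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict


@dataclass
class Fact:
    """A hypothetical structured fact that remembers where it was learned."""
    subject: str
    predicate: str
    obj: str
    # Maps a source URL to an assumed reliability in [0, 1].
    sources: Dict[str, float] = field(default_factory=dict)

    def add_source(self, url: str, reliability: float) -> None:
        """Record one more source that supports this fact."""
        if not 0.0 <= reliability <= 1.0:
            raise ValueError("reliability must be in [0, 1]")
        self.sources[url] = reliability

    def credibility(self) -> float:
        """Noisy-OR combination (an assumption, not Magi's formula):
        the fact is doubted only if every supporting source is wrong."""
        return 1.0 - prod(1.0 - r for r in self.sources.values())


fact = Fact("Magi", "developed by", "Peak Labs")
fact.add_source("https://example.com/a", 0.5)
first = fact.credibility()
fact.add_source("https://example.com/b", 0.5)
second = fact.credibility()
print(first, second, list(fact.sources))
```

Under this rule a fact with no sources has credibility 0, and each additional supporting source can only raise the score. Keeping the source URLs on the fact is what makes the result traceable.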
At present, only a small share of the knowledge on the Internet has been manually organized into machine-interpretable formats, such as encyclopedia-like websites and databases for certain vertical fields. Such information is only a drop in the ocean, and it struggles to meet the growing needs of automation and artificial intelligence in terms of coverage, update frequency, and credibility evaluation.
The key problem is that human beings naturally understand natural language but have limited stamina and attention, so they cannot keep up with the speed at which valuable information is generated, nor keep quality steady or their stance objective. Machines, meanwhile, never tire and are incredibly fast, but are at a loss as to what to do with complicated free text, and let the great value between the lines slip away.
Imagine an ever-growing and self-updating database, containing structured information extracted from text across the whole Internet, that is sufficiently algorithm-friendly for programs. Then,
- Voice assistants might stop repeating “Sorry, I don’t understand”,
- Business intelligence might acquire broader background knowledge to make better decisions,
- Fintech services might significantly increase the efficiency of data collection and verification,
- … …
As the public version, magi.com provides human users with a novel way of interacting with data on the Internet. The technology platform behind Magi, however, is the bigger picture: to empower machines with the ability to utilize the limitless knowledge on the Internet like human beings.
Until recently, question-answering systems have been designed to serve human beings, and the text answers they give can hardly be used directly by downstream tasks. Question-answering models also have limited capacity and cannot be updated efficiently, which results in poor scalability. Furthermore, the knowledge in such models lies in “black boxes” of countless numbers, hardly explainable and almost untraceable, which, in our opinion, cannot be considered responsible or reliable to present to users. Solutions based on document retrieval, for their part, cannot produce structured results, nor can they guarantee globally optimal results given the limited resources available to online services in real time, even though the queries users enter are relatively demanding.
In short, we believe that knowledge acquisition is more important than question answering, and that proactively discovering knowledge and continuously revising it is a better approach than passively matching answers to a given question. Making machines understand language is already a tough task, but Magi aims at the toughest case, text in open fields on the Internet, and directly faces the key dilemma between scalability and accuracy in knowledge engineering.
A simple sentence can contain multiple sets of intertwined information, and Magi deals with articles full of grammar mistakes and factual errors. Imagine the trouble.
To maximize the utilization of information, Magi endeavors to extract all knowledge from every piece of text, whatever its quality or topic. Existing solutions are not capable of such a task: it is not a well-defined sequence labeling problem, it has an overwhelming search space due to intertwined relationships, and it has no available training data, since there is no limit on the category or form of the text.
We developed the entire technology stack from scratch: a Distributed Search Engine with an original succinct indexing structure, a Neural Information Extraction System with a tailor-made Attention network, a Stream-Based Crawling System independent of headless browsers, a Multilingual Analyzing Pipeline capable of handling a mixture of more than 170 languages, and more. Meanwhile, our unique set of training and pre-training data has been curated over the years.
More reliable sources are given more weight by introducing query-independent features of the kind used in traditional ranking algorithms. The model, which is based on multi-level transfer learning, abandons pre-set grammatical rules, semantic role labeling, dependency parsing, and other methods that restrain generalization, and achieves satisfying results on multiple languages in zero-resource settings. The system continues to learn and adjust itself as data accumulates and sources expand, and it erases accidentally learned noise and false knowledge.
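As a rough illustration of how a query-independent feature can be mixed into ranking, the sketch below scales a simple query-dependent relevance score by a per-source reliability prior. None of this is Magi's actual ranking function. The `Document` class, the term-overlap relevance measure, the `weight` parameter, and the multiplicative combination are hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    url: str
    text: str
    source_reliability: float  # query-independent prior, assumed in [0, 1]


def relevance(query: str, doc: Document) -> float:
    """Query-dependent part: fraction of query terms found in the document."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(doc.text.lower().split())
    return len(terms & words) / len(terms)


def score(query: str, doc: Document, weight: float = 0.5) -> float:
    """Scale relevance by source reliability (hypothetical combination).

    weight=0 ignores reliability; weight=1 lets it fully scale the score.
    An irrelevant document scores 0 however reliable its source is.
    """
    prior = (1.0 - weight) + weight * doc.source_reliability
    return relevance(query, doc) * prior


def rank(query: str, docs: List[Document], weight: float = 0.5) -> List[Document]:
    """Return documents ordered from highest to lowest combined score."""
    return sorted(docs, key=lambda d: score(query, d, weight), reverse=True)


docs = [
    Document("https://example.com/blog", "magi search engine", 0.2),
    Document("https://example.com/news", "magi search engine", 0.9),
]
print([d.url for d in rank("magi search", docs)])
```

With identical text, the document from the more reliable source ranks first. The multiplicative form is used so that reliability alone can never surface a document that does not match the query.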
Magi combines every one of the features mentioned above. As an exceptional and progressive project, Magi regularly releases some of its data and related work on Zenodo and arXiv.
Magi is currently far from mature, but its characteristics give it infinite possibilities.
By starting with the most troublesome task, open-field information processing, Magi has shown the potential to become the one system to rule them all. Instead of knocking on the door of each field of text separately, Magi intends to solve them all at once, which could be the difference between the finite and the infinite.
As the amount and credibility of its data grow, Magi could become the ImageNet of knowledge and benefit other industries. Better information-processing solutions for many vertical industries could be realized by fine-tuning existing Magi models with little data.
In the near future, as the industry advances, the structured semantic network of everything that Magi builds could perhaps become the cornerstone of Explainable AI.