Natural language understanding has been a tough nut to crack, but thanks to Google’s continued investment in AI, it has reached a whole new level. At I/O 2021, Google introduced MUM — Multitask United Model. According to Google, this new language model is 1000x powerful than BERT, released in 2019. MUM is coming to Google products sometime in the future.
What is MUM: Multitask United Model?
MUM is a language model built on the same transformer system as BERT, which made waves back in 2019. BERT is a powerful language model that proved a breakthrough on release. MUM, however, is upping the ante: according to Google, it’s supposed to be 1000x more powerful than BERT.
A great deal of that power comes from the fact that it can multitask. It doesn’t have to do one task after another, but it can handle multiple tasks simultaneously. This means that it can read text, understand the meaning, form deep knowledge about the topic, use video and audio to reinforce and enrich that, get insights from over 75 languages and translate those findings into multi-layered content that answers complex questions. All at once!
An idea of the power of Google MUM
At I/O 2021, Google’s Prabhakar Raghavan gave an insight into how that would work. He used the complex query “I’ve hiked Mt. Adams and now want to hike Mt. Fuji next fall, what should I do differently to prepare?” to demonstrate what MUM could do. In a regular search session, you’d have to search for all the different aspects yourself. Once you have everything, you need to combine it to have all the answers to the questions.
Now, MUM would combine insights from many different sources on many different aspects of the search, from measuring the mountains to suggesting a raincoat because it’s the rainy season on Mt. Fuji to extracting information from Japanese sources. After all, there’s a lot more written about this specific topic in that language.
In complex queries like this, it all comes down to combining entities, sentiments, and intent to figure out what something means. Machines have difficulty understanding human language, and language models like BERT and MUM get real close to doing just that.
MUM takes it a step farther by processing language and adding video and images because it is multi-modal. That makes it possible to generate a rich result that answers the query by presenting a whole new piece of content. MUM’ll even be incorporated into Google Lens, so you can point your camera to your hiking boots and ask if these are suited to take that hike up to Mt. Fuji!