Raised $75 million in fresh funding, bringing its total funding to $115 million
SoundHound Inc., a leading innovator in voice-enabled artificial intelligence (AI) and conversational intelligence technologies known for its music recognition app, raised $75 million in fresh funding, as reported by Bloomberg on January 31, 2017. So far, the 12-year-old Santa Clara, California-based startup has raised $115 million in funding. The latest investment round came from investors such as Nvidia Inc. (NASDAQ: NVDA), Samsung’s Catalyst Fund, Nomura Holdings, Sompo Japan Nipponkoa Insurance, Recruit Holdings’ RSI Fund, Kleiner Perkins Caufield & Byers, SharesPost 100 Fund, and MKaNN. Global Catalyst Partners, Walden Venture Capital, and Translink Capital Partners returned to participate in the Series D round. Research firm PitchBook estimates SoundHound’s valuation at around $800 million.
SoundHound is best known for its self-titled app, which competes with Shazam to detect and identify the music one listens to. The SoundHound app has been hugely popular, with more than 300 million downloads. In March 2016, the company launched Houndify, its virtual assistant platform for the iPhone and Android, which third-party developers can use to build numerous applications, much like a mobile version of Amazon.com Inc.’s (NASDAQ: AMZN) Alexa, Alphabet Inc.’s (NASDAQ: GOOG) Google Assistant, and Apple Inc.’s (NASDAQ: AAPL) Siri.
CEO Keyvan Mohajer stated that the company is aiming to build a conversational AI platform much like what Amazon is hoping to do with Alexa. Since the launch of Houndify, the company’s service has been integrated into more than 500 distinct products, including cars, robots, and appliances, with more than 20,000 partners. The company now hopes to apply the same kinds of AI techniques to other media such as text and images.
Houndify to add more domains through collective AI
Much of the new funding will be used to add more domains to the Houndify platform and to make it available in more markets, specifically Asia and Europe. SoundHound hopes to license the proprietary technology behind its independent Houndify platform to even more users globally and amplify the rollout of its Collective AI architecture. For businesses looking to incorporate voice into their existing devices, building it themselves can be time-intensive, expensive, and resource-heavy. By licensing the Houndify platform instead, they could accelerate development of such devices.
The fresh round of funding assumes significance for SoundHound at a time when the estimated $14.4 trillion Internet of Things (IoT) market has an urgent need for practical and innovative AI technologies, including natural language processing and voice interfaces. With hundreds of millions of mobile app downloads globally and more than 100 patents covering the Houndify speech AI platform, SoundHound is the only privately held company to own an entire suite of proprietary speech and language understanding technologies. Moreover, Houndify is the only independent platform that allows partners to develop, own, and control their own AI strategy, data, and brand.
SoundHound’s strategic investors will also use the company’s Houndify AI platform and its patented Speech-to-Meaning™ and Deep Meaning Understanding™ technologies in their devices. Registered partners already building products on the platform include Samsung’s ARTIK Smart IoT platform and NVIDIA, which brings Houndify’s large-vocabulary speech recognition and natural language understanding to cars even without a cloud connection by utilizing NVIDIA graphics processing units (GPUs). SoundHound also uses NVIDIA GPUs for fast training of the models powering its Houndify platform.
Houndify to compete with Google, Amazon
SoundHound is looking to compete with Amazon.com and Google to build AI that helps machines understand human voices. As more everyday devices get connected to the internet, using speech to control and direct them will become the dominant form of interaction. The company aims to encourage device makers to use the voice AI tools offered by SoundHound rather than try to build their own. Most of the other speech-recognition engines, from Apple, Baidu Inc., and Microsoft Corp. (NASDAQ: MSFT), control how the software can be used and the data it generates. In contrast, when customers build voice-enabled devices or apps using SoundHound’s technology, the startup does not claim ownership of the users or the data.
SoundHound pushes for Collective AI
One of the key advantages of the Houndify speech AI platform is its architecture for collaborative intelligence called Collective AI, which enables developers to extend the functionality of existing knowledge domains without needing access to or a full understanding of the underlying libraries. This results in a global AI with comprehensive knowledge that is always learning, is crowd-sourced to domain experts, and is larger than the sum of its parts.
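As an illustration only, the domain-extension idea behind Collective AI can be sketched as a simple registry that dispatches a query across independently contributed domains. All names below are hypothetical and are not part of the actual Houndify API:

```python
# Minimal sketch of a collective, domain-based AI registry (hypothetical
# names; not SoundHound's actual Houndify API). Each domain contributes a
# matcher and a handler; new domains extend the system without touching,
# or even seeing, the internals of existing ones.

class DomainRegistry:
    def __init__(self):
        self._domains = {}

    def register(self, name, matcher, handler):
        # matcher: query -> bool; handler: query -> answer
        self._domains[name] = (matcher, handler)

    def answer(self, query):
        # Dispatch the query to every domain that claims it. The collective
        # knowledge grows as experts add domains independently.
        results = {}
        for name, (matcher, handler) in self._domains.items():
            if matcher(query):
                results[name] = handler(query)
        return results

registry = DomainRegistry()
registry.register("weather",
                  lambda q: "weather" in q,
                  lambda q: "Sunny, 22°C")
registry.register("stocks",
                  lambda q: "stock" in q,
                  lambda q: "NVDA: $111.25")

print(registry.answer("what is the weather today"))
# → {'weather': 'Sunny, 22°C'}
```

A real platform would replace the keyword matchers with natural language understanding, but the structural point is the same: domain authors never need access to each other's libraries.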
Houndify’s Collective AI architecture already provides access to knowledge and data from Yelp Inc. (NYSE: YELP), Uber Technologies Inc., and Expedia Inc. (NASDAQ: EXPE), as well as over 100 other domains such as weather, stocks, sports, local businesses, flights, hotels, mortgages, and even interactive games. This in turn enables developers using SoundHound’s software to build voice-enabled products that better understand what the user is saying. Houndify also provides a large number of domains targeted at the automotive industry.
Houndify is featured in SoundHound’s mobile apps: Hound, the voice search and assistant app, and SoundHound, the music search, discovery, and play app, making them hands-free and voice-enabled. Houndify recently entered into a partnership with SELVAS AI, which allows Houndify users to incorporate more than 20 distinct text-to-speech solutions in multiple languages into their products or services. Houndify also has a partnership with Rand McNally, which uses the company’s voice technology in its OverDryve™ connected car device; a collaboration with Onkyo to develop and market a next-generation series of smart speakers powered by Houndify; an integration of Houndify in Sharp’s RoBoHoN® platform; and a collaboration with Shenzhen Tanscorp Technology Co. on the Robot LQ-101, an intelligent family service robot.
Different approach to speech AI
SoundHound has also taken a different approach to building its speech AI: its technology identifies the words and deciphers their meaning simultaneously, in real time, which yields faster results. Most other speech and language interpretation technologies take a piecemeal approach, in which the software first extracts the words from the audio and only then deciphers the meaning. Based on the description of the technology, SoundHound’s approach likely uses incremental recognition, in which the software does not wait until the user stops talking to begin interpreting the words. However, optimizing for speed in this way may produce a system that struggles with utterances whose early words can only be disambiguated by later context.
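The contrast between the two pipelines can be illustrated with a toy sketch (hypothetical code, not SoundHound’s implementation): the batch pipeline waits for the complete utterance before interpreting it, while the incremental pipeline refines its hypothesis as each word arrives, so a result is available the instant the user stops talking.

```python
# Toy contrast between batch and incremental speech interpretation
# (all names hypothetical; not SoundHound's actual implementation).

def interpret(words):
    # Stand-in for meaning extraction: spot an intent keyword.
    intents = {"weather": "GetWeather", "navigate": "StartNavigation"}
    for w in words:
        if w in intents:
            return intents[w]
    return None

def batch_pipeline(word_stream):
    # Piecemeal approach: wait for the full utterance, then interpret once.
    words = list(word_stream)          # blocks until the stream ends
    return interpret(words)

def incremental_pipeline(word_stream):
    # Incremental approach: re-interpret as each word arrives, so a
    # hypothesis already exists when the utterance ends.
    words = []
    hypothesis = None
    for w in word_stream:
        words.append(w)
        hypothesis = interpret(words)  # refine on every word
    return hypothesis

utterance = iter("what is the weather in santa clara".split())
print(incremental_pipeline(utterance))  # → GetWeather
```

Both pipelines reach the same answer here; the difference is latency, since the incremental version never has to wait for the end of the audio before starting to interpret.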
Other breakthroughs in speech AI
Among the other notable breakthroughs in speech AI, Microsoft announced in October 2016 that its Artificial Intelligence and Research unit achieved a major breakthrough in speech recognition by developing the first technology that recognizes words in a conversation as well as humans do. Microsoft’s speech recognition system makes the same or fewer errors than professional transcriptionists, with a word error rate (WER) of 5.9%. The 5.9% error rate is about equal to that of people who were asked to transcribe the same conversation, and it is the lowest ever recorded against the industry standard Switchboard speech recognition task.
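For context, word error rate is the word-level edit distance between the system’s transcript and a reference transcript (substitutions plus deletions plus insertions), divided by the number of words in the reference. A minimal sketch of the standard computation:

```python
# Word error rate (WER), as used in benchmarks like Switchboard:
# (substitutions + deletions + insertions) / reference word count,
# computed via a word-level Levenshtein dynamic program.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # → 1/6 ≈ 0.167
```

On this scale, Microsoft’s 5.9% means roughly one word in seventeen is transcribed differently from the reference.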
The milestone will have broader implications for consumer and business products that can be significantly augmented by speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription and personal digital assistants such as Cortana.
Similarly, in September 2016, Google’s UK-based DeepMind unit, which is working to develop super-intelligent computers, achieved a breakthrough in creating some of the most realistic, human-like speech ever produced by a computer. Named WaveNet, the new AI system is a deep neural network that learns from samples of human speech and then generates raw audio waveforms of its own. DeepMind’s testing among English and Mandarin Chinese listeners revealed that WaveNet sounds much better than existing text-to-speech (TTS) systems, though it still falls short of being as convincing as real human speech. WaveNet, which is said to outperform existing technology by 50%, is designed to mimic how parts of the human brain function and can imitate human speech by learning how to form the individual sound waves that a human voice creates.