There is a land grab underway, every bit as dramatic as the Klondike gold rush but largely unnoticed by the people it involves most closely.
I am referring to the battle for hearts and minds, and for control over the new user interface paradigm that is set to become the most common way to interact with mobile devices.
The giants of mobile technology, Microsoft, Apple and Google, are quietly fighting it out for their share of the mobile customer base, and not in the way that immediately springs to mind as handset wars. Once they have won a consumer, they intend to lock that consumer in forever.
What am I referring to? I'm referring to very intelligent, deeply embedded, all-pervasive, context-aware speech recognition software. That is a long string of words, so let me pick it apart and explain myself…
Apple has Siri, Microsoft has Cortana and Google has Google Now. What you may not realise is how sophisticated and integrated these assistants are becoming, and how ubiquitous they are trying to become in your life and your mobile computing experience.
If you go back five years, the state of the art in speech recognition was being able to answer the question “What is the capital of France?”. The mobile device would answer by returning a list of web pages that might contain the answer. More recently, asking the same question returns “Paris”, along with some pre-determined summary information about Paris as well as the list of web pages. If, on the other hand, you asked “What is the capital of France, and what is its population?”, until very recently the device would have no idea you were referring to Paris when you asked about “its population”.
A software company, SoundHound Inc, has created an app called Hound that understands the context in which you ask follow-up or multi-part questions. It will happily answer the question “What is the capital of France and what is its population?”. It will even answer more complex questions: “What is the capital of France, what is its population and how far is it from the capital of Germany?”. To do this it needs to retain context and answers, indeed multiple contexts and answers, from the questions you ask and the interactions you have. This is a really important leap, because it starts us down the path to being able to ask a mobile assistant: “Book me a flight to London on Tuesday, book my favourite restaurant for that night and invite Sarah to the meal”.
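The context carry-over described above can be illustrated with a toy sketch: remember the most recent entity mentioned so that a follow-up reference like “its” can be resolved. The facts table and resolution rule below are invented purely for illustration; this is not SoundHound’s actual implementation.

```python
# Toy illustration of multi-turn context carry-over.
# FACTS and the resolution rule are invented for illustration only.
FACTS = {
    "France": {"capital": "Paris"},
    "Germany": {"capital": "Berlin"},
    "Paris": {"population": "2.1 million"},
}

class DialogueContext:
    """Remembers the most recent entity so a follow-up like
    'what is its population' can be resolved."""
    def __init__(self):
        self.last_entity = None

    def ask(self, attribute, entity=None):
        # If no entity is named, fall back to the one held in context.
        entity = entity or self.last_entity
        answer = FACTS.get(entity, {}).get(attribute)
        if answer is not None:
            # The answer itself becomes the new context, so "its population"
            # after "capital of France" resolves to Paris, not France.
            self.last_entity = answer if answer in FACTS else entity
        return answer

ctx = DialogueContext()
print(ctx.ask("capital", "France"))  # Paris
print(ctx.ask("population"))         # 2.1 million (resolved via context)
```

The essential point is that each answer updates the dialogue state, so a chain of questions can keep referring back without naming anything twice.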
All this context-aware intelligence, however, has no real application outside of search unless it is actually integrated into the applications you use on a day-to-day basis. Speech to date has really only been used to drive search engines and dictate documents. Well, surprise: all of the major mobile operating system manufacturers have, in their recent releases, made it very simple indeed to integrate speech into third-party mobile applications, and they are pushing hard for you to do so.
An example I have seen is the ability to use Microsoft’s Cortana speech engine to instruct a flight simulator game to fly to a particular location on autopilot. The game app is started, the aircraft’s course is set to the desired destination and the aircraft is instructed to fly there. To do this, the speech engine needs to understand what you are asking, understand it in the context of the concepts of the game (“fly”, “location”…) and be able to instruct the game software to carry out your instructions. This is very sophisticated indeed, and it can be achieved with just a few lines of code added to an existing app.
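At its core, that kind of integration means mapping a transcribed utterance onto an action the app already exposes. A minimal sketch, assuming a simple “fly to &lt;place&gt;” phrase pattern; the intent name and pattern are hypothetical, since real integrations register their phrases with the platform’s speech engine rather than parsing text themselves:

```python
import re

def parse_flight_command(utterance):
    """Extract a hypothetical 'fly to <place>' intent from transcribed speech."""
    match = re.search(r"fly (?:me )?to (?P<place>[A-Za-z ]+)",
                      utterance, re.IGNORECASE)
    if match:
        # The intent name 'autopilot.fly_to' is invented for illustration;
        # the game would map it onto its existing autopilot function.
        return {"intent": "autopilot.fly_to",
                "destination": match.group("place").strip()}
    return None  # not a command this app understands

print(parse_flight_command("Please fly me to London"))
# {'intent': 'autopilot.fly_to', 'destination': 'London'}
```

The heavy lifting (recognition, transcription) is done by the platform; the app’s contribution really is only a few lines mapping phrases to its own functions.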
Imagine, if you will, that your mobile device has speech integrated into the OS and into every application that you use on a day-to-day basis. Imagine also that the speech is context-aware across all of those applications, carrying context from one to another, and that it is able to instruct the applications you use to carry out actions as well as interrogate them for information.
Finally, I ask you to imagine that these speech-enabled applications learn your preferences and regular behaviours. They know your favourite restaurant in each location, they understand who you mean when you talk about Sarah, and they know, when you ask to book a meeting in London, to book a flight to the nearest airport, and why. How could you possibly move to a different mobile platform, leaving all that understanding behind? That is the killer feature that will completely lock you in to whichever platform first provides highly integrated, context-aware speech recognition. The major manufacturers, Microsoft, Apple and Google, understand this better, perhaps, than most of their users do at this point in time.
The land grab taking place right under our noses today is a platform lock-in play. By knowing information about our preferences and behaviours, and intelligently interpreting it in the context of speech, a platform makes our mobile devices so useful that you will never be able to leave it without feeling that the new device is very poor indeed: any other device will seem dumb in comparison to the one you are using. Not because of any lack of software capability, but simply because of a lack of stored context and understanding of preferences and behaviours. I can pretty much guarantee there will be no way to export your context and understanding from one platform to another, so once an Apple user, always an Apple user; once a Microsoft user, and so on.
For those of you who think speech interfaces are not up to the job, consider this: with speech you don’t need to switch between multiple windows or apps on-screen, so the size of the screen no longer matters so much; you don’t need a finger to touch tiny keys on a tiny keyboard; and even the concept of distinct apps starts to become much less important.
I firmly believe that the evolution of the humble mobile app is that users will choose from a menu of available speech-enabled “services” that integrate with other services on the platform, and the user will no longer see pages of app icons or need an app store as we recognise it today. You don’t need a calendar app if you can just ask the device “What am I doing today?” or “Add a meeting with Mary at 2”, and if a speech command can act across a number of services to book a meeting, make a flight reservation and invite Sarah to a meal, the artificial boundaries enforced by the present app concept become meaningless. I don’t think it is an overstatement to say that this will totally change the way we think about mobile device interfaces and use our devices.
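One way to picture such a service-oriented model is a registry of named intents contributed by independent services, with a single compound spoken request dispatched across all of them. Everything here, the service names, the intents and the dispatcher, is a hypothetical sketch rather than any platform’s actual API:

```python
# Hypothetical service registry: each "service" contributes named intents,
# and one compound request can span several services.
SERVICES = {}

def service(intent_name):
    """Register a handler function under a spoken-intent name."""
    def register(fn):
        SERVICES[intent_name] = fn
        return fn
    return register

@service("travel.book_flight")
def book_flight(city, day):
    return f"Flight to {city} on {day}"

@service("calendar.add")
def add_meeting(person, time):
    return f"Meeting with {person} at {time}"

def dispatch(intents):
    """Execute a compound request: a list of (intent, arguments) pairs
    produced by a speech/NLU front end (not shown here)."""
    return [SERVICES[name](**kwargs) for name, kwargs in intents]

result = dispatch([
    ("travel.book_flight", {"city": "London", "day": "Tuesday"}),
    ("calendar.add", {"person": "Sarah", "time": "8pm Tuesday"}),
])
print(result)
# ['Flight to London on Tuesday', 'Meeting with Sarah at 8pm Tuesday']
```

Notice that the user-facing unit here is the intent, not the app: from the speaker’s point of view the boundary between the travel service and the calendar service has disappeared.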
For those who doubt that speech recognition works in real life: I dictated this post using the Android voice recognition application, in my car, in the rain, and spent just a few short minutes tidying up the results. The language may not be polished, as I do not speak in the same style as I would write a business document or type a LinkedIn post, but it is miles better than anything I have experienced in the past, and without any voice training of the software. It will not work in all circumstances, of course, but long gone are the days when phone calls were held privately; just take a bus or train trip to learn all sorts about the private lives of your fellow passengers, so it is not such a leap to imagine them using speech commands.
Welcome to the new world, where the window-based GUI paradigm, and even keyboards to some extent, really don’t need to exist any more for the purposes of mobile computing, and where you are locked into the first platform you start using. The mobile device that first learns about you as an individual will be your platform for life.