Early appliances such as televisions, set-top boxes, and air conditioners needed only a few controls: an on/off switch, several selection buttons, and two sets of increase/decrease controls were usually sufficient.
As technology advanced, devices supported more features, and the number of commands and configuration options grew with them. Users, however, still wanted to manage all of these features from a single remote. Engineers therefore began to build more complex user interfaces (UIs): layered menus appeared on the TV screen, and remote controls gained more and more buttons.
Today, smart household appliances are developing in two important directions. First, devices are becoming “smarter”: they can understand human language and carry out spoken commands, enabling direct voice communication and control between people and machines. Second, user interfaces are becoming more user-friendly, so that elderly and disabled people can use them without barriers. Smart devices can also connect to other devices and to the Internet for added convenience. However, adding ever more complicated buttons to the remote control is impractical for manufacturers and leads to a poor user experience.
In this article, we discuss how voice commands can provide a better user experience, focusing on Bluetooth Low Energy (BLE) remote controls for smart TVs.
Voice Command and Remote Control
As one news report put it, “Voice control has advanced from being the technology of the future to being one of the hottest new technologies of the day.” Voice is a powerful and intuitive interface: a short utterance can carry enough information to describe a complex command. Devices now have access to cloud computing and can therefore use state-of-the-art recognition engines such as those from Microsoft, Google, and Amazon. Today, cloud-based speech recognition services provide a very good user experience.
Voice input can also be passed to virtual assistants, which have spread rapidly across smart devices over the past few years. Among these software agents that perform tasks for individuals, Apple's Siri, Google Assistant, and Amazon Alexa are the most widely used.
These virtual assistants interpret natural-language text or speech and map it to commands. They are already integrated into phones and are rapidly spreading to smart TVs, watches, and other wearable devices. They offer a wide range of services, including providing information, playing music and videos, configuring devices, and even purchasing items for customers.
Although a device could listen for voice commands continuously, background noise and the distance between the user and the microphone make it difficult to recognize the message correctly. In addition, continuous listening produces a large volume of data exchanged between the device and the cloud service, and most of the requests sent to the speech recognition engine are irrelevant. Furthermore, constantly recording ambient sound can pose serious security and privacy issues.
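To give a rough sense of the data volume involved, the sketch below compares uncompressed audio for a full day of continuous listening with a single short spoken command. The audio format (16 kHz, 16-bit, mono PCM) and the 3-second command length are illustrative assumptions, not figures from this article; a real product would likely compress the audio, which shrinks both numbers by roughly the same factor.

```python
def pcm_bytes(seconds: int, sample_rate_hz: int = 16_000,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio (assumed format: 16 kHz, 16-bit, mono)."""
    return seconds * sample_rate_hz * bits_per_sample * channels // 8


always_on = pcm_bytes(24 * 60 * 60)   # continuous listening for one day
one_command = pcm_bytes(3)            # a single 3-second spoken command (assumed length)

print(f"always-on, per day: {always_on:,} bytes")
print(f"one command:        {one_command:,} bytes")
print(f"ratio:              {always_on // one_command:,}x")
```

Under these assumptions, a day of always-on listening amounts to about 2.76 GB of raw audio, while a triggered 3-second command is about 96 KB.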
As a result, a trigger is needed to start recognition; it can be implemented with a button, a gesture, or a recognizable word or phrase. This approach works well when the user is close to the device, as with a smartphone. It is much more difficult, however, to detect triggers reliably and provide a good user experience with smart TVs, set-top boxes, and other devices that are far from the user. Since the microphone needs to be close to the user, and users already have a remote control at hand, the most natural solution is to embed a microphone in the remote control, producing what is known as a voice remote control.
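A button trigger on a voice remote can be modeled as a simple push-to-talk gate: microphone frames are kept only while the voice button is held, and the captured utterance is handed off when the button is released. The sketch below is a hypothetical illustration of that idea; the class name `PushToTalkGate` and its methods are assumptions, not part of any real remote-control firmware or BLE API.

```python
from typing import List, Optional


class PushToTalkGate:
    """Hypothetical push-to-talk trigger: audio frames from the remote's
    microphone are kept only while the voice button is held down."""

    def __init__(self) -> None:
        self._pressed = False
        self._frames: List[bytes] = []

    def press(self) -> None:
        # Button pressed: start a fresh recording.
        self._pressed = True
        self._frames = []

    def feed(self, frame: bytes) -> None:
        # Frames that arrive while the button is released are dropped,
        # so nothing is recorded or uploaded between commands.
        if self._pressed:
            self._frames.append(frame)

    def release(self) -> Optional[bytes]:
        # Button released: return the captured utterance, or None if empty.
        if not self._pressed:
            return None
        self._pressed = False
        utterance = b"".join(self._frames)
        self._frames = []
        return utterance if utterance else None
```

For example, after `gate.feed(b"noise")` (dropped), `gate.press()`, `gate.feed(b"\x01\x02")`, and `gate.feed(b"\x03")`, calling `gate.release()` returns `b"\x01\x02\x03"`. Gating at the source like this addresses both the data-volume and the privacy concerns, because nothing outside the button press ever leaves the remote.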
Voice Command and Speech Recognition in Smart TV
Advances in speech recognition technology have made the goals described above achievable. Speech recognition allows a machine to recognize a voice signal and convert it into the corresponding text or command.
In short, for a voice remote control manufacturer, the voice command function can be summarized as: "Capture 'sufficient' high-quality voice recordings, send them to the speech recognition engine, and then process the resulting text to derive the user's commands."
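The three-step pipeline above (capture, recognize, derive a command) can be sketched as follows. This is a minimal illustration under stated assumptions: the speech recognition engine is represented by a caller-supplied `recognize` function (in practice a cloud service), and the phrase table, the `Command` type, and the `derive_command` / `handle_voice_command` names are all hypothetical. Real products typically use far richer natural-language understanding than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Command:
    action: str                     # e.g. "power_off", "volume_up", "channel"
    argument: Optional[int] = None  # e.g. the channel number


# Hypothetical phrase table mapping recognized text to TV actions.
_RULES = [
    ("power_off", ("turn off", "power off", "switch off")),
    ("volume_up", ("volume up", "louder")),
    ("volume_down", ("volume down", "quieter")),
]
_CHANNEL = re.compile(r"\bchannel (\d+)\b")


def derive_command(text: str) -> Optional[Command]:
    """Step 3: turn recognized text into a device command, or None."""
    normalized = text.lower().strip()
    match = _CHANNEL.search(normalized)
    if match:
        return Command("channel", int(match.group(1)))
    for action, phrases in _RULES:
        if any(phrase in normalized for phrase in phrases):
            return Command(action)
    return None  # not understood: the TV could ask the user to repeat


def handle_voice_command(audio: bytes,
                         recognize: Callable[[bytes], str]) -> Optional[Command]:
    """Steps 1-3: captured audio -> speech recognition engine -> command."""
    if not audio:
        return None  # nothing captured, so nothing is sent to the engine
    text = recognize(audio)  # stand-in for the (assumed) recognition service
    return derive_command(text)
```

With a stand-in recognizer such as `lambda audio: "Switch to channel 7"`, `handle_voice_command(b"...", recognizer)` yields `Command(action="channel", argument=7)`. Keeping the recognizer as an injected function separates the remote-side capture logic from whichever cloud engine is used.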