Meet Alexa: How we taught our Meeting Room to speak
Besides being fun, the project proved to be quite useful and gave us the opportunity for a deep dive into a new technology stack. This post will explain why we chose to use Alexa, how Alexa is integrated with our meeting room and which obstacles we needed to overcome in the process.
Let’s explore the available options for simplifying the use of the meeting room tech stack. We could have followed the most straightforward analog route and put helpful stickers on all the devices. That would have been a simple solution, but it would still have been cumbersome to use. A universal remote would have improved the user experience, but we wanted to control non-IR devices as well, and we didn’t want to worry about yet another physical device that needed to be in its place before a meeting could start. The same goes for a chat bot or web app: users would need to bring their laptop and know how to access the service, and guests wouldn’t be able to use it at all. An Alexa skill can be used intuitively by everyone with access to the meeting room and without additional hardware, so that’s the solution we chose to pursue.
We used a Raspberry Pi 3 with an IR transceiver shield to emit the infrared signals that turn the projector on and off. The LIRC project provided the necessary infrared codes and tooling to control the projector. Our HDMI switch selects one of eight inputs for the projector; it is Ethernet-enabled and can be accessed over HTTP. Since it doesn’t offer any kind of API, we had to build a screen scraper, which nevertheless works pretty well. Next, we made sure that the Alexa skill was able to communicate with the Raspberry Pi.
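To give an idea of the device control on the Pi, here is a minimal sketch in Ruby. The remote name, the switch’s hostname and the form parameter it expects are illustrative assumptions; the real values depend on the LIRC configuration and on the switch’s web interface.

```ruby
require 'net/http'
require 'uri'

# Minimal sketch of the device control on the Raspberry Pi.
# Remote name, hostname and form parameter are illustrative assumptions.
module MeetingRoom
  PROJECTOR_REMOTE = 'projector_remote'                   # remote name from /etc/lirc/lircd.conf
  HDMI_SWITCH_URI  = URI('http://hdmi-switch.local/input') # scraped form endpoint

  # Emit an IR code through LIRC's irsend command line tool.
  def self.projector(state) # :on or :off
    key = state == :on ? 'KEY_POWER_ON' : 'KEY_POWER_OFF'
    system('irsend', 'SEND_ONCE', PROJECTOR_REMOTE, key) ||
      raise("irsend failed for #{key}")
  end

  # The switch has no API, so we replay the form submission
  # its web interface would send ("screen scraping").
  def self.select_input(number)
    response = Net::HTTP.post_form(HDMI_SWITCH_URI, 'input' => number.to_s)
    raise "switch returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  end
end

MeetingRoom.projector(:on)
MeetingRoom.select_input(3)
```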
Alexa skills can be implemented in two ways: they can use a RESTful web service or run as a function inside Amazon’s function-as-a-service offering, Lambda. Instead of exposing a RESTful API to the internet, we chose to use a Lambda function and a publish-subscribe (PubSub) protocol for sending messages to the Pi. One of the most popular PubSub protocols for IoT applications is MQTT. Thankfully, Amazon offers an MQTT broker service – AWS IoT – which is nicely integrated with Lambda. We wrote a small Ruby script (around 100 LOC) that subscribes to a topic on our AWS IoT broker and listens for messages from the Lambda function behind our Alexa skill.
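The real script is a bit larger, but its core loop could look roughly like this, using the Ruby `mqtt` gem and reusing the MeetingRoom helpers from the sketch above. The endpoint, certificate paths, topic name and message format are assumptions; AWS IoT authenticates MQTT clients with X.509 device certificates.

```ruby
require 'json'
require 'mqtt'

# Sketch of the subscriber running on the Raspberry Pi, using the 'mqtt' gem.
# Endpoint, certificate paths, topic and message format are illustrative assumptions.
client = MQTT::Client.connect(
  host:      'example-ats.iot.eu-central-1.amazonaws.com', # account-specific AWS IoT endpoint
  port:      8883,
  ssl:       true,
  cert_file: 'certs/device.pem.crt',
  key_file:  'certs/private.pem.key',
  ca_file:   'certs/AmazonRootCA1.pem'
)

INPUTS = { 'Apple TV' => 1, 'Laptop' => 2 }.freeze # device name -> switch input

# Block and handle every message published by the Lambda function.
client.get('meetingroom/commands') do |_topic, payload|
  message = JSON.parse(payload)
  case message['action']
  when 'projector_off'
    MeetingRoom.projector(:off)
  when 'select_input'
    MeetingRoom.projector(:on)
    MeetingRoom.select_input(INPUTS.fetch(message['device']))
  end
end
```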
First, we thought that building the skill would be straightforward, but we soon discovered that we needed to carefully streamline the user experience. For example, Alexa skills need an activation word. This is the only thing a user needs to know to use the skill successfully. Users can start a skill by saying phrases like “Alexa, open [activation word]” or “Alexa, start [activation word]”. We tried words like “meeting room” but finally settled on “projector” because that describes the user’s intent most accurately. This means that starting a meeting is as easy as saying “Alexa, start projector”. We printed this sentence on a label and put it on the Echo Dot.
Second, we needed to make sure that choosing an input is both intuitive and fast. Although all the inputs have self-describing names such as “HDMI Table”, we didn’t want to assume that every user knows exactly what to say. After much trial and error, a combination of spelling out the most popular options and providing a command for listing all inputs seemed most appropriate. We also made sure to include aliases for commonly used devices. This is what a typical conversation looks like (a sketch of the Lambda handler behind it follows the dialogue):
"Alexa, start projector."
— User
"Which device do you want? Apple TV, Laptop or the full list?"
— Alexa
"Apple TV"
— User
"I’ll activate the input ‘Apple TV’."
— Alexa
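To make this flow more concrete, here is a heavily simplified sketch of the Lambda function behind the skill, also in Ruby. The intent handling, the `Device` slot, the MQTT topic and the payload format are assumptions for illustration; the real skill additionally handles aliases, the full input list and error cases.

```ruby
require 'json'
require 'aws-sdk-iotdataplane'

# Heavily simplified sketch of the Lambda function behind the skill.
# Slot name, topic and payload format are assumptions.
IOT = Aws::IoTDataPlane::Client.new(endpoint: ENV['IOT_ENDPOINT'])

def handler(event:, context:)
  request = event['request']

  case request['type']
  when 'LaunchRequest' # "Alexa, start projector"
    speak('Which device do you want? Apple TV, Laptop or the full list?',
          end_session: false)
  when 'IntentRequest' # e.g. a hypothetical intent carrying a 'Device' slot
    device = request.dig('intent', 'slots', 'Device', 'value')
    # Ask the Raspberry Pi to power on the projector and switch to this input.
    IOT.publish(
      topic:   'meetingroom/commands',
      qos:     1,
      payload: { action: 'select_input', device: device }.to_json
    )
    speak("I'll activate the input '#{device}'.")
  else
    speak("Sorry, I didn't get that.")
  end
end

# Build a minimal Alexa response document.
def speak(text, end_session: true)
  {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: text },
      shouldEndSession: end_session
    }
  }
end
```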
You may want to know how we address the privacy concerns that arise when there’s a microphone listening in the meeting room. It is true that Amazon sends voice data to the cloud for processing – you can even listen to the recordings in the Alexa app. However, voice data is only transmitted after the wake word “Alexa” has been detected locally on the device. On top of that, the microphone can be switched off entirely with the mute button, so no voice data is sent to the cloud while the device is muted. And there is always a fallback: if the conversation is sensitive enough, just unplug the device. This also applies to your smartphone, laptop, smart TV and your soy milk maker, by the way. Welcome to the future.
Let’s sum it up. We had great fun building the project, and as far as we can tell, users enjoy it just as much. It also makes for a fun demonstration for visitors to our office in Münster. As an added bonus, other Alexa features such as her famously non-funny jokes seem to be used frequently as well. Beyond the Alexa skill and the hardware, we explored several related areas and opportunities. For example, we looked at how to collect metrics from skills in order to understand how often and how well the skill is used, so we can improve it in the future. We also made maintenance hassle-free by checking all configuration, including the Lambda function and its associated resources, into version control and treating it as Infrastructure as Code. We’ll present those learnings in future blog posts on this topic.