SensorFlow IT Requirements and Architecture (Cybersecurity)

This document describes the network-related requirements of the solution, used technologies, protocols and communication channels.

Written by Leigh Riley

Gateways

Our gateways are the backbone of the SensorFlow infrastructure. They serve two main purposes. First, they act as a proxy for our sensors, forwarding the recorded sensor data to the cloud and passing commands received from the dashboard on to the sensors. Second, they house our decision-making engine, which controls instrumented appliances based on the recorded sensor information (e.g. no one is in the room -> switch off AC). We chose this approach for two main reasons:

  1. Security: Time and time again we find stories in the news of Internet of Things (IoT) devices being attacked by hackers. These stories have a common denominator: the vendors connected the devices directly to the internet but failed to realize that their limited capabilities leave them vulnerable to sophisticated attacks, and/or they took shortcuts in implementing the devices’ security layer. For this reason, we chose to shield our devices behind our gateways to limit the attack surface and route all communication through a much more capable device that can readily implement state-of-the-art security protocols.

  2. System performance: The Internet of Things is supposed to make buildings smarter. However, we often find that suppliers rely too much on the cloud and on permanent internet connectivity. To prevent a smart building from turning into a dumb building as soon as the internet drops, we designed our system so that all automation decisions are made on the gateway and not in the cloud. Therefore, even without internet connectivity, all appliances will still be monitored and automated, and recorded data will be persisted on the gateway and streamed back to the SensorFlow cloud once the internet is back online.
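The offline behavior described in point 2 can be illustrated with a minimal store-and-forward sketch. This is not SensorFlow's implementation: the names GatewayBuffer, decide_ac and the send callback are hypothetical, and an in-memory queue stands in for the on-gateway persistence mentioned above.

```python
from collections import deque


def decide_ac(occupied):
    # Local rule taken from the text: no one in the room -> switch off the AC.
    # "no_change" for occupied rooms is an assumption made for this sketch.
    return "no_change" if occupied else "off"


class GatewayBuffer:
    """Hypothetical store-and-forward buffer (illustration only)."""

    def __init__(self, max_items=10_000):
        # Bounded queue: if it fills up, the oldest readings are dropped first.
        self.pending = deque(maxlen=max_items)

    def record(self, reading):
        # Always store locally, whether or not the internet is available.
        self.pending.append(reading)

    def flush(self, send, online):
        """Send buffered readings in order while online; return the number sent."""
        sent = 0
        while online and self.pending:
            send(self.pending[0])
            self.pending.popleft()  # removed only after send() returned
            sent += 1
        return sent
```

The automation decision (decide_ac) never depends on connectivity; only flush does, so an outage delays the upload without interrupting control.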

SensorFlow’s IoT sensors communicate exclusively with the SensorFlow gateway using our proprietary AirLink protocol. This communication happens on sub-GHz bands, usually between 920 and 925 MHz depending on the country, and thus does not interfere with existing Bluetooth or WiFi infrastructure. Each gateway can support up to 960 sensor nodes, but in practice we usually connect no more than 300 nodes per gateway, because the achievable wireless range depends on building layout and materials. We usually install no more than 15 gateways per site, with an average of around 7-8 gateways.
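As a rough planning aid, the figures above translate into a minimum gateway count for a given number of nodes. The sketch below is an assumption-laden estimate, not a sizing tool: the function name gateways_needed is hypothetical, and wireless range usually drives the real number higher.

```python
import math

MAX_NODES_PER_GATEWAY = 960      # upper limit stated in the text
TYPICAL_NODES_PER_GATEWAY = 300  # practical figure stated in the text


def gateways_needed(node_count, nodes_per_gateway=TYPICAL_NODES_PER_GATEWAY):
    """Lower bound on gateways for node_count nodes (ignores radio range)."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    if not 1 <= nodes_per_gateway <= MAX_NODES_PER_GATEWAY:
        raise ValueError("nodes_per_gateway must be between 1 and 960")
    return math.ceil(node_count / nodes_per_gateway)
```

For example, 2,000 nodes at the practical figure of 300 nodes per gateway give gateways_needed(2000) == 7, which matches the typical site size mentioned above.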

Gateway Requirements

Our gateways depend on the site’s internet infrastructure and can be connected via WiFi or ethernet, with ethernet being the preferred option. We do not open any ports to the outside, and all communication is initiated outbound using normal requests or VPN connections. We do not use any proxy infrastructure. The attack surface can be reduced further if the client sets up a sub network for our gateways to connect to, which isolates our communications from the hotel’s network. Any firewall infrastructure should be implemented by the client’s network provider and will not affect our gateways, as all communication is outbound. The gateway requires the following ports to be open on the firewall for outbound communication:

Bandwidth Requirements

The bandwidth requirements for the gateways are very low, as they mainly forward IoT data in the form of short JSON messages, as well as log messages for diagnostic purposes. The total upload volume of the gateways is around 2 MB per hour, depending on the actual installation. Downstream data flow is negligible in normal operation and will not exceed a couple of kilobytes per day.
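To put the upload figure in perspective, it can be converted into a daily volume and an average bit rate. This is a back-of-the-envelope sketch with two assumptions that the text does not confirm: the 2 MB per hour figure is treated as a per-gateway value, and 1 MB is taken as 10^6 bytes.

```python
UPLOAD_MB_PER_HOUR = 2  # approximate figure from the text


def daily_upload_mb(gateways=1):
    # Assumption: the hourly figure applies to each gateway individually.
    return UPLOAD_MB_PER_HOUR * 24 * gateways


def average_upload_kbps():
    # Assumption: 1 MB = 10**6 bytes; 8 bits per byte; 3600 seconds per hour.
    return UPLOAD_MB_PER_HOUR * 10**6 * 8 / 3600 / 1000
```

Under these assumptions a single gateway uploads about 48 MB per day at an average of roughly 4.4 kbit/s, and a 15-gateway site about 720 MB per day.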

Protocol Requirements

The gateway interacts with 3 external services:

  1. Amazon Web Services (AWS) - Runs all the cloud infrastructure that makes up the SensorFlow analytics and dashboard backend. All interactions are secured using AWS Cognito, a service that provides authentication, authorization, and user management. (https://aws.amazon.com/cognito)

  2. Balena.io - https://Balena.io/
    Balena is used as fleet management for the gateways. It reports on the status of the gateway and manages any updates/patches to the gateway. BalenaOS is the base operating system of the gateway. The main application runs in a Docker container on the OS. A detailed description of the Balena-related security implementations can be found here: https://www.balena.io/docs/learn/welcome/security/
    Balena communicates over two channels: (1) a VPN for reporting system uptime and sending logs, and (2) a polling service that checks whether any updates are available.

  3. Papertrail - https://papertrailapp.com
    Log data from the gateways is sent to Papertrail for log management. This makes logs easy to read when troubleshooting. Logs are searchable for 2 weeks and archived for one year (the underlying storage is AWS S3).

Figure 1 details the components that are communicating on-site and how they interact.

Figure 1: Network overview detailing the communication channels between our nodes, gateways and external services.

Dashboard

Our dashboard allows clients to see the recorded data as well as control instrumented AC systems. It can be accessed from any web browser, and users authenticate with a username and password. It is hosted on Google Firebase and consists of a combination of HTML, JavaScript and various images. All communication is with the AWS cloud over TLS.

Authentication and authorization for the dashboard also use the AWS Cognito service (https://aws.amazon.com/cognito). We create user accounts upon client request so that only relevant staff have access to the dashboard.

Amazon GuardDuty (https://aws.amazon.com/guardduty/) is used to monitor the cloud account for any unusual or suspicious activity.

Data Storage

Data sent to the AWS cloud is stored in DynamoDB (a NoSQL database) for application service purposes. The data is also stored in a high-performance time-series database (Postgres with the TimescaleDB extension) for data science purposes. Both databases are hosted on AWS: DynamoDB runs as a managed AWS service, and our time-series database is hosted on an AWS EC2 instance.

Support and Maintenance

We aspire to provide end-to-end service to our clients and have dedicated support on standby to address system issues through our remote maintenance capabilities. These include the capability to update the firmware of sensor nodes using an over-the-air update process between the gateways and the sensors. We use this to deliver security updates, bug fixes or new features to our sensors. Furthermore, we can remotely update our gateway software as well as start, restart and shut down gateways using the balena.io service. In rare cases, if all remote measures fail, we will ask for help from on-site hotel engineers. This would mostly be for battery replacements or the exchange of broken or damaged hardware. If the task at hand is too complex for on-site engineers to execute, we will send one of our dedicated installation project managers to carry out the required work. This will be evaluated on a case-by-case basis.

Gateway Security

Only the SensorFlow gateways are connected directly to the client’s network, while the sensor nodes only ever communicate with our gateway and never with the internet directly. As such, only the gateways are potentially vulnerable to remote cyber-attacks through the internet. We usually connect our gateways to the same network the hotel guests connect to. This means that, from a network design point of view, the security risk associated with our gateways can be considered equal to that of any device a guest might connect to the hotel network. Even if an attacker gains access to, or full control over, a SensorFlow gateway, the consequences are minimal and isolated to the gateway in question.

Consequences of Gateway Attacks

  1. (Minor) Interruption or deactivation of automation: This would lead to lost savings and some data loss in rooms serviced by this gateway. This would be the case if the attacker stops the SensorFlow application running on the gateway.

  2. (Minor) Impact on guest comfort: This could happen if an attacker manages to switch off FCUs after gaining access to the gateway. This would, however, be extremely unlikely, as the attacker would have to completely reverse engineer our secret proprietary protocol to be able to send valid commands to our devices, which would be a serious engineering effort. The damage could also be easily contained by switching off the affected gateway, at which point our thermostats would resume normal offline operation. It would not be possible to permanently hold infrastructure hostage through this attack vector.

  3. (Minor) Access to the hotel network: An attacker who gains control over the gateway could use the gateway to listen to network traffic or jam the network with noise. The consequences of this scenario depend mainly on the hotel’s network setup. We usually suggest that the gateways be connected to their own isolated subnet or to the same network that hotel guests connect to, with the same privileges and restrictions. A well-designed network setup therefore would not allow a device on the gateway’s network or guest network to access vital hotel network infrastructure.

Direct Remote Attacks on Open Gateway Ports

This scenario describes an attacker trying to gain remote access to the gateway via open ports.

Likelihood: Impossible

This attack vector is considered to be impossible, as the gateway does not listen on any ports facing the outside and all communication is initiated outbound from the gateways. This behavior is inherited from BalenaOS, the operating system that forms the base of the gateway software. A detailed description of the Balena-related security implementations, and how they manage communication and authorization, can be found in the Balena security documentation linked under Protocol Requirements above.

We run the SensorFlow application inside a Docker container on top of BalenaOS, where it interacts with the SensorFlow cloud infrastructure. All communication with our cloud is again outbound and no ports are opened, so our own application does not add extra attack surface to the gateway.

Due to the above it would be impossible for an attacker to find and attack our gateways remotely without breaching the client’s network infrastructure first. This is especially true if the gateways are additionally protected by a firewall which restricts access to the client’s network. Even if they got through the firewall, they would still not find any open ports to attack on our gateways.

Man in the Middle Attacks on the Gateways

The only way to attack the gateway would be for an attacker to impersonate the Balena cloud or the SensorFlow cloud. A successful attacker could attempt to send commands to connected thermostats or exchange the gateway firmware with their own firmware.

Likelihood: Rare

We consider this attack scenario to be very unlikely to succeed, as all our APIs are secured through standard HTTPS (TLS) encryption and certificate-based authentication and authorization. This is true for the service endpoints presented by Balena for control and remote maintenance as well as for all SensorFlow cloud APIs, which are secured by the AWS Cognito service. An attacker would therefore have to successfully steal the SensorFlow or Balena root certificates to succeed with this attack vector.
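The protection against impersonation rests on the client verifying the server's certificate. As a generic illustration (not the gateway's actual code), the sketch below builds a TLS client context with Python's standard ssl module. The function name make_verified_context and the TLS 1.2 minimum are assumptions made for this example.

```python
import ssl


def make_verified_context():
    """Return a client-side TLS context that rejects unverifiable servers."""
    # create_default_context() enables certificate chain and hostname checks.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A context like this can be passed to urllib.request.urlopen(url, context=ctx) or http.client.HTTPSConnection(host, context=ctx). A server that cannot present a certificate chaining to a trusted root for the requested hostname then fails the handshake before any command or firmware data is exchanged.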

Physical Attack On The Gateway

Should an attacker gain physical access to the gateway, they could attempt to replace the gateway firmware by swapping the SD card that stores the operating system. This would give them a device on the hotel network that is under their control and most likely whitelisted to connect without having to go through a captive portal.

Likelihood: Rare

As this is an on-premises attack that requires technical knowledge, we consider it very unlikely, especially in light of the abovementioned network restrictions that should apply to all devices on the gateway or guest network. To perform this kind of attack, it would be much easier for the attacker to connect a device directly to the hotel network than to go through the SensorFlow gateway.

Node Security

Our sensor nodes connect only to our gateways, via our proprietary protocol called AirLink. As such, they are completely unreachable and undiscoverable from the outside and therefore cannot be attacked directly. Our nodes do not speak any IP-based protocol and are therefore unable to interact directly with the hotel’s network infrastructure. Their bandwidth capabilities are also extremely limited by design, making them unsuitable for any attack on the hotel’s network infrastructure. This is a consequence of using LoRa rather than WiFi or Bluetooth. All communication happens in binary form, making attempts to reverse engineer our protocol very difficult.

Consequences Of Attacks On Nodes

  1. (Minor) Impact on guest comfort: Should an attacker manage to connect a rogue node or take full control of a node, they could send false occupancy data, which can then influence the FCU behavior in those rooms. This could cause thermal discomfort to guests. The impact can easily be mitigated by switching off the SensorFlow gateways or blacklisting the node from the gateway network.

  2. (Minor) Interruption or deactivation of automation: Should an attacker manage to deactivate or disconnect sensor nodes, automation would be disabled leading to lost savings and a loss of data.

Installing A Rogue Node

In this case, an attacker attempts to install a malicious node at the property.

Likelihood: Rare

As our protocol is proprietary and not published, someone would need to complete a full reverse engineering exercise of our wireless protocol to attempt to connect a rogue node to the SensorFlow system. This would require an attacker to be on-site for several days or weeks to collect enough information from regular communications to have a remote chance of deciphering the protocol.

Additionally, each of our nodes contains a unique ID number, called a MAC address, which we record before we bring the equipment on site. Our gateways will reject any node that does not have a recognized MAC address. The MAC address of a node is only exchanged during the initial connection process, making it extremely hard for an attacker to launch a targeted attack. Should an attacker go through all of this effort, the only damage they could achieve would be to control FCUs in hotel rooms without visibility of which rooms they are controlling, causing only minor annoyances to guests. Even if an attacker tried to cause damage by employing a replay attack on our communication, they would need to gain a detailed understanding of the communication timings of our network, which would strongly limit the success of this attack vector.
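The allowlist check described above can be sketched as follows. This is a simplified illustration and not the AirLink implementation: the class name NodeRegistry, the MAC notation and the block method (modeling the blacklisting mitigation mentioned earlier) are all hypothetical.

```python
class NodeRegistry:
    """Hypothetical gateway-side registry of pre-recorded node MAC addresses."""

    def __init__(self, known_macs):
        # MAC addresses recorded before the equipment is brought on site.
        self.allowed = {self._normalize(mac) for mac in known_macs}
        self.blocked = set()

    @staticmethod
    def _normalize(mac):
        # Assumed notation: hex digits, optionally separated by ':' or '-'.
        return mac.replace(":", "").replace("-", "").lower()

    def block(self, mac):
        # Blacklist a node, e.g. one that is suspected of sending false data.
        self.blocked.add(self._normalize(mac))

    def accepts(self, mac):
        # Accept a node only if it is recognized and not blacklisted.
        key = self._normalize(mac)
        return key in self.allowed and key not in self.blocked
```

A node with an unrecorded address is rejected at connection time, and a recognized node can later be excluded without touching the rest of the network.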

Attack Through Firmware Upgrade Function

Our sensor nodes are upgraded remotely. Our gateways trigger this upgrade.

Likelihood: Rare

As we use a proprietary protocol between the sensors and our gateways, an attacker would have to take control of our gateways or bring equivalent hardware on-site. After that, they would also have to reverse engineer our complete firmware upgrade protocol, which would be a very sophisticated engineering effort with minimal result.

Network Jamming

Every wireless network is vulnerable to jamming attacks, so an on-site attacker can succeed in disconnecting some nodes by jamming the frequencies that the SensorFlow system operates on. Nothing can be done to prevent this. However, the only result is that automation is deactivated and some data is lost.

Likelihood: Rare

Physical Attacks On Nodes

All of the above require reverse engineering efforts to be successful. The fastest way of doing this would be to attempt to download the firmware from our nodes and then attempt to decompile it.

Likelihood: Rare

This is not possible because we lock our nodes after programming: connecting a tool to download the firmware requires unlocking the node, which immediately erases the firmware.
