Introduction
This is part of our work done for our client ThingSphere.
We are obsessed with the concept of infinity when it comes to scaling.
The growth of IoT-enabled devices has opened a new frontier in the scalability of software and hardware systems. Many of us have seen IoT platforms perform poorly when a huge number of devices are connected concurrently. At ThingSphere, our obsession with infinity makes us push our system to its limits. It is not just the software stack: we had to push our hardware, the operating system, and the limits it places on network throughput. In the end, we pride ourselves on our ability to scale and on having addressed this problem efficiently.
Let’s take a step back and focus on the number of devices for a moment: over 1 million simultaneously connected devices. That’s around half the total number of vehicles in the city of London [1]. Singapore has just under a million automobiles if you put together all private cars, public transport and goods vehicles [2].
By connecting one million devices, we could connect a major share of the registered cars in one of the most populated cities in the world.
Uber, for example, has 3 million drivers all over the world [3]. If we assume that half of them are on duty at any point in time, because of varying time zones, that is about 1.5 million active drivers, and an application deployed on our IoT platform could connect all of those active Ubers through ThingSphere right now.
A lot of how-tos and articles out there mention breaching the million mark. Surprisingly, most of them let us down, since they only report a number of concurrently maintained connections without performing any significant data transfer.
We will now describe the steps we followed to achieve the desired scalability. In the process we tested the limits of each component: the load balancer, the TCP/IP protocol itself, the operating system networking stack, and our message brokers. We discovered that ThingSphere, in collaboration with Solace, can scale to numbers much larger than a million devices. Each step is explained in more detail as we go further in this blog post.
Architecture Overview
ThingSphere has a highly modular and flexible software stack built from scratch with scalability and ease of use as the prime focus.
The above diagram shows the different layers in the stack and their core functionalities. Some of these functionalities can be modified, extended, or removed as business requirements dictate. The modularity and abstraction ensure that the rest of the stack is not impacted by such changes. This makes the platform plug and play for most industries and their wide variety of business requirements.
We deployed the Solace PubSub+ software message broker as the messaging backbone for our stack. Through vertical scaling we achieved 200k concurrent connections with one message broker, which is the maximum we could get from a single software message broker. To go beyond one million concurrent connections we deployed 6 message brokers, each handling up to 200k simultaneous connections, for a combined limit of 1.2 million. The brokers are interconnected using DMR (Dynamic Message Routing). This assures us that our platform can handle more than 1 million concurrent connections.
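The broker count follows from simple capacity arithmetic, sketched below in Python. The figures (200k connections per broker, 6 brokers, a target of one million) come from the text above; the function and variable names are ours and purely illustrative.

```python
import math

PER_BROKER_LIMIT = 200_000     # connections sustained by one software broker (from the text)
TARGET_CONNECTIONS = 1_000_000 # the mark we wanted to exceed
DEPLOYED_BROKERS = 6           # brokers actually deployed (from the text)


def brokers_needed(target: int, per_broker: int) -> int:
    """Smallest number of brokers whose combined limit reaches the target."""
    return math.ceil(target / per_broker)


minimum = brokers_needed(TARGET_CONNECTIONS, PER_BROKER_LIMIT)
capacity = DEPLOYED_BROKERS * PER_BROKER_LIMIT
headroom = capacity - TARGET_CONNECTIONS

print(f"brokers needed to reach the target: {minimum}")
print(f"combined limit of deployed brokers: {capacity:,}")
print(f"headroom above the target:          {headroom:,}")
```

Five brokers would only reach exactly one million connections with every broker at its limit, so the sixth broker is what leaves room to go past that mark.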
The combination of best of breed products with our custom high performance algorithms at each layer of the stack enables us to push the limits of our hardware.
Benchmark Methodology
Tsung – Distributed load test
We used Tsung 1.7 to generate MQTT messages to our brokers. Tsung works with distributed clients in a master/slave arrangement. To generate more than a million connections, we created 24 clients, each of which maintains 64k connections: 24 × 64,000 = 1,536,000, so the 24 clients together can generate well over 1 million connections.
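As a quick sanity check on the load generator sizing, the same numbers can be worked through in a few lines of Python. This is only a sketch of the arithmetic; the constant names are illustrative and not part of any Tsung configuration.

```python
CONNECTIONS_PER_CLIENT = 64_000  # connections each Tsung client maintains (from the text)
LOAD_CLIENTS = 24                # Tsung clients we created (from the text)
TARGET = 1_000_000               # connection count we wanted to exceed

# Total connections the 24 clients can open together.
total = LOAD_CLIENTS * CONNECTIONS_PER_CLIENT

# Ceiling division: fewest clients that could reach the target on their own.
min_clients = -(-TARGET // CONNECTIONS_PER_CLIENT)

# Connections available beyond the target.
spare = total - TARGET

print(f"total generator capacity: {total:,}")
print(f"minimum clients for 1M:   {min_clients}")
print(f"spare connections:        {spare:,}")
```

Running more clients than the bare minimum means the test can still cross the million mark if a few load generators fall short of their full connection count.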
Parameters in test setup
Message size: We used message sizes of 32 bytes, 44 bytes and 52 bytes.
Message frequency: Each client starts publishing messages as soon as it is connected.
Number of queues and topic endpoints: The number of topic endpoints affects how many clients can publish and subscribe to the platform, especially when the QoS is 1. The queue and topic endpoint limits in the message broker should be set to their maximum so that connections are not lost.
QoS: We use QoS 1 for our MQTT messages.
Security features: We use some innovative techniques to negate the impact of SSL decryption.
Geography: Our clients were set up in different geographic locations so that the test approximates a real-world use case and the results reflect how our platform behaves in real-world scenarios.
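To give a feel for what these small QoS 1 messages look like on the wire, here is a minimal Python sketch that builds an MQTT PUBLISH packet by hand and reports its size for the three payload sizes above. It assumes MQTT 3.1.1 framing; the topic name "sensors/demo" and the packet identifier are hypothetical and are not the values used in our test.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 data bits per byte, high bit means 'more follows'."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def publish_packet(topic: str, payload: bytes, packet_id: int, qos: int = 1) -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet (DUP and RETAIN flags left at 0)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: length-prefixed topic name, then a packet identifier for QoS > 0.
    variable_header = struct.pack("!H", len(topic_bytes)) + topic_bytes
    if qos > 0:
        variable_header += struct.pack("!H", packet_id)
    # Fixed header: packet type 3 (PUBLISH) in the high nibble, QoS in bits 1-2.
    first_byte = 0x30 | (qos << 1)
    remaining = len(variable_header) + len(payload)
    return bytes([first_byte]) + encode_remaining_length(remaining) + variable_header + payload


TOPIC = "sensors/demo"  # hypothetical topic, for illustration only

for size in (32, 44, 52):
    packet = publish_packet(TOPIC, b"x" * size, packet_id=1)
    print(f"payload {size} bytes -> packet {len(packet)} bytes "
          f"(framing overhead {len(packet) - size} bytes)")
```

With payloads this small, the MQTT framing and topic name are a noticeable fraction of every packet, and at QoS 1 each publish is also acknowledged by the broker with a PUBACK.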
Final Results
And finally the graph that matters most
The message count on each message broker, which together sum to more than one million:
How does this impact your business?
A highly scalable and robust platform lets you concentrate on business enhancements rather than the technical details. Even with rapid customer acquisition or very large-scale industrial deployments, you should never have to worry about simultaneous device connections, connectivity, or data loss. ThingSphere is highly reliable and ensures no data loss even when connectivity is lost or a broker fails over. As a result you get accurate analysis, which helps your business grow.
Summary
Breaching 1mn concurrent active connections was merely a first step in an ocean of possibilities. We love solving difficult-to-scale IoT challenges at ThingSphere, and we have taken multiple strides towards a seamless connected future for an infinite cloud of connected devices. The next post in our scaling series will cover our challenges and solutions for writing >1mn data samples to our database. Stay tuned for updates!
References
[1] http://content.tfl.gov.uk/technical-note-12-how-many-cars-are-there-in-london.pdf