RMM scaling is an increasingly common conversation as M&A transactions abound in the MSP space. As companies merge (or experience rapid organic growth), questions arise about how scalable ConnectWise Automate is. The initial answers might seem discouraging, but we’d like to set the record straight: it’s possible to scale Automate much larger than you’d initially think.
Scaling Applications 101: Vertical vs. Horizontal
In software scaling, there are two main methodologies: vertical and horizontal. Vertical scaling makes a single server more robust (more CPU, RAM, or faster storage), while horizontal scaling distributes the load across additional servers.
With ConnectWise Automate, scaling starts as a vertical process. A partner will often open a support ticket about performance issues, and the recommendation will be to increase resources (probably RAM), along with other suggestions to reduce the application load (turning off various features, etc.). At some point, the suggestion to move the database to a different server will likely be made. This isn’t precisely horizontal scaling, but rather a way to ensure the database has dedicated resources of its own.
A 3-way split is a lesser-known option that introduces proper horizontal scaling by adding one or more web servers to spread inbound requests across multiple machines. This enables large MSPs to scale ConnectWise Automate to sizes previously thought impractical.
How horizontal scaling works
The basic premise comes down to Automate’s architecture as an RMM. The remote agents send inbound requests to the web front end; from there, the various web components process each request and write data to the database. For estimation purposes, we assume that 30% of agents communicate with the server each minute, so a server with 20,000 agents must process ~6,000 requests per minute from agents alone (staff and integrations add further requests). While resources and configuration are important, a single server can only process so many requests, and eventually that limit will be reached.
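The estimate above is simple enough to write out as a short sketch. The function name and its default are illustrative only; the 30% check-in fraction is the planning figure used in this article, not a measured value, and real traffic from staff and integrations comes on top of it.

```python
def agent_requests_per_minute(agent_count: int, checkin_fraction: float = 0.30) -> float:
    """Estimate inbound agent requests per minute.

    checkin_fraction is the assumed share of agents that contact the
    server in any given minute (0.30 is this article's planning figure).
    """
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    if not 0.0 <= checkin_fraction <= 1.0:
        raise ValueError("checkin_fraction must be between 0 and 1")
    return agent_count * checkin_fraction


if __name__ == "__main__":
    # Print the estimate for a few illustrative fleet sizes.
    for agents in (5_000, 20_000, 40_000):
        per_minute = agent_requests_per_minute(agents)
        print(f"{agents:>6} agents -> ~{per_minute:,.0f} requests/min "
              f"(~{per_minute / 60:,.0f} requests/sec)")
```

Changing `checkin_fraction` is the quickest way to see how sensitive the estimate is to agent check-in behavior.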
Once a second server is introduced, the load is split approximately in half, and capacity in that part of the application approximately doubles (assuming no other bottlenecks in the system). If the constraint is purely web requests per server per second, adding the second node allows growth to roughly 40,000 agents before that limit is reached again. This process is repeatable, allowing scaling on an ongoing basis (diminishing returns would eventually be encountered, but it’s unlikely that any real-world deployment would approach that limit). This makes horizontal scaling an excellent solution for ConnectWise Automate.
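To make the "repeatable" part concrete, here is a minimal sizing sketch. It assumes the web tier is the only constraint and that capacity grows linearly with node count, which is the same simplification used above. The 20,000-agents-per-node default is borrowed from the example purely for illustration and is not a measured limit.

```python
import math


def web_nodes_needed(agent_count: int, agents_per_node: int = 20_000) -> int:
    """Return how many web nodes are needed for agent_count agents.

    Assumes linear scaling and no other bottlenecks (database, DBAgent,
    network). agents_per_node is a hypothetical per-node capacity.
    """
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    if agents_per_node <= 0:
        raise ValueError("agents_per_node must be positive")
    # Always keep at least one node, even for an empty environment.
    return max(1, math.ceil(agent_count / agents_per_node))


if __name__ == "__main__":
    for agents in (15_000, 20_000, 25_000, 40_000, 50_000):
        print(f"{agents:>6} agents -> {web_nodes_needed(agents)} web node(s)")
```

In practice you would likely add headroom on top of this figure so that the remaining nodes can absorb the load if one fails.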
How to scale ConnectWise Automate
As we write this, we have multiple MSPs coming to us looking for solutions to scale in the 20,000 to 40,000 agent range. We’ve architected these environments previously, and below is our usual architecture:
First off, we have the reverse proxies. While they can be used to implement numerous essential security measures, they are also application-aware layer-7 routers. A reverse proxy can detect which server has the most capacity and direct inbound connections to it, ensuring efficiency. In the event of an issue with a single server, the proxy transparently moves the traffic with no interruptions, and proxy metrics can be used to detect and diagnose application issues. While other load-balancing methods exist, the intelligence of the reverse proxy and the visibility it provides make it a clear winner.
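As a rough illustration of the routing decision described above, the sketch below picks the healthy server with the fewest active connections. This "least connections" strategy is one common approach offered by reverse proxies; the article does not depend on a specific proxy product or algorithm, and the `Backend` class and server names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Backend:
    name: str                # hypothetical server label
    active_connections: int  # connections currently open through the proxy
    healthy: bool = True     # result of the most recent health check


def pick_backend(backends: Iterable[Backend]) -> Backend:
    """Return the healthy backend with the fewest active connections."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    # min() returns the first minimum, so ties go to the earliest backend listed.
    return min(candidates, key=lambda b: b.active_connections)


if __name__ == "__main__":
    pool = [Backend("web1", 120), Backend("web2", 85)]
    print("least loaded:", pick_backend(pool).name)
    pool[1].healthy = False  # simulate a failed health check on web2
    print("after failure:", pick_backend(pool).name)
```

A real proxy makes this decision per connection or per request and refreshes health state continuously; the sketch only shows the selection logic.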
Next, we have the web servers. For large stacks, we typically suggest three — two for agent traffic and then one dedicated to non-agent traffic (users, integrations, etc.). The dedicated web server for user traffic improves performance significantly, and via the proxy, requests can be routed intelligently across the whole cluster (so requests will be routed to the agent web servers if the user server has issues). This also creates a degree of redundancy, which is desirable with mission-critical tools like an RMM.
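The pool split and fallback behavior described above can be sketched as follows. Everything named here is an assumption made for illustration: the server names, the pool layout, and especially the `/agent` path prefix, which is a placeholder and not a real Automate URL. Falling back in both directions is also our simplification; a production configuration might only allow user traffic to spill over.

```python
from typing import Dict, List, Set

# Hypothetical pool layout: two agent web servers, one user/integration server.
POOLS: Dict[str, List[str]] = {
    "agent": ["web1", "web2"],
    "user": ["web3"],
}

# Hypothetical prefix used to recognise agent traffic; not a real Automate path.
AGENT_PATH_PREFIX = "/agent"


def eligible_servers(path: str, healthy: Set[str]) -> List[str]:
    """Return the servers a request may be routed to.

    Prefers the pool that matches the traffic type and falls back to the
    other pool only when the preferred pool has no healthy servers.
    """
    preferred = "agent" if path.startswith(AGENT_PATH_PREFIX) else "user"
    fallback = "user" if preferred == "agent" else "agent"
    for pool_name in (preferred, fallback):
        servers = [s for s in POOLS[pool_name] if s in healthy]
        if servers:
            return servers
    raise RuntimeError("no healthy web servers in any pool")


if __name__ == "__main__":
    all_up = {"web1", "web2", "web3"}
    print(eligible_servers("/agent/checkin", all_up))
    print(eligible_servers("/dashboard", all_up))
    # With the user server down, user traffic falls back to the agent pool.
    print(eligible_servers("/dashboard", {"web1", "web2"}))
```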
The application server is next. In this instance, we define the application server as the server where the Database Agent service is located. The Database Agent is a monolithic part of the infrastructure (which isn’t always bad), and only one instance of the DBAgent can exist. Its primary function is to perform various tasks (building groups, approving patches, sending emails, applying user permissions, etc.), typically in a series of loops that run at different times. As the name implies, most of these operations are database-related — so while proper resources are essential, this single-instance service isn’t a bottleneck as long as the database performs as expected.
Finally, we have the database, which is the most critical component in scaling Automate. We’ve written at length about database performance and optimization in Automate, and it all applies here. Small inefficiencies are magnified at scale, so many small adjustments are needed to address the emergent problems that appear when scaling an Automate stack (in addition to the fundamentals of an optimized configuration and proper indexing). We’ve performed R&D in the past and have gotten Automate to work with group replication in MySQL (which would scale exceedingly well), but the schema modifications required make it impractical for production servers currently (ConnectWise has contacted us to get this functionality baked into the product). For now, we believe a properly optimized single-instance database should scale to ~50,000 agents without issue.
The future is bright (and big)
With the M&A activity happening with MSPs, the market will demand scaling ConnectWise Automate larger and larger — and we’re confident it will scale. All the needed elements are present, and with a few minor updates and changes to the product itself, Automate will become an ultra-scalable platform. Until then, we’re happy to work with any MSP to chart the frontier and scale Automate to 50,000 agents and beyond.