
Multi-tenant Cross-cluster Search on AWS

Raghavan Madabusi

May 3, 2022

Here’s how we architected a multi-tenant, cross-cluster Elasticsearch setup on AWS for one of our engagements.

AWS Elasticsearch is a fully managed service that makes it simple to deploy, scale, monitor, and secure Elasticsearch on AWS.
Elasticsearch is primarily used for search, text analytics, and log aggregation.

What is OpenSearch?

In January 2021, Elastic announced a change to its software licensing strategy: it would no longer release new versions of Elasticsearch and Kibana under the Apache License 2.0.
To ensure that open source versions of both packages remain available, AWS stepped up to create and maintain an Apache-licensed fork of Elasticsearch and Kibana.
In April 2021, AWS introduced the OpenSearch project, a community-driven, open source fork of Elasticsearch and Kibana. Keep an eye on this space: AWS also announced that it would rename its existing Amazon Elasticsearch Service to Amazon OpenSearch Service.

With that background out of the way, let’s focus on the purpose of this blog.

Client Problem Statement

Use AWS Elasticsearch to proactively monitor applications and infrastructure with data spread across logs, infrastructure, and multiple AWS accounts, so that performance issues are found faster and operational health improves.

Sounds familiar? Yes, and it isn’t rocket science until we consider the following challenges:

Each AWS account has its own resource usage patterns and generates logs at a different pace
A pooled or hybrid multi-tenant Elasticsearch domain shared by all AWS accounts ends up with unbalanced resource usage
Adding, removing, or merging AWS accounts can affect the capacity of the existing Elasticsearch domain
Providing granular access control for each AWS account’s users in a centralized Elasticsearch domain, at the index, document, and field level, is hard
The operations team has no central place to look at log data from multiple AWS accounts and has to hop between individual siloed tenants

Solution

Cross-cluster search lets you perform searches, aggregations, and visualizations across multiple AWS Elasticsearch domains with a single query.
This lets us separate heterogeneous workloads and store different indices on different domains, while still querying across all domains in a single request.
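On the query side, cross-cluster search addresses a remote domain by prefixing the index name with the connection alias (`alias:index`), and several such targets can be comma-separated in one request. The sketch below only builds such a request. The aliases `tenant-a` to `tenant-c`, the `logs-*` index pattern, and the `level` field are hypothetical. The code does not send anything; you would still need a signed HTTP client pointed at the operations domain.

```python
from typing import Dict, Sequence, Tuple


def federated_index_expression(
    aliases: Sequence[str], index_pattern: str, include_local: bool = False
) -> str:
    """Join 'alias:pattern' targets into one comma-separated expression."""
    targets = [f"{alias}:{index_pattern}" for alias in aliases]
    if include_local:
        # A pattern without an alias prefix targets the local (operations) domain.
        targets.insert(0, index_pattern)
    return ",".join(targets)


def federated_search_request(
    aliases: Sequence[str], index_pattern: str, include_local: bool = False
) -> Tuple[str, str, Dict]:
    """Return (method, path, body) for one search that fans out to every alias."""
    path = f"/{federated_index_expression(aliases, index_pattern, include_local)}/_search"
    # Hypothetical query: the "level" field is an assumption about the log schema.
    body = {"query": {"match": {"level": "ERROR"}}}
    return "GET", path, body


if __name__ == "__main__":
    method, path, body = federated_search_request(["tenant-a", "tenant-b", "tenant-c"], "logs-*")
    print(method, path)
    # → GET /tenant-a:logs-*,tenant-b:logs-*,tenant-c:logs-*/_search
```

Keeping the alias list as data means the operations team’s dashboards and saved queries can be regenerated when a tenant is added, instead of hand-editing index expressions.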

Cross-cluster Elasticsearch Architecture

Each AWS account’s log data is stored in its own tenant Elasticsearch domain
Each Elasticsearch domain uses its own instance type and instance count, sized to its usage pattern
The operations team’s Elasticsearch domain connects to the 3 tenant domains and can run federated queries from a single place
A new Elasticsearch domain can be added whenever a new AWS account is created
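Here is a minimal sketch of the connection layout above. It assumes one outbound connection from the operations domain to each tenant domain. The dictionaries mirror the request fields of the Amazon Elasticsearch Service `CreateOutboundCrossClusterSearchConnection` API (`SourceDomainInfo`, `DestinationDomainInfo`, `ConnectionAlias`). The account IDs, domain names, and aliases are hypothetical, and the code only builds payloads. It does not call AWS.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Domain:
    owner_id: str  # AWS account ID that owns the domain
    name: str      # Elasticsearch domain name
    region: str    # AWS Region of the domain
    alias: str     # connection alias used in queries (hypothetical naming scheme)


def domain_info(domain: Domain) -> Dict[str, str]:
    """Domain identity in the OwnerId/DomainName/Region shape the API expects."""
    return {"OwnerId": domain.owner_id, "DomainName": domain.name, "Region": domain.region}


def outbound_connection_requests(ops: Domain, tenants: List[Domain]) -> List[Dict]:
    """Build one outbound-connection payload from the ops domain to each tenant."""
    return [
        {
            "SourceDomainInfo": domain_info(ops),
            "DestinationDomainInfo": domain_info(tenant),
            "ConnectionAlias": tenant.alias,
        }
        for tenant in tenants
    ]


if __name__ == "__main__":
    ops = Domain("999999999999", "ops-logs", "us-east-1", "ops")
    tenants = [
        Domain("111111111111", "tenant-a-logs", "us-east-1", "tenant-a"),
        Domain("222222222222", "tenant-b-logs", "us-east-1", "tenant-b"),
        Domain("333333333333", "tenant-c-logs", "us-east-1", "tenant-c"),
    ]
    for request in outbound_connection_requests(ops, tenants):
        print(request["ConnectionAlias"], "->", request["DestinationDomainInfo"]["DomainName"])
```

With boto3, each payload could be passed as `boto3.client("es").create_outbound_cross_cluster_search_connection(**request)`. The owner of the tenant domain then accepts it with `accept_inbound_cross_cluster_search_connection`. Onboarding a new AWS account then means creating its domain and calling `outbound_connection_requests(ops, [new_tenant])` for just that domain.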

Cross-cluster search limits

Can’t connect to self-managed Elasticsearch clusters
Can’t connect to domains in different regions
A domain can have a maximum of 20 outgoing and 20 incoming connections
Domains must run Elasticsearch 6.7 or later
For the full list of limitations, refer to https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/cross-cluster-search.html#cross-cluster-search-limitations
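The limits above can be checked mechanically before a new tenant domain is connected. This minimal sketch assumes the topology is described by hand as a list of domains, each with its Region, Elasticsearch version, and outgoing connections. The `DomainSpec` structure and all names are hypothetical. The sketch reads the 20-connection limit as 20 outgoing plus 20 incoming per domain. It covers only the limits listed here, not the full list in the AWS documentation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_OUTGOING = 20     # outgoing connections per domain
MAX_INCOMING = 20     # incoming connections per domain
MIN_VERSION = (6, 7)  # minimum Elasticsearch version


@dataclass
class DomainSpec:
    name: str
    region: str
    version: Tuple[int, int]  # Elasticsearch (major, minor); OpenSearch numbering not handled
    aws_managed: bool = True  # False for a self-managed cluster
    outgoing: List[str] = field(default_factory=list)  # remote domains this one queries


def check_topology(domains: List[DomainSpec]) -> List[str]:
    """Return human-readable violations of the listed limits (empty list means OK)."""
    by_name = {d.name: d for d in domains}
    incoming = {d.name: 0 for d in domains}
    problems: List[str] = []
    for d in domains:
        if d.version < MIN_VERSION:
            problems.append(f"{d.name}: version {d.version[0]}.{d.version[1]} is below 6.7")
        if len(d.outgoing) > MAX_OUTGOING:
            problems.append(f"{d.name}: {len(d.outgoing)} outgoing connections exceeds limit of {MAX_OUTGOING}")
        for remote_name in d.outgoing:
            remote = by_name.get(remote_name)
            if remote is None:
                problems.append(f"{d.name}: unknown remote domain {remote_name}")
                continue
            incoming[remote_name] += 1
            if not (d.aws_managed and remote.aws_managed):
                problems.append(f"{d.name} -> {remote_name}: self-managed clusters are not supported")
            if d.region != remote.region:
                problems.append(f"{d.name} -> {remote_name}: cross-region connections are not supported")
    for name, count in incoming.items():
        if count > MAX_INCOMING:
            problems.append(f"{name}: {count} incoming connections exceeds limit of {MAX_INCOMING}")
    return problems


if __name__ == "__main__":
    ops = DomainSpec("ops", "us-east-1", (7, 10), outgoing=["tenant-a", "tenant-b"])
    tenant_a = DomainSpec("tenant-a", "us-east-1", (7, 10))
    tenant_b = DomainSpec("tenant-b", "eu-west-1", (6, 5))
    for problem in check_topology([ops, tenant_a, tenant_b]):
        print(problem)
```

Running a check like this in the account-onboarding pipeline flags, for example, a tenant domain created in the wrong Region before the operations team discovers that its federated queries silently skip that tenant.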
