When we run Elasticsearch in production, one of the common issues is imbalance in “shards”. There may be one node in the cluster that is out of disk space, while a few nodes with no shards on them.

For example, here is a node with all the shards:

Node	Shards	Disk Used	Disk %	Free Space
PESD222	957	329.1 GB	32%	694.2 GB
PESD34	2871	929.2 GB	90%	94.1 GB
PESMD198	957	364.9 GB	35%	658.4 GB
PESMD169	957	301.4 GB	29%	721.9 GB

So PESD34 is at 90% with 2871 shards, the rest is only at ~30%. This is very unpleasant and even dangerous, because as soon as PESD34 has no space, your cluster would go down. Why does it happen?

Why Is Shard Allocation Unbalanced?

Shard allocation in Elasticsearch is determined by cluster settings. Some possible reasons why might that happen are

1. Disk Watermarks Misconfigured

Elasticsearch has disk based watermarks (low, high and flood_stage) to help determine when to move or block shards.
If you set them to absolute values, say 50GB, they don’t scale when you have bigger disks. For example, on a 1TB node with 100GB free, Elasticsearch still considers this “healthy”! (even though it’s 90% full).

2. Shard Allocation Disabled

It could be that allocation was disabled during some maintenance or upgrade. But if you leave it that way, Elasticsearch would never rebalance the shards!

Obviously, too many shards per node Having thousands of shards per node, like 2871 in the example above, would cause performance issues, plus the more shards your cluster has, the harder is to maintain balance. Typical guidelines are 20–50GB per shard, and few hundred shards per node.

shard_rebalancing_flow

Checking and Enabling Shard Allocation

In some cases, shard allocation might have been disabled.

GET _cluster/settings?include_defaults=true&flat_settings=true

If allocation is disabled, re-enable it:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

This ensures Elasticsearch is allowed to move shards between nodes.

Use Percentage-Based Watermarks (not absolute)

This way the threshold evolve with disk size.

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "75%",
    "cluster.routing.allocation.disk.watermark.high": "85%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "90%"
  }
}

disk_watermark_level

With these settings:

Low (75%) → ES will not allocate any new shards to nodes above 75%
High (85%) → ES will actively redistribute shards from nodes above 85%
Flood Stage (90%) → Any index(s) ending up on that node is locked in read-only mode to avoid corruption.

As your overloaded node (PESD34) is well above the high watermark, any such shard will start getting migrated on other nodes.

Forcing a Rebalance

Once settings are updated, you can trigger a rebalance:

POST /_cluster/reroute?retry_failed=true

Or move specific shards manually:

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "your-index",
        "shard": 0,
        "from_node": "PESD34",
        "to_node": "PESD222"
      }
    }
  ]
}

Long-Term Solution: Reduce Shard Count

Elasticsearch will always need to rebalance if you put 2871 shards on one node!. The long term solution will be to reduce the number of shards:

Right size shards (IA: 20-50GB per shard)
Use Index Lifecycle Management (ILM) for rollover / retention / delete
Shrink indices : Resize all indices back to a smaller number of shards (Shrink API, Reindex with fewer shards, etc.)

shard_imbalance_before_after

Key Takeaways

Always use percentage based watermarks
Re-enable allocation if you disabled it (cluster.routing.allocation.enable=all)
Two small shards: We’re fine. Ten small shards: We need this.
Two shards on one node: We need this. Ten shards on one node: We need this.
Force rebalance manually if necessary. But properly configure shards from day one. All this will help to make Elasticsearch do it automatically.

Fixing Unbalanced Shards in Elasticsearch: Why One Node Holds All Your Data

Why Is Shard Allocation Unbalanced?

1. Disk Watermarks Misconfigured

2. Shard Allocation Disabled

Checking and Enabling Shard Allocation

Use Percentage-Based Watermarks (not absolute)

Forcing a Rebalance

Long-Term Solution: Reduce Shard Count

Key Takeaways

Leave a Reply Cancel reply

Why Is Shard Allocation Unbalanced?

1. Disk Watermarks Misconfigured

2. Shard Allocation Disabled

Checking and Enabling Shard Allocation

Use Percentage-Based Watermarks (not absolute)

Forcing a Rebalance

Long-Term Solution: Reduce Shard Count

Key Takeaways

Tag

Leave a Reply Cancel reply

Tips for writing a blog

Learn how to write a caption