1. Background & Why Migrate

In Grafana Mimir, hash rings are the critical infrastructure responsible for sharding, replication, and service discovery. The hash ring maps data tokens to specific instances — without it, distributors wouldn’t know which ingester to send writes to, and queriers wouldn’t know where to fetch data.

Historically, Mimir stored ring state in an external database such as Consul or etcd. This creates operational overhead:

- a separate cluster to deploy, monitor, upgrade, and keep quorate
- an additional failure domain on the critical path of every ring operation

Memberlist is a peer-to-peer gossip protocol built into Mimir that:

- runs embedded in every Mimir pod, with no separate service to operate
- propagates ring state via gossip, with no central point of failure
- self-heals after network partitions

This tutorial documents the zero-downtime migration from Consul (or etcd) to Memberlist for all Mimir rings using Mimir’s multi KV store feature.

Trade-Off: Consistency vs. Simplicity

| Feature | Memberlist (Recommended) | Consul |
|---|---|---|
| Operational Overhead | Minimal: embedded in every Mimir pod; no separate service to operate. | High: requires a separate Consul cluster, monitoring, and lifecycle management. |
| Consistency Model | Eventual: changes propagate within ~5-10 seconds via gossip. | Strong: immediate consistency via CAS (compare-and-swap) operations. |
| Failure Tolerance | Good: survives network partitions gracefully; gossip self-heals. | Critical: loss of quorum = cluster halt; requires careful bootstrap. |
| Network Calls | ~20-50 per pod per second (gossip heartbeats). | Dozens per second (CAS operations). |
| Best For | All modern Mimir deployments, especially Kubernetes. | Existing Consul environments; strong-consistency requirements. |

Eventual consistency in Mimir: A crashed ingester stays visible to distributors for up to ~10 seconds while gossip propagates. Writes to it fail and retry — Mimir’s write path tolerates this. It does not tolerate waiting on a centralized CAS operation for every write.
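A toy sketch of this tolerance (purely illustrative; Mimir's actual distributor retry logic is far more involved):

```python
def send_with_retries(candidates, is_alive):
    """Try replicas in order: a crashed-but-still-listed instance
    costs one failed attempt instead of blocking the write."""
    for target in candidates:
        if is_alive(target):
            return target  # write acknowledged by a healthy replica
    raise RuntimeError("all replicas failed")

# Stale ring view: ingester-1 crashed, but gossip hasn't propagated yet
ring_view = ["ingester-1", "ingester-2", "ingester-3"]
result = send_with_retries(ring_view, lambda t: t != "ingester-1")
```

The stale entry costs one retry, not a blocked write path.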


2. Architecture Overview

Rings and their KV keys

Each Mimir component maintains a ring in a separate KV namespace:

| Component | Key Prefix | KV Key Pattern |
|---|---|---|
| ingester | ingester/ | ingester/ingester-&lt;zone&gt;-&lt;i&gt;/ |
| distributor | distributor/ | distributor/distributor-&lt;i&gt;/ |
| compactor | compactor/ | compactor/compactor-&lt;i&gt;/ |
| store_gateway | store_gateway/ | store_gateway/store-gateway-&lt;i&gt;/ |
| alertmanager | alertmanager/ | alertmanager/alertmanager-&lt;i&gt;/ |
| ruler | rulers/ | rulers/ruler-&lt;i&gt;/ |

How structuredConfig overlays base config

structuredConfig is deep-merged on top of the base config — it wins on conflict. This is the mechanism used in Phase 1 and Phase 3 to override ring KV settings without rewriting the entire base config:

base config  +  structuredConfig  =  final config passed to Mimir binary
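The merge semantics can be sketched in a few lines of Python (a conceptual model of deep-merge, not the Helm implementation):

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay onto base; overlay wins on conflict."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"ingester": {"ring": {"kvstore": {"store": "consul"},
                              "replication_factor": 3}}}
overlay = {"ingester": {"ring": {"kvstore": {"store": "multi"}}}}

final = deep_merge(base, overlay)
# overlay wins on conflicting keys (store), sibling keys survive untouched
```

This is why you can override only the ring kvstore blocks while leaving the rest of the base config alone.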

How runtimeConfig hot-reloads

Mimir polls runtimeConfig every ~10 seconds. The multi_kv_config key overrides the multi: section of all rings simultaneously — no pod restart required:

# runtimeConfig (hot-loaded)
multi_kv_config:
  primary: memberlist    # overrides primary for ALL multi-configured rings
  mirror_enabled: false  # stop writing to secondary

This is the zero-restart mechanism used in Phase 2.

Cluster Label Security

In Kubernetes environments (especially AWS EKS with Karpenter or similar auto-scaling), pod IPs are frequently recycled. If a Mimir pod dies and a new pod is scheduled with the same IP, memberlist gossip may treat the new pod as the old logical node.

Critical risk: if two different systems running memberlist (e.g., Mimir and Loki) share the same IP range, their gossip traffic can merge the rings, misrouting traffic between systems. This is catastrophic.

Solution: Cluster labels. Each cluster gets a unique identifier, and memberlist only accepts gossip traffic from nodes with matching labels. During this migration, you’ll set a cluster label in Phase 1, and enforce verification in Phase 3 once all pods share the label.
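Conceptually, label verification is an equality gate on every incoming gossip packet (a sketch of the semantics, not Mimir's wire-level implementation):

```python
def accept_packet(local_label, packet_label, verification_disabled=False):
    """Accept gossip only from nodes whose cluster label matches ours."""
    if verification_disabled:
        return True  # Phases 1-2: label is attached but not yet enforced
    return packet_label == local_label

# Phase 3 (enforcement on): Loki gossip rejected, matching Mimir gossip accepted
rejected = accept_packet("mimir-prod-us-east-1", "loki-prod")
accepted = accept_packet("mimir-prod-us-east-1", "mimir-prod-us-east-1")
```

Verification stays disabled through the rollout because already-restarted (labeled) pods must keep gossiping with not-yet-restarted (unlabeled) pods.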


3. Prerequisites

Before starting the migration, verify:

Mimir version

Memberlist is configured

In your Mimir config, confirm memberlist.join_members is set to the gossip ring service:

memberlist:
  abort_if_cluster_join_fails: false
  compression_enabled: false
  join_members:
    - mimir-gossip-ring.<namespace>.svc.cluster.local:7946

If join_members is missing, pods won’t form a cluster and migration will fail.

Network connectivity

All Mimir pods must be able to reach each other on the memberlist gossip port (7946/TCP by default). Verify that NetworkPolicies, security groups, and firewalls allow pod-to-pod traffic on this port.

Current KV backend is healthy

Check all Consul or etcd instances are healthy before starting:

# For Consul
kubectl get pods -l app=consul,component=server -n <namespace>

# For etcd
kubectl get pods -l app=etcd -n <namespace>

Configuration access

You can edit and reload Mimir runtime configuration without restarting pods (via ConfigMap or API).

Monitoring access

Access to Prometheus to run queries and verify metrics during migration.

Pod metrics annotations

Check your Mimir deployment for pod annotation port misconfiguration. Prometheus scrape annotations must use numeric port values, not named ports:

# CORRECT
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"   # numeric string

# WRONG — will cause scraping on wrong port
podAnnotations:
  prometheus.io/port: "http-metrics"

If any component uses named port annotations, fix them before Phase 1. See Issue 1 for why.

Helm chart version

Confirm your Mimir Helm chart supports structuredConfig. It was introduced in mimir-distributed ~4.x.


4. Component Ring Overview

Each Mimir component maintains its own independent ring. Not all rings need to be migrated simultaneously, but in practice, the entire cluster must use the same KV backend — you cannot run memberlist on some components and Consul on others.

Here are the main rings and their migration priority:

| Component | Ring Configuration Key | Migration Priority | Notes |
|---|---|---|---|
| Ingesters | ingester.ring.* | 1 (Migrate first) | Most critical for write path. |
| Ingest Storage Partitions | ingester.partition_ring.* | 1 (Migrate first) | If using ingest storage. |
| Distributors | distributor.ring.* | 2 (Migrate second) | Critical for request routing. |
| Compactors | compactor.sharding_ring.* | 3 (Migrate third) | Less critical; benefits from stability. |
| Store-gateways | store_gateway.sharding_ring.* | 3 (Migrate third) | Read-path sharding; less disruptive. |
| Rulers (optional) | ruler.ring.* | Optional | Only if using Mimir Ruler. |
| Alertmanagers (optional) | alertmanager.sharding_ring.* | Optional | Only if using Mimir Alertmanager. |
| Query-schedulers (optional) | query_scheduler.ring.* | Optional | Only if using query scheduling. |
| Overrides-exporters (optional) | overrides_exporter.ring.* | Optional | Rarely used. |

5. Migration Strategy — Multi KV Approach

Why You Need Multi KV

A direct cutover loses ring history — memberlist starts empty, so all components suddenly appear unregistered, causing query failures and ingestion drops. Instead, use the multi KV store: it writes to both Consul and memberlist simultaneously, letting memberlist shadow the primary until it has a full copy of ring state. Only then do you flip reads over.

The Three-Phase Migration

Phase 1: [Consul PRIMARY] ←→ [Memberlist SECONDARY mirror]
             ↓ (runtime hot-reload, zero restart)
Phase 2: [Memberlist PRIMARY] ←→ [Consul SECONDARY, no mirror]
             ↓ (Helm deploy, rolling restart)
Phase 3: [Memberlist ONLY]  (Consul deleted)

How Multi KV Works

The multi KV store uses these configuration parameters:

- store: multi selects the multi KV client for a ring
- multi.primary is the backend serving all reads and writes (consul, etcd, or memberlist)
- multi.secondary is the backend that receives mirrored writes
- multi.mirror_enabled controls whether writes are copied to the secondary

If a secondary write fails, the primary write still succeeds — you’ll see the error in metrics, but the system doesn’t block.
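This behavior can be modeled in a few lines (an illustrative sketch; the real client lives in Mimir's kv package and is considerably more involved):

```python
class MultiKV:
    """Reads hit the primary; writes hit the primary and are mirrored
    to the secondary best-effort when mirroring is enabled."""
    def __init__(self, primary, secondary, mirror_enabled=True):
        self.primary = primary
        self.secondary = secondary
        self.mirror_enabled = mirror_enabled
        self.mirror_write_errors = 0  # cf. cortex_multikv_mirror_write_errors_total

    def put(self, key, value):
        self.primary[key] = value              # primary write must succeed
        if self.mirror_enabled:
            try:
                self.secondary[key] = value    # best-effort mirror
            except IOError:
                self.mirror_write_errors += 1  # counted, never blocks

    def get(self, key):
        return self.primary[key]               # reads always served by primary


class NotConvergedStore(dict):
    """Simulates a memberlist secondary whose gossip ring isn't ready yet."""
    def __setitem__(self, key, value):
        raise IOError("gossip cluster not converged")


kv = MultiKV(primary={}, secondary=NotConvergedStore())
kv.put("ingester/ingester-0", "ACTIVE")  # succeeds despite the mirror failure
```

This is exactly the Phase 1 startup behavior: mirror errors are counted while the primary keeps serving.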


6. Phase 1 — Dual-Write Setup

What changes

Add a configuration overlay that modifies all 6 ring kvstore blocks to use multi KV store with your current backend (Consul/etcd) as primary (reads + writes) and memberlist as secondary (writes only, mirrored).

Also set:

- memberlist.cluster_label: a unique identifier for this cluster
- memberlist.cluster_label_verification_disabled: true, so labeled and not-yet-labeled pods can keep gossiping during the rollout

See Cluster Label Security in the Architecture Overview for background.

Configuration changes

Using Helm, add a structuredConfig: block to your values:

mimir:
  structuredConfig:
    memberlist:
      cluster_label_verification_disabled: true        # Disable enforcement during rollout
      cluster_label: "mimir-prod-us-east-1"            # Set unique label (customize for your cluster)
    ingester:
      ring:
        kvstore: &kvstore
          store: multi
          multi:
            primary: consul        # Your current backend (consul or etcd)
            secondary: memberlist
            mirror_enabled: true
    distributor:
      ring:
        kvstore: *kvstore
    compactor:
      sharding_ring:
        kvstore: *kvstore
    store_gateway:
      sharding_ring:
        kvstore: *kvstore
    alertmanager:
      sharding_ring:
        kvstore: *kvstore
    ruler:
      ring:
        kvstore: *kvstore

Note: If you’re using etcd instead of Consul, replace primary: consul with primary: etcd.

Note: Your base config may still reference the old backend directly. The structuredConfig deep-merge will override those settings at Helm render time. Leave your base config as-is for now.

Deploying Phase 1

Apply your Helm values:

$ helm upgrade mimir mimir-distributed -f mimir/values.yaml -n <namespace>

Watch for the rolling restart to complete:

$ kubectl rollout status statefulset/mimir-ingester -n <namespace>
# Repeat for each component (ingesters and store-gateways run as StatefulSets; distributors as Deployments)

Verifying Phase 1

Check that multi KV is initialized:

$ kubectl logs -l app=mimir-ingester -n <namespace> | grep -i "Starting KV client.*multi"

Expected: One entry per pod showing store=multi.

Check that memberlist cluster formed:

$ kubectl logs -l app=mimir-ingester -n <namespace> | grep -i "joined memberlist cluster"

Expected: Entries from all pods joining the gossip ring.

Check that cluster label is set:

$ kubectl exec -it <ingester-pod> -n <namespace> -- \
  curl -s localhost:9009/config | grep cluster_label

Expected: cluster_label: mimir-prod-us-east-1 (or whatever label you set). The /config endpoint returns YAML, so use grep rather than jq.

Monitor mirror health using PromQL. Watch these metrics for the first 10-15 minutes:

# Should increase steadily (mirror writes happening)
rate(cortex_multikv_mirror_writes_total[5m])

# Should drop to zero after startup (a short burst while gossip converges is normal)
rate(cortex_multikv_mirror_write_errors_total[5m])

Secondary write errors are expected in the first 10-15 minutes — they resolve automatically as the gossip cluster converges. The primary (Consul/etcd) is still serving all reads and writes correctly.

Verify no scrape errors: → Run V2: Gossip Scrape Errors

Wait 15 minutes after all pods are running before proceeding to Phase 2. This ensures memberlist rings are fully synchronized with the primary backend.


7. Phase 2 — Flip Primary via Runtime Config

What changes

Update runtimeConfig in your Helm values to flip the multi KV primary from Consul to Memberlist. This is a hot-reload — no pod restart required.

Critical: Only change runtimeConfig during Phase 2. Do NOT touch structuredConfig.

Why: runtimeConfig is re-read every ~10 seconds and overrides multi: settings for all rings. structuredConfig requires a Helm deploy and pod rollout, which risks disruption. The runtime config is the correct, zero-restart lever.

Configuration changes

runtimeConfig:
  multi_kv_config:          # ← ADD this block
    primary: memberlist     # flip reads+writes to memberlist
    mirror_enabled: false   # stop writing to Consul secondary
  # ... existing runtimeConfig content ...

Note: The cluster label set in Phase 1 stays in place. Verification remains disabled (in structuredConfig) until Phase 3, when all pods have rolled out with the label.

Deploying Phase 2

Apply your Helm values (this updates the ConfigMap only):

$ helm upgrade mimir mimir-distributed -f mimir/values.yaml -n <namespace>

Mimir pods will pick up the change in ~10 seconds without restarting.

Verifying Phase 2

Check that all components switched primary:

# Query a pod's runtime config (served as YAML)
$ kubectl exec -it <ingester-pod> -n <namespace> -- \
  curl -s localhost:9009/runtime_config
# Output should include:
#   multi_kv_config:
#     primary: memberlist
#     mirror_enabled: false

Check ring health: → Run V3: Ring Member Count

Verify ring convergence: → Run V1: Ring Convergence

Verify no new Consul errors (Consul is now secondary and unused; stale heartbeat timeouts are normal):

$ kubectl logs -l app=mimir-ingester -n <namespace> | \
  grep -i "error.*consul" | wc -l

Expected: Zero to very low.

Wait 15 minutes for stability before proceeding to Phase 3.


8. Phase 3 — Full Cutover & KV Backend Decommission

What changes

  1. Remove multi KV configuration — switch all rings to use memberlist directly
  2. Remove memberlist.cluster_label_verification_disabled (enforcement now active)
  3. Keep the memberlist.cluster_label set in Phase 1
  4. Remove multi_kv_config from runtimeConfig
  5. Decommission your Consul/etcd backend (delete pods, remove from infrastructure config)

The cluster label verification is now enforced: memberlist will only accept gossip traffic from nodes with matching labels. This prevents ring merging with other gossip clusters (e.g., Loki, Prometheus) that may be running on the same Kubernetes cluster.

Phase 3 is irreversible. Once your KV backend pods are deleted, recovering requires restoring infrastructure and re-deploying from backups. Plan accordingly.

Configuration changes

In your structuredConfig, replace all 6 multi KV blocks with simple memberlist, and remove the cluster label disable flag:

# BEFORE (Phase 1/2)
mimir:
  structuredConfig:
    memberlist:
      cluster_label_verification_disabled: true
      cluster_label: "mimir-prod-us-east-1"  # Label set in Phase 1
    ingester:
      ring:
        kvstore: &kvstore
          store: multi
          multi:
            primary: memberlist
            secondary: consul
            mirror_enabled: false

# AFTER (Phase 3)
mimir:
  structuredConfig:
    memberlist:
      cluster_label: "mimir-prod-us-east-1"  # Keep label; enforcement is now re-enabled

    ingester:
      ring:
        kvstore: &kvstore
          store: memberlist      # Simple, clean

In your runtimeConfig, remove the multi KV config block entirely:

# BEFORE
runtimeConfig:
  multi_kv_config:
    primary: memberlist
    mirror_enabled: false
  # ... other runtimeConfig ...

# AFTER
runtimeConfig:
  # multi_kv_config section removed entirely
  # ... other runtimeConfig ...

In your infrastructure config (Helm values, Terraform, or whatever deploys your KV backend), remove or disable the Consul/etcd deployment.

Deploying Phase 3

Apply your Helm values:

$ helm upgrade mimir mimir-distributed -f mimir/values.yaml -n <namespace>

Mimir pods will do a rolling restart (a structuredConfig change requires pod restarts). The ring will remain stable throughout because memberlist is peer-to-peer and does not depend on a central service. Once all pods have restarted and verification passes, delete your Consul/etcd KV backend pods.

Verifying Phase 3

Run these checks at 5-minute intervals, 3 times minimum (15 min total):

Confirm KV store is memberlist in logs:

$ kubectl logs -l app=mimir-ingester -n <namespace> -c mimir | grep "Starting KV client" | head -3

Expected: All entries show store=memberlist.

Check ring convergence: → Run V1: Ring Convergence

Verify cluster label enforcement is enabled:

kubectl exec -it <ingester-pod> -n <namespace> -- \
  curl -s localhost:9009/config | grep cluster_label_verification_disabled

Expected: false, or no output (the key absent also means enforcement is active). Verification is now enforced; memberlist will reject gossip packets from nodes with mismatched labels.


9. Verification Reference

The following verification checks are referenced throughout the migration phases. Run them as needed to confirm health at each stage.

V1: Ring Convergence

Purpose: Verify that ring state is propagating correctly across all pods via gossip.

time() - cortex_ring_oldest_member_timestamp

Expected: < 30 seconds (ideally < 15 seconds). If this value is consistently > 30s, gossip propagation is laggy; see Issue 6 for tuning.


V2: Gossip Scrape Errors

Purpose: Verify Prometheus isn’t accidentally scraping the memberlist gossip port (symptom of misconfigured pod annotations).

kubectl logs -l app=mimir-ingester -n <namespace> | \
  grep -E "unknown message type|TCPTransport"

Expected: Zero results. If non-zero, see Issue 1 — your pod annotations likely use named ports instead of numeric port strings.


V3: Ring Member Count

Purpose: Verify all expected ring members are present and healthy.

kubectl exec -it <mimir-distributor-pod> -n <namespace> -- \
  curl -s localhost:9009/ingester/ring | grep -c ACTIVE

Expected: roughly your ingester replica count (typically 3+; varies by setup). The ring page is HTML, so the grep count is approximate; open the page in a browser for exact member state.


V4: KV Backend Errors

Purpose: Verify no errors are occurring with your KV backend (Consul or etcd).

kubectl logs -l app=mimir-ingester -n <namespace> | \
  grep -iE "error.*consul|error.*etcd"

Expected: Zero results (in Phase 2 and later, some stale heartbeat warnings are normal and not blocking).


V5: Ring/Memberlist Errors

Purpose: Verify no internal ring or memberlist errors.

kubectl logs -l app=mimir-ingester -n <namespace> | \
  grep -iE "error.*(ring|kvstore|memberlist)"

Expected: Zero results.


10. Known Issues & Mitigations

Issue 1: Prometheus Scrape Port Mismatch

Symptom: Memberlist TCPTransport "unknown message type G" errors in logs, originating from Prometheus IP addresses.

Root Cause: Pod annotation prometheus.io/port: "http-metrics" is a string name, not a port number. Prometheus’s scrape config regex captures the numeric port (\d+); when it can’t resolve a named port to a number, it falls back to the first open port it finds — which is the memberlist gossip port 7946. Mimir sees the Prometheus HTTP scrape as a gossip packet and logs the error.
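The capture behavior is easy to demonstrate with the pattern itself (a hypothetical relabel regex; your Prometheus configuration may differ):

```python
import re

PORT_RE = re.compile(r"^(\d+)$")  # relabel rule: keep only numeric ports

def resolve_scrape_port(annotation, fallback_port=7946):
    """Return the annotated port if numeric; otherwise Prometheus ends up
    on a fallback port, which here is the memberlist gossip port."""
    m = PORT_RE.match(annotation)
    return int(m.group(1)) if m else fallback_port

good = resolve_scrape_port("8080")          # numeric string resolves correctly
bad = resolve_scrape_port("http-metrics")   # named port falls through to 7946
```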

Fix: Change all component podAnnotations to use prometheus.io/port: "8080" (see the example under Pod metrics annotations in the Prerequisites).

When to fix: Before Phase 1 deploy. This is a prerequisite, not a post-Phase 1 fix.

Verification: No "unknown message type" in logs within 5 min of deploy.


Issue 2: Secondary KV Write Timeouts During Phase 1 Startup

Symptom: "error writing to secondary KV store" or write timeout errors in logs during the first 10-15 minutes of Phase 1.

Root Cause: Phase 1 enables mirror_enabled: true, so every ring write goes to both your primary backend and memberlist. However, the memberlist cluster takes 1-2 minutes to fully form after pod restart — during this window, secondary writes fail because the gossip ring hasn’t converged yet.

Mitigation: This is expected and not a blocking issue. Secondary writes resume automatically once the gossip cluster is healthy. The primary is still serving all reads and writes correctly.

Action: Do not roll back. Monitor for 15 minutes. Errors should drop to zero.


Issue 3: structuredConfig vs runtimeConfig Precedence

Symptom: Confusion about which config layer to change when flipping from Consul primary to Memberlist primary.

Rule:

Precedence chain (highest wins):

runtimeConfig (hot-reload, every ~10s)
    > structuredConfig (Helm deep-merge)
        > base config

Changing structuredConfig for Phase 2 triggers an unnecessary rolling restart. runtimeConfig.multi_kv_config is the correct zero-restart lever.
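For scalar settings, the chain behaves like layered lookups, sketched here with collections.ChainMap (a simplification: the real merging is recursive and per-key):

```python
from collections import ChainMap

base_config = {"primary": "consul", "mirror_enabled": True}
structured_config = {"primary": "consul"}   # Helm-rendered overlay
runtime_config = {"primary": "memberlist", "mirror_enabled": False}

# Leftmost mapping wins: runtimeConfig > structuredConfig > base config
effective = ChainMap(runtime_config, structured_config, base_config)
```

Deleting a key from runtime_config immediately exposes the structuredConfig value underneath, which is why the Phase 2 rollback is just removing the block.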


Issue 4: No Private IP Found

Symptom: Memberlist logs show “No Private IP Found” errors.

Cause: Kubernetes VPC CNI has ENABLE_PREFIX_DELEGATION enabled; memberlist can’t determine which interface to bind to.

Fix: Set memberlist.bind_addr to the pod IP using the Kubernetes Downward API.
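A sketch of the fix as Helm values (assumes your chart exposes global.extraEnv for injecting environment variables and that Mimir runs with -config.expand-env=true so ${MY_POD_IP} is expanded; MY_POD_IP is a name chosen here — adjust to your deployment):

```yaml
global:
  extraEnv:
    - name: MY_POD_IP               # hypothetical variable name
      valueFrom:
        fieldRef:
          fieldPath: status.podIP   # Downward API: the pod's own IP
mimir:
  structuredConfig:
    memberlist:
      bind_addr: ["${MY_POD_IP}"]   # bind gossip explicitly to the pod IP
```

Binding explicitly sidesteps the interface auto-detection that fails under prefix delegation.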


Issue 5: Too Many Unhealthy Instances

Symptom: Ring shows many instances marked UNHEALTHY or LEAVING; queries fail intermittently.

Cause: Cluster merged with another system via IP reuse, or ingester pods were force-deleted without deregistration.

Fix: Use the ring admin API to manually “forget” bad instances, or restart all ingester pods simultaneously to reset the in-memory ring.


Issue 6: Slow Ring Updates

Symptom: Ring changes take 30+ seconds to propagate across the cluster.

Cause: Gossip interval is too large, or gossip-nodes count is too low.

Fix: Tune gossip parameters (rarely needed):

mimir:
  structuredConfig:
    memberlist:
      gossip_interval: 100ms        # Gossip more often (default: 200ms)
      gossip_nodes: 4               # Gossip to more peers per interval (default: 3)
      retransmit_factor: 5          # Retry messages more (default: 4)

Note: increasing pullpush_interval (the full state sync) reduces CPU at the cost of slower propagation; leave it at the default when chasing slow ring updates.

Troubleshooting Quick Reference

| Symptom | Likely Cause | Resolution |
|---|---|---|
| "unknown message type G" in logs | Prometheus scraping gossip port 7946 | Fix pod annotation to use numeric port "8080" instead of "http-metrics" |
| Secondary KV write timeout errors | Memberlist gossip cluster not converged yet | Expected during Phase 1 startup; wait 15 min, errors will resolve |
| "Too Many Unhealthy Instances" | Ring merged with another system via IP reuse | Restart all pods, or use the ring admin "Forget" action to deregister bad entries |
| "No Private IP Found" | Kubernetes CNI can't resolve pod IP | Set memberlist.bind_addr to pod IP via Downward API |
| Ring updates take 30+ seconds | Gossip interval too large | Increase gossip_nodes from 3 to 4-5; decrease gossip_interval if needed (rarely required) |
| High CPU on memberlist reconciliation | Ring state comparison overhead with many instances | Increase pullpush_interval from 10s to 20-30s |

11. Rollback Procedures

Rollback Phase 1 (Low Risk)

Remove the structuredConfig block added in Phase 1 and re-deploy to return to your original KV backend only.


Rollback Phase 2 (Medium Risk — Hot Reload)

Phase 2 is a runtimeConfig-only change. Remove multi_kv_config from runtimeConfig:

runtimeConfig:
  # multi_kv_config: ← delete this block
  # ... rest of runtimeConfig ...

Deploy. Within 10 seconds, all rings will re-read the config and switch back to primary: <your-original-backend>. No pod restart needed.


Rollback Phase 3 (High Risk — KV Backend Deleted)

Phase 3 deletes your KV backend. If issues arise after Phase 3 deploy:

  1. Restore your KV backend infrastructure (re-deploy Consul or etcd from your infrastructure configuration)

  2. Restore Mimir to Phase 1 multi KV state (original backend primary, memberlist secondary):

    mimir:
      structuredConfig:
        memberlist:
          cluster_label_verification_disabled: true
        ingester:
          ring:
            kvstore:
              store: multi
              multi:
                primary: <original-backend>  # consul or etcd
                secondary: memberlist
                mirror_enabled: true
        # ... repeat for all 6 components ...
  3. Deploy Mimir — rolling restart will reconnect rings to the restored KV backend

  4. Ensure multi_kv_config is absent from runtimeConfig (Phase 3 already removed it; a leftover entry would override the restored primary)

  5. Once stable, you can re-plan the migration with the root cause fixed


12. Post-Migration Verification

Once all components have been migrated to memberlist, verify the entire deployment.

Run the following reference checks:

- V1: Ring Convergence
- V2: Gossip Scrape Errors
- V3: Ring Member Count
- V5: Ring/Memberlist Errors

Additionally verify:

- No component configuration still references Consul/etcd, and no backend pods remain
- Cluster label verification is enforced on all pods
- Write and query error rates and latencies are at pre-migration levels


Quick Reference Checklist

Pre-Migration

- [ ] memberlist.join_members points at the gossip ring service
- [ ] Current KV backend (Consul/etcd) is healthy
- [ ] Pod annotations use numeric prometheus.io/port values
- [ ] Helm chart supports structuredConfig; runtimeConfig is hot-reloadable
- [ ] Prometheus access confirmed for verification queries

Phase 1

- [ ] structuredConfig: all rings on store: multi (primary: consul/etcd, secondary: memberlist, mirror_enabled: true)
- [ ] cluster_label set; verification disabled
- [ ] Rolling restart complete; memberlist cluster formed
- [ ] Mirror writes flowing; mirror errors at zero; waited 15 minutes

Phase 2

- [ ] runtimeConfig: multi_kv_config added with primary: memberlist, mirror_enabled: false
- [ ] Hot reload picked up with no restarts; primary verified on pods
- [ ] Ring convergence and member counts healthy; waited 15 minutes

Phase 3

- [ ] structuredConfig: rings switched to store: memberlist; verification-disabled flag removed; label kept
- [ ] runtimeConfig: multi_kv_config removed
- [ ] Rolling restart complete; KV client logs show store=memberlist
- [ ] Consul/etcd decommissioned

Post-Migration

- [ ] V1-V5 reference checks pass over 15+ minutes
- [ ] Dashboards and alerts updated; old KV backend monitoring removed