Clean up Elasticsearch indices

When you manage several Elasticsearch clusters, one question is likely to arise: how do I clean up old or unused indices? A well-suited tool for this is Curator.

Installation

python3 -m venv .venv3
source .venv3/bin/activate
pip install elasticsearch-curator==5.8.4

Configuration

Curator needs two files to perform operations: a client configuration file and an action file.

1/ config.yml

client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  username:
  password:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

This file mainly defines the Elasticsearch instances that Curator should target.

2/ action.yml

actions:
  1:
    action: delete_indices
    description: >-
      Delete indices whose names start with test- and that are
      older than 30 days (based on the date in the index name),
      excluding indices associated with the logs alias.
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
    - filtertype: pattern
      kind: prefix
      value: test-
    - filtertype: alias
      aliases: [ logs ]
      exclude: True

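In this example, we define one action (delete_indices) with multiple filters, summarized in its description. Filters are chained with a logical AND, so an index must pass every filter to be deleted.
We exclude a specific alias to avoid removing the indices it points to.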
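
To make the filter chain more concrete, here is a minimal Python sketch that mimics the three filters on an in-memory list of indices. It is only an illustration, not Curator's implementation: the helper names (date_from_name, select_for_deletion), the sample index names, and the choice to skip names without a parseable date are all assumptions made for this example.

```python
from datetime import datetime, timedelta
import re


def date_from_name(name, timestring="%Y.%m.%d"):
    """Return the date embedded in an index name, or None if there is none.

    The regular expression is hard-coded for the '%Y.%m.%d' timestring.
    """
    match = re.search(r"\d{4}\.\d{2}\.\d{2}", name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(0), timestring)
    except ValueError:
        # Digits that look like a date but are not one (e.g. month 13).
        return None


def select_for_deletion(indices, now, prefix="test-", max_age_days=30,
                        protected_aliases=("logs",)):
    """Return the index names that pass every filter (logical AND).

    `indices` maps an index name to the set of aliases attached to it.
    """
    cutoff = now - timedelta(days=max_age_days)
    selected = []
    for name, aliases in indices.items():
        stamp = date_from_name(name)
        # Age filter: keep only names carrying a date older than the cutoff.
        if stamp is None or stamp >= cutoff:
            continue
        # Pattern filter: keep only names starting with the prefix.
        if not name.startswith(prefix):
            continue
        # Alias filter with exclude: drop indices behind a protected alias.
        if aliases & set(protected_aliases):
            continue
        selected.append(name)
    return sorted(selected)


# Hypothetical indices, evaluated as if today were 2024-03-01.
now = datetime(2024, 3, 1)
indices = {
    "test-2024.01.01": set(),      # old, matching prefix, no alias
    "test-2024.02.25": set(),      # too recent
    "test-2023.12.15": {"logs"},   # old, but protected by the logs alias
    "prod-2024.01.01": set(),      # old, but wrong prefix
}
print(select_for_deletion(indices, now))  # ['test-2024.01.01']
```

Only the first index survives all three checks, which is the behaviour to expect from an AND chain: each additional filter can only narrow the selection.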

Execution

$ curator --dry-run --config config.yml action.yml

Preparing Action ID: 1, "delete_indices"
...
...
DRY-RUN MODE.  No changes will be made.
...
DRY-RUN: delete_indices: test-001 with arguments: {}
DRY-RUN: delete_indices: test-002 with arguments: {}
DRY-RUN: delete_indices: test-003 with arguments: {}
DRY-RUN: delete_indices: test-004 with arguments: {}
Action ID: 1, "delete_indices" completed.
Job completed.

Once we are satisfied with the output, we can remove the --dry-run flag and schedule this job.
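
One way to keep the dry run and the real run in sync is a small wrapper that a scheduler can call. The sketch below is an assumption about how such a wrapper could look: the function names curator_command and run_curator are hypothetical, and the file names simply reuse config.yml and action.yml from above.

```python
import subprocess


def curator_command(config="config.yml", action_file="action.yml", dry_run=True):
    """Build the curator command line as an argument list."""
    cmd = ["curator"]
    if dry_run:
        cmd.append("--dry-run")
    cmd += ["--config", config, action_file]
    return cmd


def run_curator(dry_run=True):
    """Run curator; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(curator_command(dry_run=dry_run), check=True)


# Show the command that would be executed in dry-run mode.
print(" ".join(curator_command()))
```

A scheduled job (a cron entry, for example) could then call run_curator(dry_run=False), while manual checks keep the default dry-run behaviour.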

Note: The same pip package also provides curator_cli, which accepts all parameters on the command line.
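
As a rough illustration, the filters from action.yml can be serialized to JSON and handed to curator_cli. This sketch assumes the --host, --port, --dry-run, --ignore_empty_list and --filter_list options as they appear in the Curator 5.x singleton CLI; check curator_cli --help for your installed version before relying on it. The helper name curator_cli_command is hypothetical.

```python
import json
import shlex

# Same filters as in action.yml, expressed as a list of dictionaries.
filters = [
    {"filtertype": "age", "source": "name", "direction": "older",
     "timestring": "%Y.%m.%d", "unit": "days", "unit_count": 30},
    {"filtertype": "pattern", "kind": "prefix", "value": "test-"},
    {"filtertype": "alias", "aliases": ["logs"], "exclude": True},
]


def curator_cli_command(host="127.0.0.1", port=9200, dry_run=True):
    """Build a curator_cli argument list (option names assumed from 5.x)."""
    cmd = ["curator_cli", "--host", host, "--port", str(port)]
    if dry_run:
        cmd.append("--dry-run")
    cmd += ["delete_indices", "--ignore_empty_list",
            "--filter_list", json.dumps(filters)]
    return cmd


# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(curator_cli_command()))
```

Generating the JSON from a Python structure avoids hand-quoting mistakes in the --filter_list argument.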

Final thoughts

Curator is a convenient tool for managing indices in Elasticsearch.
It can also take snapshots; combined with repository plugins, it provides a solid backup/restore solution.
