Ways to improve first time indexing in ElasticSearch
In my application, I have a need to re-index data from time to time. I have noticed that the time it takes to index the data the first time (via bulk index) is much slower than subsequent re-indexing. In one scenario, it takes 2 hours to perform the indexing the first time, and 15 minutes (indexing the same data) on subsequent indexing.
While 2 hours to index the first time is reasonable, I am curious why subsequent iterations of the re-index are faster. And, more so, I am wondering if there's anything I can do to improve performance when indexing the first time, e.g. perhaps by indicating how large the index will be, etc.
Thanks, Eric
Edited to strike out the references to merge_factor, which has been removed in ES 2.0: https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_setting_changes.html#_merge_and_merge_throttling_settings
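As Damien indicates, you can indeed influence (bulk) indexing through settings. The first is refresh_interval: you can set it to -1 temporarily, and set it back to the default value of 1s after you complete your bulk indexing. Another setting to modify, on versions before ES 2.0 only, is merge.policy.merge_factor; set it to a higher value such as 30 for the bulk load, and back to the default of 10 once you are done.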
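As a rough illustration of the refresh_interval toggle described above, here is a minimal Python sketch using only the standard library. It assumes a node reachable over HTTP and the index update-settings endpoint (PUT /<index>/_settings); the helper names (build_refresh_request, set_refresh_interval, refresh_disabled), the URL http://localhost:9200 and the index name my_index are all made up for the example and are not from the original answer.

```python
import json
import urllib.request
from contextlib import contextmanager


def build_refresh_request(base_url, index, interval):
    """Build (but do not send) a PUT to the index's _settings endpoint."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def set_refresh_interval(base_url, index, interval):
    """Send the settings update and return the parsed JSON response."""
    request = build_refresh_request(base_url, index, interval)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


@contextmanager
def refresh_disabled(base_url, index, restore_to="1s", apply=set_refresh_interval):
    """Disable refresh for the duration of a bulk load, then restore it.

    `apply` is injectable (hypothetical hook) so the sequence can be
    checked without a running cluster.
    """
    apply(base_url, index, "-1")
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so refresh is not left off.
        apply(base_url, index, restore_to)
```

Usage would look roughly like `with refresh_disabled("http://localhost:9200", "my_index"): run_bulk_load()`, where run_bulk_load stands in for whatever bulk indexing code you already have. The merge_factor setting is deliberately left out, since it no longer exists from ES 2.0 onward.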
There are a number of tutorials and mailing list discussions on optimizing bulk indexing, but here are the official doc links to start with:
http://www.elasticsearch.org/guide/reference/index-modules/merge/
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings/
If you haven't tuned the memory settings for your JVM, you should. Although they are specific to a 512MB VPS running Ubuntu 10.04 Server, these settings (http://pastebin.com/mnugqcly) should point you in the right direction. Basically, allocating the desired amount of RAM to ElasticSearch upon startup can improve JVM memory allocation/GC timing.