Ways to improve first time indexing in ElasticSearch


In my application, I have a need to re-index the data from time to time. I have noticed that the time it takes to index the data the first time (via bulk index) is slower than subsequent re-indexing. In one scenario, it takes 2 hours to perform the indexing the first time, and 15 minutes (indexing the same data) for subsequent indexing.

While 2 hours to index the first time is reasonable, I am curious why subsequent iterations of the re-index are faster. And, more so, I am wondering if there's anything I can do to improve performance when indexing the first time, e.g. perhaps by indicating how large the index will be, etc.

Thanks, Eric

Edited to strike out references to merge_factor, as it has been removed in ES 2.0: https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_setting_changes.html#_merge_and_merge_throttling_settings


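As Damien indicates, you can indeed influence the (bulk) indexing settings: refresh_interval can be set to -1 temporarily, and set back to the default value of 1s after you complete the bulk indexing. Another setting to modify was merge.policy.merge_factor (removed in ES 2.0, per the edit note above); on older versions, set it to a higher value such as 30, and then back to the default of 10 once done.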
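
To make the refresh_interval toggle concrete, here is a minimal Python sketch using only the standard library. It assumes a node reachable at http://localhost:9200 and an index called my_index; both are placeholders, as are the helper names, and run_bulk_load stands in for whatever code performs your bulk requests. It sends the update-settings request (PUT /<index>/_settings) before and after the load, restoring the 1s default even if the load fails.

```python
import json
import urllib.request

DEFAULT_REFRESH = "1s"  # default value mentioned in the answer above


def build_settings_request(base_url, index, interval):
    """Build a PUT /<index>/_settings request that sets refresh_interval."""
    url = f"{base_url.rstrip('/')}/{index}/_settings"
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def _send(request):
    """Default sender: perform the HTTP call and discard the response body."""
    with urllib.request.urlopen(request) as response:
        response.read()


def bulk_load_with_refresh_disabled(base_url, index, run_bulk_load, send=_send):
    """Disable refresh, run the caller's bulk load, then restore the default.

    run_bulk_load is a hypothetical callable supplied by the caller; this
    sketch does not perform the bulk indexing itself.
    """
    send(build_settings_request(base_url, index, "-1"))
    try:
        run_bulk_load()
    finally:
        # Restore the default even when the bulk load raises.
        send(build_settings_request(base_url, index, DEFAULT_REFRESH))
```

A call might look like bulk_load_with_refresh_disabled("http://localhost:9200", "my_index", my_loader), where my_loader is your own function. The send parameter exists only so the HTTP call can be swapped out when testing.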

There are a number of tutorials and mailing list discussions on optimizing bulk indexing, but here are the official doc links to start with:

http://www.elasticsearch.org/guide/reference/index-modules/merge/
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings/

If you haven't tuned the memory settings for the JVM, you should. Although specific to a 512MB VPS running Ubuntu 10.04 Server, these settings (http://pastebin.com/mnugqcly) should point you in the right direction. Basically, allocating the desired amount of RAM to Elasticsearch upon startup can improve JVM memory allocation/GC timing.

