Optimizing a MySQL GROUP BY with a large linking table
I've been reading a lot on this, but I still get 30+ seconds per query, while I believe it should run a lot faster.

The problem is this:

I have a large linking table (40 million rows; the data takes 650 MB and the index 1.8 GB) defined as follows:
    CREATE TABLE IF NOT EXISTS `glossary_entry_wordlist_1` (
      `idterm` mediumint(8) unsigned NOT NULL,
      `idkeyword` mediumint(8) unsigned NOT NULL,
      `termlength` smallint(6) NOT NULL,
      `termnumberwords` tinyint(4) NOT NULL,
      `termtransliteralrfc` mediumint(9) NOT NULL,
      `keywordlength` tinyint(3) unsigned NOT NULL,
      `termlanguage` tinyint(4) NOT NULL,
      PRIMARY KEY (`idkeyword`,`idterm`),
      KEY `termtransliteralrfc` (`termtransliteralrfc`),
      KEY `termlength` (`termlength`),
      KEY `secondprimary` (`idterm`,`idkeyword`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
and a small temporary table defined as follows:
    CREATE TEMPORARY TABLE IF NOT EXISTS `foundids` (
      `searchid` int(11) NOT NULL,
      `searchedkeywordid` int(11) NOT NULL,
      `similarkeywordid` mediumint(8) unsigned NOT NULL,
      `partsmatched` tinyint(4) NOT NULL,
      `sumsimliarparts` int(11) NOT NULL,
      `keywordlength` int(11) NOT NULL,
      `fuzzymark` float NOT NULL,
      `keyworddjb2` bigint(20) NOT NULL,
      `smallkeyword` tinyint(4) NOT NULL,
      PRIMARY KEY (`similarkeywordid`),
      KEY `searchid` (`searchid`),
      KEY `searchedkeywordid` (`searchedkeywordid`),
      KEY `partsmatched` (`partsmatched`),
      KEY `keywordlength` (`keywordlength`),
      KEY `smallkeyword` (`smallkeyword`),
      KEY `keyworddjb2` (`keyworddjb2`)
    ) ENGINE=MEMORY DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I need to retrieve from glossary_entry_wordlist_1 every idterm that is associated with at least 50% (or some other percentage) of the idkeyword values in the foundids table.
Practically speaking, I need to find the sentences that contain X of the searched words.
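In pseudo-SQL, the selection criterion I'm after looks something like this (a sketch only, not the optimized query; `@searchid` and the 0.5 factor are placeholders):

```sql
-- Keep every idterm that matches at least 50% of the keywords
-- collected for this search in foundids.
SELECT w.idterm
FROM glossary_entry_wordlist_1 AS w
INNER JOIN foundids AS f
        ON f.similarkeywordid = w.idkeyword
WHERE f.searchid = @searchid
GROUP BY w.idterm
HAVING COUNT(w.idkeyword) >=
       0.5 * (SELECT COUNT(*) FROM foundids WHERE searchid = @searchid);
```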
For this I use the following query (please mind that the conditional values here are just an example):
    SELECT glossary_entry_wordlist_1.idterm,
           COUNT(glossary_entry_wordlist_1.idkeyword) AS termsmatched,
           SUM(foundids.keywordlength) AS sumtermlength,
           SUM(foundids.fuzzymark)/100 AS sumfuzzy,
           SUM(foundids.partsmatched*foundids.keywordlength)/100 AS sumlengthsimilar,
           foundids.searchid
    FROM foundids
    INNER JOIN glossary_entry_wordlist_1
            ON glossary_entry_wordlist_1.idkeyword = foundids.similarkeywordid
    WHERE foundids.searchid = '17559'
      AND glossary_entry_wordlist_1.termtransliteralrfc >= '824.4'
      AND glossary_entry_wordlist_1.termlength >= '8.55'
      AND glossary_entry_wordlist_1.termlength <= '18'
      AND foundids.smallkeyword = 0
    GROUP BY glossary_entry_wordlist_1.idterm
    HAVING COUNT(glossary_entry_wordlist_1.idkeyword) > '2'
    ORDER BY NULL
And the EXPLAIN output:
    id  select_type  table                      type  possible_keys                           key       key_len  ref                        rows  Extra
    1   simple       foundids                   ref   primary,searchid,smallkeyword           searchid  4        const                      8     using where; using temporary
    1   simple       glossary_entry_wordlist_1  ref   primary,termtransliteralrfc,termlength  primary   3        foundids.similarkeywordid  146   using
The engine behaviour is this:
- the smaller the keyword (1-2 letters), the longer the query takes (obviously such keywords have a lot more associations)
- the more words in the searched table (foundids), the longer the query takes
Any ideas on how to improve the query response time?
thank you,