Optimizing MySQL GROUP BY with a large linking table


I've been reading a lot on this, but I'm still at 30+ seconds per query, while I'm convinced it should work a lot faster.

The problem is this:

I have a large linking table (40 million rows, 650 MB of data and 1.8 GB of index) defined as follows:

CREATE TABLE IF NOT EXISTS `glossary_entry_wordlist_1` (
  `idterm` MEDIUMINT(8) UNSIGNED NOT NULL,
  `idkeyword` MEDIUMINT(8) UNSIGNED NOT NULL,
  `termlength` SMALLINT(6) NOT NULL,
  `termnumberwords` TINYINT(4) NOT NULL,
  `termtransliteralrfc` MEDIUMINT(9) NOT NULL,
  `keywordlength` TINYINT(3) UNSIGNED NOT NULL,
  `termlanguage` TINYINT(4) NOT NULL,
  PRIMARY KEY (`idkeyword`,`idterm`),
  KEY `termtransliteralrfc` (`termtransliteralrfc`),
  KEY `termlength` (`termlength`),
  KEY `secondprimary` (`idterm`,`idkeyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
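(The size figures quoted above are of the kind reported by SHOW TABLE STATUS; a minimal check, assuming the table lives in the current schema:

SHOW TABLE STATUS LIKE 'glossary_entry_wordlist_1';
-- Data_length  should be around 650 MB (the MyISAM .MYD file)
-- Index_length should be around 1.8 GB (the .MYI file)

)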

and a small temporary table defined as follows:

CREATE TEMPORARY TABLE IF NOT EXISTS `foundids` (
  `searchid` INT(11) NOT NULL,
  `searchedkeywordid` INT(11) NOT NULL,
  `similarkeywordid` MEDIUMINT(8) UNSIGNED NOT NULL,
  `partsmatched` TINYINT(4) NOT NULL,
  `sumsimliarparts` INT(11) NOT NULL,
  `keywordlength` INT(11) NOT NULL,
  `fuzzymark` FLOAT NOT NULL,
  `keyworddjb2` BIGINT(20) NOT NULL,
  `smallkeyword` TINYINT(4) NOT NULL,
  PRIMARY KEY (`similarkeywordid`),
  KEY `searchid` (`searchid`),
  KEY `searchedkeywordid` (`searchedkeywordid`),
  KEY `partsmatched` (`partsmatched`),
  KEY `keywordlength` (`keywordlength`),
  KEY `smallkeyword` (`smallkeyword`),
  KEY `keyworddjb2` (`keyworddjb2`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

I need to retrieve from glossary_entry_wordlist_1 the idterm values associated with at least 50% (or some other percentage) of the idkeyword values in the table foundids.

Practically speaking, I need to find sentences that contain X of the searched words.
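To make the 50% requirement concrete, here is a minimal sketch of how such a threshold could be expressed; the user variable and the 0.5 factor are my assumptions, not part of the actual query (a variable is used because a MySQL temporary table cannot be opened twice in the same statement):

-- capture the total number of keywords for this search once
SET @totalkeywords := (SELECT COUNT(*) FROM foundids WHERE searchid = 17559);

-- keep only terms matching at least half of those keywords
SELECT w.idterm
FROM foundids f
INNER JOIN glossary_entry_wordlist_1 w ON w.idkeyword = f.similarkeywordid
WHERE f.searchid = 17559
GROUP BY w.idterm
HAVING COUNT(w.idkeyword) >= 0.5 * @totalkeywords;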

For this I use the following query (please mind that the condition values here are just an example):

SELECT glossary_entry_wordlist_1.idterm,
  COUNT(glossary_entry_wordlist_1.idkeyword) AS termsmatched,
  SUM(foundids.keywordlength) AS sumtermlength,
  SUM(foundids.fuzzymark)/100 AS sumfuzzy,
  SUM(foundids.partsmatched*foundids.keywordlength)/100 AS sumlengthsimilar,
  foundids.searchid
FROM foundids
INNER JOIN glossary_entry_wordlist_1
  ON glossary_entry_wordlist_1.idkeyword = foundids.similarkeywordid
WHERE foundids.searchid = '17559'
  AND glossary_entry_wordlist_1.termtransliteralrfc >= '824.4'
  AND glossary_entry_wordlist_1.termlength >= '8.55'
  AND glossary_entry_wordlist_1.termlength <= '18'
  AND foundids.smallkeyword = 0
GROUP BY glossary_entry_wordlist_1.idterm
HAVING COUNT(glossary_entry_wordlist_1.idkeyword) > '2'
ORDER BY NULL;

And the EXPLAIN output:

id  select_type  table                      type  possible_keys                           key       key_len  ref                        rows  Extra
1   SIMPLE       foundids                   ref   PRIMARY,searchid,smallkeyword           searchid  4        const                      8     Using where; Using temporary
1   SIMPLE       glossary_entry_wordlist_1  ref   PRIMARY,termtransliteralrfc,termlength  PRIMARY   3        foundids.similarkeywordid  146   Using …

The engine behavior is this:

- the smaller the word length (1-2 letters), the longer the query response (obviously there are a lot more associations; see the diagnostic sketch below)
- the more words there are in the searched table (foundids), the longer the query
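As a hypothetical diagnostic (not part of the original setup), the skew behind the first observation could be measured like this:

-- count how many linking rows each keyword length pulls into the join
SELECT f.keywordlength, COUNT(*) AS joinedrows
FROM foundids f
INNER JOIN glossary_entry_wordlist_1 w ON w.idkeyword = f.similarkeywordid
WHERE f.searchid = 17559
GROUP BY f.keywordlength
ORDER BY f.keywordlength;
-- short keywords should show disproportionately large joinedrows counts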

Any ideas on how to improve the query response?

Thank you.

