Optimizing a MySQL GROUP BY with a large linking table
I've been reading a lot on this, but I still get 30+ seconds per query, while I believe it should run a lot faster.

The problem is this:

I have a large linking table (40 million rows; the data takes 650 MB and the index 1.8 GB) defined as follows:
    CREATE TABLE IF NOT EXISTS `glossary_entry_wordlist_1` (
      `idterm` mediumint(8) unsigned NOT NULL,
      `idkeyword` mediumint(8) unsigned NOT NULL,
      `termlength` smallint(6) NOT NULL,
      `termnumberwords` tinyint(4) NOT NULL,
      `termtransliteralrfc` mediumint(9) NOT NULL,
      `keywordlength` tinyint(3) unsigned NOT NULL,
      `termlanguage` tinyint(4) NOT NULL,
      PRIMARY KEY (`idkeyword`,`idterm`),
      KEY `termtransliteralrfc` (`termtransliteralrfc`),
      KEY `termlength` (`termlength`),
      KEY `secondprimary` (`idterm`,`idkeyword`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
and a small temporary table defined as follows:
    CREATE TEMPORARY TABLE IF NOT EXISTS `foundids` (
      `searchid` int(11) NOT NULL,
      `searchedkeywordid` int(11) NOT NULL,
      `similarkeywordid` mediumint(8) unsigned NOT NULL,
      `partsmatched` tinyint(4) NOT NULL,
      `sumsimliarparts` int(11) NOT NULL,
      `keywordlength` int(11) NOT NULL,
      `fuzzymark` float NOT NULL,
      `keyworddjb2` bigint(20) NOT NULL,
      `smallkeyword` tinyint(4) NOT NULL,
      PRIMARY KEY (`similarkeywordid`),
      KEY `searchid` (`searchid`),
      KEY `searchedkeywordid` (`searchedkeywordid`),
      KEY `partsmatched` (`partsmatched`),
      KEY `keywordlength` (`keywordlength`),
      KEY `smallkeyword` (`smallkeyword`),
      KEY `keyworddjb2` (`keyworddjb2`)
    ) ENGINE=MEMORY DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I need to retrieve from glossary_entry_wordlist_1 every idterm that is associated with at least 50% (or some other percentage) of the idkeyword values in the foundids table.
Practically speaking, I need to find the sentences that contain X of the searched words.
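In pseudo-SQL, the selection criterion I'm after looks something like this (a sketch only, not the optimized query; `@searchid` and the 0.5 factor are placeholders):

```sql
-- Keep every idterm that matches at least 50% of the keywords
-- collected for this search in foundids.
SELECT w.idterm
FROM glossary_entry_wordlist_1 AS w
INNER JOIN foundids AS f
        ON f.similarkeywordid = w.idkeyword
WHERE f.searchid = @searchid
GROUP BY w.idterm
HAVING COUNT(w.idkeyword) >=
       0.5 * (SELECT COUNT(*) FROM foundids WHERE searchid = @searchid);
```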
For this I use the following query (please mind that the conditional values here are just an example):
    SELECT glossary_entry_wordlist_1.idterm,
           COUNT(glossary_entry_wordlist_1.idkeyword) AS termsmatched,
           SUM(foundids.keywordlength) AS sumtermlength,
           SUM(foundids.fuzzymark)/100 AS sumfuzzy,
           SUM(foundids.partsmatched*foundids.keywordlength)/100 AS sumlengthsimilar,
           foundids.searchid
    FROM foundids
    INNER JOIN glossary_entry_wordlist_1
            ON glossary_entry_wordlist_1.idkeyword = foundids.similarkeywordid
    WHERE foundids.searchid = '17559'
      AND glossary_entry_wordlist_1.termtransliteralrfc >= '824.4'
      AND glossary_entry_wordlist_1.termlength >= '8.55'
      AND glossary_entry_wordlist_1.termlength <= '18'
      AND foundids.smallkeyword = 0
    GROUP BY glossary_entry_wordlist_1.idterm
    HAVING COUNT(glossary_entry_wordlist_1.idkeyword) > '2'
    ORDER BY NULL
And the EXPLAIN output:
    id  select_type  table                      type  possible_keys                           key       key_len  ref                        rows  Extra
    1   simple       foundids                   ref   primary,searchid,smallkeyword           searchid  4        const                      8     using where; using temporary
    1   simple       glossary_entry_wordlist_1  ref   primary,termtransliteralrfc,termlength  primary   3        foundids.similarkeywordid  146   using
The engine behaviour is this:
- the smaller the keyword (1-2 letters), the longer the query takes (obviously such keywords have a lot more associations)
- the more words in the searched table (foundids), the longer the query takes
Any ideas on how to improve the query response time?
thank you,