Garbage collection - java.lang.OutOfMemoryError: GC overhead limit exceeded
I have a program that reads a large list of sequences from a file and runs a calculation on every pair in that list. It stores all of these calculations in a hash-based collection. About halfway through the run, I get the "GC overhead limit exceeded" error.
I realize this happens because the garbage collector is using 98% of the computation time and is unable to recover even 2% of the heap. Here is the code I have:
    ArrayList<String> c = loadSequences("file.txt"); // loads 60-char DNA sequences
    HashMap<DNAPair, Double> lsa = new HashMap<DNAPair, Double>();
    for (int i = 0; i < c.size(); i++) {
        for (int j = i + 1; j < c.size(); j++) {
            lsa.put(new DNAPair(c.get(i), c.get(j)),
                    localSeqAlignmentSimilarity(c.get(i), c.get(j)));
        }
    }
And here's the code for the actual method:
    // match(), gapPenalty and gapExtension are defined elsewhere (not shown).
    public static double localSeqAlignmentSimilarity(String s1, String s2) {
        // Pad both strings so sequence positions line up with 1-based matrix indices.
        s1 = " " + s1;
        s2 = " " + s2;
        int max = 0, h = 0, maxi = 0, maxj = 0;
        int[][] score = new int[61][61];
        int[][] pointers = new int[61][61]; // 3 = diagonal, 2 = up, 1 = left
        for (int i = 1; i < s1.length(); i++) {
            pointers[i][0] = 2;
        }
        for (int i = 1; i < s2.length(); i++) {
            pointers[0][i] = 1;
        }
        boolean inGap = false;
        // Fill the score matrix and remember the best-scoring cell.
        for (int i = 1; i < s1.length(); i++) {
            for (int j = 1; j < s2.length(); j++) {
                h = -99;
                if (score[i-1][j-1] + match(s1.charAt(i), s2.charAt(j)) > h) {
                    h = score[i-1][j-1] + match(s1.charAt(i), s2.charAt(j));
                    pointers[i][j] = 3;
                    inGap = false;
                }
                if (!inGap) {
                    if (score[i-1][j] + gapPenalty > h) {
                        h = score[i-1][j] + gapPenalty;
                        pointers[i][j] = 2;
                        inGap = true;
                    }
                    if (score[i][j-1] + gapPenalty > h) {
                        h = score[i][j-1] + gapPenalty;
                        pointers[i][j] = 1;
                        inGap = true;
                    }
                } else {
                    if (score[i-1][j] + gapExtension > h) {
                        h = score[i-1][j] + gapExtension;
                        pointers[i][j] = 2;
                        inGap = true;
                    }
                    if (score[i][j-1] + gapExtension > h) {
                        h = score[i][j-1] + gapExtension;
                        pointers[i][j] = 1;
                        inGap = true;
                    }
                }
                if (0 > h) h = 0;
                score[i][j] = h;
                if (h >= max) {
                    max = h;
                    maxi = i;
                    maxj = j;
                }
            }
        }
        // Trace back from the best cell, building both aligned strings in reverse.
        double matches = 0;
        String o1 = "", o2 = "";
        while (!(maxi == 0 && maxj == 0)) {
            if (pointers[maxi][maxj] == 3) {
                o1 += s1.charAt(maxi);
                o2 += s2.charAt(maxj);
                maxi--;
                maxj--;
            } else if (pointers[maxi][maxj] == 2) {
                o1 += s1.charAt(maxi);
                o2 += "_";
                maxi--;
            } else if (pointers[maxi][maxj] == 1) {
                o1 += "_";
                o2 += s2.charAt(maxj);
                maxj--;
            }
        }
        StringBuilder a = new StringBuilder(o1);
        StringBuilder b = new StringBuilder(o2);
        o1 = a.reverse().toString();
        o2 = b.reverse().toString();
        a.setLength(0);
        b.setLength(0);
        // Similarity is the fraction of aligned positions that match.
        for (int i = 0; i < Math.min(o1.length(), o2.length()); i++) {
            if (o1.charAt(i) == o2.charAt(i)) matches++;
        }
        return matches / Math.min(o1.length(), o2.length());
    }
I thought this was because the variables I declare inside the method (the two int arrays, the StringBuilders, etc.) were creating more and more objects every time the method ran, so I changed them to static fields and cleared them each time (e.g. Arrays.fill(score, 0);) instead of creating new objects.
However, this didn't help at all, and I still got the same error.
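For reference, here is a minimal sketch of what reusing a scratch matrix could look like. The class name `ScratchMatrix` and the `reset` helper are hypothetical and not part of the code above; the size 61 is taken from the method in the question. A two-dimensional int array has to be cleared row by row, because the outer array holds `int[]` rows rather than ints.

```java
import java.util.Arrays;

// Hypothetical holder for a reusable scratch matrix; names are illustrative only.
class ScratchMatrix {
    // Assumes 60-character sequences plus one leading pad, as in the question.
    static final int SIZE = 61;
    static final int[][] score = new int[SIZE][SIZE];

    // Zero every cell in place by filling each row separately.
    static void reset(int[][] matrix) {
        for (int[] row : matrix) {
            Arrays.fill(row, 0);
        }
    }

    public static void main(String[] args) {
        score[3][4] = 7;
        reset(score);
        System.out.println(score[3][4]); // prints 0
    }
}
```

Reusing buffers like this only reduces short-lived garbage. It does nothing about objects that stay reachable, such as every result kept in a collection.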
Could it be that the collection that stores all of the calculations is getting too big to be held by Java? I'm not getting an out-of-heap-space error, so it seems kind of strange.
I changed the command-line argument to give the JVM more space, but that didn't seem to help.
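One way to confirm that a heap setting actually reached the JVM is to print the maximum heap size at startup. This is only a sketch: the class name `HeapCheck` is made up for illustration, and it assumes the setting in question is the standard `-Xmx` option, which the post does not state.

```java
// Hypothetical helper that reports the maximum heap the JVM will try to use.
class HeapCheck {
    // Runtime.maxMemory() returns the limit in bytes; convert to MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx4g HeapCheck` should report a value close to 4096 MiB. If the number does not change with the flag, the option is probably not being passed to the JVM that runs the program.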
Any insight on this problem would be helpful. Thanks!
This is the problem, if c.size() is 73657 and the sequences are unique:
    HashMap<DNAPair, Double> lsa = new HashMap<DNAPair, Double>();
    for (int i = 0; i < c.size(); i++) {
        for (int j = i + 1; j < c.size(); j++) {
            lsa.put(...);
        }
    }
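Assuming these are unique sequences, you're adding an element to lsa for each pair. You mention that you have 70k sequences, so you are going to have 70k * 70k = ~5 billion pairs, each of which will take at minimum 4 bytes to store, meaning you'd need 20+ GB allocated at minimum for this to be feasible. (Strictly, because the inner loop starts at j = i + 1, only about half of those pairs, roughly 2.7 billion, are stored. Each stored double value takes 8 bytes on its own, though, so the estimate comes out about the same, and that is before any key or hash table overhead.)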
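The arithmetic can be checked with a short sketch. The class and method names here are hypothetical. The figure of 8 bytes per pair is a deliberately optimistic lower bound that counts only a bare double and ignores the key objects, boxing, and the hash table itself.

```java
// Hypothetical back-of-the-envelope estimate of how many results the loop stores.
class PairEstimate {
    // Number of unordered pairs (i < j) among n items: n * (n - 1) / 2.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Lower bound in bytes if each result were only a bare 8-byte double.
    static long minBytes(long n) {
        return pairCount(n) * Double.BYTES;
    }

    public static void main(String[] args) {
        long n = 73657; // sequence count mentioned in the answer
        System.out.println(pairCount(n) + " pairs");                  // 2712639996 pairs
        System.out.println(minBytes(n) / (1L << 30) + " GiB minimum"); // 20 GiB minimum
    }
}
```

Even this floor is far beyond a typical heap setting, which fits the observation that giving the JVM more memory did not make the error go away.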