bigdata - 1 GB of data with keys and values: what kind of data structure should store them? What about 1 TB? 1 PB?
There is 1 GB of key-value data. What kind of data structure should store it? What if the data grows to 1 TB? To 1 PB? I need to access it every day. Also, how long should access take — real time? 1 minute? 1 hour?
I answered: use a hashtable.
For 1 GB and 1 TB, would it take a few seconds? I'm not sure how to calculate the real access time. When it comes to 1 PB, we can sort the data, divide it into several parts, and store each part in its own hashtable.
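For illustration, here is a minimal sketch of that partitioning idea: hash each key to pick a shard, and keep one hashtable per shard. The shard count, key format, and helper names are my own assumptions, not anything from the interview; at PB scale each shard would live on its own machine or disk file rather than in one process's memory.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count; in practice, sized so each shard fits its machine

def shard_for(key: str) -> int:
    """Map a key to a shard by hashing, so a lookup touches only one partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# Each shard is an independent hashtable.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key: str, value: str) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    return shards[shard_for(key)].get(key)

put("user:42", "alice")
print(get("user:42"))  # alice
```

Because the hash is deterministic, `get` always looks in the same shard that `put` wrote to, and no shard ever needs to be scanned for a key it doesn't own.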
The interviewer did not seem satisfied with my answer. It seems I'm totally wrong :(
IMHO, the choice of structure depends heavily on how much memory you have. Keeping everything in RAM is totally out of the question at 1 TB or 1 PB. When interviewers ask questions like these, they are trying to see how you reason through the problem rather than expecting a spot-on, exact solution (at least that's how I feel about it).
Coming to the actual question, I would use a distributed platform such as Hadoop, as Sreejith has said. Systems like Hadoop use multiple machines as a single logical system in order to leverage their combined power and gain better performance. This approach can reduce read/write time compared to a single machine, even one with a powerful RAM and processor. Along with that, Hadoop provides data structures such as SequenceFile that make it easy to store and process huge datasets.
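One part of "multiple systems as a single system" is deciding which node owns which key. A common technique for this is rendezvous (highest-random-weight) hashing; the sketch below is my own illustration of it, not Hadoop's actual placement logic, and the node names are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def owner(key: str, nodes=NODES) -> str:
    """Rendezvous hashing: score every node against the key and pick the
    highest score. Each key deterministically maps to one node, and removing
    a node only remaps the keys that node owned."""
    def score(node: str) -> int:
        h = hashlib.sha1(f"{node}:{key}".encode("utf-8")).digest()
        return int.from_bytes(h[:8], "big")
    return max(nodes, key=score)

print(owner("user:42"))  # always the same node for the same key
```

The useful property over plain `hash(key) % len(nodes)` is stability: when a node leaves, keys whose highest-scoring node survives keep their old owner, so most data does not have to move.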
But whatever method you choose, disk-based access (which cannot be avoided when dealing with data on the order of terabytes or petabytes) is slower than memory-based access. So you need to choose a data structure that minimizes disk accesses as much as possible. See the paper for detailed info on what I'm trying to say.
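To make "minimize disk accesses" concrete, here is a toy sketch of one classic approach (an SSTable-style layout, which is my choice of example, not something from the original answer): keep the data sorted on disk and hold only a sparse index of every Nth key in memory, so a lookup costs one seek plus a short scan instead of reading the whole file. File paths, key formats, and the sparsity constant are all hypothetical.

```python
import bisect
import os
import tempfile

INDEX_EVERY = 100  # hypothetical sparsity; tune to how much RAM the index may use

def build(pairs, path):
    """Write sorted key\tvalue lines to disk; return a sparse in-memory index
    of (every Nth key, its byte offset in the file)."""
    index_keys, index_offsets = [], []
    with open(path, "w") as f:
        for i, (k, v) in enumerate(sorted(pairs)):
            if i % INDEX_EVERY == 0:
                index_keys.append(k)
                index_offsets.append(f.tell())
            f.write(f"{k}\t{v}\n")
    return index_keys, index_offsets

def lookup(path, index_keys, index_offsets, key):
    """Seek to the closest indexed position at or before `key`, then scan
    forward; the sorted order lets us stop as soon as we pass the key."""
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None  # key sorts before the smallest stored key
    with open(path) as f:
        f.seek(index_offsets[i])
        for line in f:
            k, _, v = line.rstrip("\n").partition("\t")
            if k == key:
                return v
            if k > key:
                return None
    return None

pairs = [(f"key{i:04d}", str(i)) for i in range(500)]
path = os.path.join(tempfile.mkdtemp(), "kv.tsv")
idx_keys, idx_offsets = build(pairs, path)
print(lookup(path, idx_keys, idx_offsets, "key0123"))  # 123
```

With `INDEX_EVERY = 100`, each lookup reads at most about 100 lines from one position in the file, regardless of total file size; real systems layer block compression and bloom filters on top of the same idea.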
hth