Is there a mathematical model to describe the relationship between running time and input data size for Hadoop?
In a Hadoop cluster, is there a mathematical model that describes the curve relating transmission time to the input data size of the mappers?

For example, suppose the original data size is N, there are M mappers, and the total transmission time from the mappers to the reducers is T. If I double the data size to 2N in the mappers, is there an approximation or estimate of the new transmission time T'? (I think T' must be less than 2T.) My idea is to use a log curve to describe the relationship, but I am not sure that is correct.
I assume the input comes from HDFS, i.e. the input data has already been placed on HDFS, so we are not talking about the time to transmit the input data from a local file store to HDFS. I assume the input size N is the total size of all the input files combined, and that M is the number of map tasks (based on the number of input splits the input files are broken into). If we are talking about transmission between map tasks and reduce tasks, then what we need to know is the size of the output of the map operations. In general, the size of that output is unrelated to the size of the input N.
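To make that point concrete, here is a minimal back-of-the-envelope sketch. It assumes a job-specific "selectivity" factor `selectivity` (the ratio of map-output bytes to map-input bytes), which is an invented parameter you would have to measure for your own job from counters such as "Map output bytes"; it cannot be derived from N alone:

```python
def shuffle_bytes(input_bytes, selectivity):
    """Estimate the total map-output bytes sent to reducers.

    selectivity is a per-job, empirically measured ratio:
    a filtering job may emit far less than it reads (selectivity < 1),
    while a sort or join may emit roughly as much as it reads
    (selectivity close to 1), or even more.
    """
    return input_bytes * selectivity

# Two jobs reading the same 1 GB input can shuffle wildly
# different amounts of data:
gb = 1 << 30
filter_job = shuffle_bytes(gb, 0.01)  # heavy filtering
sort_job = shuffle_bytes(gb, 1.0)     # identity mapper
```

Under this (linear) assumption, doubling N would roughly double the shuffled bytes for the *same* job, but comparing across jobs the relationship to N breaks down entirely.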
Even if we knew how much total data needs to be transmitted between the map tasks and reduce tasks, asking for the transmission time is not very meaningful, because transmission can happen at the same time the map and reduce tasks are executing, and the series of individual transmissions between individual map tasks and reduce tasks each happen at different points in time. A goal of a well-written Hadoop application is to hide the transmission time by overlapping computation and communication.
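A toy model of that overlap, under the simplifying (and hypothetical) assumption that shuffle transfers are fully pipelined with computation: the phase then lasts as long as the slower of the two activities, so transmission only adds wall-clock time when it exceeds the computation it overlaps with:

```python
def phase_wall_time(compute_seconds, transfer_seconds):
    """Wall-clock time of a phase where data transfer is fully
    overlapped with computation: the phase is bounded by whichever
    activity is slower, not by their sum."""
    return max(compute_seconds, transfer_seconds)

def exposed_transfer_time(compute_seconds, transfer_seconds):
    """The portion of transfer time that is NOT hidden by
    computation, i.e. what actually shows up in the job's runtime."""
    return max(0, transfer_seconds - compute_seconds)
```

In this model, doubling the shuffled data can leave the runtime unchanged (if transfer was already hidden under computation) or increase it by anything up to the extra transfer time, which is another reason a single curve in N cannot predict T'.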