java - Understanding the Hadoop File System Counters
I want to understand the file system counters in Hadoop.
Below are the counters from a job I ran.
In every job run, I observe that the map FILE_BYTES_READ is almost equal to HDFS_BYTES_READ, and that FILE_BYTES_WRITTEN for the map is roughly the sum of FILE_BYTES_READ and HDFS_BYTES_READ for the mapper. Please help! Is the same data being read from both the local file system and HDFS, and is all of it being written to the local file system during the map phase?
Map
    FILE_BYTES_READ     5,062,341,139
    HDFS_BYTES_READ     4,405,881,342
    FILE_BYTES_WRITTEN  9,309,466,964
    HDFS_BYTES_WRITTEN  0
Thanks!
So the answer is that what you are noticing is job specific. Depending on the job, the mappers/reducers will write more or fewer bytes to the local file system compared to HDFS.
In your mapper's case, you have a similar amount of data read from the local and HDFS locations, and there is no problem there. Your mapper code simply happens to need to read about the same amount of data locally as it reads from HDFS. Most of the time mappers are used to analyze an amount of data greater than the node's RAM, so it is not surprising to see the data pulled from HDFS being written out to the local drive. The number of bytes read from HDFS and locally is not always going to sum to the local write size, and it doesn't quite in your case either (5,062,341,139 + 4,405,881,342 = 9,468,222,481 versus 9,309,466,964 bytes written).
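If you want to pull those counters out of a finished job programmatically instead of reading them off the web UI, the sketch below uses the Hadoop 2.x mapreduce client API. Treat the counter group name ("org.apache.hadoop.mapreduce.FileSystemCounter") as an assumption, since group and counter names have moved around between Hadoop versions, and the job ID taken from args[0] is just a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;

    public class FsCounterDump {
        public static void main(String[] args) throws Exception {
            // args[0] is the ID of a finished job, e.g. "job_201401010000_0001" (placeholder)
            Cluster cluster = new Cluster(new Configuration());
            Job job = cluster.getJob(JobID.forName(args[0]));
            Counters counters = job.getCounters();

            // Assumed Hadoop 2.x group name for the file system counters
            String group = "org.apache.hadoop.mapreduce.FileSystemCounter";
            long fileRead  = counters.findCounter(group, "FILE_BYTES_READ").getValue();
            long hdfsRead  = counters.findCounter(group, "HDFS_BYTES_READ").getValue();
            long fileWrite = counters.findCounter(group, "FILE_BYTES_WRITTEN").getValue();
            long hdfsWrite = counters.findCounter(group, "HDFS_BYTES_WRITTEN").getValue();

            System.out.printf("FILE_BYTES_READ    = %,d%n", fileRead);
            System.out.printf("HDFS_BYTES_READ    = %,d%n", hdfsRead);
            System.out.printf("FILE_BYTES_WRITTEN = %,d%n", fileWrite);
            System.out.printf("HDFS_BYTES_WRITTEN = %,d%n", hdfsWrite);
            // Compare the local write size against the sum of the two read counters,
            // as in the observation above -- they are usually close but not equal.
            System.out.printf("FILE + HDFS read   = %,d%n", fileRead + hdfsRead);
        }
    }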
Here is an example using TeraSort, with 100G of data and 1 billion key/value pairs.
File System Counters
    FILE: Number of bytes read=219712810984
    FILE: Number of bytes written=312072614456
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=100000061008
    HDFS: Number of bytes written=100000000000
    HDFS: Number of read operations=2976
    HDFS: Number of large read operations=0
Things to notice: the number of bytes read from and written to HDFS is almost exactly 100G. That is because the 100G needed to be sorted, and the final sorted files need to be written back out. Also notice that it needs to do a lot of local reads/writes to hold and sort the data: roughly 2x and 3x the amount of data it read (about 220G read and 312G written locally against the 100G input)!
As a final note, unless you want to run a job without caring about the result, the total amount of HDFS bytes written should never be 0, and yours shows HDFS_BYTES_WRITTEN = 0.
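If you want your driver to catch that situation automatically, it can check the counter as soon as the job finishes. A minimal sketch, again assuming the Hadoop 2.x group name used above; verify() is a hypothetical helper you would call after waitForCompletion():

    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.Job;

    public class RequireHdfsOutput {
        // Fails the run if the completed job wrote nothing back to HDFS.
        public static void verify(Job job) throws Exception {
            Counter written = job.getCounters().findCounter(
                    "org.apache.hadoop.mapreduce.FileSystemCounter", // assumed 2.x group name
                    "HDFS_BYTES_WRITTEN");
            if (written.getValue() == 0) {
                throw new IllegalStateException("Job " + job.getJobID()
                        + " wrote 0 bytes to HDFS; its output went nowhere useful.");
            }
        }
    }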