Distributed TensorFlow: CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
I am trying the example provided here: https://github.com/ischlag/distributed-tensorflow-example. I have 2 machines: one working as the server, the other as a worker. (The version on both machines is 1.0.1.)
I am getting the following error:
variables initialized ...
tensorflow/core/distributed_runtime/master.cc:193] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
tensorflow/core/distributed_runtime/master.cc:193] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
tensorflow/core/distributed_runtime/master.cc:193] CreateSession still waiting for response from worker: /job:worker/replica:0/task:2
I had a similar issue and was able to fix it by adding a third node, a master, to the ClusterSpec. My TF_CONFIG environment variable looks like:
TF_CONFIG = {
    'cluster': {
        'master': ['master_node01:2222'],
        'ps': ['ps_node01:2222', ...],
        'worker': ['worker_node01:2222', ...]
    },
    'environment': 'cloud',
    'task': {'type': current_task, 'index': current_index}
}
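In case it helps, here is a rough sketch of how that value could be built and exported from Python before the training script starts. It assumes TF_CONFIG is read as a JSON string, which is how I understand it to be consumed. The helper name `build_tf_config` is made up for illustration, the host names are just the placeholders from above with one host per job, and the `worker` / `0` task values are example choices, not something from my actual setup.

```python
import json
import os


def build_tf_config(task_type, task_index, masters, ps_hosts, worker_hosts):
    """Hypothetical helper: build a TF_CONFIG-style dict that lists a
    'master' job next to the 'ps' and 'worker' jobs."""
    return {
        "cluster": {
            "master": list(masters),
            "ps": list(ps_hosts),
            "worker": list(worker_hosts),
        },
        "environment": "cloud",
        "task": {"type": task_type, "index": task_index},
    }


# Placeholder hosts; replace with the real host:port of every task.
config = build_tf_config(
    task_type="worker",   # example value for current_task
    task_index=0,         # example value for current_index
    masters=["master_node01:2222"],
    ps_hosts=["ps_node01:2222"],
    worker_hosts=["worker_node01:2222"],
)

# Export as a JSON string so the process started afterwards can read it.
os.environ["TF_CONFIG"] = json.dumps(config)
```

Each machine would run this with its own task type and index, while the cluster section stays identical on all of them.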