Parallelize function on dictionary in IPython
Up till now, I have parallelized functions by mapping them onto lists that are distributed out to the various clusters, using the function map_sync(function, list).

Now, I need to run a function on each entry of a dictionary.

map_sync does not seem to work on dictionaries. I have also tried to scatter the dictionary and use decorators to run the function in parallel. However, dictionaries don't seem to lend themselves to scattering either. Is there another way to parallelize functions on dictionaries without having to convert them to lists?
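For reference, the list-based workaround I would like to avoid looks roughly like this. It is a sketch only: the builtin map stands in for dview.map_sync (both take a function and a sequence), and the process function is a made-up example of per-entry work.

```python
# Sketch of the list workaround: turn the dict into a list of
# (key, value) pairs, map over that list, then rebuild a dict.
# `process` is a hypothetical example of per-entry work; the builtin
# `map` stands in for dview.map_sync here.
test_dict = {'43': "lion", '34': "tiger", '343': "duck"}

def process(item):
    key, value = item
    return key, value.upper()  # example transformation of one entry

results = dict(map(process, test_dict.items()))
# results == {'43': 'LION', '34': 'TIGER', '343': 'DUCK'}
```

This works, but it means converting every dictionary to a list of items and back, which is exactly the step the question is trying to skip.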
These are my attempts so far:
```python
from IPython.parallel import Client
rc = Client()
dview = rc[:]
test_dict = {'43': "lion", '34': "tiger", '343': "duck"}
dview.scatter("test", test_dict)
dview["test"]
# yields [['343'], ['43'], ['34'], []] on 4 clusters
# suggests the dictionary can't be scattered?
```
Needless to say, when I run the function itself, I get an error:
```python
@dview.parallel(block=True)
def run():
    for d, v in test.iteritems():
        print d, v

run()
```
```
AttributeError                            Traceback (most recent call last)
 in ()
 in run(dict)
AttributeError: 'str' object has no attribute 'iteritems'
```
I don't know if it's relevant, but I'm using the IPython notebook connected to Amazon AWS clusters.
You can scatter a dict with:
```python
def scatter_dict(view, name, d):
    """partition a dictionary across the engines of a view"""
    ntargets = len(view)
    keys = d.keys()  # list(d.keys()) in Python 3
    for i, target in enumerate(view.targets):
        subd = {}
        for key in keys[i::ntargets]:
            subd[key] = d[key]
        view.client[target][name] = subd

scatter_dict(dview, 'test', test_dict)
```
and operate on it remotely, as you normally would.
You can also gather the remote dicts back into one local dict again with:
```python
def gather_dict(view, name):
    """gather dictionaries from a DirectView"""
    merged = {}
    for d in view.pull(name):
        merged.update(d)
    return merged

gather_dict(dview, 'test')
```
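The partition-and-merge logic behind these two helpers can be checked without a cluster. This is a local sketch (engines replaced by a list of sub-dicts, ntargets simulated) showing that the round-robin split scatter_dict uses and the update-based merge gather_dict uses are inverses of each other:

```python
def partition_dict(d, ntargets):
    """Round-robin partition of a dict into ntargets sub-dicts,
    mirroring what scatter_dict pushes to each engine."""
    keys = list(d.keys())
    return [{k: d[k] for k in keys[i::ntargets]} for i in range(ntargets)]

def merge_dicts(dicts):
    """Merge the sub-dicts back into one, mirroring gather_dict."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged

test_dict = {'43': "lion", '34': "tiger", '343': "duck"}
parts = partition_dict(test_dict, 4)  # one sub-dict per simulated engine
assert merge_dicts(parts) == test_dict  # the round trip preserves the dict
```

With 4 targets and 3 keys, one sub-dict comes out empty, which matches the `[['343'], ['43'], ['34'], []]` seen when scattering the keys in the question.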