mysql - Dealing with 5,000 attributes
I have a data set that contains 5,000+ attributes. The table looks like the one below:

id  attr1  attr2  attr3
0   1      0      1
0   0      0      0
0   0      0      1
I wish to represent each record on a single row, as in the example table below, to make the data more amenable to data mining via clustering.
id, attr1, attr2, attr3
1,  1,  1
I have tried a multitude of ways of doing this. I have tried importing the data into a MySQL database and getting the max value for each attribute (they can only be 1 or 0 for each id), but a table can't hold 5,000+ columns.

I have tried using the pivot function in Excel and getting the max value per attribute, but the number of columns a pivot can handle is far less than the 5,000+ I'm looking at.

I have tried importing the data into Tableau, but it suffers from the fact that it can't handle this many records.

I want table 2 in either a text/CSV file or a database table.

Can anyone suggest anything at all, a piece of software or an approach I have not yet considered?
Here is a Python script that does what you ask for:
def merge_rows_by_id(path):
    rows = dict()
    with open(path) as in_file:
        header = in_file.readline().rstrip()
        for line in in_file:
            fields = line.split()
            id, attributes = fields[0], fields[1:]
            if id not in rows:
                rows[id] = attributes
            else:
                rows[id] = [max(x) for x in zip(rows[id], attributes)]
    print(header)
    for id in rows:
        print('{},{}'.format(id, ','.join(rows[id])))

merge_rows_by_id('my-data.txt')
It is written for clarity more than for maximum efficiency, although it's still pretty efficient. However, this will still leave you with lines of 5,000 attributes, just fewer of them.
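If the data fits in memory, the same per-id maximum can also be sketched with pandas. This is only an illustration of the technique, not the script above; the inline sample data and variable names are made up for the example:

```python
import io
import pandas as pd

# Hypothetical sample in the same shape as the question's data:
# several rows per id, each attribute either 0 or 1.
data = io.StringIO(
    "id attr1 attr2 attr3\n"
    "1 1 0 0\n"
    "1 0 1 0\n"
    "2 0 0 1\n"
)

# Read the whitespace-delimited table, then keep one row per id with
# the max (i.e. logical OR) of each attribute column.
df = pd.read_csv(data, sep=r"\s+")
merged = df.groupby("id").max()
print(merged)
```

In practice you would replace the `io.StringIO` sample with the real file path and write the result out with `merged.to_csv(...)`.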
I've seen this data "structure" used in bioinformatics, where researchers put everything they know about sample "A" on one row, and the set of "everything" doubles, and re-doubles, and so on. I've usually had to teach them data normalization to make an RDBMS handle what they've got. Typically attr_1…n are one trial, attr_n+1…m a second trial, and so on, which allows a sensible normalization of the data.
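As a sketch of that kind of normalization: instead of one column per attribute, the wide table can be stored as one (id, attribute) pair per set bit, which removes any column-count limit. The sample data and names below are illustrative, not from the question:

```python
import csv
import io

# A toy wide table in the same shape as the question's data.
wide_text = "id attr1 attr2 attr3\n1 1 0 1\n2 0 1 0\n"

reader = csv.reader(io.StringIO(wide_text), delimiter=" ")
header = next(reader)

# Normalized "long" form: one (id, attribute) pair for each cell
# that is set to 1; the zeros are implied by absence.
long_rows = []
for row in reader:
    record_id, values = row[0], row[1:]
    for attr_name, value in zip(header[1:], values):
        if value == "1":
            long_rows.append((record_id, attr_name))

print(long_rows)
```

A two-column table like this loads into any RDBMS regardless of how many distinct attributes there are, and the wide form can be recovered with a pivot when needed.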