python - Rearranging Data in CSV by Certain Data in a Column
I have a CSV file with 30,000 lines of data and 24 columns. The last column is a geographic column, and it looks like this:
ethiopia il il tx tx md ny ny ethiopia ethiopia sweden ca ca hi latvia oh
Right now I want the entire CSV rows that correspond to geographic locations in the United States, which are given as 2-character state abbreviations (ca, hi, oh, etc.).
Basically, I want to take the data in the CSV and remove everything that is not USA related or, better if possible, arrange it so that the first X lines are the USA-based locations and everything else is at the end of the CSV.
Here is my code so far:
    import csv

    ask = "y"
    while ask != "n":
        inputfile = input("please enter filename: ")
        filename = open(inputfile, "r")
        data = []
        with filename as f:
            reader = csv.reader(f, delimiter=',')
            for row in reader:
                if len(row[24]) == 3:
                    data = row[24]
                    datalist = row[0:23].join(data)
        output = open("newly created data.csv","w")
        output.write(datalist)
        print ("done.")
        output.close()
        ask = input("another file, y or n? ")
It arranges the data in column 24 correctly, reading the USA locations, but I don't know how to sort the rest of the file so that the other 23 columns match the USA locations.
I'm using Python 3, thanks.
For a purely standard library solution, maybe something like
    import csv

    # Read every row into memory
    with open('location.csv', newline='') as fp_in:
        reader = csv.reader(fp_in, delimiter=',')
        data = list(reader)

    # 2-character locations first, then alphabetical by location
    data.sort(key=lambda x: (len(x[-1].strip()) != 2, x[-1].strip()))

    with open("locout.csv", "w", newline='') as fp_out:
        writer = csv.writer(fp_out, delimiter=',')
        writer.writerows(data)
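The sort key function, lambda x: (len(x[-1].strip()) != 2, x[-1].strip()), means it'll sort the data first by whether or not the last column has 2 characters, putting the 2-character locations first, and secondly by name (effectively alphabetizing them, at least if they all start with a capital letter). The first element works because False sorts before True, so rows whose location is exactly 2 characters long come out ahead of the rest.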
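To see what that key does, here is a small sketch on a handful of made-up rows (the sample data and the location_key name are just for illustration, they are not from the question's file):

```python
# Hypothetical sample rows; the last field plays the role of the location column.
rows = [
    ["a", "Ethiopia"],
    ["b", "TX"],
    ["c", "Sweden"],
    ["d", "CA"],
    ["e", " IL "],  # stray whitespace is removed by strip()
]

def location_key(row):
    """Same key as the lambda above, written out as a named function."""
    loc = row[-1].strip()
    # False sorts before True, so 2-character locations come first;
    # ties are broken alphabetically by the location itself.
    return (len(loc) != 2, loc)

ordered = sorted(rows, key=location_key)
for row in ordered:
    print(row)
```

The 2-character rows (CA, IL, TX) end up at the top in alphabetical order, followed by Ethiopia and Sweden. Note that the key only checks the length, so any other 2-character value in that column would be grouped with the states too.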
I'm assuming the file isn't too large: 30000 lines isn't many, even with 24 columns, so we might as well work entirely in memory.
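If you only want the other option from the question (dropping the non-USA rows rather than moving them to the end), nothing needs to be held in memory, since each row can be checked and written as it is read. A rough sketch, assuming the same "last column is exactly 2 characters" test; the function name keep_two_letter_rows and the sample text are made up for the demo:

```python
import csv
import io

def keep_two_letter_rows(fp_in, fp_out):
    """Copy only the rows whose last field is exactly 2 characters long.

    Returns the number of rows written.
    """
    reader = csv.reader(fp_in, delimiter=',')
    writer = csv.writer(fp_out, delimiter=',')
    kept = 0
    for row in reader:
        # Skip blank lines (empty rows) and anything that is not 2 characters
        if row and len(row[-1].strip()) == 2:
            writer.writerow(row)
            kept += 1
    return kept

# In-memory demo; with real files you would pass objects opened with newline=''
sample = "1,foo,Ethiopia\n2,bar,TX\n3,baz,Sweden\n4,qux,CA\n"
out = io.StringIO(newline="")
kept = keep_two_letter_rows(io.StringIO(sample), out)
print(kept)
print(out.getvalue())
```

If the file has a header row (the question doesn't say), it would be filtered out by this check unless you write it through separately first.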
(Aside: if you're doing a lot of CSV manipulations, you might be interested in the pandas library -- it makes a lot of operations simpler than they'd otherwise be.)