python - Comparing two files and printing similar lines
I've got two text files that both have index lines. I want to compare file1 and file2 and send the similar lines to a new text file. I've been googling for a while and have been trying grep in various forms, but I feel I'm getting in over my head. I'd like to see which 'mon-######' identifiers from file2 appear in file1, and print the lines from file1 that correspond.
(The files are larger; I cut them down for brevity's sake.)
For greater clarity:
file1 has entries of the form:
    mon-000101 100.27242 9.608597 11.082 10.034
    mon-000102 100.18012 9.520860 12.296 12.223
file2 has entries of the form:
    mon-000101
    mon-000171
So, if an identifier (mon-000101, for instance) from file2 is listed in file1, I want the entire line that begins with mon-000101 printed to a separate file. If it isn't listed in file2, it can be discarded.
So if the files were only as large as the ones above, the newly produced file would have the single entry
    mon-000101 100.27242 9.608597 11.082 10.034
because that's the only one common to both.
Since, going by your earlier questions, you're at least a little familiar with pandas, how about:
    import pandas as pd

    df1 = pd.read_csv("file1.csv", sep=r"\s+")
    df2 = pd.read_csv("file2.csv", sep=r"\s+")
    # rename(columns=...) relabels the column; current pandas rejects a dict passed to rename_axis
    merged = df1.merge(df2.rename(columns={"mon-id": "name"}))
    merged.to_csv("merged.csv", index=False)
Some explanation (note that I've modified file2.csv so that there are more elements in common) follows.
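One caveat: the read_csv calls above treat the first line of each file as a header row. The excerpt in the question shows no header line, so here is a minimal sketch of reading that layout, assuming the files really are header-less. The column names are borrowed from the answer's own output, and the inline strings are only stand-ins for the real files.

```python
import io

import pandas as pd

# Inline stand-ins for the two files, in the header-less layout from the question.
file1_text = """\
mon-000101 100.27242 9.608597 11.082 10.034
mon-000102 100.18012 9.520860 12.296 12.223
"""
file2_text = """\
mon-000101
mon-000171
"""

# Assumed column names; with header=None pandas will not consume the first row as a header.
columns = ["name", "ra", "dec", "mean_i1", "mean_i2"]

df1 = pd.read_csv(io.StringIO(file1_text), sep=r"\s+", header=None, names=columns)
df2 = pd.read_csv(io.StringIO(file2_text), sep=r"\s+", header=None, names=["name"])

print(df1)
print(df2)
```

Because both frames already share the column name "name" here, the rename step is unnecessary in this variant.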
First, read in the data:
    >>> import pandas as pd
    >>> df1 = pd.read_csv("file1.csv", sep=r"\s+")
    >>> df2 = pd.read_csv("file2.csv", sep=r"\s+")
    >>> df1.head()
             name         ra       dec  mean_i1  mean_i2
    0  mon-000101  100.27242  9.608597   11.082   10.034
    1  mon-000102  100.18012  9.520860   12.296   12.223
    2  mon-000103  100.24811  9.586362    9.429    9.010
    3  mon-000104  100.26741  9.867225   11.811   11.797
    4  mon-000105  100.21005  9.814060   12.087   12.090
    >>> df2.head()
           mon-id
    0  mon-000101
    1  mon-000121
    2  mon-000131
    3  mon-000141
    4  mon-000151
Then we can rename the column in df2:
    >>> df2.rename(columns={"mon-id": "name"}).head()
             name
    0  mon-000101
    1  mon-000121
    2  mon-000131
    3  mon-000141
    4  mon-000151
and after that, merge will do the right thing:
    >>> merged = df1.merge(df2.rename(columns={"mon-id": "name"}))
    >>> merged
             name         ra       dec   mean_i1  mean_i2
    0  mon-000101  100.27242  9.608597    11.082   10.034
    1  mon-000121  100.45421  9.685027    11.805   11.777
    2  mon-000131  100.20533  9.397307  -100.000   11.764
    3  mon-000141  100.26134  9.388555  -100.000   12.571
Finally, we can write it out, telling it not to add an index column:
    >>> merged.to_csv("output.csv", index=False)
producing a file which looks like
    name,ra,dec,mean_i1,mean_i2
    mon-000101,100.27242,9.608597,11.082,10.034
    mon-000121,100.45421,9.685027,11.805,11.777
    mon-000131,100.20533,9.397307,-100.0,11.764
    mon-000141,100.26134,9.388555,-100.0,12.571
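The merge step works because DataFrame.merge, called with no extra arguments, performs an inner join on the columns the two frames have in common, so only identifiers present in both survive. The sketch below spells those defaults out on a few rows taken from the tables above (built inline rather than read from disk, and trimmed to three columns purely for illustration). It also shows a membership filter with isin as one possible alternative, which is not part of the original answer.

```python
import pandas as pd

# A few rows copied from the example output, trimmed to three columns.
df1 = pd.DataFrame({
    "name": ["mon-000101", "mon-000102", "mon-000121"],
    "ra": [100.27242, 100.18012, 100.45421],
    "dec": [9.608597, 9.520860, 9.685027],
})
df2 = pd.DataFrame({"mon-id": ["mon-000101", "mon-000121", "mon-000171"]})

# The same merge as above with its defaults written explicitly:
# an inner join on the shared "name" column.
ids = df2.rename(columns={"mon-id": "name"})
merged = df1.merge(ids, on="name", how="inner")

# Alternative: keep the rows of df1 whose identifier occurs in df2.
# Unlike a merge, this cannot duplicate rows if df2 repeats an identifier.
filtered = df1[df1["name"].isin(df2["mon-id"])]

print(merged)
print(filtered)
```

Both approaches keep mon-000101 and mon-000121 and drop mon-000102, which df2 does not list. The filtered frame keeps df1's original index labels, whereas merged gets a fresh 0-based index.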