python - Comparing two files and printing similar lines -

January 15, 2015

I've got two text files that both have index lines. I want to compare file1 and file2 and send the similar lines to a new text file. I've been googling for a while and have been trying grep in various forms, but I feel I'm getting in over my head. I'd like to see which 'mon-######' identifiers from file2 appear in file1, and print the lines from file1 that correspond.

(The files are larger; I cut them down for brevity's sake.)

For greater clarity:

file1 has entries of the form:

    mon-000101  100.27242   9.608597   11.082   10.034
    mon-000102  100.18012   9.520860   12.296   12.223

file2 has entries of the form:

    mon-000101
    mon-000171

So, if an identifier (mon-000101, for instance) from file2 is listed in file1, I want the entire line that begins with mon-000101 printed to a separate file. If an identifier isn't listed in file2, its line can be discarded.

So if the files were only as large as the ones above, the newly produced file would have a single entry:

mon-000101  100.27242   9.608597   11.082   10.034

because that's the only one common to both.

Since, from your earlier questions, you're at least a little familiar with pandas, how about:

    import pandas as pd

    df1 = pd.read_csv("file1.csv", sep=r"\s+")
    df2 = pd.read_csv("file2.csv", sep=r"\s+")
    # rename df2's column so both frames share the "name" key
    merged = df1.merge(df2.rename(columns={"mon-id": "name"}))
    merged.to_csv("merged.csv", index=False)

Some explanation (note that I've modified file2.csv so there are more elements in common) follows.

First, read in the data:

    >>> import pandas as pd
    >>> df1 = pd.read_csv("file1.csv", sep=r"\s+")
    >>> df2 = pd.read_csv("file2.csv", sep=r"\s+")
    >>> df1.head()
             name         ra       dec  mean_i1  mean_i2
    0  mon-000101  100.27242  9.608597   11.082   10.034
    1  mon-000102  100.18012  9.520860   12.296   12.223
    2  mon-000103  100.24811  9.586362    9.429    9.010
    3  mon-000104  100.26741  9.867225   11.811   11.797
    4  mon-000105  100.21005  9.814060   12.087   12.090
    >>> df2.head()
           mon-id
    0  mon-000101
    1  mon-000121
    2  mon-000131
    3  mon-000141
    4  mon-000151

Then, we can rename the column in df2 so that it matches the key column in df1:

    >>> df2.rename(columns={"mon-id": "name"}).head()
             name
    0  mon-000101
    1  mon-000121
    2  mon-000131
    3  mon-000141
    4  mon-000151

And after that, merge does the right thing, joining on the shared "name" column and keeping only the rows present in both frames:

    >>> merged = df1.merge(df2.rename(columns={"mon-id": "name"}))
    >>> merged
             name         ra       dec  mean_i1  mean_i2
    0  mon-000101  100.27242  9.608597   11.082   10.034
    1  mon-000121  100.45421  9.685027   11.805   11.777
    2  mon-000131  100.20533  9.397307 -100.000   11.764
    3  mon-000141  100.26134  9.388555 -100.000   12.571
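To try the merge without the real files, here is a minimal self-contained sketch. It assumes the header names shown in the head() output (name, ra, dec, mean_i1, mean_i2 and mon-id); the in-memory strings are stand-ins for file1.csv and file2.csv and hold only the rows quoted in the question. It also shows an isin-based filter as an alternative that skips the rename.

```python
import io

import pandas as pd

# In-memory stand-ins for file1.csv and file2.csv (assumed layout:
# whitespace-separated, with a header row).
file1 = io.StringIO(
    "name ra dec mean_i1 mean_i2\n"
    "mon-000101 100.27242 9.608597 11.082 10.034\n"
    "mon-000102 100.18012 9.520860 12.296 12.223\n"
)
file2 = io.StringIO("mon-id\nmon-000101\nmon-000171\n")

df1 = pd.read_csv(file1, sep=r"\s+")
df2 = pd.read_csv(file2, sep=r"\s+")

# Inner join on the shared "name" column: only identifiers present in
# both frames survive.
merged = df1.merge(
    df2.rename(columns={"mon-id": "name"}), on="name", how="inner"
)

# Alternative: keep the rows of df1 whose name appears in df2's column.
filtered = df1[df1["name"].isin(df2["mon-id"])]

print(merged)
```

One difference worth knowing: if file2 lists the same identifier twice, the merge repeats the matching row, while the isin filter keeps each row of file1 at most once.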

Finally, we can write it out, telling pandas not to add an index column:

    >>> merged.to_csv("output.csv", index=False)

producing a file that looks like

    name,ra,dec,mean_i1,mean_i2
    mon-000101,100.27242,9.608597,11.082,10.034
    mon-000121,100.45421,9.685027,11.805,11.777
    mon-000131,100.20533,9.397307,-100.0,11.764
    mon-000141,100.26134,9.388555,-100.0,12.571
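If you would rather not pull in pandas, the same filtering can be sketched with the standard library alone. This is a rough sketch, not a drop-in solution: the helper name matching_lines is made up for illustration, and it assumes the identifier is always the first whitespace-separated field on each line. Unlike the CSV above, it keeps the original lines from file1 untouched.

```python
def matching_lines(data_lines, id_lines):
    """Return the lines of data_lines whose first field appears in id_lines."""
    # Collect the wanted identifiers in a set for fast membership tests,
    # skipping blank lines.
    wanted = {line.split()[0] for line in id_lines if line.strip()}
    return [
        line
        for line in data_lines
        if line.strip() and line.split()[0] in wanted
    ]


# Sample data copied from the question; real code would pass open files.
file1_lines = [
    "mon-000101  100.27242   9.608597   11.082   10.034\n",
    "mon-000102  100.18012   9.520860   12.296   12.223\n",
]
file2_lines = ["mon-000101\n", "mon-000171\n"]

common = matching_lines(file1_lines, file2_lines)
print("".join(common), end="")
```

Because the helper only iterates over lines, open file objects can be passed in place of the lists, and the result can be written out with writelines.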
