Split large file according to value in single column (AWK)


I want to split a large file (10^6 rows) according to the value in the 6th column (about 10*10^3 unique values). However, I can't get it working because of the number of records. It should be easy, but it's taking hours and I'm not getting any further.

I've tried two options:

Option 1

awk '{print > $6".txt"}' input.file

awk: cannot open "parent=mrna:solyc06g051570.2.1.txt" for output (too many open files)

Option 2

awk '{print > $6; close($6)}' input.file  

This doesn't cause an error, but the files it creates contain only the last line corresponding to each 'grouping' value in $6.
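The behaviour can be reproduced on a tiny sample. The sketch below uses three made-up rows and a hypothetical scratch directory `demo_dir`; neither comes from the real data. It only illustrates that reopening a file with `>` after `close()` truncates it.

```shell
# Made-up three-row sample; column 6 is the grouping key (two rows share parent=A).
demo_dir=$(mktemp -d)
printf '%s\n' \
    'exon 1 2 + id=a parent=A' \
    'exon 3 4 + id=b parent=A' \
    'exon 5 6 + id=c parent=B' > "$demo_dir/sample.txt"

# Same pattern as option 2, run inside the scratch directory.
# After close($6), the next "print > $6" reopens the file and truncates it.
(cd "$demo_dir" && awk '{print > $6; close($6)}' sample.txt)

# parent=A now holds only the second of its two rows.
cat "$demo_dir/parent=A"
```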

This is the beginning of the file; however, this sample on its own doesn't cause the error because it's small:

exon    3688    4407    +   id=exon:solyc06g005000.2.1.1    parent=mrna:solyc06g005000.2.1
exon    4853    5604    +   id=exon:solyc06g005000.2.1.2    parent=mrna:solyc06g005000.2.1
exon    7663    7998    +   id=exon:solyc06g005000.2.1.3    parent=mrna:solyc06g005000.2.1
exon    9148    9408    +   id=exon:solyc06g005010.1.1.1    parent=mrna:solyc06g005010.1.1
exon    13310   13330   +   id=exon:solyc06g005020.1.1.1    parent=mrna:solyc06g005020.1.1
exon    13449   13532   +   id=exon:solyc06g005020.1.1.2    parent=mrna:solyc06g005020.1.1
exon    13711   13783   +   id=exon:solyc06g005020.1.1.3    parent=mrna:solyc06g005020.1.1
exon    14172   14236   +   id=exon:solyc06g005020.1.1.4    parent=mrna:solyc06g005020.1.1
exon    14717   14803   +   id=exon:solyc06g005020.1.1.5    parent=mrna:solyc06g005020.1.1
exon    14915   15016   +   id=exon:solyc06g005020.1.1.6    parent=mrna:solyc06g005020.1.1
exon    22106   22261   +   id=exon:solyc06g005030.1.1.1    parent=mrna:solyc06g005030.1.1
exon    23462   23749   -   id=exon:solyc06g005040.1.1.1    parent=mrna:solyc06g005040.1.1
exon    24702   24713   -   id=exon:solyc06g005050.2.1.3    parent=mrna:solyc06g005050.2.1
exon    24898   25402   -   id=exon:solyc06g005050.2.1.2    parent=mrna:solyc06g005050.2.1
exon    25728   25845   -   id=exon:solyc06g005050.2.1.1    parent=mrna:solyc06g005050.2.1
exon    36352   36835   +   id=exon:solyc06g005060.2.1.1    parent=mrna:solyc06g005060.2.1
exon    36916   38132   +   id=exon:solyc06g005060.2.1.2    parent=mrna:solyc06g005060.2.1
exon    57089   57096   +   id=exon:solyc06g005070.1.1.1    parent=mrna:solyc06g005070.1.1
exon    57329   58268   +   id=exon:solyc06g005070.1.1.2    parent=mrna:solyc06g005070.1.1
exon    59970   60505   -   id=exon:solyc06g005080.2.1.24   parent=mrna:solyc06g005080.2.1
exon    60667   60783   -   id=exon:solyc06g005080.2.1.23   parent=mrna:solyc06g005080.2.1
exon    63719   63880   -   id=exon:solyc06g005080.2.1.22   parent=mrna:solyc06g005080.2.1
exon    64143   64298   -   id=exon:solyc06g005080.2.1.21   parent=mrna:solyc06g005080.2.1
exon    66964   67191   -   id=exon:solyc06g005080.2.1.20   parent=mrna:solyc06g005080.2.1
exon    71371   71559   -   id=exon:solyc06g005080.2.1.19   parent=mrna:solyc06g005080.2.1
exon    73612   73717   -   id=exon:solyc06g005080.2.1.18   parent=mrna:solyc06g005080.2.1
exon    76764   76894   -   id=exon:solyc06g005080.2.1.17   parent=mrna:solyc06g005080.2.1
exon    77189   77251   -   id=exon:solyc06g005080.2.1.16   parent=mrna:solyc06g005080.2.1
exon    80044   80122   -   id=exon:solyc06g005080.2.1.15   parent=mrna:solyc06g005080.2.1
exon    80496   80638   -   id=exon:solyc06g005080.2.1.14   parent=mrna:solyc06g005080.2.1

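In option 2, use ">>" instead of ">" so that each write appends. After close($6), the next print > $6 reopens the file and truncates it, which is why only the last line for each value survived.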

awk '{print >> $6; close($6)}' input.file  
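Two caveats with the append approach are worth keeping in mind: ">>" also appends to files left over from an earlier run, and opening and closing a file for every one of the 10^6 rows can be slow. One possible alternative is to sort on column 6 first, so that each output file is opened only once. The sketch below assumes whitespace-separated input, a `sort` that supports the stable `-s` flag (GNU and BSD sort do), and column-6 values that are usable as file names. The function name `split_by_col6` and the `.txt` suffix are illustrative choices, not part of the original answer.

```shell
# Sketch: bring equal column-6 values together, then keep one output file open at a time.
split_by_col6() {
    # LC_ALL=C gives byte-wise comparison; -s keeps the original row order within a group;
    # -b ignores leading blanks when locating the key.
    LC_ALL=C sort -s -b -k6,6 "$1" | awk '
        NF < 6 { next }                       # skip rows that have no 6th column
        $6 != prev {                          # key changed: switch to the next output file
            if (prev != "") close(prev ".txt")
            prev = $6
        }
        { print > (prev ".txt") }             # ">" truncates on first open only
    '
}
```

It would be called as `split_by_col6 input.file` and writes one `<value>.txt` file per distinct column-6 value into the current directory. Because each file is opened once with ">", output from a previous run is overwritten rather than extended.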
