linux - Finding and Listing Duplicate Words in a Plain Text File
I have a rather large file that I am trying to make sense of. I generated a listing of an entire directory structure, which contains a lot of files, using the du -ah command. The result lists every folder under a specific folder, and the files inside each folder, in plain text format.
eg:
4.0g ./reel_02/scans/200113/001/promise pegasus/bmb 10/red epic data/r3d/18-09-12/cam b/b119_0918no/b119_0918no.rdm/b119_c004_0918xj.rdc/b119_c004_0918xj_003.r3d
3.1g ./reel_02/scans/200113/001/promise pegasus/bmb 10/red epic data/r3d/18-09-12/cam b/b119_0918no/b119_0918no.rdm/b119_c004_0918xj.rdc/b119_c004_0918xj_004.r3d
15g ./reel_02/scans/200113/001/promise pegasus/bmb 10/red epic data/r3d/18-09-12/cam b/b119_0918no/b119_0918no.rdm/b119_c004_0918xj.rdc
Is there a command I can run, or a utility I can use, to help me identify whether there is more than one record of the same filename (usually the last 16 characters of each line plus the extension)? If such duplicate entries exist, I would like to write the entire path (the full line) to a different text file, so that I can find the duplicate files and move them off the NAS using a script or something.
Please let me know. This is incredibly stressful when the plain text file is 5.2 MB :)
Split each line on / and take the last item (cut cannot select the last field directly, so reverse each line, take the first field, and reverse it back), then sort and run uniq -d, which shows the duplicates.
rev file | cut -f1 -d/ | rev | sort | uniq -d
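The pipeline above prints only the duplicated filenames, while the question also asks for the full line of each duplicate. One possible way to get that, offered as a sketch and not as part of the original answer, is a two-pass awk command: the first pass counts each filename (the text after the last /), and the second pass prints every line whose filename was counted more than once. The function name list_dupes and the file names listing.txt and duplicates.txt are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# list_dupes FILE (hypothetical helper name)
# Print every full line of FILE whose last "/"-separated field
# (the filename) appears on more than one line.
list_dupes() {
    # FILE is passed twice so awk reads it in two passes:
    #   pass 1 (NR == FNR): count each filename
    #   pass 2: print lines whose filename occurred more than once
    awk -F/ 'NR == FNR { count[$NF]++; next } count[$NF] > 1' "$1" "$1"
}
```

A usage example, assuming the du -ah output was saved as listing.txt: `list_dupes listing.txt > duplicates.txt`. Because the input is read twice, it has to be a regular file and not a pipe. Directory lines are compared by their last path component too, so two folders with the same name would also be reported.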