linux - Finding and Listing Duplicate Words in a Plain Text File

January 15, 2011

I have a rather large file that I am trying to make sense of. I generated a list of an entire directory structure containing a lot of files using the du -ah command. The result lists every folder under a specific folder, followed by the files inside each folder, in plain-text format.

For example:

4.0g    ./reel_02/scans/200113/001/promise pegasus/bmb 10/red epic data/r3d/18-09-12/cam b/b119_0918no/b119_0918no.rdm/b119_c004_0918xj.rdc/b119_c004_0918xj_003.r3d
3.1g    ./reel_02/scans/200113/001/promise pegasus/bmb 10/red epic data/r3d/18-09-12/cam b/b119_0918no/b119_0918no.rdm/b119_c004_0918xj.rdc/b119_c004_0918xj_004.r3d
15g     ./reel_02/scans/200113/001/promise pegasus/bmb 10/red epic data/r3d/18-09-12/cam b/b119_0918no/b119_0918no.rdm/b119_c004_0918xj.rdc

Is there a command I can run, or a utility I can use, to identify whether there is more than one record with the same file name (usually the last 16 characters of each line plus the extension)? If such duplicate entries exist, I would like to write out the entire path (the full line) to a different text file, so that I can find them and move the duplicate files out to a NAS using a script or something similar.

Please let me know. This is incredibly stressful when the plain-text file is 5.2 MB :)

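Split each line on / and take the last item (the file name). cut cannot select the last field directly, so reverse each line, take the first field, and reverse it back. Then sort the names and run uniq -d, which prints each duplicated name once.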

rev file | cut -f1 -d/ | rev | sort | uniq -d
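The pipeline above prints only the duplicated names, while the question also asks for the full lines. A minimal sketch, assuming the listing has du -ah's "<size><TAB><path>" layout and that a duplicate means the same text after the last "/": an awk two-pass read that counts names first and then prints every full line whose name occurs more than once. The function names list_dup_lines and list_extra_copies are hypothetical helpers, not existing commands. Note that du -ah also lists directories, so directories sharing a name are reported too.

```shell
#!/bin/sh
# Assumption: each line looks like du -ah output, "<size><TAB><path>",
# and two entries are duplicates when their last path component matches.

# list_dup_lines FILE (hypothetical helper)
# Prints every full line of FILE whose name (text after the last "/")
# appears on more than one line. FILE is read twice: the first pass
# (NR == FNR) counts each name, the second pass prints lines whose
# name was counted more than once.
list_dup_lines() {
  awk -F/ 'NR == FNR { count[$NF]++; next } count[$NF] > 1' "$1" "$1"
}

# list_extra_copies FILE (hypothetical helper)
# Prints only the path (size column stripped) of the second and later
# occurrences of each name, i.e. candidates to move elsewhere while the
# first occurrence stays in place.
list_extra_copies() {
  awk -F/ 'seen[$NF]++ { sub(/^[^\t]*\t/, ""); print }' "$1"
}
```

Usage: list_dup_lines listing.txt > duplicates.txt writes the full lines to a separate file. The paths printed by list_extra_copies are relative (they start with ./), so any script that moves them, for example a while IFS= read -r loop calling mv with a NAS mount point of your choosing, has to run from the directory where du -ah was originally run.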
