I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates. I am interested in doing this from within vi/vim, if possible.
If you're OK with sorting your file, you can use:
:sort u
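For example, a buffer containing
b
a
b
comes out as
a
b
since the u flag keeps only the first of each group of identical lines after sorting.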
:%!uniq
to simply remove duplicate entries without sorting the file. – Progressionist
Note that uniq only removes adjacent duplicates, so on a buffer like a$b$a$ it does nothing. – Zobe
Select the lines with V or something similar, then issue the command. – Likeness
Try this:
:%s/^\(.*\)\(\n\1\)\+$/\1/
It searches for any line immediately followed by one or more copies of itself, and replaces it with a single copy.
Make a copy of your file though before you try it. It's untested.
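As a quick sanity check: after sorting, a buffer of
a
a
b
b
b
collapses to
a
b
with this substitute. It only matches consecutive copies, so scattered duplicates need a :sort first.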
To avoid collapsing blank lines:
g/\v([^ ].*)$\n\1/d
– Ludeman
From the command line, just do:
sort file | uniq > file.new
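The same pipeline can be run without leaving vim by filtering the whole buffer through it (this rewrites the buffer in place rather than producing file.new):
:%!sort | uniq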
:sort u
was hanging on my large file. This worked very quickly and perfectly. Thank you! – Cercus
'uniq' is not recognized as an internal or external command, operable program or batch file. – Uncritical
awk '!x[$0]++' yourfile.txt
if you want to preserve the order (i.e., sorting is not acceptable). In order to invoke it from vim, :!
can be used.
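One caveat when filtering the buffer this way: in vim's :! commands a bare ! is expanded to the previous external command, so the bang in the awk program needs escaping:
:%!awk '\!x[$0]++'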
perl -nle 'print unless $seen{$_}++' yourfile.txt
– Pollinize
I would combine two of the answers above:
go to the head of the file
sort the whole file
remove the duplicate entries with uniq
1G
!Gsort
1G
!Guniq
(In normal mode, !G filters the lines from the cursor through the end of the file through the shell command you type next.)
If you're interested in seeing how many duplicate lines were removed, use Ctrl-G before and after to check the number of lines in your buffer.
'uniq' is not recognized as an internal or external command, operable program or batch file. – Uncritical
g/^\(.*\)$\n\1/d
Works for me on Windows. Lines must be sorted first though.
aaaa
followed by
aaaabb
will delete aaaa erroneously. – Uncritical
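A likely fix is to anchor the back-reference at the end of the line, so a longer line no longer counts as a duplicate (untested, same caveat as above):
g/^\(.*\)$\n\1$/d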
Select the lines in visual-line mode (Shift+v), then :!uniq. That'll only catch duplicates which come one after another.
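Pressing : while the selection is active pre-fills the range for you, so the command line actually reads:
:'<,'>!uniq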
If you don't want to sort/uniq the entire file, you can select the lines you want to deduplicate in visual mode and then simply run :sort u.
:5,10 sort u
– Pollinize
Regarding how Uniq can be implemented in VimL, search for Uniq in a plugin I'm maintaining. You'll see various ways to implement it that were given on the Vim mailing list.
Otherwise, :sort u
is indeed the way to go.
I would use !}uniq, but that only works if there are no blank lines, since !} filters only up to the next paragraph boundary and paragraphs are delimited by blank lines.
To run it over every line in the file, use :1,$!uniq (the 1,$ range is equivalent to %).
:%s/^\(.*\)\(\n\1\)\+$/\1/gec
or
:%s/^\(.*\)\(\n\1\)\+$/\1/ge
Either form removes runs of duplicate lines, keeping only one copy of each. The e flag suppresses the error when no duplicates are found; the c variant asks for confirmation before each removal.
This version only removes repeated lines that are contiguous, i.e. consecutive duplicates. Because the mapping uses :g/./, the function does not touch blank lines; change that pattern to :g/^/ (which matches every line, including empty ones) and it will also remove duplicated blank lines.
" function to delete duplicate lines
function! DelDuplicatedLines()
while getline(".") == getline(line(".") - 1)
exec 'norm! ddk'
endwhile
while getline(".") == getline(line(".") + 1)
exec 'norm! dd'
endwhile
endfunction
nnoremap <Leader>d :g/./call DelDuplicatedLines()<CR>
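To use it: put the snippet in your vimrc, optionally :sort the buffer first so duplicates become adjacent, then press <Leader>d (\d with the default leader) in normal mode.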
An alternative method that does not use vi/vim (useful for very large files) is to use sort and uniq from the Linux command line:
sort {file-name} | uniq
Note that uniq -u would not work here: it prints only lines that are never repeated and discards every line that has a duplicate, instead of keeping one copy.
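To see the difference, assuming a file containing the three lines a, b, a:
sort file | uniq     # keeps one copy of each line: a, b
sort file | uniq -u  # keeps only never-repeated lines: b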
This command gave me a buffer without any duplicate lines, without sorting; it filters the buffer through a small Python script that prints each line only the first time it appears:
:%!python3.11 -c 'exec("import fileinput\nLINES = []\nfor line in fileinput.input():\n    line = line.splitlines()[0]\n    if line not in LINES:\n        print(line)\n        LINES.append(line)\n")'
This worked for me for both .csv and .txt files:
awk '!seen[$0]++' <filename> > <newFileName>
Explanation: the first part of the command prints only the first occurrence of each row; the part after the > redirection saves that output to the new file.
awk '!seen[$0]++' <filename>
> <newFileName>