In Vim, how to remove all lines that are duplicate somewhere

Asked 5/3, 2014 at 9:29 Answered 13/6, 2021 at 15:57

I have a file that contains lines as follows:

one one
one one
two two two
one one
three three
one one
three three
four

I want to remove all occurrences of the duplicate lines from the file and leave only the non-duplicate lines. So, in the example above, the result should be:

two two two
four

I saw this answer to a similar looking question. I tried to modify the ex one-liner as given below:

:syn clear Repeat | g/^\(.*\)\n\ze\%(.*\n\)*\1$/exe 'syn match Repeat "^' . escape(getline ('.'), '".\^$*[]') . '$"' | d

But it does not remove all occurrences of the duplicate lines; it removes only some of them.

How can I do this in Vim? Or, more specifically, how can I do this with ex in Vim?

To clarify, I am not looking for :sort u.

Jegger answered 5/3, 2014 at 9:29 Comment(0)

My PatternsOnText plugin version 1.30 now has a

:DeleteAllDuplicateLinesIgnoring

command. Without any arguments, it'll work as outlined in your question.

Conversationalist answered 13/3, 2014 at 11:23 Comment(0)

If you have access to UNIX-style commands, you could do:

:%!sort | uniq -u

The -u option to the uniq command performs the task you require. From the uniq command's help text:

   -u, --unique
          only print unique lines

Note, however, that this sorts the buffer, so the remaining lines lose whatever order they had in the input file.

Deposition answered 5/3, 2014 at 9:43 Comment(3)

Ah, sorry, missed that (would have been nice if the answer highlighted that)! – Conversationalist 5/3, 2014 at 10:33

but it sorts the input. – Jegger 5/3, 2014 at 11:12

OK. If the original order is important then any solution that uses uniq won't work because it relies on the input being pre-sorted to get the behaviour you need. – Deposition 5/3, 2014 at 11:16

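If you are on a Linux box with awk available, this line does what you need. Note that awk's for-in loop visits array keys in an unspecified order, so the surviving lines may come out reordered: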

:%!awk '{a[$0]++}END{for(x in a)if(a[x]==1)print x}'
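
For comparison, here is a minimal Python sketch of the same counting idea. It filters in a second pass over the input, so the surviving lines stay in input order. The name unique_lines is made up for illustration and does not come from any answer here.

```python
from collections import Counter

def unique_lines(lines):
    """Return the lines that occur exactly once, in their original order."""
    counts = Counter(lines)  # first pass: count every line
    # second pass: keep only lines whose text was seen exactly once
    return [line for line in lines if counts[line] == 1]

sample = ["one one", "one one", "two two two", "one one",
          "three three", "one one", "three three", "four"]
print(unique_lines(sample))  # ['two two two', 'four']
```

A script like this could be used as a filter from Vim in the same way as the awk one-liner, provided it reads standard input and writes standard output.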

Betrothal answered 5/3, 2014 at 9:38 Comment(0)

Assuming you are on a UNIX derivative, the command below should do what you want:

:sort | %!uniq -u

uniq only works on sorted lines, so we must sort them first. Using Vim's built-in :sort command saves some typing: it works on the whole buffer by default, so we don't need to pass it a range, and it's a built-in command, so we don't need the !.

Then we filter the whole buffer through uniq -u.

Gallenz answered 5/3, 2014 at 9:45 Comment(1)

Ah, sorry, missed that (would have been nice if the answer highlighted that)! – Conversationalist 5/3, 2014 at 10:33

It does not preserve the order of the remaining lines, but this seems to work:

:sort|%s/^\(.*\)\n\%(\1\n\)\+//

(This version is @Peter Rincker's idea, with a little correction from me.) On vim 7.3, the following even shorter version works:

:sort | %s/^\(.*\n\)\1\+//

Unfortunately, due to differences between the regular-expression engines, this no longer works in vim 7.4 (including patches 1-52).
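
To experiment with the sort-then-substitute idea outside Vim, here is a rough Python sketch. It assumes every line ends with a newline. The name drop_repeated_runs is hypothetical, and Python's re syntax differs from Vim's, so this only illustrates the technique.

```python
import re

def drop_repeated_runs(text):
    """Sort the lines, then delete every run of two or more identical lines."""
    sorted_text = "".join(sorted(text.splitlines(keepends=True)))
    # ^(.*\n)\1+ : one whole line (with its newline) followed by one or
    # more exact copies of it; the ^ anchor prevents partial-line matches.
    return re.sub(r"^(.*\n)\1+", "", sorted_text, flags=re.MULTILINE)

text = ("one one\none one\ntwo two two\none one\n"
        "three three\none one\nthree three\nfour\n")
print(drop_repeated_runs(text), end="")
```

As with the Vim command, the result comes out in sorted order rather than in the original order.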

Largo answered 5/3, 2014 at 13:41 Comment(3)

A bit shorter: :sort|%s/\(.*\)\n\%(\1\n\)\+// – Grandson 5/3, 2014 at 15:19

Very nice, and quite different. I might even delete my answer if you post yours separately. But anchor your pattern with ^ before \(.*\) or else "a four" and "four" may sort to adjacent lines and be deleted. And you can make it a little simpler by capturing the \n: :sort|%s/^\(.*\n\)\1\+// (since \1 is an atom). – Largo 5/3, 2014 at 15:36

I missed that anchor. Good catch! Your second substitution is made much tidier by enlarging the capture group. However, there is one problem: it doesn't work correctly with Vim's new NFA regex engine (as of 7.4). %s/\%#=2^\(.*\n\)\1\+//. Feel free to post this answer in my stead. – Grandson 5/3, 2014 at 15:47

Taking the code from here and modifying it to delete the lines instead of highlighting them, you'll get this:

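function! DeleteDuplicateLines() range
  " Maps each line's text to the number of its first occurrence,
  " or to 0 once that first occurrence has been deleted.
  let lineCounts = {}
  let lineNum = a:firstline
  " a:lastline is fixed, so track the shrinking end of the range ourselves.
  let lastLine = a:lastline
  while lineNum <= lastLine
    let lineText = getline(lineNum)
    if lineText != ""
      if has_key(lineCounts, lineText)
        execute lineNum . 'delete _'
        let lastLine -= 1
        let firstNum = lineCounts[lineText]
        if firstNum > 0
          execute firstNum . 'delete _'
          let lastLine -= 1
          let lineCounts[lineText] = 0
          " Recorded lines below the deleted one have moved up by one.
          call map(lineCounts, 'v:val > l:firstNum ? v:val - 1 : v:val')
          let lineNum -= 1
        endif
      else
        let lineCounts[lineText] = lineNum
        let lineNum += 1
      endif
    else
      let lineNum += 1
    endif
  endwhile
endfunction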

command! -range=% DeleteDuplicateLines <line1>,<line2>call DeleteDuplicateLines()

Conversationalist answered 5/3, 2014 at 10:20 Comment(0)

This is not any simpler than @Ingo Karkat's answer, but it is a little more flexible. Like that answer, this leaves the remaining lines in the original order.

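function! RepeatedLines(...)
  " Optional arguments: first and last line; the default is the whole buffer.
  let first = a:0 ? a:1 : 1
  let last = (a:0 > 1) ? a:2 : line('$')
  let lines = []
  for line in range(first, last - 1)
    " Skip lines already recorded as repeats of an earlier line.
    if index(lines, line) != -1
      continue
    endif
    let newlines = []
    " After \V only the backslash and the / delimiter need escaping;
    " \^ and \$ anchor the pattern so that it matches whole lines only.
    let text = escape(getline(line), '\/')
    execute 'silent' (line + 1) ',' last
      \ 'g/\V\^' . text . '\$/call add(newlines, line("."))'
    if !empty(newlines)
      call add(lines, line)
      call extend(lines, newlines)
    endif
  endfor
  " 'n' sorts numerically; the default compares string representations.
  return sort(lines, 'n')
endfun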
:for x in reverse(RepeatedLines()) | execute x 'd' | endfor

A few notes:

My function accepts arguments instead of handling a range. It defaults to the entire buffer.
This illustrates some of the functions for manipulating lists. :help list-functions
I use /\V (very nomagic), after which only the backslash and the pattern delimiter have special meaning in a search pattern. :help /\V

Largo answered 5/3, 2014 at 15:28 Comment(1)

If you want to make a command out of it, then

:command! -range=% DeleteDuplicateLines for x in reverse(RepeatedLines(<line1>,<line2>)) <Bar> execute x 'd' <Bar> endfor

should work. – Largo 5/3, 2014 at 18:32

1. Add line numbers so that the order can be restored after sorting: :%s/^/\=printf("%d ", line("."))/
2. Sort on the text after the number: :sort /^\d\+/
3. Remove duplicate lines: :%s/\v^(\d+ )(.*)\n(\d+ \2\n)+//
4. Restore the order by sorting numerically: :sort n
5. Remove the line numbers added in #1: :%s/^\d\+ //
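
The five steps above amount to a decorate, sort, filter, re-sort, undecorate pipeline. A rough Python sketch of the same sequence follows, with dedupe_keep_order as a purely illustrative name:

```python
from itertools import groupby

def dedupe_keep_order(lines):
    # Step 1: attach the original line number to each line.
    numbered = list(enumerate(lines, start=1))
    # Step 2: sort by text so that identical lines become adjacent.
    numbered.sort(key=lambda pair: pair[1])
    # Step 3: keep only the texts whose group has a single member.
    survivors = []
    for _, group in groupby(numbered, key=lambda pair: pair[1]):
        members = list(group)
        if len(members) == 1:
            survivors.extend(members)
    # Step 4: restore the original order by sorting on the line number.
    survivors.sort(key=lambda pair: pair[0])
    # Step 5: strip the line numbers again.
    return [text for _, text in survivors]

sample = ["one one", "one one", "two two two", "one one",
          "three three", "one one", "three three", "four"]
print(dedupe_keep_order(sample))  # ['two two two', 'four']
```

The numeric sort in step 4 matters: sorting the numbered pairs as plain text would place line 10 before line 2.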

Oxa answered 13/6, 2021 at 15:57 Comment(0)

-1

Please use Perl; it can do this easily!

use strict;
use warnings;

# Read the input file into memory.
open(my $in, '<', 'input.txt') or die "cannot open input.txt: $!\n";
my @lines = <$in>;
close($in);

# Count how many times each line occurs.
my %count;
$count{$_}++ for @lines;

# Print, in the original order, only the lines that occur exactly once.
open(my $out, '>', 'output.txt') or die "cannot open output.txt: $!\n";
for my $line (@lines) {
    print {$out} $line if $count{$line} == 1;
}
close($out);

Grane answered 30/4, 2014 at 6:52 Comment(2)

This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post - you can always comment on your own posts, and once you have sufficient reputation you will be able to comment on any post. – Doyen 30/4, 2014 at 7:14

I have provided my Perl code here now; it is easy to understand. – Grane 30/4, 2014 at 7:25
