In Vim, how to remove all lines that are duplicate somewhere
J

9

6

I have a file that contains lines as follows:

one one
one one
two two two
one one
three three
one one
three three
four

I want to remove all occurrences of the duplicate lines from the file and leave only the non-duplicate lines. So, in the example above, the result should be:

two two two
four

I saw this answer to a similar looking question. I tried to modify the ex one-liner as given below:

:syn clear Repeat | g/^\(.*\)\n\ze\%(.*\n\)*\1$/exe 'syn match Repeat "^' . escape(getline ('.'), '".\^$*[]') . '$"' | d

But it does not remove all occurrences of the duplicate lines; it removes only some of them.

How can I do this in Vim? More specifically, how can I do this with ex in Vim?

To clarify, I am not looking for sort u.

Jegger answered 5/3, 2014 at 9:29 Comment(0)
C
1

My PatternsOnText plugin version 1.30 now has a

:DeleteAllDuplicateLinesIgnoring

command. Without any arguments, it'll work as outlined in your question.

Conversationalist answered 13/3, 2014 at 11:23 Comment(0)
D
5

If you have access to UNIX-style commands, you could do:

:%!sort | uniq -u

The -u option to the uniq command performs the task you require. From the uniq command's help text:

   -u, --unique
          only print unique lines

Note, however, that this sorts the buffer first, so whatever order your lines were in originally is lost.
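To see the effect outside Vim, the same filter can be run on the sample from the question. This is only an illustrative sketch; the function name unique_sorted is made up here, and LC_ALL=C is an added assumption to make the sort order predictable.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that occur exactly once.
# uniq -u compares adjacent lines only, so the input is sorted first.
unique_sorted() {
    LC_ALL=C sort | uniq -u
}

# Sample input from the question.
printf '%s\n' 'one one' 'one one' 'two two two' 'one one' \
    'three three' 'one one' 'three three' 'four' | unique_sorted
```

On the sample this prints four followed by two two two: the right lines, but in sorted rather than original order.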

Deposition answered 5/3, 2014 at 9:43 Comment(3)
Ah, sorry, missed that (would have been nice if the answer highlighted that)! Conversationalist
But it sorts the input. Jegger
OK. If the original order is important, then any solution that uses uniq won't work, because uniq relies on its input being sorted to get the behaviour you need. Deposition
B
3

If you are on a Linux box with awk available, this line works for your needs:

:%!awk '{a[$0]++}END{for(x in a)if(a[x]==1)print x}'
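awk's for (x in a) loop visits the array keys in an unspecified order, so the command above does not promise to keep the surviving lines in their original order. A variant that also remembers each line's position could look like the sketch below. It is not part of the original answer, and the function name keep_singletons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lines that occur exactly once, in input order.
keep_singletons() {
    awk '
        { count[$0]++; line[NR] = $0 }   # count each line and remember its position
        END {
            for (i = 1; i <= NR; i++)    # replay the lines in their original order
                if (count[line[i]] == 1)
                    print line[i]
        }'
}

# Sample input from the question.
printf '%s\n' 'one one' 'one one' 'two two two' 'one one' \
    'three three' 'one one' 'three three' 'four' | keep_singletons
```

Inside Vim, the same awk program should work as a filter in the same way as the one-liner above, for example :%!awk '{c[$0]++; l[NR]=$0} END{for(i=1;i<=NR;i++) if(c[l[i]]==1) print l[i]}'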
Betrothal answered 5/3, 2014 at 9:38 Comment(0)
G
2

Assuming you are on a UNIX derivative, the command below should do what you want:

:sort | %!uniq -u

uniq only works on sorted lines, so we must sort them first. Vim's built-in :sort command saves some typing: it works on the whole buffer by default, so we don't need to pass it a range, and because it's a built-in command we don't need the !.

Then we filter the whole buffer through uniq -u.

Gallenz answered 5/3, 2014 at 9:45 Comment(1)
Ah, sorry, missed that (would have been nice if the answer highlighted that)! Conversationalist
L
1

It does not preserve the order of the remaining lines, but this seems to work:

:sort|%s/^\(.*\)\n\%(\1\n\)\+//

(This version is @Peter Rincker's idea, with a little correction from me.) On vim 7.3, the following even shorter version works:

:sort | %s/^\(.*\n\)\1\+//

Unfortunately, due to differences between the regular-expression engines, this no longer works in vim 7.4 (including patches 1-52).

Largo answered 5/3, 2014 at 13:41 Comment(3)
A bit shorter: :sort|%s/\(.*\)\n\%(\1\n\)\+// Grandson
Very nice, and quite different. I might even delete my answer if you post yours separately. But anchor your pattern with ^ before \(.*\), or else "a four" and "four" may sort to adjacent lines and be deleted. And you can make it a little simpler by capturing the \n: :sort|%s/^\(.*\n\)\1\+// (since \1 is an atom). Largo
I missed that anchor. Good catch! Your second substitution is made much tidier by enlarging the capture group. However, there is one problem: it doesn't work correctly with Vim's new NFA regex engine (as of 7.4): %s/\%#=2^\(.*\n\)\1\+//. Feel free to post this answer in my stead. Grandson
C
0

Taking the code from here and modifying it to delete the lines instead of highlighting them, you'll get this:

function! DeleteDuplicateLines() range
  " First pass: count how often each non-empty line occurs in the range.
  let lineCounts = {}
  for lineText in getline(a:firstline, a:lastline)
    if lineText != ""
      let lineCounts[lineText] = get(lineCounts, lineText, 0) + 1
    endif
  endfor
  " Second pass: delete every line that occurs more than once. Working
  " bottom-up keeps the numbers of the lines still to be checked valid.
  let lineNum = a:lastline
  while lineNum >= a:firstline
    let lineText = getline(lineNum)
    if lineText != "" && lineCounts[lineText] > 1
      execute lineNum . 'delete _'
    endif
    let lineNum -= 1
  endwhile
endfunction

command! -range=% DeleteDuplicateLines <line1>,<line2>call DeleteDuplicateLines()
Conversationalist answered 5/3, 2014 at 10:20 Comment(0)
L
0

This is not any simpler than @Ingo Karkat's answer, but it is a little more flexible. Like that answer, this leaves the remaining lines in the original order.

function! RepeatedLines(...)
  let first = a:0 ? a:1 : 1
  let last = (a:0 > 1) ? a:2 : line('$')
  let lines = []
  for line in range(first, last - 1)
    if index(lines, line) != -1
      continue
    endif
    let newlines = []
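    " Escape the backslash and the / that delimits the :g pattern.
    let text = escape(getline(line), '\/')
    " Anchor with ^ and \$ so that only whole-line matches count.
    execute 'silent' (line + 1) ',' last
      \ 'g/^\V' . text . '\$/call add(newlines, line("."))'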
    if !empty(newlines)
      call add(lines, line)
      call extend(lines, newlines)
    endif
  endfor
  return sort(lines)
endfun
:for x in reverse(RepeatedLines()) | execute x 'd' | endfor

A few notes:

  1. My function accepts arguments instead of handling a range. It defaults to the entire buffer.
  2. This illustrates some of the functions for manipulating lists. :help list-functions
  3. I use /\V (very no magic) so the only character I need to escape in a search pattern is the backslash itself. :help /\V
Largo answered 5/3, 2014 at 15:28 Comment(1)
If you want to make a command out of it, then :command! -range=% DeleteDuplicateLines for x in reverse(RepeatedLines(<line1>,<line2>)) <Bar> execute x 'd' <Bar> endfor should work. Largo
O
0
  1. Add line numbers so that the original order can be restored after sorting: :%s/^/\=printf("%d ", line("."))/g
  2. Sort on the text after the line number: :sort /^\d\+/
  3. Remove the duplicate lines: :%s/^\(\d\+ \)\(.*\)\n\(\d\+ \2\n\)\+//g
  4. Restore the original order by sorting numerically: :sort n
  5. Remove the line numbers added in step 1: :%s/^\d\+ //g
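The five steps above can also be sketched outside Vim with standard tools. The function name dedupe_keep_order is hypothetical, and forcing LC_ALL=C is an added assumption so that sort and uniq agree on which lines are identical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the five steps with POSIX tools.
dedupe_keep_order() {
    tab=$(printf '\t')
    awk '{ print NR "\t" $0 }' |          # 1. prefix each line with its number
        LC_ALL=C sort -t "$tab" -k 2 |    # 2. sort on the text after the number
        LC_ALL=C uniq -u -f 1 |           # 3. drop every line whose text repeats
        sort -n |                         # 4. restore the original order
        cut -f 2-                         # 5. strip the numbers again
}

# Sample input from the question.
printf '%s\n' 'one one' 'one one' 'two two two' 'one one' \
    'three three' 'one one' 'three three' 'four' | dedupe_keep_order
```

On the sample this prints two two two followed by four, which is the result asked for in the question.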
Oxa answered 13/6, 2021 at 15:57 Comment(0)
G
-1

Please use Perl; it can do this easily!

use strict; use warnings; use diagnostics;
# Read the input file.
open(my $in, '<', 'input.txt') or die "cannot open file: $!\n";
my @data1 = <$in>;
close($in);
# Count how many times each line occurs.
my %rownum;
foreach my $line1 (@data1)
{
    $rownum{$line1}++;
}
# Print, in the original order, only the lines that occur exactly once.
open(my $out, '>', 'output.txt') or die "cannot open file: $!\n";
foreach my $line1 (@data1)
{
    if ($rownum{$line1} == 1)
    {
        print $out $line1;
    }
}
close($out);
Grane answered 30/4, 2014 at 6:52 Comment(2)
This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post - you can always comment on your own posts, and once you have sufficient reputation you will be able to comment on any post. Doyen
I have provided my Perl code here now; it is easy to understand. Grane
