How to find duplicate lines in a file?

I have an input file with the following data:

line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3

I am trying to find all the duplicate lines. I tried:

sort filename | uniq -c

but it does not seem to work for me. It gives me:

  1 begin
  1 end
  1 line1
  1 line1
  1 line2
  1 line3
  1 line3
  1 line5
  1 line6
  1 line7
  1 line9

This question may look like a duplicate of "Find duplicate lines in a file and count how many time each line was duplicated?", but the nature of my input data is different.

Please suggest a fix.

Godsend answered 9/1, 2017 at 12:20 Comment(2)

If I try to reproduce your problem, I get lines like 2 line3, so probably there is a problem with spacing after line1 etc in the source file. – Heritable 9/1, 2017 at 12:24

Thanks Will, there was indeed a spacing problem. I removed the space and the result is OK. – Godsend 9/1, 2017 at 12:29
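
For anyone hitting the same thing: uniq compares whole lines, so "line1 " (with a trailing space) and "line1" count as different lines. Below is a minimal sketch of how one might spot and strip such whitespace before looking for duplicates. The sample file and the variable names `tmp` and `dupes` are made up for illustration; the real input may differ.

```shell
# Hypothetical sample: the first "line1" and "line3" carry a trailing space.
tmp=$(mktemp)
printf '%s\n' 'line1 ' 'line2' 'line3 ' 'line1' 'line3' > "$tmp"

# Make invisible characters visible: od -c shows trailing spaces, tabs or \r.
head -n 1 "$tmp" | od -c

# Strip trailing whitespace (this also covers a DOS carriage return),
# then sort and list each duplicated line once.
dupes=$(sed 's/[[:space:]]*$//' "$tmp" | sort | uniq -d)
printf '%s\n' "$dupes"

rm -f "$tmp"
```

Stripping in the pipeline leaves the original file untouched, which is handy if the whitespace turns out to matter elsewhere.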

Use this, which prints one copy of each line that occurs more than once:

sort filename | uniq -d
man uniq
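
If the counts are wanted as well, one possible approach is to keep `uniq -c` and filter its output. This is only a sketch on made-up sample data similar to the question's; `tmp`, `dupes` and `counted` are illustrative names. The awk step is used instead of combining `-c` with `-d`, since not every uniq implementation is guaranteed to accept both together.

```shell
# Sample data similar to the question, without trailing whitespace.
tmp=$(mktemp)
printf '%s\n' line1 line2 line3 begin line5 line1 line3 > "$tmp"

# uniq -d: one copy of each line that occurs more than once.
dupes=$(sort "$tmp" | uniq -d)

# uniq -c plus a filter: keep only lines whose count is above 1,
# and drop the leading padding that uniq -c adds before the count.
counted=$(sort "$tmp" | uniq -c | awk '$1 > 1 { sub(/^ */, ""); print }')

printf '%s\n' "$dupes"
printf '%s\n' "$counted"

rm -f "$tmp"
```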

Twelvemonth answered 9/1, 2017 at 12:22 Comment(0)

To remove the duplicates (print each distinct line once, rather than list the duplicated ones), try

sort -u file

or

awk '!a[$0]++' file
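
Both commands keep one copy of every line, so on their own they do not show which lines were duplicated. A small sketch of the difference, on made-up data: flipping the awk test so that a line is printed the second time it is seen reports the duplicates instead, in file order and without sorting. The array name `seen` and the variables `unique` and `dupes` are chosen here purely for illustration.

```shell
tmp=$(mktemp)
printf '%s\n' line3 line1 line2 line1 line3 line1 > "$tmp"

# De-duplicate: print a line only the first time it is seen (input order kept).
unique=$(awk '!seen[$0]++' "$tmp")

# Report duplicates: print a line only the second time it is seen.
dupes=$(awk 'seen[$0]++ == 1' "$tmp")

printf 'unique:\n%s\n' "$unique"
printf 'duplicated:\n%s\n' "$dupes"

rm -f "$tmp"
```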

Iva answered 15/6, 2022 at 13:29 Comment(0)

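You'll have to modify the standard de-dupe code just a tiny bit to account for this.

If you want a single copy of each duplicated line, it's very much the same idea: print a line the second time it is seen. With mawk or gawk (here _ and __ are just variable names; an unset _ makes $_ mean $0 and !_ equal 1):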

  {m,g}awk 'NF~ __[$_]++' FS='^$'
  {m,g}awk '__[$_]++==!_'

If you want every copy of the duplicated lines printed, then when a line is seen for the second time, print two copies of it (the held-back first occurrence plus the current one), and print each further match as it comes along.
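
A readable sketch of that "print every copy" idea, assuming plain awk and made-up sample data. The names `count`, `n`, `tmp` and `all_copies` are illustrative and not part of the one-liners above.

```shell
tmp=$(mktemp)
printf '%s\n' line1 line2 line3 line1 line3 line1 > "$tmp"

all_copies=$(awk '
    { n = ++count[$0] }       # how many times this line has been seen so far
    n == 2 { print; print }   # first repeat: emit the held-back first copy too
    n > 2  { print }          # later repeats: emit them as they arrive
' "$tmp")

printf '%s\n' "$all_copies"

rm -f "$tmp"
```

Copies come out in the order the repeats are detected, so pipe the result through sort if grouped output is preferred.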

Usually it's way faster to de-dupe first and then sort, instead of the other way around.

Endocrinology answered 15/6, 2022 at 14:57 Comment(0)

Pass the file name as the first argument to this script.

Example: find-dupes.sh name.ext

#!/usr/bin/env bash

# Check that a file name was provided
if [ $# -eq 0 ]; then
    echo "Usage: $0 <file>" >&2
    exit 1
fi

# File to check for duplicates
file="$1"

# Check that the file exists and is a regular file
if [ ! -f "$file" ]; then
    echo "Error: File not found: $file" >&2
    exit 1
fi

# Finding duplicates
duplicates=$(sort "$file" | uniq -d)

if [ -z "$duplicates" ]; then
    printf "\n%s\n" "No duplicates were found in $file."
else
    printf "\n%s\n\n" "Duplicate lines in $file:"
    echo "$duplicates"
fi

Midmost answered 7/1 at 22:9 Comment(0)
