Best way to remove duplicate entries from a data table

N

11

47

What is the best way to remove duplicate entries from a Data Table?

Notion answered 11/12, 2010 at 6:26 Comment(1)

Please also have a look at https://mcmap.net/q/371999/-c-remove-rows-with-the-same-column-value-from-a-datatable – Dinsdale 24/6, 2014 at 13:31

G

85

Remove Duplicates

public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
   Hashtable hTable = new Hashtable();
   ArrayList duplicateList = new ArrayList();

   //Add list of all the unique item value to hashtable, which stores combination of key, value pair.
   //And add duplicate item value in arraylist.
   foreach (DataRow drow in dTable.Rows)
   {
      if (hTable.Contains(drow[colName]))
         duplicateList.Add(drow);
      else
         hTable.Add(drow[colName], string.Empty); 
   }

   //Removing a list of duplicate items from datatable.
   foreach (DataRow dRow in duplicateList)
      dTable.Rows.Remove(dRow);

   //Datatable which contains unique records will be return as output.
      return dTable;
}

Here Links below

http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx

http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx

For remove duplicates in column

http://dotnetguts.blogspot.com/2007/02/removing-duplicate-records-from.html

Griseldagriseldis answered 11/12, 2010 at 6:34 Comment(5)

:How can I do it for multiple columns.I tried including string[] colName.But Table.Rows.Remove(dRow); throws error like "The given DataRow is not in the current DataRowCollection".Please suggest. – Stutman 17/10, 2012 at 12:5

You would have to modify the foreach incrementally on more levels. However, using Linq might be easier. #8940016 – Diffuse 18/4, 2013 at 8:50

The Link from dotnetspark worked for me, because i needed to remove rows only if 2 column's rows match .This only works for one type however, but it is fine for me!! – Donniedonnish 11/11, 2015 at 18:20

this solution is based on column. maybe one column is same but others not. – Dam 25/1, 2017 at 8:13

Works a treat, thank you – Galah 19/12, 2021 at 2:50

P

110

Do dtEmp on your current working DataTable:

DataTable distinctTable = dtEmp.DefaultView.ToTable( /*distinct*/ true);

It's nice.

Perfumery answered 1/2, 2013 at 10:56 Comment(3)

This answer deserves more attention, was exactly what I needed and way simpler than building a hashtable/dictionary/whatever as long as you only need to filter exact duplicates. – Swearword 23/7, 2014 at 6:20

Be careful of the performance of this solution. In my implementation ToTable() was quite slow taking 30 times longer than the original query that got the data. – Audible 2/6, 2015 at 18:17

This is very nice but it can very, very slow for larger data tables. I tried this on a datatable with 800k records with 20 or so columns and it was running for several minutes (I didn't even wait for it to complete) on my 8 vCPU Windows 7 PC. – Collinsia 24/9, 2019 at 19:48

G

85

Remove Duplicates

public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
   Hashtable hTable = new Hashtable();
   ArrayList duplicateList = new ArrayList();

   //Add list of all the unique item value to hashtable, which stores combination of key, value pair.
   //And add duplicate item value in arraylist.
   foreach (DataRow drow in dTable.Rows)
   {
      if (hTable.Contains(drow[colName]))
         duplicateList.Add(drow);
      else
         hTable.Add(drow[colName], string.Empty); 
   }

   //Removing a list of duplicate items from datatable.
   foreach (DataRow dRow in duplicateList)
      dTable.Rows.Remove(dRow);

   //Datatable which contains unique records will be return as output.
      return dTable;
}

Here Links below

http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx

http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx

For remove duplicates in column

http://dotnetguts.blogspot.com/2007/02/removing-duplicate-records-from.html

Griseldagriseldis answered 11/12, 2010 at 6:34 Comment(5)

:How can I do it for multiple columns.I tried including string[] colName.But Table.Rows.Remove(dRow); throws error like "The given DataRow is not in the current DataRowCollection".Please suggest. – Stutman 17/10, 2012 at 12:5

You would have to modify the foreach incrementally on more levels. However, using Linq might be easier. #8940016 – Diffuse 18/4, 2013 at 8:50

The Link from dotnetspark worked for me, because i needed to remove rows only if 2 column's rows match .This only works for one type however, but it is fine for me!! – Donniedonnish 11/11, 2015 at 18:20

this solution is based on column. maybe one column is same but others not. – Dam 25/1, 2017 at 8:13

Works a treat, thank you – Galah 19/12, 2021 at 2:50

D

27

A simple way would be:

 var newDt= dt.AsEnumerable()
                 .GroupBy(x => x.Field<int>("ColumnName"))
                 .Select(y => y.First())
                 .CopyToDataTable();

Dryden answered 23/11, 2016 at 6:53 Comment(1)

This is the one for me. Without removing additional columns, simply ignoring duplicate records for the mentioned column names. – Pursuance 6/7, 2020 at 15:19

R

19

This post is regarding fetching only Distincts rows from Data table on basis of multiple Columns.

Public coid removeDuplicatesRows(DataTable dt)
{
  DataTable uniqueCols = dt.DefaultView.ToTable(true, "RNORFQNo", "ManufacturerPartNo",  "RNORFQId", "ItemId", "RNONo", "Quantity", "NSNNo", "UOMName", "MOQ", "ItemDescription");
}

You need to call this method and you need to assign value to datatable. In Above code we have RNORFQNo , PartNo,RFQ id,ItemId, RNONo, QUantity, NSNNO, UOMName,MOQ, and Item Description as Column on which we want distinct values.

Roseola answered 13/3, 2013 at 10:51 Comment(0)

P

10

Heres a easy and fast way using AsEnumerable().Distinct()

private DataTable RemoveDuplicatesRecords(DataTable dt)
{
    //Returns just 5 unique rows
    var UniqueRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);
    DataTable dt2 = UniqueRows.CopyToDataTable();
    return dt2;
}

Pooka answered 7/3, 2013 at 6:33 Comment(0)

M

3

    /* To eliminate Duplicate rows */
    private void RemoveDuplicates(DataTable dt)
    {

        if (dt.Rows.Count > 0)
        {
            for (int i = dt.Rows.Count - 1; i >= 0; i--)
            {
                if (i == 0)
                {
                    break;
                }
                for (int j = i - 1; j >= 0; j--)
                {
                    if (Convert.ToInt32(dt.Rows[i]["ID"]) == Convert.ToInt32(dt.Rows[j]["ID"]) && dt.Rows[i]["Name"].ToString() == dt.Rows[j]["Name"].ToString())
                    {
                        dt.Rows[i].Delete();
                        break;
                    }
                }
            }
            dt.AcceptChanges();
        }
    }

Morel answered 20/3, 2013 at 11:14 Comment(0)

Y

3

There is a simple way using Linq GroupBy Method.

var duplicateValues = dt.AsEnumerable() 

        .GroupBy(row => row[0]) 

        .Where(group => (group.Count() == 1 || group.Count() > 1)) 

        .Select(g => g.Key); 



foreach (var d in duplicateValues)

        Console.WriteLine(d);

See more at: http://www.dotnetlines.com/Blogs/tabid/85/EntryId/49/Remove-duplicate-rows-from-a-DataTable-using-LINQ.aspx

Youngs answered 20/9, 2013 at 5:41 Comment(1)

I guess you don't need to group by count() == 1 or you will get all rows anyway. – Yaakov 6/9, 2014 at 15:54

B

2

Completely distinct rows:

public static DataTable Dictinct(this dt) => dt.DefaultView.ToTable(true);

Distinct by particular row(s) (Note that the columns mentioned in "distinctCulumnNames" will be returned in resulting DataTable):

public static DataTable Dictinct(this dt, params string[] distinctColumnNames) => 
dt.DefaultView.ToTable(true, distinctColumnNames);

Distinct by particular column (preserves all columns in given DataTable):

public static void Distinct(this DataTable dataTable, string distinctColumnName)
{
    var distinctResult = new DataTable();
    distinctResult.Merge(
                     .GroupBy(row => row.Field<object>(distinctColumnName))
                     .Select(group => group.First())
                     .CopyToDataTable()
            );

    if (distinctResult.DefaultView.Count < dataTable.DefaultView.Count)
    {
        dataTable.Clear();
        dataTable.Merge(distinctResult);
        dataTable.AcceptChanges();
    }
}

Burnisher answered 19/1, 2018 at 13:40 Comment(0)

H

0

You can use the DefaultView.ToTable method of a DataTable to do the filtering like this (adapt to C#):

 Public Sub RemoveDuplicateRows(ByRef rDataTable As DataTable)
    Dim pNewDataTable As DataTable
    Dim pCurrentRowCopy As DataRow
    Dim pColumnList As New List(Of String)
    Dim pColumn As DataColumn

    'Build column list
    For Each pColumn In rDataTable.Columns
        pColumnList.Add(pColumn.ColumnName)
    Next

    'Filter by all columns
    pNewDataTable = rDataTable.DefaultView.ToTable(True, pColumnList.ToArray)

    rDataTable = rDataTable.Clone

    'Import rows into original table structure
    For Each pCurrentRowCopy In pNewDataTable.Rows
        rDataTable.ImportRow(pCurrentRowCopy)
    Next
End Sub

Hexyl answered 19/1, 2018 at 18:55 Comment(0)

A

0

In order to distinct all datatable columns, you can easily retrieve the names of the columns in a string array

public static DataTable RemoveDuplicateRows(this DataTable dataTable)
{
    List<string> columnNames = new List<string>();
    foreach (DataColumn col in dataTable.Columns)
    {
        columnNames.Add(col.ColumnName);
    }
    return dataTable.DefaultView.ToTable(true, columnNames.Select(c => c.ToString()).ToArray());
}

As you can notice, I thought of using it as an extension to DataTable class

Admix answered 12/7, 2019 at 8:34 Comment(0)

D

0

I would prefer this as this is faster than DefaultView.ToTable and foreach loop to remove duplicates. Using this, we can have group by on multiple columns as well.

DataTable distinctDT = (from rows in dt.AsEnumerable() 
group rows by new { ColA = rows["ColA"], ColB = rows["ColB"]} into grp
select grp.First()).CopyToDataTable();

Dormant answered 2/3, 2021 at 17:46 Comment(0)

Recommended topics

Hot tags