Efficient list of unique strings C#
K

7

102

What is the most efficient way to store a list of strings while ignoring any duplicates? I was thinking a dictionary might be best: insert each string by writing dict[str] = false; and then enumerate the keys as a list. Is that a good solution?

Klondike answered 28/5, 2009 at 1:13 Comment(0)
A
122

If you are using .NET 3.5 or later, HashSet<T> should work for you.

The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

Assessment answered 28/5, 2009 at 1:17 Comment(3)
But a HashSet will lose the order of items, a feature a List provides. - Wishful
Additional: There is also SortedSet<T>, which keeps its elements in sorted order (it is a sorted set rather than a hash-based one). - Tirewoman
Also note that a HashSet cannot be accessed by index, only through an enumerator, as opposed to a List. - Warhol
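The points raised in these comments can be seen in a short sketch. It assumes .NET 4 or later (for SortedSet<T>), and the sample values and class name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SetOrderingDemo
{
    static void Main()
    {
        var input = new[] { "pear", "apple", "pear", "fig" };

        // HashSet<T> drops duplicates but documents no enumeration order.
        var hashSet = new HashSet<string>(input);
        Console.WriteLine(hashSet.Count); // 3

        // SortedSet<T> drops duplicates and enumerates in sorted order.
        var sortedSet = new SortedSet<string>(input, StringComparer.Ordinal);
        Console.WriteLine(string.Join(",", sortedSet)); // apple,fig,pear

        // Neither set has an indexer; copy to a List<T> when index access is needed.
        List<string> byIndex = sortedSet.ToList();
        Console.WriteLine(byIndex[0]); // apple
    }
}
```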
J
33

You can do something like this:

var hash = new HashSet<string>();
var collectionWithDup = new []{"one","one","two","one","two","zero"}; 

// No need to check for duplicates as the Add method
// will only add it if it doesn't exist already
foreach (var str in collectionWithDup)
    hash.Add(str);   
Janus answered 28/5, 2009 at 3:4 Comment(2)
You don't need the Contains check with a HashSet. You can just call the Add method directly, and it will return true if the item was added or false if it already exists. - Mattingly
The answer should be edited to remove the redundant call to Contains. This is all you need for the above example to work: var collectionWithDup = new[] { "one", "one", "two", "one", "two", "zero" }; var uniqueValues = new HashSet<string>(collectionWithDup); - Acidic
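As a rough illustration of the return value mentioned in the comments, the sketch below uses the result of Add to collect the rejected duplicates. The class and variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

class AddReturnValueDemo
{
    static void Main()
    {
        var unique = new HashSet<string>();
        var duplicates = new List<string>();

        foreach (var s in new[] { "one", "one", "two", "one", "two", "zero" })
        {
            // Add returns false when the value is already in the set.
            if (!unique.Add(s))
                duplicates.Add(s);
        }

        Console.WriteLine(unique.Count);     // 3
        Console.WriteLine(duplicates.Count); // 3
    }
}
```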
S
15

I'm not sure if this counts as a good answer, but when faced with the need for a unique set that maintains insertion order, I compromised with a HashSet and a List side-by-side. In this case, whenever you add to the set, do the following:

if(hashSet.Add(item))
    orderList.Add(item);

When removing items, make sure to remove them from both. As long as you can be sure that nothing else adds items to the list directly, you'll have an insertion-ordered unique set!
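One way to make sure nothing else touches the list is to hide both collections behind a small wrapper. This is only a sketch: InsertionOrderedSet and its members are hypothetical names, not a framework type, and removal is O(n) because of the list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: a HashSet for uniqueness, a List for insertion order.
class InsertionOrderedSet<T>
{
    private readonly HashSet<T> _set = new HashSet<T>();
    private readonly List<T> _order = new List<T>();

    public int Count { get { return _order.Count; } }

    // Index access is served by the list.
    public T this[int index] { get { return _order[index]; } }

    public IEnumerable<T> Items { get { return _order; } }

    public bool Add(T item)
    {
        if (!_set.Add(item)) return false; // already present
        _order.Add(item);
        return true;
    }

    public bool Remove(T item)
    {
        if (!_set.Remove(item)) return false; // not present
        _order.Remove(item);                  // linear scan of the list
        return true;
    }
}

class Program
{
    static void Main()
    {
        var set = new InsertionOrderedSet<string>();
        foreach (var s in new[] { "b", "a", "b", "c" })
            set.Add(s);

        set.Remove("a");
        Console.WriteLine(string.Join(",", set.Items)); // b,c
    }
}
```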

Sabinasabine answered 13/6, 2012 at 9:28 Comment(0)
I
15

You could also use LINQ, as in:

using System.Linq;

var items = new List<string>() { "one", "one", "two", "one", "two", "zero" };

List<string> distinctItems = items.Distinct().ToList();
Ison answered 2/1, 2018 at 14:10 Comment(1)
The official documentation states that "the Distinct<TSource>(IEnumerable<TSource>) method returns an unordered sequence that contains no duplicate values" (emphasis added by me). So you can't rely on Distinct if the order of the values is important. - Faxan
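If order matters and you would rather not depend on how Distinct orders its output, one option is to filter through a HashSet yourself, keeping the first occurrence of each value. A minimal sketch; DistinctInOrder is a hypothetical helper name, not a framework method.

```csharp
using System;
using System.Collections.Generic;

static class OrderedDistinct
{
    // Hypothetical helper: yields each value the first time it is seen,
    // so the output follows the order of the input.
    public static IEnumerable<T> DistinctInOrder<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (var item in source)
        {
            if (seen.Add(item)) // false for values already yielded
                yield return item;
        }
    }

    static void Main()
    {
        var items = new List<string> { "one", "one", "two", "one", "two", "zero" };
        var distinctItems = new List<string>(DistinctInOrder(items));

        Console.WriteLine(string.Join(",", distinctItems)); // one,two,zero
    }
}
```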
T
10

Use a HashSet. There is no need to check .Contains(); just add your items to the set, and if an item is a duplicate it will not be added.

   HashSet<int> uniqueList = new HashSet<int>();
   uniqueList.Add(1); // set contains 1
   uniqueList.Add(2); // set contains 1, 2
   uniqueList.Add(1); // duplicate ignored; set still contains 1, 2
   Console.WriteLine(uniqueList.Count); // prints 2
Thrust answered 21/2, 2014 at 8:56 Comment(0)
D
2

This is not part of the System namespace, but I have used Iesi.Collections from http://www.codeproject.com/KB/recipes/sets.aspx with NHibernate. It supports a hashed set along with a sorted set, a dictionary set, and so on. Since it has been used with NHibernate, it has been used extensively and is very stable. It also does not require .NET 3.5.

Dimissory answered 28/5, 2009 at 1:42 Comment(0)
C
2

Here is another solution without using the HashSet.

var items = new List<string>() { "one", "one", "two", "one", "two", "zero" };
var uniqueItems = items.Where((item, index) => items.IndexOf(item) == index);

It was adapted from this thread: javascript - Unique values in an array

Test:

using FluentAssertions;

uniqueItems.Count().Should().Be(3);
uniqueItems.Should().BeEquivalentTo("one", "two", "zero");

Performance test for List, HashSet and SortedSet. 1 million iterations:

List: 564 ms
HashSet: 487 ms
SortedSet: 1932 ms

Test source code (gist)

Clavate answered 4/8, 2016 at 10:55 Comment(0)
