How to display word differences using C#?
S

4

19

I would like to show the differences between two blocks of text. Rather than comparing lines of text or individual characters, I would like to just compare words separated by specified characters ('\n', ' ', '\t' for example). My main reasoning for this is that the block of text that I'll be comparing generally doesn't have many line breaks in it and letter comparisons can be hard to follow.

I've come across the following O(ND) logic in C# for comparing lines and characters, but I'm sort of at a loss for how to modify it to compare words.

In addition, I would like to keep track of the separators between words and make sure they're included in the diff. So if a space is replaced by a hard return, I would like that to come up as a difference.

I'm using ASP.NET to display the entire block of text, including the deleted original text and the added new text (both will be highlighted to show that they were deleted/added). A solution that works with those technologies would be appreciated.

Any advice on how to accomplish this would be appreciated. Thanks!

Sphygmoid answered 19/12, 2009 at 21:13 Comment(1)
Have fun. A coworker of mine got to do this for a release of our flagship product line. He was sufficiently challenged by it. And he used several theory papers to guide his work... griping the entire time about the quality of the writing. Goodnatured
S
19

Microsoft has released a diff project, DiffPlex, that allows you to do word, character, and line diffs. It was originally hosted on CodePlex and is now on GitHub. It was released under the Microsoft Public License (Ms-PL).

https://github.com/mmanela/diffplex

Sphygmoid answered 12/1, 2010 at 21:36 Comment(3)
DiffPlex lets you define a custom function for how to partition the text before it is diffed. You can use the method: DiffResult CreateCustomDiffs(string oldText, string newText, bool ignoreWhiteSpace, Func<string, string[]> chunker) where chunker tells DiffPlex what the atomic units to compare against each other are. Kelwunn
Hi Jim, I am looking for a similar solution. Did using DiffPlex as-is solve your issue? Incommensurate
Looks as though I used codeproject.com/Articles/11454/… for my solution. I don't recall why I used it over DiffPlex, tbh. This solution wraps the deleted and added words in defined HTML tags, allowing you to style them as you want. Sphygmoid
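The last comment describes wrapping deleted and added words in HTML tags. A minimal sketch of that rendering step follows, assuming the diff has already been computed elsewhere. The `DiffHtml` class, the `Render` method, and the `'='`/`'-'`/`'+'` markers are hypothetical names chosen for illustration; they are not taken from DiffPlex or from the linked article.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: turns a list of diff operations into HTML.
public static class DiffHtml
{
    // Each op is a marker plus a token: '=' unchanged, '-' deleted, '+' added.
    public static string Render(IEnumerable<(char Op, string Token)> ops)
    {
        var html = new StringBuilder();
        foreach (var (op, token) in ops)
        {
            // Encode the text so characters such as '<' are shown, not parsed.
            string encoded = WebUtility.HtmlEncode(token);
            switch (op)
            {
                case '-': html.Append("<del>").Append(encoded).Append("</del>"); break;
                case '+': html.Append("<ins>").Append(encoded).Append("</ins>"); break;
                default: html.Append(encoded); break;
            }
        }
        return html.ToString();
    }
}
```

The `<del>` and `<ins>` elements can then be styled with CSS. If newline separators are part of the diff, the page would also need something like `white-space: pre-wrap` for a changed line break to be visible.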
N
1

Other than a few general optimizations, if you need to include the separators in the comparison you are essentially doing a character-by-character comparison with breaks. Although you could use the O(ND) code you linked, you would end up making about as many changes to it as you would writing your own.

The main problem in a difference comparison is finding the continuation, that is, where the two texts line up again after a change (for example, if I delete a single word but leave the rest the same).

If you want to use their code, start with the example and do not write out the deleted characters; if there are replaced characters in the same place, do not output that result either. You then need to compute the longest continuous run of "changed" words, highlight that string, and output it.

Sorry, that's not much of an answer, but for this problem the answer is basically to write and tune the function yourself.
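One common way to find where two texts line up again is a longest-common-subsequence (LCS) comparison over word tokens. The sketch below is a plain O(N*M) dynamic-programming version, not the O(ND) algorithm from the question, and the `WordDiff` and `DiffKind` names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public enum DiffKind { Equal, Deleted, Inserted }

// Hypothetical helper: LCS-based diff over two token arrays.
public static class WordDiff
{
    public static List<(DiffKind Kind, string Token)> Diff(string[] oldTokens, string[] newTokens)
    {
        int n = oldTokens.Length, m = newTokens.Length;

        // lcs[i, j] = length of the LCS of oldTokens[i..] and newTokens[j..].
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = oldTokens[i] == newTokens[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk both arrays, always taking the step that keeps the LCS intact.
        var result = new List<(DiffKind Kind, string Token)>();
        int a = 0, b = 0;
        while (a < n && b < m)
        {
            if (oldTokens[a] == newTokens[b])
            {
                result.Add((DiffKind.Equal, oldTokens[a]));
                a++;
                b++;
            }
            else if (lcs[a + 1, b] >= lcs[a, b + 1])
            {
                result.Add((DiffKind.Deleted, oldTokens[a++]));
            }
            else
            {
                result.Add((DiffKind.Inserted, newTokens[b++]));
            }
        }
        // Whatever is left on either side is a pure deletion or insertion.
        while (a < n) result.Add((DiffKind.Deleted, oldTokens[a++]));
        while (b < m) result.Add((DiffKind.Inserted, newTokens[b++]));
        return result;
    }
}
```

For "hello world how are you" against "hello there how are you", both split on spaces, this reports "world" as deleted, "there" as inserted, and the other four words as equal. The table needs N*M integers, so for very large inputs a linear-space algorithm would be the better choice.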

Nomism answered 19/12, 2009 at 21:31 Comment(0)
S
0

Well, String.Split with '\n', ' ' and '\t' as the split characters will return an array of the words in your block of text.

You could then compare the two arrays for differences. A simple 1:1 (position-by-position) comparison would tell you whether any word had been changed. Comparing:

hello world how are you

and:

hello there how are you

would tell you that world had changed to there.

What it wouldn't tell you is whether words had been inserted or removed (an inserted word shifts every later position, so everything after it looks changed), and you would still need to parse the text blocks character by character to see whether any of the separator characters had changed.
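One way around the separator problem is to keep each separator as a token of its own instead of discarding it, so that a space replaced by a newline shows up as a changed token. A minimal sketch, with the `Tokenizer` class and `SplitKeepingSeparators` method as hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits text into words and separator tokens.
public static class Tokenizer
{
    // Every separator character becomes a one-character token, so joining
    // the returned tokens reproduces the input text exactly.
    public static string[] SplitKeepingSeparators(string text, params char[] separators)
    {
        var tokens = new List<string>();
        int wordStart = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (Array.IndexOf(separators, text[i]) < 0) continue; // still inside a word

            if (i > wordStart) tokens.Add(text.Substring(wordStart, i - wordStart));
            tokens.Add(text[i].ToString());
            wordStart = i + 1;
        }
        if (wordStart < text.Length) tokens.Add(text.Substring(wordStart));
        return tokens.ToArray();
    }
}
```

For example, `Tokenizer.SplitKeepingSeparators("hello world\nhow", ' ', '\n', '\t')` returns the five tokens "hello", " ", "world", "\n", "how". The two token arrays can then be compared with any sequence diff rather than position by position, which also handles inserted and removed words.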

Strow answered 19/12, 2009 at 21:33 Comment(1)
I'm afraid that String.Split for large blocks of text will be inefficient. Toadflax
F
0

    using System.Linq;

    string string1 = "hello world how are you";
    string string2 = "hello there how are you";

    var first = string1.Split(' ');
    var second = string2.Split(' ');

    // Except is a set difference: it ignores word order and repeated words.
    var removed = first.Except(second).ToArray();  // words only in string1: "world"
    var added = second.Except(first).ToArray();    // words only in string2: "there"
Fabled answered 23/5, 2018 at 18:41 Comment(0)
