I was bored over Thanksgiving break 2012, so I answered the question and then some. The script works on strings of equal length, and it also works when the lengths differ. I added help text and option handling just for fun. I thought someone might find it useful.
If you are new to Perl and don't know: don't add any code to the script below the __DATA__ line, because Perl treats everything after that line as data, not as part of the program.
Have fun.
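For anyone who wants the core idea without the option handling, here is a minimal sketch of the string XOR trick the script is built on. The helper name `diff_positions` is hypothetical and is not part of diftxt; it assumes plain byte strings, as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of diftxt): return the 0-based positions
# at which two byte strings differ, using Perl's string XOR operator.
sub diff_positions {
    my ($s1, $s2) = @_;
    # XOR of two strings yields "\0" wherever the bytes are equal.
    # The shorter operand behaves as if padded with "\0" bytes, so the
    # extra characters of the longer string show up as differences.
    my $mask = $s1 ^ $s2;
    my @positions;
    while ($mask =~ /[^\0]/g) {
        push @positions, $-[0];    # offset where this match starts
    }
    return @positions;
}

print join(",", diff_positions("test", "text")), "\n";     # prints 2
print join(",", diff_positions("abc", "abcde")), "\n";     # prints 3,4
```

The regex walks the mask one non-null byte at a time, and `$-[0]` gives the offset of each hit, which is the same mechanism the full script uses in its main loop.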
./diftxt -h
usage: diftxt [-v ] string1 string2
-v = Verbose
diftxt [-V|--version]
diftxt [-h|--help] "This help!"
Examples: diftxt test text
diftxt "This is a test" "this is real"
Place Holders: space = "·" , no character = "ζ"
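The placeholder rule from the help text can be sketched on its own like this. The function name `display_char` is made up for illustration; the two code points are the ones the script itself uses (U+00B7 for a space, U+03B6 for "no character").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Emit UTF-8 so the placeholder characters print correctly.
binmode STDOUT, ":encoding(UTF-8)";

# Hypothetical helper (not part of diftxt): return the character of $str
# at $pos in displayable form.
sub display_char {
    my ($str, $pos) = @_;
    # Past the end of the string there is no character: show zeta.
    return "\x{03B6}" if $pos >= length $str;
    my $c = substr($str, $pos, 1);
    # A space would be invisible in the output: show a middle dot.
    return $c eq " " ? "\x{00B7}" : $c;
}

print display_char("a b", 0), "\n";   # prints a
print display_char("a b", 1), "\n";   # prints the middle dot
print display_char("a b", 7), "\n";   # prints zeta
```

Checking the position against `length` first avoids calling `substr` outside the string, so no warning has to be suppressed.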
cat ./diftxt
----------- cut ✂----------
#!/usr/bin/perl -w
use strict;
use warnings;
use Getopt::Std;
my %options=();
getopts("Vhv", \%options);
my $helptxt='
usage: diftxt [-v ] string1 string2
-v = Verbose
diftxt [-V|--version]
diftxt [-h|--help] "This help!"
Examples: diftxt test text
diftxt "This is a test" "this is real"
Place Holders: space = "·" , no character = "ζ"';
my $Version = "initial-release 1.0 - Quincey Craig 11/21/2012";
print "$helptxt\n\n" if defined $options{h};
print "$Version\n" if defined $options{V};
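### exactly two strings are required; otherwise show usage
### (unless help or version was already printed) and stop
if (@ARGV != 2) {
if (not defined $options{h} and not defined $options{V}) {usage()};
exit;
}
my $s1 = $ARGV[0];
my $s2 = $ARGV[1];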
my $mask = $s1 ^ $s2;
# setup unicode output to STDOUT
binmode DATA, ":utf8";
my $ustring = <DATA>;
binmode STDOUT, ":utf8";
my $_DIFF = '';
my $_CHAR1 = '';
my $_CHAR2 = '';
sub usage
{
print "\n";
print "usage: diftxt [-v ] string1 string2\n";
print " -v = Verbose \n";
print " diftxt [-V|--version]\n";
print " diftxt [-h|--help]\n\n";
exit;
}
sub main
{
print "\nOrig\tDiff\tPos\n----\t----\t----\n" if defined $options{v};
while ($mask =~ /[^\0]/g) {
### past the end of a string there is no character, so use the zeta placeholder
if ($-[0] >= length($s2)) {$_CHAR2 = "\x{03B6}"} else {$_CHAR2 = substr($s2,$-[0],1)};
if ($_CHAR2 eq " ") {$_CHAR2 = "\x{00B7}"};
if ($-[0] >= length($s1)) {$_CHAR1 = "\x{03B6}"} else {$_CHAR1 = substr($s1,$-[0],1)};
if ($_CHAR1 eq " ") {$_CHAR1 = "\x{00B7}"};
### Print verbose Data
print $_CHAR1, "\t", $_CHAR2, "\t", $+[0], "\n" if defined $options{v};
### Build difference list
$_DIFF = "$_DIFF$_CHAR2";
### Build mask
substr($s1,"$-[0]",1) = "\x{00B7}";
} ### end loop
print "\n" if defined $options{v};
print "$_DIFF, ";
print "Mask: \"$s1\"\n";
} ### end main
if ($#ARGV == 1) {main()};
__DATA__
substr
example with a benchmark? Then we could use it as a baseline against which to compare our potential solutions. Also, these aren't Unicode strings, right? (They seem like genetic information...) Will the input always be in a narrow subset of characters (i.e. [ACTG-])? – Findley