I am looking for an efficient way to find the longest possible match of a search string within a main string, tolerating up to n mismatches in the main string.
E.g., main strings:
- AGACGTACTACTCTACTAGATGCA*TACTCTAC*
- AGACGTACTACTCTACTAGATGCA*TACTCTAC*
- AGACGTACTACTCTACAAGATGCA*TACTCTAC*
- AGACGTACTACTTTACAAGATGCA*TACTCTAC*
Search string:
- TACTCTACT: this should be considered a match for all of the main strings above.
Also, there could be a case where only part of the search string is present at the end of the main string, and I would like to pick that up as well.
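To make the requirement concrete, here is a brute-force sketch of what I would count as a match. It is only a reference for the matching rule, not the efficient solution I am after: it costs roughly len(main) × len(query) comparisons per main string. The function name `best_match`, the `min_overlap` parameter, and the ranking rule (most matching characters first, then fewest mismatches) are my own assumptions for illustration, and only substitutions are counted as mismatches (no insertions or deletions).

```python
def best_match(main, query, max_mismatches, min_overlap=1):
    """Brute-force reference: best alignment of `query` inside `main`.

    Returns (start, overlap, mismatches), or None if nothing qualifies.
    """
    best, best_key = None, None
    # Let the window run off the end of `main`, so that a prefix of
    # `query` sitting at the very end of `main` is also considered.
    for start in range(len(main) - min_overlap + 1):
        window = main[start:start + len(query)]
        # Hamming-style count over the overlapping characters only.
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches > max_mismatches:
            continue
        # Assumed ranking: most matching characters, then fewest mismatches.
        key = (len(window) - mismatches, -mismatches)
        if best_key is None or key > best_key:
            best, best_key = (start, len(window), mismatches), key
    return best


query = "TACTCTACT"
# First example: exact hit in the middle of the main string.
print(best_match("AGACGTACTACTCTACTAGATGCATACTCTAC", query, 2))  # (8, 9, 0)
# Fourth example: the partial hit at the end beats the 2-mismatch hit.
print(best_match("AGACGTACTACTTTACAAGATGCATACTCTAC", query, 2))  # (24, 8, 0)
```

The tuples are (start index, number of overlapping characters, mismatches). Raising `min_overlap` keeps very short end-of-string overlaps from being reported as matches.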
I would appreciate it if you could give me some pointers.
PS: I will have one search string and about 100 million main strings to search for it in.
Thanks! -Abhi
TACTCTAC* is a better match than TACTTTACA in your fourth example – Temekatemerity