I reviewed a lot of literature, but I dont found any information about deleting or insertion substrings into suffix tree. There are only Ukkonen's or McCreight's algorithms for building tree.
The poorest way is to rebuild tree after deleting or inserting a substring. But I think that exists a best way to do it.
For example (positions are counted from 0):
I have suffix tree with "abcdef" and I need to delete symbols from 1 to 3. And then I will have suffix tree with "aef". And then I need to add from position 1 string "as". And after this I will have suffix tree with "aasef".
Can you help me?
you are mixing two tasks in your question, first search for the character, second replace the character. Suffix tree does the first part search the character for you, now you need a second algorithm to replace that character with new character. As the characters are replaced the original suffix tree become invalid, so the tree has to be mapped again to do a second replacement.
What you need is two things, first "suffix array" this will give you more control over searching for characters and their location, second is the "cache algorithm" this will help you with replacement.
I have only just started working with suffix trees, so I might be wrong, but it seems like insertions or deletions can change the tree in pretty radical ways.
"abcdef" is a really trivial suffix tree:
abcdef
├a..$
├b..$
├c..$
├d..$
├e..$
└f$
Adding a 'g' at the end or deleting the 'a' in the beginning is incredibly easy.
But say we shove another 'a' in the middle:
abcadef
├a
│├b..$
│└d..$
├b
├c
├...
We have to go back and check every letter from the beginning to see if we need to insert a node based on this. Same if we have a character from the end:
abafef
├a
│├bafef$
│└fef$
├bafef$
├f
│├ef$
│└$
└ef$
If you now inserted something like "ef" to the end, you'd have to go through and add new nodes all over the place!
Inserting a character looks like it will involve re-examining every character in the string, ie, linear time. Since Ukkonen's algorithm already takes linear time, it shouldn't be worth it to use any dynamic insertion algorithm, you should just regenerate the tree from scratch every time with the confidence that this is still pretty good.
If you don't care about space, you could always cache each step of the tree generation algorithm, then when it comes time for an insertion or deletion at point x, just load up the tree as constructed up to point x.
© 2022 - 2024 — McMap. All rights reserved.