Define your rules:
// 1. A sentence Starts with a Capital letter
// 2. A sentence is preceded by nothing or [.!?], but not [,:;]
// 3. A sentence may be preceded by quotes if not formatted properly, such as ["']
// 4. A sentence may be incorrectly identified in this case if the word following a quote is a Name
Any additional Rules?
Define your Purpose:
// 1. Remove the last sentence
Assumptions:
If you started from the last character in the string of text and worked backwards, then you'd identify the beginning of the sentence as:
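1. The string of text before the character ends in [.?!] OR
2. The string of text before the character ends in ["'] and is followed by a Capital letter (working backwards, you see the capital first)
3. Every sentence-ending [.?!] is followed by a space (working backwards, you meet the space first)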
4. We aren't correcting for html tags
5. These assumptions are not robust and will need to be adapted regularly
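To make the first two assumptions concrete, here is a minimal sketch of them as a predicate. It is only an illustration and not part of the solution below: the name endsSentence and its parameters are made up for this example, and the capital-letter test only covers plain A-Z.

```javascript
// Hypothetical helper: does `chunk` close a sentence, given the chunk that
// follows it in reading order (`nextChunk`)?
function endsSentence(chunk, nextChunk) {
    if (chunk.length === 0) {
        return false;
    }
    var last = chunk.charAt(chunk.length - 1);
    // Assumption 1: the chunk ends in sentence punctuation.
    if (last === '.' || last === '!' || last === '?') {
        return true;
    }
    // Assumption 2: the chunk ends in a quote and the next chunk starts
    // with a capital letter (A-Z only; accented capitals are not handled).
    if (last === '"' || last === "'") {
        return /^[A-Z]/.test(nextChunk);
    }
    return false;
}

console.log(endsSentence('said.', 'He'));     // true
console.log(endsSentence('there?"', 'She'));  // true
console.log(endsSentence('mind:', '"What'));  // false
console.log(endsSentence('fence!",', 'she')); // false
```

Note that this sketch is slightly stricter than the loop further down, which compares the first character with its upper-case form and so also accepts chunks that start with a quote.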
Possible Solution:
Read in your string and split it on the space character to give us chunks of strings to review in reverse.
var characterGroups = $('#this-paragraph').html().split(' ').reverse();
If your string is:
Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."
var originalString = 'Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."';
Then your array in characterGroups would be:
["insane."", ""Something", "as:", "it", "described", "later", "He",
"said.", "quickly", "she", "fence!",", "the", "past", "move", "should", "we",
"think", ""I", "know,", "not", "did", "She", "there?"", "up", "doing", "it",
"is", ""What", "mind:", "to", "came", "that", "thing", "first", "the", "asked",
"I", "over.", "flying", "plane", "a", "saw", "I", "and", "window", "the", "up",
"looked", "I", "harder!", "any", "sentence", "the", "of", ""selection"", "the",
"make", "not", "should", "that", "but", "used", "is", "code", "html", "basic",
"Sometimes", "here.", "text", "more", "some", "Blabla,"]
Note: the <span> tags and others would be removed using the .text() method in jQuery
Each chunk is followed by a space in the original string. Once we have identified where the last sentence starts (by array index), we know which chunk comes right before it, so we can search the original string for the last character of that chunk, a space, and the first chunk of the last sentence, and cut the string just after that character.
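The cut described above can be illustrated on a much shorter string. This is just a sketch with hand-picked values; the names text, marker, cutAt and shortened are made up for the example.

```javascript
var text = 'First one. Second one! Third one.';
// The last sentence starts at "Third" and the chunk before it ends in "!".
var marker = '!' + ' ' + 'Third';
// lastIndexOf points at the "!", so add 1 to keep that character.
var cutAt = text.lastIndexOf(marker) + 1;
var shortened = text.substring(0, cutAt);
console.log(shortened); // First one. Second one!
```

Using lastIndexOf rather than indexOf matters here: the same punctuation-plus-word combination could appear earlier in the text, and we only want the final occurrence.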
Give ourselves a variable to mark if we've found it or not and a variable to hold the index position of the array element we identify as holding the start of the last sentence:
var found = false;
var index = null;
Loop through the array and look for any element ending in [.!?] OR ending in " where the previous element started with a capital letter.
var position = 1, // skip the first one since we know that's the end anyway
    elements = characterGroups.length,
    element = null,
    prevHadUpper = false,
    last = null,
    lookFor = ''; // search string used later to cut the original text
while (!found && position < elements) {
    element = characterGroups[position].split('');
    if (element.length > 0) {
        last = element[element.length - 1];
        // test last character rule
        if (
            last === '.' // ends in '.'
            || last === '!' // ends in '!'
            || last === '?' // ends in '?'
            || (last === '"' && prevHadUpper) // ends in '"' and previous started [A-Z]
        ) {
            found = true;
            index = position - 1;
            lookFor = last + ' ' + characterGroups[position - 1];
        } else {
            // note: non-letters such as '"' also equal their own upper-case form
            prevHadUpper = element[0] === element[0].toUpperCase();
        }
    } else {
        prevHadUpper = false;
    }
    position++;
}
If you run the above script it will correctly identify 'He' as the start of the last sentence.
console.log(characterGroups[index]); // He at index=6
Now you can run through the string you had before:
var trimPosition = originalString.lastIndexOf(lookFor)+1;
var updatedString = originalString.substring(0,trimPosition);
console.log(updatedString);
// Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said.
Run it again and get:
Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?"
Run it again and get:
Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over.
Run it again and get:
Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder!
Run it again and get:
Blabla, some more text here.
Run it again and get (only one sentence is left, so nothing more is removed):
Blabla, some more text here.
So, I think this matches what you're looking for?
As a function:
function trimSentence(string) {
    var found = false;
    var index = null;
    var characterGroups = string.split(' ').reverse();
    var position = 1, // skip the first one since we know that's the end anyway
        elements = characterGroups.length,
        element = null,
        prevHadUpper = false,
        last = null,
        lookFor = '';
    while (!found && position < elements) {
        element = characterGroups[position].split('');
        if (element.length > 0) {
            last = element[element.length - 1];
            // test last character rule
            if (
                last === '.' || // ends in '.'
                last === '!' || // ends in '!'
                last === '?' || // ends in '?'
                (last === '"' && prevHadUpper) // ends in '"' and previous started [A-Z]
            ) {
                found = true;
                index = position - 1;
                lookFor = last + ' ' + characterGroups[position - 1];
            } else {
                // note: non-letters such as '"' also equal their own upper-case form
                prevHadUpper = element[0] === element[0].toUpperCase();
            }
        } else {
            prevHadUpper = false;
        }
        position++;
    }
    // if nothing was found, lookFor is '' and the whole string is returned
    var trimPosition = string.lastIndexOf(lookFor) + 1;
    return string.substring(0, trimPosition);
}
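If you want to reproduce the "run it again" steps in one go, a small wrapper along these lines might help. It is only a sketch: trimRepeatedly is a made-up name, and it takes the trimming function as a parameter so that it does not depend on any particular implementation.

```javascript
// Hypothetical helper: apply a one-sentence trimmer (such as trimSentence
// above) until the text stops changing, collecting each intermediate result.
function trimRepeatedly(text, trimFn) {
    var results = [];
    var current = text;
    var next = trimFn(current);
    while (next !== current) {
        results.push(next);
        current = next;
        next = trimFn(current);
    }
    return results;
}

// Usage, assuming trimSentence from above is in scope:
// trimRepeatedly('One. Two! Three?', trimSentence);
// → ['One. Two!', 'One.']
```

The loop relies on the trimmer eventually returning its input unchanged, which trimSentence does once a single sentence is left.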
It's trivial to make a plugin for it, but beware the ASSUMPTIONS! :)
Does this help?
Thanks,
AE