I have a large amount of badly formatted text with a lot of missing punctuation. Is there any method to segment text into sentences when periods, semicolons, capitalization, etc. are missing?
For example, consider the paragraph: "the lion is called the king of the forest it has a majestic appearance it eats flesh it can run very fast the roar of the lion is very famous".
This text should be segmented into the following sentences:
- the lion is called the king of the forest
- it has a majestic appearance
- it eats flesh
- it can run very fast
- the roar of the lion is very famous
Can this be done or is it impossible? Any suggestion is much appreciated!
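To make the difficulty concrete, here is a minimal sketch of a naive baseline: start a new sentence before each standalone "it". The function name `naive_segment` and the word list `SUBJECT_WORDS` are assumptions made up for this example, not part of any library, and the rule is tuned to the sample paragraph only.

```python
import re

# Hypothetical word list: words assumed to start a new sentence.
# Chosen for the lion example only; real text would need far more than this.
SUBJECT_WORDS = ("it",)

def naive_segment(text, subject_words=SUBJECT_WORDS):
    """Split text at whitespace that is followed by one of subject_words."""
    alternatives = "|".join(re.escape(w) for w in subject_words)
    # Lookahead keeps the subject word at the start of the next segment.
    pattern = r"\s+(?=(?:" + alternatives + r")\b)"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

text = ("the lion is called the king of the forest it has a majestic "
        "appearance it eats flesh it can run very fast the roar of the "
        "lion is very famous")

for sentence in naive_segment(text):
    print("-", sentence)
```

On the sample paragraph this recovers the first three sentences but leaves "it can run very fast the roar of the lion is very famous" as one piece, because "the" also appears in the middle of sentences and cannot be used as a split point. A simple word-list rule is not enough, which is why I am asking for a more robust method.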