I want to take a text file and create bigrams from all words that are not separated by a dot ("."), removing any special characters. I'm trying to do this using Spark and Scala.
For example, this text:
Hello my Friend. How are
you today? bye my friend.
should produce the following bigram counts:
hello my, 1
my friend, 2
how are, 1
you today, 1
today bye, 1
bye my, 1
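Here is a minimal sketch of the per-line logic in plain Scala, written without Spark so it can run on its own. It rests on a few assumptions. A "." ends a sentence. Every other non-alphanumeric character (such as "?") is dropped. Bigrams never cross a line break, which matches the expected output above: it has no "are you" pair. The names `BigramSketch`, `bigrams` and `countBigrams` are hypothetical.

```scala
object BigramSketch {
  // Hypothetical helper: turns one line of text into its bigrams.
  // Assumptions: "." ends a sentence, any other non-alphanumeric
  // character is removed, and bigrams never span a line break.
  def bigrams(line: String): Seq[String] =
    line
      .split("\\.")                            // split the line into sentence fragments at each dot
      .toSeq
      .flatMap { fragment =>
        val words = fragment
          .replaceAll("[^A-Za-z0-9\\s]", "")   // strip special characters such as '?'
          .toLowerCase
          .split("\\s+")
          .filter(_.nonEmpty)                  // drop empty tokens caused by leading whitespace
        // Pair each word with its successor; a lone word yields a window of size 1, which is skipped
        words.sliding(2).filter(_.length == 2).map(_.mkString(" "))
      }

  // Counts bigrams over all lines (a local stand-in for Spark's reduceByKey)
  def countBigrams(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(bigrams).groupBy(identity).map { case (bigram, occurrences) => bigram -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val text = Seq("Hello my Friend. How are", "you today? bye my friend.")
    countBigrams(text).toSeq.sortBy(_._1).foreach { case (bigram, n) => println(s"$bigram, $n") }
  }
}
```

In Spark, the same helper could be plugged into an RDD pipeline such as `sc.textFile(path).flatMap(BigramSketch.bigrams).map((_, 1)).reduceByKey(_ + _)`. Because `textFile` yields one record per line, this keeps the line-boundary assumption. If bigrams should also span line breaks, the text would need to be joined before it is split on dots.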