I would like to modify the script below so that it creates paragraphs out of a random number of the sentences generated by the script. In other words, concatenate a random number (like 1-5) of sentences before adding a newline.
The script works fine as-is, but the output is short sentences separated by a newline. I would like to gather up some sentences into paragraphs.
Any ideas on best practices? Thanks.
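One way to think about the grouping step on its own, separate from the Markov generator, is to collect finished sentences in a list and slice that list into random-sized chunks. The sketch below is only an illustration: `chunk_sentences`, its `low`/`high` parameters, and the sample sentences are hypothetical names, not part of the recipe. It uses `random.randint`, which includes both endpoints, so 1-5 really means 1 to 5.

```python
import random

def chunk_sentences(sentences, low=1, high=5, rng=random):
    """Split a list of sentences into paragraphs of low..high sentences each.

    Hypothetical helper for illustration; not part of the original recipe.
    """
    paragraphs = []
    i = 0
    while i < len(sentences):
        size = rng.randint(low, high)  # inclusive on both ends, unlike randrange
        paragraphs.append(sentences[i:i + size])  # last chunk may be shorter
        i += size
    return paragraphs

# Made-up sample input standing in for generated sentences
sentences = ["Sentence %d." % n for n in range(1, 13)]
for paragraph in chunk_sentences(sentences):
    print(" ".join(paragraph))
    print()  # blank line between paragraphs
```

The same idea can be applied inside the generator loop by counting down a random paragraph size each time a sentence is completed.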
from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences

# Build the table: each pair of consecutive words maps to the words seen after it
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            # Split trailing punctuation off so it becomes its own "word"
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

# Generate sentences
maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword:
        sys.exit()
    if newword in stopsentence:
        print("%s%s%s" % (" ".join(sentence), newword, sentencesep))
        sentence = []
        sentencecount += 1
    else:
        sentence.append(newword)  # ordinary word: add it to the current sentence
    w1, w2 = w2, newword
EDIT 01:
Okay, I have cobbled together a simple "paragraph wrapper," which works well to gather the sentences into paragraphs, but it messed with the output of the sentence generator - I'm getting excessive repetitiveness of the first words, for example, among other issues.
But the premise is sound; I just need to figure out why the functionality of the sentence loop was affected by the addition of the paragraph loop. Please advise if you can see the problem:
# usage: $ python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
paragraphsep = "\n\n"  # String used to separate paragraphs

w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

maxparagraphs = 10
paragraphs = 0  # reset the outer 'while' loop counter to zero
while paragraphs < maxparagraphs:  # start outer loop, until maxparagraphs is reached
    w1 = stopword
    w2 = stopword
    stopsentence = (".", "!", "?")
    sentence = []
    sentencecount = 0  # reset the inner 'while' loop counter to zero
    maxsentences = random.randrange(1, 5)  # random sentences per paragraph (1-4; randrange excludes the upper bound)
    while sentencecount < maxsentences:  # start inner loop, until maxsentences is reached
        newword = random.choice(table[(w1, w2)])  # random word from word table
        if newword == stopword:
            sys.exit()
        elif newword in stopsentence:
            print("%s%s" % (" ".join(sentence), newword), end=" ")
            sentencecount += 1  # increment the sentence counter
        else:
            sentence.append(newword)  # ordinary word: add it to the current sentence
        w1, w2 = w2, newword
    print(paragraphsep)  # newline space
    paragraphs = paragraphs + 1  # increment the paragraph counter
EDIT 02:
Added sentence = [] to the elif branch, as per the answer below. To wit:
elif newword in stopsentence:
    print("%s%s" % (" ".join(sentence), newword), end=" ")
    sentence = []  # I have to be here to make the new sentence start as an empty list!!!
    sentencecount += 1  # increment the sentence counter
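To see why that one line matters, here is a minimal sketch that isolates the sentence buffer from the Markov table. The function `emit_sentences`, its `reset` flag, and the word list are made up for the demonstration; the logic only mirrors the if/else structure of the generator loop, on the assumption that ordinary words are appended to the sentence list.

```python
def emit_sentences(words, reset):
    """Return the sentences produced when the buffer is (or is not) cleared.

    Hypothetical demo function; only the buffering logic is modelled here.
    """
    stopsentence = (".", "!", "?")
    sentence = []
    out = []
    for word in words:
        if word in stopsentence:
            out.append(" ".join(sentence) + word)
            if reset:
                sentence = []  # start the next sentence from an empty list
        else:
            sentence.append(word)
    return out

words = ["a", "b", ".", "c", "d", "!"]
print(emit_sentences(words, reset=True))   # ['a b.', 'c d!']
print(emit_sentences(words, reset=False))  # ['a b.', 'a b c d!']
```

Without the reset, every new sentence is printed with all earlier words still in front of it, which would look like the first words repeating over and over.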
EDIT 03:
This is the final iteration of this script. Thanks to grieve for the help in sorting this out. I hope others can have some fun with this; I know I will. ;)
FYI: There is one small artifact: each paragraph ends with an extra space (the last sentence is printed with end=" "), which you might want to clean up if you use this script. But other than that, a perfect implementation of Markov chain text generation.
# usage: python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences

w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n"
count = random.randrange(1, 5)  # sentences left in this paragraph (1-4; randrange excludes the upper bound)
while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])  # random word from word table
    if newword == stopword:
        sys.exit()
    if newword in stopsentence:
        print("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1  # increment the sentence counter
        count -= 1
        if count == 0:
            count = random.randrange(1, 5)
            print(paragraphsep)  # end the paragraph and leave a blank line
    else:
        sentence.append(newword)  # ordinary word: add it to the current sentence
    w1, w2 = w2, newword
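If the trailing space at the end of each paragraph is a nuisance, one possible cleanup is to stop printing sentence by sentence and instead join finished sentences with the separator, so it only ever appears between items. This is just a sketch under that assumption; `render_paragraphs` and the sample data are hypothetical and not part of the script above.

```python
def render_paragraphs(paragraphs, sentence_sep=" ", paragraph_sep="\n\n"):
    """Join sentences within each paragraph, then join the paragraphs.

    Hypothetical helper: str.join puts separators only between items,
    so no paragraph ends with a stray space.
    """
    return paragraph_sep.join(sentence_sep.join(p) for p in paragraphs)

# Made-up paragraphs, each a list of already generated sentences
paragraphs = [["One.", "Two."], ["Three."]]
text = render_paragraphs(paragraphs)
print(text)
```

In the script this would mean appending each completed sentence to a list for the current paragraph and only printing once the paragraph is full.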