Randomizing text between delimiters

I have this simple input

I have {red;green;orange} fruit and cup of {tea;coffee;juice}

I use Perl to identify patterns between two external brace delimiters { and }, and randomize the fields inside with the internal delimiter ;.

I'm getting this output

I have green fruit and cup of coffee

This is my working Perl script

perl -plE 's!\{(.*?)\}!@x=split/;/,$1;$x[rand@x]!ge' <<< 'I have {red;green;orange} fruit and cup of {tea;coffee;juice}'

My task is to process this input format

I have { {red;green;orange} fruit ; cup of {tea;coffee;juice} } and {nice;fresh} {sandwich;burger}.

As I understand it, the script should first handle the outer braces { ... } in the first part of the text, whose alternatives themselves contain braced groups:

{ {red;green;orange} fruit ; cup of {tea;coffee;juice} }

It should choose a random part, like this

{red;green;orange} fruit

or

cup of {tea;coffee;juice}

Then it goes deeper:

green fruit

After all text is processed, the result may be any of the following

I have red fruit and fresh burger.
I have cup of tea and nice sandwich
I have green fruit and nice burger.
I have cup of coffee and fresh burger.

The script should also parse and randomize longer text. For example:

This {beautiful;perfect} {image;photography}, captured with the { {NASA;ESA} Hubble Telescope ; {NASA;ESA} Hubble Space Telescope} }, is the {largest;sharpest} image ever taken of the Andromeda galaxy { {— otherwise known as M31;— known as M31}; [empty here] }.
This is a cropped version of the full image and has 1.5 billion pixels. { You would need more than {600;700;800} HD television screens to display the whole image. ; If you want to display the whole image, you need to download more than {1;2} Tb. traffic and use 800 HD displays }

An example output could be

This beautiful image, captured with the NASA Hubble Telescope, is the
sharpest image ever taken of the Andromeda galaxy — otherwise known as
M31.
This is a cropped version of the full image and has 1.5 billion
pixels. You would need more than 700 HD television screens to display
the whole image.
Wack answered 24/12, 2015 at 13:2 Comment(0)

Nice challenge. Find a set of braces that contains no interior braces and pick a random item from it, and do that globally. One pass replaces only the innermost braces, so you need to loop over the string until no more matches are found.

use v5.18;
use strict;
use warnings;

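sub rand_sentence {
    my $copy = shift;
    # Each pass resolves the groups that contain no braces (the innermost ones);
    # repeat until a pass makes no substitution.
    1 while $copy =~ s{ \{ ([^{}]+) \} }
                      { my @words = split /;/, $1; $words[rand @words] }xsge;
    return $copy;
}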

my $str = 'I have { {red;green;orange} fruit ; cup of {tea;coffee;juice} } and {nice;fresh} {sandwich;burger}.';
say rand_sentence($str);
say '';

$str = <<'END';
This {beautiful;perfect} {image;photography}, captured with the { {NASA;ESA}
Hubble Telescope ; {NASA;ESA} Hubble Space Telescope }, is the
{largest;sharpest} image ever taken of the Andromeda galaxy { {— otherwise
known as M31;— known as M31}; [empty here] }. This is a cropped version of the
full image and has 1.5 billion pixels. { You would need more than {600;700;800}
HD television screens to display the whole image. ; If you want to display the
whole image, you need to download more than {1;2} Tb.  traffic and use 800 HD
displays }
END

say rand_sentence($str);

Sample output:

I have  orange fruit  and fresh sandwich.

This beautiful photography, captured with the  ESA Hubble Space Telescope , is the
largest image ever taken of the Andromeda galaxy  — otherwise
known as M31. This is a cropped version of the
full image and has 1.5 billion pixels.  If you want to display the
whole image, you need to download more than 1 Tb.  traffic and use 800 HD
displays
Rocambole answered 24/12, 2015 at 14:49 Comment(2)
Why do you use srand? - Niles
I thought there was a good reason, but (reading the docs) no. - Rocambole

Going non-greedy is a good thought, but it doesn't quite do the trick with nested braces. Match only groups that contain no braces, and add a loop:

perl -plE 'while(s!\{([^{}]*)\}!@x=split/;/,$1;$x[rand@x]!ge){}'

Note that your sample input has unmatched braces, so the output contains a spurious '}'.

Perloff answered 24/12, 2015 at 13:32 Comment(0)

TXR solution. There are many ways to approach this.

Let's assume we're reading the data from standard input. How about we read the data in records which are delimited not by the usual newline character, but rather the braced-choices pattern? We do this by creating a record adapter object over the standard input stream. The third argument to the record-adapter function is a Boolean which indicates that we want to keep the terminating delimiter (the part which matches the record-delimiting regex).

Thus, if the data looks like "foo bar {bra;ces} xyzzy {a;b;c} d\n", it turns into these records: "foo bar {bra;ces}", "xyzzy {a;b;c}" and "d\n".

We then process these records as if they were lines of text using the extraction language. They fall into two patterns: lines which end in the braces pattern, and lines which don't. The latter are just echoed. The former are treated as required by the random brace substitution.

We also initialize *random-state* so that the PRNG is seeded to produce a different pseudo-random sequence on each run. If make-random-state is given no arguments, it creates a random state object initialized from system parameters like the process ID and system time:

@(do (set *random-state* (make-random-state)))
@(next @(record-adapter #/{[\w;]+}/ *stdin* t))
@(repeat)
@  (cases)
@*text{@switch}
@    (do (put-string `@text@(first (shuffle (split-str switch ";")))`))
@  (or)
@text
@    (do (put-string text))
@  (end)
@(end)

Test run:

$ cat data
I have {red;green;orange} fruit and cup of {tea;coffee;juice}.
$ txr rndchoose.txr < data
I have red fruit and cup of tea.
$ txr rndchoose.txr < data
I have orange fruit and cup of tea.
$ txr rndchoose.txr < data
I have green fruit and cup of coffee.
Chuu answered 9/5, 2016 at 22:58 Comment(0)
