Given a constituency parse tree (Penn Treebank bracket notation) like
(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))
The original sentence is "You could say that they regularly catch a shower, which adds to their exhilaration and joie de vivre."
How can the clauses be extracted and turned back into plain text? The idea is to split at S and SBAR nodes (to preserve the clause type, e.g. subordinate):
- (S (NP (PRP You)) (VP (MD could) (VP (VB say)
- (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower))
- (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))
to arrive at
- You could say
- that they regularly catch a shower
- , which adds to their exhilaration and joie de vivre.
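One way to get these clauses without cutting the bracket string by hand is to parse the whole tree and assign every word to its nearest clause node. The following is a minimal pure-Python sketch with no parser library; `TREE`, `parse` and `split_clauses` are hypothetical names, and the boundary rule (start a clause at every SBAR, and at every S that is not the body of an SBAR, so that "that" and "which" stay with their clauses) is an assumption chosen to match the split described above.

```python
import re

# The parse tree from the question, split over several literals for width.
TREE = ("(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) "
        "(S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) "
        "(NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) "
        "(NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) "
        "(FW vivre))))))))))))) (. .)))")


def parse(text):
    """Parse bracket notation into (label, children); words are plain str."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                       # skip "("
        label = tokens[pos]            # the atom right after "(" is the label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                       # skip ")"
        return label, children

    return node()


def split_clauses(tree):
    """Group words by nearest clause node; tags and labels are dropped."""
    clauses = []

    def walk(node, parent_label, current):
        label, children = node
        # Assumed rule: SBAR always opens a clause; S opens one unless it is
        # the body of an SBAR (then it belongs to the SBAR's clause).
        if label == "SBAR" or (label == "S" and parent_label != "SBAR"):
            current = []
            clauses.append(current)
        for child in children:
            if isinstance(child, str):
                current.append(child)
            else:
                walk(child, label, current)

    top = []                           # words outside any S/SBAR, if any
    clauses.append(top)
    walk(tree, None, top)
    return [" ".join(words) for words in clauses if words]


for clause in split_clauses(parse(TREE)):
    print(clause)
# You could say .
# that they regularly catch a shower ,
# which adds to their exhilaration and joie de vivre
```

Punctuation follows the tree rather than the target list: in this parse the comma sits inside the NP "a shower , which ...", so it lands in the "that" clause, and the final period hangs off the top S, so it lands in the main clause. Reproducing the target exactly would need an extra rule, for example moving a comma that directly precedes an SBAR into that SBAR.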
Splitting at S and SBAR seems easy enough. The problem is stripping all the POS tags and phrase labels from the fragments so that only the words remain.
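Because the fragments are not balanced, a strict tree parser would reject them, but the words can still be pulled out with a regular expression: in bracket notation every word sits in a preterminal of the form `(TAG word)`, which has no nested brackets. This is a sketch under the assumption of Penn Treebank-style input (brackets inside words escaped as `-LRB-`/`-RRB-`); `fragment_words`, `detokenize` and `FRAGMENTS` are hypothetical names, and the detokenizer only knows a handful of punctuation marks.

```python
import re

# A preterminal looks like "(TAG word)" with no brackets nested inside it.
# Only preterminals carry words, so matching them strips every POS tag and
# phrase label, even from fragments whose brackets do not balance.
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

# Punctuation tokens that attach to the previous word (no space before them).
NO_SPACE_BEFORE = {",", ".", ";", ":", "!", "?"}


def fragment_words(fragment):
    """Return the words of a (possibly unbalanced) bracketed fragment."""
    return [word for _tag, word in PRETERMINAL.findall(fragment)]


def detokenize(words):
    """Join tokens with spaces, except before closing punctuation."""
    text = ""
    for i, word in enumerate(words):
        if i > 0 and word not in NO_SPACE_BEFORE:
            text += " "
        text += word
    return text


# The three fragments from the question (adjacent literals are concatenated).
FRAGMENTS = [
    "(S (NP (PRP You)) (VP (MD could) (VP (VB say)",
    "(SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) "
    "(VP (VB catch) (NP (NP (DT a) (NN shower))",
    "(, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) "
    "(NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) "
    "(FW de) (FW vivre))))))))))))) (. .)))",
]

for fragment in FRAGMENTS:
    print(detokenize(fragment_words(fragment)))
# You could say
# that they regularly catch a shower
# , which adds to their exhilaration and joie de vivre.
```

The regex ignores bracket balance entirely, which is what makes it usable on hand-cut fragments; the trade-off is that it trusts every `(X y)` pair to be a tag and a word, so it would misread input where words themselves contain unescaped brackets.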