A trie seems like it would work for small strings, but not for large documents, so not sure (1-100's of pages of text). Maybe it is possible to combine an inverted index with a suffix tree to get the best of both worlds. Or perhaps using a b-tree with words stored as nodes, and a trie for each node. Not sure. Wondering what a good data structure would be (b-tree, linked-list, etc.).
I'm thinking of searching documents such as regular books, web pages, and source code, so the idea of storing just words in an inverted index doesn't seem quite right. Would be helpful to know if you need alternative solutions for each or if there is a general one that works for them all, or a combination of them.