You might like to consider the FACTORIE toolkit (http://github.com/factorie/factorie). It is a general library for machine learning and graphical models that happens to include an extensive suite of natural language processing components (tokenization, token normalization, morphological analysis, sentence segmentation, part-of-speech tagging, named entity recognition, dependency parsing, mention finding, coreference).
It is written entirely in Scala and released under the Apache License.
Documentation is currently sparse, but will be improving in the coming months.
For example, once Maven-based installation is finished, you can type at the command line:
bin/fac nlp --pos1 --parser1 --ner1
to launch a multi-threaded NLP server that listens on a socket. Then query it by piping plain text to its port number:
echo "Mr. Jones took a job at Google in New York. He and his Australian wife moved from New South Wales on 4/1/12." | nc localhost 3228
The output, one token per line, is then:
1 1 Mr. NNP 2 nn O
2 2 Jones NNP 3 nsubj U-PER
3 3 took VBD 0 root O
4 4 a DT 5 det O
5 5 job NN 3 dobj O
6 6 at IN 3 prep O
7 7 Google NNP 6 pobj U-ORG
8 8 in IN 7 prep O
9 9 New NNP 10 nn B-LOC
10 10 York NNP 8 pobj L-LOC
11 11 . . 3 punct O
12 1 He PRP 6 nsubj O
13 2 and CC 1 cc O
14 3 his PRP$ 5 poss O
15 4 Australian JJ 5 amod U-MISC
16 5 wife NN 6 nsubj O
17 6 moved VBD 0 root O
18 7 from IN 6 prep O
19 8 New NNP 9 nn B-LOC
20 9 South NNP 10 nn I-LOC
21 10 Wales NNP 7 pobj L-LOC
22 11 on IN 6 prep O
23 12 4/1/12 NNP 11 pobj O
24 13 . . 6 punct O
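If you want to consume this output from your own code, the rows are easy to decode. The sketch below is plain Scala with no FACTORIE dependency. It assumes, judging only from the sample above, that the seven whitespace-separated columns are: token index in the document, token index in the sentence, word, part-of-speech tag, index of the dependency head within the sentence (0 for the root), dependency label, and a named-entity tag in BILOU notation (B = begin, I = inside, L = last, U = single-token unit, O = outside). The names `ServerOutput`, `Row`, `parseRow` and `entities` are made up for this illustration and are not part of FACTORIE.

```scala
import scala.util.Try

// Hypothetical helper, not part of FACTORIE: decodes the server's text output.
object ServerOutput {
  // One output row; the field meanings are inferred from the sample output.
  final case class Row(docIndex: Int, sentIndex: Int, word: String, pos: String,
                       head: Int, depLabel: String, ner: String)

  // Parses one row; returns None for blank lines or lines without seven columns.
  def parseRow(line: String): Option[Row] =
    line.trim.split("\\s+") match {
      case Array(d, s, w, p, h, l, n) =>
        Try(Row(d.toInt, s.toInt, w, p, h.toInt, l, n)).toOption
      case _ => None
    }

  // Groups BILOU-tagged rows into (entity text, entity type) pairs.
  def entities(rows: Seq[Row]): List[(String, String)] = {
    val found = List.newBuilder[(String, String)]
    var open = Vector.empty[String] // words of the entity currently being built
    for (row <- rows) {
      row.ner.split("-", 2) match {
        case Array("U", kind) => // single-token entity
          found += ((row.word, kind))
          open = Vector.empty
        case Array("B", _) => // first token of a multi-token entity
          open = Vector(row.word)
        case Array("I", _) => // middle token
          open = open :+ row.word
        case Array("L", kind) => // last token closes the entity
          found += (((open :+ row.word).mkString(" "), kind))
          open = Vector.empty
        case _ => // "O" or anything unexpected: not part of an entity
          open = Vector.empty
      }
    }
    found.result()
  }
}
```

Applied to the 24 rows above (`ServerOutput.entities(lines.flatMap(ServerOutput.parseRow))`, where `lines` holds the output lines), this should give Jones/PER, Google/ORG, New York/LOC, Australian/MISC and New South Wales/LOC.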
Of course, there is a programmatic API to all this functionality as well. For example:
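import cc.factorie._
import cc.factorie.app.nlp._

// Wrap the raw text in a Document.
val doc = new Document("Education is the most powerful weapon which you can use to change the world.")
// Run a pipeline built around the POS1 part-of-speech tagger over the document.
DocumentAnnotatorPipeline(pos.POS1).process(doc)
// Print each token next to its predicted part-of-speech tag.
for (token <- doc.tokens)
  println("%-10s %-5s".format(token.string, token.posLabel.categoryValue))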
This will output:
Education  NN
is         VBZ
the        DT
most       RBS
powerful   JJ
weapon     NN
which      WDT
you        PRP
can        MD
use        VB
to         TO
change     VB
the        DT
world      NN
.          .