Any Latent Semantic Indexing?
B

6

6

Is there an open-source implementation of LSI (Latent Semantic Indexing) in Java? I want to use such a library in my project. I have seen jLSI, but it implements a different model of LSI, and I want the standard model.

Burmese answered 17/11, 2009 at 4:21 Comment(1)
Thanks for adding the comments about jLSI. - Lalise
W
5

Have you considered LDA (Latent Dirichlet allocation)? I haven't looked into it closely myself, but I ran into the same problem with LSI recently (patents). From what I understand, LDA is a related and more powerful technique. The Wikipedia article at http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.

Wild answered 26/12, 2009 at 21:23 Comment(0)
D
1

A Google search for "java LSI" leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know if it's closer to standard LSI than the jLSI implementation is.

That thread also mentions that LSI is patented and that there aren't a lot of implementations of it. So if you need a standard implementation, you may have to use a language other than Java.

Doradorado answered 7/12, 2009 at 19:46 Comment(0)
C
1

The S-Space Package has an open-source implementation of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in the output.) It's a fairly scalable approach that uses the thin SVD. I've used it to run LSI on all of Wikipedia with no issues (after removing infrequent terms with fewer than 5 occurrences).
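To make the idea concrete, here is a minimal sketch in plain Java of the steps described above: prune infrequent terms, build the term-document matrix, and extract the leading singular value and right singular vector. It is not the S-Space or SemanticVectors API. The class `LsiSketch` and its methods are hypothetical names made up for illustration, and simple power iteration stands in for a real thin-SVD routine, so it only recovers a single latent dimension.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of S-Space or SemanticVectors.
class LsiSketch {

    // Keep only terms that occur at least minCount times across all documents.
    static List<String> vocabulary(List<String[]> docs, int minCount) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, so row order is stable
        for (String[] doc : docs) {
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        List<String> vocab = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                vocab.add(e.getKey());
            }
        }
        return vocab;
    }

    // Term-document matrix: rows are terms, columns are documents, entries are raw counts.
    static double[][] termDocument(List<String[]> docs, List<String> vocab) {
        double[][] a = new double[vocab.size()][docs.size()];
        for (int j = 0; j < docs.size(); j++) {
            for (String term : docs.get(j)) {
                int i = vocab.indexOf(term); // linear scan; fine for a toy example
                if (i >= 0) {
                    a[i][j] += 1.0;
                }
            }
        }
        return a;
    }

    // Matrix-vector product a * x.
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    // Euclidean norm of a vector.
    static double norm(double[] x) {
        double sum = 0.0;
        for (double value : x) {
            sum += value * value;
        }
        return Math.sqrt(sum);
    }

    // Power iteration on A^T A. Fills v (length = number of documents) with the
    // leading right singular vector and returns the leading singular value.
    // Assumes a non-empty matrix with non-negative entries (e.g. counts).
    static double topSingular(double[][] a, double[] v, int iterations) {
        int n = a[0].length;
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        for (int it = 0; it < iterations; it++) {
            double[] u = multiply(a, v);          // u = A v
            double[] w = new double[n];           // w = A^T u
            for (int i = 0; i < a.length; i++) {
                for (int j = 0; j < n; j++) {
                    w[j] += a[i][j] * u[i];
                }
            }
            double length = norm(w);
            if (length == 0.0) {
                return 0.0;                       // all-zero matrix
            }
            for (int j = 0; j < n; j++) {
                v[j] = w[j] / length;
            }
        }
        return norm(multiply(a, v));              // sigma = ||A v|| for unit v
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                "java lucene search index".split(" "),
                "java search library".split(" "),
                "svd matrix index search".split(" "));
        List<String> vocab = vocabulary(docs, 2); // drop terms seen fewer than 2 times
        double[][] a = termDocument(docs, vocab);
        double[] v = new double[docs.size()];
        double sigma = topSingular(a, v, 100);
        System.out.println("vocabulary: " + vocab);
        for (int j = 0; j < v.length; j++) {
            // One-dimensional LSI coordinate of document j: sigma * v[j].
            System.out.println("doc " + j + " -> " + sigma * v[j]);
        }
    }
}
```

A real LSI pipeline would keep a few hundred dimensions rather than one, would usually apply a weighting such as tf-idf or log-entropy before the decomposition, and would use a sparse SVD library rather than dense arrays.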

As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to using the same thin SVD (SVDLIBJ), so you might check that out if you haven't already.

Cannice answered 17/3, 2011 at 0:31 Comment(0)
W
1

A Google search for NLP tools turns up these slides, which I think might help ...

Waterbuck answered 2/5, 2012 at 7:13 Comment(0)
C
0

I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.

Collection answered 17/5, 2010 at 21:43 Comment(0)
C
0

Have you tried the SemanticVectors package?

http://code.google.com/p/semanticvectors/

Crinkle answered 10/8, 2011 at 12:36 Comment(0)
