Vectorization in Apache Mahout

I am new to Mahout. I need to convert a text file to vectors so that I can use them for classification at a later stage.

Could anybody shed some light on the questions below?

  1. How do I convert a text file to a vector in Mahout? Each line has the format "username|comment about item|rating".
  2. The data will be a few TB. Which classification algorithm implementation can I use with the vectors I create?

Thanks, Arun

Mcardle answered 13/8, 2012 at 10:39

You can check these two examples, which also show and, to some extent, explain how to use the SequenceFile API: here and here.

You should also definitely read this intro to text analysis.
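
To make the idea of "text to vector" concrete for the `username|comment about item|rating` layout from the question, here is a minimal sketch in plain Java. It is not Mahout's API and uses no Mahout classes: the class name `CommentVectorizer` and its methods are hypothetical, it assumes the comment text never contains a `|`, and it uses simple feature hashing (token hash modulo a fixed dimension) as a stand-in for whatever weighting you end up using in Mahout.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Mahout: illustrates turning one
// "username|comment about item|rating" line into a numeric vector.
public class CommentVectorizer {

    // Split a record into its three fields.
    // Assumption: the comment itself never contains the '|' delimiter.
    static String[] parseRecord(String line) {
        String[] parts = line.split("\\|", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return parts;
    }

    // Build a hashed term-frequency vector: each token is mapped to a slot
    // by its hash code, and the slot counts how often tokens land there.
    // Different tokens may collide in the same slot; that is inherent to hashing.
    static double[] vectorize(String comment, int dimensions) {
        double[] vector = new double[dimensions];
        String[] tokens = comment.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            int index = Math.floorMod(token.hashCode(), dimensions);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] fields = parseRecord("arun|great phone, great battery|5");
        double[] features = vectorize(fields[1], 16);
        int label = Integer.parseInt(fields[2].trim()); // rating used as the class label
        System.out.println("label=" + label + " features=" + Arrays.toString(features));
    }
}
```

In a real Mahout job you would write vectors like these into a SequenceFile rather than print them, which is what the examples linked above cover.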

Cremate answered 14/8, 2012 at 8:14