Use brain.js neural network to do text analysis

Asked 5/5, 2016 at 6:10 Answered 17/3, 2018 at 19:18

Solved neural-network text-analysis brain.js

I'm trying to do some text analysis to determine if a given string is... talking about politics. I'm thinking I could create a neural network where the input is either a string or a list of words (ordering might matter?) and the output is whether the string is about politics.

However, the brain.js library only accepts inputs that are a number between 0 and 1 or an array of numbers between 0 and 1. How can I coerce my data so that I can accomplish this task?

Blinders answered 5/5, 2016 at 6:10 Comment(0)

new brain.recurrent.LSTM();

This does the trick: the recurrent LSTM network accepts strings as input and output directly.

Example:

// Requires the brain.js package (npm install brain.js).
const brain = require('brain.js');

// A recurrent LSTM network can be trained on strings directly.
const net = new brain.recurrent.LSTM();
net.train([
  {input: "my unit-tests failed.", output: "software"},
  {input: "tried the program, but it was buggy.", output: "software"},
  {input: "i need a new power supply.", output: "hardware"},
  {input: "the drive has a 2TB capacity.", output: "hardware"},
  {input: "unit-tests", output: "software"},
  {input: "program", output: "software"},
  {input: "power supply", output: "hardware"},
  {input: "drive", output: "hardware"},
]);

// run() returns the text the network generates for this input.
console.log("output = " + net.run("drive"));


output = hardware

See https://github.com/BrainJS/brain.js/issues/65 for a clear explanation of brain.recurrent.LSTM() and how to use it.

Housebound answered 17/3, 2018 at 19:18 Comment(6)

The reason this works, and works well, is because each character represents a neuron in the net. Once you offset a representation of the net's values via a representative neuron, you can feed pretty much anything into a neural network. – Radiotelegram 23/6, 2018 at 0:8

Hear that? ...that's the sound of my mind exploding. Thank you for your answer! – Lens 8/7, 2018 at 7:57

@Lens glad that it helped – Housebound 23/10, 2018 at 9:3

Is there a known limit of how many categories (two categories in this case) you can have, where this approach fails if there are too many? – Carreon 8/1, 2019 at 20:39

@RobertPlummer I would not call this 'working well': if you input "buy me a driver", it will just print out text characters. – Ellamaeellan 29/8, 2022 at 21:51

@Ellamaeellan It does work well; the data provided above just isn't enough. Add more data that is descriptive and accurate, and you get a better-trained model. – Schiffman 14/4, 2023 at 15:8

You need to come up with a model that converts your data to a list of tuples [input, expected_output], where input is a list of numbers between 0 and 1 representing the given words, and output is one number between 0 and 1 representing how close the sentence is to your objective (being political). For example, you might give the sentence "The quick brown cat jumped over the lazy dog" a score of zero, and a sentence like "President shakes off corruption scandal" a score very close to one.

As you can see, your biggest challenge is actually obtaining the data and cleaning it. Converting it to the training format is easy: you could just hash words into numbers between 0 and 1. Make sure to handle different casing and punctuation, and you might want to stem words to get the best results.
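A minimal sketch of that hashing step, under a few assumptions of mine: the helper names (tokenize, hashWord, encode) are hypothetical, the hash is FNV-1a (any well-distributed hash would do), and stemming is left out. It only shows how a sentence could become a fixed-length array of numbers in [0, 1); whether a network learns anything useful from such inputs is a separate question.

```javascript
// Lowercase, replace punctuation with spaces, and split on whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Hypothetical helper: 32-bit FNV-1a hash of a word, scaled into [0, 1).
function hashWord(word) {
  let h = 0x811c9dc5;
  for (let i = 0; i < word.length; i++) {
    h ^= word.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 4294967296;
}

// Encode a sentence as exactly `size` numbers: truncate long sentences,
// pad short ones with zeros.
function encode(text, size) {
  const values = tokenize(text).map(hashWord).slice(0, size);
  while (values.length < size) values.push(0);
  return values;
}

console.log(encode('President shakes off corruption scandal', 8));
```

The same word always maps to the same number regardless of casing or surrounding punctuation, and every sentence comes out the same length, which is what a fixed-size input layer needs.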

One more thing: because you need a uniform data size for each sentence, you can use a term relevance algorithm to rank the importance of the words in your training data set and keep only the top k most relevant words of each sentence.
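One way to sketch the top-k selection is with TF-IDF as the relevance measure. That choice is my assumption, since the answer does not name a specific algorithm, and the function names (buildIdf, topKWords) are hypothetical. Words never seen in the training sentences get a score of zero here, which is a simplification.

```javascript
// Lowercase the text and extract alphanumeric word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Inverse document frequency of each word across the training sentences.
function buildIdf(sentences) {
  const docFreq = new Map();
  for (const sentence of sentences) {
    for (const word of new Set(tokenize(sentence))) {
      docFreq.set(word, (docFreq.get(word) || 0) + 1);
    }
  }
  const idf = new Map();
  for (const [word, df] of docFreq) {
    idf.set(word, Math.log(sentences.length / df));
  }
  return idf;
}

// Keep the k words of a sentence with the highest TF-IDF score.
function topKWords(sentence, idf, k) {
  const counts = new Map();
  for (const word of tokenize(sentence)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([word, tf]) => ({ word, score: tf * (idf.get(word) || 0) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.word);
}

const corpus = [
  'the president met the senate',
  'the cat sat on the mat',
  'the dog chased the cat',
];
const idf = buildIdf(corpus);
console.log(topKWords('the president met the senate', idf, 2));
```

A word such as "the" that appears in every training sentence gets an IDF of zero, so it drops to the bottom of the ranking and is cut first.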

Notum answered 5/5, 2016 at 6:26 Comment(4)

I don't think this would work because the number between 0 and 1 is supposed to be continuous. Meaning "fox" might hash to 0.492 and "president" might hash to 0.493 and to the neural net these inputs are really similar but in reality they aren't. I'm looking into NLP now. – Blinders 5/5, 2016 at 7:46

@arasmussen it doesn't matter if the hashes are close for different words, as long as they're different. The NN only needs to get different numbers for different words, then it'll do the association on its own. Your only problem here is if "fox" and "president" somehow hash to the exact same value, but you can get around that if you choose a good hash function. – Notum 5/5, 2016 at 19:15

I don't think that's correct. Do you have a source? – Blinders 7/5, 2016 at 2:15

Unfortunately I don't, it's just my intuition. NN isn't the best tool for this sort of thing anyway, but it would be good to give it a try and see what comes up. Have some fun using NLTK or some similar tools to lemmatize the text and feed it to the NN and see what comes out. – Notum 7/5, 2016 at 6:56

So apparently text doesn't coerce very well to NN input.

A Naive Bayes Classifier looks like exactly what I want. https://github.com/harthur/classifier
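For anyone curious what that looks like, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the API of the library linked above; the class and method names are made up for illustration, and the four training sentences are far too few for real use.

```javascript
class NaiveBayes {
  constructor() {
    this.docCounts = new Map();  // label -> number of training texts
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total number of words seen
    this.vocab = new Set();      // every distinct word seen in training
    this.totalDocs = 0;
  }

  tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  train(text, label) {
    if (!this.wordCounts.has(label)) {
      this.wordCounts.set(label, new Map());
      this.docCounts.set(label, 0);
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;
    const counts = this.wordCounts.get(label);
    for (const word of this.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocab.add(word);
    }
  }

  // Return the label with the highest log-probability for the text.
  classify(text) {
    const words = this.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const [label, counts] of this.wordCounts) {
      // Log prior plus the sum of smoothed log likelihoods.
      let score = Math.log(this.docCounts.get(label) / this.totalDocs);
      const denom = this.totalWords.get(label) + this.vocab.size;
      for (const word of words) {
        score += Math.log(((counts.get(word) || 0) + 1) / denom);
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

const nb = new NaiveBayes();
nb.train('President shakes off corruption scandal', 'politics');
nb.train('Senate votes on the new election bill', 'politics');
nb.train('The quick brown cat jumped over the lazy dog', 'other');
nb.train('I need a new power supply', 'other');
console.log(nb.classify('corruption scandal in the senate')); // prints "politics"
```

Because it works from word counts per category, there is no need to squeeze words into numbers between 0 and 1 at all.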

Blinders answered 5/5, 2016 at 8:14 Comment(0)
