Numbers of columns of arguments do not match

I am using this example to conduct sentiment analysis of a collection of txt documents in R. The code is:

library(tm)
library(tidyverse)
library(tidytext)
library(glue)
library(stringr)
library(dplyr)
library(wordcloud)
require(reshape2)

files <- list.files(inputdir,pattern="*.txt")

GetNrcSentiment <- function(file){

    fileName <- glue(inputdir, file, sep = "")
    fileName <- trimws(fileName)
    fileText <- glue(read_file(fileName))
    fileText <- gsub("\\$", "", fileText) 

    tokens <- data_frame(text = fileText) %>% unnest_tokens(word, text)

    # get the sentiment from the first text: 
    sentiment <- tokens %>%
        inner_join(get_sentiments("nrc")) %>% # pull out only sentiment words
        count(sentiment) %>% # count the # of positive & negative words
        spread(sentiment, n, fill = 0) %>% # made data wide rather than narrow
        mutate(sentiment = positive - negative) %>% # positive - negative
        mutate(file = file) %>% # add the name of our file
        mutate(year = as.numeric(str_match(file, "\\d{4}"))) %>% # add the year
        mutate(city = str_match(file, "(.*?).2")[2]) 

    return(sentiment)
}

The .txt files are stored in inputdir and have names AB-City.0000, where AB is an abbreviation of a country, City is a city name and 0000 is the year (ranging from 2000 to 2017).

The function works for a single file as expected, i.e. GetNrcSentiment(files[1]) gives me a tibble with proper counts per sentiment. However, when I try to run it for the whole set, i.e.

nrc_sentiments  <- data_frame()

for(i in files){
    nrc_sentiments <- rbind(nrc_sentiments, GetNrcSentiment(i))
}

I get the following error message:

Joining, by = "word"
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

The exact same code works well with longer documents, but gives an error when dealing with shorter texts. It seems that not all sentiments are found in small documents, so the number of columns varies from document to document, which might lead to this error, but I am not sure. I would appreciate any advice on how to fix the problem. If a sentiment is not found, I would want the entry to be equal to zero (if that is the cause of my problem).

As an aside, the bing sentiment function runs through about two dozen files and then gives a different error, which seems to point to the same problem (negative sentiment not found?):

GetBingSentiment <- function(file){
    fileName <- glue(inputdir, file, sep = "")
    fileName <- trimws(fileName)

    fileText <- glue(read_file(fileName))
    fileText <- gsub("\\$", "", fileText)       
    tokens <- data_frame(text = fileText) %>% unnest_tokens(word, text)

    # get the sentiment from the first text: 
    sentiment <- tokens %>%
        inner_join(get_sentiments("bing")) %>% # pull out only sentiment words
        count(sentiment) %>% # count the # of positive & negative words
        spread(sentiment, n, fill = 0) %>% # made data wide rather than narrow
        mutate(sentiment = positive - negative) %>% 
        mutate(file = file) %>% # add the name of our file
        mutate(year = as.numeric(str_match(file, "\\d{4}"))) %>% # add the year
        mutate(city = str_match(file, "(.*?).2")[2])

    # return our sentiment dataframe
    return(sentiment)
}

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'negative' not found. 

EDIT: Following the recommendation by David Klotz I edited the code to

for(i in files){ nrc_sentiments <- dplyr::bind_rows(nrc_sentiments, GetNrcSentiment(i)) } 

As a result, instead of throwing an error, the nrc function produces NA when no words from a certain sentiment are found. However, after 22 joins I get a different error:

Error in mutate_impl(.data, dots) : Evaluation error: object 'negative' not found.

The same error shows up when I run the bing function with dplyr. By the time the functions reach the 22nd document, both data frames contain columns for all sentiments. What may cause the error, and how can I diagnose it?

Hogg answered 12/6, 2018 at 15:40 Comment(0)

dplyr's bind_rows function is more flexible than rbind, at least when it comes to missing columns:

nrc_sentiments <- dplyr::bind_rows(nrc_sentiments, GetNrcSentiment(i))
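
A minimal sketch of the difference, using two made-up one-row tibbles (the names a, b and combined and all values are illustrative assumptions, not the asker's data). bind_rows matches columns by name and fills the gaps with NA, and tidyr's replace_na can then turn those gaps into zeros, which is what the question asks for:

```r
library(dplyr)
library(tidyr)

# Two toy per-file results with different sentiment columns (made-up values).
a <- tibble(file = "doc1", positive = 3, negative = 1)
b <- tibble(file = "doc2", positive = 2)  # no "negative" column

# rbind(a, b) would stop because the column counts differ;
# bind_rows() matches columns by name and fills missing ones with NA.
combined <- bind_rows(a, b)

# Replace the NA left in "negative" with 0.
combined <- replace_na(combined, list(negative = 0))

print(combined)
```

Note that this only helps when combining results. It does not change what happens inside GetNrcSentiment itself.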
Seicento answered 12/6, 2018 at 15:49 Comment(1)
Thank you. It did help, but when I ran the code for(i in files){ nrc_sentiments <- dplyr::bind_rows(nrc_sentiments, GetNrcSentiment(i)) } I get a different error after about 20 joins: Error in mutate_impl(.data, dots) : Evaluation error: object 'negative' not found. The same error shows up when I try the bing function. - Hogg

The input might be missing the "negative" column that is used in the expression mutate(sentiment = positive - negative).
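
One possible way to guard against this, sketched under assumptions: ensure_columns is a hypothetical helper (not part of dplyr or tidyr) that adds any expected column that spread did not create, filled with zeros, before the subtraction runs. The toy matches tibble stands in for the result of the inner_join on a short document that contains no negative words:

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: add each expected column that is absent, filled with 0.
ensure_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(0, nrow(df))
  }
  df
}

# Toy stand-in for the joined tokens of a short document (made-up words).
matches <- tibble(word = c("good", "happy"),
                  sentiment = c("positive", "positive"))

result <- matches %>%
  count(sentiment) %>%                          # one row: positive, n = 2
  spread(sentiment, n, fill = 0) %>%            # only a "positive" column exists
  ensure_columns(c("positive", "negative")) %>% # add "negative" = 0
  mutate(sentiment = positive - negative)       # now both columns are present

print(result)
```

For the nrc lexicon the same helper could be given all ten sentiment names, so that every per-file result has the same columns.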

Avidin answered 12/6, 2018 at 22:30 Comment(1)
Thank you. The function runs through 22 documents before it gives an error. At that point, nrc_sentiments is a 22 x 14 tibble, i.e. it includes a column for each of the 10 nrc sentiments, plus the difference between positive and negative sentiment, file name, year and city. How can I further diagnose the issue? - Hogg
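
On diagnosing this: the columns of the accumulated nrc_sentiments tibble do not matter here, because the mutate step runs inside the function on the table for a single file, before anything is combined. A rough sketch for finding the file that fails is to wrap each call in tryCatch and record the failures. Here process_one is a hypothetical stand-in for GetNrcSentiment, and the file names are made up:

```r
# Hypothetical stand-in for GetNrcSentiment(): fails on one input.
process_one <- function(file) {
  if (file == "c.txt") stop("object 'negative' not found")
  data.frame(file = file, sentiment = 1)
}

files <- c("a.txt", "b.txt", "c.txt", "d.txt")  # made-up names
results <- list()
failed <- character(0)

for (f in files) {
  # Return NULL instead of aborting the loop when a file raises an error.
  out <- tryCatch(
    process_one(f),
    error = function(e) {
      message("Failed on ", f, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(out)) {
    failed <- c(failed, f)
  } else {
    results[[f]] <- out
  }
}

all_results <- do.call(rbind, results)
print(failed)
```

Once the failing file is known, running the first steps of the pipeline on it alone (up to count(sentiment)) should show which sentiment categories are actually present in that document.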
