Adding column to sqlite database

I am trying to add a vector which I generated in R to an SQLite table as a new column. For this I wanted to use dplyr (I installed the most recent development version along with the dbplyr package according to this post here). What I tried:

library(dplyr)
library(DBI) 

#creating initial database and table
dbcon      <- dbConnect(RSQLite::SQLite(), "cars.db") 
dbWriteTable(dbcon, name = "cars", value = cars)
cars_tbl <- dplyr::tbl(dbcon, "cars")

#new values which I want to add as a new column 
new_values <- sample(c("A","B","C"), nrow(cars), replace = TRUE) 

#attempt to add new values as column to the table in the database
cars_tbl %>% mutate(new_col = new_values) #not working

What is an easy way to achieve this (not necessarily with dplyr)?

Merits answered 14/5, 2017 at 19:24 Comment(1)
dplyr is designed to avoid modifying the input data, but you could create a new table based on an existing table/data frame and a mutate() transformation. – Strephonn
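
A minimal sketch of that suggestion, assuming it is acceptable to pull the table into R and write the result back under a new name (the table name cars_new is just an example):

library(dplyr)
library(DBI)

#pull the existing table into R, add the generated vector with mutate(),
#and write the result back as a new table
cars_new <- cars_tbl %>%
  collect() %>%
  mutate(new_col = new_values)

dbWriteTable(dbcon, name = "cars_new", value = cars_new)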

I am not aware of a way of doing this with dplyr, but you can do it with RSQLite directly. The problem is not actually with RSQLite, but the fact that I don't know how to pass an R vector to mutate(). Note that, in your code, something like this would work:

cars_tbl %>% mutate(new_col = another_column / 3.14)

Anyway, here is my alternative. I've created a toy cars data frame.

cars <- data.frame(year=c(1999, 2007, 2009, 2017), model=c("Ford", "Toyota", "Toyota", "BMW"))

I open the connection and actually create the table,

dbcon <- dbConnect(RSQLite::SQLite(), "cars.db")
dbWriteTable(dbcon, name = "cars", value = cars)

Add the new column and check,

dbGetQuery(dbcon, "ALTER TABLE cars ADD COLUMN new_col TEXT")
dbGetQuery(dbcon, "SELECT * FROM cars")
  year  model new_col
1 1999   Ford    <NA>
2 2007 Toyota    <NA>
3 2009 Toyota    <NA>
4 2017    BMW    <NA>

And then you can update the new column; the only tricky thing is that you have to provide a WHERE clause, and in this case I use the year.

new_values <- sample(c("A","B","C"), nrow(cars), replace = TRUE) 
new_values
[1] "C" "B" "B" "B"

dbGetPreparedQuery(dbcon, "UPDATE cars SET new_col = ? where year=?",
                   bind.data=data.frame(new_col=new_values,
                                        year=cars$year))

dbGetQuery(dbcon, "SELECT * FROM cars")
  year  model new_col
1 1999   Ford       C
2 2007 Toyota       B
3 2009 Toyota       B
4 2017    BMW       B

As a unique index, you could always use rownames(cars), but you would have to add it as a column in your dataframe and then in your table.
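
A minimal sketch of that idea, assuming you rebuild the table with the row names stored in an explicit id column (the column name id is my choice, not something the original table already has):

#store the row names as an explicit integer id column before writing the table,
#so later UPDATEs can match on it
cars$id <- as.integer(rownames(cars))
dbWriteTable(dbcon, name = "cars", value = cars, overwrite = TRUE)

dbExecute(dbcon, "ALTER TABLE cars ADD COLUMN new_col TEXT")
dbExecute(dbcon, "UPDATE cars SET new_col = :new_col WHERE id = :id",
          params = data.frame(new_col = new_values, id = cars$id))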

EDIT after suggestion by @krlmlr: it is indeed much better to use dbExecute instead of the deprecated dbGetPreparedQuery,

dbExecute(dbcon, "UPDATE cars SET new_col = :new_col where year = :year",
          params=data.frame(new_col=new_values,
                            year=cars$year))

EDIT after comments: I did not think about this a few days ago, but even though it is SQLite you can use the rowid. I've tested this and it works.

dbExecute(dbcon, "UPDATE cars SET new_col = :new_col where rowid = :id",
          params=data.frame(new_col=new_values,
                            id=rownames(cars)))

You do have to make sure that the rowids in the table are the same as your rownames. Anyway, you can always get your rowids like this:

dbGetQuery(dbcon, "SELECT rowid, * FROM cars")
  rowid year  model new_col
1     1 1999   Ford       C
2     2 2007 Toyota       B
3     3 2009 Toyota       B
4     4 2017    BMW       B
Boltrope answered 15/5, 2017 at 11:30 Comment(5)
dbGetPreparedQuery() is deprecated, you should be able to use dbExecute(..., params = data.frame()) instead. – Strephonn
Thanks for the answer. I have one problem with your solution: the only field in the database table with unique values is the id, which roughly corresponds to rownames(df). The difference is the data type: in the database the type is ident and in R it is character. I also tried it with numeric, but neither works. How can I fix that? – Merits
Hi @Merits, I've modified my answer; if you still find issues just let me know. – Boltrope
Thanks! I figured out that it already worked before. I tried it on the diamonds data set (available via ggplot2) with >50000 rows. It takes a couple of minutes, so I always thought R had crashed. Is it normal that it takes so much time? – Merits
Hi. It's always difficult to assess performance; it depends on so many things, e.g. your machine. Anyway, if you are updating 50K rows, that's quite a lot. For example, this question has some tips on how to improve performance; it's not about R, but some of the tips may apply. I'll try to take a look during the weekend and let you know. Thanks. – Boltrope
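
On the performance question above: two generic SQLite tips that usually help with bulk updates are to index the column used in the WHERE clause and to wrap all the updates in a single transaction, so SQLite does not commit after every row. A sketch, reusing the cars example and the hypothetical id column from the earlier sketch:

library(DBI)

#an index on the key column turns each UPDATE into a lookup instead of a
#full table scan (rowid is already indexed; a plain id column is not)
dbExecute(dbcon, "CREATE INDEX IF NOT EXISTS idx_cars_id ON cars (id)")

#run all parameterised UPDATEs inside one transaction
dbWithTransaction(dbcon, {
  dbExecute(dbcon, "UPDATE cars SET new_col = :new_col WHERE id = :id",
            params = data.frame(new_col = new_values, id = cars$id))
})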
