How to prep transaction data into basket for arules

OK, so I have searched a lot and want to run arules on sales data. I just need to get the data into the right format, set up with the correct "factors" or "variables", and in basket form.

Right now I have sales data with the Order# and the items in each order. Each order is unique (every new order gets a new #, and its rows list the Part#s), but the same items can obviously appear in many orders.

Currently, my data is set up like this:

Order#    Part#   PartDescription
1         A       PartA
1         B       PartB
1         G       PartG
2         R       PartR
3         A       PartA
3         B       PartB
4         E       PartE
5         Y       PartY
6         A       PartA
6         B       PartB
6         F       PartF
6         V       PartV

So R doesn't like it in this form, and I need to get it into a form that arules will accept for analysis.

Yes, I saved it as a text file and have also tried a .csv file, but step-by-step instructions on how to prep or manipulate it in RStudio would be great.

I read that it's supposed to be in basket form, such as:

1 (A, B, G)
2 (R)
3 (A, B)
4 (E)
5 (Y)
6 (A, B, F, V)
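If you want a literal basket-format file like the one above, a minimal base-R sketch is to split the part numbers by order and paste each group onto one line. It assumes the data are already in a data.frame with columns named Order and Part (hypothetical names; read.csv() would turn Order# and Part# into Order. and Part.), and it writes to a temporary file rather than to any real file of yours.

```r
# Rebuild the question's example data (column names Order/Part are assumptions)
data <- data.frame(
  Order = c(1, 1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 6),
  Part  = c("A", "B", "G", "R", "A", "B", "E", "Y", "A", "B", "F", "V"),
  stringsAsFactors = FALSE
)

# One character vector of parts per order, named by order number
baskets <- split(data$Part, data$Order)

# Collapse each order's parts into a single comma-separated line
lines <- vapply(baskets, paste, character(1), collapse = ",")

# Write one transaction per line, which is what arules calls "basket" format
basket_file <- tempfile(fileext = ".csv")
writeLines(lines, basket_file)

# The file could then be read back with:
# arules::read.transactions(basket_file, format = "basket", sep = ",")
print(lines)
```

Note that the order numbers themselves are not written into the file: in basket format each line is simply one transaction, so the line position is the only link back to the Order#.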

If that's not accurate, please correct me. I get the idea, but I need step-by-step instructions, which I can't seem to find anywhere. I've tried using dplyr and tidyr. I have a good understanding of data analysis but need more direct help with RStudio, so if I could just get those steps, I'd understand this better.

Jewfish asked 7/10, 2015 at 16:16 Comment(6)
I also have the data mining plugin for Excel, so if I can do any of the preparation there, let me know. Thank you. - Jewfish
I'm assuming you at least have the data loaded into R as a data.frame? If not, try: data <- read.csv("myfile.csv", comment.char = "") - Foliar
I simply clicked "Import Dataset", and the answer below is already putting my data into the correct basket format. Do I need to load it into R as a data.frame to avoid further problems? What exactly is the right way to load it into R? It's a text file from Excel; should it be .csv? My data appears in the correct columns/rows. What import settings should I select? Thank you! - Jewfish
If the code below runs, it is a data.frame. When you import your data with RStudio's import tool, the command that reproduces the import shows up in the console; it should be similar to what I had above. - Foliar
When I import it with RStudio's import tool, the command it shows is: Sales <- read.csv("Sales.csv") - Jewfish
I tested a small portion of my Sales data two ways: as a .csv file loaded as a data.frame with your code, and as a text file from Excel loaded with Import Dataset in RStudio. Both work with the code posted by jeremycg, so I can confirm it works with either text or CSV. Running the full Sales dataset either way gives me an error when putting it into baskets (see jeremycg's answer below). Thank you. - Jewfish
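Whether the file is a tab-delimited .txt or a .csv, it is worth checking the column names after loading. read.csv() already defaults to comment.char = "", but read.table() defaults to comment.char = "#", which would treat everything after "Order" in the header as a comment. Also, with the default check.names = TRUE, R turns the headers Order# and Part# into the syntactic names "Order." and "Part.". A small sketch using the question's data inline; renaming the columns to Order and Part is an illustrative choice made to match the answer's code, not something R requires.

```r
# The question's data as whitespace-separated text (a stand-in for the real file)
txt <- "Order#    Part#   PartDescription
1         A       PartA
1         B       PartB
1         G       PartG
2         R       PartR
3         A       PartA
3         B       PartB
4         E       PartE
5         Y       PartY
6         A       PartA
6         B       PartB
6         F       PartF
6         V       PartV"

# comment.char = "" stops read.table() from treating '#' as the start of a comment
data <- read.table(text = txt, header = TRUE, comment.char = "",
                   stringsAsFactors = FALSE)

# With check.names = TRUE (the default), '#' has been replaced by '.'
raw_names <- names(data)
print(raw_names)

# Rename to the shorter names the answer's code uses (an illustrative choice)
names(data)[1:2] <- c("Order", "Part")
str(data)
```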

Take a look at the help page for the "transactions" data type for examples on how to get your data in:

library(arules)
?transactions

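For your layout, you want to split the parts by order, then use as() to coerce the resulting list into a transactions object:

library(arules)
# assumes columns named "Order" and "Part" (read.csv() turns "Order#"/"Part#" into "Order."/"Part.")
trans <- as(split(data[, "Part"], data[, "Order"]), "transactions")
inspect(trans)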
  items     transactionID
1 {A,B,G}   1            
2 {R}       2            
3 {A,B}     3            
4 {E}       4            
5 {Y}       5            
6 {A,B,F,V} 6   
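Once the data are a transactions object, the mining step itself is a call to apriori(). A minimal sketch on the question's six orders; the column names and the thresholds (supp = 0.4, conf = 0.8, minlen = 2) are arbitrary illustrative assumptions, not recommended values for real sales data.

```r
library(arules)

# The question's example data (column names Order/Part are assumptions)
data <- data.frame(
  Order = c(1, 1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 6),
  Part  = c("A", "B", "G", "R", "A", "B", "E", "Y", "A", "B", "F", "V"),
  stringsAsFactors = FALSE
)

# Same coercion as in the answer: one item vector per order -> transactions
trans <- as(split(data$Part, data$Order), "transactions")

# Mine rules containing at least two items; thresholds are illustrative only
rules <- apriori(trans,
                 parameter = list(supp = 0.4, conf = 0.8, minlen = 2))

# Show the rules, highest lift first
inspect(sort(rules, by = "lift"))
```

On this toy data only A and B appear in at least 40% of the orders (3 of 6), so the only rules that pass are {A} => {B} and {B} => {A}.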
Foliar answered 7/10, 2015 at 16:39 Comment(4)
Thank you! It ran with the test dummy data. Now I'm running it on the real data (282,292 entries). If I have further questions on preparation, I'll search first and come back here if I can't find an answer. But the main question is answered, as far as I can tell. I'll let it run on the larger data set for now. Thank you! I'm surprised I couldn't find this anywhere. So simple! - Jewfish
So, it works on the dummy data I posted here, but on my real data, done exactly the same way, I get this error: "Error in asMethod(object) : can not coerce list with transactions with duplicated items". I don't understand why. There are duplicate items in the dummy data, and it still puts them into baskets, so why doesn't it do the same for my data? I thought it grouped the rows by the duplicated Order# and put the Part#s into each basket, just like with my dummy data. It's laid out exactly the same way as the dummy data. - Jewfish
Wait, I found that one of the Part#s is the same as one of the Order#s, so they're duplicating there. I ran the trans command on Order# and PartDescription instead; that runs longer but ends with the same error. I also checked in Excel where the Part# equals the Order# and changed that Order# to a unique value, but I still get the error from my previous comment. - Jewfish
A transaction shouldn't list the same item twice; i.e., if order 1 contains "A" twice, you'll get an error. Try running data <- unique(data[ , 1:2 ]) first to remove the duplicates. If that doesn't fix it, ask another question; the comments aren't a good place for troubleshooting a separate problem. - Foliar
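The unique() fix from the comment above can be tried on a toy example in which order 1 lists part A twice (a made-up duplicate). This sketch assumes the first two columns are named Order and Part, and it shows two equivalent ways to drop the repeat before coercing: deduplicating the Order/Part pairs as suggested, or removing repeats inside each order after splitting.

```r
library(arules)

# Toy orders; order 1 lists part "A" twice (a hypothetical duplicate row)
data <- data.frame(
  Order = c(1, 1, 1, 2, 2),
  Part  = c("A", "B", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Option 1 (as in the comment): keep each Order/Part pair only once
dedup <- unique(data[, c("Order", "Part")])
trans1 <- as(split(dedup$Part, dedup$Order), "transactions")

# Option 2: split first, then drop repeated parts within each order
trans2 <- as(lapply(split(data$Part, data$Order), unique), "transactions")

inspect(trans1)
```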

I've had a lot of trouble with coercion (e.g., as(dataname, "transactions")).

I believe this is because I have duplicate records (i.e., the same item purchased more than once in the same transaction, when the data is in 'single' format).

This is what finally worked for me:

library(arules)
Transactions <- read.transactions("Data with tx ids, item names, in single format.csv",
                                  rm.duplicates = TRUE, sep = ",",
                                  format = "single", cols = c(7, 9))

(tx id in column 7, item names in column 9)
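A self-contained sketch of the same read.transactions() approach: it writes the question's data, plus one made-up duplicate row, to a temporary headerless CSV in "single" format (one transaction-id/item pair per line) and reads it back. Because this toy file has only two columns, cols = c(1, 2) stands in for the c(7, 9) used above; the column names and file are assumptions for illustration.

```r
library(arules)

# Question's data plus a duplicated row (order 1, part A) to exercise rm.duplicates
data <- data.frame(
  Order = c(1, 1, 1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 6),
  Part  = c("A", "A", "B", "G", "R", "A", "B", "E", "Y", "A", "B", "F", "V"),
  stringsAsFactors = FALSE
)

# Write a headerless CSV: column 1 = transaction id, column 2 = item
single_file <- tempfile(fileext = ".csv")
write.table(data, single_file, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# format = "single": each line is one (transaction id, item) pair;
# rm.duplicates = TRUE drops the repeated A in order 1
trans <- read.transactions(single_file, format = "single", sep = ",",
                           cols = c(1, 2), rm.duplicates = TRUE)
inspect(trans)
```

Writing the file without a header keeps the sketch simple; if your real file has a header row, check the read.transactions() help page for how to skip it.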

Tide answered 19/10, 2015 at 21:6 Comment(0)
