Basic - T-Test -> Grouping Factor Must have Exactly 2 Levels

I am relatively new to R. For my assignment, I have to start by conducting a t-test that looks at the effect of a politician's party (Conservative or Labour) on their real gross wealth and real net wealth. I have to attempt to estimate the effect of serving in office on wealth using a simple t-test.

The dataset is called takehome.dta.

Labour and Tory are binary variables, where 1 indicates that the politician serves for that party and 0 otherwise.

The variables for wealth are lnrealgross and lnrealnet.

I have imported and attached the dataset, but when I attempt to conduct a simple t-test, I get the following message: "grouping factor must have exactly 2 levels". I'm not quite sure where I am going wrong. Any assistance would be appreciated!
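A minimal sketch of how to check the grouping variable before testing. The data frame below is a made-up stand-in for takehome.dta (the real file could be read with, e.g., haven::read_dta("takehome.dta")); only the column names Labour, Tory, lnrealgross and lnrealnet come from the question, and every value is invented. Subsetting to a single party is just one hypothetical way to end up with a one-level grouping variable.

```r
# Hypothetical stand-in for takehome.dta; all values are made up
set.seed(1)
takehome <- data.frame(
  Labour      = rep(c(1, 0), each = 10),
  Tory        = rep(c(0, 1), each = 10),
  lnrealgross = rnorm(20, mean = 13),
  lnrealnet   = rnorm(20, mean = 12.5)
)

# A two-sample formula t-test needs a grouping variable with exactly 2 distinct values
table(takehome$Tory)

# Works: Tory takes both values 0 and 1
res_gross <- t.test(lnrealgross ~ Tory, data = takehome)
res_net   <- t.test(lnrealnet ~ Tory, data = takehome)

# Fails: after keeping only Tory MPs, Tory is always 1 (a single level)
tories_only <- subset(takehome, Tory == 1)
err <- tryCatch(t.test(lnrealgross ~ Tory, data = tories_only),
                error = function(e) conditionMessage(e))
err
```

Checking table() (or unique()) on the grouping column first is usually the quickest way to see why the formula method refuses to run.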

Magnitogorsk asked 2/4, 2015 at 20:1
Please add sample data and show your code (see these guidelines for making a reproducible example). – Yachting

Are you doing this:

t.test(y~x)

when you mean to do this:

t.test(y,x)

In general, use the ~ when you have data like:

y <- 1:10
x <- rep(letters[1:2], each = 5)

and use the , when you have data like:

y <- 1:5
x <- 6:10
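A short sketch, using the same toy vectors as above, showing that the two forms run the same Welch two-sample test when the data are laid out accordingly: the formula form splits one response by a two-level grouping vector, while the comma form takes the two groups as separate numeric vectors.

```r
y <- 1:10
x <- rep(letters[1:2], each = 5)

# Formula form: one response vector split by a two-level grouping vector
res_formula <- t.test(y ~ x)

# Two-vector form: the same two groups passed as separate numeric vectors
res_vectors <- t.test(y[x == "a"], y[x == "b"])

# Both give the same statistic (-5 here: means 3 and 8, standard error 1)
res_formula$statistic
res_vectors$statistic
```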

I assume you're doing something like:

y <- 1:10
x <- rep(1,10)
t.test(y~x) #instead of t.test(y,x)

because the error suggests that the grouping factor x does not have exactly two distinct values (here it has only one, so there is no variation in x).
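A sketch reproducing the error with the constant x above. It also shows a caveat worth keeping in mind: t.test(y, x) runs, but it treats x as a second sample of numbers (ten 1s) rather than as group labels for y, so it answers a different question than a grouped comparison.

```r
y <- 1:10
x <- rep(1, 10)  # grouping vector with only one distinct value

# The formula method counts the distinct values of x before testing
msg <- tryCatch(t.test(y ~ x), error = function(e) conditionMessage(e))
msg

# The comma form runs, but compares the values of y with the values of x
res <- t.test(y, x)
res$estimate  # mean of y (5.5) and mean of x (1)
```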

Exmoor answered 2/4, 2015 at 21:0
I just tried it with t.test(y,x) as opposed to t.test(y~x), and it worked! Thanks! – Magnitogorsk
What is the difference between the ~ and , operators in this example? – Televise
Hey @cuneytyvz, to see the difference, try: x <- rnorm(10); y <- 1:10; par(mfrow = c(1, 2)); plot(y, x); plot(y ~ x) – Exmoor

The difference between ~ and , is the type of statistical test you are running. ~ gives you the mean difference; this is for dependent samples (e.g. before and after). , gives you the difference in means; this is for independent samples (e.g. treatment and control). These two tests are not interchangeable.

Dionysius answered 26/5, 2017 at 14:32
This is incorrect. The paired argument in the function call is what makes the dependent/independent samples distinction; it is unrelated to whether you pass your samples as two numeric vectors, t.test(x, y, paired = TRUE), or as one longer vector plus a factor with two levels, df <- data.frame(z = c(x, y), f = rep(c("a", "b"), each = length(x))); t.test(z ~ f, paired = TRUE, data = df). (Recent R versions no longer accept paired = TRUE in the formula method; there, use t.test(x, y, paired = TRUE) or t.test(Pair(x, y) ~ 1).) See this sthda easy guide on the topic. – Donn
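A small sketch of the point made in this comment, on made-up before/after data: the formula and two-vector forms give the same independent-samples result, and switching to a paired test is done with paired = TRUE, which is equivalent to a one-sample t-test on the within-pair differences. The vectors x and y are hypothetical.

```r
set.seed(42)
x <- rnorm(8, mean = 10)                  # hypothetical "before" values
y <- x + rnorm(8, mean = 0.5, sd = 0.3)   # hypothetical "after" values

# Independent samples: the two-vector and formula forms agree
df <- data.frame(z = c(x, y), f = rep(c("a", "b"), each = length(x)))
indep_vec  <- t.test(x, y)
indep_form <- t.test(z ~ f, data = df)

# Dependent samples: selected by paired = TRUE, not by ~ versus ,
paired_res <- t.test(x, y, paired = TRUE)

# A paired t-test is a one-sample t-test on the differences
diff_res <- t.test(x - y)
```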

I was having a similar problem and, given the size of my dataset, did not realize that one of my y variables had no values for one of my levels. I had taken a series of gene readings for two groups, and one gene had readings only for group 2 and none for group 1. I hadn't even noticed, but this produced the same error I would get if I had too many levels (rows with a missing y are dropped, so only one group was left). The solution is to remove that y (in my case, that gene) from the analysis, and then the error goes away.
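A minimal sketch of how such genes could be found and skipped before running one t-test per gene. The long-format layout and the names readings, gene, group and value are hypothetical, and the threshold of at least 2 readings per group reflects what the default Welch test needs.

```r
# Hypothetical long-format readings: one row per measurement
readings <- data.frame(
  gene  = rep(c("g1", "g2", "g3"), each = 6),
  group = rep(rep(c("group1", "group2"), each = 3), times = 3),
  value = c(5.1, 4.8, 5.3, 6.0, 6.2, 5.9,
            NA,  NA,  NA,  3.2, 3.5, 3.1,   # g2: nothing recorded for group1
            7.0, 7.4, 6.9, 7.1, 7.8, 7.3)
)

# Count the non-missing readings for each gene in each group
observed <- readings[!is.na(readings$value), ]
counts <- table(observed$gene, observed$group)
counts

# Keep genes with at least 2 readings in both groups
ok_genes <- rownames(counts)[apply(counts >= 2, 1, all)]

# Run one two-sample t-test per remaining gene
results <- lapply(ok_genes, function(g) {
  t.test(value ~ group, data = observed[observed$gene == g, ])
})
names(results) <- ok_genes
sapply(results, function(r) r$p.value)
```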

Drysalter answered 23/7, 2020 at 20:17

You need one independent variable with exactly two groups, for example group 1 and group 2. The independent variable is nominal; for example, call it groups.

You also need a dependent variable, which is metric, for example weight. Suppose the data frame is called numbers.

In R you type:

t.test(dataframe$dependent_variable ~ dataframe$independent_variable, var.equal = TRUE, alternative = "two.sided")

t.test(numbers$weight~numbers$groups, var.equal = TRUE, alternative = "two.sided")
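A runnable sketch of the call above. The numbers data frame and its weight values are invented for illustration; only the names numbers, weight and groups come from the answer. Passing data = numbers is equivalent to writing the numbers$ prefixes.

```r
# Hypothetical data frame "numbers": weight (metric) and groups (nominal, 2 levels)
numbers <- data.frame(
  weight = c(61.2, 58.4, 65.0, 70.3, 62.8, 72.5, 75.1, 69.9, 78.2, 74.0),
  groups = rep(c("group 1", "group 2"), each = 5)
)

# Check that the grouping variable has exactly two levels before testing
stopifnot(length(unique(numbers$groups)) == 2)

# Student's two-sample t-test (equal variances assumed), two-sided
res <- t.test(weight ~ groups, data = numbers,
              var.equal = TRUE, alternative = "two.sided")
res$parameter  # degrees of freedom: n1 + n2 - 2 = 8

# Same test written with the numbers$ prefixes
res2 <- t.test(numbers$weight ~ numbers$groups,
               var.equal = TRUE, alternative = "two.sided")
```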

If the variances are not equal, you write (this is also the default, which gives Welch's t-test):

var.equal = FALSE

If you have a directional (one-sided) hypothesis, you write:

alternative = "greater"

or

alternative = "less"
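A sketch of these options on the same hypothetical numbers data. It shows that leaving var.equal out gives the same Welch test as var.equal = FALSE, and that the one-sided alternatives refer to the mean of the first factor level ("group 1") minus the mean of the second ("group 2"), so the order of the levels decides which direction "greater" and "less" mean.

```r
# Same hypothetical data frame as in the call above
numbers <- data.frame(
  weight = c(61.2, 58.4, 65.0, 70.3, 62.8, 72.5, 75.1, 69.9, 78.2, 74.0),
  groups = rep(c("group 1", "group 2"), each = 5)
)

# var.equal = FALSE is the default, i.e. Welch's t-test
welch_default  <- t.test(weight ~ groups, data = numbers)
welch_explicit <- t.test(weight ~ groups, data = numbers, var.equal = FALSE)

# One-sided tests on mean("group 1") - mean("group 2")
less    <- t.test(weight ~ groups, data = numbers, alternative = "less")
greater <- t.test(weight ~ groups, data = numbers, alternative = "greater")

# For a t-test the two one-sided p-values add up to 1
less$p.value + greater$p.value
```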
Piroshki answered 25/2 at 11:25

© 2022 - 2024 — McMap. All rights reserved.