The other answer and comment already covered why the query was taking so long (you were pulling the entire table into the driver/R earlier than you wanted to), but I wanted to include an example that truly samples the data, plus another approach that gives you more control (i.e., one that uses SparkSQL). When working with Spark I try to do my heavy lifting as actual SQL queries, so I would prefer Option 2, but I included both in case one is more helpful than the other.
library(sparklyr)
library(dplyr)
sc = spark_connect(method = "databricks")
tbl_change_db(sc, "prod")
# Option 1, using a fraction (proportion in this case) to pull a random sample
spark_read_table(sc, "signals", memory = FALSE) %>%
  select(trip_identifier) %>%
  sdf_sample(fraction = .0001, replacement = FALSE, seed = NULL) %>%
  collect() %>% # this is not necessary, but it makes the pull-down to R explicit
  pull(trip_identifier)
# Option 2, using SparkSQL to run the query as you intended (sampling 10 rows)
sc %>%
  sdf_sql("SELECT trip_identifier FROM signals TABLESAMPLE (10 ROWS)") %>%
  collect() %>% # this is not necessary, but it makes the pull-down to R explicit
  pull(trip_identifier)
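One note on Option 1: sampling by fraction won't return an exact number of rows, and with seed = NULL the sample changes on every run. If you want a reproducible sample capped at an exact count, here is a minimal sketch; the seed value of 42 and the head(10) cap are just illustrative assumptions, not part of the original query:
# Option 1 variant: reproducible sample, capped at 10 rows
spark_read_table(sc, "signals", memory = FALSE) %>%
  select(trip_identifier) %>%
  sdf_sample(fraction = .0001, replacement = FALSE, seed = 42) %>% # fixed seed so the sample is repeatable
  head(10) %>% # becomes a LIMIT in the generated SQL, so at most 10 rows come back
  collect() %>%
  pull(trip_identifier)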