I want to limit the rate when fetching data from Kafka. My code looks like this:
df = spark.read.format("kafka") \
    .option("kafka.bootstrap.servers", '...') \
    .option("subscribe", "A") \
    .option("startingOffsets", '''{"A":{"0":200,"1":200,"2":200}}''') \
    .option("endingOffsets", '''{"A":{"0":400,"1":400,"2":400}}''') \
    .option("maxOffsetsPerTrigger", 20) \
    .load() \
    .cache()
However, when I call df.count(), the result is 600. What I expected is 20. Does anyone know why "maxOffsetsPerTrigger" doesn't work?
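For reference, the 600 appears to be exactly the full range between my starting and ending offsets (3 partitions with 200 records each), so it looks as if the limit is not applied at all to this read. Here is a small sanity check in plain Python, with no Spark involved. The helper name `records_in_range` is my own, and the sketch assumes the ending offsets are exclusive and that there are no gaps in the offsets (for example from compaction):

```python
import json

# Same offset ranges as in the Spark options above.
starting = json.loads('{"A":{"0":200,"1":200,"2":200}}')
ending = json.loads('{"A":{"0":400,"1":400,"2":400}}')


def records_in_range(start, end):
    """Sum (end offset - start offset) over every topic partition.

    Assumes end offsets are exclusive and offsets are contiguous.
    """
    return sum(
        end[topic][partition] - offset
        for topic, partitions in start.items()
        for partition, offset in partitions.items()
    )


total = records_in_range(starting, ending)
print(total)  # 600, the same number df.count() returns
```

So the count matches the whole offset range rather than the 20 I set in "maxOffsetsPerTrigger".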