I created a Spark Structured Streaming application using pyspark. I was not running any cluster of spark, just launching pyspark application using submit-job script. In my application I am using Kafka streaming endpoints (for both reader + writer). Given are three scenarios.
NOTE: All of my configurations, topic names, partition counts everything was same in all three of these scenarios.
pyspark application is running on host machine. Kafka connectors were connecting just fine and my application was running smooth, no issue at all
I have created container of pyspark application and when I run my spark application in the container I was seeing this UnknownTopicOrPartitionException
error.
I have created a containerized version of standalone cluster of pyspark with 2 worker nodes and submitted the job to the master. My job is same pyspark application. I was still seeing this UnknownTopicOrPartitionException
error.
What fixed the issue: My topic names contained underscores (_), removing all underscores fixed the issue. my_cool_topic_name
--> mycooltopicname
What I have tried: I created new topics, changed partition count, changed different configuration settings of spark structured streaming kafka configuration settings, nothing worked.
My Kafka broker is hosted on DigitalOcean with Version 3.5. I was using Kafka Client for the same version (i.e spark-sql-kafka-0-10_2.12:3.5.0).
LeaderNotAvailableException
might be more possible to be observed on the producer side.UnknownTopicOrPartitionException
is more likely thrown by ReplicaFetcherThread. – MerchantmanTimeoutException: Timeout of 60000ms expired before the position for partition mylocaltopic could be determined
. – Barbee