I am trying improve the accuracy of Logistic regression algorithm implemented in Spark using Java. For this I'm trying to replace Null or invalid values present in a column with the most frequent value of that column. For Example:-
Name|Place
a |a1
a |a2
a |a2
|d1
b |a2
c |a2
c |
|
d |c1
In this case I'll replace all the NULL values in column "Name" with 'a' and in column "Place" with 'a2'. Till now I am able to extract only the most frequent columns in a particular column. Can you please help me with the second step on how to replace the null or invalid values with the most frequent values of that column.