I have the following code for linear regression using the pyspark.ml package. However, I get this error message on the last line, when the model is being fit:
IllegalArgumentException: u'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.
Does anyone have an idea what is missing?
Is there a replacement in pyspark.ml for LabeledPoint in pyspark.mllib?
from pyspark import SparkContext
from pyspark.ml.regression import LinearRegression
from pyspark.mllib.regression import LabeledPoint
import numpy as np
from pandas import *

data = sc.textFile("/FileStore/tables/w7baik1x1487076820914/randomTableSmall.csv")

def parsePoint(line):
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[1], [values[0]])

points_df = data.map(parsePoint).toDF()
lr = LinearRegression()
model = lr.fit(points_df, {lr.regParam:0.0})
Sample input for parsePoint: 0.656992798279138,2.5834056958606 0.716673783763451,2.36159163031627 0.259623437084048,1.69482312701634 – Korikorie
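To make the sample rows concrete, here is a minimal pure-Python sketch (no Spark needed) of what parsePoint pulls out of each line: the second column becomes the label and the first column becomes the single feature. The helper name `parse_row` and the `sample` string are hypothetical and used only for illustration. The closing comment about `pyspark.ml.linalg.Vectors.dense` is an assumption about how the type mismatch in the error message might be avoided; it is not something tested here.

```python
# Hypothetical stand-in for parsePoint that returns plain Python values
# instead of an mllib LabeledPoint, so it runs without a Spark context.
def parse_row(line):
    values = [float(x) for x in line.split(',')]
    # Second column is the label, first column is the only feature.
    return values[1], [values[0]]

# The three sample rows quoted above, separated by whitespace.
sample = (
    "0.656992798279138,2.5834056958606 "
    "0.716673783763451,2.36159163031627 "
    "0.259623437084048,1.69482312701634"
)

rows = [parse_row(line) for line in sample.split()]
for label, features in rows:
    print(label, features)

# Assumption: in Spark, each (label, features) pair would be built with
# pyspark.ml.linalg.Vectors.dense(features) rather than mllib's LabeledPoint,
# so that the features column has the pyspark.ml vector type.
```

Each returned pair has the shape (label, [feature]), which matches the argument order that LabeledPoint(values[1], [values[0]]) uses in the question.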