How to configure high performance BLAS/LAPACK for Breeze on Amazon EMR, EC2

I am trying to set up an environment to support exploratory data analytics on a cluster. Based on an initial survey of what's out there, my target is to use Scala/Spark with Amazon EMR to provision the cluster.

Currently I'm just trying to get some basic examples up and running to validate that I've got everything configured properly. The problem I am having is that I'm not seeing the performance I expect from the Atlas BLAS libraries on the Amazon machine instance.

Below is a code snippet of my simple benchmark. It's just a square matrix multiply, followed by a short-fat multiply and a tall-thin multiply, to yield a small matrix that can be printed (I wanted to be sure Scala would not skip any part of the computation due to lazy evaluation).

I'm using Breeze for the linear algebra library and netlib-java to pull in the local native libraries for BLAS/LAPACK.

import breeze.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.rdd.RDD
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.SparkConf

import com.github.fommil.netlib.BLAS.{getInstance => blas}

import scala.reflect.ClassTag

object App {

  def NaiveMultiplication(n: Int) : Unit = {

    val vl = java.text.NumberFormat.getIntegerInstance.format(n)
    println(s"Naive Multiplication with vector length $vl")

    // Report which BLAS implementation netlib-java actually loaded
    println(blas.getClass().getName())

    val sm: DenseMatrix[Double] = DenseMatrix.rand(n, n)  // n x n square matrix
    val a: DenseMatrix[Double] = DenseMatrix.rand(2, n)   // short, fat matrix
    val b: DenseMatrix[Double] = DenseMatrix.rand(n, 3)   // tall, thin matrix

    val c: DenseMatrix[Double] = sm * sm                  // the big n x n multiply being timed
    val cNormal: DenseMatrix[Double] = (a * c) * b        // reduce to a small 2 x 3 result so nothing is skipped

    println(s"Dot product of a and b is \n$cNormal")
  }
  }
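
The full object, including the timing wrapper that produces the "Elapsed run time" lines in the output below, isn't shown above. A minimal sketch of what the driver could look like is given here; the argument handling and timing are illustrative reconstructions, not my exact code:

  // Hypothetical driver: args(0) is the matrix size (e.g. 3000) and
  // args(1) ("naive") would select the benchmark variant in the full app.
  def main(args: Array[String]): Unit = {
    val n = args(0).toInt
    val t0 = System.nanoTime()
    NaiveMultiplication(n)
    val elapsed = (System.nanoTime() - t0) / 1e9
    println(f"Elapsed run time:  $elapsed%.1fs")
  }
}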

Based on a web survey of benchmarks, I'm expecting a 3000x3000 matrix multiply to take approx. 2-4s using a native, optimized BLAS library. When I run locally on my MacBook Air this benchmark completes in 1.8s. When I run it on EMR it completes in approx. 11s (using a g2.2xlarge instance, though similar results were obtained on an m3.xlarge instance). As another cross-check, I ran a prebuilt EC2 AMI from the BIDMach project on the same EC2 instance type, g2.2xlarge, and got 2.2s (note: the GPU benchmark for the same calculation yielded 0.047s).

At this point I suspect that netlib-java is not loading the correct lib, but this is where I am stuck. I've gone through the netlib-java README many times, and it seems the ATLAS libs are already installed as required (see below):

[hadoop@ip-172-31-3-69 ~]$ ls /usr/lib64/atlas/
libatlas.a       libcblas.a       libclapack.so      libf77blas.so      liblapack.so      libptcblas.so      libptf77blas.so
libatlas.so      libcblas.so      libclapack.so.3    libf77blas.so.3    liblapack.so.3    libptcblas.so.3    libptf77blas.so.3
libatlas.so.3    libcblas.so.3    libclapack.so.3.0  libf77blas.so.3.0  liblapack.so.3.0  libptcblas.so.3.0  libptf77blas.so.3.0
libatlas.so.3.0  libcblas.so.3.0  libf77blas.a       liblapack.a        libptcblas.a      libptf77blas.a
[hadoop@ip-172-31-3-69 ~]$ cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
[hadoop@ip-172-31-3-69 ~]$ ls /etc/ld.so.conf.d
atlas-x86_64.conf  kernel-4.4.11-23.53.amzn1.x86_64.conf  kernel-4.4.8-20.46.amzn1.x86_64.conf  mysql55-x86_64.conf  R-x86_64.conf
[hadoop@ip-172-31-3-69 ~]$ cat /etc/ld.so.conf.d/atlas-x86_64.conf 
/usr/lib64/atlas

Below I've shown two examples of running the benchmark on the Amazon EMR instance. The first shows when the native system BLAS supposedly loads correctly. The second shows when the native BLAS does not load and the package falls back to the reference implementation. So it does appear to be loading a native BLAS, based on the messages and the timing. Compared to running locally on my Mac, the no-BLAS case runs in approximately the same time, but the native BLAS case runs in 1.8s on my Mac compared to 15s in the case below. The info messages are the same on my Mac and on EMR (other than specific dir/file names, etc.).

[hadoop@ip-172-31-3-69 ~]$ spark-submit --class "com.cyberatomics.simplespark.App" --conf "spark.driver.extraClassPath=/home/hadoop/simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar"   --master local[4] simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar  3000 naive
Naive Multiplication with vector length 3,000
Jun 16, 2016 12:30:39 AM com.github.fommil.jni.JniLoader liberalLoad
INFO: successfully loaded /tmp/jniloader2856061049061057802netlib-native_system-linux-x86_64.so
com.github.fommil.netlib.NativeSystemBLAS
Dot product of a and b is 
1.677332076284315E9   1.6768329748988206E9  1.692150656424957E9   
1.6999000993276503E9  1.6993872020220244E9  1.7149145239563465E9  
Elapsed run time:  15.1s
[hadoop@ip-172-31-3-69 ~]$ 
[hadoop@ip-172-31-3-69 ~]$ spark-submit --class "com.cyberatomics.simplespark.App"  --master local[4] simplespark-0.0.1-SNAPSHOT-jar-with-dependencies.jar  3000 naive
Naive Multiplication with vector length 3,000
Jun 16, 2016 12:31:32 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Jun 16, 2016 12:31:32 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
com.github.fommil.netlib.F2jBLAS
Dot product of a and b is 
1.6640545115052865E9  1.6814609592261212E9  1.7062846398842275E9  
1.64471099826913E9    1.6619129531594608E9  1.6864479674870768E9  
Elapsed run time:  28.7s

At this point my best guess is that it is actually loading a native lib, but that it is loading a generic one. Any suggestions on how I can verify which shared library it is picking up at run time? I tried ldd, but that doesn't seem to work with spark-submit. Or maybe my expectations for Atlas are wrong; it just seems hard to believe AWS would pre-install the libs if they weren't running at reasonably competitive speeds.
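
One way to check from inside the JVM (since ldd doesn't help with spark-submit) is to read /proc/self/maps on Linux and look for the BLAS-related shared objects that have actually been mapped into the process. A rough sketch, purely for illustration:

import scala.io.Source

// Print every shared object mapped into this JVM whose path mentions
// blas, atlas, lapack or netlib (Linux-only: relies on /proc).
def printLoadedBlasLibs(): Unit = {
  val maps = Source.fromFile("/proc/self/maps")
  try {
    maps.getLines()
      .filter(_.contains(".so"))
      .map(_.split("\\s+").last)   // last column is the mapped file's path
      .filter { p =>
        val lower = p.toLowerCase
        Seq("blas", "atlas", "lapack", "netlib").exists(lower.contains(_))
      }
      .toSet
      .foreach(println)
  } finally maps.close()
}

Calling this after the first Breeze multiply (netlib-java loads the native library lazily) would show exactly which .so ended up in the process.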

If you see that the libs are not linked up correctly on EMR, please provide guidance on what I need to do in order for the Atlas libs to get picked up by netlib-java.

Thanks, Tim

Cloudless answered 16/6, 2016 at 1:1 Comment(4)
Could you convert "follow-up" into an answer? It provides useful insights and if there won't be any other answer I would like to award the bounty. Thanks in advance! – Olive
I cannot even recreate the first instance where you pull in the default EMR Atlas native lib. Did you do other things differently (not listed in your post) that caused the native lib to be used instead of F2jBLAS? No matter what I try, I seem to still be getting F2J. – Campus
It's been a long time since I looked at this. I think the way netlib is integrated with Breeze has changed a bit. But as I recall, the key to getting past your issue was to include the .jar that contains the native library stubs. At the time when I first posted the above, the jar HAD to be included explicitly with an additional path variable. It did NOT get included in the fat jar with my application. Here was a good post about setting up netlib to use BLAS: datasciencemadesimpler.wordpress.com/tag/blas – Cloudless
Yeah, I was able to finally figure this out from the above and a few other random threads I found. Your post and answer have been very helpful in my process, appreciate it! – Campus

Follow-up:

My tentative conclusion is that the Atlas libs installed by default on the Amazon EMR instance are simply slow. Either it is a generic build that has not been optimized for the specific machine type, or it is fundamentally slower than other libraries. Using this script as a guide, I built and installed OpenBLAS for the specific machine type where I was running the benchmarks (I also found some helpful info here). Once OpenBLAS was installed, my 3000x3000 matrix multiply benchmark completed in 3.9s (as compared to the 15.1s listed above when using the default Atlas libs). This is still slower than the same benchmark run on my Mac (by a factor of ~2x), but that difference falls in a range that could credibly be due to underlying h/w performance.

Here is a complete listing of the commands I used to install the OpenBLAS libs on Amazon's EMR Spark instance:

# Build OpenBLAS from source; the build auto-detects the host CPU
sudo yum install git
git clone https://github.com/xianyi/OpenBlas.git
cd OpenBlas/
make clean
make -j4

# Install into its own prefix
sudo mkdir /usr/lib64/OpenBLAS
sudo chmod o+w,g+w /usr/lib64/OpenBLAS/
make PREFIX=/usr/lib64/OpenBLAS install

# Drop the ATLAS loader config and point the generic BLAS/LAPACK
# sonames at OpenBLAS instead
sudo rm /etc/ld.so.conf.d/atlas-x86_64.conf
sudo ldconfig
sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/libblas.so
sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/libblas.so.3
sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/libblas.so.3.5
sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/liblapack.so
sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/liblapack.so.3
sudo ln -sf /usr/lib64/OpenBLAS/lib/libopenblas.so /usr/lib64/liblapack.so.3.5
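
To confirm the swap took effect independently of Breeze, a raw dgemm call can be timed straight through netlib-java; if the symlinks above are being picked up, the n = 3000 case should land near the OpenBLAS timing rather than the reference-implementation one. A rough sketch (names and sizes are just illustrative):

import com.github.fommil.netlib.BLAS.{getInstance => blas}
import scala.util.Random

// Time one n x n dgemm through whatever BLAS netlib-java resolved,
// bypassing Breeze entirely.
def timeRawDgemm(n: Int): Unit = {
  println(s"BLAS implementation: ${blas.getClass.getName}")
  val a = Array.fill(n * n)(Random.nextDouble())
  val b = Array.fill(n * n)(Random.nextDouble())
  val c = new Array[Double](n * n)
  val t0 = System.nanoTime()
  // C := 1.0 * A * B + 0.0 * C, column-major, no transposes
  blas.dgemm("N", "N", n, n, n, 1.0, a, n, b, n, 0.0, c, n)
  println(f"dgemm of $n%d x $n%d took ${(System.nanoTime() - t0) / 1e9}%.2fs")
}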
Cloudless answered 24/8, 2016 at 17:54 Comment(2)
The steps I'm running in AWS still claim I am running F2J. What other steps did you take to point Breeze to this native lib? Did you include the breeze-natives dep? What's the significance of including an extra classPath in your question? In your example that is what seemed to pick up the native lib and not F2J. – Campus
@NathanielWendt: For people who will encounter this issue in the future, as I had, I listed out the steps to load the native libraries: gist.github.com/dineshdharme/8bd39cdbc35a09033b9ba2cfd1bdf146 – Oregon

Following the guides below, I was able to get native BLAS libraries to work:

https://spark.apache.org/docs/latest/ml-linalg-guide.html

https://github.com/luhenry/netlib

sbt dependency:

"dev.ludovic.netlib" % "blas" % "3.0.3"

Then, in your Scala code, check with the following statements:

// import statement
import dev.ludovic.netlib.blas.NativeBLAS

// print this in your function
println(s"BLAS dev.ludovic.netlib.blas.NativeBLAS: ${NativeBLAS.getInstance()}")

It should print out something like this:

BLAS dev.ludovic.netlib.blas.NativeBLAS: dev.ludovic.netlib.blas.JNIBLAS@68a4dcc6

EDIT: Actually, just doing the above steps didn't work out for me.

I had to compile the jars separately and get Spark to use them. I have detailed the steps to compile the jar in this gist:

https://gist.github.com/dineshdharme/8bd39cdbc35a09033b9ba2cfd1bdf146

If you don't want to go through the laborious process, I would suggest going directly to STEP 3 and following the instructions.
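
Once the jars are on Spark's classpath, a quick way to see whether the native path actually resolved (rather than silently falling back) is a check along these lines. This is only a sketch; it assumes the pure-Java fallback class JavaBLAS lives in the same dev.ludovic.netlib.blas package, and that NativeBLAS.getInstance() throws when no native implementation can be loaded:

import dev.ludovic.netlib.blas.{BLAS, JavaBLAS, NativeBLAS}

// Prefer the native implementation, but fall back to pure Java and
// report which one was actually selected.
val blasImpl: BLAS =
  try NativeBLAS.getInstance()
  catch { case _: Throwable => JavaBLAS.getInstance() }

println(s"Using BLAS implementation: ${blasImpl.getClass.getName}")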

Oregon answered 2/11, 2023 at 3:21 Comment(0)
