Spark Scale Out on Mac?

Hi,

I created a SparkConf.properties file in ~/ next to OrbitOmero.properties.

However, when I open Orbit Image Analysis, it seems that the Spark executor is not used.

I say this because clicking on “Batch->Retrieve Existing Results” gives the message “Not implemented for this scaleout implementation”.

Am I missing something here?

Also, the log file does not, in fact, say which SparkConf is being loaded.

Thanks a lot

Dear @Trefex

Thanks a lot for trying Orbit in combination with Spark. Indeed, that’s super powerful!

Some steps are needed to get it working. I’ll summarize here:

  1. Create a SparkConf.properties file in ~/ next to OrbitOmero.properties, as you already did.
    Please refer to the Spark scaleout documentation page and be sure to configure a Samba share.
  2. As stated on the Orbit download page, since Orbit 2.8 the Spark dependencies are no longer included; they have to be downloaded separately here and copied into the lib folder of your Orbit installation.
    This will work for Spark 2.2.1. For other versions you have to build your own Spark dependency (let me know if you need help with this, it’s not too hard).
  3. Now you have to activate the Spark scale-out provider. For this you have to modify the config.properties file included in orbit-image-analysis.jar (you might rename the jar to zip, use 7zip or similar to extract and modify the file, then rename it back to jar):
    Comment out the ScaleoutNoop line and activate the
    ScaleOut=com.actelion.research.orbit.imageAnalysis.dal.ScaleoutSpark
    line (see the example after this list).
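
For step 3, after the edit the relevant part of config.properties should look roughly like this (the exact class name of the Noop provider may differ slightly in your file, so keep whatever entry was there, just commented out):

    # ScaleOut=com.actelion.research.orbit.imageAnalysis.dal.ScaleoutNoop
    ScaleOut=com.actelion.research.orbit.imageAnalysis.dal.ScaleoutSpark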

That’s it, now Orbit should work with your Spark cluster.

Regards,
Manuel

Dear @mstritt

Thanks a lot.

The official doc here https://www.orbit.bio/spark-scaleout/ says that Orbit uses Spark 1.6.0. Is this not the case anymore? It would indeed be nice to have a more up-to-date version of Spark running :slight_smile:

I will try your suggestions and report back.

Kind regards,
Christophe

Hi @mstritt

It is now loading the correct SparkExecutor, and I have moved to Spark 2.1.1.

However, it fails with:

java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:748)

Actually, after changing /etc/hosts, Orbit just closes without any additional error message…

I read on some forums that I should edit /etc/hosts, which I did, and also add SPARK_LOCAL_IP="127.0.0.1", but I am not sure where this should be set.

Would you know how to get this running?

Thanks,
T

Spark 2.2.1 is correct, I just corrected the website.

Not sure about your error, but I guess it’s something Spark related. What I always recommend is to get Spark running without Orbit first, e.g. use spark-submit and try the SparkPi example from your client (see e.g. the tutorial here).
Once you get this running, go ahead with Orbit.
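
For reference, a plain client-mode SparkPi submission against a standalone cluster would look roughly like this (the master URL is a placeholder for your cluster, and the examples jar name depends on your Spark/Scala build):

    $SPARK_HOME/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://your-spark-master:7077 \
      --deploy-mode client \
      $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar 100

If that completes and prints an approximation of Pi, the client-to-cluster connectivity itself is fine.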

Side note: The spark-scaleout implementation assumes Spark with client driver mode.

Regards,
Manuel

Good advice. However, how can I pass the environment variables down to the Spark scale-out in Java?

I think the problem comes from there.

I guess what you want is to define the environment variables in conf/spark-env.sh on the master and slave nodes of your Spark cluster - Orbit then just submits the job.
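
As a sketch, that would be a line like the following in conf/spark-env.sh on each node (the address is just an example; use the IP the node should actually bind to):

    # conf/spark-env.sh on the Spark master/slave nodes
    SPARK_LOCAL_IP=192.168.1.10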

In addition you can define further parameters (not environment variables) in the SparkConf.properties as defined here, but I guess that’s not what you want.

Try to deploy SparkPi as described here with deploy-mode client.

I am able to run Spark from my machine against the remote cluster.

This issue is with the Orbit Spark driver.

They mention things to fix here, but I don’t know how to set these in Orbit.

ok, can you please try to add

spark.driver.bindAddress=127.0.0.1

in your SparkConf.properties?

Then Orbit should add this to the Spark environment.
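
Assuming Orbit simply forwards the keys in this file to the Spark configuration, your ~/SparkConf.properties might then look roughly like this (spark.driver.bindAddress is the line that matters here; the master URL and memory setting are just placeholders for whatever your cluster needs):

    # ~/SparkConf.properties
    spark.master=spark://your-spark-master:7077
    spark.driver.bindAddress=127.0.0.1
    spark.driver.memory=2g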

Regards,
Manuel