Hi all!

I’d like to announce that we’ve done a bit of work to make Jupyter Notebooks in SWAP support Spark kernels. This means that you can now run Spark shells in either local mode (on the notebook server) or YARN mode (distributed on the Hadoop cluster) inside a Jupyter notebook. You can then take advantage of fancy Jupyter plotting libraries to make graphs directly from data in Spark.

See https://wikitech.wikimedia.org/wiki/SWAP#Spark for documentation.

This is a new feature, and I’m sure there will be kinks to work out. If you encounter issues or have questions, please respond on this Phabricator ticket, or create a new one and add the Analytics tag.

Enjoy!
-Andrew Otto & Analytics Engineering