Apache Toree

APACHE TOREE ASIM JALIS GALVANIZE



INTRO

ASIM JALIS
Galvanize/Zipfian, Data Engineering
Cloudera, Microsoft, Salesforce
MS in Computer Science from University of Virginia

WHAT IS GALVANIZE’S DATA ENGINEERING IMMERSIVE?

Immersive Peer Learning Environment
Master High-Demand Skills and Technologies
Heart of San Francisco in SOMA

YOU GET TO . . .
Play with Terabytes of Data
Spark, Hadoop, Hive, Kafka, Storm, HBase
Data Science at Scale
Level Up Your Career

FOR MORE INFO
http://galvanize.com

TALK OVERVIEW

WHAT IS THIS TALK ABOUT?
What is Apache Toree?
How can I create IPython/Jupyter notebooks for Spark/Scala?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH IPYTHON/JUPYTER NOTEBOOKS?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH APACHE SPARK?

HOW MANY PEOPLE HERE ARE FAMILIAR WITH SCALA?

LITERATE PROGRAMMING

WHAT IS LITERATE PROGRAMMING?

Proposed by Don Knuth
Write programs for humans, not machines
Programs communicate ideas to others
Default text is documentation or thoughts
Code is explicitly marked out

LITERATE PROGRAM
This program prints hello world.

<<hello.c>>=
<<includes>>
<<main>>
@

Some includes.

<<includes>>=
#include <stdio.h>
@

Print hello world, then exit.

<<main>>=
int main(int argc, char *argv[]) {
    printf("Hello World!\n");
    return 0;
}
@
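
The root chunk <<hello.c>> refers to chunks that are defined later, so a literate-programming tool "tangles" the document: it expands each <<name>> reference into that chunk's body to produce the file the compiler sees. Below is a minimal Scala sketch of that step, not the real noweb tool; the Tangle object is hypothetical and assumes a simplified syntax (a line "<<name>>=" opens a chunk, a line "@" closes it, and each reference sits alone on its own line).

```scala
import scala.collection.mutable

// Hypothetical, simplified noweb-style tangler (NOT the real noweb tool).
// Assumed syntax: "<<name>>=" opens a chunk, "@" closes it, and a
// reference "<<name>>" sits alone on its own line. Text outside any
// chunk is documentation and is ignored when tangling.
object Tangle {
  private val Definition = """<<(.+)>>=""".r
  private val Reference  = """(\s*)<<(.+)>>""".r

  // Collect chunk bodies by name; repeated definitions are concatenated.
  def chunks(source: String): Map[String, Vector[String]] = {
    val acc = mutable.LinkedHashMap.empty[String, Vector[String]]
    var current: Option[String] = None
    for (line <- source.linesIterator) {
      (line.trim, current) match {
        case (Definition(name), _) =>
          current = Some(name)
          if (!acc.contains(name)) acc(name) = Vector.empty
        case ("@", Some(_)) =>
          current = None
        case (_, Some(name)) =>
          acc(name) = acc(name) :+ line
        case (_, None) =>
          () // documentation line: skipped
      }
    }
    acc.toMap
  }

  // Recursively replace references with chunk bodies, keeping the
  // reference's indentation. Cyclic references are not detected.
  def expand(defs: Map[String, Vector[String]], name: String, indent: String = ""): Vector[String] =
    defs.getOrElse(name, sys.error(s"undefined chunk: $name")).flatMap {
      case Reference(ws, ref) => expand(defs, ref, indent + ws)
      case line               => Vector(indent + line)
    }

  def tangle(source: String, root: String): String =
    expand(chunks(source), root).mkString("\n")
}
```

Run on the hello-world program, Tangle.tangle(source, "hello.c") should emit the #include line followed by the body of main, i.e. plain C with the documentation text stripped out.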

WHAT PROBLEM DOES JUPYTER SOLVE?

Suppose you want to share a programming idea or tutorial
You write an article and embed code in it
Now imagine being able to execute the code in the article

HOW IS THIS DIFFERENT FROM CODE COMMENTS?

Commented code is not technical literature
It cannot be published or read as an article
A literate program is an executable article

JUPYTER/IPYTHON

WHAT IS JUPYTER?
Create executable documents
Originally for Python
Supports other systems through kernels

JUPYTER DEMO
Write Markdown text
Write Scala Spark code
Execute
Repeat
Tab-completion

JUPYTER ARCH

Jupyter server
Displays notebook in browser
Executes code on Python runtime
Displays output back into notebook
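
One rough way to picture this round trip: the server forwards a cell's code to the kernel as an execute_request, and the kernel answers with an execute_reply plus an execute_result (sent on the IOPub channel) that the notebook renders. The Scala sketch below models only that shape, in memory. The case classes, ToyKernel, and SumEvaluator are hypothetical; the real protocol (JSON messages over ZeroMQ, signed headers, status and stream messages) is omitted.

```scala
// Toy, in-memory model of one Jupyter execute round trip (hypothetical).
// Message names mirror the Jupyter messaging spec; the real transport
// (signed JSON over ZeroMQ sockets) is deliberately left out.
final case class ExecuteRequest(code: String)
final case class ExecuteResult(executionCount: Int, data: Map[String, String])
final case class ExecuteReply(status: String, executionCount: Int)

// The language-specific part of a kernel: Left = error, Right = output text.
trait Evaluator { def eval(code: String): Either[String, String] }

final class ToyKernel(evaluator: Evaluator) {
  private var executionCount = 0

  // Returns the result to display (if any) and the reply to the server.
  def handle(req: ExecuteRequest): (Option[ExecuteResult], ExecuteReply) = {
    executionCount += 1
    evaluator.eval(req.code) match {
      case Right(text) =>
        (Some(ExecuteResult(executionCount, Map("text/plain" -> text))),
         ExecuteReply("ok", executionCount))
      case Left(_) =>
        (None, ExecuteReply("error", executionCount))
    }
  }
}

// Hypothetical evaluator that only understands integer sums like "1 + 2 + 3".
object SumEvaluator extends Evaluator {
  def eval(code: String): Either[String, String] =
    try Right(code.split('+').map(_.trim.toInt).sum.toString)
    catch { case e: NumberFormatException => Left(e.getMessage) }
}
```

Swapping SumEvaluator for a Python or Scala interpreter is, in spirit, what "supports other systems through kernels" means: the server side of the exchange stays the same.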

FERNANDO PÉREZ
IPython/Jupyter inventor
Particle Physics PhD, University of Colorado Boulder
Now at UC Berkeley
Started IPython in 2001

IS IT IPYTHON OR JUPYTER?
Started out as the IPython Notebook
Not specific to Python
Can work with Scala and other languages
The name Jupyter captures its language independence

APACHE TOREE

WHAT IS TOREE?
Toree is a Jupyter kernel
Executes Scala
Runs the Spark driver/context

HOW DOES TOREE WORK?

TOREE ARCHITECTURE
Jupyter server talks to Toree kernel
Toree kernel talks to Spark driver
Spark driver talks to Spark executors
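
The driver/executor split at the bottom of this chain can be sketched without a cluster. The plain-Scala model below is a simplification and not Spark's API: ToyCluster and mapReduce are hypothetical names, and each Future merely stands in for a task running on an executor. The "driver" cuts the data into partitions, the tasks process their partitions concurrently, and the driver combines the partial results.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Toy model of the driver/executor split (hypothetical, not Spark's API).
object ToyCluster {
  def mapReduce[A, B](data: Seq[A], numPartitions: Int)(f: A => B)(combine: (B, B) => B): B = {
    require(numPartitions > 0 && data.nonEmpty)
    // Driver: split the input into roughly equal partitions.
    val partitionSize = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    val partitions = data.grouped(partitionSize).toVector
    // "Executors": one concurrent task per partition computes a partial result.
    val tasks = partitions.map(p => Future(p.map(f).reduce(combine)))
    // Driver: wait for every task, then merge the partial results.
    Await.result(Future.sequence(tasks), 30.seconds).reduce(combine)
  }
}
```

For example, ToyCluster.mapReduce(1 to 100, 4)((x: Int) => x * x)(_ + _) computes the sum of squares from 1 to 100 in four partitions. In a Toree notebook, where the kernel provides a SparkContext as sc, the analogous cell would be sc.parallelize(1 to 100, 4).map(x => x * x).reduce(_ + _), with real executors doing the per-partition work.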

HOW IS TOREE DIFFERENT FROM ZEPPELIN?

Toree is compliant with the Jupyter protocol
Toree is easy to install and use
Zeppelin does not use the Jupyter protocol
Zeppelin wants to be a platform like Jupyter

TOREE COMMITS
Lots of activity last year
Stabilizing

WHO WROTE TOREE?
Top 2 contributors responsible for 50% of commits
chipsenkbeil has 318 commits
Lull3rSkat3r has 72 commits

ROBERT “CHIP” SENKBEIL AND COREY STUBBS

WHY IS IT CALLED TOREE?
“Nothing special about the name. Some people in the group just picked it out. Some facts though: it is a purposeful misspelling of the Japanese word torii. A torii is a traditional gate for Shinto shrines in Japan.” (Corey Stubbs, personal email)

ACTUAL TORII

TOREE HANDS-ON DEMO

WHAT WE WILL COVER
How to install Toree
How to automatically pull Java libraries into your notebook
How to publish notebooks on nbviewer (http://nbviewer.jupyter.org/)
How to turn notebooks into slide shows

QUICKSTART TUTORIAL
For details on how to do all these things, see https://github.com/asimjalis/apache-toree-quickstart

CONCLUSION

REFERENCES
Apache Toree Quickstart Tutorial: https://github.com/asimjalis/apache-toree-quickstart
Apache Toree on GitHub: https://github.com/apache/incubator-toree
Apache Toree Home: https://toree.incubator.apache.org/

QUESTIONS