
Library - PySpark


Lesson Description


Lesson - #1494 PySpark - SparkFiles


In Apache Spark, you can upload your files using sc.addFile (sc is your default SparkContext) and get the path on a worker using SparkFiles.get. Thus, SparkFiles resolves the paths to files added through SparkContext.addFile().

SparkFiles contains the following classmethods −

  • get(filename)
  • getRootDirectory()


    get(filename)

    It specifies the path of the file that is added through SparkContext.addFile().

    getRootDirectory()

    It specifies the path to the root directory that contains the file added through SparkContext.addFile().
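
    The sparkfile.py example below demonstrates get(). As a complementary illustration of getRootDirectory(), here is a minimal sketch (the script name rootdir_sketch.py and the file /tmp/sample.txt are hypothetical; any existing local file works) showing that every file added with addFile() is placed under the directory it returns.

    ----------------------------------------rootdir_sketch.py------------------------------------
    from pyspark import SparkContext
    from pyspark import SparkFiles

    sc = SparkContext("local", "RootDir Sketch")

    # Hypothetical file for illustration; replace with any file that exists locally
    sc.addFile("/tmp/sample.txt")

    # Root directory that holds every file added through SparkContext.addFile()
    print("Root directory -> %s" % SparkFiles.getRootDirectory())

    # The resolved path of an added file sits inside that root directory
    print("Resolved path  -> %s" % SparkFiles.get("sample.txt"))

    sc.stop()
    ----------------------------------------rootdir_sketch.py------------------------------------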

    ----------------------------------------sparkfile.py------------------------------------
    from pyspark import SparkContext
    from pyspark import SparkFiles

    # Path of the file to distribute and the bare name used to look it up later
    finddistance = "/home/hadoop/examples_pyspark/finddistance.R"
    finddistancename = "finddistance.R"

    # Create a local SparkContext and ship the file to every node
    sc = SparkContext("local", "SparkFile App")
    sc.addFile(finddistance)

    # Resolve the absolute path of the distributed copy of the file
    print("Absolute Path -> %s" % SparkFiles.get(finddistancename))
    ----------------------------------------sparkfile.py------------------------------------

    command - run the script with the following spark-submit command -
    $SPARK_HOME/bin/spark-submit sparkfile.py


    output - the output of the above command is -
    Absolute Path ->
    /tmp/spark-f1170149-af01-4620-9805-f61c85fecee4/userFiles-641dfd0f-240b-4264-a650-4e06e7a57839/finddistance.R
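
    The lesson above notes that SparkFiles.get also resolves the path of an added file on a worker. As a rough worker-side sketch (the script name readlines.py and the input file /tmp/people.txt are hypothetical; any small local text file will do), the following ships a file with addFile() and then opens it inside a map task running on the executors.

    ----------------------------------------readlines.py------------------------------------
    from pyspark import SparkContext
    from pyspark import SparkFiles

    sc = SparkContext("local", "Worker SparkFiles Sketch")

    # Hypothetical input file; replace with any small text file on the local machine
    sc.addFile("/tmp/people.txt")

    def count_chars(_):
        # Runs on the executor: resolve the shipped copy of the file there
        path = SparkFiles.get("people.txt")
        with open(path) as f:
            return len(f.read())

    # One dummy element per partition; each task reads the distributed file
    counts = sc.parallelize([1, 2, 3], 3).map(count_chars).collect()
    print("Characters read per task -> %s" % counts)

    sc.stop()
    ----------------------------------------readlines.py------------------------------------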