Library - PySpark

Lesson Description


Lesson - #1497 PySpark - Serializers


Serialization is used for performance tuning on Apache Spark. All data that is sent over the network, written to disk, or persisted in memory must be serialized, so serialization plays a significant part in costly operations.

PySpark supports custom serializers for performance tuning. The following two serializers are supported by PySpark −

MarshalSerializer

Serializes objects using Python's marshal module. This serializer is faster than PickleSerializer, but supports fewer datatypes.
class pyspark.MarshalSerializer
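To see the datatype limitation in isolation, here is a minimal sketch using Python's built-in marshal module (the mechanism MarshalSerializer is based on), without a Spark cluster. The Point class is a hypothetical example type, not part of PySpark:

```python
import marshal

# marshal round-trips core built-in types (ints, lists, dicts, ...) quickly
data = [0, 2, 4, 6, 8]
blob = marshal.dumps(data)
assert marshal.loads(blob) == data

# but it cannot serialize instances of user-defined classes
class Point:
    pass

try:
    marshal.dumps(Point())
except ValueError:
    print("marshal cannot serialize custom objects")
```

This is why MarshalSerializer is a good fit for RDDs of plain numbers and strings, as in the example below, but not for RDDs of arbitrary objects.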


PickleSerializer

Serializes objects using Python's pickle module. This serializer supports nearly any Python object, but may not be as fast as more specialized serializers.
class pyspark.PickleSerializer
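By contrast, pickle (the mechanism PickleSerializer is based on) handles user-defined objects that marshal rejects. A minimal sketch, again outside Spark; the Point class here is an illustrative example, not a PySpark type:

```python
import pickle

# a user-defined class, which marshal could not serialize
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# pickle round-trips the instance, attributes intact
p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y)
```

This generality is what makes PickleSerializer the safer default when an RDD holds anything beyond basic built-in types.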


Let us see an example of PySpark serialization. Here, we serialize the data using MarshalSerializer.

--------------------------------------serializing.py-------------------------------------
from pyspark.context import SparkContext
from pyspark.serializers import MarshalSerializer
sc = SparkContext("local", "serialization app", serializer=MarshalSerializer())
print(sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10))
sc.stop()
--------------------------------------serializing.py-------------------------------------


Command - The command is as follows −
$SPARK_HOME/bin/spark-submit serializing.py
output - The output of the above command is −
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]