
Using Spark DataFrames for large scale data science

When we first open-sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API: tasks that used to take thousands of lines of code to express could be reduced to dozens.
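The RDD API centers on functional transformations such as map, filter, and reduceByKey, chained into a pipeline. As a rough, plain-Python sketch (not actual Spark code) of that style applied to an ordinary in-memory list, a word count might look like:

```python
from collections import defaultdict

# Plain-Python analogue of an RDD-style pipeline; in Spark these
# steps would be distributed transformations, roughly
# rdd.flatMap(...).filter(...).reduceByKey(...). The data here is
# a made-up example for illustration.
lines = ["spark makes big data simple", "big data at scale"]

# flatMap analogue: split each line into words
words = [w for line in lines for w in line.split()]

# filter analogue: keep words longer than 3 characters
long_words = [w for w in words if len(w) > 3]

# reduceByKey analogue: count occurrences per word
counts = defaultdict(int)
for w in long_words:
    counts[w] += 1

print(dict(counts))
# → {'spark': 1, 'makes': 1, 'data': 2, 'simple': 1, 'scale': 1}
```

In real Spark, each step is lazily evaluated and executed in parallel across a cluster; the appeal of the API is that the code reads the same as the single-machine version above.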
