SigmaWay Blog

SigmaWay Blog aggregates original and third-party content for the site's users. It covers Process Improvement, Lean Six Sigma, Analytics, Market Intelligence, Training, IT Services, and the industries that SigmaWay serves.

Spark or Hadoop: Which is the better Big Data framework?

For many years Hadoop was the leading open source Big Data framework, but the newer Spark has recently overtaken it in popularity. Spark is reported to be up to 100 times faster for some workloads, although it lacks its own distributed storage system. For this reason many projects install Spark on top of Hadoop, so that Spark’s advanced analytics can work on data stored in the Hadoop Distributed File System (HDFS).
What really gives Spark the edge is speed. Spark handles most of its operations ‘in memory’, copying data from distributed physical storage into far faster RAM, whereas Hadoop’s MapReduce writes intermediate results back to disk between steps. As a result, Spark handles advanced data processing tasks such as real-time stream processing and machine learning much faster than Hadoop can.
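The in-memory idea can be illustrated with a toy sketch in plain Python. This is not Spark or Hadoop code; the names `DiskSource`, `run_passes_disk` and `run_passes_memory` are hypothetical and exist only for this illustration. It contrasts re-reading data from storage on every pass with loading it once and reusing the cached copy.

```python
import os
import tempfile


def write_sample(path, n):
    """Write the integers 0..n-1, one per line, to stand in for data on disk."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


class DiskSource:
    """Hypothetical data source that counts how often storage is read."""

    def __init__(self, path):
        self.path = path
        self.reads = 0

    def load(self):
        self.reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]


def run_passes_disk(source, passes):
    """Disk-bound style: every pass goes back to storage for its input."""
    return [sum(x * (p + 1) for x in source.load()) for p in range(passes)]


def run_passes_memory(source, passes):
    """In-memory style: load once, then reuse the cached list on every pass."""
    cached = source.load()
    return [sum(x * (p + 1) for x in cached) for p in range(passes)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        write_sample(path, 1000)
        disk, mem = DiskSource(path), DiskSource(path)
        # Both styles compute the same results; only the number of reads differs.
        assert run_passes_disk(disk, 5) == run_passes_memory(mem, 5)
        print("storage reads, disk style:", disk.reads)
        print("storage reads, in-memory style:", mem.reads)
```

Both functions return the same results, but the in-memory version touches storage once no matter how many passes run. That is why iterative jobs such as machine learning, which scan the same data many times, benefit most from keeping it in RAM.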
However, the two frameworks are not mutually exclusive and do not perform exactly the same tasks. In fact, using them together can give better results than using either one alone.

For more information visit:
http://www.forbes.com/sites/bernardmarr/2015/06/22/spark-or-hadoop-which-is-the-best-big-data-framework/