Apache Spark is a cluster computing framework originally developed at UC Berkeley. Doctors with high-profile clients can use it for managing and analyzing client data. The broader aspects of Big Data in healthcare have been discussed before. This is not exactly a ready-made how-to article; it is aimed at the high-end doctors who can code and who have high-end clients. "Patient" is too rough a word, and in this post-colonial era of the Open Source market its usage should be thoughtful, so we say "client" here.
Apache Spark and Big Data Healthcare: Suiting the Practical Need
Apache Spark is free software. The provider need not depend on a closed source model offered by IBM Cloud or others. The fear for privacy and security of data is greater with high-end clients. We can easily run instances on Rackspace or Amazon without making the large sample really identifiable. With Rackspace, there is no need to depend on third-party developers or freelancers. Apache Spark can be up to 100 times faster than Hadoop MapReduce for in-memory workloads. We can run the analysis and save the output on Cloud Files for off-cluster storage – exactly like running an instance just to test Nginx and saving the configuration for later use. The cluster can then be destroyed to save money.
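Before uploading a sample to Rackspace or Amazon, the directly identifying fields should be stripped or pseudonymized. Here is a minimal sketch in plain Python, assuming hypothetical record fields (`name`, `ssn`, `bp`) and a salted hash as the pseudonym; a real deployment would need a proper de-identification review.

```python
import hashlib

# Hypothetical salt for illustration; in practice keep it secret
# and never upload it alongside the data.
SALT = "replace-with-a-secret-salt"

def pseudonymize(record, id_fields=("name", "ssn")):
    """Replace directly identifying fields with truncated salted
    SHA-256 digests so the uploaded sample is not identifiable."""
    out = dict(record)
    for field in id_fields:
        if field in out:
            digest = hashlib.sha256((SALT + str(out[field])).encode()).hexdigest()
            out[field] = digest[:16]  # truncated digest acts as a stable pseudonym
    return out

record = {"name": "John Doe", "ssn": "123-45-6789", "bp": "120/80"}
safe = pseudonymize(record)
# safe keeps the clinical value ("bp") but the identifiers are now digests
```

Because the same salt plus the same input always yields the same digest, records belonging to one client still link together after pseudonymization, which is what the analysis on the cluster needs.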
Apache Spark supports Java, Scala, and Python APIs – you will need to know at least one of these languages. It has an interactive shell in Python (and Scala), and stream processing can also be done through Spark Streaming.
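To show the shape of the Python API without requiring a running cluster, here is a dependency-free sketch of Spark's classic word count: `flat_map` and `reduce_by_key` below are plain-Python stand-ins for the RDD operations of the same names (the equivalent PySpark chain is shown in the comment).

```python
from collections import defaultdict
from functools import reduce

# Toy stand-ins for Spark's flatMap / reduceByKey, so the data flow
# can be followed without a cluster or a PySpark install.
def flat_map(func, data):
    return [y for x in data for y in func(x)]

def reduce_by_key(func, pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: reduce(func, vs) for k, vs in groups.items()}

lines = ["spark is fast", "spark is free"]
words = flat_map(str.split, lines)        # split each line into words
pairs = [(w, 1) for w in words]           # map each word to (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)  # sum counts per word
# counts == {'spark': 2, 'is': 2, 'fast': 1, 'free': 1}

# In PySpark the same pipeline reads:
#   sc.parallelize(lines).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).collect()
```

The point of the sketch is that Spark programs are chains of small transformations over distributed collections; the framework handles splitting the work across the cluster.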
Apache Spark also supports higher-level libraries: MLlib for machine learning and GraphX for graph processing.
Apache Spark and Big Data Healthcare: Resources
Here are some resources to help:
# Rackspace as a service provider
# Usage of a tool named Chartio
# Quick start guide
# Rackspace uses YARN
# A good blog on healthcare big data
A cost-saving tip for Rackspace – the default number of DataNodes is 2; you can decrease it to 1 to get an effective price of $0.34/hour. Under a dollar for an hour of work – not bad.
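To see why destroying the cluster after the run matters, here is a quick cost sanity check using the $0.34/hour figure quoted above (actual Rackspace pricing may have changed since):

```python
# Rate quoted in the article for a 1-DataNode cluster; pricing may differ today.
HOURLY_RATE = 0.34  # USD per hour

def cluster_cost(hours, rate=HOURLY_RATE):
    """Cost of keeping the cluster alive for the given number of hours."""
    return round(hours * rate, 2)

analysis_run = cluster_cost(3)   # a 3-hour analysis, then destroy the cluster
full_day = cluster_cost(24)      # forgetting it overnight
```

A short analysis run costs about a dollar, while an always-on cluster quietly adds up – hence the pattern of saving output to Cloud Files and tearing the cluster down.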