Apache Spark is a cluster computing framework originally developed at UC Berkeley. Doctors with high-profile clients can use it for managing and analyzing client data. The broader aspects of Big Data in healthcare have been discussed before. This is not exactly a ready-made how-to article; it is aimed at the high-end doctors who can code and who have high-end clients. "Patient" is too rough a word, and in this post-colonial era of the Open Source market its usage should be thoughtful, so we say "client" here.
Apache Spark and Big Data Healthcare: Suiting the Practical Need
Apache Spark is free software. The provider need not depend on a closed source model offered by IBM Cloud or others. The fear for privacy and security of data is greater with high-end clients. We can easily run instances on Rackspace or Amazon without making the large sample really identifiable. With Rackspace, there is no need to depend on third-party developers or freelancers. Apache Spark can be up to 100 times faster than Hadoop MapReduce for in-memory workloads. We can run the analysis and save the output on Cloud Files for off-cluster storage – exactly like running an instance just to test Nginx and saving the configuration for later use. The cluster can then be destroyed to save money.
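Before uploading a sample to Rackspace or Amazon, the directly identifying fields should be stripped or pseudonymized. Here is a minimal sketch in plain Python, assuming hypothetical record fields (`name`, `ssn`, `bp`) and a salted hash as the pseudonym; a real deployment would need a proper de-identification review.

```python
import hashlib

# Hypothetical salt for illustration; in practice keep it secret
# and never upload it alongside the data.
SALT = "replace-with-a-secret-salt"

def pseudonymize(record, id_fields=("name", "ssn")):
    """Replace directly identifying fields with truncated salted
    SHA-256 digests so the uploaded sample is not identifiable."""
    out = dict(record)
    for field in id_fields:
        if field in out:
            digest = hashlib.sha256((SALT + str(out[field])).encode()).hexdigest()
            out[field] = digest[:16]  # truncated digest acts as a stable pseudonym
    return out

record = {"name": "John Doe", "ssn": "123-45-6789", "bp": "120/80"}
safe = pseudonymize(record)
# safe keeps the clinical value ("bp") but the identifiers are now digests
```

Because the same salt plus the same input always yields the same digest, records belonging to one client still link together after pseudonymization, which is what the analysis on the cluster needs.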
Apache Spark supports Java, Scala, and Python APIs – you will need to know at least one of these languages. It has an interactive shell in Python (and Scala), and stream processing can also be done through Spark Streaming.
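To show the shape of the Python API without requiring a running cluster, here is a dependency-free sketch of Spark's classic word count: `flat_map` and `reduce_by_key` below are plain-Python stand-ins for the RDD operations of the same names (the equivalent PySpark chain is shown in the comment).

```python
from collections import defaultdict
from functools import reduce

# Toy stand-ins for Spark's flatMap / reduceByKey, so the data flow
# can be followed without a cluster or a PySpark install.
def flat_map(func, data):
    return [y for x in data for y in func(x)]

def reduce_by_key(func, pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: reduce(func, vs) for k, vs in groups.items()}

lines = ["spark is fast", "spark is free"]
words = flat_map(str.split, lines)        # split each line into words
pairs = [(w, 1) for w in words]           # map each word to (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)  # sum counts per word
# counts == {'spark': 2, 'is': 2, 'fast': 1, 'free': 1}

# In PySpark the same pipeline reads:
#   sc.parallelize(lines).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).collect()
```

The point of the sketch is that Spark programs are chains of small transformations over distributed collections; the framework handles splitting the work across the cluster.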
Apache Spark also supports higher-level libraries: MLlib for machine learning and GraphX for graph processing.
Apache Spark and Big Data Healthcare: Resources
Here are some resources to help:
# Rackspace as a service provider
# Usage of a tool named Chartio
# Quick start guide
# Rackspace uses YARN
# A good blog on healthcare big data
A cost-saving tip for Rackspace – the default number of DataNodes is 2; you can decrease it to 1 to get an effective price of $0.34/hour. Under a dollar for an hour of work – not bad.
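To see why destroying the cluster after the run matters, here is a quick cost sanity check using the $0.34/hour figure quoted above (actual Rackspace pricing may have changed since):

```python
# Rate quoted in the article for a 1-DataNode cluster; pricing may differ today.
HOURLY_RATE = 0.34  # USD per hour

def cluster_cost(hours, rate=HOURLY_RATE):
    """Cost of keeping the cluster alive for the given number of hours."""
    return round(hours * rate, 2)

analysis_run = cluster_cost(3)   # a 3-hour analysis, then destroy the cluster
full_day = cluster_cost(24)      # forgetting it overnight
```

A short analysis run costs about a dollar, while an always-on cluster quietly adds up – hence the pattern of saving output to Cloud Files and tearing the cluster down.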