This article is an informative guide, especially for those who are not used to data science but are interested in developing an application with some AI or data analytics functions.
Earlier we showed how to easily use IBM Watson for text analysis (on Google Docs). This article leans more towards helping to develop something like a WordPress plugin that analyzes the emotion of a post using AI. IBM Watson Studio brings a variety of data science products and services into one environment for working with data and deploying machine learning models. IBM Watson Studio is the new name of the previously discussed IBM Data Science Experience (DSX). In our earlier article, we introduced readers to cloud-based machine learning services, including a working classification.
IBM Watson Studio Basics to Get Started
Watson Studio is a set of tools (and also a collaborative environment) for data scientists, developers and domain experts. It is also an easy set of tools for new developers who want to integrate some machine learning or AI capabilities into their application. Needless to say, Watson Studio is a great source of computing power for data scientists, and IBM’s Cloud Object Storage (which uses OpenStack technology) is a widely used tool.
IBM Watson Studio Local is an IDE for data preparation and data modelling. It integrates with a private cloud backed by powerful CPU and GPU infrastructure and IBM Cloud Object Storage. It can also be integrated with Apache Spark clusters for distributed processing.
IBM Cloud (formerly Bluemix) offers a SaaS version of Watson Studio with a Lite plan, which is free up to a monthly limit of 50 capacity unit-hours and provides 1 virtual CPU and 4 GB RAM. The Lite plan is great for starting to experiment. IBM has a beautiful guided tour for Watson Studio.
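To get a feel for what a monthly limit of 50 capacity unit-hours means in practice, the arithmetic can be sketched as below. The consumption rates in the loop are made-up illustrative values, not IBM's published rates, and `runtime_hours` is a hypothetical helper name; check the current plan documentation for the real rate of the environment you pick.

```python
# Rough budget check for the Lite plan's monthly limit.
LITE_MONTHLY_CUH = 50.0  # monthly capacity unit-hours, as stated for the Lite plan


def runtime_hours(cuh_per_hour, budget_cuh=LITE_MONTHLY_CUH):
    """Hours an environment can run before the monthly budget is used up."""
    if cuh_per_hour <= 0:
        raise ValueError("cuh_per_hour must be positive")
    return budget_cuh / cuh_per_hour


# ASSUMPTION: illustrative consumption rates only, not IBM's actual rates.
for rate in (0.5, 1.0, 2.0):
    print(f"{rate} CUH per hour -> {runtime_hours(rate):.0f} hours per month")
```

The point is simply that a heavier environment uses up the free allowance proportionally faster.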
The basic capabilities of IBM Watson Studio are the same as those of its predecessor, IBM Data Science Experience: a data science environment with Jupyter Notebook, Python, R and Scala to start working in. IBM Watson Studio provides the tools and services to store and catalogue data and models, transform and prepare data for analysis, and analyze data in a collaborative environment. With Watson Studio, we get:
- the capability to work with deep learning (including TensorFlow)
- access to pre-trained models, such as Watson Visual Recognition
- a chance to work with unstructured data
- insight into model management
- a drag-and-drop interface to build analytics models using SPSS Modeler
- easy visualization of insights with dynamic dashboards
We already discussed the basic theoretical aspects of applying machine learning to text recognition and the approaches of deep learning. For a quick recap, we can divide learning into:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
The official examples and help tutorials for IBM SPSS Modeler actually create supervised learning models. With IBM SPSS Modeler, we can build machine learning models by simple drag and drop.
How to Create an Instance of Watson Studio on IBM Cloud (suggested way for new users)
As with any other cloud service (such as the former Bluemix), the user first needs to create an instance of Watson Studio. There will be an associated billing plan and geographic location. The Lite plan should be enough to start testing. Note that Watson Studio does not include SPSS functionality in Peru, Ecuador, Colombia and Venezuela (that is what is officially written; we have not tested it). After creating the service instance, the next step is to create a project, which acts as a container for the datasets, models, deployments, and API credentials. There are several project types, each pre-configured for a specific task usually performed by data scientists. The generic project type for working on any type of asset is Standard. We can also add project collaborators and control their access level. A detailed example guide can be found on IBM’s official site.
How to Install IBM Watson Studio Local (optional way for new users)
IBM Watson Studio Local is an enterprise-grade software solution for data scientists, with data science tools such as RStudio, Spark, Jupyter Notebooks, and Zeppelin notebooks. Integrating it basically means configuring a private cloud. It can be installed on a server running RHEL, on IBM private cloud, on an HDP cluster or on a Cloudera cluster; it can be installed via a web browser and integrated with other clusters. The many ways of installing the local software are there to deliver maximum flexibility. IBM Watson Studio Local has a 60-day trial and a $99/month/user plan. There is a dedicated official website with documentation on IBM Watson Studio Local.
How to Use IBM SPSS Modeler (Drag and Drop)
With IBM SPSS Modeler, we can build machine learning models by simple drag and drop. The visual interface gives us a way to load data, sample it, transform it, apply algorithms, and evaluate predictive model performance through a series of nodes, in order to find patterns or variables. We need some sample CSV data to start working (readers need to use their own data set).
Within Watson Studio, we need to select New Modeler Flow, give it a name, keep everything at the default settings, and then click the Create button. Then, from the Import menu, we drag the Data Asset node onto the stream canvas and select the CSV data file to load in the node settings. If we right-click the node and select Preview, we can see the dataset in detail.
Next, to build a modeler stream, we navigate to Record Operations, pick Sample and drag it onto the canvas. Then we click the visual cue on a node and drag a line from it to connect the nodes. We can right-click Sample and open its settings; keeping the defaults should work. To experiment with algorithms, we navigate to the Modeling menu and find the machine learning models on offer there. We choose the ones we want, drag those nodes onto the canvas and connect them to the Data Types node. We click the small blue triangle in the top menu of the stream canvas to run the streams. After the run ends, orange nodes containing the model performance results will appear. We can right-click each of them to check the results. This is the basic process for creating simple supervised machine learning models with IBM SPSS Modeler.
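For readers who think more easily in code, the flow described above (import data, sample, model, evaluate) can be roughly sketched in plain Python. This is only an illustration of what the chain of nodes does conceptually, not what SPSS Modeler runs internally. The dataset, the column names and all function names are made up for the example, and a simple 1-nearest-neighbour rule stands in for whichever modeling node is chosen.

```python
import csv
import io
import random

# ASSUMPTION: a tiny made-up dataset standing in for the reader's own CSV file.
CSV_TEXT = """
age,income,buys
22,21,no
25,30,no
27,28,no
30,35,no
31,33,no
33,40,no
35,38,no
36,42,no
45,70,yes
48,75,yes
50,80,yes
52,78,yes
55,90,yes
58,85,yes
60,95,yes
62,88,yes
"""
FEATURES = ["age", "income"]  # hypothetical input columns
TARGET = "buys"               # hypothetical label column


def load_rows(text):
    """Data Asset step: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text.strip())))


def sample_rows(rows, fraction, seed=0):
    """Sample step: keep a random fraction of the records."""
    count = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, count)


def split_rows(rows, train_fraction=0.75, seed=0):
    """Partition the records into a training part and a held-out test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict(train, record):
    """Modeling step stand-in: 1-nearest-neighbour on the numeric features."""
    def squared_distance(other):
        return sum((float(other[f]) - float(record[f])) ** 2 for f in FEATURES)
    return min(train, key=squared_distance)[TARGET]


def accuracy(train, test):
    """Evaluation step: share of held-out records predicted correctly."""
    hits = sum(predict(train, record) == record[TARGET] for record in test)
    return hits / len(test)


rows = load_rows(CSV_TEXT)
sampled = sample_rows(rows, fraction=0.75)
train, test = split_rows(sampled)
print(f"{len(train)} training rows, {len(test)} test rows")
print(f"accuracy on held-out rows: {accuracy(train, test):.2f}")
```

Because the label column is known for every training row, this is supervised learning, which matches the kind of model the official SPSS Modeler examples build.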
Machine Learning Models in Jupyter Notebook
Although the title of this article points to drag-and-drop work, Jupyter Notebook is used by many users who are not data scientists. It is not particularly complex and gives us the opportunity to run commands, so it is practical to touch on this part.
We can create machine learning models in a (Jupyter) notebook in the usual way, by writing the code and using the IBM-specific machine learning API. After a model is created, we can train and deploy it. The official examples include a sample notebook showing the commands and steps to load data, create an Apache Spark model, create a pipeline, and train the model. To install the required packages, we can run the command in the usual format:
!pip install wget --user --upgrade
The linked sample notebook shows how to make a CSV file available on GPFS. It is simple and easy:
import os
import wget
filename = 'GoSales_Tx_NaiveBayes.csv'
if not os.path.isfile(filename):  # download only when the file is not already present
    link_to_data = 'https://apsportal.ibm.com/exchange-api/v1/entries/8044492073eb964f46597b4be06ff5ea/data?accessKey=9561295fa407698694b1e254d0099600'
    filename = wget.download(link_to_data)
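If the third-party wget package is not available in the notebook environment, the same download-once idea can be sketched with the Python standard library alone. The helper name `fetch_if_missing` and its `download` parameter are our own illustrative choices, not part of the IBM sample notebook.

```python
import os
import urllib.request


def fetch_if_missing(url, filename, download=urllib.request.urlretrieve):
    """Download `url` to `filename` unless that file already exists.

    `download` is any callable taking (url, filename); it is a parameter
    only so the helper can be tried out without network access.
    """
    if not os.path.isfile(filename):
        download(url, filename)
    return filename


# Usage (needs network access, so it is left commented out here):
# filename = fetch_if_missing(link_to_data, 'GoSales_Tx_NaiveBayes.csv')
```

Running the cell a second time then skips the download, which saves time when the notebook is re-run often.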
The Jupyter notebook carried over from IBM Data Science Experience is our favourite part. It is clearly the open-source part; even with the Watson API included, it remains “open”. IBM Watson Studio’s drag-and-drop interface avoids the complexity of training machine learning models and makes the data preparation and model deployment workflow easy for newbies. For data scientists, it works as a ready-to-use platform with huge computing resources.
However, like anything on this earth, IBM Watson Studio has some cons as well. Watson Studio does not yet support exporting a fully trained model, and it has no way to import a machine learning model trained on a different system (do not misunderstand: neural networks can be exported in TensorFlow, Keras, PyTorch, Caffe and JSON formats for sharing). It is a wizard-like development environment, not a container-based training environment. If these points are not a big headache for the developer, then there are probably not many cons.
Watson Studio is delivered mostly like a PaaS offering for machine learning. Seen from that angle, it is really successful from a development point of view.