Today, data scientists utilize a suite of tools, from fast databases to open source machine learning platforms, in order to conduct sophisticated analyses. When data scientists model, train, and score data, they often query directly from the database as needed. This process of writing one-off SQL queries, exporting results into a data science tool, and then returning results for further development or visualization can be cumbersome and error-prone, often with delays between each step.
What data scientists need is a way to streamline the path from data storage through analytics processing and machine learning work, and then back to storage for further visualization and analysis. It’s also important that data governance and trust in data are maintained throughout this process, as both are essential to meaningful results and insights.
Looker and Dataiku teamed up to streamline this process and empower data scientists with an end-to-end solution for data prep, machine learning, and access to trusted data for rapid reporting and insights.
The Looker query plugin for Dataiku DSS enables users to seamlessly take prepared and trusted Looker query results and send them directly to Dataiku for machine learning, eliminating manual and cumbersome steps often involved in creating a data science workflow.
With this integration, the processes for completing predictive analytics projects are simplified for tasks like:
Data prep for analytics
Dataiku provides cleansing, processing, and enrichment capabilities that return valuable datasets to Looker users. The resulting datasets can be piped into any of Dataiku’s data warehouse integrations, where Looker can then query the data.
Data prep for LookML
Looker's repository of business logic as defined in LookML helps prepare data for further ML model training and evaluation. Plus, with the Looker query plugin, users can surface trusted datasets curated in Looker to the Dataiku platform. From there, Dataiku’s automated ML features make it possible for any analyst, no matter their level of expertise, to reap the benefits of data science without advanced coding, engineering, or even training.
Completing the loop
Users can access machine learning training and evaluation results in Looker for visualization, exploration, and action by writing back results into the source database.
To complete any of the above tasks, users simply tell Dataiku which Looker query to run and use as its training and test dataset, and that’s it! The graphic below shows the step-by-step setup instructions for deploying a model, and the following video highlights the power of this new integration.
Here’s how to get started with the Looker query plugin for Dataiku DSS
Similar to Looker, Dataiku leverages the performance of data warehouses by offering transformation pipelines running in-database (SQL) or in-cluster (Spark, Hadoop). Use cases include data cleansing at scale, data enrichment, and data processing. Results of these operations are then available for analytics in the database.
Looker connects to the database containing the results of Dataiku prep, processing, and ML, as well as to other centralized data sources such as applications and transactional databases. Business logic is defined in LookML so that all reporting uses the same definitions.
Datasets can be curated in Looker as reports based on these definitions and sent to Dataiku through the Looker query plugin. The plugin leverages Looker’s API to surface the Looks in the Dataiku platform. The plugin additionally requires a dedicated Python environment to be configured.
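As a rough illustration of what surfacing a Look over the API can involve, here is a minimal Python sketch. It is not the plugin's actual code. It assumes the Looker REST API's login and "run Look" endpoints are served under `/api/4.0` on your instance; the host name, credentials, and Look ID are placeholders, and the helper names (`look_run_url`, `parse_look_csv`, `fetch_look`) are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

API_PREFIX = "/api/4.0"  # assumption: adjust to the API version your Looker instance serves


def look_run_url(base_url, look_id, result_format="csv"):
    """Build the URL that runs a saved Look and returns its results."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/looks/{look_id}/run/{result_format}"


def parse_look_csv(text):
    """Turn CSV text returned by a Look into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_look(base_url, client_id, client_secret, look_id):
    """Log in with API credentials, run the Look, and return its rows.

    Performs network calls, so it only works against a reachable Looker host.
    """
    creds = urllib.parse.urlencode(
        {"client_id": client_id, "client_secret": client_secret}
    ).encode()
    login_url = f"{base_url.rstrip('/')}{API_PREFIX}/login"
    # Passing data= makes this a POST request carrying the form-encoded credentials.
    with urllib.request.urlopen(urllib.request.Request(login_url, data=creds)) as resp:
        token = json.load(resp)["access_token"]
    request = urllib.request.Request(
        look_run_url(base_url, look_id),
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_look_csv(resp.read().decode("utf-8"))
```

In practice the plugin handles these calls for you; the sketch only shows that a curated Look can be treated as a dataset addressed by its ID.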
Once the plugin has been set up, users select a Look in Dataiku and can sync it to a local copy on the platform if desired. Optionally, the dataset can be cleaned up to get the data into the format needed for machine learning.
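To give a sense of what such a cleanup might look like, the following Python sketch tidies a few rows before modeling. It is an illustration only: it assumes columns arrive named in a `view.field` pattern, and the field names (`customer_id`, `total_spend`) and the `clean_rows` helper are hypothetical. In Dataiku, the same kind of work would more typically be done with its own preparation tools.

```python
def clean_rows(rows, numeric_fields):
    """Prepare Look rows for modeling.

    - drops the "view." prefix from column names (assumed naming pattern)
    - converts the listed fields to floats
    - skips rows where a listed field is missing or cannot be parsed
    """
    cleaned = []
    for row in rows:
        short = {key.split(".")[-1]: value for key, value in row.items()}
        try:
            for field in numeric_fields:
                short[field] = float(short[field])
        except (KeyError, TypeError, ValueError):
            continue  # incomplete or malformed row: leave it out
        cleaned.append(short)
    return cleaned


# Hypothetical sample rows; the second has an empty value and is dropped.
raw = [
    {"orders.customer_id": "17", "orders.total_spend": "120.50"},
    {"orders.customer_id": "18", "orders.total_spend": ""},
]
print(clean_rows(raw, ["total_spend"]))
```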
A machine learning model is run against the dataset provided by Looker.
Results of the model (predictions) can be written back to the database and can then be visualized with Looker.
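The write-back step can be pictured with a small Python sketch. Here `sqlite3` stands in for the data warehouse, and the table name, column names, and `write_predictions` helper are hypothetical; a real deployment would write to whichever database Looker is connected to.

```python
import sqlite3


def write_predictions(conn, predictions):
    """Store (customer_id, score) pairs in a table that a BI tool can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS churn_predictions ("
        "customer_id INTEGER PRIMARY KEY, score REAL)"
    )
    # Replace rows on conflict so rerunning the model refreshes existing scores.
    conn.executemany(
        "INSERT OR REPLACE INTO churn_predictions (customer_id, score) VALUES (?, ?)",
        predictions,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
write_predictions(conn, [(17, 0.82), (18, 0.10)])
write_predictions(conn, [(17, 0.75)])  # a later run overwrites the earlier score
print(
    conn.execute(
        "SELECT customer_id, score FROM churn_predictions ORDER BY customer_id"
    ).fetchall()
)
```

Keying the table on the entity being scored means each scheduled run updates predictions in place, so reports built on the table always show the latest scores.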
A production workflow can be established to continually run the model and gather predictions.
This Looker plugin for Dataiku DSS takes a new approach to enterprise-wide data science, not only by making it easy to create machine learning models, but also by creating a loop that allows users to further analyze predictive analytics results in their BI layer.