Improving the Analytical Workflow … with Dataflow!

What is the analytical workflow? It is the process by which analysts mine, filter, pivot, and exploit data to find anomalies and ways to improve business processes. The workflow typically starts with a problem in a business process and a hypothesis about why the problem is occurring. For example, in the airline industry, the problem may be a delay in take-off times, and the hypothesized cause a recent change in airline boarding practices. Several steps are needed to identify the root cause, and typically they do not form a lock-step process; rather, they involve multiple iterations. The steps in this iterative process include identification, preparation, development, deployment, execution, and adjustment.

 

 

Here we’ll walk through this use case and show how dataflow, and Composable Analytics in particular, can make the analyst more efficient by reducing the latency between and within these steps.

Identification

Identifying and getting access to the source data for analysis can be an undertaking in itself in large, distributed organizations. In our example, this may include getting access to flight log times, check-in records, and customer complaint data. And while you would think most data feeds would already have been identified in an organization, most problems are unanticipated and require data from sources not normally used.

During this process, the source data first identified sometimes turns out not to be the data actually required to fully investigate the problem. This often becomes apparent only after diving deep into the data, which can be very time consuming. For example, the hypothesis may change so that maintenance issues are now thought to be causing the delays. Maintenance records will then need to be pulled to continue the investigation.

Composable Analytics provides capabilities for analysts to share and reuse data views. While the data may be new to the analyst, most likely another analyst has investigated a previous problem that required the dataset. The analysts can take the live view and use it within their dataflow application by simply dragging and dropping the component.

Preparation

Data isn’t pretty. It’s hairy, covered with warts, and downright nasty sometimes … and data is never in the format analysts want it in. Dates and times in different zones, null and 0 values, strings vs numerics, unstructured text, and contradictions in the data are all commonplace. Preparation is actually where the majority of the time is typically spent. And sometimes preparing the data takes so long that the results become meaningless.

Composable Analytics provides capabilities to cleanse and filter the data with a dataflow methodology. This allows analysts to view the intermediate data at each step in the preparation process. That visibility alone can uncover problems within the systems and the overall business process that may not have been the original target of the analysis. In addition, multiple datasets can be joined together by connecting components using the dataflow language. This saves a great deal of time, and the cleansing processes can be shared and reused in later analyses.
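To make the idea of stepwise preparation concrete, here is a minimal sketch in plain Python. It is not the Composable Analytics dataflow language or API; the flight records, field names, and helper functions are all invented for illustration. Each cleansing step is a small function, and the runner keeps the intermediate data after every step so it can be inspected.

```python
from datetime import datetime, timezone

# Hypothetical raw flight-log rows; names and values are made up for this sketch.
raw_flights = [
    {"flight": "CA101", "scheduled": "2024-03-01T08:00:00-05:00", "actual": "2024-03-01T08:25:00-05:00"},
    {"flight": "CA202", "scheduled": "2024-03-01T14:00:00+00:00", "actual": None},
    {"flight": "CA303", "scheduled": "2024-03-01T09:00:00-08:00", "actual": "2024-03-01T09:05:00-08:00"},
]
# Hypothetical second dataset to join against.
complaints = [{"flight": "CA101", "complaints": 4}, {"flight": "CA303", "complaints": 0}]


def drop_missing(rows):
    """Filter out rows with no recorded departure time."""
    return [r for r in rows if r["actual"] is not None]


def to_utc(rows):
    """Parse ISO-8601 timestamps and normalize them to a single time zone (UTC)."""
    def parse(text):
        return datetime.fromisoformat(text).astimezone(timezone.utc)
    return [{**r, "scheduled": parse(r["scheduled"]), "actual": parse(r["actual"])} for r in rows]


def add_delay(rows):
    """Derive the departure delay in minutes."""
    return [{**r, "delay_min": (r["actual"] - r["scheduled"]).total_seconds() / 60} for r in rows]


def join_complaints(rows):
    """Join the complaint counts onto the flight rows by flight number."""
    lookup = {c["flight"]: c["complaints"] for c in complaints}
    return [{**r, "complaints": lookup.get(r["flight"])} for r in rows]


def run_steps(rows, steps):
    """Apply each step in order, keeping every intermediate result for inspection."""
    intermediates = []
    for step in steps:
        rows = step(rows)
        intermediates.append((step.__name__, rows))
    return rows, intermediates


prepared, intermediates = run_steps(raw_flights, [drop_missing, to_utc, add_delay, join_complaints])

for name, rows in intermediates:
    print(name, "->", len(rows), "rows")
```

Because every step is an ordinary function over rows, the same cleansing steps could be reused on a different dataset, which is the kind of sharing the paragraph above describes.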

Development

There may be multiple paths an analyst can take to find the root cause. They may group the data by plane model, carrier, or geographic region. They may also want to compare departures over time against the deployment of new practices, weather, or seasonality trends. Each path involves several steps in the data processing chain. Unfortunately, building new and accurate analytic processes takes far too much time and too many resources.

Composable Analytics leverages a dataflow methodology for developing analytics. Users can string together reusable components (queries, filters, statistical functions). Multiple applications can be created, or multiple analysis branches within the same application can be developed for side-by-side comparisons of results. A visual mapping of how the data is being processed and joined is critical to extracting insight.
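As a rough illustration of stringing reusable components together, the sketch below builds two analysis branches from the same building blocks and compares their results side by side. It is plain Python, not the Composable Analytics product; the sample records and the names `keep_if`, `mean_delay_by`, and `compose` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, already-prepared flight records.
flights = [
    {"carrier": "A", "model": "737", "delay_min": 25},
    {"carrier": "A", "model": "A320", "delay_min": 5},
    {"carrier": "B", "model": "737", "delay_min": 35},
    {"carrier": "B", "model": "A320", "delay_min": 15},
]


def keep_if(predicate):
    """Reusable filter component."""
    def component(rows):
        return [r for r in rows if predicate(r)]
    return component


def mean_delay_by(key):
    """Reusable statistical component: average delay per group."""
    def component(rows):
        groups = defaultdict(list)
        for r in rows:
            groups[r[key]].append(r["delay_min"])
        return {group: mean(values) for group, values in groups.items()}
    return component


def compose(*components):
    """String components together so the output of one feeds the next."""
    def application(rows):
        for component in components:
            rows = component(rows)
        return rows
    return application


delayed_only = keep_if(lambda r: r["delay_min"] > 0)

# Two branches that share the same filter but group differently.
by_model = compose(delayed_only, mean_delay_by("model"))
by_carrier = compose(delayed_only, mean_delay_by("carrier"))

results = {"by_model": by_model(flights), "by_carrier": by_carrier(flights)}
print(results)
```

In this toy data the delay tracks both the plane model and the carrier, which is exactly the kind of ambiguity that side-by-side branches help an analyst notice.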

Deployment

If the development process turns up interesting results, the analytic may be deemed important enough to be ‘built in’ to the current business practice. Analysts, however, often use MATLAB, R, SAS, and other statistical packages to do the heavy mathematical modeling, and there is a disconnect between these technologies and production systems. Developers, who normally don’t use the analytical packages, receive the requirements and rewrite the methods to fit the production environment. In some circumstances, deployment takes longer than the initial development.

Composable Analytics is an enterprise solution that allows analytical techniques to be written and deployed within production systems. There is no need to rewrite the ‘gold standard’ or validated statistical methods for production use, a step that causes delays and can introduce bugs. An application can be authored, tested, and deployed in the environment by analysts who have permission. This lowers the bar tremendously for deploying new capabilities within an enterprise.

Execution

For large datasets, there may be long delays in execution time. It’s also common that analysts do not have the hardware resources seen in production, resulting in unknowns surrounding model execution.

Composable Analytics parallelizes the execution of the analytical process, which can save a great deal of time. Analytics execute in a shared environment, which makes it easy to run another user’s work: there is no need to download the required dependencies, build, and adjust configuration and connection strings. An analytic can be executed, and its knobs adjusted, with the click of a button. It’s literally that easy.

Adjustment

Because data and processes are constantly in flux, analytics can become obsolete quickly, and adjustments will need to be made. For example, once one bottleneck in airplane departures is found, another nagging delay may surface elsewhere.

Within Composable Analytics, analytical methods can be modified very easily. The system blurs the line between configuration adjustments and newly developed methods, because components within an application can be easily swapped out and reconfigured.
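One way to picture swapping a component is to treat an application as a set of named slots, each holding a component. The sketch below is a plain-Python analogy rather than how Composable Analytics represents applications; the slot names and the sample delays are hypothetical.

```python
from statistics import mean, median

# Hypothetical application: named slots, run in insertion order.
application = {
    "filter": lambda delays: [d for d in delays if d >= 0],  # drop early departures
    "summarize": mean,
}


def run(app, data):
    """Feed the data through each component in order."""
    for component in app.values():
        data = component(data)
    return data


delays = [5, 10, 15, 120, -3]  # minutes; one extreme outlier

before = run(application, delays)

# Adjustment: swap a single component and leave the rest of the application untouched.
adjusted = {**application, "summarize": median}
after = run(adjusted, delays)

print(before, after)
```

Replacing the mean with the median here is both a configuration tweak and a change of method, which is the blurring the paragraph above refers to.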

Summary

Composable Analytics shortens the analytic development cycle by integrating the whole process from start to end. With a dataflow methodology, analysts organize data flows without needing to learn another programming language, which lets them work very efficiently. Not only is the authoring of analytics efficient, but the execution is as well: Composable Analytics parallelizes the execution steps of an analytic by analyzing the dependencies between components.
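The general technique of parallelizing by dependency analysis can be sketched with Python's standard library. This is an illustration of the idea only, not the scheduler Composable Analytics uses; the component names and the `run_component` stub are hypothetical. Any component whose dependencies have all finished is handed to a worker pool, so independent components can run at the same time.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical application graph: component -> components it depends on.
dependencies = {
    "load_flights": set(),
    "load_maintenance": set(),
    "clean_flights": {"load_flights"},
    "join": {"clean_flights", "load_maintenance"},
    "report": {"join"},
}


def run_component(name):
    """Stand-in for the real work a component would do."""
    return f"{name} done"


def execute(deps, max_workers=4):
    """Run every component as soon as all of its dependencies have finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    running = {}  # future -> component name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already satisfied.
            for name in sorter.get_ready():
                running[pool.submit(run_component, name)] = name
            # Block until at least one running component completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any error from the component
                finished.append(name)
                sorter.done(name)  # may unlock downstream components
    return finished


order = execute(dependencies)
print(order)
```

In this graph the two load components have no dependencies, so they can be submitted together, while the join waits for both of its inputs. The exact completion order of independent components may vary from run to run.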

Composable Analytics can be applied effectively in a variety of industry sectors and across specific areas or departments. Whether you need to create recurring reports, analyze customer data, or automate personal tasks, Composable Analytics is a single cohesive ecosystem for data orchestration and analytics.
