Workflow Builder - Overview

Overview

Aunsight’s Workflow Builder is a hybrid data integration (ETL) service that lets you create, schedule, and manage your dataset-building processes at scale. Work with data wherever it lives, in the cloud or on-premises, with enterprise-grade security.

The Workflow Builder provides a graphical user interface for building and managing your data pipelines, so you can automate the use of Aunsight’s full ecosystem of products and services, from transforming raw data to running scoring engines. Once a pipeline is built, it can run on a schedule and deliver data without manual intervention.

What can you do with the Workflow Builder?

Aggregate Data from Multiple Data Sources & Datasets

Move Data Around to Utilize Various Resources

Simplify a Workflow by Using a Modular Series of Workflows

Main Page

When a workflow is created or an existing workflow is selected from the list on the left side of the screen, the main page will display on the right. From this page, you can view general information about the workflow and its history, run or schedule the workflow, or access the page to modify the workflow.

The Details tab contains general information about the workflow: its name and ID, creation and last-updated dates, tags and tokens, and the job and activity history.

The Run tab allows you to run the Workflow and set notifications.

The Versions tab contains a record of previous versions and gives you the ability to view, delete, or revert to a previous version.

Modify Workflow

After clicking Modify from the main page, you can view or edit the details of the workflow itself.

  1. The left portion of the page is a visual representation of the workflow.
  2. Each component is shown as a box with inputs listed on the bottom left side, and outputs on the bottom right.
  3. Connections between inputs and outputs are shown by lines connecting the input/output ports. Components can be dragged or selected, and connections between components can be created by dragging a connector from an output on one component to an input on another.
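
The canvas described above can be thought of as a directed graph of components joined at named ports. The sketch below is only an illustration of that idea, not Aunsight's internal model or API; the `Component` and `Workflow` classes, the `connect` method, and the port names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    """Hypothetical stand-in for one box on the canvas."""
    name: str
    inputs: List[str] = field(default_factory=list)   # input port names
    outputs: List[str] = field(default_factory=list)  # output port names


@dataclass
class Workflow:
    """Hypothetical container for components and their connections."""
    components: Dict[str, Component] = field(default_factory=dict)
    # Each connection: (source component, output port, target component, input port)
    connections: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        """Mimic dragging a connector from an output port to an input port."""
        if out_port not in self.components[src].outputs:
            raise ValueError(f"{src} has no output port {out_port!r}")
        if in_port not in self.components[dst].inputs:
            raise ValueError(f"{dst} has no input port {in_port!r}")
        self.connections.append((src, out_port, dst, in_port))


wf = Workflow()
wf.add(Component("load", outputs=["dataset"]))
wf.add(Component("clean", inputs=["raw"], outputs=["cleaned"]))
wf.connect("load", "dataset", "clean", "raw")
```

Validating the port names at connection time mirrors the editor's behavior of only linking an output on one component to an input on another.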

The menu on the right-hand side has two tabs:

  1. The Components tab allows you to add a component from a list of all available components.
  2. The Process tab takes you to the details of a selected component and allows you to view and edit its inputs and outputs.

Components

Components are grouped into multiple categories. A short description of each category is below. More detailed information about each individual component can be found on the product documentation page.

Atlas


Metadata about a dataset in Aunsight is stored in an “Atlas record”. The record structure may vary based on the data type and where it is stored, but in general it has an id, name, description, format, type and location information. Most records also have a schema which describes the structure of the data.
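
As a rough illustration of the fields listed above, an Atlas record could be modeled like this. The class, the field types, and every value shown are assumptions made for the example; the actual record structure varies by data type and storage location and is not specified here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AtlasRecord:
    """Hypothetical model of the general fields of an Atlas record."""
    id: str
    name: str
    description: str
    format: str
    type: str
    location: Dict[str, Any]
    # Most, but not all, records carry a schema describing the data's structure.
    schema: Optional[List[Dict[str, str]]] = None


# Illustrative values only; none of these come from a real Aunsight record.
record = AtlasRecord(
    id="example-id",
    name="customers",
    description="Illustrative record only",
    format="csv",
    type="file",
    location={"path": "/example/customers.csv"},
    schema=[{"name": "customer_id", "type": "string"}],
)
```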

Process


Aunsight supports the execution of custom code via the Process service. This service allows users to register Docker images with Aunsight for custom processing. Once an image is registered, it can be launched on demand or as part of a workflow.

Tokamak


Tokamak is the dataflow service, which automates a wide variety of data cleaning and processing tasks.

Memento


Mementos are used to store arbitrary measurements of a certain type within a workflow run, such as the number of rows in a dataset. They are typically used for quality assurance.
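
To make the row-count example concrete, here is a minimal sketch of recording a measurement during a run and using it for a quality check. It does not use the Memento service itself; the `mementos` store, `record_memento`, and `count_rows` are hypothetical names introduced only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory store: measurement type -> list of recorded values.
mementos = defaultdict(list)


def record_memento(kind, value):
    """Store one measurement of the given type for this run."""
    mementos[kind].append(value)


def count_rows(csv_text):
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if any
    return sum(1 for _ in reader)


sample = "id,amount\n1,10\n2,20\n3,30\n"
record_memento("row_count", count_rows(sample))

# A simple quality check: fail the run if the dataset came out empty.
assert mementos["row_count"][-1] > 0, "dataset is empty"
```

Keeping every value per measurement type, rather than only the latest, would also allow a check that compares the current run against earlier ones.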

Resource


The resource component designates a computing or storage resource to be used to run a dataflow.

Utility


Utility components provide functionality within the workflow itself, such as lightweight data manipulation and process controls.

Organization


The Organization category includes components that allow you to assign the workflow to an organization or project and send notifications to members of an organization at certain points in the workflow.

Rule


The Rule category provides a lightweight rules engine geared toward interacting with the metadata of other workflow components.

Workflow


The Workflow components allow you to call or run other workflows within a workflow.

Peeper


A Peeper report includes general statistics about the dataset as a whole.

Sightglass


Sightglass provides a framework for displaying well-formed data in a user-friendly way.

Process Details

The Process pane on the right side of the screen has three sections:

  1. The top section displays the ID, name, and description of the component, with the ability to copy or delete the process.
  2. The second section shows the inputs.
  3. The third section shows the outputs.