Dataflow Builder - Overview
Overview
The Dataflow Builder in Aunsight lets users organize, transform, and format data for reporting, BI, or advanced analytics. It abstracts and visualizes complex data operations, making it easier to prepare data and to automate that preparation as data is added or updated. Users can leverage powerful computing engines such as Spark and MapReduce through an easy-to-use drag-and-drop interface instead of writing Java code to perform basic operations. The builder brings popular operations such as table joins, deduplication, arithmetic, aggregations, and string and date manipulations into a single place, reducing the number of tools needed to get the job done.
What Can You Do with the Dataflow Builder?
- Fix Data
- Format Data
- Filter Data
- Integrate Data Sources
- Build Features
Main Page
When a dataflow is created or an existing dataflow is selected from the list on the left side of the screen, the main page will display on the right. From this page, you can view general information about the dataflow and its history, duplicate or delete the dataflow, or set watch notifications.
- The Details tab contains general information about the dataflow: its name and ID, creation and last-updated dates, tags, inputs and outputs, and context information.
- The Versions tab contains a record of previous versions and gives you the ability to view, delete, or revert to a previous version.
- The Jobs tab lists previous runs of the dataflow along with each run's current state (in progress, completed, or failed).
- The Run tab allows you to configure and submit a dataflow job.
Modify Dataflow
After you click Modify on the main page, you can view or edit the details of the dataflow itself.
- The middle portion of the page is a visual representation of the dataflow. The dataflow runs from top to bottom, and each operation is displayed as a box with the resultant dataset directly underneath. Connections between inputs and outputs are shown by arrows connecting the operations.
- The menu on the right-hand side has two tabs. The Operations tab lets you add an operation from a list of all available operations, which are grouped by category and can be searched. The Details tab shows the details of the selected operation and lets you view or edit its title and description, arguments, inputs, and outputs.
- The left side of the screen has three tabs. The first tab shows all the datasets linked in the dataflow, and allows you to import new datasets. The second tab shows the schema of the selected dataset. The last tab is a search feature.
Operations
Operations are grouped into multiple categories. A short description of each category is below.
More detailed information about each individual operation can be found in the Aunsight documentation.
Dataset
These operations are related to retrieving, renaming, and storing the datasets themselves.
Join
These operations bring multiple datasets together through lookups, outer joins, or Cartesian crossing.
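To make the three join styles concrete, here is a minimal sketch in plain Python that models each dataset as a list of dictionaries. It illustrates the semantics only; it is not Aunsight's API or how the Dataflow Builder executes joins on Spark or MapReduce, and the function names `lookup`, `full_outer_join`, and `cross` are hypothetical.

```python
from itertools import product

# Hypothetical helpers illustrating join semantics; not Aunsight operations.
# On field-name collisions, values from the right-hand row win.

def lookup(left, right, key):
    """Enrich each left row with fields from the first right row sharing `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], row)  # keep only the first match per key
    return [{**row, **index.get(row[key], {})} for row in left]

def full_outer_join(left, right, key):
    """Keep matched pairs plus unmatched rows from both sides."""
    result = []
    matched_right = set()
    for l_row in left:
        hits = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in hits:
            matched_right.add(i)
            result.append({**l_row, **right[i]})
        if not hits:
            result.append(dict(l_row))  # left row with no partner
    # Right rows that never matched are appended on their own.
    result.extend(dict(r_row) for i, r_row in enumerate(right) if i not in matched_right)
    return result

def cross(left, right):
    """Cartesian cross: pair every left row with every right row."""
    return [{**l_row, **r_row} for l_row, r_row in product(left, right)]
```

For example, with `customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]` and `orders = [{"id": 1, "total": 30}, {"id": 3, "total": 12}]`, `full_outer_join(customers, orders, "id")` keeps customer 2 (no order) and order 3 (no customer) alongside the matched row, while `lookup` keeps only the customer rows.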
Group
These operations combine individual rows of a dataset into collections based on the values of one or more grouping fields.
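Grouping can be pictured as building a map from key values to the rows that share them. The sketch below assumes datasets are lists of dictionaries; `group_rows` is a hypothetical helper for illustration, not an Aunsight operation.

```python
from collections import defaultdict

def group_rows(rows, keys):
    """Collect rows into lists keyed by the tuple of their values in `keys`.

    Hypothetical illustration of grouping semantics, not Aunsight's API.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)  # plain dict; keys keep first-seen order
```

Grouping `[{"region": "N", "amount": 5}, {"region": "S", "amount": 2}, {"region": "N", "amount": 7}]` on `["region"]` yields two collections, one holding both `"N"` rows and one holding the single `"S"` row.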
Collection
These operations work with aggregations across grouped rows of data. For example, you can compute sums or means, or retrieve a field value based on a selected sort order.
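Once rows are grouped, collection-style aggregates reduce each group to summary values. The hypothetical `summarize` helper below is a plain-Python sketch that assumes groups arrive as a dictionary of row lists; it computes a sum and a mean and picks a field from the first row under a chosen sort order, mirroring the examples above rather than any specific Aunsight operation.

```python
from statistics import mean

def summarize(groups, field, sort_field, descending=False):
    """Aggregate `field` within each group (hypothetical, not Aunsight's API).

    Returns the sum and mean of `field`, plus the value of `field` from the
    row that comes first when the group is sorted by `sort_field`.
    """
    summary = {}
    for key, rows in groups.items():
        values = [row[field] for row in rows]
        first = sorted(rows, key=lambda row: row[sort_field], reverse=descending)[0]
        summary[key] = {"sum": sum(values), "mean": mean(values), "first": first[field]}
    return summary
```

Sorting by a date field in descending order, for instance, makes `"first"` the most recent value in each group.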
Field
These operations manipulate the columns or fields of a dataset, including the ability to add, select, convert, rearrange, or remove fields from the dataset as required.
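The field operations map naturally onto per-row dictionary transformations. The following minimal sketch assumes list-of-dictionary datasets; `add_field`, `select_fields`, `convert_field`, and `remove_fields` are hypothetical names chosen for illustration, not Aunsight operation names.

```python
# Hypothetical helpers illustrating field-level operations; not Aunsight's API.

def add_field(rows, name, fn):
    """Add (or overwrite) `name` with a value computed from each row."""
    return [{**row, name: fn(row)} for row in rows]

def select_fields(rows, names):
    """Keep only `names`, in the given order (this also rearranges fields)."""
    return [{n: row[n] for n in names} for row in rows]

def convert_field(rows, name, cast):
    """Convert the type of `name` using a cast function such as int or float."""
    return [{**row, name: cast(row[name])} for row in rows]

def remove_fields(rows, names):
    """Drop every field listed in `names`."""
    drop = set(names)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]
```

A typical chain converts a text column to a number, derives a new field from it, and then selects only the fields needed downstream.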
Row
These operations alter a dataset at the record level to create, explode, filter, pivot, sort, or append rows of data.
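As a rough illustration, four of the row operations listed above (explode, filter, sort, and append) can be sketched in plain Python over list-of-dictionary datasets. The helper names are hypothetical and the sketch shows semantics only, not how Aunsight implements these operations.

```python
# Hypothetical helpers illustrating row-level operations; not Aunsight's API.

def explode(rows, field):
    """Emit one row per element of the list stored in `field`."""
    return [{**row, field: item} for row in rows for item in row[field]]

def filter_rows(rows, predicate):
    """Keep only rows for which `predicate(row)` is true."""
    return [row for row in rows if predicate(row)]

def sort_rows(rows, field, descending=False):
    """Return the rows ordered by `field`."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)

def append_rows(*datasets):
    """Stack datasets end to end (a union that keeps duplicates)."""
    return [row for dataset in datasets for row in dataset]
```

Exploding `[{"id": 1, "tags": ["a", "b"]}]` on `"tags"` produces two rows, one per tag, each carrying `"id": 1`.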
Operation Details
The detail pane on the right side of the screen displays the following information:
- Name of the operation
- Input(s)
- Argument(s)
- Output(s)
Expression Builder
The Expression Builder lets users build Pig expressions through drop-down lists in the interface.