Exercise Overview

In this tutorial, you will learn how to use Aunsight to create a new dataset and update its schema.

A) Uploading an Existing Delimited File

  1. Download the file at this link (“File->Save As”) and save it as iris.psv in an easily accessible location on your computer.
Save iris.psv file to computer
  2. Navigate to the Datasets landing page in Aunsight.
  3. Click the + button to create a new dataset.
Create a new dataset by pressing the + button
  4. Update the following fields in the pop-up box:
    • In the Name field, enter Example Iris Dataset.
    • From Format, select the psv option.
    • Leave the default Resource and Row Delimiter values.
    • Click the Upload File check box, press Choose File, and select the iris.psv file from the location where you saved it. (Technically, you can upload any type of data file to Aunsight. However, most platform features require a delimited file; the platform currently supports psv, tsv, csv, and line-delimited JSON files.)
  5. Press Submit. You will learn how to add a schema for this dataset in the following section.
Upload a dataset to Aunsight
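
Before uploading, it can help to confirm that the file really is pipe-delimited. The following is a minimal Python sketch, assuming that iris.psv uses | as the field separator and has no header row. The sample text and the read_psv helper are illustrative stand-ins, not part of Aunsight, and the real file may encode its values differently.

```python
import csv
import io

# Illustrative sample standing in for the downloaded file; the real
# iris.psv may differ (for example, in how species is encoded).
SAMPLE_PSV = "5.1|3.5|1.4|0.2|0\n4.9|3.0|1.4|0.2|0\n"


def read_psv(text):
    """Split pipe-delimited text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="|"))


rows = read_psv(SAMPLE_PSV)

# Every row of a well-formed delimited file has the same number of fields.
field_counts = {len(row) for row in rows}

print(rows[0])
print("fields per row:", field_counts)
```

To check the real file, read its contents from disk and pass them to read_psv in place of SAMPLE_PSV. If field_counts contains more than one value, some rows are probably malformed.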

B) Manually Add a Schema

  1. Navigate to the Browse tab. You should see the raw data because we have not yet added a schema to this dataset.
Browse data view
  2. Press Schema and select Guided mode. This opens a web UI for building the schema, as an alternative to writing it by hand.
  • A) Add a Column for Sepal Length:
    • Click the + next to Base.
    • In the text box enter sepal_length and press enter. This will be the name of the first field/column in the dataset you uploaded.
    • Select number for Type.
Create first field of schema
Add first column name
Set column data type
  • B) Add a Column for Sepal Width:
    • Click the + next to Base.
    • In the text box enter sepal_width and press enter. This will be the name of the second field/column in the dataset you uploaded.
    • Select number for Type.
Add second column to schema
Set type for second column in schema
  • C) Add a Column for Petal Length:
    • Click the + next to Base.
    • In the text box enter petal_length and press enter. This will be the name of the third field/column in the dataset you uploaded.
    • Select number for Type.
Add third column in schema
  • D) Add a Column for Petal Width:
    • Click the + next to Base.
    • In the text box enter petal_width and press enter. This will be the name of the fourth field/column in the dataset you uploaded.
    • Select number for Type.
Add fourth column in schema
  • E) Add a Column for Species:
    • Click the + next to Base.
    • In the text box enter species and press enter. This will be the name of the fifth field/column in the dataset you uploaded.
    • Select integer for Type.
Add fifth column in schema
  3. Press the Browse button. You should now see the dataset displayed as a table with separate columns.
Browse to see updated table utilizing new schema
  4. Click on Schema again. This view shows the raw JSON that defines the schema for this data file. Advanced users may choose to compose the JSON themselves. The Fix Formatting button “prettifies” the JSON in this view.
Raw JSON for schema
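
Conceptually, the schema built in this section gives each position in a raw row a name and a type. The sketch below illustrates that idea in Python. The COLUMNS list and the apply_schema helper are hypothetical and do not represent Aunsight's schema JSON or API. Mapping the number type to float is an assumption, and the sample row is illustrative.

```python
# Column names and types as entered in Guided mode in this section.
# Hypothetical stand-in: this is NOT Aunsight's schema format.
# Assumption: "number" behaves like a float, "integer" like an int.
COLUMNS = [
    ("sepal_length", float),
    ("sepal_width", float),
    ("petal_length", float),
    ("petal_width", float),
    ("species", int),
]


def apply_schema(raw_line, columns=COLUMNS, delimiter="|"):
    """Turn one raw delimited line into a dict of named, typed values."""
    values = raw_line.rstrip("\n").split(delimiter)
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, found {len(values)}"
        )
    return {name: cast(value) for (name, cast), value in zip(columns, values)}


# Illustrative row; the real file's contents may differ.
record = apply_schema("5.1|3.5|1.4|0.2|0")
print(record)
```

A row with the wrong number of fields, or a value that cannot be converted to its column type, raises ValueError here. That is one way to picture why the column order and types in the schema have to match the file.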

C) Copy/Paste Existing Schema

  1. Go to this link and copy the text to your clipboard (e.g., select all and press CTRL-C).
Copy schema to clipboard
  2. Navigate to the Datasets landing page in Aunsight.
  3. Click the + button to create a new dataset.
Create a new dataset
  4. In the Name field, enter Workflow 101 HDFS Landing Dataset.
  5. From Format, select the psv option.
  6. Click the checkbox next to Add Schema and paste the schema from the clipboard into the window.
  7. Click the Submit button. We have now created a dataset with no contents, which we will use as a placeholder for a later exercise (Workflow Builder Exercise #1). If you visit the Browse tab, you will receive an error. This is expected, because the dataset currently has no contents.
Enter dataset details
Paste copied schema and submit
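
Because the schema is pasted as raw JSON, an incomplete or malformed paste is an easy mistake to make. The following is a minimal Python sketch, assuming only that the schema text is JSON. The check_and_prettify helper is a hypothetical name, and the example string is a placeholder rather than the schema from the link. It checks that the text parses and returns an indented copy, similar in spirit to the Fix Formatting button.

```python
import json


def check_and_prettify(text):
    """Return an indented copy of `text` if it is valid JSON.

    Raises ValueError with the parser's message otherwise.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err
    return json.dumps(parsed, indent=2)


# Placeholder text only; substitute the schema copied to the clipboard.
placeholder = '{"example_key": ["a", "b"]}'
pretty = check_and_prettify(placeholder)
print(pretty)
```

This check only confirms that the text is well-formed JSON. It does not verify that the content is a schema Aunsight will accept.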