Dataset Creation

Upload Existing Delimited File
Manually Add a Schema
Copy/Paste Existing Schema

Exercise Overview

In this tutorial, you will learn how to use Aunsight to create a new dataset and update its schema.

A) Uploading an Existing Delimited File

Download the file at this link (“File->Save As”) and save it as iris.psv in an easily accessible location on your computer.

Navigate to the Datasets landing page in Aunsight.
Click the + button to create a new dataset.

Update the following fields in the pop-up box:

- In the Name field, enter Example Iris Dataset.
- From Format select the psv option.
- In this situation, leave the default Resource and Row Delimiter.
- Click the Upload File check box, press Choose File, and select the iris.psv file from the location where you saved it. (Technically, you can upload any type of data file to Aunsight. However, most features of the platform require a delimited file, and the platform currently supports psv, tsv, csv, and line-delimited JSON files.)

Press Submit. You will learn how to submit a schema for this dataset in the following section.
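To make the supported formats concrete, here is a minimal Python sketch of how delimited and line-delimited JSON text can be split into records. It assumes psv means pipe-separated values; the sample rows and the helper names (parse_delimited, parse_json_lines) are made up for illustration and are not taken from iris.psv or from Aunsight.

```python
import csv
import io
import json

# Illustrative rows only; these are not taken from iris.psv.
PSV_TEXT = "5.1|3.5|1.4|0.2|0\n4.9|3.0|1.4|0.2|0\n"
NDJSON_TEXT = (
    '{"sepal_length": 5.1, "species": 0}\n'
    '{"sepal_length": 4.9, "species": 0}\n'
)


def parse_delimited(text, delimiter):
    """Split delimited text into rows of string fields."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def parse_json_lines(text):
    """Parse line-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


psv_rows = parse_delimited(PSV_TEXT, "|")   # psv: pipe delimiter (assumed)
json_rows = parse_json_lines(NDJSON_TEXT)   # line-delimited JSON
print(psv_rows[0])
print(json_rows[0])
```

The same parse_delimited helper handles tsv with "\t" and csv with "," as the delimiter. Note that every field comes back as a string; names and types are only attached once a schema is added.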

B) Manually Add a Schema

Navigate to the Browse tab. You should see the raw data, because we have not yet added a schema to this dataset.

Press Schema and select Guided mode. This opens a web UI for building the schema step by step.

A) Add a Column for Sepal Length:
- Click the + next to Base.
- In the text box enter sepal_length and press enter. This will be the name of the first field/column in the dataset you uploaded.
- Select number for Type.

B) Add a Column for Sepal Width:
- Click the + next to Base.
- In the text box enter sepal_width and press enter. This will be the name of the second field/column in the dataset you uploaded.
- Select number for Type.

C) Add a Column for Petal Length:
- Click the + next to Base.
- In the text box enter petal_length and press enter. This will be the name of the third field/column in the dataset you uploaded.
- Select number for Type.

D) Add a Column for Petal Width:
- Click the + next to Base.
- In the text box enter petal_width and press enter. This will be the name of the fourth field/column in the dataset you uploaded.
- Select number for Type.

E) Add a Column for Species:
- Click the + next to Base.
- In the text box enter species and press enter. This will be the name of the fifth field/column in the dataset you uploaded.
- Select integer for Type.

Press the Browse button. You should now see a nicely formatted dataset with separated columns.
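Conceptually, the schema pairs each position in a raw line with a column name and a type. The sketch below illustrates that idea in Python using the five columns entered above. It is not how Aunsight applies a schema internally: the mapping of "number" to float and "integer" to int, the pipe delimiter, the helper name apply_columns, and the sample row are all assumptions made for illustration.

```python
# Column names and types as entered in Guided mode above.
COLUMNS = [
    ("sepal_length", "number"),
    ("sepal_width", "number"),
    ("petal_length", "number"),
    ("petal_width", "number"),
    ("species", "integer"),
]

# Assumed Python equivalents of the two types used in this exercise.
CASTS = {"number": float, "integer": int}


def apply_columns(raw_line, columns=COLUMNS, delimiter="|"):
    """Turn one raw delimited line into a dict of named, typed values."""
    fields = raw_line.rstrip("\n").split(delimiter)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return {
        name: CASTS[kind](value)
        for (name, kind), value in zip(columns, fields)
    }


# Illustrative row, not taken from iris.psv.
record = apply_columns("5.1|3.5|1.4|0.2|0")
print(record)
```

The order of the columns matters: the first entry describes the first field in each line, the second entry the second field, and so on, which is why the steps above add the columns in file order.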

Click Schema again. This view shows the raw JSON that defines the schema for this data file. Advanced users may choose to compose the JSON themselves. The Fix Formatting button “prettifies” the JSON in this view.
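For readers composing schema JSON outside the browser, the following Python sketch shows what prettifying JSON generally means: parsing the text and re-serializing it with indentation. The schema text here is hypothetical; its key names ("fields", "name", "type") are placeholders and do not reflect Aunsight's actual schema layout, and this is not the implementation behind the Fix Formatting button.

```python
import json

# Hypothetical compact schema text; key names are placeholders only.
compact = (
    '{"fields":[{"name":"sepal_length","type":"number"},'
    '{"name":"species","type":"integer"}]}'
)


def prettify(json_text, indent=2):
    """Re-serialize JSON text with indentation.

    Raises ValueError (json.JSONDecodeError) if the text is not valid JSON.
    """
    return json.dumps(json.loads(json_text), indent=indent)


pretty = prettify(compact)
print(pretty)
```

Because the text is parsed before it is re-serialized, a step like this also catches malformed JSON, such as a missing brace or trailing comma, before it is pasted into a schema window.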

C) Copy/Paste Existing Schema

Go to this link and copy the text onto your clipboard (i.e. highlight all and CTRL-C).

Navigate to the Datasets landing page in Aunsight.
Click the + button to create a new dataset.

In the Name field, enter Workflow 101 HDFS Landing Dataset.
From Format select the psv option.
Click the checkbox next to Add Schema and paste the schema from the clipboard into the window.
Click the Submit button. We have now created a dataset with no contents. We will use it as a placeholder for a later exercise (Workflow Builder Exercise #1). If you visit the Browse tab, you will see an error; this is expected, because the dataset does not contain any data yet.

