IFSM 330 Candy Assignment
Your job is to use SQL to perform an ETL which will accomplish the following:
1. Start with the skeleton starter script we give you, attached to this assignment. Modify the CREATE TABLE command so the schema is as follows:
Computer code
Notes about what you need to do
DROP TABLE IF EXISTS stagingTable;
Leave this code alone – it ensures you a fresh start
CREATE TABLE stagingTable (
yearInt INT(4),
monthInt INT(2),
–there will be more you need to fill in here
);
Right now this creates stagingTable with only two fields, the yearInt and monthInt. Modify the code where highlighted in yellow to correspond to the schema below.
And of course, when you’re done coding, remove the ‘– there will be more you need to fill in here’ comment J
2. Get the 2017 bit of the script working.
Computer code
Notes about what you need to do
–Insert 2017 Data
This is a comment telling you the 2017 data is going to be inserted here
INSERT INTO stagingTable(“monthInt”, “state”, “country”, “region”, “Product_Name”, “unitPrice” –you need to fill this in here)
After you have created stagingTable, this is the first half of a command that will insert the data into stagingTable. You need to replace the yellow highlighted material with your own code to complete it.
SELECT “Month”, “State”, “Country”, “Region”, “Product”, “Per-Unit_price”, “Quantity”, “Order_Total” FROM pd2017
;
This is the second half of a command that will insert data into stagingTable.
You don’t need to change anything here.
UPDATE stagingTable SET yearInt=2017;
This sets the year to 2017 for this data.
You don’t need to change this.
3. Get the 2018 part of the script working.
Computer code
Notes about what you need to do
–Insert 2018 Data
Comment.
INSERT INTO stagingTable(–you need to fill this in here)
This is the first half of a command that will insert the data into stagingTable. You need to replace the yellow highlighted material with your own code to complete it. See the rules below for more details.
SELECT … FROM pd2018
;
This is the second half of a command that will insert data into stagingTable.
You need to replace the yellow highlighted material.
UPDATE stagingTable SET yearInt=2018 WHERE yearInt ISNULL;
This sets the year to 2018 for any new entries which don’t yet have a year.
You don’t need to change this.
4. Get the 2019 part of the script working.
Computer code
Notes about what you need to do
–Insert 2019 Data
Comment.
INSERT INTO stagingTable(–you need to fill this in here)
You need to replace the yellow highlighted material with your own code to complete it. See the rules below for more details.
SELECT … FROM pd2019
;
You need to replace the yellow highlighted material.
UPDATE stagingTable …;
You need to replace the yellow highlighted material.
5. The script will load it into one final table and call it stagingTable
6. Run the checksum script to verify you have the stagingTable calculated correctly.
7. Export your final output table under the name “XX_output_final.csv” where XX are your initials. To export this, you can just use the Export button on the SQLlite menu (it’s right next to the Import button.)
You should do this all in SQLlite. You should not export to Excel and do your manipulations in Excel.
Part C: 2017 Data Notes
Your order 2017 data is contained in the attached file, “2017_product_data_students.csv” and you should have imported it as “pd2017.” A sample of this file’s type of data is contained below in Table 1 Sample of order data from 2017. (Note your file may or may not have the same data in it.)
Your field definitions follow:
· Month: integer, corresponds to the month of the sale. For example, 5 = May.
· Country: text, should all be USA. (All data in this exercise should be USA.)
· Region: text, represents the regions within the country.
· State: text, USPS state abbreviations. Each state is within one region.
· Product: text. This is the name of a packaged food product.
· Per-unit price: integer. This represents the per-unit price in cents; for example, 300 indicates that Orange Creepies sell for $3.00 per package. (For the purposes of this exercise, disregard all currency formatting and just use 300 to represent $3.00.)
· Quantity: integer. This represents how many items were in that particular order. The first order here was for 49 packages of Orange Creepies.
· Order Total: integer. This is the per-unit price x the quantity. The first line here indicates that 300 x 49 = 14700 (or $147.00) was the price of the first order.
Table 1 Sample of order data from 2017
Month
Country
Region
State
Product
Per-Unit Price
Quantity
Order Total
0
7
USA
West
CA