To open the Examples.xlsx workbook, click Help – Example Models, click Forecasting/Data Science Examples, and open the dataset, Examples.
This workbook contains six worksheets containing small sample datasets. The Example 1 dataset contains empty cells (cells B6 and D10), cells containing invalid formulas (B13, C6, & C8), cells containing non numeric characters (D13), etc. Analytic Solver Data Science will treat each of these as missing values.
Select a cell within the data set, then on the Data Science ribbon, select Transform - Missing Data Handling ;to open the Missing Data Handling dialog. Confirm that "Example 1" is displayed for Worksheet.
Click OK. The results of the data transformation are inserted into the Imputation worksheet. Since no treatment was specified for any of the variables, none of the missing or invalid values were replaced.
If “Overwrite existing worksheet” is selected in the Missing Data Handling dialog, Analytic Solver will overwrite the existing data with the treatment option specified. Note: You must save the workbook in order for these changes to be saved.
The Example 2 dataset is similar to the Example 1 dataset in that this dataset contains empty cells (cells B6 and D10), cells containing invalid formulas (B13, C8 & D4), cells containing non numeric characters (column C), etc. In this example we will see how the missing values in the Variable_1 and Variable_3 columns can be replaced by the Mean and Median, respectively.
Select a cell on the Example 2 worksheet, say A2, then click Transform - Missing Data Handling to open the Missing Data Handling dialog. Confirm that Example 2 is displayed for Worksheet, at the top of the dialog.
In the Variable column, select Variable_1, then under How do you want to handle missing values for the selected variable(s), click the down arrow at Select treatment, and select Mean.
Click Apply to selected variable(s). The Missing Data Handling dialog displays Mean under Treatment for Variable_1.
From the Variable column, select Variable_3, and under How do you want to handle missing values for the selected variable(s), click the down arrow, and select Median. Click Apply to selected variable(s).
Click OK to transform the data. See the newly inserted Imputation1 worksheet for the results, shown below.
In the Variable_1 column, invalid or missing values have been replaced with the mean calculated from the remaining values in the column (12.34, 34, 44, -433, 43, 34, 6743, 3, 4, 3). In the Variable_3 column, the cells containing missing values or invalid values are replaced by the median of the remaining values in that column (12, 33, 44, 66, 33, 66, 22, 88, 55, 79). The invalid data for Variable_2 remains, since no treatment was selected for this variable.
Select the Example 3 worksheet. In this data set, Variable_3 has been replaced with date values.
Open the Missing Data Handling dialog. Confirm that Example 3 is displayed for Worksheet. In this example, we will replace the missing / invalid values for Variable_2 and Variable_3 with the mode of each column.
Select Variable_2, then under How do you want to handle missing values for the selected variable(s), click the down arrow next to Select treatment, and select Mode. (The options Mean and Median do not appear in the list since Variable_2 contains non-numeric values.) Click Apply to selected variable(s). Repeat these steps for Variable_3, then click OK.
The results can be found in the newly created Imputation2 worksheet.
The missing values in the Variable_2 column have been replaced by the mode of the valid values (dd) even though, in this instance, the data is non-numeric. (Remember, the mode is the most frequently occurring value in the Variable_2 column.)
In the Variable_3 column, the third and ninth records contained missing values. As you can see, they have been replaced by the mode for that column, 2 – Feb – 01.
The Example 4 dataset again contains missing and invalid data for all three variables: missing data in cells B6 and D10 and Excel errors in cells B13, C6, and C8. In this example, we will demonstrate Analytic Solver Data Science’s ability to replace missing values with User Specified Values.
Open the Missing Data Handling dialog. Confirm that Example 4 is displayed for Worksheet.
Select Variable_1, then click the down arrow next to Select treatment under How do you want to handle missing values for the selected variable(s), then select User specified value. In the field that appears directly to the right of User specified value, enter 100, then click Apply to selected variable(s). Repeat these steps for Variable_2. Then click OK.
The results from the Imputation3 worksheet are shown below.
The missing values for Variable_1 in records 5 and 12 and in records 5 and 7 for Variable_2, have been replaced by 100 while the empty cells for Variable_3 remain untouched.
In the Example 5 dataset, the value -999 appears in all three columns. This example will illustrate Analytic Solver Data Science’s ability to detect a given value and replace that value with a user specified value.
Open the Missing Data Handling dialog. Confirm that Example 5 is displayed for Worksheet.
Select Missing values are represented by this value and enter -999 in the field that appears directly to the right of the option.
Select Variable_1 in the Variables field, click the down arrow next to Select Treatment and choose Mean from the menu, then click Apply to selected variable(s).
Select Variable_2 in the Variables field, click the down arrow next to No Treatment and choose User specified value from the menu. Enter “zzz” for the value then click Apply to selected variable(s).
Finally, select Variable_3 in the Variables field, click the down arrow next to User specified value and choose Mode from the menu. Click Apply to selected variable(s).
Click OK to transform the data. Select the Imputation4 worksheet to view.
Note that in the Variable_1 column, the specified missing code (-999) was replaced by the mean of the column (in record 12). In the Variable_2 column, the missing values have been replaced by the user specified value of “zzz” in records 5 and 7, and for variable_3, by the mode of the column in record 9.
Let’s take a look at one more dataset, Example 6, of Examples.xlsx.
Open the Missing Data Handling dialog, confirm that Example 6 is displayed for Worksheet or Data Source within the Data Source group, then apply the following procedures to the indicated columns.
Select Missing values are represented by this value and enter 33 in the field that appears directly to the right of the option.
Select Variable_1, select Delete record for How do you want to handle missing values for the selected variable(s)?, then click Apply to selected variable(s).
Select Variable_2, select Mode for How do you want to handle missing values for the selected variable(s)?, then click Apply to selected variable(s).
Select Variable_3, select User specified value for How do you want to handle missing values for the selected variable(s)?, enter 9999, then click Apply to selected variable(s).
Click OK to transform the data.
See the output in the Imputation5 worksheet.
Records 7 and 12 have been deleted since Delete record was chosen for the treatment of missing values for Variable_1. In the Variable_2 column, the missing values in records 2 and 11 have been replaced by the mode of the column, "dd". Note: Delete record holds priority over any other instruction in the Missing Data Handling feature.
In the Variable_3 column (renamed to Treated_Variable_3), Analytic Solver Data Science treated 33 as a missing value. As a result, the value of 33 in rows 1, 2, 3, 6, and 9 were replaced by the user specified value of 9999.
Note: The value for Variable_3 for record 12 was 33, which should have been replaced by 9999; however, since Variable_1 contained a missing value for this record, the Delete record instruction was executed first.