Rescaling Options

See below for an explanation of the options on all three tabs of the Rescaling dialog: the Data, Parameters and Transformation tabs.

The following options appear on all three tabs of the Rescaling dialog.

Help:  Click the Help button to access documentation on all Rescaling options. 

Cancel:  Click the Cancel button to close the dialog without running Rescaling. 

Next:  Click the Next button to advance to the next tab.

Finish:  Click the Finish button to accept the option settings on all three tabs and run Rescaling.

Rescaling Data Tab

See below for documentation for all options appearing on the Rescaling Data tab.

Rescaling Data tab

Data Source

Worksheet: Click the down arrow to select the desired worksheet where the dataset is contained. 

Workbook:  Click the down arrow to select the desired workbook where the dataset is contained.

Data range:  Select or enter the desired data range within the dataset.   This data range may either be a portion of the dataset or the complete dataset.   

#Columns:  Displays the number of columns in the data range. This option is read only. 

#Rows In Training Set, Validation Set and Test Set:   Displays the number of rows in the training, validation and/or test partitions, if they exist. This option is read only.

Variables

First Row Contains Headers:  Select this checkbox if the first row in the dataset contains column headings. 

Variables:  This field contains the list of the variables, or features, included in the data range. 

Selected Variables: This field contains the list of variables, or features, to be included in Rescaling.

  • To include a variable in Rescaling, select the variable in the Variables list, then click > to move the variable to the Selected Variables list. 
  • To remove a variable as a selected variable, click the variable in the Selected Variables list, then click < to move the variable back to the Variables list. 

Rescaling Parameters Tab

See below for documentation for all options appearing on the Rescaling Parameters tab. 

Rescaling Parameters tab

Preprocessing

Analytic Solver Data Science allows partitioning to be performed on the Parameters tab for Rescaling if the active data set is unpartitioned.  If the active data set has already been partitioned, the Partition Data button is disabled.  Clicking the Partition Data button opens the following dialog.  Select Partition Data on the dialog to enable the partitioning options.  See the Partitioning chapter for a description of each Partitioning option shown in the dialog below.

Partitioning "On-the-fly" dialog

Why use “on-the-fly” Partitioning?

If a data partition will be used to train and validate several different algorithms that will be compared for predictive power, it may be better to use the Ribbon choices to rescale and/or partition the dataset first.  But if the rescaled data and/or data partition will be used with a single algorithm, or if it isn’t crucial to compare algorithms on exactly the same data, “Partition-on-the-fly” and “Rescale-on-the-fly” offer several advantages:

  • User interface steps are saved, and the Analytic Solver task pane is not cluttered with partition and rescaling output.
  • Partition-on-the-fly and Rescale-on-the-fly are much faster than first rescaling the data, creating a standard partition and then running an algorithm.
  • Partition-on-the-fly and Rescale-on-the-fly can handle larger datasets without exhausting memory, since the intermediate results for the rescaled and partitioned data are never created.

Rescaling:  Fitting

Use Rescaling to normalize one or more features in your data. Many Data Science workflows include feature scaling/normalization during the data preprocessing stage. In addition to this general-purpose facility, rescaling functionality can be accessed directly from the dialogs for the Supervised Algorithms available in Analytic Solver Data Science.

Analytic Solver Data Science provides the following methods for feature scaling:  Standardization, Normalization, Adjusted Normalization and Unit Norm. 

  • Standardization rescales the feature values to zero mean and unit variance:  (x−mean)/std.dev.
  • Normalization scales the data values to the [0,1] range:  (x−min)/(max−min). The Correction option specifies a small positive number ε that is applied as a correction to the formula. The corrected formula is widely used in Neural Networks when the Logistic Sigmoid function is used to activate the neurons in hidden layers; it ensures that the data values never reach the asymptotic limits of the activation function. The corrected formula is [x−(min−ε)]/[(max+ε)−(min−ε)].
  • Adjusted Normalization scales the data values to the [-1,1] range:  [2(x−min)/(max−min)]−1. The Correction option specifies a small positive number ε that is applied as a correction to the formula. The corrected formula is widely used in Neural Networks when the Hyperbolic Tangent function is used to activate the neurons in hidden layers; it ensures that the data values never reach the asymptotic limits of the activation function. The corrected formula is {2[(x−(min−ε))/((max+ε)−(min−ε))]}−1.
  • Unit Normalization is another frequently used method that scales the data so that the feature vector has unit length. This usually means dividing each value by the Euclidean length (L2-norm) of the vector. In some applications, it can be more practical to use the Manhattan Distance (L1-norm).
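
The four formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Analytic Solver's implementation: the function names are hypothetical, the use of the sample standard deviation (n−1 divisor) is an assumption, and the sketch treats a single list of values as the vector being rescaled.

```python
import math

def standardize(xs):
    """(x - mean) / std.dev. The n-1 divisor is an assumption of this sketch."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / std for x in xs]

def normalize(xs, eps=0.0):
    """[x - (min - eps)] / [(max + eps) - (min - eps)]; eps=0 gives the plain formula."""
    lo = min(xs) - eps
    hi = max(xs) + eps
    return [(x - lo) / (hi - lo) for x in xs]

def adjusted_normalize(xs, eps=0.0):
    """2 * normalized value - 1, which maps the data to the [-1, 1] range."""
    return [2.0 * v - 1.0 for v in normalize(xs, eps)]

def unit_norm(xs, norm="l2"):
    """Divide each value by the L2 (Euclidean) or L1 (Manhattan) length of the vector."""
    if norm == "l2":
        length = math.sqrt(sum(x * x for x in xs))
    elif norm == "l1":
        length = sum(abs(x) for x in xs)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return [x / length for x in xs]

values = [2.0, 4.0, 6.0]
print(standardize(values))
print(normalize(values))
print(normalize(values, eps=1.0))
print(adjusted_normalize(values))
print(unit_norm([3.0, 4.0]))
```

With a positive eps, the smallest and largest values map strictly inside the target range rather than onto its endpoints, which is the purpose of the Correction option described above.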

Show Fitted Statistics

Select Show Fitted Statistics to include the fitted statistics in the Rescaler output.  The Shift and Scale values are inferred from the training data; each rescaling formula can be rearranged into the form (x−shift)/scale.  Other partitions and new data are then rescaled using the statistics of the data features in the training set.
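
As an illustration of the (x−shift)/scale form, the sketch below derives a shift and scale for each method from a training feature and then applies them to another partition. The function and method names are hypothetical, and the sample standard deviation is an assumption; the sketch shows the rearranged formulas only and is not how Analytic Solver computes its output.

```python
import math

def fit_shift_scale(train, method, eps=0.0):
    """Return (shift, scale) inferred from the training values only."""
    lo = min(train) - eps
    hi = max(train) + eps
    if method == "standardization":
        mean = sum(train) / len(train)
        # Sample standard deviation (n - 1 divisor) is an assumption.
        std = math.sqrt(sum((x - mean) ** 2 for x in train) / (len(train) - 1))
        return mean, std
    if method == "normalization":
        # (x - lo) / (hi - lo)
        return lo, hi - lo
    if method == "adjusted_normalization":
        # 2(x - lo)/(hi - lo) - 1  ==  (x - (lo + hi)/2) / ((hi - lo)/2)
        return (lo + hi) / 2.0, (hi - lo) / 2.0
    if method == "unit_norm":
        # x / ||train||_2, so the shift is zero
        return 0.0, math.sqrt(sum(x * x for x in train))
    raise ValueError("unknown method: " + method)

def transform(xs, shift, scale):
    """Apply the fitted statistics to any partition or to new data."""
    return [(x - shift) / scale for x in xs]

train = [2.0, 4.0, 6.0]
validation = [3.0, 8.0]

shift, scale = fit_shift_scale(train, "normalization")
print(shift, scale)
print(transform(train, shift, scale))
print(transform(validation, shift, scale))
```

Because the statistics come from the training set alone, values in other partitions that lie outside the training minimum and maximum (such as 8.0 here) can fall outside the nominal [0,1] range.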

Rescaling Transformation Tab

See below for documentation for all options appearing on the Rescaling Transformation tab. 

Rescaling Transformation tab

See below for explanations of options located on the Rescaler - Transformation dialog.

Partitioned Data

  • Select Training to apply the Rescaler method to the Training Partition.
  • Select Validation to apply the Rescaler method to the Validation Partition, if one exists.
  • Select Testing to apply the Rescaler method to the Test Partition, if one exists.

New Data

See the Scoring New Data chapter for more information on scoring new data within a worksheet or database.