See below for an explanation of each option as displayed on the Import From File System dialog.
Note: Analytic Solver Data Science currently supports importing only delimited text files. A delimited text file is one in which data values are separated by a character such as a quotation mark, comma, or tab. These characters mark where each string of text begins and ends.
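As a rough illustration of what "delimited" means here, the sketch below uses Python's standard csv module to split comma- and tab-delimited text into values. It is not how Analytic Solver Data Science parses files; the function name parse_delimited and the sample strings are made up for this example.

```python
import csv
import io

def parse_delimited(text, delimiter=",", quotechar='"'):
    """Split delimited text into rows of string values.

    Hypothetical helper for illustration only. The delimiter separates
    values; the quote character lets a value contain the delimiter.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return [row for row in reader]

# Comma-delimited: the quoted value keeps its embedded comma.
comma_rows = parse_delimited('id,comment\n1,"fast, reliable"\n')

# Tab-delimited: same data layout, different separator.
tab_rows = parse_delimited("id\tcomment\n1\tfast\n", delimiter="\t")
```

In the first call the quotation marks keep "fast, reliable" together as a single value, even though it contains the delimiter.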
Directory
Click Browse to navigate to the directory that contains the collection of text documents.
Files
The files contained within the file folder selected for Directory will appear here. Click the > command button to move individual files or the >> button to move the entire collection to the Selected Files list.
Selected Files
The text files listed here have been selected for import or sampling.
Import selected files
Select this option to import the selected text files.
Sample from selected files
Select this option to draw a random sample from the collection of text documents according to the options selected within the Sampling Options section.
Sample With replacement
If this option is checked, the text files are sampled with replacement. By default, this option is unchecked and files are sampled without replacement. When sampling with replacement, a text document chosen during sampling is not removed from the collection, so the same document can be chosen more than once.
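The difference between the two modes can be sketched with Python's standard random module. This is an illustration of the sampling concepts only, not the product's implementation; sample_documents and the file names are hypothetical.

```python
import random

def sample_documents(paths, n, with_replacement, rng):
    """Draw n documents from paths (hypothetical helper).

    with_replacement=True : each draw is made from the full collection,
                            so a document may appear more than once.
    with_replacement=False: each document can be drawn at most once,
                            so n cannot exceed len(paths).
    """
    if with_replacement:
        return rng.choices(paths, k=n)
    return rng.sample(paths, n)

docs = [f"doc_{i}.txt" for i in range(6)]
rng = random.Random(12345)  # fixed seed chosen only for this example

without = sample_documents(docs, 4, with_replacement=False, rng=rng)
with_repl = sample_documents(docs, 10, with_replacement=True, rng=rng)
```

Note that a sample of 10 from 6 documents is only possible with replacement; in this sketch, asking for it without replacement raises a ValueError.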
Desired sample size
Enter a value for the desired sample size. This value determines the number of text documents to be included in the sample. The default value is half of the number of documents listed in the Selected Files list.
Simple random sampling
If this option is selected, a simple random sample of size n is chosen from the documents in the Selected Files list such that every possible set of n documents has an equal chance of being chosen. Simple random sampling therefore not only avoids bias in the choice of individual documents, but also gives every possible sample an equal chance of being selected. This option is selected by default when Sample from selected files is enabled.
Set Seed
This option initializes the random number generator. Setting the random number seed to a non-zero value ensures that the same sequence of random numbers is used each time the sample of documents is selected. The default value is 12345. When the seed is zero, the random number generator is initialized from the system clock, so the documents selected will differ each time a sample is taken. If the results from successive runs need to be comparable, select Set Seed and enter a number in the box. This option is selected by default when Sample from selected files is enabled. This option accepts positive integers with up to nine digits.
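The effect of a fixed seed can be sketched as follows. This uses Python's own random number generator, which is an assumption made for illustration and is unlikely to reproduce the documents the product itself would select; make_rng and the file names are hypothetical.

```python
import random

def make_rng(seed=12345):
    """Return a random number generator for the given seed (hypothetical helper).

    A non-zero seed gives a repeatable sequence. For a seed of zero this
    sketch lets Python seed itself from an OS entropy source or the clock,
    mirroring the "different each time" behavior described above.
    """
    if seed == 0:
        return random.Random()
    return random.Random(seed)

files = [f"doc_{i:02d}.txt" for i in range(20)]

# Same non-zero seed: the two runs select identical documents.
first_run = make_rng(12345).sample(files, 5)
second_run = make_rng(12345).sample(files, 5)

# Seed of zero: the selection is not repeatable from run to run.
unseeded_run = make_rng(0).sample(files, 5)
```

Here first_run and second_run are identical lists, which is what makes results from successive runs comparable.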
Output
If Write file paths is selected, pointers to the file locations are stored on the FileSampling worksheet. If Write file contents is selected, the content of each text document will be written to a cell on the FileSampling worksheet, up to a maximum of 32,767 characters.
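The two output choices can be sketched as below. This is only an illustration: output_value is a hypothetical helper, and the sketch assumes that text beyond the 32,767-character limit is simply cut off, which the description above does not state explicitly.

```python
MAX_CELL_CHARS = 32_767  # per-cell character limit stated above

def output_value(path, text, write_contents):
    """Return what would be stored for one document (hypothetical helper).

    write_contents=False: store the file path as a pointer to the document.
    write_contents=True : store the document text, cut to the cell limit
                          (truncation is an assumption of this sketch).
    """
    if not write_contents:
        return path
    return text[:MAX_CELL_CHARS]

long_text = "x" * 40_000
as_path = output_value("docs/report.txt", long_text, write_contents=False)
as_contents = output_value("docs/report.txt", long_text, write_contents=True)
```

Writing file paths keeps the worksheet small and avoids the per-cell limit; writing contents places the text itself on the worksheet, subject to that limit.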