Data and Implementation Specifications
Data Requirements
General Requirements
When creating exports or queries for data to be delivered to Othot, consider the following:
- The data must represent all information about both the positive and negative outcomes (e.g., prospects that enrolled and prospects that did not, or students that retained and students that did not).
- Data that is recorded at the end of the cycle only for records that achieve the final outcome (e.g., enroll or retain), or that is highly correlated with the final outcome, will not be included in modeling because it is considered a “leak” variable.
- To allow for proper review, display, and use of variables:
  - All provided data must be identified in the Data Mapping document provided in the onboarding phase (or in supplementary data dictionary documents provided by the client).
  - Any provided variables that are coded (e.g., 1=YES/0=NO, R=RESIDENT/C=COMMUTER, etc.) must be accompanied by a description or decoded version in a supplemental document.
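As an illustration of the “leak” variable guideline above, the following minimal sketch flags fields that are populated only for records that achieved the outcome. It is not Othot's actual detection logic. The function name `find_leak_candidates`, the field names, and the sample records are all hypothetical, and a field flagged this way is only a candidate for review.

```python
def find_leak_candidates(records, target_field, positive_value="YES"):
    """Return fields that are populated only for records with the positive outcome.

    Assumption: records are dicts of strings, and an empty or missing value
    means the field was not recorded for that record.
    """
    fields = set()
    for rec in records:
        fields.update(rec.keys())
    fields.discard(target_field)

    candidates = []
    for field in sorted(fields):
        filled_positive = 0
        filled_negative = 0
        for rec in records:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                continue
            if rec.get(target_field) == positive_value:
                filled_positive += 1
            else:
                filled_negative += 1
        # Populated for at least one positive record and never for a negative one.
        if filled_positive > 0 and filled_negative == 0:
            candidates.append(field)
    return candidates


# Hypothetical sample data: housing is assigned only after a student enrolls.
records = [
    {"student_id": "A1", "enrolled": "YES", "gpa": "3.4", "housing_assignment": "Hall B"},
    {"student_id": "A2", "enrolled": "NO", "gpa": "3.1", "housing_assignment": ""},
    {"student_id": "A3", "enrolled": "YES", "gpa": "", "housing_assignment": "Hall C"},
    {"student_id": "A4", "enrolled": "NO", "gpa": "2.9", "housing_assignment": ""},
]
leak_candidates = find_leak_candidates(records, target_field="enrolled")
print(leak_candidates)
```

In this sample, `housing_assignment` is flagged because it is never filled in for a record that did not enroll, while `gpa` is not flagged because it appears for both outcomes.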
Data Format Requirements
All files provided to us must follow these guidelines. Files that do not follow them may take longer to configure and may not be processed for regular updates.
Item | Guidelines |
---|---|
File Format | |
Headers | The first row of the file must contain column headers. These headers cannot include special characters or line terminators (CRLF/CR/LF). Example header formats are as follows: |
Delimiters | |
Encoding | Accepted encoding includes: |
Line Terminators | |
Date Formats | Accepted date formats include: |
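The header guideline above can be checked before a file is delivered. The sketch below reads the first row of a delimited file and reports headers that contain line terminators or characters outside an allowed set. The allowed set used here (letters, digits, and underscores) is an assumption made for illustration only and is not taken from these guidelines. The function name `invalid_headers`, the comma delimiter, and the sample file content are likewise hypothetical.

```python
import csv
import io
import re

# Assumption: treat anything other than letters, digits, and underscores as a
# "special character". Adjust this pattern to the header formats you have agreed on.
ALLOWED_HEADER = re.compile(r"[A-Za-z0-9_]+")


def invalid_headers(file_obj, delimiter=","):
    """Return the headers in the first row that fail the allowed pattern."""
    reader = csv.reader(file_obj, delimiter=delimiter)
    headers = next(reader, [])
    return [h for h in headers if not ALLOWED_HEADER.fullmatch(h)]


# Hypothetical file content: one header has an embedded line break, one has a "%".
sample = 'student_id,admit_term,"first\nname",gpa%\nA1,2024FA,Ann,3.4\n'
bad_headers = invalid_headers(io.StringIO(sample, newline=""))
print(bad_headers)
```

Because `fullmatch` must consume the whole header, an embedded CR or LF is reported the same way as any other disallowed character.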
Required Fields
The following fields are required for all records:
Field | Description & Specifications |
---|---|
Unique Identifier | This field represents a unique student record. Generally, it's a student ID (numeric or alpha-numeric) that is assigned to the individual when they enter the population. This ID must not change throughout the cycle. |
Admission Term or Year | This field represents the term or year in which the record is part of the population for the intended outcome. It's used internally to split data into train or predict sets, as well as to evaluate data consistency and model performance year-over-year. |
Target Variable | This field represents the target or desired outcome for model training and predicting (e.g., enrollment status/retention status). It can be either a date on which the record achieved the desired outcome or a YES/NO field which indicates whether the record achieved the desired outcome. Note: this field is generally representative of the final outcome at the institution’s “census day.” |
Lifecycle Indicators | Date fields must be provided for each lifecycle phase or step in the process. For example, standard lifecycles for enrollment might include: Note: standard lifecycles for retention are dependent on the High Impact Question (HIQ) selected. |
Terminal Node Indicators | Date fields must be provided for each terminal node in the process (i.e., the points at which the individual exits the process). For example, terminal nodes for enrollment might include: |
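Two of the required fields lend themselves to a quick pre-delivery check: the unique identifier should not repeat, and the target variable may arrive either as a date or as a YES/NO flag. The following is a rough sketch under stated assumptions, not a description of how the platform processes these fields. The helper names `duplicate_ids` and `target_achieved`, the field name `student_id`, and the sample values are hypothetical, and any non-empty value other than YES/NO is simply assumed to be an outcome date without validating its format.

```python
from collections import Counter


def duplicate_ids(records, id_field):
    """Return identifier values that appear on more than one record."""
    counts = Counter(rec[id_field] for rec in records)
    return sorted(value for value, n in counts.items() if n > 1)


def target_achieved(value):
    """Interpret a target value that is either a YES/NO flag or an outcome date.

    Assumption: an empty value or NO means the outcome was not achieved, and
    any other non-empty value (for example, a date string) means it was.
    """
    text = (value or "").strip()
    if text.upper() in ("", "NO"):
        return False
    return True


# Hypothetical sample records; "A1" appears twice and would need to be resolved.
records = [
    {"student_id": "A1", "admit_term": "2024FA", "enroll_date": "2024-08-20"},
    {"student_id": "A2", "admit_term": "2024FA", "enroll_date": ""},
    {"student_id": "A1", "admit_term": "2024FA", "enroll_date": "2024-08-21"},
]
repeated = duplicate_ids(records, "student_id")
outcomes = [target_achieved(rec["enroll_date"]) for rec in records]
print(repeated, outcomes)
```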
Data Requirements for Platform Functionality
The following requirements are necessary to enable specific functionality:
Functionality | Requirements |
---|---|
Predictions by Specific Populations | To view predictions by specific populations, each population must be identified by a field in the data. It's important that you inform us about any groups of records that move through the process differently than others, or any populations that are treated differently during the process. For example: |
Additional Predictions | Common use cases for additional predictions include: Note that to enable additional predictions, you must provide data for each record that indicates whether they meet the criteria. For example, the “likelihood to be retained” must have an additional field that indicates whether an enrolled student was retained. Furthermore, additional predictions only provide you with the likelihood score. To perform What-If analyses or see top impacts for the additional prediction, you must create a new HIQ. |
What-If Variables | What-If variables represent variables that can be “controlled” or “influenced” by your institution. For a variable to be used in a What-If analysis, it cannot be a "leak" variable and must be: |
Best Practices for Data Collection
When providing data to us, consider these best practices:
Category | Best Practices |
---|---|
Multiple File Merge | |
Visit and Event Data | |
Interaction Data | |
Financial Aid Data | |
Test Score Data | |