Rules and V-Fields
Rules
Rules are validations designed to improve data accuracy.
Each submitted file is evaluated against a set of rules. These rules are derived from the dataset specifications and from checks identified by the Commonwealth and jurisdictions to draw attention to unusual patterns that have commonly occurred over the history of the project.
Rules for datasets can be found via the metadata site. Each rule has a set of elements, some of which are used in reporting. These include:
Name - Unique shortname used to identify a rule.
Class - The category of issue the rule identifies:
Anomaly: A field or combination of fields contains data that is likely to be incorrect
Barren: The record is expected to have child records but there are none present. This can occur if the child record exists but has irredeemable errors
Exceptional: Identifies indicators derived from data combinations that are exceptional on statistical (normative) criteria. Exceptional indicators may point to errors with one or more of the component data elements, or be based on correct data
Historical: Information from previous years is used to find changes between years. Examples include establishments opening, closing or being renamed, and significant changes in items that are expected to be stable. The value provided may be correct but should be checked
Inconsistent: There is a logical inconsistency between two fields or derived data items
Invalid: A field contains data that is incorrect, misformatted or outside the Domain
Missing: A field contains no meaningful data. Depending on the entry involved, it may be all spaces, all zeroes, or a Missing value in the Domain (e.g. “9”) if applicable to the dataset.
Skeleton: Structural comparisons to the SKL file to check that the same set of entities is used, or that there is a statistical match between files
Priority - The priority of rules has been determined by the jurisdictions and Commonwealth to enable users to focus on data issues with the greatest impact on the accuracy of reporting.
Low
Medium
High
Bulk - Simple rules that result in a high number of similar issues, such as spaces being used to indicate missing data rather than the appropriate missing value, are reported in bulk, that is, as a total count of the times the issue exists in the submission file.
Message - Short message that briefly describes the issue. Values inserted into a message can be formatted using the following extensions:
$xxx.perc - this extension formats the numbers as percentages
$xxx.commas - this extension formats the numbers with commas
$xxx.dmy and $xxx.ddmmyyyy - these extensions format the numbers as dates
Mark - Indicates on which field or record the error is marked.
Description - Detailed description of the issue.
SQL - Outlines the SQL implementation of the rule.
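The rule elements above can be pictured as a simple record. The following is a minimal sketch in Python, not the system's actual implementation: the Rule class, its field names, the render_message and report helpers, the example rules and the exact output formats (for example, one decimal place for percentages, and percentages supplied as fractions) are all assumptions made for illustration. It shows how the .perc, .commas, .dmy and .ddmmyyyy extensions might be applied to a message, and how a Bulk rule might be reported as a single count.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Rule:
    """Hypothetical record mirroring the rule elements listed above."""
    name: str
    rule_class: str      # e.g. "Anomaly", "Missing", "Invalid"
    priority: str        # "Low", "Medium" or "High"
    bulk: bool
    message: str
    mark: str = ""
    description: str = ""
    sql: str = ""


# Assumed output formats; the real formats are not specified in this document.
FORMATTERS = {
    "perc": lambda v: f"{v:.1%}",                  # 0.125 -> "12.5%"
    "commas": lambda v: f"{v:,}",                  # 12345 -> "12,345"
    "dmy": lambda v: v.strftime("%d/%m/%y"),
    "ddmmyyyy": lambda v: v.strftime("%d/%m/%Y"),
}

# Matches $name, optionally followed by one of the known extensions.
PLACEHOLDER = re.compile(r"\$(\w+)(?:\.(perc|commas|dmy|ddmmyyyy)\b)?")


def render_message(template, values):
    """Replace each $name or $name.ext placeholder with its formatted value."""
    def substitute(match):
        field, ext = match.group(1), match.group(2)
        formatter = FORMATTERS.get(ext, str)  # no extension: plain str()
        return formatter(values[field])
    return PLACEHOLDER.sub(substitute, template)


def report(issues):
    """Turn (rule, values) pairs into report lines.

    Non-bulk rules produce one line per issue; bulk rules are
    collapsed into a single total count per rule.
    """
    lines = []
    bulk_counts = Counter()
    for rule, values in issues:
        if rule.bulk:
            bulk_counts[rule.name] += 1
        else:
            lines.append(f"{rule.name}: {render_message(rule.message, values)}")
    for name, count in bulk_counts.items():
        lines.append(f"{name}: {count} occurrences")
    return lines


# Hypothetical rules, for illustration only.
rate_rule = Rule(
    name="ExampleRate", rule_class="Exceptional", priority="High", bulk=False,
    message="$rate.perc of $total.commas records fall after $cutoff.ddmmyyyy",
)
blank_rule = Rule(
    name="ExampleBlank", rule_class="Missing", priority="Low", bulk=True,
    message="Field is blank",
)

issues = [
    (rate_rule, {"rate": 0.125, "total": 12345, "cutoff": date(2020, 6, 30)}),
    (blank_rule, {}),
    (blank_rule, {}),
    (blank_rule, {}),
]
for line in report(issues):
    print(line)
# ExampleRate: 12.5% of 12,345 records fall after 30/06/2020
# ExampleBlank: 3 occurrences
```

Restricting the placeholder pattern to the four listed extensions means a full stop that simply ends a sentence after a placeholder is left untouched.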
VFields
Some rules use Virtual Elements (VFields). These are fields that have not been directly supplied in your data; instead, they are calculated from a variety of fields in the submitted data file. VFields and their SQL can be found via the metadata site. Each VField has the following elements:
Name - Unique shortname used to identify a VField.
Base - Indicates on which record type the calculation is based.
Title - Descriptive title of VField.
SQL - Outlines the SQL implementation of the virtual field calculation.
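As an illustration of how a VField relates to the supplied data, the sketch below uses Python's built-in sqlite3 module. Everything in it is hypothetical: the episode record type, its start_date and end_date fields, the v_duration_days VField and the rule query are invented for the example, and the actual SQL for each VField is the one published on the metadata site. The VField is modelled as a calculated column in a view over its Base record type, and a rule then queries it like any supplied field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Base record type with two supplied fields.
    CREATE TABLE episode (
        id         INTEGER PRIMARY KEY,
        start_date TEXT,
        end_date   TEXT
    );
    INSERT INTO episode VALUES
        (1, '2020-01-01', '2020-01-11'),
        (2, '2020-03-05', '2020-03-01');

    -- Hypothetical VField: not supplied, calculated from supplied fields.
    CREATE VIEW episode_v AS
    SELECT id,
           start_date,
           end_date,
           CAST(julianday(end_date) - julianday(start_date) AS INTEGER)
               AS v_duration_days
    FROM episode;
""")

# The calculated value for each record.
durations = dict(conn.execute("SELECT id, v_duration_days FROM episode_v"))
print(durations)  # {1: 10, 2: -4}

# A hypothetical "Inconsistent" rule that uses the VField:
# an episode cannot end before it starts.
rule_sql = "SELECT id FROM episode_v WHERE v_duration_days < 0 ORDER BY id"
flagged = [row[0] for row in conn.execute(rule_sql)]
print(flagged)  # [2]
```

Keeping the calculation in one place means several rules can refer to the same VField without repeating its SQL.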