DataStage test specification format

Structure

A DataStage® test case specification (often abbreviated ‘Spec’) is a JSON-formatted file that uses a grammar modelled loosely on the Gherkin syntax used by the Cucumber testing tool. The overall structure follows the common Gherkin pattern …

{
    "given": [
        { This test data on input link 1 },
        { This test data on input link 2 }
    ],
    "when": {
        I execute the test case with these options and parameter values
    },
    "then": [
        { Expect this data to appear on output link 1 },
        { Expect this data to appear on output link 2 }
    ]
}

Note: The user interface may order the JSON objects alphabetically (given > then > when) but this has no effect on the functionality of the test.
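Because key order is not significant, a structural check is easiest to write against the parsed JSON rather than the raw text. The following is a minimal Python sketch and not part of DataStage: the helper name check_top_level is hypothetical, and treating a missing section as a problem is an assumption, since the format description does not say whether all three sections are mandatory.

```python
import json

# Expected JSON type of each top-level section, as shown in the structure above.
EXPECTED_TYPES = {"given": list, "when": dict, "then": list}


def check_top_level(spec_text):
    """Return a list of problems found in the top-level structure (hypothetical helper)."""
    spec = json.loads(spec_text)
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in spec:
            # Assumption: all three sections are required.
            problems.append(f"missing section: {name}")
        elif not isinstance(spec[name], expected):
            kind = "array" if expected is list else "object"
            problems.append(f"{name} should be a JSON {kind}")
    return problems


# Key order does not matter: this spec lists the sections alphabetically.
alphabetical = '{"given": [], "then": [], "when": {}}'
print(check_top_level(alphabetical))  # prints []
```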

Given

The given property array associates test data files with your flow's input links, thereby defining the test values you wish to inject into your flow's inputs at runtime.

For example:

{
    "given": [
        {
            "path": "fileCustomers.csv",
            "stage": "sfCustomers",
            "link": "Customers" 
        },
        {
            "path": "fileOrders.csv",
            "stage": "sfOrders",
            "link": "Orders"
        }
    ]
}

Some source stages can be configured with multiple output links, so each input in your test specification's given property array is uniquely identified by the combination of its stage and link names. Each entry also contains a path property that identifies the CSV file holding the test data to be injected on that link.

Note that not every stage in a flow must be provided with test data. You can craft a test specification that supplies test data for only a subset of the flow's stages.
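The role of the stage and link pair as an identifier can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not how DataStage itself resolves inputs: the helper name index_by_stage_and_link is hypothetical, and it handles only entries that carry both a stage and a link property.

```python
def index_by_stage_and_link(entries):
    """Map (stage, link) -> path, raising if a pair appears twice (hypothetical helper)."""
    index = {}
    for entry in entries:
        key = (entry["stage"], entry["link"])
        if key in index:
            # Two entries for the same stage and link would be ambiguous.
            raise ValueError(f"duplicate input {key}")
        index[key] = entry["path"]
    return index


# The given array from the example above, as parsed JSON.
given = [
    {"path": "fileCustomers.csv", "stage": "sfCustomers", "link": "Customers"},
    {"path": "fileOrders.csv", "stage": "sfOrders", "link": "Orders"},
]

inputs = index_by_stage_and_link(given)
print(inputs[("sfOrders", "Orders")])  # prints fileOrders.csv
```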

Sparse Lookup sources

When an input source is used with a Sparse Lookup stage, use the sparseLookup property instead of the stage property to specify the input.

For example:

{
    "given": [
        {
            "path": "fileCustomers.csv",
            "stage": "sfCustomers",
            "link": "Customers" 
        },
        {
            "sparseLookup": "SparseLookup",
            "path": "Database-Reference.csv",
            "key": [
                "KEY_COLUMN_1",
                "KEY_COLUMN_2"
            ]
        }
    ]
}

A given entry that contains the sparseLookup property is a JSON object which specifies …

the sparseLookup value, which names the sparse lookup reference stage,
a path to the relevant CSV test data file, and
a list of key columns to be used for the sparse lookup.
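Since the two kinds of given entry carry different properties, a tool that reads a specification may want to tell them apart. The Python sketch below shows one way to do that by testing for the sparseLookup property; the helper name split_given is hypothetical and the approach is an assumption for illustration, not a description of DataStage internals.

```python
def split_given(entries):
    """Separate ordinary link inputs from Sparse Lookup inputs (hypothetical helper)."""
    link_inputs, sparse_inputs = [], []
    for entry in entries:
        if "sparseLookup" in entry:
            sparse_inputs.append(entry)
        else:
            link_inputs.append(entry)
    return link_inputs, sparse_inputs


# The given array from the example above, as parsed JSON.
given = [
    {"path": "fileCustomers.csv", "stage": "sfCustomers", "link": "Customers"},
    {
        "sparseLookup": "SparseLookup",
        "path": "Database-Reference.csv",
        "key": ["KEY_COLUMN_1", "KEY_COLUMN_2"],
    },
]

link_inputs, sparse_inputs = split_given(given)
for entry in sparse_inputs:
    # Each sparse lookup entry names its stage, its CSV file, and its key columns.
    print(entry["sparseLookup"], entry["path"], entry["key"])
```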

When

The when property object specifies which job will be executed during testing as well as any parameters (including job macros) that affect the data produced by the job.

For example, this specification substitutes hardcoded values for the DSJobStartDate and DSJobStartTime macros and the paramStartKey parameter:

{
    "when": {
        "data_intg_flow_ref": "3023970f-ba2dfb02bd3a",  
        "parameters": {
            "DSJobStartDate": "2012-01-15",
            "DSJobStartTime": "11:05:01",
            "paramStartKey": "100"
        }
    }
}

One application of the parameters property is to make flows that rely on system date and time information produce deterministic output by hard-coding those values during testing.

Note that the data_intg_flow_ref property is an internally-generated DataStage reference to the flow with which this test is associated and should not be changed.
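If you generate or adjust specifications with a script, the same idea applies: change only the parameters and leave data_intg_flow_ref untouched. The Python sketch below is a hedged illustration; the helper name with_parameters is hypothetical, and passing every value as a string is an assumption based solely on the example above.

```python
import copy
import json


def with_parameters(when, **overrides):
    """Return a copy of a when object with parameter values replaced (hypothetical helper)."""
    updated = copy.deepcopy(when)
    # Only the parameters change; data_intg_flow_ref is copied through as-is.
    updated.setdefault("parameters", {}).update(overrides)
    return updated


# The when object from the example above, as parsed JSON.
when = {
    "data_intg_flow_ref": "3023970f-ba2dfb02bd3a",
    "parameters": {
        "DSJobStartDate": "2012-01-15",
        "DSJobStartTime": "11:05:01",
        "paramStartKey": "100",
    },
}

pinned = with_parameters(when, paramStartKey="200")
print(json.dumps(pinned, indent=4))
```

Working on a deep copy keeps the original object intact, so the same base specification can be reused for several parameter combinations.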

Then

The then property array associates test data files with your flow's output links. For example:

{
    "then": [
        {
            "path": "ODBC_customers.csv",
            "stage": "ODBC_customer",
            "link": "customer_out"
        },
        {
            "path": "ODBC_orders.csv",
            "stage": "ODBC_order",
            "link": "order_out"
        }
    ]
}

As with the given property, some target stages can be configured with multiple input links, so each entry in the test specification's then property array uniquely identifies a link by the combination of its stage and link names. Each entry also contains a path property that identifies the CSV file holding the expected output, which is compared with the actual data appearing on that link.

Other properties which extend the capabilities of your test case can be included in the then property array:

The ClusterKey property: Improve performance of test cases when using large data volumes
The checkRowCountOnly property: Configure your tests to only count the number of rows
The ignore property: Exclude specific columns from test comparisons