DATASTAGE: PROCESSING STAGE

Aggregator Stage

Each of these properties has a dependent property as follows:

· Decimal Output. By default, all calculation or recalculation columns have an output type of double. This property allows you to specify that the column has an output type of decimal. You can also specify a precision and scale for the type (by default 8,2).
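To make the difference between a double result and a decimal (8,2) result concrete, here is a minimal Python sketch. It is an illustration only, not DataStage code: the function name to_decimal and the half-up rounding mode are assumptions, since the text above does not say how the stage rounds.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal(value: float, precision: int = 8, scale: int = 2) -> Decimal:
    """Convert a double result to a fixed-point decimal(precision, scale).

    Hypothetical helper: rounding half-up is an assumption made for
    this sketch, not documented stage behaviour.
    """
    quantum = Decimal(10) ** -scale  # Decimal('0.01') when scale is 2
    result = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    # Precision counts every digit, before and after the decimal point.
    if len(result.as_tuple().digits) > precision:
        raise OverflowError(f"{result} does not fit decimal({precision},{scale})")
    return result


double_sum = 0.1 + 0.2           # binary floating point: 0.30000000000000004
print(double_sum)
print(to_decimal(double_sum))    # fixed-point view of the same sum: 0.30
```

With the default (8,2), at most six digits can precede the decimal point, so a value such as 1234567.89 would not fit.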

The Inputs page allows you to specify details about the incoming data set.

The General tab allows you to specify an optional description of the input link. The Partitioning tab allows you to specify how incoming data is partitioned before being grouped and/or summarized. The Columns tab specifies the column definitions of incoming data. The Advanced tab allows you to change the default buffering settings for the input link.

The Partitioning tab allows you to specify details about how the incoming data is partitioned or collected before it is grouped and/or summarized. It also allows you to specify that the data should be sorted before being operated on.

By default, the stage partitions in Auto mode. In this mode it attempts to work out the best partitioning method, depending on the execution modes of the current and preceding stages and on how many nodes are specified in the configuration file.

If the Aggregator stage is operating in sequential mode, it will first collect the data before writing it to the file using the default Auto collection method.

The Partitioning tab allows you to override this default behavior.

The exact operation of this tab depends on:

· Whether the Aggregator stage is set to execute in parallel or sequential mode.

· Whether the preceding stage in the job is set to execute in parallel or sequential mode.

If the Aggregator stage is set to execute in parallel, then you can set a partitioning method by selecting from the Partitioning mode drop-down list. This will override any current partitioning (even if the Preserve Partitioning option has been set on the previous stage).
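The partitioning method matters for an aggregation because rows that share a grouping key have to end up in the same partition if each partition is to be summarized independently. The Python sketch below illustrates that idea with a keyed (hash) partitioner. It is not DataStage code: hash_partition, sum_by_key, the column names, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is stable between runs, unlike Python's built-in hash() on str.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def sum_by_key(partition, key, value):
    """Summarize one partition: total of the value column per key."""
    totals = defaultdict(int)
    for row in partition:
        totals[row[key]] += row[value]
    return dict(totals)


rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "north", "amount": 1},
]
for number, partition in enumerate(hash_partition(rows, "region", 3)):
    print(number, sum_by_key(partition, "region", "amount"))
```

Because every row for a given region lands in one partition, the per-partition totals are already the final totals and need no further merging.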

If the Aggregator stage is set to execute in sequential mode, but the preceding stage is executing in parallel, then you can set a collection method from the Collection type drop-down list. This will override the default collection method.

The Partitioning tab also allows you to specify that data arriving on the input link should be sorted before being processed. The sort is always carried out within data partitions. If the stage is partitioning incoming data, the sort occurs after the partitioning. If the stage is collecting data, the sort occurs before the collection. The availability of sorting depends on the partitioning or collecting method chosen (it is not available for the default Auto modes).
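The ordering of the two steps has a visible consequence: because the sort runs inside each partition, the data is ordered per partition rather than globally. The following Python sketch shows both cases under simplifying assumptions. The function names are hypothetical, round-robin partitioning is an arbitrary choice, and the collector simply reads one partition after another.

```python
def partition_then_sort(rows, num_partitions, key=None):
    """Parallel case: partition first, then sort inside each partition."""
    # Round-robin split, chosen only to keep the sketch short.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    return [sorted(partition, key=key) for partition in partitions]


def sort_then_collect(partitions, key=None):
    """Sequential case: sort inside each partition, then collect."""
    sorted_partitions = [sorted(partition, key=key) for partition in partitions]
    # Simplified collector: read the partitions one after another.
    return [row for partition in sorted_partitions for row in partition]


print(partition_then_sort([5, 1, 4, 2, 3, 0], 2))   # [[3, 4, 5], [0, 1, 2]]
print(sort_then_collect([[5, 4, 3], [1, 2, 0]]))    # [3, 4, 5, 0, 1, 2]
```

In the second call the collected output is not in global order, which is why the choice of collection method still matters after a within-partition sort.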

If NLS is enabled, an additional button opens a dialog box allowing you to select a locale specifying the collate convention for the sort.

For each column, you can also specify the sort direction, case sensitivity, whether it is sorted as ASCII or EBCDIC, and whether null values appear first or last. Where you are using a keyed partitioning method, you can also specify whether the column is used as a key for sorting, for partitioning, or for both. Select the column in the Selected list and right-click to invoke the shortcut menu.
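As a rough illustration of how these per-column options interact, the Python sketch below sorts rows on one string column with switches for direction, case sensitivity, and null placement. The function sort_rows and its parameter names are hypothetical and only mirror the options described above. The ASCII or EBCDIC choice is left out.

```python
def sort_rows(rows, column, descending=False, case_sensitive=True, nulls_first=True):
    """Sort rows on one string column; None stands in for a null value."""
    def sort_key(row):
        value = row[column]
        return value if case_sensitive else value.lower()

    nulls = [row for row in rows if row[column] is None]
    present = [row for row in rows if row[column] is not None]
    # Nulls are placed separately so the direction does not move them.
    present.sort(key=sort_key, reverse=descending)
    return nulls + present if nulls_first else present + nulls


rows = [{"name": "bob"}, {"name": None}, {"name": "Zed"}, {"name": "alice"}]

# Case-sensitive ascending, nulls first: None, Zed, alice, bob
print([row["name"] for row in sort_rows(rows, "name")])

# Case-insensitive descending, nulls last: Zed, bob, alice, None
print([row["name"] for row in sort_rows(
    rows, "name", descending=True, case_sensitive=False, nulls_first=False)])
```

In the case-sensitive run "Zed" sorts ahead of the lowercase names because uppercase letters precede lowercase ones in ASCII order.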

Sunday, March 16, 2008
