Sunday, March 16, 2008

DIFFERENCE STAGE



The Difference stage is a processing stage. It performs a record-by-record comparison of two input data sets, which are different versions of the same data set, designated the before and after data sets. Example before and after data sets are given in the Parallel Job Developer's Guide.

The Difference stage outputs a single data set whose records represent the differences between the before and after data sets. The stage assumes that both input data sets have been key-partitioned and sorted in ascending order on the key columns you specify for the comparison. You can achieve this by using the Sort stage or by using the built-in sorting and partitioning abilities of the Difference stage.
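The partition-and-sort precondition can be pictured with a small Python sketch. The function name partition_and_sort, the use of CRC32 as the partitioning hash, and the dictionary record layout are all assumptions made for illustration; this is not how DataStage itself partitions data.

```python
import zlib

def partition_and_sort(records, keys, n_partitions):
    """Key-partition records, then sort each partition ascending on the keys."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        key = tuple(rec[k] for k in keys)
        # Records with equal key values always land in the same partition.
        idx = zlib.crc32(repr(key).encode("utf-8")) % n_partitions
        partitions[idx].append(rec)
    for part in partitions:
        part.sort(key=lambda rec: tuple(rec[k] for k in keys))
    return partitions

# Hypothetical before and after data sets keyed on "id".
before = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
after = [{"id": 2, "name": "B"}, {"id": 4, "name": "d"}, {"id": 1, "name": "a"}]

before_parts = partition_and_sort(before, ["id"], 2)
after_parts = partition_and_sort(after, ["id"], 2)
```

Because both inputs go through the same hash, a before record and its matching after record end up in the same partition, which is what lets each partition be compared on its own.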

The comparison is performed based on a set of difference key columns. Two records are copies of one another if they have the same value for all difference keys. You can also optionally specify value columns. If two records have identical key columns, the stage compares the value columns to see if one record is an edited copy of the other.
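The key and value comparison described above can be sketched in Python as a merge over two key-sorted lists. The function name difference, the dictionary record layout, and the string labels "copy", "edit", "insert" and "delete" are assumptions for illustration (the stage itself emits numeric codes), and the sketch assumes key values are unique within each input.

```python
def difference(before, after, keys, values):
    """Compare two key-sorted record lists; yield (result, before_rec, after_rec)."""
    def key_of(rec):
        return tuple(rec[k] for k in keys)

    def val_of(rec):
        return tuple(rec[v] for v in values)

    i = j = 0
    while i < len(before) and j < len(after):
        b, a = before[i], after[j]
        if key_of(b) == key_of(a):
            # Same keys: identical values mean a copy, otherwise an edit.
            result = "copy" if val_of(b) == val_of(a) else "edit"
            yield (result, b, a)
            i += 1
            j += 1
        elif key_of(b) < key_of(a):
            # Key exists only in the before set.
            yield ("delete", b, None)
            i += 1
        else:
            # Key exists only in the after set.
            yield ("insert", None, a)
            j += 1
    for b in before[i:]:
        yield ("delete", b, None)
    for a in after[j:]:
        yield ("insert", None, a)

before = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
after = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]

results = [r for r, _, _ in difference(before, after, ["id"], ["name"])]
# results == ["copy", "edit", "delete", "insert"]
```

The single forward pass only works because both lists are sorted ascending on the key, which is why the stage expects sorted input.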

The Difference stage is similar, but not identical, to the Change Capture stage. The Change Capture stage is intended to be used in conjunction with the Change Apply stage; it produces a change data set containing the changes that must be applied to the before data set to turn it into the after data set. The Difference stage instead outputs the before and after rows to the output data set, plus a code indicating whether there are differences. Usually, the before and after data have the same column names, in which case the after data set effectively overwrites the before data set and you see only one set of columns in the output. DataStage issues a warning when it does this. If your before and after data sets have different column names, columns from both data sets are output; note that any key and value columns must have the same name in both data sets.
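The column-overwriting behaviour can be pictured with a short Python sketch. The function name output_row, the dictionary rows, and the column name "diff" for the result code are placeholders chosen for illustration, not the stage's actual implementation.

```python
def output_row(before_rec, after_rec, diff_code, code_column="diff"):
    """Merge a before and after row into one output row plus a result code."""
    row = {}
    row.update(before_rec or {})
    # Same-named columns: the after value overwrites the before value.
    row.update(after_rec or {})
    row[code_column] = diff_code
    return row

# Same column names on both inputs: only one set of columns survives.
same_names = output_row({"id": 2, "name": "b"}, {"id": 2, "name": "B"}, 3)
# same_names == {"id": 2, "name": "B", "diff": 3}

# Different non-key column names: columns from both inputs are output.
mixed_names = output_row({"id": 2, "old_name": "b"}, {"id": 2, "new_name": "B"}, 3)
# mixed_names == {"id": 2, "old_name": "b", "new_name": "B", "diff": 3}
```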

The stage generates an extra column, Diff, which indicates the result of each record comparison.


PROPERTIES

Difference Keys Category

Key. Specifies the name of a difference key input column. This property can be repeated to specify multiple difference key input columns. You can use the Column Selection dialog box to select several columns at once if required.

Key has this dependent property:

Case Sensitive. Use this property to specify whether each key is case sensitive or not. It is set to True by default; for example, the values “CASE” and “case” would not be judged equivalent. This property is only available if the All non-Key Columns are Values property is set to True.

Difference Values Category

All non-Key Columns are Values. Set this to True to indicate that any columns not designated as difference key columns are value columns. It is False by default. The property has this dependent property:

Case Sensitive. Use this property to specify whether each value is case sensitive or not. It is set to True by default; for example, the values “CASE” and “case” would not be judged equivalent.
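The effect of the Case Sensitive setting on a single column comparison can be sketched in Python as follows. The function name columns_equal is hypothetical, and using casefold() for the case-insensitive match is an assumption; the stage's exact folding rules are not described here.

```python
def columns_equal(left, right, case_sensitive=True):
    """Compare two column values, optionally ignoring case for strings."""
    if not case_sensitive and isinstance(left, str) and isinstance(right, str):
        return left.casefold() == right.casefold()
    return left == right

sensitive = columns_equal("CASE", "case")                          # default: True
insensitive = columns_equal("CASE", "case", case_sensitive=False)
# sensitive is False, insensitive is True
```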

Options Category

Tolerate Unsorted Inputs. Specifies that the input data sets are not sorted. This property allows you to process groups of records that may be arranged by the difference key columns but not sorted. The stage processes the input records in the order in which they appear on its input. It is False by default.

Log Statistics. This property configures the stage to display result information containing the number of input records and the number of copy, delete, edit, and insert records. It is False by default.

Drop Output for Insert. Drops (does not generate) the output record for an insert result. By default, an output record is always created by the stage.

Drop Output for Delete. Drops (does not generate) the output record for a delete result. By default, an output record is always created by the stage.

Drop Output for Edit. Drops (does not generate) the output record for an edit result. By default, an output record is always created by the stage.

Drop Output for Copy. Drops (does not generate) the output record for a copy result. By default, an output record is always created by the stage.
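The four Drop Output options behave like a filter on the result type. A minimal Python sketch, assuming each result is a tuple whose first element is one of the labels "insert", "delete", "edit" or "copy" (the function name apply_drop_options and this layout are illustrative only):

```python
def apply_drop_options(results, drop_insert=False, drop_delete=False,
                       drop_edit=False, drop_copy=False):
    """Return only the results whose type has not been switched off."""
    flags = {
        "insert": drop_insert,
        "delete": drop_delete,
        "edit": drop_edit,
        "copy": drop_copy,
    }
    dropped = {name for name, flag in flags.items() if flag}
    return [r for r in results if r[0] not in dropped]

results = [("copy", 1), ("edit", 2), ("delete", 3), ("insert", 4)]

# Keep only the records that actually changed.
changed = apply_drop_options(results, drop_copy=True)
# changed == [("edit", 2), ("delete", 3), ("insert", 4)]
```

With every flag left at its default of False, all four result types are output, matching the default behaviour described above.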

Copy Code. Allows you to specify an alternative value for the code that indicates the after record is a copy of the before record. By default this code is 2.

Deleted Code. Allows you to specify an alternative value for the code that indicates that a record in the before set has been deleted from the after set. By default this code is 1.

Edit Code. Allows you to specify an alternative value for the code that indicates the after record is an edited version of the before record. By default this code is 3.

Insert Code. Allows you to specify an alternative value for the code that indicates a new record has been inserted in the after set that did not exist in the before set. By default this code is 0.
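Overriding the four codes amounts to supplying your own mapping from result type to number. The Python sketch below uses made-up codes (10, 20, 30, 40) and a hypothetical helper make_code_map; the distinctness check is a sanity check of this sketch, not a documented rule of the stage.

```python
def make_code_map(copy, delete, edit, insert):
    """Build a result-type-to-code mapping from user-supplied codes."""
    code_map = {"copy": copy, "delete": delete, "edit": edit, "insert": insert}
    # Assumption of this sketch: codes should be distinct to be useful downstream.
    if len(set(code_map.values())) != len(code_map):
        raise ValueError("each result type needs a distinct code")
    return code_map

codes = make_code_map(copy=10, delete=20, edit=30, insert=40)
labelled = [codes[r] for r in ["copy", "edit", "delete", "insert"]]
# labelled == [10, 30, 20, 40]
```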






1 comment:

Saucy said...

Your codes for the Difference Stage are not accurate; the code numbers you have are for the Change Capture Stage, whereas the Difference Stage uses different codes.