If you plan to use DataStage, you need to know which join stages are available. They are the Join, Lookup, and Merge stages, and each plays a different role in a job.
The DataStage join stages
The DataStage join stages are a group of stages that perform join operations between two or more data sets. Each stage performs the join in a different way, so choosing the right one for the size and ordering of your data affects performance. In the palette, stages are grouped into categories by function.
In this tutorial, you’ll learn how to create a DataStage job. Once you’ve created a job, you can see a preview of what the job will look like, along with stages to represent each input and output. You can also use this tool to validate the data set and the transformations performed.
First, you must specify the source. To do so, go to the Data source location page and verify the name of your database. After you confirm, you can click the load button to populate the connection information.
Then, you can choose which data sets to use as input. You can select tables from the ASN schema, or a sequential or indexed file.
You can also pre-process the data. For example, you can drop duplicate records from your update data sets, and you can drop master rows that have no matching update row. If you use the Auto partitioning method, DataStage takes care of partitioning and sorting the data on the key columns.
When the design is complete, compile the job and then run it. During compilation, the job is checked to ensure the inputs and transformations are correct. By selecting MIN_SYNCHPOINT, the job will know which rows to extract.
The stages discussed here are found in the Processing section of the palette. They include the Join, Merge, Lookup, Modify and Filter stages.
Typically, a stage has one input and one output. However, some stages accept multiple inputs, and some can send output to more than one stage.
The Join stage in DataStage joins two or more input data sets on the values of one or more key columns, for example matching one table extract against another. It is similar to the Lookup stage, but it does not hold the reference data in memory. Instead, it expects its input data sets to be sorted on the key columns.
The Join stage is the most appropriate for larger data sets. For smaller reference data sets, the Lookup stage is more appropriate.
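To see why sorted inputs matter for a join on large data sets, here is a minimal Python sketch of a generic sort-merge inner join. It is an illustration of the general technique only, not DataStage's internal implementation, and the function name, column names, and sample rows are all made up for the example. Because both inputs are sorted on the key, the join can walk them side by side without loading either one fully into a lookup table.

```python
def sorted_merge_join(left, right, key):
    """Inner-join two lists of dicts that are already sorted ascending on `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key has no partner yet, advance left
        elif lk > rk:
            j += 1          # right key has no partner yet, advance right
        else:
            # Find the run of right rows that share this key
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


# Hypothetical sample data, both sorted on cust_id
orders = [
    {"cust_id": 1, "amount": 40},
    {"cust_id": 2, "amount": 15},
    {"cust_id": 2, "amount": 99},
    {"cust_id": 4, "amount": 7},
]
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
    {"cust_id": 3, "name": "Linus"},
]

joined = sorted_merge_join(orders, customers, "cust_id")
```

In this sketch, the order for customer 4 and the customer with id 3 have no partner, so an inner join leaves them out of `joined`.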
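A lookup is the process of finding reference records whose key matches a column value in the input data. In a DataStage job, you can define several types of lookup, including normal, sparse, and range lookups. A range lookup matches an input value against a range of values instead of a single key.
A normal lookup loads the reference data into memory, so the reference data set needs to fit in physical memory. A sparse lookup instead queries the database directly for each input row, so you need to supply a SQL statement.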
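The idea behind a range lookup can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Lookup stage's actual behavior: the reference table `BANDS`, its labels, and the function `range_lookup` are hypothetical, and the ranges are assumed to be sorted and non-overlapping.

```python
from bisect import bisect_right

# Hypothetical reference table: (lower bound, upper bound, label),
# sorted by lower bound and assumed not to overlap
BANDS = [
    (0, 99, "small"),
    (100, 999, "medium"),
    (1000, 9999, "large"),
]
LOWER_BOUNDS = [low for low, _high, _label in BANDS]


def range_lookup(value):
    """Return the label whose [low, high] range contains value, or None."""
    # Index of the last band whose lower bound is <= value
    idx = bisect_right(LOWER_BOUNDS, value) - 1
    if idx < 0:
        return None
    low, high, label = BANDS[idx]
    return label if low <= value <= high else None
```

A value such as 150 falls in the second band, while a value outside every range returns `None`, which corresponds to a lookup failure.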
You configure stages in their stage editors in the DataStage Designer. There you can specify the types of sorting and partitioning that will be applied to each input.
You can also check the schemas your job actually uses by setting the OSH_PRINT_SCHEMAS environment variable. This writes the run-time schemas to the job log, so you can confirm that they match your job design.
You can also check the link ordering. On the Merge stage, this setting lets you specify which input link is the master link, and which update links send rejected rows to which reject links.
Alternatively, you can use the GetNARowLkpKeys routine to validate the field values of the Not Available row. This routine is only available on Linux and Windows.
You can also use the Funnel stage's Sort Funnel mode to combine your input records. It merges the inputs in the order defined by one or more key columns, so the inputs must already be sorted on those keys.
Another option is the Continuous Funnel, which takes one record from each input link in turn. It does not need sorted input, but it does not guarantee any particular order in the output.
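The difference between the two funnel modes can be sketched in Python. This is only an analogy built on the standard library: `heapq.merge` stands in for a Sort Funnel over inputs that are already sorted on the key, and a simple round-robin stands in for a Continuous Funnel. The function names and sample links are hypothetical. Note that the real stage makes no ordering promise in continuous mode; the fixed order below is a side effect of the simplification.

```python
import heapq
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted input


def sort_funnel(inputs, key):
    """Merge inputs that are each already sorted on `key` into one sorted stream."""
    return list(heapq.merge(*inputs, key=key))


def continuous_funnel(inputs):
    """Take one record from each input in turn, skipping inputs that have run out."""
    out = []
    for batch in zip_longest(*inputs, fillvalue=_MISSING):
        out.extend(rec for rec in batch if rec is not _MISSING)
    return out


# Hypothetical input links, each sorted ascending
link_a = [1, 4, 7]
link_b = [2, 3, 9, 10]

sorted_out = sort_funnel([link_a, link_b], key=lambda rec: rec)
round_robin_out = continuous_funnel([link_a, link_b])
```

Here `sorted_out` keeps the overall key order, while `round_robin_out` simply interleaves the links and so loses it.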
The Lookup stage is a processing stage in DataStage. It performs lookup operations on a data set that is read into memory from any other parallel job stage that can output data. Like other stages, it has its own stage editor.
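The most common use for a lookup is to map short codes in the input data to expanded information from a lookup table. For example, if the input records identify a U.S. state by its two-letter postal code, the lookup table can list each code with the full state name, with the code defined as the key column. As the stage reads each record, it uses the key to find the state and adds the full name to a new column defined for the output link.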
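The code-to-description mapping described above can be sketched in Python with an in-memory dictionary playing the role of the lookup table. This is a minimal illustration, not DataStage code: the table contents, the column names `state` and `state_name`, and the function `lookup_state` are assumptions made for the example, and rows with no match are collected separately in the way a reject link would.

```python
# Hypothetical lookup table: state code is the key column
STATE_NAMES = {"CA": "California", "NY": "New York", "TX": "Texas"}


def lookup_state(rows, reference):
    """Add a state_name column to each row; rows with no match are rejected."""
    matched, rejected = [], []
    for row in rows:
        name = reference.get(row["state"])
        if name is None:
            rejected.append(row)                      # lookup failure
        else:
            matched.append({**row, "state_name": name})
    return matched, rejected


# Hypothetical input records
customers = [
    {"name": "Ada", "state": "CA"},
    {"name": "Grace", "state": "NY"},
    {"name": "Linus", "state": "ZZ"},
]

matched, rejected = lookup_state(customers, STATE_NAMES)
```

The whole reference table lives in memory, which is why this approach suits small reference data sets.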
When creating a lookup table, make sure the data fits into physical memory. This helps maximize performance.
The Lookup stage does not require sorted input, but the Join and Merge stages do, so choosing appropriate sort keys matters there. Lookups can also be caseless or range-based.
A join is the standard operation for combining data from two data sets on matching key values. There are several types of join: inner, left outer, right outer, and full outer. They differ in which unmatched rows they keep in the output.
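To make the difference between join types concrete, here is a small Python sketch of the standard inner, left outer, right outer, and full outer semantics. It is a generic illustration rather than DataStage code: the function `join`, its `how` parameter, and the sample rows are hypothetical, it uses a hash index instead of sorted inputs for brevity, and unmatched rows simply lack the other side's columns instead of carrying nulls.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is inner, left, right, or full."""
    if how not in {"inner", "left", "right", "full"}:
        raise ValueError(f"unknown join type: {how}")

    # Index the right side by key value
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)

    out = []
    matched_right_keys = set()
    for l in left:
        matches = right_by_key.get(l[key], [])
        if matches:
            matched_right_keys.add(l[key])
            out.extend({**l, **r} for r in matches)
        elif how in ("left", "full"):
            out.append(dict(l))        # keep unmatched left row

    if how in ("right", "full"):
        for r in right:
            if r[key] not in matched_right_keys:
                out.append(dict(r))    # keep unmatched right row
    return out


# Hypothetical sample data
left_rows = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right_rows = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]

inner = join(left_rows, right_rows, "id", "inner")
left_outer = join(left_rows, right_rows, "id", "left")
right_outer = join(left_rows, right_rows, "id", "right")
full_outer = join(left_rows, right_rows, "id", "full")
```

Only id 2 appears on both sides, so the inner join returns one row, each outer join adds the unmatched rows from its preserved side, and the full outer join keeps all three ids.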
Another option is the Funnel stage's Sequence mode. In this mode, all records are copied in order from the first input data set to the output data set, then all records from the second input data set, and so on.
To choose between the Lookup stage and the other join stages, you should be aware of how they differ. Besides the way they match rows, they differ in how they handle rejects: the Lookup stage has a single reject link, the Merge stage can have one reject link per update link, and the Join stage has none.
For the Lookup stage, consider the following factors: the size of your data set, the type of lookup, and the number of reference links you need. If you have a very large reference data set, a lookup is probably not the best choice, because a normal lookup holds the reference data in memory and will likely require more memory than a join.
The Merge stage combines a master data set with one or more update data sets. It does more than copy the master data set to the output, and it has several optional features for customizing how rows are matched.
The stage can have any number of input links, a single output link, and as many reject links as there are update links. The inputs must be sorted and key-partitioned so that rows with the same key values land in the same partition.
A master record and an update record are merged only if both have the same values in the merge key columns. As part of preprocessing, remove duplicate records from the master data set, and from the update data sets as well if you have more than one.
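The master and update behavior of a merge can be sketched in Python as follows. This is a simplified model under stated assumptions, not the stage's implementation: the function `merge`, the `keep_unmatched_masters` flag, and the employee rows are hypothetical, the inputs are assumed to be de-duplicated on the key already, and the sketch assumes master columns win when a column name appears on both sides.

```python
def merge(master, updates, key, keep_unmatched_masters=True):
    """Merge a master data set with several update data sets on `key`.

    Returns (output_rows, rejects), where rejects holds one list per update
    data set containing the update rows that matched no master row.
    """
    master_keys = {row[key] for row in master}

    update_indexes, rejects = [], []
    for update in updates:
        index, rejected = {}, []
        for row in update:
            if row[key] in master_keys:
                index[row[key]] = row
            else:
                rejected.append(row)      # would go to this link's reject link
        update_indexes.append(index)
        rejects.append(rejected)

    output = []
    for row in master:
        merged = dict(row)
        hit = False
        for index in update_indexes:
            update_row = index.get(row[key])
            if update_row is not None:
                hit = True
                for col, val in update_row.items():
                    merged.setdefault(col, val)   # assumption: master value wins
        if hit or keep_unmatched_masters:
            output.append(merged)
    return output, rejects


# Hypothetical sample data
master = [
    {"emp_id": 1, "name": "Ada"},
    {"emp_id": 2, "name": "Grace"},
    {"emp_id": 3, "name": "Linus"},
]
updates = [
    [{"emp_id": 1, "dept": "R&D"}, {"emp_id": 9, "dept": "Ops"}],
    [{"emp_id": 2, "salary": 100}],
]

output, rejects = merge(master, updates, "emp_id")
dropped, _ = merge(master, updates, "emp_id", keep_unmatched_masters=False)
```

In this example the update row for employee 9 has no master and lands in the first reject list, while the master row for employee 3 is kept or dropped depending on the flag.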
The Merge stage is just one component in the suite of tools that helps you create and maintain your data repository. There are four main components to consider: the stage, the manager, the before/after subroutine and the transform function. Which join stage is the best fit depends on the size of your data sets, whether they are already sorted, and how you want unmatched rows to be handled.