During the course of the week, a discussion came up about the different places where a person might read the DataStage and QualityStage logs in InfoSphere. I hadn't really thought about it before, but here are a few places that come to mind:
IBM InfoSphere DataStage and QualityStage Operations Console
IBM InfoSphere DataStage and QualityStage Director client
IBM InfoSphere DataStage and QualityStage Designer client (by pressing Ctrl+L)
Related Reference
IBM Knowledge Center > InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Monitoring jobs
While investigating a recent InfoSphere Information Server (IIS) DataStage Essbase Connector error, I found the documented explanations of the probable causes of the error not to be terribly meaningful. So, now that I have run our error to ground, I thought it might be nice to jot down a quick note on the potential causes of the 'Client Commands are Currently Not Being Accepted' error, which I gleaned from the process.
Error Message Id
IIS-CONN-ESSBASE-01010
Error Message
An error occurred while processing the request on the server. The error information is 1051544 (message on contacting or from application:[<<DateTimeStamp>>]Local////3544/Error(1013204) Client Commands are Currently Not Being Accepted.
Possible Causes of The Error
This error indicates a problem with accessing the Essbase object or with the security within the Essbase object. It can be the result of multiple issues, such as:
Object doesn't exist – the Essbase object does not exist in the location specified
Communications – the location is unavailable or cannot be reached
Path security – security prevents access to the Essbase object location
Essbase security – security within the Essbase object does not support the user or filter being submitted; the Essbase object security may also be corrupted or incomplete
Essbase object structure – the Essbase object is not properly structured to support the filter, or the Essbase filter is malformed for the current structure
Related References
IBM Knowledge Center, InfoSphere Information Server 11.7.0, Connecting to data sources, Enterprise applications, IBM InfoSphere Information Server Pack for Hyperion Essbase
While chasing an error which only applied to join type stages, I thought it might be nice to identify what the InfoSphere Information Server DataStage / QualityStage join type stages are. There are three of them, which are the:
Join Stage,
Lookup Stage,
And, Merge Stage.
All three stages join data based on the values of identified key columns.
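As a rough analogy only (these stages are configured graphically rather than written as SQL), the key-based join they perform is conceptually similar to a SQL join such as the following; the table and column names are purely illustrative:
SELECT c.customer_id,
       c.customer_name,
       o.order_id
FROM customer c                      -- illustrative source table
INNER JOIN customer_order o          -- illustrative source table
    ON c.customer_id = o.customer_id -- the identified key column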
Related References
IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Lookup Stage
IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Join Stage
IBM Knowledge Center, InfoSphere Information Server 11.7.0, InfoSphere DataStage and QualityStage, Developing parallel jobs, Processing Data, Merge Stage
This is a known error for Windows systems and applies to DataStage and QualityStage jobs using any or all of the three join type stages (Join, Merge, and Lookup).
Error Message
<<Link name>>,0: APT_IOPort::readBlkVirt: read for block header, partition 0, [fd 4], returned -1 with errno 10,054 (Unknown error)
Message ID
IIS-DSEE-TFIO-00223
Applies To
Windows systems only
Parallel Engine jobs using any of the three join type stages (Join, Merge, and Lookup). It does not apply to Server Engine jobs.
InfoSphere Information Server (IIS), DataStage and QualityStage 9.1 and higher
The Fix
Add the APT_NO_IOCOMM_OPTIMIZATION environment variable to the project in the DataStage Administrator and set it to blank or 0. I left it blank so it would not impact other jobs.
Add the environment variable to the job producing the error and set it to 1.
What APT_NO_IOCOMM_OPTIMIZATION Does
Sets the use of shared memory as the transport type, rather than using the default sockets transport type.
Note that in most cases the sockets transport type is faster, so you likely will not want to set this across the project as the default for all jobs. It is best to apply it as necessary to problematic jobs.
When you are controlling a chain of sequences in the job stream and taking advantage of reusable (multiple instance) jobs, it is useful to be able to pass the Invocation ID from the master controlling sequence and have it passed down and assigned to the job run. This can easily be done without needing to manually enter the values in each of the sequences, by leveraging the DSJobInvocationId variable. For this to work:
The job must have ‘Allow Multiple Instance’ enabled
The parent sequence must provide the Invocation Id (the Invocation Name must be entered in its Job Activity)
The receiving child sequence must have the invocation variable (DSJobInvocationId) entered
At runtime, an invocation instance of the multi-instance job will be generated with its own logs.
Variable Name
DSJobInvocationId
Note
This approach allows for the reuse of jobs and the assignment of meaningful instance extension names, which are managed from a single point of entry in the object tree.
Related References:
IBM Knowledge Center > InfoSphere Information Server 11.5.0
InfoSphere DataStage and QualityStage > Designing DataStage and QualityStage jobs > Building sequence jobs > Sequence job activities > Job Activity properties
Since the InfoSphere Information Server repository has to be installed manually with the scripts provided in the IBM software, sometimes you run into difficulties. So, here's a quick script, which I have found useful in the past, to identify user permissions for the IAUSER on Oracle databases and to help run down discrepancies in user permissions.
SELECT *
FROM ALL_TAB_PRIVS
WHERE GRANTEE = 'IAUSER'  -- Oracle stores user names in upper case by default
If we cannot run against the ALL_TAB_PRIVS view, then we can try the USER_TAB_PRIVS view:
SELECT *
FROM USER_TAB_PRIVS
WHERE GRANTEE = 'IAUSER'
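If you also want to see whether any of the IAUSER permissions arrive through roles rather than direct grants, a similar check can be run against the role privilege views. This is a minimal sketch, assuming you have access to DBA_ROLE_PRIVS (otherwise, USER_ROLE_PRIVS can be queried while connected as the user in question):
SELECT *
FROM DBA_ROLE_PRIVS   -- requires DBA-level access; USER_ROLE_PRIVS is the user-level alternative
WHERE GRANTEE = 'IAUSER'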
Related References
Oracle Help Center > Database Reference > ALL_TAB_PRIVS view
Working with timestamp literals in the InfoSphere SFDC Connector SOQL is much like working with date literals. So, here is a quick example which may save you some time.
SOQL Timestamp String Literals Where Clause Rules
Basically, the timestamp pattern is straightforward and much like the process for dates, but there are some differences. The basic rules for a SOQL where clause are:
No quotes
No functions
No casting function, or casting, for the SOQL where clause to read it
It only applies to datetime fields
A timestamp identifier 'T' separating the date and time portions
And the ISO 8601 time notation
Example SOQL Timestamp String Literals
So, here are a couple of timestamp string literal examples in SOQL:
1901-01-01T00:00:00-00:00
2016-01-31T00:00:00-00:00
9999-10-31T00:00:00-00:00
Example SOQL with Timestamp String Literal Where Clause
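For example, a where clause using one of these literals might look like the following. The object name is reused from the date example below; Last_Activity_Timestamp__c is a hypothetical custom datetime field used purely for illustration:
Select
t.id,
t.Name,
t.Last_Activity_Timestamp__c
From Target_and_Segmentation__c t
where t.Last_Activity_Timestamp__c > 2016-01-31T00:00:00-00:00
Note that, per the rules above, the literal is not quoted and no conversion function is applied to the datetime field.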
I found working with date literals in the InfoSphere SFDC Connector SOQL to be counterintuitive, at least compared to the way I normally use SQL. I spent a little time running trials in Workbench before I finally locked on to the 'where clause' criteria date pattern. So, here is a quick example.
SOQL DATE String Literals Where Clause Rules
Basically, the date pattern is straightforward. The basic rules for a SOQL where clause are:
No quotes
No functions
No casting function, or casting, for the SOQL where clause to read it
Example SOQL DATE String Literals
So, here are a couple of date string literal examples in SOQL:
1901-01-01
2016-01-31
9999-10-31
Example SOQL with Date String Literal Where Clause
Select
t.id,
t.Name,
t.Target_Date__c,
t.User_Active__c
From Target_and_Segmentation__c t
where t.Target_Date__c > 2014-10-31
Related References
Salesforce Developer Documentation
Home, Developer Documentation, Force.com SOQL and SOSL Reference
I've had these notes lying around for a while, so I thought I would consolidate them here. Here are a few guidelines to ensure the quality of your dimension table structures.
Dimension Table Effective Practices
The table naming convention should identify it as a dimension table. For example:
Suffix Pattern:
<<TableName>>_Dim
<<TableName>>_D
Prefix Pattern:
Dim_<<TableName>>
D_<<TableName>>
Have a Primary Key (PK) assigned on the table's surrogate key
Audit fields – Type 1 dimensions should:
Have a Created Date timestamp – when the record was initially created
Have a Last Update timestamp – when the record was last updated
Job Flow: Do not place the dimension processing in the fact jobs.
Every dimension should have a zero (0) 'Unknown' row
Fields should be 'NOT NULL', replacing nulls with a zero (0) for numeric and integer type fields or a space (' ') for character type fields
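To pull several of these guidelines together, here is a minimal SQL sketch of a dimension table and its zero row; the table, column, and constraint names are illustrative only and should be adapted to your own naming conventions and database platform:
-- Illustrative only: names and data types are assumptions, not a prescribed standard
CREATE TABLE Customer_Dim
(
Customer_Dim_Key INTEGER NOT NULL,                 -- surrogate key
Customer_Name    VARCHAR(100) DEFAULT ' ' NOT NULL,
Customer_Status  VARCHAR(20)  DEFAULT ' ' NOT NULL,
Created_Date     TIMESTAMP NOT NULL,               -- when the record was initially created
Last_Update_Date TIMESTAMP NOT NULL,               -- when the record was last updated
CONSTRAINT Customer_Dim_PK PRIMARY KEY (Customer_Dim_Key)
);
-- Zero (0) 'Unknown' row
INSERT INTO Customer_Dim
(Customer_Dim_Key, Customer_Name, Customer_Status, Created_Date, Last_Update_Date)
VALUES
(0, 'Unknown', 'Unknown', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);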
The APT_TSortOperator warning happens when there is a conflict in the partitioning behavior between stages. Usually, it occurs because the successor (downstream) stage has its 'Partitioning / Collecting' and 'Sorting' properties set in a way that conflicts with the predecessor (upstream) stage's properties, which it is being asked to preserve. This can occur when the predecessor stage has the "Preserve Partitioning" property set to:
‘Default (Propagate)’
‘Propagate’, or
‘Set’
Preserve Partitioning Property – list
Message ID
IIS-DSEE-TFOR-00074
Message Text
<<Link Name Where Warning Occurred>>: When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0.
Warning Fixes
First, verify whether the partitioning behaviors of both stages are correct
If so, set the predecessor ‘Preserve Partitioning’ property to “Clear”
If not, then correct the partitioning behavior of the stage which is in error
DataStage Director Message – Numeric string expected
Beware when you see this message while working with Booleans in DataStage: the message displays as informational (at least it did for me), not as a warning or an error. Even though it seems innocuous, what it meant for my job was that the Boolean ('true' / 'false') was not being interpreted and everything posted as 'false'.
In DataStage, the Netezza 'Boolean' data type maps to the 'Bit' SQL type, which expects a numeric input of zero (0) or one (1). So, my solution (once I detected the problem during unit testing) was to put Transformer stage logic in place to convert the Boolean input to the expected numeric value.
Netezza to DataStage Data Type Mapping
Netezza data type: BOOLEAN
InfoSphere DataStage data type (SQL type): Bit
Expected input value: 0 or 1 (1 = true, 0 = false)
Transformer Stage Boolean Handling Logic
A Netezza Boolean field can store true values, false values, and null, so some thought should be given to your desired data outcome for nulls.
This first example sets nulls to a specific value, which can support a specific business rule for null handling and also provides null handling for non-nullable fields. Here we are setting nulls to the numeric value for 'true' and all other non-true inputs to 'false'.
If IsNull(Lnk_Src_In.USER_ACTIVE) Then 1 Else If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0
These second examples let nulls fall through to the Else value; if your logic direction is correct, this still provides null handling for non-nullable fields.
If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0
If Lnk_Src_In.USER_ACTIVE = 'false' Then 0 Else 1
Director Log Message
Message ID
IIS-DSEE-TBLD-00008
Message Text
<<Link Name Where Message Occurred>>: Numeric string expected. Use default value.
Or something like this:
<<Link Name Where Message Occurred>>: Numeric string expected for input column ‘<<Field Name Here>>‘. Use default value.
Related References
PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, UDX data types reference information, Supported data types, Boolean
PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases
PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Logical data types
InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage
When projects transition to production, the receiving team needs some transition documentation to help with knowledge transfer and to guide them while they get accustomed to operating the application and can form their own documentation. This template is a shell, which usually provides enough detail, with some updating, to serve as a quick reference regarding the application. Also, having a template can be a real time-saver, as opposed to writing one from scratch.
When a project is created, there is a default repository structure created for use in the DataStage designer client.
Default DataStage Repository Structure
However, some additional organization will be required for most DataStage projects. Usually, this organization occurs in these areas:
Addition of structure within the “Jobs” folder
Addition of a “Parameter Sets” folder
Addition of structure within the “Table Definitions” folder
Addition of a “Developer Work Area” folder
Repository Structure within the “Jobs” folder
Below is a sample of a folder structure for multiple applications that share a common repository. The pattern includes, but does not illustrate, all the other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in-progress folders, which do not migrate, but keep work segregated.
Jobs Folder Pattern Datastage Repository Structure
Parameter Sets Folders
The parameter set folders are for two sets of information.
First are the database parameters, which include data connections and the attached parameter sets.
Second are the job parameters, which may include parameter sets for things like e-mail parameters, surrogate key file paths, etc.; using parameter sets for these is a best practice, rather than creating them as project-level parameters.
The Table Definitions folder has folders added to segregate the imported metadata for source and target systems and, in some cases, may need folders to logically organize imported metadata which resides within the same database and/or schema but belongs to different logical layers.
Peer code review happens during the development phase and focuses on the overall quality of code and configuration artifacts and their compliance with standards. However, the hard part of performing a peer code review isn't performing the review, but rather achieving consistency and thoroughness in the review. This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings. I hope this template assists with your DataStage job review process.
Standard practices help you and others understand your work. This can be very important when working on large teams, working across team boundaries, or when large, complex sets of processes and objects are involved. When you consider the importance of naming conventions coupled with standard practice, the benefit should be obvious, but often teams don't follow or document their conventions. So, these standard naming conventions may help when none exist or when you need to assemble your own.
<<SomeIdentifier>> = should be replaced with appropriate information
There are three primary ways to create a parameter set, and this is a different practice from adding 'User Defined' variables in InfoSphere DataStage Administrator. The ways to create a parameter set are:
Create a parameter set from a data connection stage
Create a Parameter Set from the navigation of DataStage designer, and
Create a Parameter Set from a job
Create a parameter set from Data Connection Stage
This is used to create parameter sets for database connection parameters.
To create a new Parameter Set from a Data Connection
Select: File > New > Other and select “Data Connection“
Complete the Data Connection stage properties, then save the stage.
Open the Connection stage and navigate to the "Parameters" tab
Then, click on the "Associated Parameter Set" button, and choose the "Create & Attach" menu item
This will launch a dialog
Fill out the appropriate information on the General tab and then proceed to the Parameters tab:
In the Parameters Tab, enter in the Parameters you wish to include in this Parameter Set
On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
Click OK to save the Parameter set.
Create a Parameter Set from the navigation of DataStage designer
This is, perhaps, the more traditional way of creating a parameter set.
To create a new Parameter Set
Select: File > New > Other and select “Parameter Set”
This will Launch a Dialog
Fill out the appropriate information on the General tab and then proceed to the Parameters tab:
In the Parameters Tab, enter in the Parameters you wish to include in this Parameter Set.
Note: Existing environment variables can also be added.
On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
Click OK to save the Parameter set.
Create a Parameter Set from a job
This approach is, perhaps, less traditional, but is equally effective if you find yourself creating additional jobs and now need to share the same parameters. This is a quick and easy way to generate a parameter set from an existing job.
To create a new Parameter Set from a job
Open the job that you want to create a parameter set for.
Click “Edit > Job Properties” to open the “Job Properties” window.
Click the “Parameters” tab.
Press and hold the Ctrl key, then select the parameters that you want to include in the parameter set.
With your parameters highlighted, click “Create Parameter Set”. The Parameter Set window opens.
Enter a name and short description for your parameter set.
Click the “Parameters” tab; the parameters that you selected are listed.
Click the "Values" tab.
Enter a name in the Value File name field, then press Enter. The value for each of your parameters is automatically populated with the path name that you entered.
If a default value is not already set, enter a value for each parameter. For example, if the variable is a Pathname type, enter a default path name.
Click “OK” to close the Parameter Set window.
In the Save Parameter Set As window, select the folder where you want to save your parameter set and click Save. When prompted to replace the selected parameters with the parameter set, click Yes.
The 'Unterminated quoted string' error occurs from time to time when working with the InfoSphere DataStage Netezza JDBC Connector stage and is nebulous, at best. However, the solution is normally straightforward enough once you understand it. Usually, this error is the result of a target table field or fields being shorter than the input data. The fix is, normally, to compare your input field lengths (or the composite field length, if consolidating fields into one field) and adjust the target field length higher. In some cases, if business rules allow, you may be able to substring or truncate the input data (not a recommended approach), but information can be lost with this approach.
Tgt_IIS_Job_Dim,0: The connector encountered a Java exception:
org.netezza.error.NzSQLException: ERROR: Unterminated quoted string
at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:287)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2904)
at org.netezza.sql.NzStatement._execute(NzStatement.java:885)
at org.netezza.sql.NzPreparedStatament.executeUpdate(NzPreparedStatament.java:229)
at com.ibm.is.cc.jdbc.CC_JDBCRecordDataSetConsumer.executeStatements(CC_JDBCRecordDataSetConsumer.java:2846)
at com.ibm.is.cc.jdbc.CC_JDBCBigBufferRecordDataSetConsumer.consumeBigBuffer(CC_JDBCBigBufferRecordDataSetConsumer.java:712)
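As a quick way to run down which input field is too long, a simple length check against the source data can help. This is only a sketch; the table and column names are hypothetical and should be replaced with your own source object and the columns feeding the target:
-- Hypothetical source table and columns; compare the results
-- against the defined lengths of the corresponding target columns
SELECT
MAX(LENGTH(job_name)) AS max_job_name_length,
MAX(LENGTH(job_description)) AS max_job_description_length
FROM stage_iis_job_src;
Any target field whose defined length is shorter than the corresponding maximum input length is a likely culprit.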
This is a snippet of very useful information, which is buried in the IBM documentation. So, I thought I would bring it up a level for those of you who may be making decisions regarding Information Server (IIS) installations. The table below provides an excerpt of the IBM IIS repository database placement guidance. Please keep in mind that the databases required for your installation may be a subset of these, depending upon your IIS product offering license.
Metadata Repository Databases
Repository or Database
Description
Default Database And Schema
Active InfoSphere Information Server Metadata Repository
Stores the metadata about external data sources that are governed, managed, and analyzed by InfoSphere Information Server components. Normally referred to as the metadata repository.
Database: XMETA
Schema: XMETA
Database must be the same database that is used for the staging metadata repository.
InfoSphere Information Server Staging Area
Stores metadata that is imported from external data sources so that it can be examined before it is moved to the active metadata repository.
Database: XMETA
Schema: XMETASR
Database must be the same database that is used for the active metadata repository.
Analysis Database
Stores results of information analysis by InfoSphere Information Analyzer.
Database: IADB
Schema: IAUSER
Database cannot be the same database that is used for the active metadata repository or staging area.
Operations Database
Stores monitoring data that is displayed by the InfoSphere DataStage® and QualityStage® Operations Console.
Database: XMETA
Schema: User-defined repository user name, typically DSODB
Database can be the same or different as the database that is used for the metadata repository.
Standardization Rules Designer Database
Stores a copy of revisions to InfoSphere QualityStage rule sets that have been made in the IBM InfoSphere QualityStage Standardization Rules Designer.
Database: XMETA
Schema: User-defined data store user name, typically SRDUSER
Database can be the same or different as the database that is used for the metadata repository.
Exceptions Database
Stores exceptions that are generated by InfoSphere Information Server products and components.
Database: XMETA
Schema: User-defined repository user name, typically ESDB
Database can be the same or different as the database that is used for the metadata repository.
Match Designer Database
Stores the results of match test passes by InfoSphere QualityStage Match Designer, a component of InfoSphere QualityStage. This data store is an ODBC data source that is used as a staging area before match designs are checked in to the active metadata repository.
Database and schema: user-defined database name and schema name. No default, but typically MDDB.
Database cannot be the same database as that used for the metadata repository.