Essbase Connector Error – Client Commands are Currently Not Being Accepted

DataStage Essbase Connector

While investigating a recent InfoSphere Information Server (IIS) DataStage Essbase Connector error, I found the explanations of the probable causes of the error not to be terribly meaningful.  So, now that I have run our error to ground, I thought it might be nice to jot down a quick note on the potential causes of the ‘Client Commands are Currently Not Being Accepted’ error, which I gleaned from the process.

Error Message Id

  • IIS-CONN-ESSBASE-01010

Error Message

An error occurred while processing the request on the server. The error information is 1051544 (message on contacting or from application:[<<DateTimeStamp>>]Local////3544/Error(1013204) Client Commands are Currently Not Being Accepted.

Possible Causes of The Error

This error is a problem with access to the Essbase object or with the security within the Essbase object.  It can be the result of multiple issues, such as:

  • Object doesn’t exist – the Essbase object didn’t exist in the location specified
  • Communications – the location is unavailable or cannot be reached
  • Path Security – security prevents access to the Essbase object location
  • Essbase Security – security within the Essbase object does not support the user or filter being submitted; also, the Essbase object security may be corrupted or incomplete
  • Essbase Object Structure – the Essbase object was not properly structured to support the filter, or the Essbase filter is malformed for the current structure

Related References

IBM Knowledge Center, InfoSphere Information Server 11.7.0, Connecting to data sources, Enterprise applications, IBM InfoSphere Information Server Pack for Hyperion Essbase


 

DataStage – How to Pass the Invocation ID from one Sequence to another

DataStage Invocation ID Passing Pattern Overview

When you are controlling a chain of sequences in the job stream and taking advantage of reusable (multiple-instance) jobs, it is useful to be able to pass the Invocation ID from the master controlling sequence and have it passed down and assigned to the job run.  This can easily be done without needing to manually enter the values in each of the sequences, by leveraging the DSJobInvocationId variable.  For this to work:

  • The job must have ‘Allow Multiple Instance’ enabled
  • The parent (master) sequence must provide the Invocation Id (the invocation name) in its Job Activity
  • The receiving child sequence must have the invocation variable (DSJobInvocationId) entered in its Job Activity
  • At runtime, an invocation-id instance of the multi-instance job will be generated with its own logs.

Variable Name

  • DSJobInvocationId
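As a rough illustration of how the pieces fit together (the field names below are as they appear in the Job Activity properties and may vary slightly by version; the parameter name is only an example):

  • Master (parent) sequence, in the Job Activity that calls the child sequence – Invocation Id Expression: #Master.InvocationId# (or any literal or parameter holding the desired invocation name)
  • Child sequence, in the Job Activity that calls the multiple-instance job – Invocation Id Expression: DSJobInvocationId

At run time, DSJobInvocationId in the child sequence resolves to the invocation id the child sequence itself was started with, so the value entered once in the master sequence flows down to the job run.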

Note

This approach allows for the reuse of jobs and the assignment of meaningful instance extension names, which are managed from a single point of entry in the object tree.

Related References: 

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Designing DataStage and QualityStage jobs > Building sequence jobs > Sequence job activities > Job Activity properties

Oracle – How to get a list of user permission grants


Since the InfoSphere Information Server repository has to be installed manually with the scripts provided in the IBM software, sometimes you run into difficulties. So, here’s a quick script which I have found useful in the past to identify user permissions for the IAUSER on Oracle databases, to help run down discrepancies in user permissions.

SELECT *
FROM ALL_TAB_PRIVS
WHERE GRANTEE = 'iauser'

If we cannot run against the ALL_TAB_PRIVS view, then we can try the USER_TAB_PRIVS view:

SELECT *
FROM USER_TAB_PRIVS
WHERE GRANTEE = 'iauser'
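If the queries return no rows, keep in mind that Oracle stores unquoted user names in upper case, so a case-insensitive match (a minor variation on the scripts above) may help:

SELECT *
FROM ALL_TAB_PRIVS
WHERE UPPER(GRANTEE) = 'IAUSER'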

Related References

Oracle Help Center > Database Reference > ALL_TAB_PRIVS view

What are the Core Capability of Infosphere Information Server?


Three Core Capabilities of Information Server

InfoSphere Information Server (IIS) has three core capabilities:

  • Information Governance
  • Data Integration
  • Data Quality

What the Core Capabilities Provide

The three core capabilities translate into these high-level business processes:

Information Governance – Understand and collaborate

A centrally managed repository and approach, which provides:

  • Information blueprints
  • Relationship discovery across data sources
  • Information technology (IT)-to-business mapping

Data Integration – Transform and deliver

A data integration capability, which provides:

  • Transformation
    • Massive scalability
    • Power for any complexity
    • Total traceability
  • Delivery
    • Data capture at any time
    • Delivery anywhere
    • Big data readiness

Data Quality – Cleanse and monitor

To turn data assets into trusted information:

  • Analysis & validation
  • Data cleansing
  • Data quality rules & management

Related References

IBM Knowledge Center, InfoSphere Information Server Version 11.5.0

Overview of IBM InfoSphere Information Server, Introduction to InfoSphere Information Server

InfoSphere Information Server Suites


The Six Suites of Infosphere Information Server

InfoSphere Information Server 11 Series is, primarily, made up of 6 suites / packages, which provide a mix of components:

  • InfoSphere Information Governance Catalog
  • BigInsights BigIntegrate
  • BigInsights
  • InfoSphere Information Server for Data Integration
  • InfoSphere Information Server for Data Quality
  • InfoSphere Information Server Enterprise Edition (is inclusive of all 5 packages above)

IIS capabilities / components

These license suites are composed of differing combinations of IIS capabilities / components, which consist of:

  • InfoSphere DataStage®
  • InfoSphere QualityStage®
  • InfoSphere DataStage and QualityStage Designer
  • InfoSphere Data Click
  • InfoSphere FastTrack
  • InfoSphere Information Analyzer
  • InfoSphere Information Governance Catalog
  • InfoSphere Information Governance Dashboard
  • InfoSphere Information Services Director

Additional Purchasable Infosphere capabilities

Additional Infosphere capabilities can be licensed / purchased to supplement the primary suite licenses:

  • Cognos® Business Intelligence
  • BigInsights Data Scientist
  • InfoSphere Change Data Delivery
  • InfoSphere Data Architect

Table of Component / Capability Alignment to IIS Suites

In brief: the suites license differing subsets of the components listed above; InfoSphere Information Server Enterprise Edition includes all of them; and components marked ✓* in the alignment matrix are subject to usage and license restrictions. Refer to the IBM Knowledge Center component list below for the full component-by-suite alignment.

Footnotes:

* = usage and license restrictions may apply.

Related References

IBM Knowledge Center, InfoSphere Information Server, Version 11.5.0,

Overview of IBM InfoSphere Information Server, Introduction to InfoSphere Information Server, Components in the InfoSphere Information Server suite

SFDC – Using a timestamp literal in a where clause


Salesforce Connector

Working with timestamp literals in the InfoSphere SFDC Connector SOQL is much like working with date literals.  So, here is a quick example which may save you some time.

SOQL Timestamp String Literals Where Clause Rules

Basically, the timestamp pattern is straightforward and similar to the process for dates, but there are some differences. The basic rules for a SOQL where clause are:

  • No quotes
  • No functions
  • No casting function, or casting, is needed for the where clause to read it
  • It only applies to datetime fields
  • A timestamp identifier ‘T’
  • And the ISO 8601 time notation

Example SOQL Timestamp String Literals

So, here are a couple of timestamp string literal examples in SQL:

  • 1901-01-01T00:00:00-00:00
  • 2016-01-31T00:00:00-00:00
  • 9999-10-31T00:00:00-00:00

Example SQL with Timestamp String Literal Where Clause

 

Select e.Id,
e.AccountId,
e.StartDateTime
From Event e
WHERE e.StartDateTime > 2014-10-31T00:00:00-00:00
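The same pattern also accepts the ‘Z’ suffix for UTC in place of an explicit offset; here is a small variant of the query above:

Select e.Id,
e.AccountId,
e.StartDateTime
From Event e
WHERE e.StartDateTime > 2014-10-31T00:00:00Z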

 

Related References

Salesforce Developer Documentation

Home, Developer Documentation, Force.com SOQL and SOSL Reference

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_dateformats.htm

Salesforce Workbench

Home, Technical Library, Workbench

W3C

Date Time Formats

 

SFDC Salesforce Connector – Column Returns Null values, when SOQL Returns Data in Workbench


Salesforce Connector

Recently, I encountered a scenario which is a little out of the norm while using the SFDC Connector.  Once the issue is understood, it is easily remedied.

The problem / Error

  • SOQL run in Salesforce workbench and column returns data
  • The DataStage job/ETL runs without errors or warnings
  • The target column output only returns null values

The Cause

In short, the cause is a misalignment between the SOQL field name and the column name on the Columns tab of the connector.

The Solution

The fix is simply to convert the dots in the field name to underscores.   Basically, a SOQL field name of Account.RecordType.Name becomes Account_RecordType_Name.

Example Field / Column Name  Fix

Example SQL

Select c.Id,
c.AccountId,
c.CV_Account_Number__c,
c.Name,
c.Role__c,
c.Status__c,
c.Account.RecordType.Name
From Contact c

Columns Tab With Correct Naming Alignment

Please note that the qualifying dots have been converted to underscores.

SFDC Connector Columns Tab
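For the example SOQL above, the corresponding column names on the connector’s Columns tab would be, following the dot-to-underscore rule: Id, AccountId, CV_Account_Number__c, Name, Role__c, Status__c, and Account_RecordType_Name.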


 

SFDC – Using a date literal in a where clause


I found working with date literals in the InfoSphere SFDC Connector SOQL to be counterintuitive, at least relative to the way I normally use SQL.  I spent a little time running trials in Workbench before I finally locked on to the ‘where clause’ criteria date pattern.  So, here is a quick example.

SOQL DATE String Literals Where Clause Rules

Basically, the date pattern is straightforward. The basic rules for a SOQL where clause are:

  • No quotes
  • No functions
  • No casting function, or casting, is needed for the where clause to read the date literal

Example SOQL DATE String Literals

So, here are a couple of date string literal examples in SQL:

  • 1901-01-01
  • 2016-01-31
  • 9999-10-31

Example SQL with Date String Literal Where Clause

 

Select
t.id,
t.Name,
t.Target_Date__c,
t.User_Active__c
From Target_and_Segmentation__c t
where t.Target_Date__c > 2014-10-31
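The same literal rules extend to a bounded date range; here is a small variation of the query above:

Select
t.id,
t.Name,
t.Target_Date__c
From Target_and_Segmentation__c t
where t.Target_Date__c > 2014-10-31 and t.Target_Date__c < 2015-01-01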

 

Related References

Salesforce Developer Documentation

Home, Developer Documentation, Force.com SOQL and SOSL Reference

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_dateformats.htm

Salesforce Workbench

Home, Technical Library, Workbench

 

Data Modeling – Fact Table Effective Practices


Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g. date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
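Below is a generic SQL sketch that pulls these practices together.  The table, column, and constraint names are illustrative only, and the exact DDL syntax (and whether constraints are actually enforced) varies by database:

CREATE TABLE SALES_FACT
(
    SALES_FACT_SRKY    BIGINT NOT NULL,      -- fact row surrogate key
    DATE_DIM_SRKY      INTEGER NOT NULL,     -- temporal (date) dimension surrogate key
    PRODUCT_DIM_SRKY   INTEGER NOT NULL,     -- dimension surrogate key
    SALES_AMOUNT       DECIMAL(18,2),        -- nullable measure
    SALES_QUANTITY     INTEGER,              -- nullable measure
    CONSTRAINT SALES_FACT_PK PRIMARY KEY (SALES_FACT_SRKY),
    CONSTRAINT SALES_FACT_DATE_FK FOREIGN KEY (DATE_DIM_SRKY) REFERENCES DATE_DIM (DATE_DIM_SRKY),
    CONSTRAINT SALES_FACT_PRODUCT_FK FOREIGN KEY (PRODUCT_DIM_SRKY) REFERENCES PRODUCT_DIM (PRODUCT_DIM_SRKY)
);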

Datastage – When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0

APT_TSortOperator Warning

The APT_TSortOperator warning happens when there is a conflict in the partitioning behavior between stages.  Usually, this is because the successor (downstream) stage has its ‘Partitioning / Collecting’ and ‘Sorting’ properties set in a way that conflicts with the predecessor (upstream) stage’s properties, which it is being asked to preserve.  This can occur when the predecessor stage has the “Preserve Partitioning” property set to:

  • ‘Default (Propagate)’
  • ‘Propagate’, or
  • ‘Set’
Preserve Partitioning Property – list

Message ID

  • IIS-DSEE-TFOR-00074

Message Text

  • <<Link Name Where Warning Occurred>>: When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0.

Warning Fixes

  • First, verify that the partitioning behaviors of both stages are correct
  • If they are, set the predecessor stage’s ‘Preserve Partitioning’ property to “Clear”
  • If not, correct the partitioning behavior of the stage which is in error

Clear Partitioning Property Screenshot

Preserve Partitioning Property – Set To Clear

Infosphere DataStage – Boolean Handling for Netezza


Beware when you see this message when working with Booleans in DataStage: the message displays as informational (at least it did for me), not as a warning or an error.  Even though it seems innocuous, what it meant for my job was that the Boolean (‘true’ / ‘false’) was not being interpreted, and everything posted as ‘false’.

In DataStage, the Netezza ‘Boolean’ field/data SQL type maps to the ‘Bit’ SQL type, which expects a numeric input of zero (0) or one (1).  So, my solution (once I detected the problem during unit testing) was to put Transformer stage logic in place to convert the Boolean input to the expected numeric value.

Netezza to Datastage Data Type Mapping

  • Netezza data type: BOOLEAN
    InfoSphere DataStage data type (SQL type): Bit
    Expected input value: 0 or 1 (1 = true, 0 = false)

Transformer Stage logic Boolean Handling Logic

A Netezza Boolean field can store true values, false values, and null. So, some thought should be given to your desired data outcome for nulls.

This first example sets nulls to a specific value, which can support a specific business rule for null handling and, also, provides null handling for non-nullable fields.  Here we are setting nulls to the numeric value for ‘true’ and all other non-true inputs to ‘false’.

If IsNull(Lnk_Src_In.USER_ACTIVE) Then 1 Else If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0

These second examples let nulls fall through to the Else value; if your logic direction is correct, this still provides null handling for non-nullable fields.

  • If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0

  • If Lnk_Src_In.USER_ACTIVE = 'false' Then 0 Else 1

Director Log Message

Message ID

  • IIS-DSEE-TBLD-00008

Message Text

  • <<Link Name Where Message Occurred>>: Numeric string expected. Use default value.

Or something like this:

  • <<Link Name Where Message Occurred>>: Numeric string expected for input column ‘<<Field Name Here>>‘. Use default value.

Related References

Boolean

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, UDX data types reference information, Supported data types, Boolean

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.udf.doc/r_udf_boolean_datatype.html

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.sproc.doc/c_sproc_data_types_aliases.html

Logical data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Logical data types

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_data_types_logical.html

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.netezza.use.doc/topics/nzcc_mappingdatatypes.html

Infosphere DataStage – Designer Client Repository Structure


Default Repository Structure

When a project is created, there is a default repository structure created for use in the DataStage designer client.

However, some additional organization will be required for most DataStage projects.  Usually, this organization occurs in these areas:

  • Addition of structure within the “Jobs” folder
  • Addition of a “Parameter Sets” folder
  • Addition of structure within the “Table Definitions” folder
  • Addition of a “Developer Work Area” folder

Repository Structure within the “Jobs” folder

Below is a sample of a folder structure for multiple applications that share a common repository.  The pattern includes, but does not illustrate, the other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in-progress folders, which do not migrate, but keep work segregated.

Parameter Sets Folders

The parameter set folders are for two sets of information.

  • First are the database parameters, which include data connections and the attached parameter sets.
  • Second are the job parameters, which may include parameter sets for things like e-mail parameters, surrogate key file paths, etc.; using parameter sets for these is a best practice, rather than creating them as project-level parameters.

Table Definitions

The Table Definitions folder has folders added to segregate the imported metadata by source and target system and, in some cases, may need folders to logically organize imported metadata which may reside within the same database and/or schema but belong to different logical layers.

InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template


Peer code review happens during the development phase and focuses on the overall quality of, and compliance to standards of, code and configuration artifacts. However, the hard part of performing a peer code review isn’t performing the review, but rather achieving consistency and thoroughness in the review.   This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings.  I hope this template assists with your DataStage job review process.

Infosphere Datastage – Standard Practice- Sequence Naming Conventions


Standard practices help you and others understand your work.  This can be very important when working on large teams, working across team boundaries, or when large, complex sets of processes and objects may be involved.  When you consider the importance of naming conventions coupled with standard practice, the benefit should be obvious, but often practices aren’t executed or their conventions documented.  So, these standard naming conventions may help when none exist or when you need to assemble your own.

<<SomeIdentifier>> should be replaced with appropriate information

  1. Sequence Object Naming Conventions

  • Master Control Sequence (parent): Master_<<Application>>_<<Application Job Stream Name>>_Seq
  • Sequence: <<Application>>_<<job_Name>>_Seq

  2. Sequence Stage Naming Conventions

  • End Loop: EL_<<PrimaryFunction>>
  • Error Handler: EH_<<PrimaryFunction>>
  • Execute Command: EC_<<PrimaryFunction>>
  • Job Activity: Job_<<PrimaryFunction>>
  • Nested Condition: NC_<<PrimaryFunction>>
  • Notify: Notify_<<PrimaryFunction>>
  • Routine Activity: Rtn_<<PrimaryFunction>>
  • Sequence Links (messages): Msg_<<Number or Description>>
  • Sequence Links (non-messages): Lnk_<<Number or Description>>
  • Sequencer: Seqr_<<Number or Description>>
  • Sequencer (All): SeqAll_<<Identifier>>
  • Sequencer (Any): SeqAny_<<Identifier>>
  • Start Loop: SL_<<PrimaryFunction>>
  • Terminator Activity: TA_<<PrimaryFunction>>
  • User Variables: UV_<<PrimaryPurpose>>
  • Wait For File: WFF_<<PrimaryFunction>>

InfoSphere DataStage – Ways to Create a Datastage Parameter Set


There are three primary ways to create a parameter set, and this is a different practice from adding ‘User Defined’ variables in InfoSphere DataStage Administrator. The ways to create a parameter set are:

  • Create a parameter set from a data connection stage
  • Create a Parameter Set from the navigation of DataStage designer, and
  • Create a Parameter Set from a job

Create a parameter set from Data Connection Stage

This is used to create parameter sets for database connection parameters.

To create a new Parameter Set from a Data Connection

  • Select: File > New > Other and select “Data Connection”
  • Complete the Data Connection stage properties, then save the stage.
  • Open the Connection stage and navigate to the “Parameters” Tab
  • Then, click on the “Associated Parameter Set” button, and choose the “Create & Attach” menu item
  • This will launch a dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters Tab:
  • In the Parameters Tab, enter in the Parameters you wish to include in this Parameter Set
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the Parameter set.

Create a Parameter Set from the navigation of DataStage designer

This is, perhaps, the more traditional way of creating a parameter set.

To create a new Parameter Set

  • Select: File > New > Other and select “Parameter Set”
  • This will Launch a Dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters Tab:
  • In the Parameters Tab, enter the Parameters you wish to include in this Parameter Set. Note: Existing Environment Variables can also be added.
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the Parameter Set.

Create a Parameter Set from a job

This approach is, perhaps, less traditional, but is equally effective if you find yourself creating additional jobs and now need to share the same parameters.  This is a quick and easy way to generate a parameter set from an existing job.

To create a new Parameter Set from a job

  • Open the job that you want to create a parameter set for.
  • Click “Edit > Job Properties” to open the “Job Properties” window.
  • Click the “Parameters” tab.
  • Press and hold the Ctrl key, then select the parameters that you want to include in the parameter set.
  • With your parameters highlighted, click “Create Parameter Set”.  The Parameter Set window opens.
    • Enter a name and short description for your parameter set.
    • Click the “Parameters” tab; the parameters that you selected are listed.
    • Click the ”Values” tab.
    • Enter a name in the Value File name field, then press Enter.  The value for each of your parameters is automatically populated with the path name that you entered.
    • If a default value is not already set, enter a value for each parameter. For example, if the variable is a Pathname type, enter a default path name.
    • Click “OK” to close the Parameter Set window.
    • In the Save Parameter Set As window, select the folder where you want to save your parameter set and click Save. When prompted to replace the selected parameters with the parameter set, click Yes.
  • Click “OK” to close the Job Properties window.

Netezza / PureData – Table Documentation Practices


Applying a few table effective practices can:

  • Significantly aid others in understanding the purpose and use of tables and column fields
  • Provide more migratable metadata to inform and enable the capabilities of other tools
  • And, from a self-preservation point of view, reduce the number of meetings, emails, and phone calls required to explain and clarify the business meaning and roles of the table and its fields

Among the effective practices when modeling and designing tables are (a brief Netezza SQL sketch follows the list):

  • Add Primary Key
  • Add table comments
  • Add column field comments
  • Add ‘Organize by’ fields to provide essential optimization on frequently used and/or key performance fields
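Here is a minimal Netezza / PureData SQL sketch of these practices.  The table and column names are illustrative only, and the distribution and organizing columns should be chosen from your own key and frequently used fields:

CREATE TABLE SALES_FACT
(
    SALES_ID      BIGINT NOT NULL,
    SALE_DATE     DATE,
    SALES_AMOUNT  NUMERIC(18,2),
    CONSTRAINT SALES_FACT_PK PRIMARY KEY (SALES_ID)
)
DISTRIBUTE ON (SALES_ID)
ORGANIZE ON (SALE_DATE);

COMMENT ON TABLE SALES_FACT IS 'Completed sales transactions; one row per sale';
COMMENT ON COLUMN SALES_FACT.SALES_AMOUNT IS 'Sale amount in US dollars at the time of sale';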

*DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project


Having encountered this DataStage client error in Linux a few times recently, I thought I would document the solution, which has worked for me.

Error Message:

Error calling subroutine: *DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project

(Subroutine failed to complete successfully (30107))

Probable Cause of Error

  • NodeAgents has stopped running
  • Insufficient /tmp disk space

Triage Approach

To fix this error in Linux:

  • Ensure disk space is available; you may want to clean up the /tmp directory of any excess, non-required files (a quick command for checking free space follows this list).
  • Start the NodeAgents.sh, if it is not running
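Command to Check Available Disk Space

A quick way to confirm whether the temporary file system has free space (a standard Linux command; adjust the path if your engine uses a different temporary directory):

df -h /tmp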

Command to verify Node Agent is running

ps -ef | grep java | grep Agent

Command to Start Node Agent

This example command assumes the shell script is in its normal location; if not, you will need to adjust the path.

/opt/IBM/InformationServer/ASBNode/bin/NodeAgents.sh start

Node Agent Logs

These logs may be helpful:

  • asbagent_startup.err
  • asbagent_startup.out

Node Agent Logs Location

This command will get you to where the logs are normally located:

cd /opt/IBM/InformationServer/ASBNode/

InfoSphere Datastage – How to Improve Sequential File Performance Using Parallel Environment Variables


While extensive use of sequential files is not best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. However, recently, I have encountered a number of customers who are seeing significant performance issues with sequential-file-intensive processes. Sometimes it’s the job design, but often when you look at the project configuration they still have the default values. This is a quick and easy thing to check and to adjust to get a quick performance win if they’ve not already been adjusted. These are delivered variables, but they should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment must be based on the amount of available memory, the volume of workload that is sequential-file intensive, and the environment you’re working in. Some experiential adjustment may be required, but I have provided a few recommendations below.

Environment Variable Properties

  • Category Name: Parallel > Operator Specific
    Type: String
    Parameter Name: APT_FILE_EXPORT_BUFFER_SIZE
    Prompt: Sequential write buffer size
    Size: Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096.
    Default Value: 128
  • Category Name: Parallel > Operator Specific
    Type: String
    Parameter Name: APT_FILE_IMPORT_BUFFER_SIZE
    Prompt: Sequential read buffer size
    Size: Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096.
    Default Value: 128


InfoSphere DataStage – How to calculate age in a transformer


Occasionally, there is a need to calculate the difference between two dates for any number of reasons: for example, the age of a person, of an asset, or of an event.  So, having recently had to think about how to do this in a DataStage Transformer, rather than in SQL, I thought it might be good to document a couple of approaches which can provide the age.  This code does it at the year level; however, if you need the decimal digits or other handling, then the rounding within the DecimalToDecimal function can be changed accordingly.

Age Calculation using Julian Date

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(<<Input Date (e.g. Date of Birth)>>)) / 365.25, 'trunc_zero')

Age Calculation using Julian Date with Null Handling

If a date can be missing from your source input data, then null handling is recommended to prevent job failure.  This code uses 1901-01-01 as the null replacement value, but it can be any date your business requirement stipulates.

DecimalToDecimal((JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT) - JulianDayFromDate(NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd")))) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate

DecimalToDecimal(DaysSinceFromDate(<<Processing Date>>, <<Input Date (e.g. Date of Birth)>>) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate with Null Handling

Here is a second example of null handling being applied to the input data.

DecimalToDecimal(DaysSinceFromDate(<<Processing Date>>, NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')

Netezza Connector Stage, Table name required warning for User-defined SQL Write mode


Recently, while working at a customer site, I encountered an anomaly in the Netezza Connector stage: when choosing the ‘User-defined SQL’ write mode, the ‘Table name’ property displays a caution/warning even though a table name should not be required.  If you are using a user-defined SQL statement and/or have parameterized your SQL scripts to make the job reusable, each SQL statement and/or SQL script would have its own schema and table name being passed in.  After some investigation, a workaround was found, which allows you both to populate the table name and to leverage different schema and table names within your SQL statement and/or SQL script.

Table Name, User-defined SQL, Warning

You will notice, in the screenshot below, that the ‘User-defined SQL’ write mode property has been chosen, a parameter has been placed in the ‘User-defined SQL’ property, and the ‘Read user-defined SQL from a file’ property has been set to ‘Yes’.  However, the yellow triangle displays on the ‘Table name’ property, marking it as a required item.  This also occurs when placing SQL statements in the User-defined SQL property, whether reading from a file or not.

Table Name, User-defined SQL, Warning Workaround

After some experimentation, the workaround is straightforward enough.  Basically, give the ‘Table name’ property something it can read successfully, so the stage can move on to the user-defined SQL and/or user-defined SQL file script, which the process actually needs to execute. In the screenshot below, the SYSTEM.DEFINITION_SCHEMA._V_DUAL view was used so it could be found; the script file passed by the parameter then runs fine.  Another view or table to which the DataStage user has access should work just as well.


Surrogate Key File Effective Practices


Here are a few thoughts on effectively working with IBM Infosphere, Information Server, DataStage surrogate key files, which may prove useful for developers.

Placement

  • The main thing about placement is that it be in a consistent location. Developers and production support teams should not need to guess or look up where it is for every DataStage project. So, it is best to put the surrogate key files in the same base path, with each project having its own subfolder, to facilitate migrations and to reduce the possibility of human error. Here is the path structure which is commonly used:

Path

  • /data/SRKY/<<Project Name>>

Parameter Sets

  • As a best practice, the surrogate key file path should be in a parameter set and the parameter used in the jobs, as needed.  This simplifies maintenance, if and when changes to the path are required, and during migrations.

Surrogate Key Parameter Set Screenshot – Example Parameter Tab

Surrogate Key Parameter Set Screenshot – Example Values Tab

Surrogate Key Job Parameter Example Path using Parameter

Permissions

  • DataStage must have permissions to:
    • The entire parent path
    • The project folder, and
    • The surrogate key files themselves.

To ensure the DataStage has access to the path and Surrogate files, ensure:

  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to folders in the path, with at least a “-rw-r--r--” (644) permissions level. Keeping the permissions to a minimum can, for obvious reasons, prevent inadvertent overwrites of the surrogate key files; thus avoiding some potentially serious cleanup.
  • The ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files themselves (example commands follow this list)
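Here is an example of applying the ownership and minimum file permissions described above; the path and file name are illustrative only:

chown dsadm:dstage /data/SRKY/<<Project Name>>/<<SurrogateKeyFileName>>.srky
chmod 644 /data/SRKY/<<Project Name>>/<<SurrogateKeyFileName>>.srky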

Surrogate Key File Minimum Permissions

Productivity Tip – Quickly create a new surrogate key file


This productivity tip shows how to quickly create a new surrogate key file in Linux.  This example leverages native capabilities of Red Hat Enterprise Linux (RHEL) to skip a few commands, by using an existing surrogate key file to create a new surrogate file with a minimum of keystrokes and command-line entries.

Creating a New Surrogate Key File From an Existing File

The basic process consists of just a few steps:

  1. Navigate to the location of your existing surrogate key files
  2. Copy an existing surrogate file
  3. Empty the new surrogate key file

Navigate to the location of your existing surrogate key files

This step is a preparatory step; you will need to look at the path variable for the project you are working with to know where to go.  The actual path to the surrogate files for your project can vary by project.

Copy an existing surrogate file

Assuming you have existing surrogate key files configured as needed, the use of the copy (cp) command with the interactive and preserve options can eliminate the need to create the file and then set groups and permissions.   The interactive (-i) option prevents you from overwriting any existing files, in case you made a filename typo, and the preserve (-p) option preserves the specified attributes (e.g. ownership and permissions).

Basic Command

  • Here is the command format with the interactive and preserve options; either format works
    • cp -ip <<FileName to Be Copied>> <<New Filename>>
  • Here is the command format with only the preserve option
    • cp -p <<FileName to Be Copied>> <<New Filename>>

Example Command

  • cp -ip srky blogexample.srky

Empty the new surrogate key file

Setting the newly created surrogate key file to null will empty the file, so DataStage can begin from the point configured in your DataStage job.

Basic Command

  • cat /dev/null > <<FileName>>

Example Command

  • cat /dev/null > blogexample.srky


Productivity Tip – Changing Owner and Groups on Surrogate Key File


This practice tip shows how to quickly update a surrogate key file’s owner and group in Linux.  This example leverages native capabilities of Red Hat Enterprise Linux (RHEL) to skip a few commands, by using a combined command to set both the owner and group of a surrogate key file in a single command.

Surrogate Key File Owners and Groups

To ensure that DataStage has access to the path and surrogate files, ensure the ‘dsadm’ (owner) and ‘dstage’ (group) have access to the surrogate key files.

Setting Surrogate Key File Owners and Groups

You can change the ownership and group of a surrogate file at the same time, in Linux, with the change owner command. To do this, navigate to the surrogate key path containing the file, then execute the combined chown command.

Command chown basic format

  • chown <<OWNER>>:<<Group>> <<File Name>>

Example chown command

  • chown dsadm:dstage Blogexampl.txt


Infosphere Datastage – Client Tier Image Requirements


A frequent question encountered in the early stages of a client engagement is: what are the recommended Windows Client Tier Image Requirements (Virtual machine image or Desktop)?  However, I have never found this information to be satisfactorily documented by IBM.  So, invariably, we end up providing our best guidance based on experience, which in the case of the normal InfoSphere Information Server (IIS) installation is provided in the table below.

Recommended Developer Client Tier Characteristics

  • Item: Application Directory Storage
    Quantity: 8 GB or larger
  • Item: Memory
    Quantity: 6 GB or more
    Notes: This should be a developer image, so more is better, especially given the other applications which the developer may also be using simultaneously.
  • Item: CPU Cores
    Quantity: 2 or more
    Notes: This should be a developer image, so more is better, especially given the other applications which the developer may also be using simultaneously.
  • Item: Permissions
    Quantity: N/A
    Notes: Users performing the client software installation must have full Administrative rights on the virtual machine image or individual workstation.
  • Item: Protection Software
    Quantity: N/A
    Notes: All firewalls and virus protection software need to be disabled on the client while the client software installation is in progress.  Also, the required InfoSphere firewall ports for the client tools must be open to use the tools.

Related References

DataStage – IIS-DSEE-TBLD-00008- Processing Input Record APT_Decimal Error


This is another one of those nebulous error messages which can cost a lot of time in research if you don’t know how to simplify the process a bit.  Determining where the error occurs can be a bit of a challenge if you have not encountered this error before and figured out the trick, which isn’t exactly intuitive.

In this case, as it turned out after I had determined where the error was, it was as simple as having missed resetting the stage variable properties, when the other decimal fields increased.

How to identify where this error occurs?

Disable operator combination by:

  • adding the APT_DISABLE_COMBINATION environment variable to the job properties
  • setting the APT_DISABLE_COMBINATION environment variable to True in the job properties
  • compiling the job and running the job again

This approach will, usually, provide a more meaningful identification of the stage with the error.

Note:  Please remember to remove the APT_DISABLE_COMBINATION environment variable before moving the job to testing and/or releasing your code into production.

Message ID

IIS-DSEE-TBLD-00008

Error Message with combine enabled:

APT_CombinedOperatorController(1),0: Exception caught in processingInputRecord() for input “0”: APT_Decimal::ErrorBase: From: the source decimal has even precision, but non-zero in the leading nybble, or is too large for the destination decimal… Record dropped. Create a reject link to save the record if needed.

Error message with combine disabled

Tfm_Measures_calc,0: Exception caught in processingInputRecord() for input “0”: APT_Decimal::ErrorBase: From: the source decimal has even precision, but non-zero in the leading nybble or is too large for the destination decimal… Record dropped. Create a reject link to save the record if needed.

Note: Measures_calc is the stage name


DataStage – How to add or subtract from a date in a transformer stage


From time to time there is a need to adjust a date within a DataStage transformer.  It could be as small as subtracting a day to set a batch processing date to yesterday, or something bigger.  If you need to adjust a date by a year, month, and/or day, the DateOffsetByComponents function is the method.  The DateOffsetByComponents function can work with a fixed value or with input from the job, either an input field or a stage variable.  The main thing is knowing what to put in which position in the offset values, with the understanding that offset values can be positive or negative, thereby allowing the addition and subtraction of different parts of a date in a DataStage transformer.

Purpose of DateOffsetByComponents Function

  • This function is used to obtain a specific required date by applying offset to a given date.

Syntax

DateOffsetByComponents(<<Input Date>>, year offset, month offset, day of month offset)

How to subtract a year

DateOffsetByComponents(dsjobstartdate,-1, 0, 0)

How to add a year

DateOffsetByComponents(dsjobstartdate ,1, 0, 0)

How to subtract a month

DateOffsetByComponents(dsjobstartdate, 0, -1, 0)

How to add a month

DateOffsetByComponents(dsjobstartdate, 0, 1, 0)

How to subtract a day

DateOffsetByComponents(dsjobstartdate, 0, 0, -1)

How to add a day

DateOffsetByComponents(dsjobstartdate, 0, 0, 1)
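How to combine offsets

The year, month, and day offsets can be applied in a single call; for example, this variant subtracts a year while adding a month and a day to the same input date:

DateOffsetByComponents(dsjobstartdate, -1, 1, 1)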


IBM InfoSphere Licensing – DataStage and DataQuality


Licensing is one of the major factors which can limit InfoSphere performance, flexibility, and extensibility. Licensing can affect your performance through the type of product being licensed, the quantity of CPUs/cores being licensed, and the operating system for which the license was purchased.

The Type of Product for Which the License Was or Is Being Purchased

The type of product license purchased can reduce your ability to tune and achieve optimal performance by reducing the ability to perform parallel processing.  For example, consider the difference between the IBM InfoSphere DataStage licenses:

  • IBM InfoSphere DataStage Server Edition – A standalone ETL solution without full parallel processing framework capabilities.
  • IBM InfoSphere DataStage – Includes parallel processing capabilities to handle a massive volume, velocity, and variety of data.

Also, Information server packs and special environment licenses provide enhanced capability in specific environment scenarios.  For example:

  • IBM InfoSphere DataStage MVS Edition – Provides capability to generate COBOL programs that run natively on the mainframe to access and integrate mainframe data sources to load into a data warehouse.
  • IBM InfoSphere Information Server Pack for Oracle Applications – Allows data extraction from the entire Oracle E-Business Suite of applications, including Oracle Financials, Manufacturing, CRM and others, in a DataStage ETL job.
  • IBM InfoSphere Information Server Pack for PeopleSoft Enterprise – Provides a graphical way to extract data from PeopleSoft Enterprise in a DataStage ETL job.
  • IBM InfoSphere Information Server Pack for Salesforce.com – Enables a no-programming, meta-data based, fully configurable integration between salesforce.com and other enterprise applications and data repositories.

The Quantity of CPU Cores for Which the License Was or Is Being Purchased

The proper sizing of the license to acquire has a significant effect on DataStage’s ability to leverage the system CPUs/cores, and licenses are frequently undersized.  While there is a balance between cost and performance, undersizing your license can add cost in support, as well as increase processing times.  Furthermore, most performance issues reported to IBM are with systems licensed for fewer than eight (8) CPUs/cores.

Related References

  • Licensed IBM InfoSphere DataStage editions and feature packs activation and deactivation
  • Viewing a list of activated IBM InfoSphere DataStage editions and feature packs

Infosphere – decimal_from_string Conversion Error


This is another one of those nebulous errors which can be thrown by DataStage, DataQuality, and/or DataClick.  This error can be particularly annoying because it doesn’t identify the field or even the precise command which is causing the error.  So, there can be more than one field and/or more than one command causing the problem.

Error

Conversion error calling conversion routine decimal_from_string data may have been lost

Resolution

To resolve this error, check for the correct formatting (date format, decimal, and null value handling) before passing values to the DataStage StringToDate, DateToString, DecimalToString, or StringToDecimal functions.  Additionally, even if the formatting is correct, you may need to embed commands to completely clear the issue.

Example

Here is a recent example of command embedding, which has cleared the issue, but I’m sure you will adapt this concept in other ways to meet all your needs.

DecimalToString(DecimalToDecimal(<<Input Decimal Field>>, 'trunc_zero'), "suppress_zero")
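If nulls or empty strings can reach the conversion, a defensive transformer expression along these lines can also help.  This is only a sketch; the link and field names are illustrative:

If IsNull(Lnk_Src_In.AMOUNT_STR) Or Trim(Lnk_Src_In.AMOUNT_STR) = '' Then 0 Else StringToDecimal(Trim(Lnk_Src_In.AMOUNT_STR))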

InfoSphere – JobRun Table, Run Type Codes


The JobRun table contains details of information about the IBM InfoSphere DataStage and QualityStage jobs that have run or are currently running.

Column name

  • RunType

Column Type Code

  • Reference (Ref)

Column Description

  • Column distinguishes a normal run from a validate or reset run.

Source View

  • runtyperef

Source View Structure

  • RUNTYPECODE Varchar(3 OCTETS)
  • RUNTYPENAME Varchar(32 OCTETS)
  • RUNTYPEDESCRIPTION Varchar(255 OCTETS)

Current Values

RUNTYPECODE   RUNTYPENAME     RUNTYPEDESCRIPTION
?!?           Invalid value   A non-numeric value was found
???           Unknown value   An out-of-range numeric value was found
RES           Reset           Reset run
RUN           Run             Normal run mode
VAL           Validate        Validation-only run
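If you need to read these codes directly, the reference view can be queried from the operations database.  This assumes the default DSODB schema name, which may differ in your installation; joining JobRun.RunType to RUNTYPECODE translates the run type of historical runs into readable names.

SELECT RUNTYPECODE, RUNTYPENAME, RUNTYPEDESCRIPTION
FROM DSODB.RUNTYPEREF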


How to Schedule an Infosphere Datastage Job in crontab


This is a quick, easy, shortcut way to schedule an InfoSphere DataStage job in Linux crontab to take advantage of capabilities within crontab not available in the InfoSphere graphical user interface (GUI).

For this example, the job’s schedule has been adjusted from the standard InfoSphere scheduler graphical user interface (GUI) setting to run every 15 minutes, which is not available in the GUI.

The Basic crontab Scheduling Process

  • Schedule the job in DataStage Director
  • Log into Linux as the user who created the schedule
  • Run the ‘crontab -e’ command
  • Edit the crontab command line using vi commands
  • Save the changes

Note: If the revised schedule differs from the InfoSphere scheduler GUI standard settings, it will not display as changed in the GUI.  However, the jobs will run as scheduled, if edited correctly, which can be verified in the Director client.
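For example, a crontab entry that launches a job every 15 minutes might look like the following.  This is only a sketch: the dsjob path, project, and job names are placeholders, your environment may require sourcing dsenv first, and the entry Director generates for you will differ.

*/15 * * * * /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -run -wait <<ProjectName>> <<JobName>> >> /tmp/<<JobName>>_cron.log 2>&1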
