Business – The Benefits of a Data-Driven Organization and Datafication


What is a data-driven organization, and what are its benefits? The answer depends on how your organization is structured and how you use data. Let’s explore a few examples, the benefits of a data-driven organization and Datafication, and the business processes that support data-driven organizations. We will also look at how data can be used to improve customer service and drive growth.

Data-driven organizations

Organizations that embrace Datafication are known for improved decision-making processes, fatter bottom lines, and higher ROI. This new paradigm demands that organizations think outside their walls and silos to maximize data’s potential for growth. Organizations need to build a data infrastructure and employ data scientists to make sense of massive amounts of data. This is not an easy task, but companies that embrace Datafication can reap great rewards.

NCell, a Nepalese telecommunications company, shared its mobile call records with data scientists in Sweden to improve disaster response. The two organizations teamed up to use data to increase their capacity to act as a catalyst for social change, leveraging Datafication to make a real impact on society. Data-driven organizations aren’t only a good thing; they’re becoming a necessity.

Big data is reshaping the world. It is transforming industries, enabling organizations to create better products and services and improve efficiency. However, the process of Datafication requires careful planning. Cities that implement Datafication are able to gain deep insights from their data, transforming it into valuable information that can be used to make better decisions and improve the quality of life. Datafication also requires that cities have a strategy for controlling the knowledge they extract.


Data-driven organizations and Datafication are transforming business processes. Making decisions based on data will allow you to maximize resources and streamline current processes to stay competitive. This type of transformation will require you to integrate IoT technologies, such as mobile devices, wearables, and Bluetooth beacons, into your business processes. Read on to learn more about the ways this type of transformation will benefit your organization.

‘Datafication’ is a recent technological trend that aims to turn almost every aspect of a business into data that can be tracked, monitored, and analyzed. The term “datafication” was coined by Viktor Mayer-Schönberger and Kenneth Cukier in 2013. It refers to the transformation of an organization into a data-driven enterprise, and it makes it easier to measure, monitor, and optimize critical business processes.

By using customer data, enterprises can craft relevant triggers for their target audiences. For instance, a company can gather data from social media, mobile devices, and apps. This data can also be used to evaluate potential talent, and data-driven talent evaluation can even yield new personality measures. Data-driven organizations are resilient, innovative, and able to delight customers. While entering new markets or developing new products may be risky, this type of organization is ready to take the risk.

A data-driven organization is one that recognizes the power of data and knows that sound decisions cannot be made without proper analysis. Data-driven organizations refine the data they gather and use the information they mine for growth. They use the right kind of data at the right time; this data may include customer behavior, demographic data, and survey responses. Datafication is a continuous process, and it starts at the top level of the organization.

Business processes

In today’s fast-paced environment, new terms are being created all the time. Words like datafication and gamification are derived from existing words linked with the -fication suffix. In simple terms, datafication refers to the process of making a business data-driven: data is collected from different sources and stored in a central location for tracking, monitoring, and optimization. Businesses can also acquire massive amounts of data and make the best use of it at a later time.

Moreover, data is considered the basic building block of knowledge and an engine for richer forms of intellectual knowing. It is also considered an aesthetic aspect of organizational life, although that aesthetic dimension does not directly relate to organizational efficacy. This makes it necessary to engage with the opportunities offered by big data to transform organizational processes; if we can treat data this way, then our companies can engage in the process of Datafication.

Using data to make business decisions has many benefits for organizations. Data-driven organizations have been reported to outperform competitors by 6% in profitability and by 5% in productivity, and to be 58% more likely to beat their revenue goals and 162% more likely to significantly exceed them. Consequently, a data culture is essential for data-driven organizations to succeed, along with a collaborative, data-driven workforce to realize this potential.

Ultimately, data-driven organizations can mitigate risk by allowing decision-makers to base decisions on data and analytics. Netflix’s successful political thriller ‘House of Cards’, for example, was built on vast amounts of viewer data and predictive analytics. If you’re interested in becoming a data-driven organization, consider the concepts discussed here to transform your business; once you do, you’ll be well on your way.


The benefits of a data-driven organization extend beyond the ability to respond faster to changes in the market. For instance, effective data analytics can help sales teams improve their performance. Data on consumer behavior can help marketing departments improve their strategies and build more effective remote teams. Using analytics to make more informed decisions about business growth and strategy is also an effective way to outsmart competitors. It also makes it easier for companies to spot new business opportunities.

A data-driven organization also makes it easier to scale, make changes, and implement new ideas. With the data on hand, people can analyze it properly and act accordingly, which improves employee morale and loyalty. In addition, a data-driven organization can better respond and adapt to changing market conditions. These benefits can be seen across the entire organization; read on to find out more.

A data-driven organization is well-equipped to respond to changing customer expectations and meet their evolving needs. Innovating and breaking into new markets can be risky, but a data-driven organization is equipped to handle the rigor of these challenges. By using data to inform decision-making, data-driven organizations are more likely to delight their customers. Companies like Amazon have built their entire businesses around the exploitation of information.

The use of data-driven tools and analytics will strengthen your business. By incorporating data-driven processes into your organization, you will increase the quality of products and services and be better equipped to handle new challenges. And data-driven organizations can meet regulatory requirements with ease. Lastly, the benefits of becoming data-driven are many. You will be able to focus on the most important issues, and make better decisions, while simultaneously maximizing profits.


As organizations adopt big data and analytics to improve the way they work, reporting becomes an essential component of the data-driven organization. The drive to report results may come from legal requirements, responsibilities, or internal cultural forces; either way, it must be a central part of the decision-making process. The following are some tips to improve reporting in data-driven organizations. Once you’ve mastered these basics, you’ll be well on your way to Datafication.

The power of data lies in its simplicity. Data-driven organizations are well-equipped to meet their customers’ demands and stay ahead of competitors, and the innovation these organizations are capable of is bound to impress even the most discerning customers. However, breaking into new markets, introducing new products, and innovating can be risky. Fortunately, that risk can be mitigated with the use of big data analytics.

The rise of big data has changed the way journalists tell stories. Increasing amounts of data can be used to better understand and track specific audiences, and big data analytics tools are now ubiquitous in newsrooms, providing deeper insight into an audience’s reading habits. By combining quantitative and qualitative insights, data-driven reporting can improve both the quality of stories and newsroom operations.

Big data is a powerful tool for business intelligence. With the right skills, big data can enhance decision-making and improve workflow, helping to improve patient care, increase sales, and raise overall efficiency. The Geisinger Health System in Pennsylvania, for example, has embraced data-driven care as an enterprise-wide strategy, moving beyond individual projects and innovative analytics to adopt an entire data-driven culture.


Technology – Why Business Intelligence (BI) needs a Semantic Data Model


A semantic data model is a method of organizing and representing corporate data that reflects the meaning of, and relationships among, data items. Organizing data this way helps end users access data autonomously, using familiar business terms such as revenue, product, or customer, via BI (business intelligence) and other analytics tools. A semantic model offers a consolidated, unified view of data across the business, allowing end users to obtain valuable insights quickly from large, complex, and diverse data sets.

What is the purpose of semantic data modeling in BI and data virtualization?

A semantic data model sits between a reporting tool and the original database in order to assist end-users with reporting. It is the main entry point for accessing data for most organizations when they are running ad hoc queries or creating reports and dashboards. It facilitates reporting and improvements in various areas, such as:

  • No relationships or joins for end users to worry about, because they’ve already been handled in the semantic data model
  • Data such as invoice data, salesforce data, and inventory data have all been pre-integrated for end users to consume
  • Columns have been renamed into user-friendly names, such as Invoice Amount as opposed to INVAMT
  • The model includes powerful time-oriented calculations, such as percentage change in sales since last quarter, sales year-to-date, and sales increase year over year
  • Business logic and calculations are centralized in the semantic data model, reducing the risk of inconsistent calculations
  • Data security can be incorporated; this might include exposing certain measurements only to authorized end users and/or standard row-level security

A well-designed semantic data model with agile tooling allows end-users to learn and understand how altering their queries results in different outcomes. It also gives them independence from IT while having confidence that their results are correct.
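To make this concrete, here is a minimal sketch of a semantic-layer object implemented as a plain database view; the table and column names (INVOICE, INVAMT, and so on) are hypothetical, and in practice the model would usually be defined in a BI or data virtualization tool rather than hand-written SQL.

-- Joins are pre-handled, cryptic columns are renamed into business
-- terms, and the logic lives in one central place.
CREATE VIEW sales_semantic AS
SELECT inv.INVNO     AS "Invoice Number",
       inv.INVAMT    AS "Invoice Amount",
       inv.INVDT     AS "Invoice Date",
       cust.CUSTNAME AS "Customer",
       prd.PRDNAME   AS "Product"
FROM   INVOICE  inv
JOIN   CUSTOMER cust ON cust.CUSTID = inv.CUSTID
JOIN   PRODUCT  prd  ON prd.PRDID   = inv.PRDID;

End users then query the view by business terms alone, with no knowledge of the underlying joins or naming.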

Extraction, Transformation & Loading Vs. Enterprise Application Integration


Over recent years, business enterprises relying on accurate and consistent data to make informed decisions have been gravitating towards integration technologies. The subject of Enterprise Application Integration (EAI) and Extraction, Transformation & Loading (ETL) lately seems to pop up in most Enterprise Information Management conversations.

From an architectural perspective, both techniques share a striking similarity. However, they essentially serve different purposes when it comes to information management. We’ve decided to do a little bit of research and establish the differences between the two integration technologies.

Enterprise Application Integration

Enterprise Application Integration (EAI) is an integration framework that consists of technologies and services, allowing for seamless coordination of vital systems, processes, as well as databases across an enterprise.

Simply put, this integration technique simplifies and automates your business processes to a whole new level without necessarily having to make major changes to your existing data structures or applications.

With EAI, your business can integrate essential systems like supply chain management, customer relationship management, business intelligence, enterprise resource planning, and payroll. The linking of these applications can be done at the back end via APIs or at the front end via the GUI.

The systems in question might use different databases or computer languages, exist on different operating systems, or be older systems that are no longer supported by the vendor.

The objective of EAI is to develop a single, unified view of enterprise data and information, as well as ensure the information is correctly stored, transmitted, and reflected. It enables existing applications to communicate and share data in real-time.

Extraction, Transformation & Loading

The general purpose of an ETL system is to extract data out of one or more source databases and then transfer it to a target destination system for better user decision making. Data in the target system is usually presented differently from the sources.

The extracted data goes through the transformation phase, which involves checking for data integrity and converting the data into a proper storage format or structure. It is then moved into other systems for analysis or querying.

The loading phase typically involves writing the data into the target destination, such as a data warehouse or an operational data store.

ETL can integrate data from multiple systems. The systems we’re talking about in this case are often hosted on separate computer hardware or supported by different vendors.
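As a rough illustration of those three phases in plain SQL (the staging and warehouse tables here are hypothetical; a real ETL tool adds scheduling, logging, and error handling around the same pattern):

-- Extract from a hypothetical staging table, transform in flight,
-- and load into a hypothetical warehouse target.
INSERT INTO dw.sales_fact (order_id, order_date, customer_key, amount_usd)
SELECT src.order_id,
       CAST(src.order_ts AS DATE),          -- transform: truncate timestamp to date
       COALESCE(cust.customer_key, 0),      -- transform: surrogate key lookup
       src.amount_cents / 100.0             -- transform: unit conversion
FROM   staging.orders src
LEFT JOIN dw.customer_dim cust
       ON cust.customer_natural_key = src.customer_id
WHERE  src.status <> 'CANCELLED';           -- transform: data-integrity filter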

Differences between ETL and EAI

EAI System

  • Retrieves small amounts of data in one operation and is characterized by a high number of transactions
  • Utilized for process optimization and workflow
  • Does not require user involvement after it is implemented
  • Ensures a bi-directional data flow between the source and target applications
  • Ideal for real-time business data needs
  • Limited data validation
  • Integration operations are pull-, push-, and event-driven

ETL System

  • A one-way process of creating a historical record from homogeneous or heterogeneous sources
  • Mainly designed to process large batches of data from source systems
  • Requires extensive user involvement
  • Metadata-driven complex transformations
  • Integration operations are pull- and query-driven
  • Supports proper profiling and data cleansing
  • Limited messaging capabilities

Both integration technologies are an essential part of EIM, as they provide strong capabilities for business intelligence initiatives and reporting. They are used differently, and sometimes in combination.

Technology – Using Logical Data Lakes


Today, data-driven decision making is at the center of all things. The emergence of data science and machine learning has further reinforced the importance of data as the most critical commodity in today’s world. From FAAMG (the five biggest tech companies: Facebook, Amazon, Apple, Microsoft, and Google) to governments and non-profits, everyone is busy leveraging the power of data to achieve their goals. Unfortunately, this growing demand for data has exposed the inability of current systems to support ever-growing data needs. This inefficiency is what led to the evolution of what we today know as logical data lakes.

What Is a Logical Data Lake?

In simple words, a data lake is a data repository that is capable of storing any data in its original format. As opposed to traditional data sources that use the ETL (Extract, Transform, and Load) strategy, data lakes work on the ELT (Extract, Load, and Transform) strategy. This means data does not have to be transformed before it is loaded, which essentially translates into reduced time and effort. Logical data lakes have attracted wide attention because they do away with the need to physically integrate data from different data repositories. With this open access to data, companies can begin to draw correlations between separate data entities and use this exercise to their advantage.
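The ELT idea can be sketched in SQL: land the data in its original form first, and apply structure later, only when a consumer needs it. The table and view names below are hypothetical.

-- Load first, in original form, with no upfront transformation.
CREATE TABLE lake_raw_events (
    event_payload VARCHAR(64000),
    loaded_at     TIMESTAMP
);
INSERT INTO lake_raw_events
VALUES ('{"device":"sensor-7","temp":21.4}', CURRENT_TIMESTAMP);

-- Transform later, at query time, for a specific consumer.
CREATE VIEW device_events AS
SELECT loaded_at, event_payload
FROM   lake_raw_events
WHERE  event_payload LIKE '%"device"%';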

Primary Use Case Scenarios of Data Lakes

Logical data lakes are a relatively new concept, and thus, readers can benefit from some knowledge of how logical data lakes can be used in real-life scenarios.

To conduct Experimental Analysis of Data:

  • Logical data lakes can play an essential role in the experimental analysis of data to establish its value. Since data lakes work on the ELT strategy, they lend agility and speed to such experiments.

To store and analyze IoT Data:

  • Logical data lakes can efficiently store Internet of Things (IoT) data. Data lakes are capable of storing both relational and non-relational data, and it is not mandatory to define the structure or schema of the data being stored. Moreover, logical data lakes can run analytics on IoT data to find ways to enhance quality and reduce operational cost.

To improve Customer Interaction:

  • Logical data lakes can methodically combine CRM data with social media analytics to give businesses an understanding of customer behavior as well as customer churn and its various causes.

To create a Data Warehouse:

  • Logical data lakes contain raw data. Data warehouses, on the other hand, store structured and filtered data. Creating a data lake can be the first step in the process of data warehouse creation, and a data lake may also be used to augment a data warehouse.

To support reporting and analytical function:

  • Data lakes can also be used to support the reporting and analytical function in organizations. By storing the maximum amount of data in a single repository, logical data lakes make it easier to analyze all the data and come up with relevant and valuable findings.

A logical data lake is a comparatively new area of study, but it seems safe to say that logical data lakes will reshape traditional approaches to managing data.

Infosphere Information Server (IIS) – Where Can You View DataStage and QualityStage Logs?


During the course of the week, a discussion came up regarding the different places where a person might read the DataStage and QualityStage logs in InfoSphere. I hadn’t really thought about it, but here are a few places that come to mind:

  • IBM InfoSphere DataStage and QualityStage Operations Console
  • IBM InfoSphere DataStage and QualityStage Director client
  • IBM InfoSphere DataStage and QualityStage Designer client by pressing Ctrl+L

Related Reference

IBM Knowledge Center> InfoSphere Information Server 11.7.0 > InfoSphere DataStage and QualityStage > Monitoring jobs

IBM Knowledge Center > InfoSphere Information Server 11.7.0 > Installing > Troubleshooting software installation > Log files

Essbase Connector Error – Client Commands are Currently Not Being Accepted


DataStage Essbase Connector

While investigating a recent InfoSphere Information Server (IIS), DataStage, Essbase Connector error, I found the explanations of the probable causes of the error not to be terribly meaningful. So, now that I have run our error to ground, I thought it might be nice to jot down a quick note on the potential causes of the ‘Client Commands are Currently Not Being Accepted’ error, which I gleaned from the process.

Error Message Id


Error Message

An error occurred while processing the request on the server. The error information is 1051544 (message on contacting or from application:[<<DateTimeStamp>>]Local////3544/Error(1013204) Client Commands are Currently Not Being Accepted.

Possible Causes of The Error

This error indicates a problem with access to the Essbase object, or with the security within the Essbase object. It can be a result of multiple issues, such as:

  • Object doesn’t exist – the Essbase object didn’t exist in the location specified
  • Communications – the location is unavailable or cannot be reached
  • Path security – security gets in the way of access to the Essbase object location
  • Essbase security – security within the Essbase object does not support the user or filter being submitted; also, the Essbase object security may be corrupted or incomplete
  • Essbase object structure – the Essbase object was not properly structured to support the filter, or the Essbase filter is malformed for the current structure

Related References

IBM Knowledge Center, InfoSphere Information Server 11.7.0, Connecting to data sources, Enterprise applications, IBM InfoSphere Information Server Pack for Hyperion Essbase



DataStage – How to Pass the Invocation ID from one Sequence to another

DataStage Invocation ID Passing Pattern Overview

When you are controlling a chain of sequences in a job stream and taking advantage of reusable (multiple-instance) jobs, it is useful to be able to pass the Invocation ID from the master controlling sequence and have it passed down and assigned to the job run. This can easily be done without needing to manually enter the values in each of the sequences, by leveraging the DSJobInvocationId variable. For this to work:

  • The job must have ‘Allow Multiple Instance’ enabled
  • The Invocation Id must be provided in the parent sequence; that is, the parent sequence must have the Invocation Name entered
  • The receiving child sequence must have the invocation variable entered
  • At runtime, a DataStage invocation ID instance of the multi-instance job will be generated, with its own logs

Variable Name

  • DSJobInvocationId


This approach allows for the reuse of jobs and the assignment of meaningful instance extension names, which are managed from a single point of entry in the object tree.

Related References: 

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and QualityStage > Designing DataStage and QualityStage jobs > Building sequence jobs > Sequence job activities > Job Activity properties

DataStage – How to use single quoted parameter list in an Oracle Connector

Data Integration

While working with a client’s DataStage 9.1 installation, I ran into a situation where they wanted to parameterize SQL where-clause lists in an Oracle Connector stage, which honestly was not very straightforward to figure out. First, if APT_OSL_PARAM_ESC_SQUOTE is not set and single quotes are used in the parameter, the job creates unquoted, invalid SQL when the parameter is populated. Second, I found much of the available information confusing and/or incomplete in its explanation. After some research and some trial and error, here is how I resolved the issue. I’ll endeavor to be concise, but holistic, in my explanation.

When this Variable applies

These are the circumstances where I know this process applies; there may be other circumstances to which it is applicable, but I’m listing the ones here with which I have recent experience.

Infosphere Information Server Datastage

  • Versions 9.1, 11.3, and 11.5

Oracle RDBMS

  • Versions 11g and 12c

Configuration process

Here is a brief explanation of the steps I used to implement the where clause as a parameter. Please note that in this example I am using a job parameter to populate only a portion of the where clause; you can certainly pass the entire where clause as a parameter, if it is not too long.

Configure Project Variable in Administrator

  • Add APT_OSL_PARAM_ESC_SQUOTE to project in Administrator
  • Populate the APT_OSL_PARAM_ESC_SQUOTE Variable

Create job parameter

Following your project naming convention or standard practice (if your customer and/or project do not have established naming conventions), create the job parameter in the job. See the jp_ItemSource parameter in the image below.

Job Parameter In Oracle Connector

Add job parameter to Custom SQL in Select Oracle Connector Stage

Once the job parameter has been created, add it to the SQL statement of the job.

Job Parameter In SQL
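For illustration, here is roughly what the custom SQL and the runtime parameter value might look like; jp_ItemSource and the item_master table are hypothetical, and the single quotes in the value are exactly what APT_OSL_PARAM_ESC_SQUOTE preserves.

-- Custom SQL in the Oracle Connector stage; #jp_ItemSource# is the job parameter.
SELECT item_id,
       item_source,
       item_description
FROM   item_master
WHERE  item_source IN (#jp_ItemSource#);

-- Example runtime value supplied for jp_ItemSource:
--   'LEGACY','MANUAL','EDI'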

Related References

IBM Knowledge Center > InfoSphere Information Server 11.5.0

Connecting to data sources > Databases > Oracle databases > Oracle connector

IBM Support > Limitation of the Parameter APT_OSL_PARAM_ESC_SQUOTE on Plugins on Parallel Canvas

IBM Knowledge Center > InfoSphere Information Server 11.5.0

InfoSphere DataStage and Quality > Stage > Reference > Parallel Job Reference > Environment Variables > Miscellaneous > APT_OSL_PARAM_ESC_SQUOTE

OLTP vs Data Warehousing


OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse manager/data analyst for many years. Why was the list that came from the operational applications different from the one that came from the Data Warehouse? Why couldn’t I just get a list of the patients lying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you, so you know how to explain it to your manager. Or, if you are a manager, so you understand what your data analyst can and cannot give you.


OLTP

OLTP stands for Online Transaction Processing. In other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day-to-day processes.
For example: when a patient checks in, his or her information gets entered into a Patient Information System. The doctor puts scheduled tests, a diagnosis, and a treatment plan in there as well. Doctors, nurses, and other people working with patients use this system on a daily basis to enter and get detailed information on their patients.
The data within operational systems is stored so that it can be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day-to-day operations. The beauty of a Data Warehouse is, among other things, that you can combine the data from different operational systems. You can actually combine the number of patients in a department with the number of nurses, for example. You can see how far a doctor is behind schedule and find the cause by looking at the patients. Does he run late with elderly patients? Is there a particular diagnosis that takes more time? Or does he just oversleep a lot? You can use this information to look at the past and see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

The data gets entered into the operational systems. Then the ETL process Extracts this data from these systems, Transforms the data so it will fit neatly into the Data Warehouse, and then Loads it into the Data Warehouse. After that, reports are built with a reporting tool from the data that lies in the Data Warehouse.

This is how OLTP works:

Reports are made directly from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports from the operational databases.

Pros and Cons

Data Warehousing

Pros
  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours when the fewest people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data, for example, as the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day-to-day basis. A Data Warehouse is built for this. For instance, most Data Warehouses have a separate date table where the weekday, day, month, and year are saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this, you’ll save time and decrease the strain on the database.
  • Data is saved longer than in the source systems
    • The source systems need to have their old records deleted when they are no longer used in day-to-day operations, so those records get deleted to maintain performance.


Cons

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means that you never have the latest data. Staying with the hospital example: you never know how many patients are in the hospital right now, or which surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than that of the source systems, it will always be a little different, and the transformation step in the ETL process also changes it a little. It doesn’t mean one or the other is wrong; it’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same report from both systems and don’t exclude the cancelled transactions in the source system, you’ll get different results.

Online Transaction Processing (OLTP)

Pros
  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.


Cons

  • You are putting strain on an application during business hours.
    • When you are making a large query, you can take processing capacity that would otherwise be available to the people who need to work with this system for their day-to-day operations. And if you make an error, for instance by forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar, like an HR system and a payroll system that work with each other, the data is always going to be different, because it is granulated at a different level or not all data is relevant to both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data that’s irrelevant to day-to-day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights at a higher level, you should use a Data Warehouse.
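To make the contrast concrete, here is a sketch of the same kind of question asked of each system in SQL; the hospital table and column names are hypothetical.

-- OLTP: operational detail, right now, straight from the source system.
SELECT patient_id, ward, admitted_at
FROM   patient_admission
WHERE  discharged_at IS NULL;

-- Data Warehouse: a trend over time, using the pre-built date dimension
-- so the weekday does not have to be derived on the fly.
SELECT d.week_day,
       COUNT(*) AS admissions
FROM   admission_fact f
JOIN   date_dim d ON d.date_srky = f.admission_date_srky
GROUP BY d.week_day;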


Oracle – How to get a list of user permission grants


Since the InfoSphere Information Server repository has to be installed manually with the scripts provided with the IBM software, sometimes you run into difficulties. So, here’s a quick script which I have found useful in the past to identify user permissions for the IAUSER on Oracle databases, to help run down discrepancies in user permissions.



SELECT *
FROM   DBA_TAB_PRIVS
WHERE  GRANTEE = 'IAUSER';

If we cannot run against the DBA_TAB_PRIVS view (it requires elevated privileges), then we can try the ALL_TAB_PRIVS view:

SELECT *
FROM   ALL_TAB_PRIVS
WHERE  GRANTEE = 'IAUSER';

Related References

Oracle Help Center > Database Reference > ALL_TAB_PRIVS view

What are the Core Capabilities of Infosphere Information Server?


Three Core Capabilities of Information Server

InfoSphere Information Server (IIS) has three core capabilities:

  • Information Governance
  • Data Integration
  • Data Quality

What the Core Capabilities Provide

The three core capabilities translate into these high-level business processes:

Information Governance – Understand and collaborate

Provides a centrally managed repository and approach, which delivers:

  • Information blueprints
  • Relationship discovery across data sources
  • Information technology (IT)-to-business mapping

Data Integration – Transform and deliver

A data integration capability, which provides:

  • Transformation
    • Massive scalability
    • Power for any complexity
    • Total traceability
  • Delivery
    • Data capture at any time
    • Delivery anywhere
    • Big data readiness

Data Quality – Cleanse and monitor

To turn data assets into trusted information:

  • Analysis & validation
  • Data cleansing
  • Data quality rules & management

Related References

IBM Knowledge Center, InfoSphere Information Server Version 11.5.0

Overview of IBM InfoSphere Information Server, Introduction to InfoSphere Information Server

InfoSphere Information Server Suites


The Six Suites of Infosphere Information Server

InfoSphere Information Server 11 Series is primarily made up of six suites/packages, which provide a mix of components:

  • InfoSphere Information Governance Catalog
  • BigInsights BigIntegrate
  • BigInsights
  • InfoSphere Information Server for Data Integration
  • InfoSphere Information Server for Data Quality
  • InfoSphere Information Server Enterprise Edition (inclusive of all five packages above)

IIS capabilities / components

These license suites are comprised of differing combinations of IIS capabilities/components, which consist of:

  • InfoSphere DataStage®
  • InfoSphere QualityStage®
  • InfoSphere DataStage and QualityStage Designer
  • InfoSphere Data Click
  • InfoSphere FastTrack
  • InfoSphere Information Analyzer
  • InfoSphere Information Governance Catalog
  • InfoSphere Information Governance Dashboard
  • InfoSphere Information Services Director

Additional Purchasable Infosphere capabilities

Additional Infosphere capabilities can be licensed / purchased to supplement the primary suite licenses:

  • Cognos® Business Intelligence
  • BigInsights Data Scientist
  • InfoSphere Change Data Delivery
  • InfoSphere Data Architect

Table of Component/Capability Alignment to IIS Suites

The alignment table maps each component/capability (InfoSphere DataStage®, InfoSphere QualityStage®, InfoSphere DataStage and QualityStage Designer, InfoSphere Data Click, InfoSphere FastTrack, InfoSphere Information Analyzer, InfoSphere Information Governance Catalog, InfoSphere Information Governance Dashboard, and InfoSphere Information Services Director) to the suites that include it: InfoSphere Information Server for Data Integration, InfoSphere Information Server for Data Quality, InfoSphere Information Governance Catalog, BigInsights BigIntegrate, and InfoSphere Information Server Enterprise Edition. See the IBM Knowledge Center reference below for the full alignment matrix.

* = usage and license restrictions may apply.

Related References

IBM Knowledge Center, InfoSphere Information Server, Version 11.5.0,

Overview of IBM InfoSphere Information Server, Introduction to InfoSphere Information Server, Components in the InfoSphere Information Server suite

SFDC – Using a timestamp literal in a where clause


Salesforce Connector

Working with timestamp literals in the InfoSphere SFDC Connector SOQL is much like working with date literals. So, here is a quick example which may save you some time.

SOQL Timestamp String Literals Where Clause Rules

Basically, the timestamp pattern is straightforward and similar to the process for dates, but there are some differences. The basic rules for a SOQL where clause are:

  • No quotes
  • No functions
  • No casting functions, or casting, for the SOQL where clause to read it
  • It only applies to datetime fields
  • A timestamp identifier 'T'
  • And the ISO 8601 time notation

Example SOQL Timestamp String Literals

So, here are a couple of timestamp string literal examples in SOQL:

  • 1901-01-01T00:00:00-00:00
  • 2016-01-31T00:00:00-00:00
  • 9999-10-31T00:00:00-00:00

Example SQL with Timestamp String Literal Where Clause


Select e.Id,
       e.StartDateTime
From Event e
WHERE e.StartDateTime > 2014-10-31T00:00:00-00:00


Related References

Salesforce Developer Documentation

Home, Developer Documentation, SOQL and SOSL Reference

Salesforce Workbench

Home, Technical Library, Workbench


Date Time Formats


SFDC Salesforce Connector – Column Returns Null values, when SOQL Returns Data in Workbench


Salesforce Connector

Recently, I encountered a scenario which is a little out of the norm while using the SFDC Connector. Once the issue is understood, it is easily remedied.

The problem / Error

  • SOQL run in Salesforce workbench and column returns data
  • The DataStage job/ETL runs without errors or warnings
  • The target column output only returns null values

The Cause

In short, the cause is a misalignment between the SOQL field name and the column name on the Columns tab of the connector stage.

The Solution

The fix is simply to convert the dots in the field name to underscores. Basically, a SOQL field name of Account.RecordType.Name becomes Account_RecordType_Name.

Example Field / Column Name  Fix

Example SQL

Select c.Id,
       c.Account.RecordType.Name
From Contact c

Columns Tab With Correct Naming Alignment

Please note that the qualifying dots have been converted to underscores.

SFDC Connector Columns Tab



Data Modeling – Fact Table Effective Practices


Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g. date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs
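A minimal DDL sketch pulling these practices together, using hypothetical names and generic SQL:

-- Fact table: identifiable by its name, keyed to the date dimension,
-- nullable measures, and FK constraints on dimension surrogate keys.
CREATE TABLE sales_fact (
    sales_fact_srky  BIGINT  NOT NULL,
    date_srky        INTEGER NOT NULL REFERENCES date_dim (date_srky),
    product_srky     INTEGER NOT NULL REFERENCES product_dim (product_srky),
    sales_amount     DECIMAL(18,2),   -- measure, nullable by design
    units_sold       INTEGER,         -- measure, nullable by design
    PRIMARY KEY (sales_fact_srky)
);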

Data Modeling – Dimension Table Effective Practices


I’ve had these notes lying around for a while, so I thought I’d consolidate them here. So, here are a few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a Primary Key (PK) assigned on the table surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – when the record was initially created
    • Have a Last Update timestamp – when the record was last updated
  • Job Flow: Do not place the dimension processing in the fact jobs; keep dimension processing outside of the fact jobs
  • Every dimension should have a Zero (0), Unknown, row
  • Fields should be ‘NOT NULL’, replacing nulls with a zero (0) for numeric and integer type fields or a space (‘ ‘) for character type fields
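Here is a minimal sketch of a Type 1 dimension following these guidelines (hypothetical names, generic SQL), including the explicit zero-key Unknown row:

-- Dimension table: PK on the surrogate key, audit timestamps, and
-- NOT NULL fields with zero/space defaults instead of nulls.
CREATE TABLE product_dim (
    product_srky     INTEGER      NOT NULL,
    product_code     CHAR(10)     NOT NULL DEFAULT ' ',
    product_name     VARCHAR(100) NOT NULL DEFAULT ' ',
    created_date     TIMESTAMP    NOT NULL,
    last_update_date TIMESTAMP    NOT NULL,
    PRIMARY KEY (product_srky)
);

-- The Zero (0), Unknown, row every dimension should carry.
INSERT INTO product_dim
VALUES (0, ' ', 'Unknown', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);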


Datastage – When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0

APT_TSortOperator Warning

The APT_TSortOperator warning happens when there is a conflict in the partitioning behavior between stages. Usually, this is because the successor (downstream) stage’s ‘Partitioning / Collecting’ and ‘Sorting’ properties are set in a way that conflicts with the predecessor (upstream) stage’s properties, which are set to preserve partitioning. This can occur when the predecessor stage has the “Preserve Partitioning” property set to:

  • ‘Default (Propagate)’
  • ‘Propagate’, or
  • ‘Set’

Preserve Partitioning Property – list

Message ID

  • IIS-DSEE-TFOR-00074

Message Text

  • <<Link Name Where Warning Occurred>>: When checking operator: Operator of type “APT_TSortOperator”: will partition despite the preserve-partitioning flag on the data set on input port 0.

Warning Fixes

  • First, verify whether the partitioning behaviors of both stages are correct
  • If so, set the predecessor stage’s ‘Preserve Partitioning’ property to “Clear”
  • If not, correct the partitioning behavior of the stage which is in error

Clear Partitioning Property Screenshot

Preserve Partitioning Property – Set To Clear

Infosphere DataStage – Boolean Handling for Netezza


Beware when you see this message when working with Booleans in DataStage: the message displays as informational (at least it did for me), not as a warning or an error. Even though it seems innocuous, what it meant for my job was that the Boolean (‘true’ / ‘false’) was not being interpreted, and everything posted to ‘false’.

In DataStage, the Netezza ‘Boolean’ field/data SQL type maps to the ‘Bit’ SQL type, which expects a numeric input of zero (0) or one (1). So, my solution (once I detected the problem during unit testing) was to put Transformer stage logic in place to convert the Boolean input to the expected numeric value.

Netezza to Datastage Data Type Mapping

Netezza data type | InfoSphere DataStage data type (SQL type) | Expected input value
BOOLEAN           | Bit                                       | 0 or 1 (1 = true, 0 = false)

Transformer Stage logic Boolean Handling Logic

A Netezza Boolean field can store true values, false values, and null. So, some thought should be given to your desired data outcome for nulls.

This first example ensures that nulls are set to a specific value, which can support a specific business rule for null handling and, also, provide null handling for non-nullable fields. Here we are setting nulls to the numeric value for ‘true’ and all other non-true inputs to ‘false’.

If isnull(Lnk_Src_In.USER_ACTIVE) then 1 Else if Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0

These second examples let the nulls be set by the Else value; if your logic direction is correct, this still provides null handling for non-nullable fields.

  • If Lnk_Src_In.USER_ACTIVE = 'true' Then 1 Else 0

  • If Lnk_Src_In.USER_ACTIVE = 'false' Then 0 Else 1

Director Log Message

Message ID

  • IIS-DSEE-TBLD-00008

Message Text

  • <<Link Name Where Message Occurred>>: Numeric string expected. Use default value.

Or something like this:

  • <<Link Name Where Message Occurred>>: Numeric string expected for input column ‘<<Field Name Here>>‘. Use default value.

Related References


PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, UDX data types reference information, Supported data types, Boolean

Data types and aliases

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza stored procedures, NZPLSQL statements and grammar, Variables and constants, Data types and aliases

Logical data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Logical data types

Data type conversions from Netezza to DataStage

InfoSphere Information Server, InfoSphere Information Server 11.5.0, Connecting to data sources, Databases, Netezza Performance Server, Netezza connector, Designing jobs by using the Netezza connector, Defining a Netezza connector job, Data type conversions, Data type conversions from Netezza to DataStage

InfoSphere DataStage – Operations Manual Template


When projects transition to production, the receiving team needs some transition documentation to help with knowledge transfer and to guide them while they get accustomed to operating the application and can form their own documentation. This template is a shell, which usually provides enough detail, with some updating, to provide a quick reference regarding the application. Also, having a template can be a real time-saver, as opposed to writing one from scratch.


Infosphere DataStage – Designer Client Repository Structure


Default Repository Structure

When a project is created, there is a default repository structure created for use in the DataStage designer client.

However, some additional organization will be required for most DataStage projects. Usually, this organization occurs in these areas:

  • Addition of structure within the “Jobs” folder
  • Addition of a “Parameter Sets” folder
  • Addition of structure within the “Table Definitions” folder
  • Addition of a “Developer Work Area” folder

Repository Structure within the “Jobs” folder

Below is a sample of a folder structure for multiple applications that share a common repository. The pattern includes, but does not illustrate, all other delivered folders. In addition to the core folder structure, developers can create individual working, test, and in-progress folders, which do not migrate, but keep work segregated.

Parameter Sets Folders

The parameter set folders are for two sets of information.

  • First, the database parameters, which include data connections and the attached parameter sets.
  • Second, the job parameters, which may include parameter sets for things like e-mail parameters, surrogate key file paths, etc.; creating these as parameter sets is a best practice, rather than creating them as project-level parameters.

Table Definitions

The Table Definitions folder has folders added to segregate the imported metadata for source and target systems and, in some cases, may need folders to logically organize imported metadata which may reside within the same database and/or schema but belong to different logical layers.

InfoSphere DataStage – DataStage Parallel Job Peer Code Review Checklist Template


Peer code review happens during the development phase and focuses on the overall quality, and compliance to standards, of code and configuration artifacts. However, the hard part of performing a peer code review isn’t performing the review, but rather achieving consistency and thoroughness in the review. This is where a checklist can contribute significantly, providing a list of things to check and a relative weight for the findings. I hope this template assists with your DataStage job review process.

Infosphere Datastage – Standard Practice- Sequence Naming Conventions


Standard practices help you and others understand your work. This can be very important when working on large teams, working across team boundaries, or when large, complex sets of processes and objects are involved. When you consider the importance of naming conventions coupled with standard practice, the benefit should be obvious; yet teams often don’t execute or document their conventions. So, these standard naming conventions may help when none exist, or when you need to assemble your own.

<<SomeIdentifier>> = should be replaced with appropriate information

Sequence Object Naming Conventions

  • Master Control Sequence (parent): Master_<<Application>>_<<Application Job Stream Name>>_Seq

Sequence Stage Naming Conventions

  • End Loop: EL_<<PrimaryFunction>>
  • Error Handler: EH_<<PrimaryFunction>>
  • Execute Command: EC_<<PrimaryFunction>>
  • Job Activity: Job_<<PrimaryFunction>>
  • Nested Condition: NC_<<PrimaryFunction>>
  • Notify: Notify_<<PrimaryFunction>>
  • Routine Activity: Rtn_<<PrimaryFunction>>
  • Sequence Links (messages): Msg_<<Number or Description>>
  • Sequence Links (non-messages): Lnk_<<Number or Description>>
  • Sequencer: Seqr_<<Number or Description>>
  • Sequencer (All): SeqAll_<<Identifier>>
  • Sequencer (Any): SeqAny_<<Identifier>>
  • Start Loop: SL_<<PrimaryFunction>>
  • Terminator Activity: TA_<<PrimaryFunction>>
  • User Variables: UV_<<PrimaryPurpose>>
  • Wait For File: WFF_<<PrimaryFunction>>

InfoSphere DataStage – Ways to Create a Datastage Parameter Set


There are three primary ways to create a parameter set, and this is a different practice from adding ‘User Defined’ variables in the InfoSphere DataStage Administrator. The ways to create a parameter set are:

  • Create a parameter set from a data connection stage
  • Create a Parameter Set from the navigation of DataStage designer, and
  • Create a Parameter Set from a job

Create a parameter set from Data Connection Stage

This is used to create parameter sets for database connection parameters.

To create a new Parameter Set from a Data Connection

  • Select: File > New > Other and select “Data Connection”
  • Complete the Data Connection stage properties, then save the stage
  • Open the Connection stage and navigate to the “Parameters” tab
  • Then, click on the “Associated Parameter Set” button, and choose the “Create & Attach” menu item
  • This will launch a dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters tab
  • In the Parameters tab, enter the parameters you wish to include in this parameter set
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the parameter set.

Create a Parameter Set from the navigation of DataStage designer

This is, perhaps, the more traditional way of creating a parameter set.

To create a new Parameter Set

  • Select: File > New > Other and select “Parameter Set”
  • This will launch a dialog
  • Fill out the appropriate information on the General tab and then proceed to the Parameters tab
  • In the Parameters tab, enter the parameters you wish to include in this parameter set. Note: existing environment variables can also be added.
  • On the Values tab, specify a Value File name (please follow naming convention standards to prevent rework and other problems). This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.
  • Click OK to save the parameter set.

Create a Parameter Set from a job

This approach is, perhaps, less traditional, but it is equally effective if you find yourself creating additional jobs and now need to share the same parameters. This is a quick and easy way to generate a parameter set from an existing job.

To create a new Parameter Set from a job

  • Open the job that you want to create a parameter set for.
  • Click “Edit > Job Properties” to open the “Job Properties” window.
  • Click the “Parameters” tab.
  • Press and hold the Ctrl key, then select the parameters that you want to include in the parameter set.
  • With your parameters highlighted, click “Create Parameter Set”.  The Parameter Set window opens.
    • Enter a name and short description for your parameter set.
    • Click the “Parameters” tab; the parameters that you selected are listed.
    • Click the ”Values” tab.
    • Enter a name in the Value File name field, then press Enter.  The value for each of your parameters is automatically populated with the path name that you entered.
    • If a default value is not already set, enter a value for each parameter. For example, if the variable is a Pathname type, enter a default path name.
    • Click “OK” to close the Parameter Set window.
    • In the Save Parameter Set As window, select the folder where you want to save your parameter set and click Save. When prompted to replace the selected parameters with the parameter set, click Yes.
  • Click “OK” to close the Job Properties window.

Netezza / PureData – How to add comments on a field


The ‘Comment on Column’ statement provides the same self-documentation capability as ‘Comment on Table’, but drives the capability down to the column/field level. This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users. The comment code is part of the DDL and can be migrated with the table structure DDL. The statements can be run independently or, working with Aginity for PureData System for Analytics, they can be run as a group with the table DDL, using the ‘Execute as a Single Batch’ (Ctrl+F5) command.

Basic ‘COMMENT ON field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';

Example ‘COMMENT ON Field’ Syntax

  • This is example syntax, which would need to be changed and applied to each column field:

COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';

Netezza / PureData – Table Documentation Practices


Applying a few effective table practices can:

  • Significantly aid others in understanding the purpose and use of tables and column fields
  • Provide more migratable metadata to inform and enable the capabilities of other tools
  • And, from a self-preservation point of view, reduce the number of meetings, emails, and phone calls required to explain and clarify the business meaning and roles of tables and fields

Among the effective practices when modeling and designing tables are:

  • Add a primary key
  • Add table comments
  • Add column field comments
  • Add ‘Organize by’ fields to provide essential optimization on frequently used and/or key performance fields
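Pulling those practices together, a small Netezza-flavored sketch might look like the following; the table and fields are hypothetical:

-- Hypothetical table with a primary key, an 'Organize by' field on a
-- frequently filtered column, and table/column comments.
CREATE TABLE sales_summary (
    sales_srky   INTEGER NOT NULL,
    sales_region VARCHAR(50),
    CONSTRAINT pk_sales_summary PRIMARY KEY (sales_srky)
) ORGANIZE ON (sales_region);

COMMENT ON TABLE sales_summary IS 'Sales summary; one row per sales region assignment';
COMMENT ON COLUMN sales_summary.sales_srky IS 'Surrogate primary key for the sales summary table';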

*DataStage*DSR_PROJECT (Action=8); check DataStage is set up correctly in project


Basically, the Action=8 error, which I normally see when opening the DataStage Director client application, means that one or more of the RT_LOG files have become corrupted. Usually, this problem occurs in relation to disk space issues, although there can be other causes.

Error Message

Error calling subroutine: *DataStage*DSR_PROJECT (Action=8); check DataStage is set up correctly in the project

(Subroutine failed to complete successfully (30107))

The Cleanup approach

The cleanup process really consists of three primary steps:

  • Free disk space
  • Restart the application process
  • And, fix corrupted log

Free Disk Space

This can consist of:

  • Cleaning ‘/tmp’ Space
  • Removing any large unnecessary files
  • Enlarging ‘/tmp’ space allocation
  • Adding addition disks space, if necessary

Restart Application Processes

Once you have freed disk space, restarting the VM/server is recommended. However, if that is not a realistic option, then at least restart the InfoSphere DataStage engine to ensure the newly freed space registers with the applications and that everything is restarted and running.

Fix Corrupted logs

Perhaps the cleanest way to reset all the logs is to perform a ‘Multiple Job Compile’. Running the jobs will also overwrite the logs, but that is a little more hit and miss if not all of the jobs are in job streams/batches which can be run at this time. The logs can also be manually overwritten by compiling a job or performing a reset; the trick with a manual reset is that you have to know which jobs to reset, so this could take a while to get them all. Finally, the logs can be manually dropped and reset, but I recommend that approach only as a last resort.

*DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project


Having encountered this DataStage client error in Linux a few times recently, I thought I would document the solution, which has worked for me.

Error Message:

Error calling subroutine: *DataStage*DSR_SELECT (Action=3); check DataStage is set up correctly in project

(Subroutine failed to complete successfully (30107))

Probable Cause of Error

  • NodeAgents has stopped running
  • Insufficient /tmp disk space

Triage Approach

To fix this error in Linux:

  • Ensure disk space is available; you may want to clean up the /tmp directory of any excess non-required files.
  • Start the node agent, if it is not running.

Command to verify Node Agent is running

ps -ef | grep java | grep Agent

Command to Start Node Agent

This example command assumes the shell script is in its normal location; if not, you will need to adjust the path.

/opt/IBM/InformationServer/ASBNode/bin/NodeAgents.sh start

Node Agent Logs

These logs may be helpful:

  • asbagent_startup.err
  • asbagent_startup.out

Node Agent Logs Location

This command will get you to where the logs are normally located:

cd /opt/IBM/InformationServer/ASBNode/

Netezza / PureData – ERROR [HY000] ERROR: Bad timestamp external representation ‘0000-00-00’


While working on a recent data conversion, a timestamp error occurred where the legacy code and data used a ‘0000-00-00’ default stamp, which was causing errors in Netezza/PureData Analytics (PDA). ‘0000-00-00’ is not a valid timestamp in Netezza and will produce an ‘ERROR [HY000] ERROR: Bad timestamp external representation ‘0000-00-00’’ notice.

When handling default dates in SQL and ETLs, generally speaking, ‘1901-01-01’ or ‘0001-01-01’ are used as defaults for null and/or invalid dates. Both of these dates/timestamps will insert into Netezza, which I verified with a quick, simple proof table; the SQL-extracted results are shown in the table below.

Valid default dates in Netezza

Row | Timestamp value
1   | 1901-01-01 00:00:00
2   | 0001-01-01 00:00:00
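A quick sketch of that proof, with a hypothetical table name:

-- The common default dates load cleanly; '0000-00-00' does not.
CREATE TABLE ts_proof (row_id INTEGER, default_ts TIMESTAMP);

INSERT INTO ts_proof VALUES (1, '1901-01-01 00:00:00');  -- valid
INSERT INTO ts_proof VALUES (2, '0001-01-01 00:00:00');  -- valid
-- INSERT INTO ts_proof VALUES (3, '0000-00-00');
--   would fail: ERROR: Bad timestamp external representation '0000-00-00'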

Related References

Temporal data types

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Data types, Temporal data types

Netezza date/time data type representations

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza user-defined functions, Data type helper API reference, Temporal data type helper functions, Netezza date/time data type representations

Date/time functions

PureData System for Analytics, PureData System for Analytics 7.2.1, IBM Netezza database user documentation, Netezza SQL basics, Netezza SQL extensions, Date/time functions

What is the URL for InfoSphere Information Analyzer


This is one of those things which usually comes up during new installs. To access the InfoSphere Information Analyzer thin client, there are two approaches: first, via the InfoSphere launchpad page, and second, going directly to the Information Analyzer login page. Both URLs are provided below.

InfoSphere LaunchPad URL

https://<<Host_Server>>:9443/ibm/iis/launchpad/

Information Analyzer URL 

https://<<Host_Server>>:9443/ibm/iis/dq/da/login.jsp
