InfoSphere DataStage – Node Best Practices

The performance and overall efficiency of your DataStage ETL jobs can be affected by a number of factors, and one of the more common is the configuration of nodes within InfoSphere. Properly configured nodes allow InfoSphere to perform Massively Parallel Processing (MPP). The ultimate goal of your node configuration is to maximize the amount of work that can run concurrently, which includes concurrent read/write capability to both logical and physical drives.

Here are a few pointers that may help if you haven’t already worked through them.

 To be most efficient, configure your nodes as follows:

  • Map no more than one node to each core/CPU
  • Use multiple configuration (node) files, targeting different cores/CPUs, for small, medium, and large jobs
  • When mapping nodes to disk drives, keep these tips in mind for best results:

At a minimum, map each node to one disk.

  • Map one drive (for physical drives, this means a separate read/write point, or spindle) to each resource disk and scratch disk you define.
  • For scratch, map multiple scratch spaces (temporary working space) to each node.
  • Perform regular maintenance on scratch disks mapped to physical drives to remove orphaned files and free up processing space.
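To make the node and disk tips above concrete, here is a minimal Python sketch that generates configuration file text for small, medium, and large jobs. Everything named in it is an assumption for illustration: the host name "etlhost", the /data and /scratch paths, the 1/2/4 node counts, and the build_config helper itself are all hypothetical. The output follows the commonly seen node / fastname / resource disk / resource scratchdisk layout, so check it against the documentation for your own version before using it.

```python
def build_config(fastname, node_count, core_count,
                 resource_disks, scratch_disks, scratch_per_node=2):
    """Return configuration file text for node_count nodes (illustrative only)."""
    # Tip: no more than one node for each core/CPU.
    if node_count > core_count:
        raise ValueError("no more than one node per core/CPU")
    # Tip: at a minimum, each node gets its own resource disk.
    if len(resource_disks) < node_count:
        raise ValueError("each node needs its own resource disk")
    if len(scratch_disks) < scratch_per_node:
        raise ValueError("not enough scratch disks for each node")

    lines = ["{"]
    for i in range(node_count):
        lines.append(f'  node "node{i + 1}"')
        lines.append("  {")
        lines.append(f'    fastname "{fastname}"')
        lines.append('    pools ""')
        lines.append(f'    resource disk "{resource_disks[i]}" {{pools ""}}')
        # Tip: several scratch spaces per node. Rotating the starting
        # disk means neighbouring nodes begin on different drives.
        for k in range(scratch_per_node):
            scratch = scratch_disks[(i + k) % len(scratch_disks)]
            lines.append(f'    resource scratchdisk "{scratch}" {{pools ""}}')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host, paths, and job sizes; adjust to your environment.
    resource = ["/data1/ds", "/data2/ds", "/data3/ds", "/data4/ds"]
    scratch = ["/scratch1/ds", "/scratch2/ds", "/scratch3/ds"]
    for name, nodes in {"small": 1, "medium": 2, "large": 4}.items():
        print(f"--- {name} ({nodes} node(s)) ---")
        print(build_config("etlhost", nodes, 4, resource, scratch))
```

Each generated text could be saved as its own file, and a job then picks the one that fits its size (typically through the APT_CONFIG_FILE environment variable). The helper refuses to build a configuration with more nodes than cores, which keeps the first tip from being broken by accident.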

When deciding which resources to use for scratch disks, keep in mind this performance hierarchy (ordered from most to least performant):

  • RAM Disks
  • Flash Memory
  • Solid-State Drives (SSDs)
  • Hard Disk Drives (HDDs)
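
If you keep a list of candidate scratch locations, the hierarchy above can be applied with a few lines of Python. This is only a sketch: the paths and the media labels ("ram", "flash", "ssd", "hdd") are hypothetical and have to be supplied by whoever knows the hardware, since the code does not detect drive types.

```python
# Lower rank means more performant, following the hierarchy above.
TIER_RANK = {"ram": 0, "flash": 1, "ssd": 2, "hdd": 3}


def order_scratch_candidates(candidates):
    """Sort (path, media) pairs from most to least performant tier.

    sorted() is stable, so paths on the same tier keep their input order.
    """
    ranked = sorted(candidates, key=lambda item: TIER_RANK[item[1]])
    return [path for path, _media in ranked]


if __name__ == "__main__":
    # Hypothetical mount points and labels.
    candidates = [
        ("/scratch/hdd1", "hdd"),
        ("/mnt/ramdisk", "ram"),
        ("/scratch/ssd1", "ssd"),
    ]
    print(order_scratch_candidates(candidates))
    # prints ['/mnt/ramdisk', '/scratch/ssd1', '/scratch/hdd1']
```

An unknown media label raises a KeyError rather than being silently ranked, which seemed the safer choice for a list that feeds a configuration file.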