Use DataStage Parallel Environment Variables To Improve Sequential File Performance
Heavy use of sequential files is not a best practice, but legacy systems and existing processes sometimes leave no way around it. Recently, I have encountered a number of customers who are seeing significant performance issues with sequential file-intensive processes. Sometimes the cause is the job design, but often the project configuration still has the default buffer sizes. This is quick and easy to check, and adjusting the values can give a quick performance win if that has not already been done. These variables are delivered with the product, but they should be seriously considered for adjustment in nearly all DataStage ETL projects. Base the adjustment on the amount of available memory, the volume of workload that is sequential file-intensive, and the environment you’re working in. Some experimental adjustment may be required, but I have provided a few recommendations below.
Environment Variable Properties
Category Name | Type | Parameter Name | Prompt | Size and Recommended Values | Default Value |
Parallel > Operator Specific | String | APT_FILE_EXPORT_BUFFER_SIZE | Sequential write buffer size | Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096. | 128 |
Parallel > Operator Specific | String | APT_FILE_IMPORT_BUFFER_SIZE | Sequential read buffer size | Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096. | 128 |
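To make the recommendations in the table easier to apply consistently, here is a minimal Python sketch of them as a lookup by environment. The function names (`buffer_env`, `buffer_mib`) and the tier keys are hypothetical and exist only for illustration. The conversion to MiB assumes the value is a size in kilobytes (so the default of 128 means 128 KB), which is how I read IBM's documentation for these variables; verify this against your own version before relying on it.

```python
# Recommended values from the table above, keyed by environment tier.
# The tier names are illustrative; use whatever naming your site follows.
RECOMMENDED_BUFFER_KB = {"dev": 2048, "test": 4096, "prod": 4096}
DEFAULT_BUFFER_KB = 128  # delivered default for both variables


def buffer_env(tier: str) -> dict[str, str]:
    """Return both variables as strings, the type listed in the table."""
    value = str(RECOMMENDED_BUFFER_KB[tier.lower()])
    return {
        "APT_FILE_EXPORT_BUFFER_SIZE": value,  # sequential write buffer
        "APT_FILE_IMPORT_BUFFER_SIZE": value,  # sequential read buffer
    }


def buffer_mib(value_kb: int) -> float:
    """Convert a setting to MiB, assuming the value is in kilobytes."""
    return value_kb / 1024


if __name__ == "__main__":
    for tier in ("dev", "test", "prod"):
        for name, value in buffer_env(tier).items():
            print(f"{tier}: {name}={value} ({buffer_mib(int(value)):g} MiB)")
```

Under the same kilobyte assumption, the recommended values raise each buffer from 128 KB to 2 MiB (Dev) or 4 MiB (Test and Prod), which is why available memory belongs in the decision.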
Related References
IBM / Documentation / InfoSphere Information Server / 11.7 / Reading and writing files
IBM / Documentation / InfoSphere Information Server / 11.7 / APT_FILE_EXPORT_BUFFER_SIZE
Bert’s Blog / IBM InfoSphere DataStage – Parallel Environment Variables