Why Unit Testing is Important

Whenever a new application is in development, unit testing is a vital part of the process and is typically performed by the developer. During this process, individual sections of code are isolated one at a time and systematically checked to ensure correctness, efficiency, and quality. There are numerous benefits to unit testing, several of which are outlined below.

1. Maximizing Agile Programming and Refactoring

During the coding process, a programmer has to keep in mind a myriad of factors to ensure that the final product is correct and as lightweight as possible. However, the programmer also needs to make certain that if changes become necessary, refactoring can be safely and easily done.

Unit testing is the simplest way to support agile programming and refactoring, because the isolated sections of code have already been tested for accuracy, which helps to minimize refactoring risks.

2. Find and Eliminate Any Bugs Early in the Process

Ultimately, the goal is to find no bugs and no issues to correct, right? But unit testing is there to ensure that any existing bugs are found early on so that they can be addressed and corrected before additional coding is layered on. While it might not feel like a positive thing to have a unit test reveal a problem, it’s good that it’s catching the issue now so that the bug doesn’t affect the final product.
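
To make this concrete, here is a minimal sketch of what such an early-warning unit test might look like, using Python's built-in unittest module; the calculate_discount function and its rules are hypothetical and exist only for illustration.

import unittest


def calculate_discount(price, discount_percent):
    """Hypothetical function under test: applies a percentage discount to a price."""
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return round(price * (1 - discount_percent / 100.0), 2)


class CalculateDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        # A normal case: 20% off 100.00 should be 80.00
        self.assertEqual(calculate_discount(100.00, 20), 80.00)

    def test_zero_discount_returns_original_price(self):
        self.assertEqual(calculate_discount(59.99, 0), 59.99)

    def test_invalid_discount_is_rejected(self):
        # Catching bad input now, before more code is layered on top of it
        with self.assertRaises(ValueError):
            calculate_discount(100.00, 150)


if __name__ == "__main__":
    unittest.main()

A failing test here is a cheap signal today rather than an expensive defect in the finished product.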

3. Document Any and All Changes

Unit testing provides documentation for each section of coding that has been separated, allowing those who haven’t already directly worked with the code to locate and understand each individual section as necessary. This is invaluable in helping developers understand unit APIs without too much hassle.

4. Reduce Development Costs

As one can imagine, fixing problems after the product is complete is both time-consuming and costly. Not only do you have to sort back through a fully coded application’s worth of material, but any bugs may have been compounded and repeated throughout the application. Unit testing not only limits the amount of work that needs to be done after the application is completed, it also reduces the time it takes to fix errors, because it prevents developers from having to fix the same problem more than once.

5. Assists in Planning

Thanks to the documentation aspect of unit testing, developers are forced to think through the design of each individual section of code so that its function is determined before it’s written. This can prevent redundancies, incomplete sections, and nonsensical functions because it encourages better planning. Developers who implement unit testing in their applications will ultimately improve their creative and coding abilities thanks to this aspect of the process.

Conclusion

Unit testing is absolutely vital to the development process. It streamlines the debugging process and makes it more efficient, saves on time and costs for the developers, and even helps developers and programmers improve their craft through strategic planning. Without unit testing, people would inevitably wind up spending far more time on correcting problems within the code, which is both inefficient and incredibly frustrating. Using unit tests is a must in the development of any application.

Technology – Personas Vs. Roles – What Is The Difference?

Personas and roles are user modeling approaches that are applied in the early stages of system development or redesign. They drive design decisions and allow programmers and designers to place everyday user needs at the forefront of their system development journey in a user-centered design approach.

Personas and user roles help improve the quality of user experience when working with products that require a significant amount of user interaction. But there is a distinct difference between technology personas and roles. What then exactly is a persona? What are user roles in system development? And how does a persona differ from user roles?

Let’s see how these two distinct, yet often confused, user models fit in a holistic user-centered design process and how you can leverage them to identify valuable product features.

Technology Personas Vs. Roles – The Most Relevant Way to Describe Users

In software development, a user role describes the relationship between a user type and a software tool. It is generally the user’s responsibility when using a system or the specific behavior of a user who is participating in a business process. Think of roles as the umbrella, homogeneous constructs of the users of a particular system. For instance, in an accounting system, you can have roles such as accountant, cashier, and so forth.

However, by merely using roles, system developers, designers, and testers do not have sufficient information to conclusively make critical UX decisions that would make the software more user-centric, and more appealing to its target users.

This lack of understanding of the user community has led to the need for teams to move beyond role-based requirements and focus more on subsets of the system users. User roles can be refined further by creating “user stand-ins,” known as personas. By using personas, developers and designers can move closer to the needs and preferences of the user in a more profound manner than they would by merely relying on user roles.

In product development, a user persona is an archetype of a fictitious user that represents a specific group of your typical everyday users. First introduced by Alan Cooper, personas help the development team clearly understand the context in which the ideal customer interacts with a software/system and help guide the design decision process.

Ideally, personas provide team members with a name, a face, and a description for each user role. By using personas, you’re typically personalizing the user roles, and by so doing, you end up creating a lasting impression on the entire team. Through personas, team members can ask questions about the users.

The Benefits of Persona Development

Persona development has several benefits, including:

  • They help team members have a consistent understanding of the user group.
  • They provide stakeholders with an opportunity to discuss the critical features of a system redesign.
  • Personas help designers to develop user-centric products that have functions and features that the market already demands.
  • A persona helps to create more empathy and a better understanding of the person that will be using the end product. This way, the developers can design the product with the actual user needs in mind.
  • Personas can help predict the needs, behaviors, and possible reactions of the users to the product.

What Makes Up a Well-Defined Persona?

Once you’ve identified user roles that are relevant to your product, you’ll need to create personas for each. A well-defined persona should ideally take into consideration the needs, goals, and observed behaviors of your target audience. This will influence the features and design elements you choose for your system.

The user persona should encompass all the critical details about your ideal user and should be presented in a memorable way that everyone in the team can identify with and understand. It should contain four critical pieces of information, and a small illustrative sketch follows the four sections below.

1. The header

The header aids in improving memorability and creating a connection between the design team and the user. The header should include:

  • A fictional name
  • An image, avatar or a stock photo
  • A vivid description/quote that best describes the persona as it relates to the product.

2. Demographic Profile

Unlike the name and image, which might be fictitious, the demographic profile includes factual details about the ideal user. The demographic profile includes:

  • Personal background: Age, gender, education, ethnicity, persona group, and family status
  • Professional background: Occupation, work experience, and income level.
  • User environment: The social, physical, and technological context of the user. It answers questions like: What devices does the user have? Do they interact with other people? How do they spend their time?
  • Psychographics: Attitudes, motivations, interests, and user pain points.

3. End Goal(s)

End goals help answer the questions: What problems or needs will the product solve for the user? What are the motivating factors that inspire the user’s actions?

4. Scenario

This is a narrative that describes how the ideal user would interact with your product in real-life to achieve their end goals. It should explain the when, the where, and the how.
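
As an illustration of how these four pieces of information might be captured in a structured, shareable form, here is a small Python sketch; the field names and sample values are assumptions made for illustration, not a prescribed persona format.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Persona:
    # 1. Header: name, image, and a short quote that makes the persona memorable
    name: str
    photo_url: str
    quote: str
    # 2. Demographic profile: personal/professional background, environment, psychographics
    age: int
    occupation: str
    environment: str
    pain_points: List[str] = field(default_factory=list)
    # 3. End goal(s): what the product must solve for this user
    end_goals: List[str] = field(default_factory=list)
    # 4. Scenario: a short narrative of how the persona uses the product
    scenario: str = ""


# Example usage with made-up values
amina = Persona(
    name="Amina the Accountant",
    photo_url="https://example.com/avatar.png",
    quote="I just want month-end close to take hours, not days.",
    age=38,
    occupation="Senior Accountant",
    environment="Office laptop plus phone; collaborates daily with a cashier and auditors",
    pain_points=["manual reconciliations", "duplicate data entry"],
    end_goals=["close the books faster", "trust the numbers without rechecking"],
    scenario="At month-end, Amina imports bank statements and expects mismatches to be flagged automatically.",
)
print(amina.name, "-", amina.end_goals[0])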

Conclusion

For a truly successful user-centered design approach, system development teams should use personas to provide simple descriptions of key user roles. While a distinct difference exists in technology personas vs. roles, design teams should use the two user-centered design tools throughout the project to decide and evaluate the functionality of their end product. This way, they can deliver a useful and usable solution to their target market.

Technology – Denodo SQL Type Mapping

denodo 7.0 saves some manual coding when building 'Base Views' by performing some initial data type conversions from ANSI SQL types to denodo Virtual DataPort data types. So, here is a quick reference showing what the denodo Virtual DataPort data type mappings are:

ANSI SQL types To Virtual DataPort Data types Mapping

ANSI SQL Type | Virtual DataPort Type
BIT (n) | blob
BIT VARYING (n) | blob
BOOL | boolean
BYTEA | blob
CHAR (n) | text
CHARACTER (n) | text
CHARACTER VARYING (n) | text
DATE | localdate
DECIMAL | double
DECIMAL (n) | double
DECIMAL (n, m) | double
DOUBLE PRECISION | double
FLOAT | float
FLOAT4 | float
FLOAT8 | double
INT2 | int
INT4 | int
INT8 | long
INTEGER | int
NCHAR (n) | text
NUMERIC | double
NUMERIC (n) | double
NUMERIC (n, m) | double
NVARCHAR (n) | text
REAL | float
SMALLINT | int
TEXT | text
TIMESTAMP | timestamp
TIMESTAMP WITH TIME ZONE | timestamptz
TIMESTAMPTZ | timestamptz
TIME | time
TIMETZ | time
VARBIT | blob
VARCHAR | text
VARCHAR (MAX) | text
VARCHAR (n) | text
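
If you need this mapping programmatically, for example to sanity-check the data types you expect base views to produce, the table above can be captured in a simple lookup. The sketch below is a hypothetical Python helper derived from the table; it is not a Denodo API.

# Abridged lookup built from the mapping table above; parameterized types share one entry.
ANSI_TO_VDP = {
    "BOOL": "boolean",
    "BYTEA": "blob",
    "CHAR": "text",
    "VARCHAR": "text",
    "NVARCHAR": "text",
    "DATE": "localdate",
    "DECIMAL": "double",
    "NUMERIC": "double",
    "DOUBLE PRECISION": "double",
    "FLOAT": "float",
    "REAL": "float",
    "SMALLINT": "int",
    "INTEGER": "int",
    "INT8": "long",
    "TIMESTAMP": "timestamp",
    "TIMESTAMPTZ": "timestamptz",
    "TIME": "time",
}


def expected_vdp_type(ansi_type: str) -> str:
    """Return the expected Virtual DataPort type for an ANSI SQL type, e.g. 'VARCHAR(20)' -> 'text'."""
    base = ansi_type.upper().split("(")[0].strip()  # drop any length/precision suffix
    return ANSI_TO_VDP.get(base, "unknown")


print(expected_vdp_type("VARCHAR(20)"))     # text
print(expected_vdp_type("DECIMAL(10, 2)"))  # double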

ANSI SQL Type Conversion Notes

  • The function CAST truncates the output when converting a value to text, when these two conditions are met:
  1. You specify a SQL type with a length for the target data type (e.g. VARCHAR(20)).
  2. And, this length is less than the length of the input value.
  For example, casting the 11-character value 'Hello World' to VARCHAR(5) returns 'Hello'.
  • When casting a boolean to an integer, true is mapped to 1 and false to 0.

Related References

denodo 8.0 / User Manuals / Virtual DataPort VQL Guide / Functions / Conversion Functions

Technology – What Are “Blue-Green” Or “Red-Black” Deployments?

Chances are you’ve participated in changing or upgrading software or an application. While the idea is to push new features to customers or end-users, there have been significant changes over the years in how dev teams build and deliver applications. This shift has been necessitated by the growing need for agility in businesses.

Today, enterprises are pushing their teams to deliver new product features more often, more rapidly, and with minimal interruptions to the end-user experience. Ultimately, this has led to shorter deployment cycles that translate to:

  • Reduced time-to-market
  • More updates
  • Quicker customer feedback to the production team, leading to faster fixes on bugs and faster iterations on features
  • More value to customers within shorter times
  • More coordination between the development, test, and release teams.

But what has changed over the years, really? In this article, we’ll talk about the shift in deployment strategies from traditional deployment approaches to the newer and more agile deployment methods, and the pros and cons of each strategy.

Let’s dive in, shall we?

Traditional Deployment Strategies: The “One-and-Done” Approach

Classic deployment strategies required dev teams to update large parts of an application, and sometimes the entire application, in one swoop. The implementation happened in one instance, and all users moved to the newer system immediately on a rollout.

This deployment model required businesses to conduct extensive and sometimes difficult development and testing of the monolithic systems before releasing the final application to the market.

Characterized by on-site installations, the end-users relied on plug-and-install to get the latest versions of an application. Since the new application updates were delivered as a whole new package, the user’s hardware and infrastructure had to be compatible with the software or system for it to run smoothly. Also, the end-user needed hours of training on critical updates and how to make use of the deliverable.

Pros of Traditional Deployment Strategies

Low operational costs: Traditional deployment models had lower operating expenses since all departments were switched over on a single day. Also, since most of the applications were vendor-packaged solutions like desktop apps or non-production systems, there were minimal maintenance expenses needed after installation.

No planning requirements. This means that teams would just start coding without tons of requirements and specification documents.

They worked well for small projects with small teams.

Faster return on investment, since the changes occurred site-wide for all users, hence better returns across the departments.

Cons of Traditional Deployments

Traditional deployment strategies presented myriad challenges, including:

  • It was extremely risky to roll back to the older version in case of severe bugs and errors.
  • Potentially expensive: Since this model had no formal planning, no formal leadership, or even standard coding practices, the model was prone to costly mistakes down the line that would cost the enterprise money, reputation, and even loss of customers.
  • It needed a lot of time and manpower to test.
  • It was too basic for modular, complex projects.
  • It needed separate teams of developers, testers, and operations. Such huge teams were slow and lethargic.
  • High user disruptions and major downtimes. Due to the roll-over, organizations would experience a “catch-up” period of low productivity as users tried to adapt to the new system.

As seen above, traditional deployment methodologies were rigorous and sometimes had a lot of repetitive tasks that consumed staggering amounts of coding hours and resources that could otherwise have been used in working on core application features. This kind of deployment approach would not just cut it in today’s fast-paced economies where enterprises are looking for lean, highly-effective teams that can quickly deliver high-quality apps to the market.

The solution was to come up with deployment strategies that allowed enterprises to release and update different components frequently and seamlessly. These deployments sometimes happen very fast to meet the increasing end-user needs. For instance, a mobile app can have several deployments within a day for optimum UX needs. This is made possible with the adoption of more agile approaches such as Blue-Green or Red-Black deployments.

What Are “Blue-Green” Vs. “Red-Black” Deployments?

Blue-green and red-black deployments are fail-safe, immutable deployment processes for cloud-native applications and virtualized or containerized services. Ideally, blue-green and red-black deployments are identical and are designed to reduce application downtime and minimize risks by running two identical production environments.

Unlike the traditional approach where engineers fix the failed features by deploying an older stable version of the application, the Blue-green or the red-black deployment approach is super-agile, more scalable, and is highly automated so that bugs and updates are done seamlessly.

“Blue-Green” Vs. “Red-Black”: What’s the Difference?

Both blue-green and red-black deployments represent similar concepts in the sense that they both apply to automatable cloud or containerized services such as web services or SaaS systems. Ideally, once the dev team has made an update or an upgrade, the release team will create two mirror production environments with identical sets of hardware and route traffic to one of the environments while they test the other, idle environment.

So, what is the difference between the two?

The only difference lies in the amount of traffic routed to the live and idle environments.

In red-black deployment, the new release is deployed to the red environment while traffic is maintained to the black environment. All smoke tests for functionality and performance can be run on the red environment without affecting how the end-user is using the system. When the new updates have been confirmed to be working properly and the red version is fully operational, the traffic is then moved to the new environment by simply changing the router configuration from black to red. This ensures near-zero downtime with the latest release.

This is similar to blue-green deployments, except that with blue/green deployments it is possible for both environments to receive requests at the same time temporarily through load balancing, unlike red/black deployment, where only one version can get traffic at any given time. This means that in blue-green deployments, enterprises can release the new version of the application to a select group of users to test and give feedback before the system goes live for all users.
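
To make the routing difference concrete, here is a small, hypothetical Python sketch of a traffic router: the red/black cutover flips all traffic at once, while the blue/green path can temporarily split traffic between the two environments. The class and method names are illustrative assumptions, not any particular load balancer's API.

class TrafficRouter:
    """Toy model of routing between two identical production environments."""

    def __init__(self):
        # Start with 100% of traffic on the currently live environment.
        self.weights = {"live": 100, "new": 0}

    def red_black_cutover(self):
        # Red/black: only one environment receives traffic at any given time,
        # so the switch is an all-or-nothing flip of the router configuration.
        self.weights = {"live": 0, "new": 100}

    def blue_green_canary(self, percent_to_new):
        # Blue/green: both environments may receive requests temporarily,
        # e.g. send a small share of users to the new version for feedback.
        self.weights = {"live": 100 - percent_to_new, "new": percent_to_new}


router = TrafficRouter()
router.blue_green_canary(10)   # 10% of users try the new release for feedback
print(router.weights)          # {'live': 90, 'new': 10}
router.red_black_cutover()     # full flip once the new release is verified
print(router.weights)          # {'live': 0, 'new': 100}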

Pros Of “Blue-Green” Or “Red-Black” Deployments

  • You can roll back the traffic to the still-operating environment seamlessly and with near-zero user disruptions.
  • Reduced downtime
  • Allows teams to test their disaster recovery procedures in a live production environment
  • Less risky since test teams can run full regression testing before releasing the new versions
  • Since the two codes are already loaded on the mirror environments and traffic to the live site is unaffected, test and release teams have no time pressures to push the release of the new code.

Cons of Blue-Green and Red-Black Deployments

  • It requires additional infrastructure to run the two identical environments that blue-green deployments depend on.
  • It can lead to significant downtime if you’re running hybrid microservice apps and some traditional apps together.
  • Database dependent: Database migrations are sometimes tricky and need to be handled alongside the app deployment
  • It is difficult to run at large scale

There you have it!

Technology – Is a Multi-Cloud Strategy A Fit For your Enterprise?

Enterprises and cloud computing have become more integrated, and cloud is now essential for gaining or maintaining a competitive advantage through big data and analytics, as well as for improving operational efficiency and synergy. To optimize the enterprise architecture with the cloud, there are a few strategic questions that need to be considered:

  • First, how much cloud business does your enterprise need?
  • And, what cloud strategy best meets your enterprise operational and security needs?
  • Where do private, public, or hybrid clouds fit in your enterprise’s information workload deployment strategy?
  • Does multi-cloud fit in the enterprise’s information workload deployment strategy?

What is A Multi-cloud Strategy?

This probably is the point where the narrative should introduce the principle of multi-cloud. A multi-cloud is an approach to cloud computing that seeks to optimize enterprise costs and Return-On-Investment (ROI) and enable big data analytics, and it is already evolving the information workload deployment strategy of many organizations. Multi-cloud has already affected the major software and Software-As-A-Service (SaaS) providers, which have been rapidly evolving their application suites to enable this new reality. As recently as this week, IBM announced that it had moved to a cloud-native software architecture.

Is It Time To Consider A Multi-Cloud Strategy For Your Enterprise?

Multi-cloud is a cloud computing strategy that seeks to align different cloud providers’ capabilities to optimize different business operations and technical requirements. A multi-cloud strategy can be a way to reduce the dependence upon more traditional software vendors and/or on a single cloud service provider.

Advantages Of A Multi-Cloud Strategy

The advantages of a multi-cloud enterprise information workload deployment strategy are:

  • The enterprise can still operate even if one or more of the cloud providers go offline or encounter other difficulties.
  • Enterprises can avoid vendor lock-in, since the enterprise’s data is stored on different cloud service providers and could be migrated if need be.
  • Multi-cloud can reduce the scale of data breach vulnerability, since breaching one cloud does not provide access to the entire data of your enterprise, even if your organization has not implemented a hybrid-cloud (private/public) strategy, because all the data simply isn’t housed in one cloud.
  • Importantly, multi-cloud solutions are customizable. Every enterprise can select what works best in order to achieve optimal efficiency.

Disadvantages Of The Multi-Cloud

The multi-cloud enterprise information workload deployment strategy has downsides as well. For instance:

  • Integration across the multi-cloud providers may require more planning, relationship management, and strategic oversight.
  • Multi-cloud implementations, while reducing the potential scale of any one security breach, provide more than one potential breach point to be monitored, managed, and mitigated.

Conclusion

Based on your enterprise’s industry, use of big data technologies, information security needs, and the use of information analytics to gain or maintain a competitive and/or comparative advantage, a multi-cloud enterprise information workload deployment strategy has a place in optimizing your enterprise’s technical and information strategy, especially when your multi-cloud strategy includes a hybrid cloud (public/private) as a major pillar of your cloud strategy.

Professional Emails Include a Signature Block

I encountered what I will admit is a pet peeve today, which is why I’m writing this article.  I needed to contact someone with whom I correspond regularly, but I have no reason to call or be called by them.  So, after checking my phone, I went to their email thinking this would be a fast and easy way to gather the contact information.  Well, not true.  I did eventually gather the information and contact the person, but what a waste of time, which is time they are being billed for one way or another.

Example Signature Block

Ewing A. BusinessProfessional

Senior, Technical Generalist

Favinger Enterprises, Inc.

100 Spacious Sky, Ice Flats, AZ 85001

Phone: (800) 900-1000 | http://www.favingerentprises.com

 

Which emails should have a signature block?

  • The signature block should be on every email (both initiated by you and replied to by you). This was true even before the days of remote work, but for remote workers, contingent workers, and workers who travel frequently it can be a productivity enhancer.
  • Plus, it is simply the professional thing to do and saves everyone time and frustration. Not to mention it makes you look unprofessional not having one. Do you really want to do that to your personal brand?
  • As if that were not enough, including your signature block is free advertising for you and the company you represent.
  • Additionally, most email accounts let you build one or more signature blocks, which can be embedded in your email.

Where to place your Signature Block?

  • The signature block should go at the bottom of your email. I still use the five lines below the last line of the body of the email to provide white space before the closing, as I learned when writing business letters decades ago.

What should be in a signature Block?

  • The signature block should be compact and informative and at a minimum should include:

The Closing

  • The closing is simply a polite way of saying I’m ending my message now. I usually go with the tried and true ‘Sincerely’, but others go with ‘Thank you’, ‘Best Regards’, or ‘Best Wishes’. The main point is that it should be short, polite, and professional.
  • This section should be followed by two blank lines.

Your Name

  • This line is your professional name (First Name, Middle Initial, and Last Name) and designations (Ph.D., etc.)
  • This is your chance to say who you are and brand yourself to the reader, in a way which your email address cannot. Especially when you consider that many of us don’t control what work email address is assigned to us.

Your Business Title

  • Including your business title provides some insight into your role and professional expertise.

Your Company Name

  • Much like your title, providing the Company Name and Address lets the reader know who you represent and, perhaps, more importantly, it is free advertising for the company.

Your Phone Numbers

  • Including your phone numbers, both office and cell (if different), enables people to quickly reach out to you if they need or want to. Not everybody keeps all their infrequent business contacts in the phone directory.
  • Putting your phone numbers on your signature block also enables the potential caller to verify that the numbers which they may have are still correct.

There are other items that are sometimes included, such as:

  • A company logo to enhance the appearance and quality of a signature block
  • The Company’s website to help customers find out more about the company and to direct business to the company
  • The sender’s email address to reinforce the email address in the header of the email.

However, the guidance provided above will make you look a lot more professional in a hurry if you have not been including a signature block in your emails.

Information Technology (IT) Requirements Management (REQM) For Development

Information Technology Requirements Management

Information technology requirements management (REQM) is the process whereby all resources related to information technology are managed according to an organization's priorities and needs. This includes tangible resources like networking hardware, computers and people, as well as intangible resources like software and data. The central aim of IT management is to generate value through the use of technology. To achieve this, business strategies and technology must be aligned. Information technology management includes many of the basic functions of management, such as staffing, organizing, budgeting and control, but it also has functions that are unique to IT, such as software development, change management, network planning and tech support. Generally, IT is used by organizations to support and complement their business operations. The advantages brought about by having a dedicated IT department are too great for most organizations to pass up. Some organizations actually use IT as the center of their business.

The purpose of requirements management is to ensure that an organization documents, verifies, and meets the needs and expectations of its customers and internal or external stakeholders. Requirements management begins with the analysis and elicitation of the objectives and constraints of the organization. Requirements management further includes supporting planning for requirements, integrating requirements and the organization for working with them (attributes for requirements), as well as relationships with other information delivering against requirements, and changes for these.

The traceability thus established is used in managing requirements to report back fulfillment of company and stakeholder interests in terms of compliance, completeness, coverage, and consistency. Traceabilities also support change management as part of requirements management in understanding the impacts of changes through requirements or other related elements (e.g., functional impacts through relations to functional architecture), and facilitating introducing these changes.

Requirements management involves communication between the project team members and stakeholders, and adjustment to requirements changes throughout the course of the project. To prevent one class of requirements from overriding another, constant communication among members of the development team is critical. For example, in software development for internal applications, the business has such strong needs that it may ignore user requirements, or believe that in creating use cases, the user requirements are being taken care of.

The major IT Requirement Management Phases

Investigation

  • In Investigation, the first three classes of requirements are gathered from the users, from the business and from the development team. In each area, similar questions are asked: what are the goals, what are the constraints, what are the current tools or processes in place, and so on. Only when these requirements are well understood can functional requirements be developed. In the common case, requirements cannot be fully defined at the beginning of the project. Some requirements will change, either because they simply weren't extracted, or because internal or external forces at work affect the project in mid-cycle. The deliverable from the Investigation stage is a requirements document that has been approved by all members of the team. Later, in the thick of development, this document will be critical in preventing scope creep or unnecessary changes. As the system develops, each new feature opens a world of new possibilities, so the requirements specification anchors the team to the original vision and permits a controlled discussion of scope change. While many organizations still use only documents to manage requirements, others manage their requirements baselines using software tools. These tools allow requirements to be managed in a database, and usually have functions to automate traceability (e.g., by enabling electronic links to be created between parent and child requirements, or between test cases and requirements), electronic baseline creation, version control, and change management. Usually such tools contain an export function that allows a specification document to be created by exporting the requirements data into a standard document application.
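
As a rough illustration of the kind of traceability such tools automate, the following hypothetical Python sketch links child requirements and test cases back to a parent requirement; the structure, identifiers, and sample text are assumptions made purely for illustration.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    req_id: str
    text: str
    parent_id: str = ""  # traceability link to a parent requirement
    test_case_ids: List[str] = field(default_factory=list)  # links to verifying test cases


requirements = [
    Requirement("BR-1", "Reduce month-end close time by 30%."),
    Requirement("FR-1.1", "The system shall import bank statements daily.",
                parent_id="BR-1", test_case_ids=["TC-101", "TC-102"]),
    Requirement("FR-1.2", "The system shall flag unreconciled entries.", parent_id="BR-1"),
]

# A simple coverage report: which child requirements have no linked test cases yet?
uncovered = [r.req_id for r in requirements if r.parent_id and not r.test_case_ids]
print(uncovered)  # ['FR-1.2']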

Feasibility

  • In the Feasibility stage, costs of the requirements are determined. For user requirements, the current cost of work is compared to the future projected costs once the new system is in place. Questions such as these are asked: "What are data entry errors costing us now?" Or "What is the cost of scrap due to operator error with the current interface?" Actually, the need for the new tool is often recognized as these questions come to the attention of financial people in the organization. Business costs would include, "What department has the budget for this?" "What is the expected rate of return on the new product in the marketplace?" "What's the internal rate of return in reducing costs of training and support if we make a new, easier-to-use system?" Technical costs are related to software development costs and hardware costs. "Do we have the right people to create the tool?" "Do we need new equipment to support expanded software roles?" This last question is an important type. The team must inquire into whether the newest automated tools will add sufficient processing power to shift some of the burden from the user to the system in order to save people time. The question also points out a fundamental point about requirements management. A human and a tool form a system, and this realization is especially important if the tool is a computer or a new application on a computer. The human mind excels in parallel processing and interpretation of trends with insufficient data. The CPU excels in serial processing and accurate mathematical computation. The overarching goal of the requirements management effort for a software project would thus be to make sure the work being automated gets assigned to the proper processor. For instance, "Don't make the human remember where she is in the interface. Make the interface report the human's location in the system at all times." Or "Don't make the human enter the same data in two screens. Make the system store the data and fill in the second screen as needed." The deliverable from the Feasibility stage is the budget and schedule for the project.

Design

  • Assuming that costs are accurately determined and benefits to be gained are sufficiently large, the project can proceed to the Design stage. In Design, the main requirements management activity is comparing the results of the design against the requirements document to make sure that work is staying in scope. Again, flexibility is paramount to success. Here's a classic story of scope change in mid-stream that actually worked well. Ford auto designers in the early '80s were expecting gasoline prices to hit $3.18 per gallon by the end of the decade. Midway through the design of the Ford Taurus, prices had centered to around $1.50 a gallon. The design team decided they could build a larger, more comfortable, and more powerful car if the gas prices stayed low, so they redesigned the car. The Taurus launch set nationwide sales records when the new car came out, primarily because it was so roomy and comfortable to drive. In most cases, however, departing from the original requirements to that degree does not work. So the requirements document becomes a critical tool that helps the team make decisions about design changes.

Construction and test

  • In the construction and testing stage, the main activity of requirements management is to make sure that work and cost stay within schedule and budget, and that the emerging tool does in fact meet requirements. A main tool used in this stage is prototype construction and iterative testing. For a software application, the user interface can be created on paper and tested with potential users while the framework of the software is being built. Results of these tests are recorded in a user interface design guide and handed off to the design team when they are ready to develop the interface. This saves their time and makes their jobs much easier.

Requirements change management

  • Hardly would any software development project be completed without some changes being asked of the project. The changes can stem from changes in the environment in which the finished product is envisaged to be used, business changes, regulation changes, errors in the original definition of requirements, limitations in technology, changes in the security environment and so on. The activities of requirements change management include receiving the change requests from the stakeholders, recording the received change requests, analyzing and determining the desirability and process of implementation, implementation of the change request, quality assurance for the implementation and closing the change request. Then the data of change requests is compiled and analyzed, and appropriate metrics are derived and dovetailed into the organizational knowledge repository.

Release

  • Requirements management does not end with product release. From that point on, the data coming in about the application's acceptability is gathered and fed into the Investigation phase of the next generation or release. Thus the process begins again.

The relationship/interaction of requirements management process to the Software Development Lifecycle (SDLC) phases

Planning

  • Planning, the first stage of the systems development process, identifies whether there is a need for a new system to achieve a business's strategic objectives. Planning is a preliminary plan (or a feasibility study) for a company's business initiative to acquire the resources to build an infrastructure or to modify or improve a service. The purpose of the planning step is to define the scope of the problem and determine possible solutions, resources, costs, time, and benefits, which may constrain the project and need additional consideration.

Systems Analysis and Requirements

  • Systems Analysis and Requirements is the second phase, where businesses will work on the source of their problem or the need for a change. In the event of a problem, possible solutions are submitted and analyzed to identify the best fit for the ultimate goal(s) of the project. This is where teams consider the functional requirements of the project or solution. It is also where system analysis takes place, that is, analyzing the needs of the end users to ensure the new system can meet their expectations. The systems analysis is vital in determining what a business's needs are, as well as how they can be met, who will be responsible for individual pieces of the project, and what sort of timeline should be expected. There are several tools businesses can use that are specific to the second phase. They include:
  • CASE (Computer Aided Systems/Software Engineering)
  • Requirements gathering
  • Structured analysis

Systems Design

  • Systems design describes, in detail, the necessary specifications, features and operations that will satisfy the functional requirements of the proposed system which will be in place. This is the step for end users to discuss and determine their specific business information needs for the proposed system. It is during this phase that they will consider the essential components (hardware and/or software), structure (networking capabilities), processing and procedures for the system to accomplish its objectives.

Development

  • Development is when the real work begins, in particular, when a programmer, network engineer and/or database developer are brought on to do the significant work on the project. This work includes using a flow chart to ensure that the process of the system is organized correctly. The development phase marks the end of the initial section of the process. Additionally, this phase signifies the start of production. The development stage is also characterized by installation and change. Focusing on training can be a considerable benefit during this phase.

Integration and Testing

  • The Integration and Testing phase involves systems integration and system testing (of programs and procedures), normally carried out by a Quality Assurance (QA) professional, to determine if the proposed design meets the initial set of business goals. Testing may be repeated, specifically to check for errors, bugs and interoperability. This testing will be performed until the end user finds it acceptable. Another part of this phase is verification and validation, both of which will help ensure the program is successfully completed.

Implementation

  • The Implementation phase is when the majority of the code for the program is written. Additionally, this phase involves the actual installation of the newly developed system. This step puts the project into production by moving the data and components from the old system and placing them in the new system via a direct cutover. While this can be a risky (and complicated) move, the cutover typically happens during off-peak hours, thus minimizing the risk. Both system analysts and end-users should now see the realization of the project that has implemented the changes.

Operations and Maintenance

  • The seventh and final phase involves maintenance and regularly required updates. This step is when end users can fine-tune the system, if they wish, to boost performance, add new capabilities or meet additional user requirements.

Interaction of the Requirements Management Process with Change Management

Every IT landscape must change over time. Old technologies need to be replaced, while existing solutions require upgrades to address more demanding regulations. Finally, IT needs to roll out new solutions to meet business demands. As the Digital Age transforms many industries, the rate of change is ever-increasing and difficult for IT to manage if ill-prepared.

Requirements baseline management

Requirements baseline management can be the single most effective method used to guide system development and test. This section presents a proven approach to requirements baseline management, requirements traceability, and processes for major system development programs. Effective baseline management can be achieved by providing: effective team leadership to guide and monitor development efforts; efficient processes to define what tasks need to be done and how to accomplish them; and adequate tools to implement and support the selected processes. As in any but the smallest organization, useful engineering leadership is essential to provide a framework within which the rest of the program's engineering staff can function to manage the requirements baseline. Once a leadership team is in place, the next task is to establish processes that cover the scope of establishing and maintaining the requirements baseline. These processes will form the basis for consistent execution across the engineering staff. Finally, given an appropriate leadership model with a forward plan, and a collection of processes that correctly identify what steps to take and how to accomplish them, consideration must be given to selecting a toolset appropriate to the program's needs.

Use Cases Vs. Requirements

  • Use cases attempt to bridge the problem of requirements not being tied to user interaction. A use case is written as a series of interactions between the user and the system, similar to a call and response where the focus is on how the user will use the system. In many ways, use cases are better than a traditional requirement because they emphasize user-oriented context. The value of the use case to the user can be divined, and tests based on the system response can be figured out based on the interactions. Use cases usually have two main components: use case diagrams, which graphically describe actors and their use cases, and the text of the use case itself.
  • Use cases are sometimes used in heavyweight, control-oriented processes much like traditional requirements. The system is specified to a high level of completion via the use cases and then locked down with change control on the assumption that the use cases capture everything.
  • Both use cases and traditional requirements can be used in agile software development, but they may encourage leaning heavily on documented specification of the system rather than collaboration. I have seen some clever people who could put use cases to work in agile situations. Since there is no built-in focus on collaboration, it can be tempting to delve into a detailed specification, where the use case becomes the source of record.

Definitions of types of requirements

Requirements types are logical groupings of requirements by common functions, features and attributes. There are four requirement types:

Business Requirement Type

  • The business requirement is written from the Sponsor's point-of-view. It defines the objective of the project (goal) and the measurable business benefits for doing the project. The following sentence format is used to represent the business requirement and helps to increase consistency across project definitions:
    • "The purpose of the [project name] is to [project goal, that is, what the team is expected to implement or deliver] so that [measurable business benefit(s), the sponsor's goal]."

Regression Test requirements

  • Regression testing is a type of software testing that is carried out by software testers as functional regression tests and by developers as unit regression tests. The objective of regression tests is to find defects that got introduced by defect fix(es) or the introduction of new feature(s). Regression tests are ideal candidates for automation.

Reusable requirements

  • Requirements reusability is defined as the capability to use in a project requirements that have already been used before in other projects. This allows optimizing resources during development and reduces errors. Most requirements in today's projects have already been written before. In some cases, reusable requirements refer to standards, norms and laws that all the projects in a company need to comply with, and in some others, projects belong to a family of products that share a common set of features, or variants of them.

System requirements:

  • There are two types of system requirements:

Functional Requirement Type

  • The functional requirements define what the system must do to process the user inputs (information or material) and provide the user with their desired outputs (information or material). Processing the inputs includes storing the inputs for use in calculations or for retrieval by the user at a later time, editing the inputs to ensure accuracy, proper handling of erroneous inputs, and using the inputs to perform calculations necessary for providing expected outputs. The following sentence format is used to represent the functional requirement: "The [specific system domain] shall [describe what the system does to process the user inputs and provide the expected user outputs]." Or "The [specific system domain/business process] shall (do) when (event/condition)."

Nonfunctional Requirement Type

  • The nonfunctional requirements define the attributes of the user and the system environment. Nonfunctional requirements identify standards, for example, business rules, that the system must conform to and attributes that refine the system's functionality regarding use. Because of the standards and attributes that must be applied, nonfunctional requirements often appear to be limitations for designing an optimal solution. Nonfunctional requirements are also at the System level in the requirements hierarchy and follow a similar sentence format for representation as the functional requirements: "The [specific system domain] shall [describe the standards or attributes that the system must conform to]."

Data Modeling – Column Data Classification

Column Data Classification

When analyzing individual column data, at its most foundational level, column data can be classified by its fundamental use/characteristics.  Granted, when you start rolling up the structure into multiple columns, table structures and table relationships, then other classifications/behaviors, such as keys (primary and foreign), indexes, and distribution, come into play.  However, many times when working with existing data sets it is essential to understand the nature of the existing data to begin the modeling and information governance process.

Generally, individual columns can be grouped into the following classifications (a small illustrative sketch follows the list):

  • Identifier — A column/field which is unique to a row and/or can identify related data (e.g., Person ID, National Identifier). Basically, think primary key and/or foreign key.
  • Indicator — A column/field, often called a Flag, that has a binary condition (e.g., True or False, Yes or No, Female or Male, Active or Inactive). Frequently used to identify compliance with a specific business rule.
  • Code — A column/field that has a distinct and defined set of values, often abbreviated (e.g., State Code, Currency Code)
  • Temporal — A column/field that contains some type of date, timestamp, time, interval, or numeric duration data
  • Quantity — A column/field that contains a numeric value (decimals, integers, etc.) and is not classified as an Identifier or Code (e.g., Price, Amount, Asset Value, Count)
  • Text — A column/field that contains alphanumeric values, possibly long text, and is not classified as an Identifier or Code (e.g., Name, Address, Long Description, Short Description)
  • Large Object (LOB) — A column/field that contains large data, such as traditional long text fields or binary data like graphics. The large objects can be broadly classified as Character Large Objects (CLOBs), Binary Large Objects (BLOBs), and Double-Byte Character Large Objects (DBCLOB or NCLOB).
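
As a rough illustration of how these classifications might be applied when profiling an existing data set, here is a small, hypothetical Python sketch that guesses a column's classification from its name and database type; the heuristics are assumptions for illustration only, not a governance rule.

def classify_column(name, db_type):
    """Guess a column's data classification from its name and database type."""
    n, t = name.lower(), db_type.lower()
    if "blob" in t or "clob" in t:
        return "Large Object (LOB)"
    if n.endswith("_id") or n.endswith("_key") or n.endswith("_srky"):
        return "Identifier"
    if n.endswith("_flag") or n.endswith("_ind"):
        return "Indicator"
    if n.endswith("_code") or n.endswith("_cd"):
        return "Code"
    if any(t.startswith(x) for x in ("date", "time", "timestamp", "interval")):
        return "Temporal"
    if any(t.startswith(x) for x in ("int", "bigint", "decimal", "numeric", "float", "double")):
        return "Quantity"
    return "Text"


print(classify_column("customer_id", "integer"))         # Identifier
print(classify_column("active_flag", "char(1)"))         # Indicator
print(classify_column("order_amount", "decimal(12,2)"))  # Quantity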

What is a Common Data Model (CDM)?

A Common Data Model (CDM) is a shared data structure designed to provide well-formed and standardized data structures within an industry (e.g. medical, insurance, etc.) or business channel (e.g. human resource management, asset management, etc.), which can be applied to provide organizations a consistent, unified view of business information.   These common models can be leveraged by organizations as accelerators to form the foundation for their information, including SOA interchanges, mashups, data virtualization, an Enterprise Data Model (EDM), business intelligence (BI), and/or to standardize their data models to improve metadata management and data integration practices.

Related references

IBM, IBM Analytics

IBM Analytics, Technology, Database Management, Data Warehousing, Industry Models

github.com

Observational Health Data Sciences and Informatics (OHDSI)/Common Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Utilities Data Model

Oracle

Industries, Communications, Service Providers, Products, Data Model, Oracle Communications Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Airline data Model

Data Modeling – Fact Table Effective Practices

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<<TableName>>
      • F_<<TableName>>
  • Must contain a temporal dimension surrogate key (e.g. date dimension)
  • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
  • Dimension surrogate keys (srky) should have a foreign key (FK) constraint
  • Do not place the dimension processing in the fact jobs

Data Modeling – Dimension Table Effective Practices

I’ve had these notes lying around for a while, so I thought I would consolidate them here.   So, here are a few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<<TableName>>
      • D_<<TableName>>
  • Have a Primary Key (PK) assigned on the table surrogate key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – when the record was initially created
    • Have a Last Update timestamp – when the record was last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every dimension should have a zero (0), 'Unknown', row
  • Fields should be 'NOT NULL', replacing nulls with a zero (0) for numeric and integer type fields or a space (' ') for character type fields.
  • Keep dimension processing outside of the fact jobs

Netezza / PureData – How to add comments on a field

The 'Comment On Column' statement provides the same self-documentation capability as 'Comment On Table', but drives the capability to the column field level.  This provides an opportunity to describe the purpose, business meaning, and/or source of a field to other developers and users.  The comment code is part of the DDL and can be migrated with the table structure DDL.  The statement can be run independently or, when working with Aginity for PureData System for Analytics, the statements can be run as a group with the table DDL, using the 'Execute as a Single Batch' (Ctrl+F5) command.

Basic ‘COMMENT ON field’ Syntax

  • The basic syntax to add a comment to a column is:

COMMENT ON COLUMN <<Schema.TableName.ColumnName>> IS '<<Descriptive Comment>>';

Example ‘COMMENT ON Field’ Syntax

  • This is example syntax, which would need to be changed and applied to each column field:

COMMENT ON COLUMN time_dim.time_srky IS 'time_srky is the primary key and is a surrogate key derived from the date business/natural key';

Netezza / PureData – Table Documentation Practices

Applying a few table effective practices can:

  • Significantly aide others in understanding the purpose and use of tables and column fields
  • Provide more migratable meta data to inform and enable the capabilities of other tools
  • And, from a self-preservation point of view, reduce the number of meetings, emails, and phone calls required to explain and clarify the business meaning and roles of the table and its fields

Among the effective practices when modeling and designing tables are:

  • Add Primary Key
  • Add table comments
  • Add column field comments
  • Add ‘Organize by’ fields to provide essential optimization on frequently used and/or key performance fields

Netezza / PureData – How to add a primary key

While primary keys (PK) are not enforced within Netezza, they still provide significant value and should be added.  Among the values that adding a primary key provides are:

  • Informs tools that have metadata import capabilities; for example, Aginity, ETL tools, data modeling tools, InfoSphere Data Architect, DataStage and DataQuality, and the InfoSphere Information Server suite of tools (e.g. Governance Console, Information Analyzer, etc.).
  • Visually helps developers, data modelers, and users to know what the primary keys of the table are, which may not be obvious from the table structure. This is especially true for tables utilizing compound keys as the primary key.
  • The query optimizer will use these definitions to define efficient query execution plans
  • Identifying primary keys provides migratable, self-documenting metadata
  • Aids in the facilitation of future data and application enrichment projects

Basic Primary Key syntax

ALTER TABLE <<schema.tablename>> ADD CONSTRAINT <<ConstraintName>> PRIMARY KEY (FieldNames);

Example Primary Key syntax

ALTER TABLE time_dim ADD CONSTRAINT time_dim_pk PRIMARY KEY (time_srky);

InfoSphere Datastage – How to Improve Sequential File Performance Using Parallel Environment Variables

While extensive use of sequential files is not best practice, sometimes there is no way around it, due to legacy systems and/or existing processes. However, recently, I have encountered a number of customers who are seeing significant performance issues with sequential-file-intensive processes. Sometimes it's the job design, but often when you look at the project configuration they still have the default values. This is a quick and easy thing to check and adjust to get a quick performance win if the values have not already been adjusted. These are delivered variables, but they should seriously be considered for adjustment in nearly all DataStage ETL projects. The adjustment must be based on the amount of available memory, the volume of workload that is sequential-file intensive, and the environment you're working in. Some experiential adjustment may be required, but I have provided a few recommendations below.

Environment Variable Properties

Category Name | Type | Parameter Name | Prompt | Size | Default Value
Parallel > Operator Specific | String | APT_FILE_EXPORT_BUFFER_SIZE | Sequential write buffer size | Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096. | 128
Parallel > Operator Specific | String | APT_FILE_IMPORT_BUFFER_SIZE | Sequential read buffer size | Adjustable in 8 KB units. Recommended values for Dev: 2048; Test & Prod: 4096. | 128

InfoSphere DataStage – How to calculate age in a transformer

Occasionally, there is a need to calculate the difference between two dates for any number of reasons. For example, the age of a person, of an asset, or of an event.  So, having recently had to think about how to do this in a DataStage Transformer, rather than in SQL, I thought it might be good to document a couple of approaches, which can provide the age.  This code does it at the year level; however, if you need the decimal digits or other handling, then the rounding within the DecimalToDecimal function can be changed accordingly.

Age Calculation using Julian Date

DecimalToDecimal((JulianDayFromDate(<<Input Date (e.g. Date of Birth)>>) - JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT)) / 365.25, 'trunc_zero')

Age Calculation using Julian Date with Null Handling

If a date can be missing from your source input data, then null handling is recommended to prevent job failure.  This code uses 1901-01-01 as the null replacement value, but it can be any date your business requirement stipulates.

DecimalToDecimal((JulianDayFromDate(NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) - JulianDayFromDate(Lnk_In_Tfm.PROCESSING_DT)) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate

DecimalToDecimal(DaysSinceFromDate(<<Processing Date (e.g. Lnk_In_Tfm.PROCESSING_DT)>>, <<Input Date (e.g. Date of Birth)>>) / 365.25, 'trunc_zero')

Calculate Age Using DaysSinceFromDate with Null Handling

Here is a second example of null handling being applied to the input data.

DecimalToDecimal(DaysSinceFromDate(<<Processing Date (e.g. Lnk_In_Tfm.PROCESSING_DT)>>, NullToValue(<<Input Date (e.g. Date of Birth)>>, StringToDate('1901-01-01', "%yyyy-%mm-%dd"))) / 365.25, 'trunc_zero')
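
As a rough worked example, with a date of birth of 1980-06-15 and a processing date of 2020-06-14, the difference is roughly 14,609 days, and 14,609 / 365.25 truncates to an age of 39.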

What are the Factors Affecting the Selection of a Data Warehouse Naming Convention?

The primary factors affecting the choices in the creation of Data Warehouse (DW) naming convention policy standards are the type of implementation, pattern of the implementation, and any preexisting conventions.

Type of implementation

The type of implementation will affect your naming convention choices. Basically, this boils down to one question: are you working with a Commercial-Off-The-Shelf (COTS) data warehouse or doing a custom build?

Commercial-Off-The-Shelf (COTS)

If it is a Commercial-Off-The-Shelf (COTS) warehouse, which you are modifying and/or enhancing, then it is very strongly recommended that you conform to the naming conventions of the COTS product.  However, you may want to add an identifier to the conventions to identify your custom objects.

Using this information as an exemplar:

  • FAV = Favinger, Inc. (Company Name – Custom Identifier)
  • GlobalSales = Global Sales (Subject)
  • MV = Materialized View (Object Type)

Suffix Pattern Naming Convention

<<Custom Identifier>>_<<Object Subject Name>>_<<Object Type>>

Example:  FAV_GlobalSales_MV

Prefix Pattern Naming Convention

<<Object Type>>_<<Custom Identifier>>_<<Object Subject Name>>

Example: MV_FAV_GlobalSales
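
As an illustration (the base table and columns below are hypothetical, and materialized view syntax and capabilities vary by platform), the prefix-pattern name might appear in DDL such as:

-- Hypothetical example: materialized view named using the prefix pattern
CREATE MATERIALIZED VIEW MV_FAV_GlobalSales AS
SELECT sale_id, region_cd, fiscal_year, sales_amt
FROM FAV_GlobalSales_Detail
ORDER BY region_cd;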

Custom Data Warehouse Build

If you are creating a custom data warehouse from scratch, then you have more flexibility in choosing your naming convention.  However, you will still need to take a few factors into account to achieve the maximum benefit from your naming conventions.

  • What is the high-level pattern of your design?
  • Are there any preexisting naming conventions?

Data Warehouse Patterns

Your naming convention will need to take into account the overall intent and design pattern of the data warehouse. The objects and naming conventions of each pattern will vary, if for no other reason than the differences in the objects, their purpose, and the depth of their relationships.

High level Pattern of the Data Warehouse Implementation

The high-level pattern of your design, whether an Operational Data Store (ODS), Enterprise Data Warehouse (EDW), Data Mart (DM), or something else, will need to guide your naming convention, as the depth of the logical and/or processing zones of each pattern will vary, and each has some generally accepted industry conventions.

Structural Pattern of the Data Warehouse Implementation

The structural pattern of your data warehouse design, whether Snowflake, 3rd Normal Form, or Star Schema, will also need to guide your naming convention, as the depth of relationships in each pattern will vary, each has some generally accepted industry conventions, and each relates directly to your high-level data warehouse pattern.

Preexisting Conventions

An often-omitted factor in data warehouse naming conventions is the set of preexisting conventions, which can have significant impacts from both an engineering and a political point of view. The sources of these conventions can vary and may or may not be formally documented.

A common source of naming convention conflict is preexisting implementations, which may not even be documented.  However, the conventions that existing system objects follow, and that consumers are already familiar with, will need to be taken into account when assessing impacts to systems, political culture, user training, and the creation of a standard convention for your data warehouse.

The Relational Database Management System (RDBMS) in which you intend to build the data warehouse may also have generally accepted conventions, with which consumers may be familiar and about which they may have preconceived expectations (whether expressed or implied).

Change Management

Whatever data warehouse naming convention you choose, the conventions, along with the data warehouse design pattern assumptions, should be well documented and placed in a managed, readily accessible change management (CM) repository.

Related Reference

Netezza – JDBC ISJDBC.CONFIG Configuration

This JDBC information is based on the Netezza (7.2.0) JDBC driver for InfoSphere Information Server 11.5. So, here are a few pointers for building an IBM InfoSphere Information Server (IIS) isjdbc.config file.

Where to place JAR files

For InfoSphere Information Server installs, as a standard practice, create a custom jdbc folder in the install path, and place any downloaded JAR files not already installed by other applications in that jdbc folder. Usually, the jdbc folder path looks something like this:

  • /opt/IBM/InformationServer/jdbc

CLASSPATH

  • nzjdbc3.jar
  • Classpath must have complete path and jar name

CLASS_NAMES

  • org.netezza.Driver

JAR Source URL

IBM Netezza Client Components V7.2 for Linux

File name

  • nz-linuxclient-v7.2.0.0.tar.gz

Unpack tar.gz

  • tar -zxvf nz-linuxclient-v7.2.0.0.tar.gz -C /opt/IBM/InformationServer/jdbc

Netezza DEFAULT PORT

  • 5480

JDBC URL FORMAT

  • jdbc:netezza://<host>:<port>/<database>

JDBC URL EXAMPLE

  • jdbc:netezza://10.999.0.99:5480/dashboard

isjdbc.config EXAMPLE

CLASSPATH=/usr/jdbc/nzjdbc3.jar;/usr/jdbc/nzjdbc.jar;/usr/local/nz/lib/nzjdbc3.jar;

CLASS_NAMES= org.netezza.Driver;

Isjdbc.config FILE PLACEMENT

  • /opt/IBM/InformationServer/Server/DSEngine

Related References

What is System Availability?

The term system availability, in a nutshell, describes a system that is operating correctly, is reachable, and is available for use by consuming customers and systems.  Generally speaking, system availability is a measure used to ensure that a system and/or application is meeting its Service Level Agreement (SLA) obligations.  Any loss of service, whether planned or unplanned, is known as an outage. Downtime is the duration of an outage measured in units of time (e.g., minutes or hours).
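
A common way to quantify availability is as a percentage of agreed service time: Availability (%) = ((Agreed Service Time - Downtime) / Agreed Service Time) x 100. For example, a system scheduled to be available 8,760 hours in a year that experiences 8 hours of downtime has an availability of roughly 99.9 percent.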

Common Information Integration Testing Phases

Over the years I have seen a lot of patterns for the information integration testing process, and the patterns below are not an exhaustive list of what a consultant will encounter over the course of a career.

However, the two most common patterns in the testing process are:

The Three Test Phase Pattern

In the three test phase pattern, normally, the environment and testing activities of SIT and SWIT are combined.

The Four Test Phase Pattern

In the four test phase pattern, normally, the environment and testing activities of SIT and SWIT are performed separately and, frequently, will have separate environments in the migration path.

Testing Phases

Unit Testing:

Testing of individual software components or modules. It is typically done by the programmer and not by testers, as it requires detailed knowledge of the internal program design and code, and it may require developing test driver modules or test harnesses.

System Integration Testing (SIT):

Integration testing – Testing of integrated modules to verify combined functionality after integration. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems. Testing performed to expose defects in the interfaces and in the interactions between integrated components or systems. See also component integration testing, system integration testing.

Software Integration Test (SWIT):

Similar to system testing, SWIT involves testing of a complete application environment, including scheduling, in a situation that mimics real-world use, such as interacting with a database, using network communications, or interacting with other hardware, applications, or systems as appropriate.

User Acceptance Testing (UAT):

Normally, this type of testing is done to verify that the system meets the customer-specified requirements. Users or customers do this testing to determine whether to accept the application.  It is formal testing with respect to user needs, requirements, and business processes, conducted to determine whether or not a system satisfies the acceptance criteria and to enable the users, customers, or other authorized entity to determine whether or not to accept the system.

Related References

Data Warehouse – Effective Practices

Effective Practices

Effective practices are enablers, which can improve performance, data availability, environment stability, resource consumption, and data accuracy.

Use of an Enterprise Scheduler

The scheduling service in InfoSphere Information Server (IIS) leverages the operating system (OS) scheduler; a common enterprise scheduler can provide these capabilities beyond those of the OS scheduler:

  • Centralized control, monitoring, and maintenance of job stream processes
  • Improved insight into and control of cycle processes
  • Improved intervention capabilities, including alerts, job stream suspension, auto-restarts, and upstream/downstream dependency monitoring
  • Reduced time-to-recovery and increased flexibility in recovery options
  • Improved ability to monitor and alert for a mission-critical process that may be delayed or failing
  • Improved ability to automate disparate process requirements within and across systems
  • Improved load balancing to optimize the use of resources or to compensate for the loss of a given resource
  • Improved scalability and adaptability to infrastructure or application environment changes

Use of Data Source Timestamps

When they exist or can be added to the data, ‘created’ and ‘last updated’ timestamps can greatly reduce the impact of Change Data Capture (CDC) operations, especially if the data warehouse, data model, and load process store the last successful run time of the CDC jobs. This reduces the number of rows required to be processed and reduces the load on the RDBMS and/or ETL application server.  Leveraging ‘created’ and ‘last updated’ timestamps can also greatly reduce the processing time required to perform the same CDC processes.
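
As a minimal sketch (the schema, table, and column names are hypothetical), a CDC-style incremental extract driven by a ‘last updated’ timestamp and the stored last successful run time might look something like this:

-- Hypothetical incremental extract: select only rows changed since the last successful run
SELECT src.*
FROM staging.customer_src src
WHERE src.last_updated_ts > (SELECT last_successful_run_ts
                             FROM etl_control.job_run_control
                             WHERE job_name = 'LOAD_CUSTOMER_DIM');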

Event-Based Scheduling

Event-based scheduling, when coupled with an enterprise scheduler, can increase data availability and distribute work opportunistically. Event-based scheduling can allow all or part of a process stream to begin as soon as predecessor data sources have completed the requisite processes.  This can allow processes to begin as soon as possible, which can reduce resource bottlenecks and contention and, potentially, allows data to be made available earlier than with a static time-based schedule.  Event-based scheduling can also delay processing should the source system’s requisite processing be delayed, thereby improving data accuracy in the receiving system.

Integrated RDBMS Maintenance

Integrating RDBMS maintenance into the process job stream can perform on-demand optimization as the processes move through their flow, improving performance.  Maintenance items such as indexing, distribution, and grooming, applied at key points, ensure that the data structures are optimized for follow-on processes to consume.
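
For example, on Netezza/PureData a job stream might issue maintenance commands such as these immediately after a large load (the table name is hypothetical):

GROOM TABLE sales_fact RECORDS ALL;
GENERATE STATISTICS ON sales_fact;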

Application Server and Storage Space Monitoring and Maintenance

Monitoring and actively clearing disk space not only improves overall performance and reduces costs, but it also improves application stability.

Data Retention Strategies

Data retention strategies are an often-overlooked form of data maintenance. They deal with establishing policies to ensure that only truly necessary data is kept and that information which is no longer necessary, by essential category, is purged to limit legal liability, limit data growth and storage costs, and improve RDBMS performance.

Use Standard Practices

Use of standard practices, both application and industry, allows experienced resources to more readily understand the major application activities, their relationships, dependencies, design, and code.  This facilitates resourcing and support over the life cycle of the application.

Data Modeling – Database Table Field Ordering Effective Practices

Field order can help the performance of inserts and updates and also keeps developers and users from having to search the entire table structure to be sure they have all the keys, etc. A sketch of a table using this ordering follows the list below.

Table Field Ordering

  1. Distribution field or fields; if no distribution field is set, the first field will be used by default.
  2. Primary Key Columns (including Parent and Child key fields)
  3. Foreign Key Columns (Not Null)
  4. Not Null Columns
  5. Nullable Columns
  6. Created Date Timestamp
  7. Modified (or Last Updated) Date Timestamp
  8. Large text Fields
  9. Large binary Columns or Binary Field references
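
As a minimal sketch (hypothetical table and column names, Netezza-style DDL assumed), the ordering above might produce a table such as:

CREATE TABLE customer_dim
(
   customer_id        BIGINT NOT NULL,         -- 1 and 2: distribution field and primary key
   customer_type_id   INTEGER NOT NULL,        -- 3: foreign key column (not null)
   customer_nm        VARCHAR(100) NOT NULL,   -- 4: not null column
   customer_status_cd CHAR(1),                 -- 5: nullable column
   created_dt         TIMESTAMP,               -- 6: created date timestamp
   modified_dt        TIMESTAMP,               -- 7: modified/last updated date timestamp
   customer_notes_txt VARCHAR(5000)            -- 8: large text field
)
DISTRIBUTE ON (customer_id);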

Related References

SQL Server JDBC ISJDBC.CONFIG Configuration

This JDBC information is based on the Microsoft SQL Server JDBC driver for InfoSphere Information Server 11.5 on Red Hat Linux 6. So, here are a few pointers for building an IBM InfoSphere Information Server (IIS) isjdbc.config file.

Where to place JAR files

For InfoSphere Information Server installs, as a standard practice, create a custom jdbc folder in the install path, and place any downloaded JAR files not already installed by other applications in that jdbc folder. Usually, the jdbc folder path looks something like this:

  • /opt/IBM/InformationServer/jdbc

CLASSPATH

  •  sqljdbc.jar
  •  sqljdbc4.jar
  •  sqljdbc41.jar
  •  sqljdbc42.jar
  • Classpath must have complete path and jar name

CLASS_NAMES

  • com.microsoft.jdbc.sqlserver.SQLServerDriver;com.microsoft.sqlserver.jdbc

JAR Source URL

DEFAULT PORT

  • 1433

JDBC URL FORMAT

  • jdbc:sqlserver://<host>:<port>;databaseName=<database>

JDBC URL EXAMPLE

  • jdbc:sqlserver://RNO-SQLDEV-SVR1DEV01:55198;databaseName=APP1;

isjdbc.config EXAMPLE

CLASSPATH=/opt/IBM/InformationServer/jdbc/sqljdbc_3.0/enu/sqljdbc4.jar;/opt/IBM/InformationServer/jdbc/sqljdbc_3.0/enu/sqljdbc.jar;/opt/IBM/InformationServer/jdbc/sqljdbc_3.0/enu/sqljdbc41.jar;/opt/IBM/InformationServer/jdbc/sqljdbc_3.0/enu/sqljdbc42.jar;

CLASS_NAMES=com.microsoft.jdbc.sqlserver.SQLServerDriver;com.microsoft.sqlserver.jdbc

Isjdbc.config FILE PLACEMENT

  • /opt/IBM/InformationServer/Server/DSEngine

Related References

Vendor Reference Link:

Oracle JDBC ISJDBC.CONFIG Configuration

This JDBC information is based on Oracle Database 11g Release 2 (11.2.0.4) on RAC (Oracle Real Application Clusters), using JDBC for InfoSphere Information Server 11.5 on Red Hat Linux. So, here are a few pointers for building an IBM InfoSphere Information Server (IIS) isjdbc.config file.

Where to place JAR files

For InfoSphere Information Server installs, as a standard practice, create a custom jdbc folder in the install path and copy the JAR file into the folder (no install activity required).  Usually, the jdbc folder path looks something like this:

  • /opt/IBM/InformationServer/jdbc

JAR Source URL

  • In this example, we used the jar files from the client install, but if you want to skip the client install you can download the drivers here: Oracle JDBC Drivers

Oracle DEFAULT PORT

  • 1521

JDBC URL FORMAT

  • jdbc:oracle:thin:@//<host>:<port>/<service_name>

or

  • jdbc:oracle:thin:@<host>:<port>:<SID>

JDBC URL EXAMPLE

  • jdbc:oracle:thin:@//RAC01-scan:1521/DW

Create And Place A jdbc configuration file

The Isjdbc.config file needs to be placed in the DSEngine directory:

Isjdbc.config File Path

  • /opt/IBM/InformationServer/Server/DSEngine

isjdbc.config Example

CLASSPATH=/opt/app/oracle/product/11.2.0/client_1/jdbc/lib/ojdbc6.jar;

CLASS_NAMES=oracle.jdbc.OracleDriver

isjdbc.config Properties Notes

CLASSPATH

  • ojdbc6.jar
  • Classpath must have complete path and jar name

CLASS_NAMES

  • oracle.jdbc.OracleDriver

Related References

Infosphere Information Server (IIS) Commonly Used Parameters

Parameters are a very big key to process flexibility in InfoSphere Information Server (IIS) sequences, DataStage jobs, and DataQuality jobs.  Parameterization also helps reduce development effort, reduce the number of jobs required, and promote reuse of jobs by allowing construction of multi-instance jobs, which are essentially reused code.

However, sometimes when starting a project, parameters need to be created, and doing so from memory doesn’t always achieve the best results. So here is a quick starter list of parameters which seem to be commonly encountered.  Hopefully, this list will aid in your parameterization efforts and setup; a brief note on referencing these parameters in jobs follows the table.

Type | Prompt | Description | Example
String | DS_DIR | Dataset Directory Path
String | DS_LOG_DIR | Log File Directory Path
String | QUOTES | Quotes
String | RECIPIENT_EMAIL | Recipient Email Address
String | SENDER_EMAIL | Senders Email Address
String | SMTP_SERVER | SMTP Mail Server Name
String | SQL_DIR | SQL File Directory Path
String | DATE_OFFSET | Date Offset Number | 1
String | DS_ENVIRONMENT | Datastage Environment | PROD
String | SRC_DIR | Source Files Directory
String | SRC_KEY_GEN_DIR | Source Key Generator Files Directory Path
String | SRC_REJ_DIR | Source Reject Files Directory Path
String | WRK_DIR | Working Directory Path
String | SRC_TABLE | Source Table Name
String | DB_SCHEMA | Database Schema Name
String | TGT_TBL | Target Table Name
String | PROC_DTE | Run Control or Processing Date
String | CURR_DTE | Current Date
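
Once defined, these parameters are typically referenced in job and stage properties using the #PARAMETER_NAME# syntax; for example, a file path might be entered as #SRC_DIR#/customer_extract.dat (the file name here is purely illustrative).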

Related References

Data Modeling – What is Data Modeling?

Data modeling is the documenting of data relationships, characteristics, and standards based on the intended use of the data.  Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data, creating a blueprint and foundation for information technology development and reengineering.

A data model can be thought of as a diagram that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.

Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships, and data flows have been identified.

There are several different approaches to data modeling, including:

Concept Data Model (CDM)

  • The Concept Data Model (CDM) identifies the high-level information entities and their relationships, which are organized in an Entity Relationship Diagram (ERD).

Logical Data Model (LDM)

  • The Logical Data Model (LDM) defines detailed business information (in business terms) within each area of the Concept Data Model and is a refinement of its information entities.  Logical data models are a non-RDBMS-specific business definition of the tables, fields, and attributes contained within each information entity, from which the Physical Data Model (PDM) and Entity Relationship Diagram (ERD) are produced.

Physical Data Model (PDM)

  • The Physical Data Model (PDM) provides the actual technical details of the model and database objects (e.g., table names, field names, etc.) to facilitate the creation of accurate, detailed technical designs and the actual database.  Physical Data Models are an RDBMS-specific definition of the logical model, used to build the database, create deployable DDL statements, and produce the Entity Relationship Diagram (ERD). A brief sketch follows.
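
As a minimal sketch (the entity, table, and column names are hypothetical), a logical ‘Customer’ entity with ‘Customer Name’ and ‘Customer Type’ attributes might be realized in the physical data model as DDL along these lines:

CREATE TABLE customer
(
   customer_id  BIGINT NOT NULL,        -- surrogate key for the Customer entity
   customer_nm  VARCHAR(100) NOT NULL,  -- 'Customer Name' business attribute
   customer_typ CHAR(1),                -- 'Customer Type' business attribute
   created_dt   TIMESTAMP
);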

Related References

Information Management Unit Testing

Information management projects generally have the following development work:

  • Data movement software;
  • Data conversion software;
  • Data cleansing routines;
  • Database development DDL; and
  • Business intelligence and reporting analytical solutions.

Module testing validates that each module’s logic satisfies requirements specified in the requirements specification.

Effective Practices

  1. Should focus on testing individual modules to ensure that they perform to specification, handle all exceptions as expected, and produce the appropriate alerts to satisfy error handling.
  2. Should be performed in the development environment.
  3. Should be conducted by the software developer who develops the code.
  4. Should validate the module’s logic, adherence to functional requirements and adherence to technical standards.
  5. Should ensure that all module source code has been executed and each conditional logic branch followed.
  6. Test data and test results should be recorded and form part of the release package when the code moves to production.
  7. Should include a code review, which should:
  • Focus on reviewing code and test results to provide additional verification that the code conforms to data movement best practices and security requirement; and
  • Verify that test results confirm that all conditional logic paths were followed and that all error messages were tested properly.

Testing Procedures

  1. Review design specification with the designer.
  2. Prepare test plan before coding.
  3. Create test data and document expected test results.
  4. Ensure that test data validate the module’s logic, adherence to functional requirements and adherence to technical standards.
  5. Ensure that test data test all module source code and each conditional logic branch.
  6. Conduct unit test in a personal schema.
  7. Document test results.
  8. Place test data and test results in project documentation repository.
  9. Check code into the code repository.
  10. Participate in code readiness review with Lead Developer.
  11. Schedule code review with appropriate team members.
  12. Assign code review roles as follows:
  • Author, the developer who created the code;
  • Reader, a developer who will read the code during the code review—The reader may also be the author; and
  • Scribe, a developer who will take notes.

Code Review Procedures

  1. Validate that code readiness review has been completed.
  2. Read the code.
  3. Verify that code and test results conform to data movement best practices.
  4. Verify that all conditional logic paths were followed and that all error messages were tested properly.
  5. Verify that coding security vulnerability issues have been addressed.
  6. Verify that test data and test results have been placed in project documentation repository.
  7. Verify that code has been checked into the code repository.
  8. Document action items.

Testing strategies

  1. Unit test data should be created by the developer and should be low volume.
  2. All testing should occur in a developer’s personal schema.

Summary

Unit testing is generally conducted by the developer who develops the code and validates that each module’s logic satisfies requirements specified in the requirements specification.

Infosphere DataStage – Node Best Practices

In general, the performance and overall efficiency of your DataStage ETL jobs can be impacted by a number of items, one of the more common of which is the configuration of nodes within InfoSphere.  Nodes, when properly configured, allow InfoSphere to perform Massively Parallel Processing (MPP). The ultimate goal of your node configuration in InfoSphere is to provide the maximum ability to perform concurrent work; this includes concurrent read/write capability to both logical and physical drives.

So, here are a few pointers, which may help, if you haven’t already worked through them.

To be most efficient, nodes should be configured as follows:

  • Map no more than one node to each core/CPU
  • Use multiple configuration [node] files aimed at different numbers of cores/CPUs for small, medium, and large jobs
  • When aligning nodes to disk drive mappings, keep these tips in mind for best results (an illustration follows the list below):

At a minimum, map each node to one disk.

  • Map one drive (for physical drives, this means a separate read/write point/spindle) for each resource disk and scratch disk mapped.
  • For scratch, map multiple scratch spaces (temporary working space) to each node.
  • Perform scratch space maintenance on scratch disks mapped to physical drives to remove orphan files and free processing space.
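
As an illustration (a purely hypothetical layout), a server with four cores might use a four-node configuration file in which each logical node is assigned its own resource disk and two scratch disks, each mapped to a separate physical spindle, so that reads, writes, and scratch activity do not compete for the same drive.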

When deciding which resources to use for scratch disks, keep in mind this performance hierarchy (ordered best to least performant):

  • RAM Disks
  • Flash Memory
  • Solid-State Drives (SSDs)
  • Hard Disk Drives (HDDs)

Related References:

Infosphere Datastage – Useful Links

Here are some items, which I have found useful when working with IBM InfoSphere Information Server (IIS) and DataStage.

IBM InfoSphere Information Server Version 11.7.1 documentation

Datastage Parallel Framework Standard Practices

DSXchange

IBM InfoSphere DataStage Data Flow and Job Design

IBM DeveloperWorks

IBM InfoSphere Information Server, Version 11.7.1 for Windows

IBM InfoSphere Information Server, Version 11.7.1 for Linux

IBM InfoSphere Information Server, Version 11.7.1 for AIX

InfoSphere Information Server 11.7.0, IBM DataStage Flow Designer

DataStage/DataQuality Thin Client

InfoSphere Information Server V11.7.1 detailed system requirements

Netezza/PureData – Table Distribution Key Practices

For those of us new to Netezza and coming from other database systems that use indexes, the optimization of the distribution key in Netezza can be counterintuitive.  The distribution key is not about grouping like an index, but rather about spreading the data in a way that lets the zone map functionality improve performance.  So, here are a few effective practices to help with decision making when choosing a distribution key for a table (a brief example follows the list):

  1. Distribute on one key only
  2. Distribute on the lowest/finest possible grain of the data to improve your zone map balance
  3. Avoid distributing on foreign keys and/or foreign surrogate keys
  4. Avoid distributing on fields which may need to be updated
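
As a minimal sketch (the table and column names are hypothetical), the distribution key is declared when the table is created:

CREATE TABLE sales_fact
(
   sales_txn_id BIGINT NOT NULL,    -- lowest/finest grain of the data: a good distribution candidate
   store_sk     INTEGER NOT NULL,   -- foreign surrogate key: avoid as the distribution key
   sales_dt     DATE,
   sales_amt    NUMERIC(18,2)
)
DISTRIBUTE ON (sales_txn_id);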