Technology – Understanding Data Model Entities

Advertisements

Data Modeling is an established technique of comprehensively documenting an application or software system with the aid of symbols and diagrams. It is an abstract methodology of organizing the numerous data elements and thoroughly highlighting how these elements relate to each other. Representing the data requirements and elements of a database gra phically is called an Entity Relationship Diagram, or ERD.

What is an Entity?

Entities are one of the three essential components of ERDs and represent the tables of the database. An entity is something that depicts only one information concept. For instance, order and customer, although related, are two different concepts, and hence are modeled as two separate entities.

A data model entity typically falls in one of five classes – locations, things. events, roles, and concepts. Examples of entities can be vendors, customers, and products. These entities also have some attributes associated with them, which are some of the details that we would want to track about these entities.

A particular example of an entity is referred to as an instance. Instances form the various rows or records of the table. For instance, if there is a table titled ‘students,’ then a student named William Tell will be a single record of the table.

Why Do We Need a Data Model Entity?

Data is often stored in various forms. An organization may store data in XML files, spreadsheets, reports, and relational databases. Such a fragmented data storage methodology can present challenges during application design and data access. Writing maintainable and efficient code becomes all the more difficult when one has to think about easy data access, scalability, and storage. Additionally, moving data from one form to the other is difficult. This is where the Entity Data Model comes in. Describing the data in the form of relationships and entities, the structure of the data becomes independent of the storage methodology. As the application and data evolve, so does the Data Model Entity. The abstract view allows for a much more streamlined method of transforming or moving data.

SQL Server Length Function Equivalent

Advertisements

The purpose of the Length function in SQL

The SQL LENGTH function returns the number of characters in a string. The LENGTH function is available in many Database Management Systems (DBMS).

The LENGTH Function Syntax

  • LENGTH(string)

LENGTH Function Notes

  • If the input string is empty, the LENGTH returns 0.
  • If the input string is NULL, the LENGTH returns NULL.

Length Function Across Databases

When working as a technical consultant, one has to work with customer’s databases and as you move from one database to another you will find that the function commands may vary–assuming the database has an equivalent function.

Working with VQL and SQL Server got me thing about the LENGTH() function, so, here is a quick references list, which does include the SQL Server.  

IBM DB2

  • LENGTH( )

IBM Informix

  • CHAR_LENGTH() Or CHARACTER_LENGTH()

MariaDB

  • LENGTH( )

Microsoft SQL Server

  • LEN( )

MySQL

  • CHAR_LENGTH() Or CHARACTER_LENGTH()

Netezza

  • LENGTH( )

Oracle

  • LENGTH( )

PostgreSQL

  • CHAR_LENGTH() Or CHARACTER_LENGTH()

SOQL (SalesForce)

  • SOQL has no LENGTH function

VQL (Denodo)

  • LEN( )

Technology – Denodo Data Virtualization Project Roles

Advertisements

A Denodo virtualization project typically classifies the project duties of the primary implementation team into four Primary roles.

Denodo Data Virtualization Project Roles

  • Data Virtualization Architect
  • Denodo Platform Administrator
  • Data Virtualization Developer
  • Denodo Platform Java Programmer
  • Data Virtualization Internal Support Team

Role To Project Team Member Alignment

While the denodo project is grouped into security permissions and a set of duties, it is import to note that the assignment of the roles can be very dynamic as to their assignment among project team members.  Which team member who performs a given role can change the lifecycle of a denodo project.  One team member may hold more than one role at any given time or acquire or lose roles based on the needs of the project.

Denodo virtualization Project Roles Duties

Data Virtualization Architect

The knowledge, responsibilities, and duties of a denodo data virtualization architect, include:

  • A Deep understanding of denodo security features and data governance
  • Define and document5 best practices for users, roles, and security permissions.
  • Have a strong understanding of enterprise data/information assets
  • Defines data virtualization architecture and deployments
  • Guides the definition and documentation of the virtual data model, including, delivery modes, data sources, data combination, and transformations

Denodo Platform Administrator

The knowledge, responsibilities, and duties of a Denodo Platform Administrator, Include:

  • Denodo Platform Installation and maintenance, such as,
    • Installs denodo platform servers
    • Defines denodo platform update and upgrade policies
    • Creates, edits, and removes environments, clusters, and servs
    • Manages denodo licenses
    • Defines denodo platform backup policies
    • Defines procedures for artifact promotion between environments
  • Denodo platform configuration and management, such as,
    • Configures denodo platform server ports
    • Platform memory configuration and Java Virtual Machine (VM) options
    • Set the maximum number of concurrent requests
    • Set up database configuration
      • Specific cache server
      • Authentication configuration for users connecting to denodo platform (e.g., LDAP)
      • Secures (SSL) communications connections of denodo components
      • Provides connectivity credentials details for clients tools/applications (JDBC, ODBC,,,etc.)
      • Configuration of resources.
    • Setup Version Control System (VCS) configuration for denodo
    • Creates new Virtual Databases
    • Create Users, roles, and assigns privileges/roles.
    • Execute diagnostics and monitoring operations, analyzes logs and identifies potentials issues
    • Manages load balances variables

Data Virtualization Developer

The Data Virtualization Developer role is divided into the following sub-roles:

  • Data Engineer
  • Business Developer
  • Application Developer

the knowledge, responsibilities, and duties of a Denodo Data Virtualization Developer, by sub-role, Include:

Data Engineer

The denodo data engineer’s duties include:

  • Implements the virtual data model construction view by
    • Importing data sources and creating base views, and
    • Creating derived views applying combinations and transformations to the datasets
  • Writes documentation, defines testing to eliminate development errors before code promotion to other environments

Business Developer

The denodo business developer’s duties include:

  • Creates business vies for a specific business area from derived and/or interface views
  • Implements data services delivery
  • Writes documentation

Application Developer

The denodo application developer’s duties include:

  • Creates reporting vies from business views for reports and or datasets frequently consumed by users
  • Writes documentation

Denodo Platform Java Programmer

The Denodo Platform Java Programmer role is an optional, specialized, role, which:

  • Creates custom denodo components, such as data sources, stored procedures, and VDP/iTPilot functions.
  • Implements custom filters in data routines
  • Tests and debugs any custom components using Denodo4e

Data Virtualization Internal Support Team

The denodo data virtualization internal support team’s duties include

  • Access to and knowledge of the use and trouble of developed solutions
  • Tools and procedures to manage and support project users and developers

denodo Virtualization – Useful Links

Advertisements

Here are some denodo Virtualization references, which may be useful.

Reference Name Link
denodo Home Page https://www.denodo.com/en/about-us/our-company
denodo Platform 7.0 Documentation https://community.denodo.com/docs/html/browse/7.0/
denodo Knowledge Base and Best Practices https://community.denodo.com/kb/
denodo Tutorials https://community.denodo.com/tutorials/
denodo Express 7.0 Download https://community.denodo.com/express/download
Denodo Virtual Data Port (VDP) https://community.denodo.com/kb/download/pdf/VDP%20Naming%20Conventions?category=Operation
JDBC / ODBC drivers for Denodo https://community.denodo.com/drivers/
Denodo Governance Bridge – User Manual https://community.denodo.com/docs/html/document/denodoconnects/7.0/Denodo%20Governance%20Bridge%20-%20User%20Manual
Virtual DataPort VQL Guidehttps://community.denodo.com/docs/html/browse/7.0/vdp/vql/introduction/introduction
Denodo Model Bridge – User Manualhttps://community.denodo.com/docs/html/document/denodoconnects/7.0/Denodo%20Model%20Bridge%20-%20User%20Manual
Denodo Connects Manualshttps://community.denodo.com/docs/html/browse/7.0/denodoconnects/index
Denodo Infosphere Governance Bridge – User Manualhttps://community.denodo.com/docs/html/document/denodoconnects/7.0/Denodo%20Governance%20Bridge%20-%20User%20Manual

PostgreSQL – Useful Links

Advertisements

PostgreSQLUseful Links

Here are some PostgreSQL references, which may be useful.

Reference Type Link
PostgreSQL Home page & Download https://www.postgresql.org/
PostgreSQL Online Documentation https://www.postgresql.org/docs/manuals/
Citus – Scalability and Parallelism Extension https://github.com/citusdata
Free PostgreSQL Training https://www.enterprisedb.com/free-postgres-training
PostgreSQL Wiki https://wiki.postgresql.org/wiki/Main_Page

The Business Industries Using PostgreSQL

Advertisements

PostgreSQL is an open-source database, which was released in 1996. So, PostgreSQL has been around a long time.   So, among the many companies and industries which know they are using PostgreSQL, many others are using PostgreSQL and don’t know it because it is embedded as the foundation in some other application’s software architecture.  

I hadn’t paid much attaint to PostgreSQL even though it as been on the list leading databases used by business for years.  Mostly I have been focused on the databases my customer were using (Oracle, DB2, Microsoft SQL Server, and MySQL/MariaDB).  However, during a recent meeting I was surprised to learn that io had been using and administering PostrgresSQL embedded as part of another software vendors application, which made me take the time to pay attention to PostgreSQL. Especially, who is using PostgreSQL and what opportunities that may provide for evolving my career? 

The Industries Using PostgreSQL

According to enlyft, the major using the PostgreSQL are Computer Software and Information Technology And services companies.  

  PostgreSQL Consumers Information

Here is the  link to enlyft page, which provides additional information companies and industries using PostgreSQL:

How To Get A List Of Oracle Database Schemas?

Advertisements

Well, this is one of those circumstances, where your ability to answer this question will depend upon your user’s assigned security roles and what you actually want. 

To get a complete list, you will need to use the DBA_ administrator tables to which most of us will not have access.  In the very simple examples below, you may want to add a WHERE clause to eliminate the system schemas from the list, like ‘SYS’ and ‘SYSTEM,’ if you have access to them.

Example Administrator (DBA) Schema List

SELECT distinct OWNER as SCHEMA_NAME

FROM  DBA_OBJECTS

ORDER BY OWNER;

Example Administrator (DBA) Schema List Results Screenshot

Fortunately for the rest of us, there are All user tables, from which we can get a listing of the schemas to which we have access.

Example All Users Schema List

SELECT distinct OWNER as SCHEMA_NAME

FROM    ALL_OBJECTS

ORDER BY OWNER;

Example All Users Schema List Results Screenshot

Related References

Oracle help Center > Database> Oracle > Oracle Database > Release 19

Oracle SQL*Plus Is still Around

Advertisements

It is funny how you cannot work with some for a while because of newer tools, and then rediscover them, so to speak.  The other day I was looking at my overflow bookshelf in the garage and saw an old book on Oracle SQL*Plus and was thinking, “do I still want or need that book?”. 

In recent years I have been using a variety of other tools when working with oracle. So, I really hadn’t thought about the once ubiquitous Oracle SQL*Plus command-line interface for Oracle databases, which around for thirty-five years or more.  However, I recently needed to do an Oracle 18C database install to enable some training and was pleasantly surprised Oracle SQL*Plus as a menu item. 

Now I have been purposely using Oracle SQL*Plus again to refresh my skills, and I will be keeping my Oracle SQL*Plus: The Definitive Guide, after all. 

How to Determine Your Oracle Database Name

Advertisements

Oracle provides a few ways to determine which database you are working in.  Admittedly, I usually know which database I’m working in, but recently I did an Oracle Database Express Edition (XE) install which did not goes has expected and I had reason to confirm which database I was actually in when the SQL*Plus session opened.  So, this lead me to consider how one would prove exactly which database they were connected to.  As it happens, Oracle has a few ways to quickly display which database you are connected to and here are two easy ways to find out your Oracle database name in SQL*Plus:

  • V$DATABASE
  • GLOBAL_NAME

Checking the GLOBAL_NAME table

The First method is to run a quick-select against the GLOBAL_NAME table, which. is publicly available to logged-in users of the database

Example GLOBAL_NAME Select Statement

select * from global_name;

Checking the V$DATABASE Variable

The second method is to run a quick-select a V$database. However, not everyone will have access to the V$database variable.

Example V$database Select Statement

select name from V$database;

OLTP vs Data Warehousing

Advertisements

OLTP Versus Data Warehousing

I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse Manager/data analyst for many years. Why was the list that came from the operational applications different than the one that came from the Data Warehouse? Why couldn’t I just get a list of patients that were laying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager, and another. You get the picture.
In this article I will explain this very same thing to you. So you know  how to explain this to your manager. Or, if you are a manager, you might understand what your data analyst can and cannot give you.

OLTP

OLTP stands for OLine Transactional Processing. With other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day to day processes.
For example: When a patient checks in, his or her information gets entered into a Patient Information System. The doctor put scheduled tests, a diagnoses and a treatment plan in there as well. Doctors, nurses and other people working with patients use this system on a daily basis to enter and get detailed information on their patients.
The way the data is stored within operational systems is so the data can be used efficiently by the people working directly on the product, or with the patient in this case.

Data Warehousing

A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day to day operations. The beauty of a Data Warehouse is, among others, that you can combine the data from the different operational systems. You can actually combine the number of patients in a department with the number of nurses for example. You can see how far a doctor is behind schedule and find the cause of that by looking at the patients. Does he run late with elderly patients? Is there a particular diagnoses that takes more time? Or does he just oversleep a lot? You can use this information to look at the past, see trends, so you can plan for the future.

The difference between OLTP and Data Warehousing

This is how a Data Warehouse works:

The data gets entered into the operational systems. Then the ETL processes Extract this data from these systems, Transforms the data so it will fit neatly into the Data Warehouse, and then Loads it into the Data Warehouse. After that reports are formed with a reporting tool, from the data that lies in the Data Warehouse.

This is how OLTP works:

Reports are directly made from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports form the operational databases.

Pro’s and Con’s

Data Warehousing

Pro’s:

  • There is no strain on the operational systems during business hours
    • As you can schedule the ETL processes to run during the hours the least amount of people are using the operational system, you won’t disturb the operational processes. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
  • Data from different systems can be combined
    • It is possible to combine finance and productivity data for example. As the ETL process transforms the data so it can be combined.
  • Data is optimized for making queries and reports
    • You use different data in reports than you use on a day to day base. A Data Warehouse is built for this. For instance: most Data Warehouses have a separate date table where the weekday, day, month and year is saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this you’ll save time and decrease the strain on the database.
  • Data is saved longer than in the source systems
    • The source systems need to have their old records deleted when they are no longer used in the day to day operations. So they get deleted to gain performance.

Con’s:

  • You always look at the past
    • A Data Warehouse is updated once a night, or even just once a week. That means that you never have the latest data. Staying with the hospital example: you never knew how many patients are in the hospital are right now. Or what surgeon didn’t show up on time this morning.
  • You don’t have all the data
    • A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
  • Data isn’t the same as the data in the source systems
    • Because the data is older than those of the source systems it will always be a little different. But also because of the Transformation step in the ETL process, data will be a little different. It doesn’t mean one or the other is wrong. It’s just a different way of looking at the data. For example: the Data Warehouse at the hospital excluded all transactions that were marked as cancelled. If you try to get the same reports from both systems, and don’t exclude the cancelled transactions in the source system, you’ll get different results.

online transactional processing (OLTP)

Pro’s

  • You get real time data
    • If someone is entering a new record now, you’ll see it right away in your report. No delays.
  • You’ve got all the details
    • You have access to all the details that the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.

Con’s

  • You are putting strain on an application during business hours.
    • When you are making a large query, you can take processing space that would otherwise be available to the people that need to work with this system for their day to day operations. And if you make an error, by for instance forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
  • You can’t compare the data with data from other sources.
    • Even when the systems are similar. Like an HR system and a payroll system that use each other to work. Data is always going to be different because it is granulated on a different level, or not all data is relevant for both systems.
  • You don’t have access to old data
    • To keep the applications at peak performance, old data, that’s irrelevant to day to day operations is deleted.
  • Data is optimized to suit day to day operations
    • And not for report making. This means you’ll have to get creative with your queries to get the data you need.

So what method should you use?

That all depends on what you need at that moment. If you need detailed information about things that are happening now, you should use OLTP.
If you are looking for trends, or insights on a higher level, you should use a Data Warehouse.

 Related References

Data Modeling – Column Data Classification

Advertisements

Column Data Classification

When analyzing individual column data, at its most foundational level, column data can be classified by their fundamental use/characteristics.  Granted, when you start rolling up the structure into multiple columns, table structure and table relationship, then other classifications/behaviors, such as keys (primary and foreign), indexes, and distribution come into play.  However, many times when working with existing data sets it is essential to understand the nature the existing data to begin the modeling and information governance process.

Column Data Classification

Generally, individual columns can be classified into the classifications:

  • Identifier — A column/field which is unique to a row and/or can identify related data (e.g., Person ID, National identifier, ). Basically, think primary key and/or foreign key.
  • Indicator — A column/field, often called a Flag, that has a binary condition (e.g., True or False, Yes or No, Female or Male, Active or Inactive). Frequently used to identify compliance with complex with a specific business rule.
  • Code — A column/field that has a distinct and defined set of values, often abbreviated (e.g., State Code, Currency Code)
  • Temporal — A column/field that contains some type date, timestamp, time, interval, or numeric duration data
  • Quantity — A column/field that contains a numeric value (decimals, integers, etc.) and is not classified as an Identifier or Code (e.g., Price, Amount, Asset Value, Count)
  • Text — A column/field that contains alphanumeric values, possibly long text, and is not classified as an Identifier or Code (e.g., Name, Address, Long Description, Short Description)
  • Large Object (LOB)– A column/field that contains data traditional long text fields or binary data like graphics. The large objects can be broadly classified as Character Large Objects (CLOBs), Binary Large Objects (BLOBs), and Double-Byte Character Large Object (DBCLOB or NCLOB).

What is a Common Data Model (CDM)?

Advertisements

What is a Common Data Model (CDM)?

A Common Data Model (CDM) is a share data structure designed to provide well-formed and standardized data structures within an industry (e.g. medical, Insurance, etc.) or business channel (e.g. Human resource management, Asset Management, etc.), which can be applied to provide organizations a consistent unified view of business information.   These common models can be leveraged as accelerators by organizations form the foundation for their information, including SOA interchanges, Mashup, data vitalization, Enterprise Data Model (EDM), business intelligence (BI), and/or to standardize their data models to improve meta data management and data integration practices.

Related references

IBM, IBM Analytics

IBM Analytics, Technology, Database Management, Data Warehousing, Industry Models

github.com

Observational Health Data Sciences and Informatics (OHDSI)/Common Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Utilities Data Model

Oracle

Industries, Communications, Service Providers, Products, Data Mode, Oracle Communications Data Model

Oracle

Oracle Technology Network, Database, More Key Features, Airline data Model

Databases – Database Isolation Level Cross Reference

Advertisements

Database And Tables

 

Here is a table quick reference of some common database and/or connection types, which use connection level isolation and the equivalent isolation levels. This quick reference may prove useful as a job aid reference, when working with and making decisions about isolation level usage.

Database isolation levels

Data sources

Most restrictive isolation level

More restrictive isolation level

Less restrictive isolation level

Least restrictive isolation level

Amazon SimpleDB

Serializable Repeatable read Read committed Read Uncommitted

dashDB

Repeatable read Read stability Cursor stability Uncommitted read

DB2® family of products

Repeatable read Read stability* Cursor stability Uncommitted read

Informix®

Repeatable read Repeatable read Cursor stability Dirty read

JDBC

Serializable Repeatable read Read committed Read Uncommitted

MariaDB

Serializable Repeatable read Read committed Read Uncommitted

Microsoft SQL Server

Serializable Repeatable read Read committed Read Uncommitted

MySQL

Serializable Repeatable read Read committed Read Uncommitted

ODBC

Serializable Repeatable read Read committed Read Uncommitted

Oracle

Serializable Serializable Read committed Read committed

PostgreSQL

Serializable Repeatable read Read committed Read committed

Sybase

Level 3 Level 3 Level 1 Level 0

 

Related References

 

Data Modeling – Fact Table Effective Practices

Advertisements

Here are a few guidelines for modeling and designing fact tables.

Fact Table Effective Practices

  • The table naming convention should identify it as a fact table. For example:
    • Suffix Pattern:
      • <<TableName>>_Fact
      • <<TableName>>_F
    • Prefix Pattern:
      • FACT_<TableName>>
      • F_<TableName>>
    • Must contain a temporal dimension surrogate key (e.g. date dimension)
    • Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, and AVG, etc.)
    • Dimension Surrogate keys (srky) should have a foreign key (FK) constraint
    • Do not place the dimension processing in the fact jobs

Data Modeling – Dimension Table Effective Practices

Advertisements

I’ve had these notes laying around for a while, so, I thought I consolidate them here.   So, here are few guidelines to ensure the quality of your dimension table structures.

Dimension Table Effective Practices

  • The table naming convention should identify it as a dimension table. For example:
    • Suffix Pattern:
      • <<TableName>>_Dim
      • <<TableName>>_D
    • Prefix Pattern:
      • Dim_<TableName>>
      • D_<TableName>>
  • Have Primary Key (PK) assigned on table surrogate Key
  • Audit fields – Type 1 dimensions should:
    • Have a Created Date timestamp – When the record was initially created
    • have a Last Update Timestamp – When was the record last updated
  • Job Flow: Do not place the dimension processing in the fact jobs.
  • Every Dimension should have a Zero (0), Unknown, row
  • Fields should be ‘NOT NULL’ replacing nulls with a zero (0) numeric and integer type fields or space ( ‘ ‘ ) for Character type files.
  • Keep dimension processing outside of the fact jobs

Related References

Data Modeling – Database Table Field Ordering Effective Practices

Advertisements

Field order can help the performance on inserts and updates and, also, keeps developer and users from having to search entire table structure to be sure they have all the keys, etc.

Table Field Ordering

  1. Distribution Field Or Fields, if no distribution field is set the first field will be used by default.
  2. Primary Key Columns (including Parent and Child key fields)
  3. Foreign Key Columns (Not Null)
  4. Not Null Columns
  5. Nullable Columns
  6. Created Date Timestamp
  7. Modified (or Last Updated) Date Timestamp
  8. Large text Fields
  9. Large binary Columns or Binary Field references

Related References