Technology – What Is The Data Fabric Approach?

What is the data fabric approach, and how does automating data discovery, creation, and ingestion help organizations? Data-fabric tools, which can be appliances, devices, or software, allow users to quickly, easily, and securely access and manage large amounts of data. By automating discovery, creation, and ingestion, a big data fabric accelerates real-time insights from operational data silos and reduces IT expenses. While the term is already a buzzword amongst business architects and data enthusiasts, what exactly does the introduction of data-fabric tools mean for you?

In an enterprise environment, managing information requires integrating diverse systems, applications, storage, and servers. This means that finding out what consumers need is often difficult without the aid of industry-wide data-analysis, data-warehousing, and application-discovery methods. Traditional IT architectures, such as client-server or workstation-based designs, are no longer enough to satisfy the needs of companies in an ever-changing marketplace.

Companies in the information age no longer prefer to work in silos. Organizations now face the necessity of automating the management of their data sources, which entails managing a large number of moving parts, not just one. Therefore, a data management system needs to be very flexible and customizable to cope with the fast changes taking place in information technology. Traditional IT policies may not keep up with the pace of change; thus, some IT departments may be forced to look for alternative solutions such as a data fabric approach. A data-fabric approach automates the entire data management process, from discovery to ingestion.

Data fabrics are applications that enable organizations to leverage the full power of IT through a common fabric. With this approach, real-time business decisions can be made, enabling the tactical and strategic deployment of applications. Imagine the possibilities: using data management systems to determine which applications should run on the main network and which should be placed on a secondary network. With real-time capabilities, these applications can also use different storage configurations, meaning real-time data can be accessed from any location at any hour. And because the applications running on the fabric are designed to be highly available and fault-tolerant, a failure within the fabric will not affect other services or applications. The result is a streamlined and reliable infrastructure.

There are two types of data fabrics: infrastructure-based and application-based. Infrastructure-based data fabrics are used in large enterprises where multiple applications need to be implemented and managed simultaneously. For example, the IT department may decide to use an enterprise data lake (EDL) in place of many file servers. Enterprise data lakes allow users to access data directly from the source rather than log on to a file server every time they need information. File servers are more susceptible to viruses, so IT administrators may find it beneficial to deploy their EDLs in place of the file servers. This scenario exemplifies the importance of data preparation and recovery.

On the application side, data preparation can be done by employing the smart enterprise graph (SEM). A smart enterprise graph is one in which all data sources (read/write resources) are automatically classified based on capacity and relevance and then mapped in a manner that intelligently allows organizations to rapidly use the available resources. Organizations can decide how to best utilize their data sources based on key performance indicators (KPIs), allowing them to make the most of their available resources. This SEM concept has been implemented in many different contexts, including online retailing, customer relationship management (CRM), human resources, manufacturing, and financial services.

Data automation also provides the basis for the big data fabric, which refers to collecting, preparing, analyzing, and distributing big data on a managed infrastructure. In a big data fabric environment, data is processed more thoroughly and more quickly than it would be when ingested at a smaller scale. Enterprises are able to reduce costs, shorten cycle times, and maximize operational efficiencies by automating ingestion, processing, and deployment on a managed infrastructure. Enterprises may also discover ways to leverage their existing network and storage systems to improve data processing speed and storage density.

When talking about the data fabric approach, it’s easy to overstate its value. However, in the right environments and with the right intelligence, data fabrics can substantially improve operational efficiencies, reduce maintenance costs, and even create new business opportunities. Any company looking to expand its business should consider deploying a data fabric approach as soon as possible. In the meantime, any IT department looking to streamline its operations and decrease workloads should investigate the possibility of implementing a data fabric approach.

Technology – What Can Dremio Do For You?

Dremio is a cloud-based platform providing business data lake storage and analytics solutions. Dremio is a major competitor of:

  • Denodo,
  • Databricks, and
  • Cloudera.

Dremio provides fast, fault-tolerant, scalable, and flexible data access for sources such as MySQL and Informix, with client access from PHP, Java, and more. Its database engine is based on Apache Arrow and is designed for fast, low-cost, and high-throughput data access for any web application.

Dremio provides high-throughput data lake ingestion built on Apache Arrow, with support for sources such as MySQL, for fast, fault-tolerant, scalable, and flexible query and data ingestion. With Dremio, you can easily put together a system capable of loading information as and when the user wants it, and you get highly flexible solutions for all kinds of businesses. With Dremio, your customers can focus on building their business rather than worrying about server requirements.

If you are looking for an analytics solution that will give you the insight you need to improve how your business runs and grows, look no further than Dremio. With its state-of-the-art technology and user-friendly interface, you can manage your dynamic data and queries easily and efficiently with just a few clicks. With its free-today, pay-later plans, you can take advantage of Dremio for your small or medium-sized business. In addition to its sophisticated and powerful analytics tools, Dremio also offers advanced reporting, such as real-time reporting, for enterprise deployment options.

Dremio was developed by two world-class industry veterans who have spent years developing it into what it is today. With this software, you can build a highly efficient and secure data access and analytical layer over sources such as MySQL, PHP, and Informix, and over storage layers such as HDFS, Ceph, and Red Hat Enterprise Linux. Their objective is to provide the best in data governance and security along with easy and intuitive access to your dynamic data. The result is an intuitive solution for all of your data access needs, from scheduling data jobs to backup and restore. With Dremio, your developers can focus on their core business and let the technology do the work of providing an effective data layer.

With Dremio, your team can take full advantage of the built-in semantic layer, which allows them to manage and access a rich data model without writing SQL or Java code. With Dremio, your team can create, drop, update, and delete all information in the semantic layer. With the ability to manage, view, and search schemas, relationships, and tables, you can take full advantage of your Dremio license along with its powerful analytical abilities.
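
As a rough illustration of what working with the semantic layer looks like, the sketch below defines a virtual dataset over a physical source and then queries it. The space, folder, and table names are hypothetical, and the exact DDL keyword depends on your Dremio version (older releases use CREATE VDS, newer ones CREATE VIEW), so treat this as a sketch rather than copy-paste-ready code.

    -- Hypothetical names: a "sales" space and a "lake" source with raw orders
    CREATE VDS sales.curated_orders AS
    SELECT o.order_id,
           o.customer_id,
           CAST(o.order_ts AS DATE) AS order_date,
           o.amount
    FROM   lake.raw.orders o
    WHERE  o.status = 'COMPLETE';

    -- Consumers then query the semantic layer rather than the raw files
    SELECT order_date, SUM(amount) AS daily_sales
    FROM   sales.curated_orders
    GROUP  BY order_date;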

Another way that Dremio helps your team gain analytical power is by providing easy access to its own set of tools. The most powerful tool available to your team is the Metadata Browser. With the Metadata Browser, you can preview all of the stored information in your chosen Dataset. You can see all of the relationships, columns, names, sizes, and other details that you want to work with.

If you are looking for an easy way to manage and update all of your Datasets and work with multiple Datasets simultaneously, then using the Data Catalog is a must! With the Data Catalog, you will not only be able to view your entire data catalog at once but also drill down into it for further investigation. Imagine being able to update all of your Datasets, groups, departments, and projects all in one place. This feature alone could save your team hours each week!

When you are choosing your Dremio provider, make sure that they offer the Data Catalog. Dremio also offers a data source editor, so if you are a newcomer to Dremio and do not know how to build a data source, this is a great feature to have. After all, how many times have you wanted to import a certain group of Datasets and could not remember exactly where you saved them? The Data Catalog makes it easy and painless to import and save your data. This is probably one of the best features of Dremio that I can talk about.

Technology – The Advantages of Using Microsoft SQL Server Integration Services

Microsoft SQL Server Integration Services (SSIS) is designed to combine the features of SQL Server with components of an enterprise management system (EMS) so that they can work together for enterprise solutions. Its core area of expertise is bulk/batched data delivery. As a member of the SQL Server family, Integration Services is a logical solution to common organizational needs and current market trends, particularly those expressed by existing SQL Server users. It extends SQL Server functionality with data extraction from external sources, data transformations, data maintenance, and data management. It also helps to move data from one server to another.

There are several ways to use SSIS. External data sources may be data obtained from an outside source, such as a third-party application, or data obtained from an on-site database, such as a company’s own system. These external sources may involve transformations, including automatic updates, or specific requests, such as viewing certain data sources. There is also the possibility of data integration, in which different sets of data sources may be combined within SSIS. Integration Services is useful for developing, deploying, and maintaining customer databases and other information sources.
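
For example, once a package has been deployed using the project deployment model (SQL Server 2012 and later), it can be launched from T-SQL through the SSISDB catalog procedures. The folder, project, and package names below are hypothetical; this is a minimal sketch of how a deployed package is typically executed, not a prescription for any particular environment.

    DECLARE @execution_id BIGINT;

    -- Register an execution of a (hypothetical) deployed package
    EXEC SSISDB.catalog.create_execution
         @folder_name  = N'ETL',
         @project_name = N'CustomerLoads',
         @package_name = N'LoadCustomers.dtsx',
         @execution_id = @execution_id OUTPUT;

    -- Start the execution asynchronously
    EXEC SSISDB.catalog.start_execution @execution_id;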

The advantage of integrating SSIS with other vendors’ products is that it allows information to be made available within the organization and outside the organization. In other words, vendors can sell to internal users as well as external customers. Integration Services is usually sold as part of Microsoft SQL Server solutions. However, some companies may develop their own SSIS interfaces and build the entire communication layer independently.

There are two major advantages of using SSIS. The first is strong support for telecommunication companies and other enterprises that need to process a huge amount of information quickly and efficiently. Telecommunication companies use SSIS to interface with other modules such as Microsoft Office applications, SharePoint, and more. Another advantage of SSIS is that integration provides access to all of the capabilities that a particular program or server has, such as data integration with Microsoft Visual Basic and JavaScript, along with the full functionality of the program or server. SSIS is commonly used for web applications, particularly in sites that have to process large amounts of data quickly and efficiently.

There are a few disadvantages of using SSIS, however. SSIS can be slow when compared to VBA and other object-oriented programming (OOP) methods. SSIS also has some limitations around data quality, and the SSIS interface can be difficult to use if one does not know how to code in the underlying programming language. SSIS is also limited in the number of programs and applications that can be integrated into one installation.

SSIS is not only less flexible than VBA but can also be slower than traditional VBA script programs. SSIS can use a program or server with an SSIS interface. Still, not all programs and servers that support SSIS will provide an interactive command line for integration with a Microsoft SQL Server Integration Services database. In some cases, an interactive command line is necessary for SSIS to use the DTS file needed to process the data from an in-house database. SSIS cannot connect to SSO independently but can use an in-house or external SSIS file as a starting point for a connect-and-bind scenario.

For SSIS to work effectively in a team-based development environment, the developer must understand and be familiar with the program. SSIS has been designed with several different developer topologies and languages, allowing code to be written and run in a timely manner while keeping track of files that might not be included with the program. A team-based development environment should be defined as a group effort in which regular communication between team members and corporate databases helps the process along. SSIS was designed to provide developers with the flexibility and control they need to maintain these relationships.

SSIS can provide several advantages over VBA, including support for data structures in various programming languages and formats. This type of integration can save time for a business and is very cost-effective. SSIS also provides several different programming interfaces and is flexible enough to use in any environment. If your company needs to use SSIS, you must take the time to learn how to integrate it with your company’s database to ensure that the data structures used are compatible and effective for your application.

Technology – When To Cache A Denodo View

Here’s a quick summary of practices for when to use caching when developing denodo views. These guidelines come from the usual documentation and practical experience and may help you decide whether to cache a view. These are general guidelines; should they conflict with any guidance you have received from Denodo, please use the advice provided by Denodo.

What is a table cache?

In denodo, a cache is a database table, stored in a JDBC database, that contains the result set of a view at a point in time.
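
As a rough sketch, enabling and preloading a cache can be done from VQL along the lines shown below. The view name is hypothetical, and the exact statements and context parameters can vary by Denodo version, so please verify against the documentation for your release before using them.

    -- Enable full caching on a (hypothetical) derived view
    ALTER VIEW dv_customer_orders CACHE FULL;

    -- Preload the cache, invalidating any previously cached rows
    SELECT *
    FROM   dv_customer_orders
    CONTEXT('cache_preload' = 'true',
            'cache_invalidate' = 'all_rows',
            'cache_wait_for_load' = 'true');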

Why Cache?

Cache in Denodo can be used for several purposes:

Enhancing Performance

Improving performance is the primary purpose of caching; it can be used to overcome slow data sources, data sources with limited SQL functionality, and/or long-running views that have already been tuned.

Protecting data sources from costly queries

Caching can shield essential systems from excess load caused by large, long-running queries and/or frequent queries during critical operating times.

Reusing complex data combinations and transformations

Caching views that consolidate data from multiple data sources, perform complex calculations, and apply complex derivations and business rules provides an optimized, pre-enriched data set for consumption.

Cache View Modeling Best Practice

Add a primary key or a unique index

Adding a primary key or a unique index helps the optimizer define performance strategies and produce accurate cost estimates when the view is joined to other views.

Add Cache indexes

Add cache indexes based on an understanding of how consumers actually use the view (e.g., commonly used prompts and filters).

Caching Tips and Cautions

Here are some considerations to keep in mind when making caching decisions.

Avoid Caching Intermediate Views

Where possible, avoid caching intermediate views; this allows the optimizer to make better decisions about data movement, pushdown, and branching, and lets denodo perform greater SQL simplification.

Consider the Volume of Data to be Cached

Where possible, avoid caching large views (e.g., views with a large number of rows or columns). Evaluate the expected cache size and make an appropriate decision.

Denodo Reference Links

Best Practices to Maximize Performance III: Caching

Denodo E-books

Denodo Cookbook: Query Optimization

Related Blog Posts

Denodo View Performance Best Practice

List of denodo supported JDBC databases

The question “What are the JDBC databases supported by denodo?” is one of those questions that always seems to come up in customer meetings, Proof of Concept (POC) implementations, or planning for POCs. While it is documented by denodo, I seem to spend more time looking for it than I should. So, I thought it might be useful to document the Supported JDBC Data Sources page URL for easy reference.

Related References

Denodo > User Manuals > Virtual DataPort Administration Guide > Appendix > Supported JDBC Data Sources

How to save Denodo Virtual Data Port (VDP) VQL Shell results

In a recent project, I was asked by a new user of Denodo Virtual Data Port (VDP) how to save denodo VDP VQL Shell results. So, here is a simple outline of the process of exporting the VQL Shell results:

  • Execute the VQL Shell SQL
  • Click the ‘Save’ button in the ‘Results’ tab to save the results
  • A dialog box will open
  • Check the ‘include Results’ checkbox
  • Set the separator character
  • Set the output file path and desired filename
  • If you want the results written with a header row in the delimited file, enable the ‘Include header’ checkbox
  • Click ‘OK’

Where to find the Denodo ODBC Driver

Recently, I had to look up where to download the Denodo 7.0 Open Database Connectivity (ODBC) driver. So, I thought it might be useful to document the ODBC driver download page.

Related References

Denodo > ODBC > Denodo 7.0

How to save Denodo Virtual Data Port (VDP) view results

In a recent project, I was asked by a new user of Denodo Virtual Data Port (VDP) how to save denodo VDP view results. So, here is a simple outline of the process of exporting the view results:

  • Execute the view
  • Click the ‘Save’ button in the Results tab to save the results
  • A dialog box will open
  • Check the ‘include Results’ checkbox
  • Set the separator character
  • Set the output file path and desired filename
  • If you want the results written with a header row in the delimited file, enable the ‘Include header’ checkbox
  • Click ‘OK’

What is a Data Warehouse?

The description of what a data warehouse is varies greatly. The definition that I give, which seems to work, is that a data warehouse is a database repository that supports system interfaces, reporting and business analysis, data integration and domain normalization, and structure optimization. The structure can vary greatly depending on the school of thought used to construct the data warehouse, and it will have at least one data mart.

What a data warehouse is:

  • A source of data and enriched information used for reporting and business analysis.
  • A repository of metadata that organizes data into hierarchies used in reporting and analysis

What a data warehouse is not:

  • A reporting application in and of itself; it is used by other applications to provide reporting and analysis.
  • An exact copy of all tables/data in the source systems.  Only those portions of source system tables/data required to support reporting and analysis are moved into the data warehouse.
  • An Online Transaction Processing (OLTP) system.
  • An archiving tool.  Data is kept in the data warehouse in accordance with data retention guidelines and/or as long as needed to support frequently used reporting and analysis needs.
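
To make the ideas of structure optimization and a data mart a bit more concrete, here is a minimal, hypothetical star-schema sketch; the table and column names are illustrative only and are not tied to any particular school of thought (Kimball, Inmon, etc.).

    -- A dimension table describing customers (hypothetical names)
    CREATE TABLE dim_customer (
        customer_key   INT PRIMARY KEY,
        customer_name  VARCHAR(100),
        customer_city  VARCHAR(100)
    );

    -- A fact table holding measures at the order-line grain
    CREATE TABLE fact_sales (
        customer_key   INT REFERENCES dim_customer (customer_key),
        order_date     DATE,
        quantity       INT,
        sales_amount   DECIMAL(12, 2)
    );

    -- A typical reporting/analysis query against the mart
    SELECT c.customer_city, SUM(f.sales_amount) AS total_sales
    FROM   fact_sales f
    JOIN   dim_customer c ON c.customer_key = f.customer_key
    GROUP  BY c.customer_city;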

Extraction, Transformation & Loading Vs. Enterprise Application Integration

Over recent years, business enterprises relying on accurate and consistent data to make informed decisions have been gravitating towards integration technologies. The subject of Enterprise Application Integration (EAI) and Extraction, Transformation & Loading (ETL) lately seems to pop up in most Enterprise Information Management conversations.

From an architectural perspective, both techniques share a striking similarity. However, they essentially serve different purposes when it comes to information management. We’ve decided to do a little bit of research and establish the differences between the two integration technologies.

Enterprise Application Integration

Enterprise Application Integration (EAI) is an integration framework that consists of technologies and services, allowing for seamless coordination of vital systems, processes, as well as databases across an enterprise.

Simply put, this integration technique simplifies and automates your business processes to a whole new level without necessarily having to make major changes to your existing data structures or applications.

With EAI, your business can integrate essential systems like supply chain management, customer relationship management, business intelligence, enterprise resource planning, and payroll. The linking of these applications can be done at the back end via APIs or at the front end via the GUI.

The systems in question might use different databases or programming languages, run on different operating systems, or be older systems that are no longer supported by the vendor.

The objective of EAI is to develop a single, unified view of enterprise data and information, as well as ensure the information is correctly stored, transmitted, and reflected. It enables existing applications to communicate and share data in real-time.

Extraction, Transformation & Loading

The general purpose of an ETL system is to extract data out of one or more source databases and then transfer it to a target system to support better decision making. Data in the target system is usually presented differently from how it appears in the sources.

The extracted data goes through the transformation phase, which involves checking data integrity and converting the data into the proper storage format or structure. It is then moved into other systems for analysis or querying.

Data loading typically involves writing the data into the target destination, such as a data warehouse or an operational data store.

ETL can integrate data from multiple systems. The systems we’re talking about in this case are often hosted on separate computer hardware or supported by different vendors.
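
To illustrate the extract, transform, and load phases described above, here is a minimal SQL sketch; all schema, table, and column names are hypothetical, and a real ETL tool (SSIS, for example) would typically generate and orchestrate steps like these rather than run them as ad hoc statements.

    -- Extract: copy raw rows from a (hypothetical) source system into a staging area
    INSERT INTO staging.orders_raw (order_id, customer_id, order_date, amount_text)
    SELECT order_id, customer_id, order_date, amount
    FROM   source_db.orders;

    -- Transform and load: enforce types and a simple business rule, then write to the warehouse
    INSERT INTO dw.fact_orders (order_id, customer_id, order_date, order_amount)
    SELECT order_id,
           customer_id,
           CAST(order_date AS DATE),
           CAST(amount_text AS DECIMAL(12, 2))
    FROM   staging.orders_raw
    WHERE  amount_text IS NOT NULL;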

Differences between ETL and EAI

EAI System

  • Retrieves small amounts of data in one operation and is characterized by a high number of transactions
  • The EAI system is utilized for process optimization and workflow
  • The system does not require user involvement after it’s implemented
  • Ensures a bi-directional data flow between the source and target applications
  • Ideal for real-time business data needs
  • Limited data validation
  • Integration operations are pull-, push-, and event-driven

ETL System

  • It is a one-way process of creating a historical record from homogeneous or heterogeneous sources
  • Mainly designed to process large batches of data from source systems
  • Requires extensive user involvement
  • Metadata-driven complex transformations
  • Integration operations are pull- and query-driven
  • Supports proper profiling and data cleaning
  • Limited messaging capabilities

Both integration technologies are an essential part of EIM, as they provide strong capabilities for business intelligence initiatives and reporting. They can be used separately and sometimes in combination.