Thursday, January 28, 2010
Top data warehousing links
Top sites about data warehousing, business intelligence and ETL
Business Intelligence directory
www.bi-dw.info is an independent information platform on business intelligence and data warehousing. It provides reliable, high-quality information on subjects such as Business Intelligence, Data Warehousing, the ETL process and Data Integration, Enterprise Performance Management and Reporting. The Web site is a collection of Internet resources that have been manually selected and edited. It also contains reviews and ratings of the materials offered and is constantly being developed by its authors and visitors. All the resources presented are available for free.
The site contains detailed information on commercially offered software, grouped by application: Data Warehousing and Databases, Reporting and OLAP, and ETL and Data Integration. It also includes a tutorial section with links to the tutorial Web sites of commercial vendors and presentations of the data analysis solutions available on the market, as well as a vendor section providing a comprehensive list of software manufacturers. Besides being a source of information on commercially available tools, it also offers links to articles and forums on ETL, data warehousing, data management and business intelligence. All the resources come with a short description to help visitors judge their usefulness. For regular visitors, recently updated resource categories are displayed on the homepage.
http://www.daniel-lemire.com/
This is the personal Web site and blog of Daniel Lemire, a Computer Science Professor at Université du Québec à Montréal. His research interests focus on databases, data warehousing and OLAP, recommender systems, collaborative filtering and information retrieval. His Web site was designed for students, teachers and data management professionals. Anyone can contact Daniel Lemire to ask him questions about data management or leave a comment on one of his posts.
In the subdirectory www.daniel-lemire.com/OLAP/ you can find a comprehensive bibliography of Data Warehouse and OLAP papers with links to the full papers. Useful links and conferences are also listed. The bibliography has been maintained since 1997 and was last updated in November 2009. All the available papers are grouped in three sections: Conferences, Journals and Workshops, Data Warehousing, and OLAP. You can easily find the one that interests you by using the detailed table of contents. You can also help to update the site by submitting your papers and links.
http://www.dwinfocenter.org/
The Data Warehousing Info Center is a professional Web site geared to people who are new to the data warehousing field. It contains a selection of essays that can help you build your knowledge of the topic. For beginners, the recommended essays cover a definition of data warehousing, the case for data warehousing, the case against data warehousing, aspects of data warehousing architecture, a definition of decision support, and what decision support tools are used for. Additionally, the site includes practical advice such as books worth reading, free or trial data management software recommended for training, and pointers for further learning.
http://www.dwreview.com/
www.dwreview.com offers an objective analysis of data warehousing. Its authors have been actively involved in data warehousing for the last fifteen years and have observed how the field has evolved and become more and more common in business. This resource deals with technologies such as ETL, metadata, data warehouses/data marts and analytical tools. The reviews of professional tools from commercial vendors (SAP BW, Oracle, Microsoft, PeopleSoft, Cognos, DataMirror, MicroStrategy, Crystal Decisions, Business Objects, SAS) discuss financial, manufacturing, services and retail aspects. All the tools described here have been used by the authors in their professional lives.
www.dwreview.com contains four main sections:
Data Warehousing Overview – describes data warehousing infrastructure and data life cycle management. It also provides basic information on pre-data warehouse, data cleansing, data repositories and front-end analysis
Data Warehousing articles – a collection of articles grouped in a few categories such as planning for a data warehouse or data life cycle management
Data mining – this section contains articles, tips and tricks on data mining, and descriptions of various aspects of the process
OLAP – a selection of a few articles dealing with Online Analytical Processing
ETL Tools
http://www.etltools.net/ is an independent Web site about the Extract, Transform, Load (ETL) process. The portal is not affiliated with any of the commercial vendors whose software is described and compared there. All the resources are divided into three sections:
ETL software – provides reliable and approachable information about what ETL is and which steps the ETL process consists of, explains how ETL tools work, describes the types of ETL tools and lists commercial and open-source vendors offering them. In the ETL tools comparison you can learn about the advantages and disadvantages of the most popular ETL software on the market and become acquainted with the comparison criteria. It also gives advice on how to choose a suitable tool and how to get started with ETL.
Commercial ETL vendors – presents commercial vendors and their software in an impartial way and gives detailed information on the ETL tools offered by each vendor. It includes descriptions of data integration platforms offered by Informatica, Information Server, Ab Initio, Business Objects, Oracle, Integration Services, Sybase, SAS and Pervasive.
Free ETL tools – describes open source (freeware) data integration tools, their pros and cons, and possible applications. It also gives more detailed descriptions of the free ETL tools Pentaho Data Integration, Talend, Clover.ETL, and KETL. Additionally, it explains the limitations of open source licensed software.
http://www.1keydata.com/
www.1keydata.com is a data warehousing Web site providing reliable information on ETL, databases, OLAP and Business Intelligence. Its aim is to help beginners develop their knowledge of these topics and learn how to implement a data warehousing project successfully. The content of the Web site comes mostly from the author's personal experience as a business intelligence client and vendor. The site is vendor neutral, although some vendor names may be mentioned for informative purposes.
The site contains five sections:
Tools – this section describes the types of tools used for data management and processing. The site is manufacturer neutral, although some names may be mentioned. The tools covered here are database tools, ETL tools, OLAP, Reporting and Metadata tools. You can also learn about the hardware used for data processing and about the advantages, disadvantages and technical requirements of building your own data warehousing tools.
Steps – covers the most important steps required to carry out a data warehousing project, from requirement gathering and query optimization to production rollout, as well as the author's personal approach and advice
Business Intelligence – discusses the relationship between data warehousing and business intelligence as well as business intelligence on its own
Concepts – explains and discusses main concepts in data warehousing such as dimensional data model, star schema, snowflake schema, slowly changing dimension, conceptual data model, logical data model, physical data model, data integrity, OLAP and its variants.
Glossary – explanation of common terms used in data warehousing
http://www.learndatamodeling.com/
LearnDataModeling.com is a complete tutorial in the form of a collection of articles that provide detailed knowledge about data modeling and how the technology fits into Data Warehousing. The Web site is created by professionals who want to share their experience in the field of data modeling and data warehousing. It is addressed to both professionals and beginners, so you will not be overwhelmed by the amount of detailed information and specialist terms. LearnDataModeling.com is constantly being developed, and the authors are planning to add more examples and tips on data modeling, data warehousing and business intelligence.
All the available articles are divided into eight sections: Business Process and Business Modeling, Data Modeling, Database and Data Modeling, Data Warehouse and ETL, Metadata and Business Intelligence, ERP (Enterprise Resource Planning) and Information Technology – Overview. Each section contains from four to twenty-four articles. The articles describe the subject in an approachable but reliable way, and many of them contain tables or graphs that make the topic easier to understand. After reading an article that interests you, you can easily navigate to the next one to deepen your knowledge or simply move to another section.
http://www.rapid-business-intelligence-success.com/
Rapid Business Intelligence Success is a business-oriented Web site describing the advantages of applying Business Intelligence tools for successful management. The site offers answers to common problems occurring during BI implementation and also provides basic knowledge about what Business Intelligence (BI), Data Mining, and Data Warehousing actually are. It uses simple language, avoids technical descriptions and terms, and emphasizes the role of BI as a tool in managing an enterprise. The directory is subdivided into several sections, of which a few are related to BI:
Business Intelligence – gives a short explanation of the term, outlines possible applications for improving a company’s performance and points out the most common obstacles to BI implementation.
Data Warehousing – explains what a data warehouse is and what it is used for from a business point of view, and places it in the process of managing an enterprise. It also briefly describes ETL, the significance of data quality and the importance of applying ETL for successful data management in a company. The section gives advice on how to structure a company’s data (star and snowflake dimensional data models) and how to develop a data warehouse project correctly.
Data mining – describes simple data mining techniques and typical applications of data mining
Software overview – an overview of business intelligence software such as Database Management Software or Systems, Data Extraction and Integration Tools, Querying and Reporting Tools, and Data Visualization or Front End or User Interfaces.
http://www.searchdatamanagement.com/
Search Data Management is a relevant and rich resource base for data management professionals and business leaders. It offers up-to-date news, advice provided by data management experts, customized research, learning guides and white papers. The site also offers tips on software vendors and product overviews.
Search Data Management is an exhaustive source of knowledge for more advanced users. If you do not know what you are looking for, you may easily get lost and overwhelmed by the amount of information offered, but if you are a data management professional you can benefit a lot from browsing this Web site. The topics presented are divided into several sections, each of which contains a collection of articles, latest news, research and expert advice on the subtopic. Every subsection contains a highlighted list of the most important resources and a “must read” box with essential information on the subject. It also provides a list of useful links to external resources.
One of the biggest advantages of Search Data Management is that you can ask your strategy and implementation questions online. They are answered on a continual basis by some of the leading authorities in the data management field, chosen for their professional knowledge. There is also a list of questions asked so far, so you may find a ready answer to your problem.
etl-tools.info
ETL Tools Info is an informative Web site about two main aspects of business intelligence: ETL and data warehousing. It explains the role of business intelligence in modern business and describes the ETL process and the data warehouse in more detail. It also covers data warehouse architecture and related components such as data marts, BI reporting, data federation, the data warehouse team, metadata, and SCD (slowly changing dimensions).
The portal provides tutorials and examples on IBM InfoSphere Information Server, DataStage, SAS and Pentaho Open Source.
For users looking for practical knowledge there are ETL and Data Warehousing tutorials. They are organized into lessons illustrating various business intelligence scenarios for typical data warehousing or ETL challenges. Together the tutorials constitute a knowledge base with practical examples teaching how to implement and manage an Extract, Transform, Load process in a data warehouse environment. The topics covered are surrogate key generation, header and trailer processing, loading customers, data allocation for ETL, data masking, data quality testing, and XML ETL processing, with sample tasks in Pentaho Data Integration, DataStage, SAS, Cognos PowerPlay and Pentaho Kettle. Tutorials dedicated to particular platforms such as DataStage, SAS ETL and Pentaho are also available for more advanced users who are already familiar with these platforms.
Saturday, January 9, 2010
Data warehousing design
Once one knows what data warehousing really is, one can use it in one’s business or company. Like every useful tool, it needs to be kept up to date and maintained properly. If it is not, it can become a hindrance rather than a help.
So, let’s take a closer look at some advice on how to design and implement a data warehouse in a way that maximizes the probability of success.
The first group of principles is concerned with the business and includes organizational consensus, data integrity, implementation efficiency, user friendliness, and operational efficiency.
The first principle, organizational consensus, is concerned with one’s employees. If the team is not willing to accept and make use of the data warehouse, it’s pointless to design and improve it. If the data warehouse is perceived as a futile intrusion by a company’s staff, they will not approve of it and consequently will not benefit from it. Thus it’s important to introduce the data warehouse concept to the staff properly.
The next principle, data integrity, stresses the importance of having a consistent data warehouse. The chances of data inconsistency and replication should be minimized. If data integration and standardization are done, then any methodology that is implemented should work.
The third principle, implementation efficiency, is about introducing and using the data warehouse in the most effective way possible. If one’s data warehouse looks beautiful and elegant but does not meet the company’s needs, it’s simply impractical.
The second-to-last principle is user friendliness. One’s staff needs not only to approve of the data warehouse but, equally importantly, to be able to use it effectively. It should be designed so that even the least technically gifted person in the team is capable of using it successfully.
The last principle, operational efficiency, is closely connected with implementation efficiency, because real operational efficiency can be achieved only with a data warehouse that is designed to be easy to implement and maintain.
A proper data warehouse should be straightforward and efficient to implement. A properly designed data warehouse should be simple enough to support and facilitate immediate responses to business requests.
The above can be complemented with two principles of effective data warehouse design for IT environments.
The first is scalability, which can often pose a big problem in designing a proper data warehouse. The solution is simply to build in scalability from the very beginning. A company should use toolsets and platforms that support future expansion of data and changing business requirements.
The second is compliance with IT standards. It is simply about the need to leverage the existing skill sets of both IT and business users.
Sunday, December 20, 2009
ETL architecture
In the beginning, company data was a real mess. People were expected to keep it all in their heads, and the “big bosses” expected those poor employees to use the information correctly and to know everything about the information circulating within the company. This was a real misunderstanding on the part of the people who were supposed to run companies in a way that maintains a good atmosphere among everybody in the firm.
Fortunately, some really smart people, academic scientists and managing directors with mainly business backgrounds, made the effort and created very good tools that revolutionized the management of information.
Because there were plenty of ideas and plenty of good tools, every firm can now choose from a variety of tools that are irreplaceable in today’s world.
Let’s write something about one of them: ETL architecture. It makes it possible to extract, transform and load data so that the data can be used in everyday work. For example, employees can sell goods from a company’s warehouse without knowing all the specific data held by the firm, because they do not need to know it. A computer program created for that purpose is enough for them to work smoothly and reach the goals set by management.
To give a definition, ETL is the core process of data integration and is mainly associated with data warehouses. ETL extracts data from a source, transforms it into the new format required by the company’s business rules, and loads it into the target structure.
ETL architecture is therefore much needed in today’s business world, where good planning and management of information with such tools (not necessarily ETL, because there are plenty of others) can be the best way to success.
There are two ways to approach this process of transforming the way data works for us.
First, we can use ETL architecture in a simpler way to reach the target, but the result will not perform at the level we expect. Alternatively, we can use the same system in a more complicated way. When we finish, we will have a program that lets us work more effectively because, as in life, the easy way of doing things is sometimes not the best in the long term. So sometimes we have to choose the approach that requires more work throughout the project, and at the end we can enjoy a program that makes our work easier, simpler and more effective, and that brings good results.
These tools are used very often in business today, wherever information is needed very fast, for example while the client is waiting, and must also be accurate and reliable.
Thursday, November 19, 2009
ETL
ETL - extract, transform, load
ETL architecture is used to manage and optimize data together with the process of transferring it, which leads to simpler and more standardized data storage in the data warehouse.
The purpose of an ETL tool is to create a universal framework for the different processes of transforming data and moving it into its target destination.
The letters ETL stand for extraction, transformation and loading of data, which means that ETL tools extract data from a source, transform it into new formats and load it into the target data warehouse.
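The three steps can be illustrated with a minimal sketch. It uses only the Python standard library; the source data, the `dim_customer` table and all function names are assumptions made up for this illustration, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice it would come from a file
# or an operational system.
SOURCE_CSV = """customer_id,name,country
1, alice ,pl
2,Bob,DE
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert raw rows into the format the target table expects."""
    return [
        (
            int(row["customer_id"]),
            row["name"].strip().title(),
            row["country"].strip().upper(),
        )
        for row in rows
    ]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(conn, text):
    load(conn, transform(extract(text)))

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_CSV)
rows = conn.execute(
    "SELECT customer_id, name, country FROM dim_customer ORDER BY customer_id"
).fetchall()
print(rows)
```

Keeping the three steps as separate functions means each one can be replaced independently, for example by pointing `extract` at a different source.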
System architects created a scheme that eliminates repeated actions. Loading data into the warehouse requires two types of actions. In the first, data from an identifiable source system is transformed according to source-specific rules, which results in data in a standardized format. This action has to be implemented individually for every source system.
The other, conformed type of ETL architecture means creating a universal way of handling a data entity, regardless of its source and purpose; it makes it possible to follow reusable business rules that work in different conditions with a variety of data. Both approaches have pros and cons. At the beginning, the first one requires fewer actions in order to work smoothly and seems easier. However, over the whole life cycle of the data warehouse, the conformed ETL is more efficient in the long run.
This is because a data warehouse keeps being developed and grows continually as new data is added to the system. The traditional architecture may seem easier to build, but in the long run the same or similar actions are repeated, needlessly lengthening development time and increasing complexity. Implementing a universal ETL architecture for an entity requires more work than doing the same for a particular task; however, in the long run it brings more benefits, as the data is stored and the scheme can be reused.
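A rough sketch of this split, assuming two made-up source systems (a CRM and a web shop) that both deliver customer records: each source gets its own small step that produces a standardized row, and one conformed step, written once, applies the reusable rules to rows from any source. All names and record layouts are hypothetical.

```python
from datetime import date

# Source-specific steps: one per source system (both sources are hypothetical).
def from_crm(record):
    """The CRM stores the full name in one field and dates as DD/MM/YYYY."""
    day, month, year = record["signup"].split("/")
    return {
        "name": record["full_name"],
        "signup": date(int(year), int(month), int(day)),
    }

def from_webshop(record):
    """The web shop stores first and last name separately and dates as ISO strings."""
    return {
        "name": f'{record["first"]} {record["last"]}',
        "signup": date.fromisoformat(record["created"]),
    }

# Conformed step: written once, applied to standardized rows from any source.
def conform_customer(row):
    return {
        "name": " ".join(row["name"].split()).title(),
        "signup_year": row["signup"].year,
    }

def build_customers(crm_records, webshop_records):
    standardized = [from_crm(r) for r in crm_records]
    standardized += [from_webshop(r) for r in webshop_records]
    return [conform_customer(row) for row in standardized]

customers = build_customers(
    [{"full_name": "anna  kowalska", "signup": "05/03/2008"}],
    [{"first": "JOHN", "last": "smith", "created": "2009-11-19"}],
)
print(customers)
```

Adding a third source system in this sketch would only require one more source-specific function; `conform_customer` stays untouched, which is the reuse the conformed approach is meant to provide.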
Thus the most obvious advantage of using conformed ETL is that the stored data is easily reused, which improves its quality. Moreover, it’s easier to build and use, because it is in fact less complex than storing data in different traditional systems. An additional advantage is that it’s less difficult to add and acquire new source systems. One should also bear in mind that the conformed ETL architecture is not to be applied everywhere; it should be preceded by a detailed analysis confirming the benefits that would result from following the conformed architecture. Otherwise it may not be so beneficial.
This data architecture enables direct access to data in operational systems, which is very important: it considerably shortens the time that would otherwise be spent on data implementation and simplifies data usage.
Data Integration
Do you know exactly what it is?
Data integration is almost as simple as it sounds, yet it is a rather intangible goal that the software industry is still chasing. In this article I will try to explain why data integration is such a big bite for these companies.
Data integration is like a big tree in which every branch is connected to and cooperates with the others. All the data living in these “branches” gives users a unified view of that data. Data integration occurs in every part of our lives, for example when two similar companies have to merge their databases, or in education when different results have to be merged. In our competitive world, ICT infrastructure plays a major role in a learning environment, which can draw in the right students, sponsorship and research projects. The most important thing is the role of data integration. Two systems can be integrated when:
they look the same;
they seem to act the same;
and both systems produce and consume the same data.
In sum, data integration is the ability to control, make, share and consume the same data.
Example
Suppose a web application offers information about cities, such as weather, demographics, cinemas, hotels, tourism, etc. Traditionally, the information would have to be stored in one database with a single schema. But that is not as simple as it seems. A single enterprise would find some of this information difficult to collect. Even if it had the resources to gather volumes of information about weather or tourism, it would probably duplicate the data.
Instead, the designers of such an application try to develop a virtual schema that best models the answers users want. Wrappers, or adapters, are then attached to each resource or application, such as a crime database or a weather website, to make it compatible with the schema. The adapter for each data source transforms the query results into a straightforward form. Finally, the results are combined to show a unified view and answer the user’s question.
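The adapter idea can be sketched in a few lines. The example below assumes two invented in-memory sources (weather and hotels) with different native shapes and an invented unified record layout; a real system would query remote databases or websites instead.

```python
# Unified (virtual) schema assumed for this sketch:
# {"city": str, "attribute": str, "value": object}

# Two hypothetical sources with different native shapes.
WEATHER_SOURCE = {"Warsaw": {"temp_c": 4}, "Berlin": {"temp_c": 6}}
HOTEL_SOURCE = [("Warsaw", "Hotel A"), ("Warsaw", "Hotel B"), ("Berlin", "Hotel C")]

def weather_adapter(city):
    """Translate a weather lookup into records of the unified schema."""
    entry = WEATHER_SOURCE.get(city)
    if entry is None:
        return []
    return [{"city": city, "attribute": "temperature_c", "value": entry["temp_c"]}]

def hotel_adapter(city):
    """Translate hotel tuples into records of the unified schema."""
    return [
        {"city": c, "attribute": "hotel", "value": name}
        for c, name in HOTEL_SOURCE
        if c == city
    ]

ADAPTERS = [weather_adapter, hotel_adapter]

def query_city(city):
    """Send the query to every adapter and merge the results into one view."""
    results = []
    for adapter in ADAPTERS:
        results.extend(adapter(city))
    return results

view = query_city("Warsaw")
for record in view:
    print(record)
```

The data stays in the original sources; only the adapters know each native format, so a new source can be added by writing one more adapter and appending it to `ADAPTERS`.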