Top 9 Best Practices for Data Warehouse Development
When planning for a modern cloud data warehouse development project, having some form or outline around understanding the business and IT needs and pain points will be key to the ultimate success of your venture. Being able to tell the right story will give the business the structure it needs to be successful in data warehousing efforts.
Here are 9 things you should know about staying current in data warehouse development, but won’t necessarily hear from your current IT staff and consultants.
1) Have a data model. Getting a common understanding of what information is important to the business will be vital to the success of the data warehouse. Sometimes the businesses themselves don’t know their own data needs or landscape. They will be using different words for the same data sets, the same words for different data sets, etc. Modeling the business’ information can be a real eye opener for all parties concerned.
2) Have a data flow diagram. Knowing where all the business’ data repositories are and how the data travels within the company in a diagram format allows everyone to determine the best steps for moving forward. You can’t get where you want to be if you don’t know where you are.
3) Build a source agnostic integration layer. The integration layers’ sole purpose is to pull together information from multiple sources. This is generally done to allow better business reporting. Unless the company has a custom application developed with a business-aligned data model on the back end, choosing a 3rd party source to align to defeats that purpose. Integration MUST align with the business model.
4) Adopt a recognized data warehouse architecture standard.(i.e. 3NF, star schema [dimensional], Data Vault). Regardless of the actual approach chosen, picking a standard and sticking with it will enable efficiency within a data warehouse development approach. Supporting a singular methodology for support and troubleshooting allows new staff to join the team and ramp-up faster.
5) Consider adopting an agile data warehouse methodology. Data warehouses no longer have to be large, monolithic, multi quarter / year efforts. With proper planning aligning to a single integration layer, data warehouse projects can be broken down into smaller, faster deliverable pieces that return value much more quickly. This also allows you to prioritize the warehouse as the business needs change.
6) Favor ELT over ETL. Moving corporate data, as is, to a single platform should be job #1. Then legacy systems can be bypassed and retired along the way, helping the business realize savings faster. Once data is colocated, it is much more efficient to let the power of a single cloud engine do integrations and transformations (i.e. fewer moving parts, push down optimizations, etc.).
7) Adopt a data warehouse automation tool. Automation allows you to leverage your IT resources more fully, iterate faster through projects and enforce coding standards (i.e. Wherescape, AnalytixDS, Ajilius, homespun, etc.) for easier support and ramp-up.
8) Get your staff trained in modern approaches. Giving your team knowledge of the advantages of newer technologies and approaches lets your IT staff become more self-sufficient and effective. This will also open up more understanding and options in hiring and contracting with the best resources that the IT industry has to offer.
9) Pick a cloud-based data warehouse environment. For the least initial investment, the storage and compute elasticity coupled with the pay-as-you-go nature of cloud-based services provide the most flexible data warehousing solution on the market.