Difference Between Data Mining and Data Warehousing

Data Mining vs Data Warehousing

Data mining is a branch of computer science that deals with extracting patterns from large data sets. The patterns are found using a combination of methods from statistics and artificial intelligence. In modern business, data mining is responsible for turning raw data into business intelligence. The data is analyzed so that it yields reliable findings that can be used in decision making. This gives businesses an advantage over competitors, because they have data sets that can be relied upon to provide intelligence. Organizations also use data mining in profiling practices, including marketing, surveillance, scientific discovery and fraud detection.

Other terms are commonly associated with data mining, such as data fishing, data dredging and data snooping. These all refer to variations of data mining applied to samples that may be too small to support reliable statistical inferences. Such methods can, however, help gauge the validity of the data in use, and they can be used to form a hypothesis to test against the larger data population.
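
As a rough illustration of what "extracting patterns" can mean, the sketch below counts which pairs of items are bought together most often. The purchase records, the function name frequent_pairs and the min_support threshold are all invented for this example; real data mining tools use far more elaborate methods, and with a sample this small any pattern found should be treated only as a hypothesis.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase records, made up for illustration only
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

print(frequent_pairs(transactions, min_support=3))
# {('bread', 'milk'): 3}
```

Raising min_support keeps only the strongest patterns, while lowering it surfaces more pairs, including ones that may have occurred by chance.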

A data warehouse, on the other hand, is a system that an organization uses to collect data. The data it collects comes from transactional systems, such as those that hold invoices, purchase records or loan records. The records are taken from their individual points of creation and brought together under one roof, the data warehouse. The data is then reported in aggregated form to help users of the business information make sound decisions. To work effectively, a data warehouse requires a data source, a database and a reporting tool.

A data warehouse can therefore be described as a database used specifically for reporting on and analyzing data. This data comes from the different systems that feed the warehouse.
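
The idea of pooling transactional records and reporting on them in aggregate can be sketched with Python's built-in sqlite3 module. This is only a toy model: the table name facts, its columns, the helper functions and the sample figures are assumptions made for the example, not the design of any real warehouse product.

```python
import sqlite3


def build_warehouse(invoices, purchases, loans):
    """Pool records from separate source systems into one in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (source TEXT, branch TEXT, amount REAL)")
    sources = (("invoice", invoices), ("purchase", purchases), ("loan", loans))
    for source, rows in sources:
        # Tag every record with the system it came from
        conn.executemany(
            "INSERT INTO facts (source, branch, amount) VALUES (?, ?, ?)",
            [(source, branch, amount) for branch, amount in rows],
        )
    conn.commit()
    return conn


def report_by_source(conn):
    """Aggregated report: record count and total amount per source system."""
    cur = conn.execute(
        "SELECT source, COUNT(*), SUM(amount) FROM facts "
        "GROUP BY source ORDER BY source"
    )
    return cur.fetchall()


# Hypothetical records as (branch, amount) pairs, invented for illustration
invoices = [("north", 120.0), ("south", 80.0)]
purchases = [("north", 30.0)]
loans = [("south", 500.0)]

conn = build_warehouse(invoices, purchases, loans)
print(report_by_source(conn))
# [('invoice', 2, 200.0), ('loan', 1, 500.0), ('purchase', 1, 30.0)]
```

Here the three lists stand in for the data sources, the SQLite table for the database, and report_by_source for the reporting tool.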

To accomplish its function, the data warehouse organizes its work in three distinct layers: staging, integration and access. In the staging layer, raw data is stored for use by developers, for the purposes of analysis and support. The integration layer integrates the data and provides a level of abstraction from users of the data. Lastly, the access layer is used for getting data out to the different users.
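
The three layers can be pictured as three small functions that hand data from one to the next. The sketch below is a simplification under assumed names (stage, integrate, access) and an invented "customer, amount" record format; real warehouses implement each layer with dedicated storage and tooling.

```python
def stage(raw_lines):
    """Staging: keep raw records exactly as received from the source systems."""
    return list(raw_lines)


def integrate(staged):
    """Integration: parse raw records into one consistent shape."""
    records = []
    for line in staged:
        customer, amount = line.split(",")
        records.append(
            {"customer": customer.strip().lower(), "amount": float(amount)}
        )
    return records


def access(records, customer):
    """Access: the query users call, which hides how records are stored."""
    return sum(r["amount"] for r in records if r["customer"] == customer.lower())


# Hypothetical raw input with inconsistent spacing and capitalization
raw = ["Alice, 10.50", "BOB,4", " alice ,2.5"]

warehouse = integrate(stage(raw))
print(access(warehouse, "Alice"))
# 13.0
```

Users only ever call access, so the messy raw format handled in the earlier layers never reaches them.
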
Both data mining and data warehousing can be described as tools for gathering business intelligence. The main difference between the two is how the business intelligence is gathered. Data that has been well warehoused is easy to mine and thus to make use of. The data warehouse makes data mining easier by housing all the relevant data at a central location, so that the mining process does not have to keep seeking out data in different locations. This economizes on the time and resources spent on data mining.

Summary

Data mining is the process of extracting patterns from large data sets.
Data warehousing is the process of pooling all relevant data together.
Both data mining and data warehousing are business intelligence collection tools.
Data mining targets specific patterns within the data that has been collected.
Data warehousing is a tool to save time and improve efficiency by bringing together data from different locations and different areas of the organization.
A data warehouse has three layers, namely staging, integration and access.