Diapositiva 1

Data Warehousing, OLAP, Data Mining. S. Costantini, Università degli Studi di L’Aquila. [email protected]

Transcript of Diapositiva 1

Page 1: Diapositiva 1

Data Warehousing, OLAP, Data Mining

S. Costantini
Università degli Studi di L’Aquila

[email protected]

Page 2: Diapositiva 1

S. Costantini / Data Warehousing

Acknowledgment

• Part of this material is taken from: Database Systems: The Complete Book, by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom, published by Prentice Hall.

• URL: http://www-db.stanford.edu/~ullman/dscb.html

Page 3: Diapositiva 1

What is a Data Warehouse, in essence?

• It is a materialized view.
• It is refreshed at set intervals (depending on the application).
• It is a so-called “data integration system”, because it can contain data coming from several databases (called “sources”).
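The idea of a materialized view that is refreshed at intervals can be sketched as follows. This is only an illustration under stated assumptions: the `sales` table, the `sales_by_product` summary table, and the `build_warehouse` function are hypothetical names that do not come from the slides, and two in-memory SQLite databases stand in for a source and a warehouse.

```python
import sqlite3


def build_warehouse(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> None:
    """Rebuild the materialized view from scratch (one periodic refresh)."""
    rows = source.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product"
    ).fetchall()
    warehouse.execute("DROP TABLE IF EXISTS sales_by_product")
    warehouse.execute("CREATE TABLE sales_by_product (product TEXT, total REAL)")
    warehouse.executemany("INSERT INTO sales_by_product VALUES (?, ?)", rows)
    warehouse.commit()


# Hypothetical source database with a few sales.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (product TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("pen", 2.0), ("pen", 3.0), ("ink", 7.5)],
)
source.commit()

warehouse = sqlite3.connect(":memory:")
build_warehouse(source, warehouse)

# A sale recorded after the refresh is not visible in the warehouse yet.
source.execute("INSERT INTO sales VALUES ('pen', 1.0)")
source.commit()
stale = dict(warehouse.execute("SELECT product, total FROM sales_by_product").fetchall())

# The next scheduled refresh brings the warehouse up to date again.
build_warehouse(source, warehouse)
fresh = dict(warehouse.execute("SELECT product, total FROM sales_by_product").fetchall())
```

Between two refreshes the warehouse answers from stored results (`stale`), so it may lag behind the source; `fresh` shows the state after the next refresh.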

Page 4: Diapositiva 1

Why Data Warehouses?

• Because queries that analyze the data statistically and examine it to extract various kinds of information (called “OLAP” queries, see below) are heavy and degrade system performance too much. However, they do not need the most up-to-date version of the data.

Page 5: Diapositiva 1

Why Data Warehouses?

• It therefore pays to separate ordinary queries from OLAP queries, creating a Data Warehouse for the latter.

• The relational model is not optimal for OLAP queries, so the data model is modified when a Data Warehouse is created.

Page 6: Diapositiva 1

Observation

• Traditional database systems are tuned to many, small, simple queries.
• Some new applications use fewer, more time-consuming, complex queries.
• New architectures have been developed to handle complex “analytic” queries efficiently.

Page 7: Diapositiva 1

The Data Warehouse

• The most common form of data integration.
– Copy sources into a single DB (warehouse) and try to keep it up-to-date.
– Usual method: periodic reconstruction of the warehouse, perhaps overnight.
– Frequently essential for analytic queries.

Page 8: Diapositiva 1

OLTP

• Most database operations involve On-Line Transaction Processing (OLTP).
– Short, simple, frequent queries and/or modifications, each involving a small number of tuples.
– Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets.

Page 9: Diapositiva 1

OLAP

• Of increasing importance are On-Line Analytic Processing (OLAP) queries.
– Few, but complex queries; they may run for hours.
– Queries do not depend on having an absolutely up-to-date database.
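The contrast between the two kinds of workload can be made concrete with a small sketch. The `sales` table and its columns are hypothetical, chosen only for illustration, and an in-memory SQLite database stands in for a real system.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, item TEXT, amount REAL)"
)
db.executemany(
    "INSERT INTO sales (region, item, amount) VALUES (?, ?, ?)",
    [("north", "pen", 2.0), ("north", "ink", 7.5), ("south", "pen", 3.0), ("south", "pen", 4.0)],
)

# OLTP-style work: short operations, each touching a small number of tuples.
db.execute(
    "INSERT INTO sales (region, item, amount) VALUES (?, ?, ?)",
    ("north", "pen", 2.5),
)
one_sale = db.execute("SELECT item, amount FROM sales WHERE id = ?", (1,)).fetchone()

# OLAP-style work: one query that scans and aggregates the whole table.
totals = db.execute(
    "SELECT region, item, SUM(amount) FROM sales "
    "GROUP BY region, item ORDER BY region, item"
).fetchall()
```

On a table this small both queries are instantaneous; the difference matters when the aggregate has to scan very many rows while short transactions keep arriving.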

Page 10: Diapositiva 1

OLAP Examples

1. Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer.

2. Analysts at Wal-Mart look for items with increasing sales in some region.

Page 11: Diapositiva 1

Data Warehouses

• Doing OLTP and OLAP in the same database system is often impractical
– Different performance requirements
– Analysis queries require data from many sources

• Solution: Build a “data warehouse”
– Copy data from various OLTP systems
– Optimize data organization, system tuning for OLAP
– Transactions aren’t slowed by big analysis queries
– Periodically refresh the data in the warehouse

Page 12: Diapositiva 1

Common Architecture

• Relational Databases handle OLTP.
• Local databases copied to a central warehouse overnight.
• Analysts use the warehouse for OLAP.

Page 13: Diapositiva 1

Definition of data warehousing

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.

Page 14: Diapositiva 1

Loading the Data Warehouse

[Diagram: Source Systems (OLTP) → Data Staging Area → Data Warehouse]

• Data is periodically extracted from the source systems.
• Data is cleansed and transformed in the data staging area.
• Users query the data warehouse.
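The three stages above (extract, cleanse and transform, load) can be sketched in a few lines. All names here (`extract`, `cleanse_and_transform`, `load`, the `customer` and `amount_cents` fields) are hypothetical, and plain Python lists stand in for the source systems and the warehouse.

```python
def extract(source_rows):
    """Extract: take a snapshot of the rows currently in a source system."""
    return list(source_rows)


def cleanse_and_transform(rows):
    """Staging: drop incomplete rows, normalize names, convert cents to units."""
    staged = []
    for row in rows:
        if row.get("customer") is None or row.get("amount_cents") is None:
            continue  # cleansing: skip rows with missing values
        staged.append(
            {
                "customer": row["customer"].strip().title(),
                "amount": row["amount_cents"] / 100,
            }
        )
    return staged


def load(warehouse, staged_rows):
    """Load: append the staged rows to the warehouse."""
    warehouse.extend(staged_rows)


# Two hypothetical source systems with inconsistent formatting.
source_a = [
    {"customer": " alice ", "amount_cents": 1250},
    {"customer": None, "amount_cents": 300},
]
source_b = [{"customer": "BOB", "amount_cents": 400}]

warehouse = []
for source in (source_a, source_b):
    load(warehouse, cleanse_and_transform(extract(source)))
```

After the loop, `warehouse` holds one consistently formatted row per valid source row, which is what users would then query.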

Page 15: Diapositiva 1

Data Mining

• Data mining is a popular term for queries that summarize big data sets in useful ways.

• Examples:
1. Clustering all Web pages by topic.
2. Finding characteristics of fraudulent credit-card use.

Page 16: Diapositiva 1

Data Warehouse

[Diagram: Enterprise “Database” (Transactions, Customers, Orders, Vendors, etc.) → copied, organized, summarized → Data Warehouse → Data Mining]

Data Miners:
• “Farmers” – they know
• “Explorers” – unpredictable

Page 17: Diapositiva 1

Market-Basket Data

• An important form of mining from relational data involves market baskets = sets of “items” that are purchased together as a customer leaves a store.

• A useful summary of basket data is the frequent itemsets = sets of items that often appear together in baskets.
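As an illustration of frequent itemsets, the sketch below counts, by brute force, how many baskets contain each itemset of a given size and keeps those that reach a support threshold. The function name `frequent_itemsets`, the parameter `min_support`, and the sample baskets are assumptions made for this example; the slides do not prescribe an algorithm, and this naive count is not an optimized method such as A-Priori.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, size, min_support):
    """Return itemsets of the given size that occur in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so that each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(basket)), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical baskets, one set of items per customer checkout.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

pairs = frequent_itemsets(baskets, size=2, min_support=3)
```

With these baskets only the pair (beer, diapers) appears in three baskets, so it is the only frequent pair at `min_support=3`; every other pair appears in two.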

Page 18: Diapositiva 1

Data Mining Flavors

• Directed – Attempts to explain or categorize some particular target field such as income or response.

• Undirected – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.

Page 19: Diapositiva 1

Data Mining Examples in Enterprises

• Government
– Track down criminals (Police also)
– Treasury Dept – suspicious int’l funds transfer
• Phone companies
• Supermarkets & Superstores
• Mail-Order, On-Line Order

Page 20: Diapositiva 1

Data Mining Examples in Enterprises

• Financial Institutions
• Insurance Companies
• Web sites
• Many others…