``Using data mining, businesses and organizations have discovered hidden knowledge in large databases that yield new customers, greater efficiency, and enormous profits.'' Is this fact or advertising hype? A bit of both? This one-day workshop will introduce and critically examine data mining techniques using real case studies, such as the data mining case study from the 2000 SSC meeting.
Data mining problems are characterized by large amounts of data (both in the number of observations and the number of variables recorded on each observation), application of flexible modelling techniques, and intensive computation. Research fields involved in data mining include statistics, machine learning (artificial intelligence), and databases. From a statistical perspective, many data mining tools could be described as flexible models and methods for exploratory data analysis.
For example, suppose a direct marketing company has a database of potential customers, including information on whether they responded to a direct mailing campaign. Data mining techniques can help answer such questions as ``What customers are most likely to respond to a mailing?'', ``Are there groups (or segments) of customers with similar characteristics or behaviour?'', and ``Are there interesting relationships between customer characteristics?'' The workshop will cover techniques that will probably be familiar to many statisticians, such as logistic regression and hierarchical clustering, as well as others such as neural networks, decision trees, boosting, and interactive dynamic graphics.
By being vendor-neutral, the presentation will separate the hype from the facts. Data mining will be related to recent and possible future research in statistics and computer science.
His research interests include tree models, variable selection in linear models, data mining, Bayesian methods and industrial statistics. He has taught several introductory courses on data mining to people in industry through the Institute for the Improvement in Quality and Productivity and also teaches a graduate level course on topics related to data mining and the computational exploration of data.