# Logistics

Instructor
Kevin Dunn, kevin.dunn@mcmaster.ca (no office on campus)
Class time and location
• Friday afternoons from 14:00 to 17:00 (some classes will run from 16:00 to 19:00)
• We will meet in JHE342.
• First class starts on 9 September and will be a full 3-hour class.

Official description
This course is based around multivariate latent variable models which assume low dimensional latent variable structures for the data. Multivariate statistical methods including Principal Component Analysis (PCA), and Projection to Latent Structures (PLS) are used for the efficient extraction of information from large databases, typically collected by on-line process computers. These models are used for the analysis of process problems, for on-line process monitoring, and for process improvement.
Prerequisites

A basic course in statistics is a definite requirement. You must be comfortable with univariate distributions, data visualization, linear regression and process monitoring. However, these topics are covered again in this course, focusing on their practical application to engineering problems. An excellent understanding of matrix methods is required.

Programming skills of any type (MATLAB, Python, R) are extremely desirable, as we will be manipulating (largish) datasets to extract information.

Course materials

The course website will be permanently available: http://latent.connectmv.com

Course materials, readings, assignments, and datasets will be available from the website. Course announcements will be posted to the main page of the website - students are expected to check the website several times per week.

Required textbook

There is no official course textbook. We will use the instructor's slides for the course. You will supplement these slides with notes from the class. The definitive reference sources will be a variety of journal articles that are listed on the literature website. The instructor will point out which readings correspond to each section of the course.

Other reference texts are generally available in Thode Library.

Course software
Software is obviously critical when dealing with data analysis. However, this course will focus on the methods, in particular on understanding exactly what the methods are doing and how to interpret the results. This means you will be able to use almost any software package in the future. We will use a variety of software packages in the course, but the main one will be ProSensus Multivariate, which has a 1-year academic license.
Course outline

The course is divided into several sections, taught over 12 weeks. The exact outline is still to be announced, but the following topics will be covered.

1. Justification for using multivariate methods
2. Large datasets and different ways to visualize them: sparklines, scatterplot matrices, histograms, box plots
3. PCA: preprocessing, conceptual, geometric and algebraic interpretations
4. Interpreting scores, loadings, SPE, $$T^2$$ and contribution plots
5. PCA: different ways to calculate the PCA model
6. PCA: explaining variance and when to stop: scree plot, $$Q^2$$, and other methods
7. PCA applications: learning from data, troubleshooting, process improvement (e.g. early release of a manufactured product); incorporating first-principles models
8. Regression modelling: OLS, MLR, PCR, introducing PLS
9. PLS: calculating the model, interpreting weights, difference between loadings and weights; cautions regarding empirical models
10. PLS applications: monitoring with a PCA and PLS model; classification: PCA, PLS-DA, PLS
11. Multiblock data sets (models from many data sources): process understanding, process monitoring
12. Time series analysis (process dynamics via lagging) and batch data (how unfolding is just another form of lagging): feature extraction, alignment, missing data imputation
13. Kernel methods for extremely large data sets; model updating with new data (adaptive modelling)
14. The latent variable space: DOEs in the latent space, QSAR, principal properties
15. Process control, product design and optimization in the latent variable space

To assess your understanding of the course materials, the grading for the course will be:

| Component | Fraction | Notes |
|---|---|---|
| Assignments | 20% | Expect around 4 to 6 assignments, to be completed individually. |
| Class participation | 10% | Discussion of assignments and assigned readings, questions, and overall participation are required from all students. |
| Final project | 70% | An in-class presentation and written project report. |

• Readings will be assigned each week, and then discussed in class the following week.
• The final grades will be converted to letter grades using the Registrar's recommended procedure.
• Adjustment to the final grades may be done at the discretion of the instructor.

# Important notes

Class participation
Please bring a laptop to every class. Various software packages will be demonstrated during class time, and it will be to your advantage to follow along with the instructor. The instructor will present the solutions to the assignments in the software (written solutions are not provided), so again it will be to your advantage to follow along.
Out-of-class access
Since the course instructor does not have an office on campus, office hours will be held immediately after class.
Disclaimer
The above outline may be modified, as circumstances change.