Course description
This course takes a hands-on approach to the theoretical and practical aspects of rigorous, valid analysis of multivariate biological datasets using the R environment.
In recent years, data science has become one of the most popular topics across all research fields, biology included, to the point that it is almost impossible to conduct a research project without sound statistical analysis. For this reason, we have tailored this introductory data science course specifically for biologists. Through a series of practical sessions, the course covers the principal concepts and analytical techniques of data science and their applications in biology.
You will develop the core skills required to perform exploratory data analysis (EDA) and robust statistical inference for assessing relationships and evaluating research questions. The course aims to bring participants up to an advanced level in using R tools for data science, with the opportunity to gain significant hands-on experience. It is designed specifically for delegates from a life science background with basic or no prior programming skills.
Suitability - Who should attend?
This training is suitable for postgraduate students and professionals in agrifood and life sciences who want to learn best practices for experimental design and data collection; inspection, manipulation and visualisation of biological datasets; statistical analysis using the R environment; and interpretation of the results.
Outcome / Qualification etc.
On successful completion of this course you should be able to:
- Use R syntax and the R ecosystem to perform data analysis tasks.
- Critically assess the basic principles of different statistical techniques and implement them programmatically.
- Devise statistical methods and integrate them effectively into experimental design protocols.
- Conduct initial exploratory data analysis and manipulate the data to meet required specifications using different data pre-processing techniques.
- Apply exploratory data analysis using unsupervised multivariate analysis methods.
Training Course Content
Core content
- Introduction to programming in R: data structures in R, functions, control statements and loops.
- Data visualisation and EDA using ggplot2 and interactive visualisation libraries.
- Data pre-treatment and quality control: treating missing values, outliers, data smoothing, data transformation and scaling, looking at data distributions, etc.
- Correlation and linear regression.
- Descriptive statistics and inferential statistics, including parametric and non-parametric tests such as the t-test and analysis of variance (one-way, two-way and mixed ANOVA).
- Unsupervised multivariate analysis: implementing methods such as principal components analysis (PCA), hierarchical cluster analysis (HCA) and k-means to uncover inherent patterns and naturally occurring clusters in the dataset.
Course delivery details
The course runs for 2.5 days and consists of lectures and practical workshops, with comprehensive tutorials delivered throughout, covering an introduction to R programming and data science. The course will take place in a computer lab, and delegates will be supported by the tutors and teaching assistants at all times.
Cranfield University
Cranfield is a specialist postgraduate university that is a global leader for education and transformational research in technology and management. We have many world-class, large-scale facilities, including our own global research airport, which offers a unique environment for transformational education...