Education & Outreach – CAGEF Centre for the Analysis of Genome Evolution & Function

CAGEF contributes to undergraduate and graduate student experiences at the University of Toronto through active involvement in courses and collaborative programs.

COURSE DESCRIPTIONS

Below you’ll find a description of our current course offerings.

INTRODUCTION TO R

Lecturer: Dr. C. Mok

This is a beginner’s introduction to R and the Jupyter Notebook environment for individuals with no prior programming experience or background. Individuals who complete the course will be able to:

Work with the Jupyter Notebook environment and navigate the R programming language.
Understand data structures and data types.
Import data into R and manipulate it in tabular form.
Transform ‘messy’ datasets into ‘tidy’ datasets.
Make exploratory plots as well as publication-quality graphics.
Use string searching and manipulation to clean data.
Perform basic statistical tests and run a regression model.
Use flow control and build branching code.

Each class will consist of a short introductory section followed by ‘code-along’ hands-on learning that will gradually build up the lecture’s topic(s). At the conclusion of this course, students will have the skills to import, curate and format their own data, as well as perform exploratory data analysis to produce statistical models and plots of their results.

INTRODUCTION TO PYTHON

Lecturer: Dr. C. Mok

This is a beginner’s introduction to Python and the Jupyter Notebook environment for data science applications. The course is intended for students with no computer science background who want to develop the skills needed to analyze their own data. Individuals who complete this course will be able to:

Perform data analysis in Python using the Jupyter Notebook environment.
Understand Python data structures and data types.
Manipulate Python objects such as lists, data frames, and dictionaries.
Import data into Python and transform ‘messy’ datasets into ‘tidy’ datasets.
Use flow control to develop branching code.
Use regular expressions and string manipulation to explore and clean data.
Make exploratory plots.

FUNDAMENTALS OF GENOMIC DATA SCIENCE

Lecturer: Dr. C. Mok

The rise of next-generation genomics has changed the way we think about, study, and employ genetic data, enabling applications that were, until recently, merely the stuff of science fiction. These advances have dramatically increased both the size and scope of biological datasets, and consequently, increased the need for basic computational literacy for nearly all biologists.

This course is designed to serve as an introduction to genomic data science for students who do not have a background in bioinformatics. Students in the course will learn to perform several basic genomic data analyses using Galaxy, an open, web-based platform that brings together multiple bioinformatics tools in a user-friendly graphical user interface (GUI). Students will then learn to scale up these genomic analyses using the Unix command line to tackle larger and more complex datasets. During the course, students will learn how to:

Use Galaxy and command-line tools to process and manipulate data.
Use the Integrative Genomics Viewer (IGV) to visualize genomes.
Work in a Unix terminal.
Install bioinformatics software.
Connect to and work on remote servers.
Understand common genomics file formats.
Perform de novo genome assembly, reference-based genome assembly, genome annotation, variant calling, and RNA-seq data analysis.

The course will take advantage of online resources for background material, while spending class time analyzing real datasets. Students are expected to have a basic understanding of genomics and molecular biology, but no prior computational knowledge is required.

Each class will consist of a short introductory section followed by ‘code-along’ hands-on learning that will gradually build up the lecture’s topic(s). At the conclusion of this course, students will have the skills to source, import, and quality-check large datasets while using available software tools to perform pipeline analyses to summarize their results.

ADVANCED DATA VISUALIZATION IN R

Lecturer: Dr. C. Mok

This is an intermediate- to advanced-level introduction to R and the packages used to visualize large or complex datasets. Participants are strongly encouraged to have prior experience in R (e.g., Introduction to R, CSB1020). Individuals who complete the course will be able to manipulate and prepare large datasets to produce publication-quality graphics. The goal of this course is to introduce the proper use, interpretation, and production of simple, popular, and complex data visualizations. Topics will include:

A deep dive into building clear, informative figures with the popular ggplot2 package.
Analysis and visualization of large datasets from differential expression experiments.
Popular visualization methods and packages for gene and genome analysis.

Each class will consist of a short introductory section followed by ‘code-along’ hands-on learning that will gradually build up the lecture’s topic(s). A homework assessment will be assigned after each class to reinforce the skills learned. At the conclusion of the course, students will have the skills to interpret and produce high-quality visualizations that can properly convey their results and conclusions to their peers and the public.

METHODS IN GENOMICS AND PROTEOMICS

Lecturer: Dr. P. Wang

This is an intensive and rigorous laboratory course that teaches students how to produce and analyze data central to the fields of genomics and proteomics. The course is divided into three modules: the first focuses on genomics, the second on transcriptomics, and the third on proteomics. Each module begins with at least two wet labs, where students generate data, and ends with computer labs, where students analyze those data. In this way, students learn how to conduct an experiment from beginning to end. Techniques taught include:

DNA and RNA extraction.
Shotgun library construction.
PCR and DNA sequencing.
Expression profiling using microarrays, 2D-gel proteome analysis, and mass spectrometry.
Associated bioinformatics analyses such as sequence analysis and assembly, and statistical analysis of microarray and mass spectrometry data.

This is an advanced laboratory and computer-based course and assumes a strong background in molecular genetics and some prior laboratory experience. It is most appropriate for students wishing to pursue careers involving biological research. This course is open to both graduate and undergraduate students.

Online Education

BIOLOGY MEETS PROGRAMMING: BIOINFORMATICS 101 FOR NGS RESEARCHERS

It’s time to integrate next-generation sequencing (NGS) techniques into your lab experiments, but where do you start? Watch this recorded webinar to learn from CAGEF director Dr. David Guttman and Dr. Alberto Riva from the University of Florida.

Learn about the bioinformatics approaches used for the variety of tasks involved in NGS
Gain a better understanding of the pipelines used to analyze NGS data
Acquire new tools for more robust design of NGS experiments

Presented by AAAS

Sponsorships & Outreach

CAGEF actively sponsors a range of student activities and external organizations. Outreach programs focus on increasing CAGEF brand awareness within the greater scientific community by maintaining a presence both within the University of Toronto and at relevant national and international conferences.

We sponsored projects for the International Genetically Engineered Machine (iGEM) competition from 2009 to 2025.

The CAGEF-sponsored “Encapsulator” iGEM 2009 project won a bronze medal.
The CAGEF-sponsored “MystiPhage” iGEM 2025 project won a gold medal and Best Therapeutics Project.

We sponsored TorBUG (Toronto Bioinformatics User Group) from 2012 to 2020 (every year that it has run!).

CAGEF has also been a long-time supporter and sponsor of the Collaborative Graduate Program in Genome Biology and Bioinformatics.

We participated in Science Rendezvous 2018 with a fun and educational child-friendly booth introducing basic concepts in microbiology.

Conferences

Over the past 15 years, CAGEF has exhibited at a variety of conferences, including:

SynBio

Plant and Animal Genome

Canadian Society of Microbiologists

American Society of Plant Biologists

International Conference on Arabidopsis Research

International Plant Growth Substances Association

Great Lakes Bioinformatics and Canadian Computational Biology Conference

Canadian Society of Plant Biologists Eastern Regional Meeting

Canadian Plant Genomics Workshop

International Conference on Systems Biology

Molecular Biology of Plant Pathogens

Notably, CAGEF partnered with Illumina on two separate occasions to offer a Research Accelerator Award of up to $10,000 in Illumina sequencing at CAGEF.