Course Descriptions

Analytical Methods in Biology (694:230)

This course may also be used to fulfill the elective requirements of the Biological Sciences major.


Offered:  Spring  T/H4 (Tues/Thurs 1:40 - 3:00 PM) Nelson Hall A237

Credits:  3

Prerequisites:  General Biology (01:119:116)  and  Calculus I (01:640:135 or 01:640:151)

If the twentieth century was the century of physics, the twenty-first is likely to be the century of biology. Sequencing technologies have begun to tease apart the blueprint of life, allowing us to study how DNA, RNA and proteins regulate body tissues and functions across all three domains of life. Exciting discoveries are being made by applying data mining methods (including what is popularly called “deep learning”) to large genetic and genomic datasets. Such studies have opened up novel fields of research for biologists, clinicians, physicists, computer scientists, mathematicians, chemists and engineers, allowing them to collaborate on new discoveries. This course is intended for students who want to learn the techniques needed to work in these emerging areas of research. The goal is to introduce students to the ideas in the field and provide them with the methods and tools they need to analyze both experimental (small) and high-throughput (big) data.
   
We will use Matlab as a computational tool. Students should download and install Matlab from https://software.rutgers.edu/, logging in with their NetID and password. After logging in, type “Matlab” into the search tab at the top right to find the software link. A MathWorks account may be needed to get access.

Summary of Course: The class will begin with some of the mathematical methods used in biology for the analysis of experimental data. The topics covered will include Probability Theory, the Theory of Distributions and Moments, the Central Limit Theorem, Linear and Non-Linear Regression, Parametric and Non-Parametric Tests of Significance, and Analysis of Variance (ANOVA).

After this, we will go over some basic biology background and then study the mathematical underpinnings of the modern theory of Genetics, which is necessary to make sense of high-throughput sequencing data. This will include a study of Mutations and Drift, Hardy-Weinberg Theory, and the mathematical theory of Recombination and Selection in the evolution of organisms.
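
For orientation, the Hardy-Weinberg result named above can be stated in a few lines. This is the standard textbook form for one locus with two alleles, and it may differ in notation from the lecture notes; the symbols p, q, A and a are assumptions introduced here for illustration.

```latex
% Assumed setup: one locus, two alleles A and a, with frequencies p and q = 1 - p.
% Assumed conditions: random mating, no selection, mutation, migration or drift.
\begin{align*}
  \text{freq}(AA) &= p^2, \qquad \text{freq}(Aa) = 2pq, \qquad \text{freq}(aa) = q^2, \\
  p^2 + 2pq + q^2 &= (p + q)^2 = 1, \\
  p' &= p^2 + \tfrac{1}{2}(2pq) = p(p + q) = p .
\end{align*}
```

The last line says that, under these assumptions, the allele frequency in the next generation (p') equals the current one, so genotype frequencies stay at equilibrium.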

Following this, we will study methods useful in the analysis of large datasets, such as Sequence Alignment, Phylogenetic Analysis, and Clustering Methods: k-means clustering, Principal Component Analysis (PCA), t-SNE and non-negative matrix factorization. We will then apply these techniques to the analysis of high-throughput RNA and DNA sequencing data from a variety of sources. Given the ongoing Covid-19 pandemic, we will then discuss zoonotic diseases such as flu, HIV, SARS and MERS, develop the SIR Model of Pandemics, and apply it to Covid-19 data on cases and deaths to understand the course and progression of this pandemic since its beginnings in the winter of 2019.
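
As a preview, the SIR model mentioned above is commonly written as the system below. This is the standard textbook form and not necessarily the exact formulation developed in class; the symbols S, I, R, N and the rate parameters β and γ are assumptions introduced here for illustration.

```latex
% Assumed notation: S, I, R = susceptible, infected, recovered (or removed) counts;
% N = S + I + R is the total population, treated as constant;
% beta = transmission rate, gamma = recovery rate.
\begin{align*}
  \frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
  \frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I, \\
  \frac{dR}{dt} &= \gamma I, \\
  R_0 &= \frac{\beta}{\gamma} .
\end{align*}
```

The three right-hand sides sum to zero, so N is conserved. At the start of an outbreak, when nearly everyone is susceptible, the number of infections grows if the ratio R_0 is greater than 1.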

Finally, if time permits, we will study more advanced techniques such as Neural Networks, Monte Carlo Simulations & Evolutionary Game Theory. 

All the methods and ideas presented will be developed using concrete examples of how they apply to actual biological phenomena and will make extensive use of the programming language Matlab as a computational tool. Students will be taught how to solve many problems by writing Matlab code.  

Text: Lecture notes that cover all course content will be provided a week in advance of each class. The textbook for the statistical analysis portion of the course will be “Mathematical Statistics and Data Analysis” by John A. Rice. A pdf copy of this book will be made available to students before the first day of classes. Some topics will have assigned reading from the relevant literature, which will be provided as pdf files.

Plans for Remote Instruction:

    • Detailed notes for the two lectures in each week will be provided to all students on the Friday before that week. The students will be expected to have read the lecture notes before coming to class.   
    • Each 80-minute class will start with a 45-minute formal presentation by the instructor on the content of the lecture notes.
    • For the remaining 35 minutes, the students will do a worksheet covering the material presented. This worksheet will be made available just before each class and should be easy to do if the students have read the lecture notes and attended the presentation earlier in the class. The instructor will be happy to answer questions while the students work on the worksheet. The purpose of the worksheets is to help the students absorb the key features of the lecture. Solutions to the worksheets must be e-mailed to the instructor within half an hour of the end of the class. These worksheet solutions will count for 30% of the grade.
    • Class sessions will be recorded on video and made available to the students. 
    • Homework on the material covered during the week will be posted on Friday and will be due one week later. The students may be paired into dynamic work-groups, which will change each week. Each group will submit one solution for each homework. Each student will also grade the other member of the group on their degree of participation in the work-group. The homework will count for 30% of the grade.
    • In addition to these instruction sessions, there will be two 2-hour periods of online office hours during which students can call in and ask questions.
    • Additional individual contact times to address other student concerns will be by arrangement.
    • Students will be required to write a term paper on a topic they will choose from a list provided by the instructor after the class has been in session for 6 weeks. This will count as 10% of the grade.
    • There will be one mid-term and one final – both will be multiple-choice with a strict time limit. Each will be 15% of the grade.

Weeks 1-4

  • Introduction to Probability and Bayes' Theorem, Random Variables; Expected Value and Variance
  • Distribution Theory - Binomial, Poisson, Bernoulli & Geometric Distributions
  • Matlab Tutorial - Demonstration of Central Limit Theorem.
  • Parametric Tests of Significance based on the Central Limit Theorem (t-test, F-test, ANOVA)
  • Non-parametric tests of significance

Weeks 5-9

  • Bio Intro – the Genetic Code, Mutation and Drift, Hardy-Weinberg Theory
  • The role of recombination and selection in the evolution of organisms.
  • Introduction to Viruses - Flu, HIV, SARS, MERS and Zoonotic diseases
  • Analytical Modeling – The SIR Model of Pandemics - Modeling Covid-19 data
  • Monte Carlo Simulations
  • Mid-term and assignment of term paper topics

Weeks 10-14

  • Sequence Alignment and Phylogenetics
  • Clustering Methods: k-means clustering, PCA, t-SNE and non-negative matrix factorization methods.
  • Analysis of Genetic and Genomic cancer data using the techniques learned.
  • Neural Networks
  • Evolutionary Game Theory
  • Collection of Term Papers and Final

Course URL: Canvas

Course satisfies Learning Goals

MBB Departmental Learning Goals: 1, 2, and 3

Course Learning Goals:  The overall goal of this course is to give the students the mathematical tools and programming skills necessary to analyze and interpret biological and biomedical data correctly and with confidence. 

Exams, Assignments, and Grading Policy

    • Worksheets: 30% of grade
    • Homework: 30% of grade
    • Term paper: 10% of grade
    • Midterm: 15% of grade
    • Final: 15% of grade

Course Materials

1. Notes for each lecture will be handed out at the end of the class. If you attend all classes and pay attention, these should be sufficient.
2. Textbook for statistical methods: Mathematical Statistics and Data Analysis, Second Edition, by John A. Rice. The book is available as a DjVu document and will be emailed to all students. You will need to install a DjVu viewer on your computer to read it.
3. Textbook for Matlab (recommended but not required): Computational Statistics Handbook with MATLAB, Second Edition, by Wendy L. Martinez and Angel R. Martinez (Chapman & Hall/CRC Computer Science & Data Analysis, 2007).
4. Software: Please make sure you have MATLAB installed on your computer, including the MATLAB Statistics Toolbox. 

Course Closed?  Enrollment is limited to 50 students.

 Faculty:  Course Coordinator. Phone: 848-391-7508. Office hours: by arrangement


 

** All information is subject to change at the discretion of the course coordinator.

 

Academic Integrity:

Students are expected to maintain the highest level of academic integrity. You should be familiar with the university policy on academic integrity (http://academicintegrity.rutgers.edu/academic-integrity-policy/). Violations will be reported and handled according to this policy.

Use of external sources to obtain solutions to homework assignments or exams is cheating and a violation of the University Academic Integrity policy. Cheating in the course may result in penalties ranging from a zero on an assignment to an F for the course, or expulsion from the University.  Posting of homework assignments, exams, recorded lectures, or other lecture materials to external sites without the permission of the instructor is a violation of copyright and constitutes a facilitation of dishonesty, which may result in the same penalties as explicit cheating.

Not only does the use of such sites violate the University’s policy on Academic Integrity, it also interferes with the learning you are paying tuition for. Assignments, quizzes, and exams are given not simply to assign grades, but to promote the active learning that occurs through completing assignments on your own. Getting the right answer is much less important than learning how to get the right answer. This learning is critical to your success in subsequent courses and in your career.