  • Semester Offered: Spring, Fall
  • Credits: 3

Pre-requisites

Students must have previously completed Genetic Analysis I (01:447:384) or Genetics (01:447:380).

Course Restrictions

This course is limited to Genetics majors. Other students can be added by special permission number pending computer space availability.

Course Description

The main focus of this course is the application of R programming to the analysis of genetic data, particularly “big data” sets with multiple measurements. The primary data sets considered will contain RNA-seq and/or other expression data for many or all genes in a given set of individuals. This course is for junior or senior students who are considering careers at the intersection of life sciences, statistics, and/or computer science, particularly students majoring in Genetics. The course fulfills the laboratory requirement for the Genetics major.

Students will learn how to acquire such data, format it for R, plot the data, and perform statistical analyses. In addition, students will learn how to simulate data under different hypotheses, and how to perform power and sample size calculations for different statistical methods applied to real or simulated data.

Each class consists of a mixture of lecture and computer-based demos and/or exercises, as well as time for students to work on assignments. Guest investigators will frequently make short presentations (in person or by Skype) to illustrate how programming and informatics are critical for their research. The course provides the introductory skills needed to conduct basic computational research in the life sciences, including many aspects of computer programming and data analysis.

Course Goals

The goals of Honors Computational Genetics reflect the learning goals of the Department of Genetics, which include (1) knowledge-specific goals: knowing the terms, concepts, and theories in genetics; and (2) integrating material from multiple courses and research. The specific goals of this course are (1) to learn R programming, specifically methods for acquisition and analysis of big data from genomics repositories; (2) to discover online repositories for genomic data sets; (3) to learn the fundamentals of statistical analysis for such data sets; (4) to perform empirical type I error and power evaluations for different statistics applied to expression data by writing R programs that simulate data using mathematical models; and (5) to learn the fundamentals of experimental design for expression-data statistics.

Core Curriculum Learning Goals Met by this Course: Info Tech & Research [ITR]

  • Goal Y: Employ current technologies to access information, to conduct research, and to communicate findings.
  • Goal Z: Analyze and critically assess information from traditional and emergent technologies.     

Course Materials

The computer lab has Windows 8 computers. After each class, copy class materials and files to a portable USB flash drive (Windows-formatted) so that you can continue working at home. No textbook is required, as most of the needed material is made available during class. If you prefer to have a printed book on hand, a useful resource is:

The Art of R Programming: A Tour of Statistical Software Design, 1st Edition, by Norman Matloff

Exams, Assignments, and Grading Policy

Attendance is expected at all classes; in-class demos and exercises are an integral part of this class, and it is difficult to make up work when class is missed. If you must miss a class, please use the University absence reporting website to indicate the date and reason for your absence. An email is automatically sent to the instructors. Completion of all assignments is required, including any that were missed due to absence from class.

Students will be assigned weekly projects based on current material. The final grade is based on the grades received on these projects, quizzes, and a final exam.

Course Closed?

If this course is closed, please use the following link to add your name to the wait list: Wait List Sign Up for Fall 2022 Courses. If you have any questions, please contact the Department of Genetics Undergraduate Education Office in Nelson Biological Laboratories Room B416 or call 848-445-1146.

Faculty:

Dr. Premal Shah

848-445-9664                                        

LSB 326                                                       


** All information is subject to change at the discretion of the course coordinator.