Biology B02: Intro to biology via bioinformatics
- Fall 2013, Tuesday and Thursday 8:30-10am, with an overflow time on Friday at 9am
- Instructor: Sasha Mikheyev
- First class: 9am, 10 September
- For the first class, please bring your laptop and have a cluster account set up on either tombo or cstone beforehand (email email@example.com for assistance)
The goal of this course will be to introduce key biological concepts using computational biology as a guide. Given the short duration of the course, it will not be a comprehensive introduction; instead it will focus on fundamental concepts that can be dealt with through computational approaches. The course will mix lecture, discussion and problem-solving to introduce and explore the material. At the end of the course, students will present a substantive research-based final project to illustrate their mastery of the material. The ultimate goal of this course will be to provide the student with tools to conduct independent bioinformatic investigation. Note that this course will involve significant homework, which, together with the final project, should be on the order of 20 hours a week, and perhaps substantially more for students without programming experience.
- Do I need a biology background to take this course? The goal of this course is to give an introduction to basic biology, focusing on genetics, genomics and evolution. All the concepts will be introduced during the class through lectures and readings.
- Why should a biologist take this course? Although this course is meant as an introduction to biology, its focus will be highly computational. In fact, the first two weeks are going to focus on computer science and good programming practices, such as version control. If you have limited experience with programming and bioinformatics, this could be a good introduction to both, although you will have to make a large up-front investment in learning basic computer science and, potentially, in working with computers. You can try doing the first week’s homework in advance of the course, which will give you a taste of what you will have to do and whether you feel up to it. For the final project, I will try to partner students without extensive computational background with those who already know how to program.
If you attend class, do the homework, participate in discussion, and hand in an adequate final project, you can expect to get a B grade (80-90%) in the course. For an A grade (90%+), you will be required to hand in a project of high quality (see below on what that means). Missing assignments, or a lack of participation, will rapidly decrease the B grade to a C grade (70-80%), or even lower, so you are encouraged to keep up. Actually, below a C there will be only a failing grade. Really, a C is already a failing grade.
Grade components (each worth 1/3 of the total course grade):
Class participation. The Tuesday lecture will be dedicated to in-class discussion. You will be responsible for preparing assignments by then, and being ready to discuss them in class. The grade for this section will be determined by the quality and quantity of your contribution, and will be at the instructor’s discretion.
- Not saying anything gives you no points. The instructor understands that English is not the first language of many students. Although a lack of English skills may hinder some of the more spontaneous aspects of discussion, this can be overcome by preparing thoughts, statements and formulations in advance. This means that non-fluent English speakers may have to do more work, which is an unfortunate fact of life in science. Your grade will not depend on how well you say things, but on what you say, and whether you say it.
- Attendance and punctuality are integral to class participation. Your grade will be severely penalized for lateness and for missing classes without a prior excuse from the instructor. Each missed class will discount your final participation grade, up to the total, by 4^(missed classes - 1) percent. Arriving in class more than 15 minutes late counts as a missed class; not being there at the start counts as half a missed class. The system has a built-in mulligan, so no post hoc excuses will be accepted. Excepting medical and family emergencies, any authorized absences will have to be approved by the instructor at least two weeks beforehand.
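To make the attendance arithmetic concrete, here is a minimal Python sketch. It rests on several assumptions: that the penalty reads as 4 raised to the power (missed classes - 1), that the k-th missed class deducts 4^(k-1) percent so the deductions add up, and that the total is capped at 100 percent of the participation grade. The function name `participation_penalty` is hypothetical, and half-missed classes are not modeled, because the syllabus does not say how they combine.

```python
def participation_penalty(missed_classes: int) -> int:
    """Percent deducted from the participation grade.

    Assumption: the k-th missed class costs 4**(k - 1) percent, the costs
    accumulate, and the total cannot exceed 100 percent.
    """
    if missed_classes < 0:
        raise ValueError("missed_classes must be non-negative")
    # Sum the per-class deductions: 1, 4, 16, 64, ...
    total = sum(4 ** (k - 1) for k in range(1, missed_classes + 1))
    # "Up to the total": never deduct more than the whole participation grade.
    return min(total, 100)


if __name__ == "__main__":
    for n in range(6):
        print(n, "missed ->", participation_penalty(n), "percent deducted")
```

Under these assumptions a single absence costs 1 percent, while a fourth absence brings the total deduction to 85 percent.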
Homework. The homework should be turned in on time, which means before the start of the class when it is due, as we will use it for discussion. You have 5 days to do each major assignment, all of which will be fairly labor-intensive, so do them over the weekend, and don’t leave them until the last minute. No excuses for late homework will be accepted. Collaboration between classmates is encouraged, but everyone will submit their own code with a distinct implementation of the solution. There are a few problems where coming up with fundamentally distinct solutions will be difficult, but you should try to introduce your own twist.
- The vast majority of the homework assignments will be from Rosalind.info. They are actually really fun, but require a fairly heavy investment into programming skills, if you don’t already have them. In principle, you can learn all of the required skills in the first couple of weeks of class, but it will require a lot of work. However, you will be well-rewarded for your effort. Please check out the web site and the homework assignments.
- To use Rosalind you have to register for an account. Once you have done this, please enroll in this course on Rosalind here. You should submit homework to the course site. Please note that answers submitted to the course site don’t count in the rest of Rosalind, so you may want to post them on the general site as well if you wish to try problems that are outside of the course problem tree.
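For a taste of what a Rosalind-style exercise involves, here is a minimal Python sketch that counts the nucleotides in a DNA string. The function name `count_nucleotides` and the input string are made up for illustration, and whether this particular problem appears in the course problem tree is an assumption.

```python
from collections import Counter
from typing import Tuple


def count_nucleotides(dna: str) -> Tuple[int, int, int, int]:
    """Return the number of A, C, G and T symbols in a DNA string."""
    # Counter returns 0 for symbols that never occur, so no special cases.
    counts = Counter(dna.upper())
    return counts["A"], counts["C"], counts["G"], counts["T"]


if __name__ == "__main__":
    # Illustrative input, not an official dataset.
    a, c, g, t = count_nucleotides("AGCTTTTCATTCTGACTGCA")
    print(a, c, g, t)  # prints: 4 5 3 8
```

Most problems follow this pattern: read a small text dataset, compute an answer, and print it in the exact format requested.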
Final project. This is the most important part of the course, other than following instructions. It is meant to test your creativity and will separate the As from the Bs. The project will require original research, and should be a demonstration of your computational skills and biological understanding. The choice of the topic is up to you, but it should have an obvious connection to biology. Your grade for this section will depend on how innovative, well-posed, and well-executed the project is, as judged by the final presentation.
- Each project will be done by a pair of students, or at most three students, if there is not an even number. The student pairs will be assigned by the instructor in the first two weeks of the course, aiming to match students with complementary abilities (e.g., computational skills and knowledge of biology). Each team member should invest at least 40 hours into the project.
- Although there is no written component for the final project, you will be required to submit your final presentation, and a public GitHub repository with the supporting material/data and documentation necessary to repeat your analysis.
Schedule (under development beyond the first couple of weeks)
|Week||Date||Topic||Homework|
|Week 1||10 Sep, 9am||Introduction; Python|| |
|Week 2||17 Sep||More Python, version control||Python village|
|Week 3||24 Sep||Genome structure|| |
|Week 4||1 Oct||Heredity|| |
|Week 5||8 Oct||Population genetics|| |
|Week 6||15 Oct||Phylogenetics I|| |
|Week 7||22 Oct||Phylogenetics II|| |
|Week 8||29 Oct||Genome evolution I|| |
|Week 9||5 Nov||Genome evolution II|| |
|Week 10||12 Nov||Alignment|| |
|Week 11||19 Nov||Genome assembly and annotation|| |
|Week 12||26 Nov||RNA-seq data analysis and other large data|| |
|Week 13||3 Dec||Final project presentations|| |
Frequently Asked Questions (FAQ)
- Do I have to use Python?
- For biology, Python and Perl have the greatest number of available packages. However, I find Python easier to learn and to read, making it the tool of choice for this course. I will use it to illustrate key concepts in the course using an environment called IPython notebooks, which you will also use to demonstrate solutions to homework problems. So, yes, you have to learn Python. For the final project you can use any language you like.
- How do I install Python on my computer/configure git/get a cluster account/etc.?
- In most cases you can Google this sort of thing, or if it has to do with our HPC infrastructure, check with our IT. Sadly, I do not have the resources to deal with IT issues.