Course Scheduling Algorithm
Nobody knows more about course scheduling algorithm than I do.
During the past summer, I worked at a friend's tech startup and established a course scheduling system. I served as a full stack engineer for this task, learned a lot and want to share some ideas and insights.
- As informed by Professor Madeleine Udell, Bill Gates wrote a class scheduling program for his school in exchange for computer time. As the rumor says, he tweaked the program’s code so that he was placed in classes with mostly female students
- In last September, I was a voluntary teacher teaching English. A local teacher complained to me about the hassle of spending so much time scheduling courses for the entire school. Even with cheap pirated software, he still had to manually adjust schedules in order to satisfy various requirements proposed by other teachers. I immediately realized that integer programming should be quite helpful for the algorithm. However, I forgot most of what I learned in Optimization II, let alone how to transform the data into code
- The course scheduling program can be classified into 2 types: high school (fixed timeslots) or college (flexible timeslots), but I will only talk about the 1st, as the system I developed served high schools
- It can be proved that the decision problem is NP-complete and the optimization problem is NP-hard. Let
mbe the number of slots and
nthe number of schedules. Since each schedule must be assigned to a slot, there are mn possible ways to schedule courses (if
m = 40and
n = 100, then the exponential is about 1.607×10159). Thus, we have to come up with some smart tricks to restrict this astronomical complexity and turn it close to polynomial runtime
# Data Models
Code should be designed around data instead of the other way around. Thus, building sensible data models is crucial. Given existing data models in
course (taught by at most 1 teacher and to at most 1 class), I created a few more models, two most essential of which are
Slot is a timeslot and has the following information:
- start time
- end time
- school day
- free (if false, no
schedulecan be assigned to this
Schedule is 1 weekly session of a course (can be many sessions) and contains following information:
slot(refers if assigned to the
# Algorithm - Integer Programming (IP)
Through careful review over ORIE 3310 course notes, and help of Cornell professors, I was able to realize the algorithm in code. In the following, I will restrict to the math formulations of IP rather than code.
What is integer programming? According to Wikipedia, it is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings, the term refers to integer linear programming (ILP), in which the objective function and the constraints (other than the integer constraints) are linear.
Here is an integer linear program in canonical form:
A linear program in canonical form is similar but omits the integrality constraint.
In the following math formulations, all decision variables are binary: 0 or 1, which is a special case of integer programming, and a usual practice for assignment problems.
# Basic Formulation
Whether assigning a
schedule to a
slot is a binary decision and I need a boolean variable to represent it. For each slot i and schedule j, a binary decision variable
x[i, j] is defined. There are
m * n decision variables and 2mn assignment possibilities, where
m is number of slots and
n number of schedules.
Further, the core constraint is to have no conflict in the timetable, i.e., no more than 1
schedule of each
room in each
slot. To represent these in math:
Boolean Condition: 0 means schedule j is not assigned to slot i, 1 means positive.
for all slots i∈[1..m] and schedules j∈[1..n]
Note that for now, each course is taught by one and only one teacher, so the Unique Course condition can be satisfied by the below Unique Teacher condition.
Unique Teacher: for each slot, at most 1 out of all schedules by a teacher is assigned.
for each i∈[1..m], where T is the set of schedules taught by each teacher
Unique Class: for each slot, at most 1 out of all schedules from a course is assigned.
for each i∈[1..m], where C is the set of schedules of each class
Unique Room: for each slot, at most 1 out of all schedules in a room is assigned.
for each i∈[1..m], where R is the set of schedules in each room
And don't forget! We must assign each schedule in some slot.
for each j∈[1..n]
# Parallel Electives
It should be obvious that for a set of classes (usually a grade), compulsory and elective courses taken by a class should not be scheduled in the same slot. To go one step further, the high school requires the feature "parallel electives", which requires a set of courses enrolled by a set of classes to be assigned at a same slot. A student from these classes can thus only choose at most one out of these courses for this semester.
It is enforced that each schedule can only belong to one parallel elective.
For the math formulation, I will start with an illustrative example. Say we have three courses, A, B, C, and each has k weekly schedules (same number of schedules is recommended but not required). Therefore, schedules A1, ..., Ak, B1, ..., Bk, C1, ..., Ck store the same id of a parallel elective. A, B, C are then zipped and produced k combinations: [A1,B1,C1], ..., [Ak,Bk,Ck]. For each combination in a parallel elective, the schedules are either all scheduled in a slot or not:
for each slot i∈[1..m], where P represents the p-th combination [Ap,Bp,Cp] and p∈[1..k]
# Assigned Schedules
Sometimes the administrator may have assigned some schedules to some slots already, so the algorithm only deals with free schedules. However, we need to satisfy no-conflict constraints with the assigned schedules. There are basically two ways to deal with assigned schedules (say schedule j is assigned to slot i):
- Assign decision variables involving assigned slot and schedule to 1, so that the free schedules automatically satisfy previous constraints. E.g., set x[i,j]=1
- All the other schedules with same teacher, class, room as schedule j should not be assigned at slot i. E.g., if schedule j is a physics course and taught by Albert to class Thoreau, then decision variable of any free schedule that is a physics course or taught by Albert or taught to class Thoreau is set to 0 at slot i.
The 1st way has the number of decision variables equal to that of all schedules but has less constraints. The 2nd way has the number of decision variables equal to that of only free schedules but has more constraints.
# Further Requirements
Clearly, the administrator was not happy enough with the basic requirements above and wanted to minimize manual adjustments. He proposed two additional needs:
- Some teachers cannot teach any course in certain slots
- Assign two consecutive schedules for certain times (max is
floor(# schedules / 2)) per week
The 1st requirement is easy whereas the 2nd is more tricky.
# Slot -> Teacher(s)
teacher t is not free at
for each i∈[1..m], where T is the set of schedules taught by teacher t
# Consecutive Schedules
Say there are 2 schedules j1 and j2 to be assigned to 2 slots i and i+s, where s is the number of school days (usually 5). Thus,
x[i, j_1] and
x[i + s, j_2] should both be set to 1 (product of 2 decision variables) to ensure the consecutive condition. In reality, i can be any
slot ∈[0..m−s], and consecutives is the mapping from a
Here are the constraints:
where C is the set of schedules of each course
And the objective is basically to maximize sum of LHS of constraints.
This formulation works theorectically, but it has nonlienar terms(product of 2 decision variables) in both constraints and objective. It is no longer standard IP, and so requires AddMultiplicationEquality from CP-SAT solver to create a new decision variable for each product of
# More Algorithms
There are other possible ways to solve the course scheduling problem but do not work as well as IP.
# Genetic Algorithm (GA)
Before integer programming, I initially used GA for the course scheduling algorithm, as there are more code using GA on GitHub than IP. After 3 weeks of trial and error, I was finally able to come up with a viable implementation of crossover, mutation, and fitness function. Though it is easy to tweak fitness function as requirements add on, it has so many caveats:
- GAs are not suited for all problems, especially problems which are simple and for which derivative information is available.
- Fitness value is calculated repeatedly which might be computationally expensive for some problems.
- Being stochastic, there are no guarantees on the optimality or the quality of the solution.
- If not implemented properly, the GA may not converge to the optimal solution.
- Stochastic algorithms in general can have difficulty obeying equality constraints.
- GA is sensitive to the initial population used. Wide diversity of feasible solutions is what you want.
Essentially, GA depends heavily on the randomized generated initial population and may be ineffective or inefficient if not operated properly. There is a consensus in academia that GA, though a generic problem solver, is at best the 2nd best way to solve any problem.
Simply put, in terms of course scheduling, GA gives an expedient solution but IP gives an optimal one.
# Manual Simulation
My colleagues remind me of a simple algorithm: simulate the manual way of scheduling courses and tackle by each course/teacher/class. This would gradually reduce the number of free schedules. Though it works for basic cases, it does not work well with more requirements (such as parallel electives and consecutive schedules).
In short, though intuitive, the algorithm does not go much further than manual scheduling. Unfortunately, like GA, no strict math proof can be given to ensure the existence of a feasible solution.
Following is the timeline of implementation of IP.
|early June||spent 3 weeks on GA, but does not work well with large number of schedules|
|late July - mid August||thought about IP formulations, and read the assignment example problem|
|8/14 11:41 a.m.||got 1st answer from IP to satisfy basic requirements|
|8/17 11:00 p.m.||finished parallel electives|
|9/17 3:35 a.m.||converted from MIP to CP-SAT solver|
|9/18 9:42 p.m.||finished consecutive schedules|
My great thanks go to many of my friends, who listened to my vague ideas, challenged them with insightful questions, and inspired me to aim for excellence.
My special thanks go to Professor David Williamson, who elegantly taught ORIE 3310 in spring 2017 at Cornell, patiently answered my ignorant questions in emails and suggested to me the vastly useful Google OR-Tools.
My very special thanks go to my mom, who helped find so many bugs in my initial system without grudges, and constantly inspired me to keep on refining it with support and encouragement.