Course Scheduling Algorithm
During the past summer, I worked at a friend's tech startup and established a course scheduling system. I served as a full stack engineer for backend, frontend, and algorithm. In the following I will share some ideas and insights on the algorithm.
# Baloney
- As informed by Professor Madeleine Udell (opens new window), Bill Gates (opens new window) wrote a class scheduling program for his school in exchange for computer time. As the rumor (opens new window) says, he tweaked the program’s code so that he was placed in classes with mostly female students
- While I was a voluntary teacher teaching English in last September, a local teacher complained to me about the hassle of spending so much time scheduling courses for the entire school. Even with cheap pirated software, he still had to manually adjust schedules in order to satisfy various requirements proposed by other teachers. I immediately realized that integer programming should be quite helpful for the algorithm. However, I forgot most of what I learned in Optimization II, let alone how to transform the data into code
- The course scheduling program can be classified into 2 types: high school (fixed timeslots) or college (flexible timeslots), but I will only talk about the 1st, as the system I developed served high schools
- It can be proved (opens new window) that the decision problem is NP-complete and the optimization problem is NP-hard. Let
m
be the number of slots andn
the number of schedules. Since each schedule must be assigned to a slot, there are possible ways to schedule courses (ifm = 40
andn = 100
, then is about ). Thus, we have to come up with some smart tricks to restrict this astronomical complexity and turn it close to polynomial runtime
# Data Models
Code should be designed around data instead of the other way around. Thus, building sensible data models is crucial. Given existing data models in teacher
, class
, course
(taught by at most 1 teacher and to at most 1 class), I created a few more models, two most essential of which are slot
and schedule
.
Slot
is a timeslot and has the following information:
- start time
- end time
- school day
- free (if false, no
schedule
can be assigned to thisslot
)
Schedule
is 1 weekly session of a course (can be many sessions) and contains following information:
- course
- teacher
- class
- room
slot
(refers if assigned to theslot
)
# Integer Programming (IP)
Through careful review over ORIE 3310 course notes, and help of Cornell professors, I was able to realize the algorithm in code. In the following, I will restrict to the math formulations of IP rather than code.
What is integer programming? According to Wikipedia (opens new window), it is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings, the term refers to integer linear programming (ILP), in which the objective function and the constraints (other than the integer constraints) are linear.
Here is an linear program in canonical form:
An integer linear program (ILP) has the additional integrality constraints: .
A mixed integer linear program (MILP) has some variables constrained to be integers, and others not.
In the following math formulations, all decision variables are binary: or , a special case of integer programming, and a usual practice for assignment problems (opens new window).
# Complexity
LPs can be solved efficiently (in polynomial time), but ILPs and MILPs are NP-complete due to additional constraints.
# Basic Formulation
Whether assigning a schedule
to a slot
is a binary decision and I need a boolean variable to represent it. For each slot and schedule , a binary decision variable x[i, j]
is defined. There are m * n
decision variables and assignment possibilities, where m
is number of slots and n
number of schedules.
Further, the core constraint is to have no conflict in the timetable, i.e., no more than schedule
of each course
, teacher
, class
, or room
in each slot
. To represent these in math:
Boolean Condition: means schedule is not assigned to slot , means positive.
for all slots and schedules
Note that for now, each course is taught by one and only one teacher, so the Unique Course condition can be satisfied by the below Unique Teacher condition.
Unique Teacher: for each slot, at most 1 out of all schedules by a teacher is assigned.
for each , where is the set of schedules taught by each teacher
Unique Class: for each slot, at most 1 out of all schedules from a course is assigned.
for each , where is the set of schedules of each class
Unique Room: for each slot, at most 1 out of all schedules in a room is assigned.
for each , where is the set of schedules in each room
And don't forget! We must assign each schedule in some slot.
for each
# Parallel Electives
It should be obvious that for a set of classes (usually a grade), compulsory and elective courses taken by a class should not be scheduled in the same slot. To go one step further, the high school requires the feature "parallel electives", which requires a set of courses enrolled by a set of classes to be assigned at a same slot. A student from these classes can thus only choose at most one out of these courses for this semester.
It is enforced that each schedule can only belong to one parallel elective.
For the math formulation, I will start with an illustrative example. Say we have three courses, , , , and each has weekly schedules (same number of schedules is recommended but not required). Therefore, schedules , ..., , , ..., , , ..., store the same id of a parallel elective. , , are then zipped and produced combinations: , ..., . For each combination in a parallel elective, the schedules are either all scheduled in a slot or not:
for each slot , where represents the -th combination and
# Assigned Schedules
Sometimes the administrator may have assigned some schedules to some slots already, so the algorithm only deals with free schedules. However, we need to satisfy no-conflict constraints with the assigned schedules. There are basically two ways to deal with assigned schedules (say schedule is assigned to slot ):
- Assign decision variables involving assigned slot and schedule to , so that the free schedules automatically satisfy previous constraints. E.g., set
- All the other schedules with same teacher, class, room as schedule should not be assigned at slot . E.g., if schedule is a physics course and taught by Albert to class Thoreau, then decision variable of any free schedule that is a physics course or taught by Albert or taught to class Thoreau is set to at slot .
The 1st way has the number of decision variables equal to that of all schedules but has less constraints. The 2nd way has the number of decision variables equal to that of only free schedules but has more constraints.
# Further Requirements
Clearly, the administrator was not happy enough with the basic requirements above and wanted to minimize manual adjustments. He proposed two additional needs:
- Some teachers cannot teach any course in certain slots
- Assign two consecutive schedules for certain times (max is
floor(# schedules / 2)
) per week
The 1st requirement is easy whereas the 2nd is more tricky.
# Slot -> Teacher(s)
Say teacher
is not free at slot
:
for each , where is the set of schedules taught by teacher
# Consecutive Schedules
Say there are 2 schedules and to be assigned to 2 slots and , where is the number of school days (usually ). Thus, x[i, j_1]
and x[i + s, j_2]
should both be set to (product of 2 decision variables) to ensure the consecutive condition. In reality, can be any slot
, and is the mapping from a course
to # consecutives
.
Here are the constraints:
where is the set of schedules of each course
And the objective is basically to maximize sum of LHS of constraints.
This formulation works theorectically, but it has nonlienar terms(product of 2 decision variables) in both constraints and objective. It is no longer standard IP, and so requires AddMultiplicationEquality (opens new window) from CP-SAT solver (opens new window) to create a new decision variable for each product of x
.
# Heuristic Algorithms
Following are some heuristic algorithms that I know of and may also solve the course scheduling problem but undoubtedly inferior to IP.
# Genetic Algorithm (GA)
Before integer programming, I initially used GA for the course scheduling algorithm, as there are more code using GA on GitHub than IP. After 3 weeks of trial and error, I was finally able to come up with a viable implementation of crossover, mutation, and fitness function. Though it is easy to tweak fitness function as requirements add on, it has so many caveats (opens new window):
- GAs are not suited for all problems, especially problems which are simple and for which derivative information is available.
- Fitness value is calculated repeatedly which might be computationally expensive for some problems.
- Being stochastic, there are no guarantees on the optimality or the quality of the solution.
- If not implemented properly, the GA may not converge to the optimal solution.
- Stochastic algorithms in general can have difficulty obeying equality constraints.
- GA is sensitive to the initial population used. Wide diversity of feasible solutions is what you want.
Essentially, GA depends heavily on the randomized generated initial population and may be ineffective or inefficient if not operated properly. There is a consensus in academia that GA, though a generic problem solver, is at best the 2nd best way to solve any problem.
Simply put, in terms of course scheduling, GA gives an expedient solution but IP gives an optimal one.
# Manual Simulation
My colleagues remind me of a simple algorithm: simulate the manual way of scheduling courses and tackle by each course/teacher/class. This would gradually reduce the number of free schedules. Though it works for basic cases, it does not work well with more requirements (such as parallel electives and consecutive schedules).
In short, though intuitive, the algorithm does not go much further than manual scheduling. Unfortunately, like GA, no strict math proof can be given to ensure the existence of a feasible solution.
# Timeline
Following is the timeline for implementation of algorithms: GA and IP.
Time | Description |
---|---|
early June | spent 3 weeks on GA, but does not work well with large number of schedules |
late July - mid August | thought about IP formulations, and read the assignment example problem (opens new window) |
8/14 11:41 a.m. | got 1st answer from IP to satisfy basic requirements |
8/17 11:00 p.m. | finished parallel electives |
9/17 3:35 a.m. | converted from MIP to CP-SAT solver |
9/18 9:42 p.m. | finished consecutive schedules |
Other times I was doing backend and frontend development.
# Thanks
My great thanks go to many of my friends, who listened to my vague ideas, challenged me with insightful questions, and inspired me to pursue excellence.
My special thanks go to Professor David Williamson (opens new window), who elegantly taught ORIE 3310 in spring 2017 at Cornell, patiently answered my ignorant questions in emails and suggested to me the extremely useful Google OR-Tools (opens new window).
My very special thanks go to my mom (opens new window), who persistently helped find so many bugs in my initial system without grudges, and constantly encouraged me to keep on refining it.