June 17, 2024

Linear Programming 101 for Data Scientists

Linear Programming 101 for Data ScientistsLinear Programming 101 for Data Scientists
Images from Wikipedia



For those more familiar with the subject, you probably know that the origins of linear programming began roughly around the mid-1950s, and a mathematician by the name of George Dantzig was involved. If that was your guess, you’d be right for the most part, but we all know that attributing credit for many (if not all) scientific and mathematical discoveries is not that simple — there is often more than one individual that contributed to the development of an area of research, and that is certainly the case for linear programming. Initial progress was made in parallel by two mathematicians working independently in the mid-1900s, and credit was therefore certainly due to more than one person.

Without getting into the history too much, let’s try to get a rough sense of the timeline of key advances in linear programming. The first idea for linear programming branched out of Leonid Kantorovich’s intent to reduce costs for his own army while increasing those of his enemy army. His efforts took place during World War 2 in 1939 but was neglected by the USSR at the time. Meanwhile, T.C. Koopmans had a similar idea as Kantorovich but was working on it independently and it was tailored to his own economic applications. A few years later, in 1941, Frank Lauren Hitchcock began working on similar ideas which, again, were tailored to his own transportation problems, but he developed a solution similar to what is now famously known as the simplex method. To cut it short, all three men were on the right track, but by the time the discovery merited a Nobel Prize in Economics, Hitchcock had died, and so Kantorovich and Koopmans took the credit.

Between 1946 and 1947, George B. Dantzig developed an algorithm for the Simplex Method that efficiently tackled linear programming problems in most cases — this was an incredible achievement. Shortly thereafter, Dantzig introduced the theory of duality in linear programming to John Von Neumann, who had been developing a theory of games, and was astonished to find that Dantzig had made progress in an unsolved problem in linear programming. This was very exciting. (Good Will Hunting, anyone? Dantzig’s achievement was actually the inspiration for the storyline in that movie!).


Linear Programming 101 for Data Scientists
Good Will Hunting (Source)


Those areas are now well-studied and used heavily in important real-life applications. Post-WW2, Dantzig’s work has been applied to daily planning applications in many industries, and mathematicians soon made progress in making linear programming solvable in polynomial time. That was some background on its origins (and you can read all about it here, for example), but for now, let us briefly get into more recent advances.



More recently, research in linear programming has focused on developing algorithms that improve computational complexity. This paper, for instance, discusses faster dynamic matrix inverses for faster LPs. (However, it is computer science-heavy, and we do not need to get into it). Overall, there is a lot of research going into mathematical optimization today as a whole, whether it’s to speed up computations, reduce inefficiencies, or introduce new applications in machine learning.



There are countless software packages to support applied linear programming in the industry. You have IBM’s CPLEX, GUROBI proprietary optimization software, open-source Python packages (such SciPy, Pyomo, PuLP, GEKKO), and probably much more. An interesting fact is that all these packages use what we call an Algebraic Modeling Language (AML), a critical paradigm that was developed in the late 1970s. All these packages are great at what they do for different reasons and there are many blog posts you can read for a good comparison between each of them — check out this post, for example.



We won’t get into the details, but let’s talk about what you should know as a user of linear programming theory and methodology. The theory of linear programming is beautiful on its own but is even more so when you can draw the connections between linear programming and linear algebra. Whether you’ve picked up a textbook, took an online introductory course, or learned this formally in school, there’s no doubt that you’ll arrive at the same conclusion that linear programming is a special case of linear algebra, and probably one of the most important and relevant extensions of elementary linear algebra.

No matter what route you’ve chosen to take to learn linear programming, you’ve likely encountered the following (or similar) sequence of topics:

  • Components of a linear program
  • Forms of linear programs
  • Common linear programming problems
  • Duality theory and sensitivity analysis
  • Specialized types of linear programs
  • Applied linear programming

(Does that sound right? If I’ve missed anything, please leave a comment below.) If that’s the case and you’d say you’re familiar with all these topics, then I think you should stop reading here and call it a day (and congratulate yourself because you’ve just been treated to a short history of linear programming!). However, if at least one of the above topics sounds new to you, then I would say you should continue reading this article as I think it’ll be worth your time.



For the sake of keeping things short and consumable, we’re going to skip the first two topics in the sequence, but here is a good resource where you can learn more about the components of a linear program and forms of linear programs. Similarly, we’ll omit discussion of duality theory and sensitivity analysis but some great resources are here and here.

Now, if you’re attempting to apply linear programming principles to solve a problem, there are likely various other classes of models that you have considered for your problem, and you’ve (hopefully reasonably) arrived at the conclusion that a linear programming model would solve your problem. If so, it may be the case that you’ve also thought about some classical problems that use linear programming: The blending problem, The assignment problem, The transportation problem, The traveling salesman problem, and so many more.

If your problem sounds roughly like any one of these problems, then we already know how to solve your problem and you’re good to go. If not, you’ll want to think deeper about your problem and figure out a way to cleverly translate it into a linear programming problem. Consider some questions like:

  • Do you need a LP to solve this problem, or is there a chance that this could be a trivial problem?
  • What is your main objective?
  • Do you only have one objective or are there many?
  • Is it a maximization problem or a minimization problem?
  • What are all the possible constraints you can think of?
  • Are your decision variables all non-negative, or do you need some special kind of specification?



I’ve done tons of explaining for now, so instead of talking some more, I’ll just show you what a “non-standard” applied linear programming problem might look like, in code. If you remember, we said there are many open-source Python packages available which use the AML paradigm. PuLP is one of them, so we’ll use PuLP for now.

This is a semi-hypothetical example with 12 real-life homeless shelters based in the City of Toronto — real-life addresses, real-life buildings, fake number of rooms/beds available (i.e., supply/capacity), fake number of new requests for beds (i.e., demand), fake ‘cluster’ a shelter belongs to. Grim topic, I know. The data was acquired from the city’s Open Data Catalogue.

Code by Author


Suppose we’re looking at any random day’s snapshot of supply and demand for beds at homeless shelters in Toronto. The selection of homeless shelters we’re looking at is strategically made so that we’re looking at clusters dispersed across the city — some shelters are close to others, while some are far from the reach of others. Keep this in mind as this is relevant to the optimization problem we’ll be specifying. Note that demand is at the cluster level and not the shelter level. 

Furthermore, there are costs (both monetary and temporal) associated with each additional bed made available in a shelter, and costs vary — sometimes by shelter, and sometimes by cluster. We’ll define a cluster as a group of shelters that are relatively close to each other — a rough “map” visual of locations relative to each other below.




In addition to monetary cost, let’s say there’s also a time cost associated with opening each additional bed — also sometimes by the shelter, and sometimes by cluster. The reasoning for this time cost is that there might be demand for a bed within the neighborhood/cluster, but the request has been made at a full-capacity location and the service user needs to be relocated.

We’ll set this up as a multi-objective optimization problem so our specification (in code) will need to be able to accommodate that. First, we minimize monetary cost, and then we minimize time costs. The coefficients in the second objective function correspond to the relative time increase within a specific shelter group (i.e., time cost for moving new users from Shelter X to shelter Y). Note that optimization problem specification is not unique and there may be more than one specification that leads to the same results.

Here’s all the code as one long script. The model specification code could be condensed/simplified but we’ll keep it like this so you can see all the details more clearly. We comment out the initial objective function since we’ve added it as a monetary cost constraint in step #5.

Code by Author


If you’ve run this code locally, you’d see the output as follows:

A:2671Islington:      1 additional bed(s).

B1:26Vaughan:         0 additional bed(s).

B2:14Vaughan:         5 additional bed(s).

C:850Bloor:           1 additional bed(s).

D1:38Bathurst:        2 additional bed(s).

D2:38Bathurst:        1 additional bed(s).

E1:135Sherbourne:     6 additional bed(s).

E2:339George:         5 additional bed(s).

E3:339George:         4 additional bed(s).

E4:339George:         0 additional bed(s).

E5:339George:         0 additional bed(s).

E6:339George:         4 additional bed(s).

Optimal Value of Objective Function:  36.9


Your first question might be: Isn’t this a trivial problem? And if not, how come? This is an important question, and it’s one that might justify not specifying a linear programming problem at all! However, in this case, it is a non-trivial problem and we do need a LP specification, think about why.

Hint: First, we minimize monetary costs, and then we minimize time costs. Can you think of a way to solve this problem by hand?



[1] B. Kolman and R.E. Beck, Elementary Linear Programming with Applications (1995), ScienceDirect

[2] G.B. Dantzig, Reminiscences About The Origins of Linear Programming (1982), Department of Operations Research — Stanford University

[3] R. Fourer, D.M. Gay, B.W. Kenighan, A Modeling Language for Mathematical Programming (1990)
Mariam Walaa is a math major with over 3 years of experience as a data scientist in engineering, retail and academia working on a variety of problems ranging from natural language processing and recommendation systems to linear programming and optimization.