Chapter 1: Background Knowledge

Course: Convex Optimization — Through the Lens of Algorithms and Applications Chapter 1

This chapter provides the foundational concepts needed for the rest of the course. It covers four core topics: convex sets and convex functions, polytopes and polyhedra, cones, and linear programming. These are the building blocks on which the entire theory of convex optimization rests. The chapter is designed as a review — if any of these concepts are entirely new, supplementary reading is recommended.

1.1 Convex Sets and Convex Functions

The notion of convexity — for both sets and functions — is the single most important concept in this course. Everything that follows depends on it. The core idea is simple: a shape is convex if you can "walk" between any two of its points without leaving the shape.

Line Segments and Convex Sets

Definition 1.1.1 — Line Segment

Let $p \neq q$ be two points in $\mathbb{R}^n$. The line segment between them is the set:

$$[p, q] = \{ \lambda p + (1-\lambda) q \mid 0 \leq \lambda \leq 1 \}$$

As $\lambda$ varies from 0 to 1, the point $\lambda p + (1-\lambda) q$ slides smoothly from $q$ (when $\lambda=0$) to $p$ (when $\lambda=1$).

Definition 1.1.2 — Convex Set

A set $S \subseteq \mathbb{R}^n$ is convex if for all $p, q \in S$, the entire line segment $[p, q] \subseteq S$.

Intuition & Examples

Think of a convex set as a "blob with no dents." If you pick any two points inside it and draw a straight line between them, the line stays entirely inside the set. A filled circle, a filled square, and $\mathbb{R}^n$ itself are all convex. A crescent moon shape is not convex — you can find two points inside the crescent such that the line between them passes through the "hollow" part.
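This "walk between any two points" test is easy to probe numerically. The sketch below (plain Python; the membership functions `in_disk` and `in_crescent` are illustrative stand-ins, not part of the text) samples points along the segment $[p, q]$ and checks that each stays in the set:

```python
def in_disk(p):
    """Membership in the closed unit disk, a convex set."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def in_crescent(p):
    """The unit disk minus an overlapping smaller disk: a non-convex 'crescent'."""
    return in_disk(p) and (p[0] - 0.5) ** 2 + p[1] ** 2 > 0.36

def segment_stays_inside(member, p, q, steps=100):
    """Sample lambda*p + (1-lambda)*q for lambda in [0, 1] and test membership."""
    for i in range(steps + 1):
        lam = i / steps
        pt = (lam * p[0] + (1 - lam) * q[0], lam * p[1] + (1 - lam) * q[1])
        if not member(pt):
            return False
    return True

p, q = (0.5, 0.8), (0.5, -0.8)                  # both points lie in the crescent
print(segment_stays_inside(in_disk, p, q))      # True: the disk is convex
print(segment_stays_inside(in_crescent, p, q))  # False: the segment crosses the hollow
```

The disk passes for every pair of points; the crescent fails for the pair shown because the segment passes through the hollow part.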

Properties of Convex Sets

Proposition 1.1.4 — Preservation of Convexity

Let $S, T \subseteq \mathbb{R}^n$ be convex sets. Then:

  1. Intersection: $S \cap T$ is convex.
  2. Affine image: For $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, the set $\{Ax + b \mid x \in S\}$ is convex.
  3. Minkowski sum: $S \oplus T := \{x + y \mid x \in S, y \in T\}$ is convex.

Why intersections matter

In optimization, the feasible region is usually described as an intersection of constraint sets (e.g., half-spaces). The fact that intersecting convex sets produces a convex set is what guarantees that these feasible regions remain convex — a crucial requirement for efficient optimization.

Combinations and Hulls

Definition 1.1.5 — Affine, Conic, and Convex Combinations

Given points $a_1, \ldots, a_k \in \mathbb{R}^n$ and scalars $\lambda_1, \ldots, \lambda_k$, the linear combination $\sum_{i=1}^{k} \lambda_i a_i$ is called:

  • Affine combination if $\sum_i \lambda_i = 1$ (the weights sum to one, but can be negative).
  • Conic combination if $\lambda_i \geq 0$ for all $i$ (non-negative weights, no sum constraint).
  • Convex combination if both: $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$ (non-negative weights that sum to one).

Intuition

A convex combination is a "weighted average." For two points, $\lambda a_1 + (1-\lambda) a_2$ with $\lambda \in [0,1]$ traces the line segment between them — exactly what we used to define convex sets. A conic combination allows scaling up (think: rays from the origin), while an affine combination allows negative weights (think: extending beyond the original points along a line).

Definition 1.1.6 — Hulls

For a set $A \subseteq \mathbb{R}^n$:

  • The affine hull $\text{aff}(A)$ is the set of all finite affine combinations of points in $A$.
  • The conic hull $\text{cone}(A)$ is the set of all finite conic combinations of points in $A$.
  • The convex hull $\text{conv}(A)$ is the set of all finite convex combinations of points in $A$.

Informally, $\text{conv}(A)$ is the smallest convex set containing $A$ — like stretching a rubber band around the points.
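For a finite point set in the plane, $\text{conv}(A)$ can be computed directly. As a sketch, here is Andrew's monotone chain algorithm, a standard 2-D convex hull method chosen purely for illustration (it is not part of this chapter's material):

```python
def convex_hull(points):
    """Andrew's monotone chain: convex hull of 2-D points, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise (or straight) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints would otherwise repeat

square_plus = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(square_plus))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point $(1, 1)$ is a convex combination of the corners, so it drops out of the hull, matching the rubber-band picture.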

Convex Functions

Definition 1.1.7 — Convex Function

Let $X \subseteq \mathbb{R}^n$ be a convex set. A function $f: X \to \mathbb{R}$ is convex if for all $x, y \in X$ and $\lambda \in [0,1]$:

$$f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)$$

If the inequality is strict for all $\lambda \in (0,1)$ and $x \neq y$, the function is strictly convex. A function $f$ is concave if $-f$ is convex.

Intuition — "The Chord Lies Above"

The definition says: if you pick any two points on the graph of $f$ and draw a straight line (chord) between them, the function's graph lies on or below that chord. Think of a bowl shape like $f(x) = x^2$ — it's convex. A hill shape like $f(x) = -x^2$ is concave. A straight line $f(x) = ax + b$ is both convex and concave.
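The chord condition can be spot-checked on a grid of $\lambda$ values. A minimal sketch, assuming a small tolerance for floating-point noise:

```python
def is_chord_above(f, x, y, steps=100):
    """Return True if the chord from (x, f(x)) to (y, f(y)) lies on or
    above the graph of f at 'steps' sampled points (the convexity test)."""
    for i in range(steps + 1):
        lam = i / steps
        if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + 1e-12:
            return False
    return True

print(is_chord_above(lambda t: t * t, -2.0, 3.0))   # bowl shape: True
print(is_chord_above(lambda t: -t * t, -2.0, 3.0))  # hill shape: False
```

A grid check like this is evidence, not a proof: it can refute convexity (by finding a violated chord) but can only suggest it.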

Conditions for Convexity (Differentiable Functions)

First- and Second-Order Conditions

If $f$ is differentiable, it is convex if and only if for all $x, y \in X$:

$$f(y) \geq f(x) + \nabla f(x)^\top (y - x)$$

This says the tangent line (or hyperplane) always lies below the function — the function always curves upward from any tangent.

If $f$ is twice differentiable, it is convex if and only if its Hessian is positive semidefinite everywhere: $\nabla^2 f(x) \succeq 0$ for all $x \in X$. In one dimension, this reduces to the familiar condition $f''(x) \geq 0$ — the curve bends upward.
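In two variables the Hessian test reduces to the principal minors of a symmetric $2 \times 2$ matrix: $\begin{bmatrix} a & b \\ b & c \end{bmatrix} \succeq 0$ exactly when $a \geq 0$, $c \geq 0$, and $ac - b^2 \geq 0$. A small sketch:

```python
def hessian_2x2_psd(a, b, c):
    """PSD test for the symmetric Hessian [[a, b], [b, c]]:
    all principal minors must be non-negative (a, c, and the determinant)."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

# f(x, y) = x^2 + x*y + y^2 has constant Hessian [[2, 1], [1, 2]] -> convex
print(hessian_2x2_psd(2, 1, 2))   # True
# f(x, y) = x^2 - 3*x*y + y^2 has Hessian [[2, -3], [-3, 2]] -> not convex
print(hessian_2x2_psd(2, -3, 2))  # False: det = 4 - 9 < 0
```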

The Epigraph — Connecting Sets and Functions

Definition 1.1.9 — Epigraph

The epigraph of a function $f: X \to \mathbb{R}$ is the set of all points "on or above" its graph:

$$\text{epi}\, f = \{(x, t) \in \mathbb{R}^{n+1} \mid x \in X,\; t \geq f(x)\}$$

Proposition 1.1.10 — The Epigraph Characterization

A function $f$ is convex if and only if its epigraph is a convex set.

Why this matters

This result is a powerful bridge: it lets you convert questions about convex functions into questions about convex sets, and vice versa. For example, to prove a function is convex, you can instead show its epigraph (the "ice cream bowl" above the curve) is a convex set. This technique is used extensively later in the course, especially in conic programming.

Operations Preserving Convexity

Proposition 1.1.11 — Preservation Rules for Convex Functions

Let $f_1, \ldots, f_s: X \to \mathbb{R}$ be convex functions on a convex set $X \subseteq \mathbb{R}^n$. Then:

  1. Non-negative weighted sum: $\sum_{i} \lambda_i f_i$ is convex for all $\lambda_i \geq 0$.
  2. Pointwise maximum: $\max\{f_1, \ldots, f_s\}$ is convex.
  3. Composition with non-decreasing function: $f_1(f_2(x))$ is convex if $f_1$ is also non-decreasing (both $f_1$ and $f_2$ are convex by assumption).

Practical use

These rules are your toolkit for proving convexity without going back to the definition each time. For instance, the function $\max(x^2, 3x+1)$ is convex because both $x^2$ and $3x+1$ are convex and the max of convex functions is convex. Similarly, $e^{x^2}$ is convex because $e^t$ is convex and non-decreasing and $x^2$ is convex.
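Both of these claims can be sanity-checked numerically. The sketch below tests midpoint convexity, $f\big(\tfrac{x+y}{2}\big) \leq \tfrac{f(x)+f(y)}{2}$, on a grid; for continuous functions midpoint convexity is equivalent to convexity, though a grid check is evidence rather than a proof:

```python
import math

def convex_on_grid(f, lo=-3.0, hi=3.0, n=60, tol=1e-9):
    """Midpoint-convexity check on a grid: f((x+y)/2) <= (f(x)+f(y))/2
    for every pair of grid points."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return all(f((x + y) / 2) <= (f(x) + f(y)) / 2 + tol
               for x in xs for y in xs)

# max of convex functions (rule 2):
print(convex_on_grid(lambda x: max(x * x, 3 * x + 1)))  # True
# composition with a convex, non-decreasing outer function (rule 3):
print(convex_on_grid(lambda x: math.exp(x * x)))        # True
```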

1.2 Polytopes and Polyhedra

Polyhedra are the geometric shapes that arise naturally in linear programming. They are the regions you get when you intersect a finite number of half-spaces — essentially, the feasible regions of linear programs.

Hyperplanes and Half-Spaces

Definition 1.2.1 — Hyperplane and Half-Space

For $a \in \mathbb{R}^n \setminus \{0\}$ and $\beta \in \mathbb{R}$:

  • The hyperplane is $H = \{x \in \mathbb{R}^n : a^\top x = \beta\}$.
  • The positive half-space is $H^+ = \{x : a^\top x \geq \beta\}$.
  • The negative half-space is $H^- = \{x : a^\top x \leq \beta\}$.

The vector $a$ is the normal to the hyperplane — it points "outward" from $H^-$ into $H^+$.

Intuition

In $\mathbb{R}^2$, a hyperplane is just a line. In $\mathbb{R}^3$, it's a plane. A half-space is everything on one side of that dividing surface. The constraint $2x_1 + 3x_2 \leq 7$ defines a half-space in $\mathbb{R}^2$: everything on one side of the line $2x_1 + 3x_2 = 7$.

Separating and Supporting Hyperplanes

Theorem 1.2.2 — Separating Hyperplane Theorem

If $X, Y \subseteq \mathbb{R}^n$ are two nonempty, disjoint convex sets, then there exists a hyperplane that separates them: a vector $a \neq 0$ and scalar $b$ such that $a^\top x \leq b$ for all $x \in X$ and $a^\top y \geq b$ for all $y \in Y$.

Intuition

If you have two non-overlapping convex blobs, you can always slide a flat "wall" (hyperplane) between them. This seemingly simple geometric fact is the theoretical engine behind LP duality and many optimality conditions.

Theorem 1.2.8 — Supporting Hyperplane Theorem

For any nonempty, closed, convex set $S$ and any boundary point $x$, there exists a supporting hyperplane — a hyperplane that passes through $x$ with the entire set $S$ on one side.

Polyhedra and Polytopes

Definition 1.2.3 — Polyhedron and Polytope

Given a matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$, the set $P = \{x \in \mathbb{R}^n : Ax \leq b\}$ is a polyhedron. A bounded polyhedron is called a polytope.

Intuition

A polyhedron is an intersection of finitely many half-spaces — think of it as a "faceted" region. A polygon in 2D (like a triangle or pentagon) is a polytope. An unbounded wedge-shaped region (extending infinitely on one side) is a polyhedron but not a polytope. Since each half-space is convex and intersections preserve convexity, every polyhedron is automatically a convex set.

Interior and Relative Interior

The interior $\text{int}(P)$ consists of all points with a small ball around them that fits entirely inside $P$. The relative interior $\text{relint}(P)$ relaxes this: the ball only needs to fit inside $P$ when restricted to the affine hull of $P$. This distinction matters for sets that are "flat" — for example, a line segment in $\mathbb{R}^2$ has empty interior, but its relative interior is the segment without its endpoints.

Faces, Vertices, Edges, and Facets

Definition 1.2.10 — Faces of a Polyhedron

Let $P$ be a nonempty polyhedron with dimension $\text{dim}(P)$:

  • A face is $P$ itself or $P$ intersected with a supporting hyperplane.
  • A vertex is a 0-dimensional face (a corner point).
  • An edge is a 1-dimensional face.
  • A facet is a $(\text{dim}(P)-1)$-dimensional face.

Example: a cube in $\mathbb{R}^3$

A cube has dimension 3. Its 8 corners are vertices (dim 0), its 12 edges are edges (dim 1), and its 6 square sides are facets (dim 2). The Simplex method for linear programming works by hopping between vertices along edges.

Two Representations of Polytopes

Proposition 1.2.11 — Vertex (V-) Representation

Every polytope is the convex hull of its vertices. Conversely, the convex hull of any finite set of points is a polytope.

So a polytope can be described in two equivalent ways: as an intersection of half-spaces (H-representation: $Ax \leq b$), or as the convex hull of its vertices (V-representation: $\text{conv}\{v_1, \ldots, v_k\}$). Converting between these representations can be computationally expensive — a polytope with a few half-spaces can have exponentially many vertices, and vice versa.
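For a 2-D polytope the V-to-H conversion is mechanical: each edge of the (counter-clockwise) vertex list yields one half-space. A sketch, assuming the vertices are already ordered counter-clockwise:

```python
def h_representation(vertices):
    """From counter-clockwise vertices of a 2-D polytope, build one
    half-space a . x <= b per edge (the H-representation)."""
    rows = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        a = (y2 - y1, x1 - x2)          # outward normal for CCW ordering
        b = a[0] * x1 + a[1] * y1       # the edge lies on a . x = b
        rows.append((a, b))
    return rows

triangle = [(0, 0), (2, 0), (0, 2)]     # V-representation, CCW
halfspaces = h_representation(triangle) # H-representation: y>=0, x+y<=2, x>=0
inside = lambda p: all(a[0] * p[0] + a[1] * p[1] <= b for a, b in halfspaces)
print(inside((0.5, 0.5)), inside((2, 2)))  # True False
```

The reverse direction (H to V) amounts to vertex enumeration, which is exactly the step that can blow up exponentially.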

Rays and the Minkowski Resolution Theorem

Unbounded polyhedra (those that extend to infinity) need an additional concept: extreme rays — directions in which the polyhedron extends infinitely. An extreme ray cannot be decomposed as a conic combination of two other distinct rays of the polyhedron.

Proposition 1.2.13 — Minkowski Resolution Theorem

Every polyhedron is the Minkowski sum of the convex hull of its vertices and the conic hull of its extreme rays.

Intuition

A polyhedron = a bounded "core" (polytope from vertices) + unbounded "directions" (rays). A polytope has no extreme rays, so it is just the convex hull of its vertices. An unbounded polyhedron like $\{(x,y) : x \geq 0,\; y \geq 0,\; x+y \geq 1\}$ has two vertices, $(1,0)$ and $(0,1)$, plus two extreme rays pointing along the positive $x$- and $y$-axes.

1.3 Cones

Cones are sets that are closed under non-negative scaling. They play a central role in generalizing linear programming to broader classes of optimization problems (conic programs, SOCPs, SDPs).

Definition 1.3.1 — Cone and Its Variants

A set $K \subseteq \mathbb{R}^n$ is a cone if for any $c \in K$ and $\lambda \geq 0$, we have $\lambda c \in K$. A cone $K$ is called:

  • Convex cone — if $K$ is convex (closed under non-negative linear combinations).
  • Solid cone — if it has nonempty interior.
  • Pointed cone — if it contains no entire line ($x, -x \in K \Rightarrow x = 0$).
  • Proper cone — if it is convex, closed, solid, and pointed.
  • Polyhedral cone — if it is a polyhedron (i.e., $K = \{x : Ax \leq 0\}$).

Key example: the non-negative orthant

The set $\mathbb{R}^n_{\geq 0} = \{x \in \mathbb{R}^n : x_i \geq 0 \text{ for all } i\}$ is the prototypical proper cone. It's convex (non-negative combination of non-negative vectors is non-negative), closed, solid (has interior, e.g., $(1,1,\ldots,1)$), and pointed (if both $x$ and $-x$ have all entries non-negative, then $x=0$). It is also polyhedral (defined by $-I x \leq 0$). Moreover, the non-negative orthant is self-dual.

Proposition 1.3.2 — Decomposition of Polyhedra

Every nonempty polyhedron $P$ can be written as $P = Q \oplus K$, where $Q$ is a polytope and $K$ is a polyhedral cone.

Dual Cones

Definition 1.3.3 — Dual Cone

The dual cone of $K \subseteq \mathbb{R}^n$ is:

$$K^* = \{ \ell \in \mathbb{R}^n : \langle \ell, x \rangle \geq 0 \text{ for all } x \in K \}$$

If $K = K^*$, the cone is self-dual.

Intuition

A vector $\ell$ belongs to the dual cone $K^*$ if it makes a non-negative inner product with every vector in $K$. Geometrically, $\ell$ defines a half-space that contains the entire cone $K$. The dual cone captures the "directions that are friendly to $K$." The non-negative orthant $\mathbb{R}^n_{\geq 0}$ is self-dual: every vector with all non-negative entries has non-negative dot product with every other such vector.
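Because the orthant is generated by the standard basis vectors (its extreme rays), checking membership in its dual reduces to finitely many inner products: $\langle \ell, x \rangle \geq 0$ holds for all $x \geq 0$ exactly when it holds for each basis vector. A small sketch:

```python
# K = non-negative orthant in R^3; its extreme rays are the basis vectors.
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def in_dual(ell, generators):
    """ell is in K* iff <ell, g> >= 0 for every generator g of the cone K
    (for the orthant, the standard basis vectors generate K)."""
    return all(sum(l * g for l, g in zip(ell, gen)) >= 0 for gen in generators)

print(in_dual((1.0, 2.0, 0.0), basis))   # all entries >= 0, so in K*: True
print(in_dual((1.0, -1.0, 0.0), basis))  # negative entry, so not in K*: False
```

This is the self-duality of $\mathbb{R}^n_{\geq 0}$ in miniature: membership in $K^*$ is exactly entrywise non-negativity.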

Proposition 1.3.4 — Properties of Dual Cones

Key properties include:

  1. $K^*$ is always closed and convex (even if $K$ is not).
  2. $K_1 \subseteq K_2 \Rightarrow K_2^* \subseteq K_1^*$ (bigger cone, smaller dual).
  3. $K$ solid $\Rightarrow K^*$ pointed; $K$ closed, convex, and pointed $\Rightarrow K^*$ solid.
  4. $K^{**} = \overline{\text{conv}}(K)$, and if $K$ is closed and convex, $K^{**} = K$.
  5. If $K$ is proper, then $K^*$ is proper.

Why cones matter for optimization

Cones generalize the non-negativity constraint $x \geq 0$ from linear programming. Instead of constraining variables to the non-negative orthant, conic programs constrain variables to lie in a general cone $K$ (such as the second-order cone or the positive semidefinite cone). The dual cone plays the same role that non-negative dual variables play in LP duality — this connection will become clear in later chapters.

1.4 Linear Programming

Linear programming (LP) is the most fundamental class of convex optimization. The task is to optimize a linear objective over a polyhedral feasible region. LPs are both theoretically elegant and practically ubiquitous — they appear in logistics, economics, network flows, and as subroutines inside more complex optimization algorithms.

LP Formulations

Definition 1.4.1 — Linear Program

A linear program can be written in three equivalent forms:

| General | Standard | Canonical |
| --- | --- | --- |
| $\min\; c^\top x$ | $\min\; c^\top x$ | $\max\; c^\top x$ |
| $Ax \geq b$ | $Ax = b$ | $Ax \leq b$ |
| $Cx \leq d$ | $x \geq 0$ | $x \geq 0$ |
| $Ex = f$ | | |

All three forms are equivalent — any LP in one form can be converted to another using slack variables and sign-splitting.
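The canonical-to-standard conversion is worth seeing once concretely. The sketch below (a hypothetical helper, for illustration) negates the objective, turning max into min, and appends one slack variable per inequality so that $Ax + s = b$:

```python
def canonical_to_standard(c, A, b):
    """Rewrite max c.x s.t. A x <= b, x >= 0 (canonical form) as
    min c'.x' s.t. A' x' = b', x' >= 0 (standard form): negate the
    objective and append one slack variable per inequality (A x + s = b)."""
    m = len(A)
    c_std = [-ci for ci in c] + [0.0] * m            # slacks cost nothing
    A_std = [list(row) + [1.0 if j == i else 0.0 for j in range(m)]
             for i, row in enumerate(A)]             # append identity block
    return c_std, A_std, list(b)

c_std, A_std, b_std = canonical_to_standard(
    c=[2.0, 1.0], A=[[2.0, 1.0], [1.0, 1.0]], b=[4.0, 1.0])
print(A_std)  # [[2.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0]]
```

Sign-splitting (writing a free variable as $x = x^+ - x^-$ with $x^+, x^- \geq 0$) handles the remaining direction of the general form and works the same way.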

Definition 1.4.2 — Feasible, Infeasible, Unbounded

An LP is:

  • Feasible if at least one point satisfies all constraints.
  • Unbounded if the objective can be made arbitrarily good over the feasible region.
  • Infeasible if no point satisfies all constraints simultaneously.

LP Duality

Duality is one of the deepest ideas in optimization. The key insight: we can use the constraints of an LP to derive bounds on its optimal value. By systematically combining constraints, we construct a new LP — the dual — whose optimal value bounds the original (primal) LP.

Motivating Example (Example 1.4.4)

Consider $\max\; 2x_1 + x_2$ subject to $2x_1 + x_2 \leq 4$, $x_1 + x_2 \leq 1$, and $x_1, x_2 \geq 0$. The first constraint directly tells us the optimum is at most 4. But we can do better: notice that $2x_1 + x_2 = 2(x_1 + x_2) + (-x_2) \leq 2 \cdot 1 + 0 = 2$. The point $(1, 0)$ achieves this value, so the optimum is exactly 2. This "combining constraints to bound the objective" is exactly what the dual LP formalizes.
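Since the optimum of this bounded, feasible LP sits at a vertex, the tiny instance can also be solved by brute force: intersect every pair of constraint boundaries and keep the best feasible intersection point. A sketch for illustration, not an efficient algorithm:

```python
from itertools import combinations

# Constraints a . x <= b for the example, including x1 >= 0 and x2 >= 0:
cons = [((2, 1), 4), ((1, 1), 1), ((-1, 0), 0), ((0, -1), 0)]

def solve_2d_lp(c, cons, tol=1e-9):
    """Brute-force 2-D LP: intersect every pair of constraint boundaries,
    keep the feasible intersection points, and maximize c . x over them."""
    best_val, best_pt = None, None
    for (a1, b1), (a2, b2) in combinations(cons, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:          # parallel boundary lines: no vertex here
            continue
        x = (b1 * a2[1] - b2 * a1[1]) / det
        y = (a1[0] * b2 - a2[0] * b1) / det
        if all(a[0] * x + a[1] * y <= b + tol for a, b in cons):
            val = c[0] * x + c[1] * y
            if best_val is None or val > best_val:
                best_val, best_pt = val, (x, y)
    return best_val, best_pt

val, pt = solve_2d_lp((2, 1), cons)
print(val)  # 2.0, attained at the vertex (1, 0), matching the bound above
```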

Definition 1.4.5 — Dual LP

The primal-dual pairs for standard and canonical forms:

| | Primal | Dual |
| --- | --- | --- |
| Standard form | $\min\; c^\top x$, $\;Ax = b$, $\;x \geq 0$ | $\max\; b^\top y$, $\;c - A^\top y \geq 0$ |
| Canonical form | $\max\; c^\top x$, $\;Ax \leq b$, $\;x \geq 0$ | $\min\; b^\top y$, $\;A^\top y \geq c$, $\;y \geq 0$ |

Weak and Strong Duality

Theorem 1.4.6 — Weak Duality

For any feasible primal solution $x$ and feasible dual solution $y$ (in standard form): $c^\top x \geq b^\top y$.

The dual objective always provides a lower bound on the primal objective. This holds for any feasible pair — not just the optimal ones.

Theorem 1.4.7 — Strong Duality for LPs

If both the primal and the dual are feasible, then their optimal values are equal: $c^\top x^* = b^\top y^*$.

Important: Strong duality is special to LPs

Weak duality holds for all mathematical programs (convex or not). Strong duality — the primal and dual optima being equal — holds unconditionally only for linear programs. For more general convex programs, strong duality requires additional conditions (like Slater's condition, covered in Chapter 3).

Proposition 1.4.8 — Primal-Dual Relationships

The possible outcomes for a primal-dual LP pair:

  • Primal finite optimum $\Rightarrow$ Dual finite optimum (and they agree).
  • Primal unbounded $\Rightarrow$ Dual infeasible.
  • Primal infeasible $\Rightarrow$ Dual unbounded or infeasible.

The same holds symmetrically (the dual of the dual is the primal).

Complementary Slackness

Theorem 1.4.9 — Complementary Slackness

For feasible primal $x$ and dual $y$ in canonical form, they are both optimal if and only if:

$$(b - Ax)^\top y = 0 \quad \text{and} \quad (A^\top y - c)^\top x = 0$$

Intuition

Complementary slackness says: at optimality, for each constraint, either the constraint is tight (holds with equality) or the corresponding dual variable is zero. You can't have both slack in a constraint and a positive dual variable — one of them must be zero. This gives a powerful tool for verifying optimality and for constructing optimal solutions: if you know which constraints are tight, you can often solve for the dual variables directly.
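For the motivating example from this section (Example 1.4.4), both slackness conditions can be checked directly. The dual solution $y = (0, 2)$ below is derived from the constraint combination used in that example (it is dual feasible with objective $4 \cdot 0 + 1 \cdot 2 = 2$, matching the primal optimum); treat the specific numbers as an illustration:

```python
# Example 1.4.4 in canonical form: max c.x s.t. A x <= b, x >= 0.
A = [[2, 1], [1, 1]]
b = [4, 1]
c = [2, 1]
x = [1, 0]   # primal optimal solution (value 2)
y = [0, 2]   # dual solution from the bound 2*(x1 + x2) - x2 <= 2

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

At_y = [dot([row[j] for row in A], y) for j in range(len(c))]   # A^T y

slack_primal = [bi - dot(Ai, x) for Ai, bi in zip(A, b)]   # b - A x
slack_dual = [ai - ci for ai, ci in zip(At_y, c)]          # A^T y - c

print(dot(slack_primal, y))  # (b - A x)^T y = 0
print(dot(slack_dual, x))    # (A^T y - c)^T x = 0
```

Note how the pattern matches the intuition: the first constraint is slack ($2 < 4$) and its dual variable is $0$; the second is tight and its dual variable is positive.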

Algorithms for Solving LPs

The Simplex Method

Developed by Dantzig in 1947, the Simplex method remains one of the most widely used LP solvers. It starts at a vertex of the feasible polyhedron and moves along edges to adjacent vertices, always improving the objective value. With a suitable anti-cycling pivoting rule (such as Bland's rule), the process terminates at an optimal vertex in finitely many steps. Although there exist worst-case inputs (like the Klee-Minty cube) where Simplex takes exponential time, it is extremely fast in practice — and has been proven to run in polynomial time under smoothed analysis.

The Ellipsoid Method

Introduced by Khachiyan in 1979, the Ellipsoid method was the first algorithm proven to solve LPs in polynomial worst-case time, settling a long-standing open question. It works by maintaining an ellipsoid that contains the optimal solution and iteratively shrinking it using cutting planes from violated constraints. While theoretically important, it is slower in practice than the Simplex method. Its real significance is theoretical: it proves that LP (and more general convex optimization problems) can be solved efficiently.

Practical takeaway

Simplex is fast in practice but exponential in worst-case. The Ellipsoid method is polynomial in worst-case but slow in practice. Modern solvers often use interior-point methods (covered in Chapter 5), which combine the best of both worlds: polynomial worst-case time and strong practical performance.

Key Takeaways