Cloud balancing

Cloud balancing tutorial

Problem description

Suppose your company owns a number of cloud computers and needs to run a number of processes on those computers. Assign each process to a computer.

The following hard constraints must be fulfilled:

  • Every computer must be able to handle the minimum hardware requirements of the sum of its processes:

    • CPU capacity: The CPU power of a computer must be at least the sum of the CPU power required by the processes assigned to that computer.

    • Memory capacity: The RAM memory of a computer must be at least the sum of the RAM memory required by the processes assigned to that computer.

    • Network capacity: The network bandwidth of a computer must be at least the sum of the network bandwidth required by the processes assigned to that computer.

The following soft constraints should be optimized:

  • Each computer that has one or more processes assigned, incurs a maintenance cost (which is fixed per computer).

    • Cost: Minimize the total maintenance cost.

This problem is a form of bin packing. The following is a simplified example, in which we assign four processes to two computers with two constraints (CPU and RAM) with a simple algorithm:

cloudBalanceUseCase

The simple algorithm used here is the First Fit Decreasing algorithm, which assigns the bigger processes first and assigns the smaller processes to the remaining space. As you can see, it is not optimal, as it does not leave enough room to assign the yellow process D.

Timefold does find the more optimal solution by using additional, smarter algorithms. It also scales: both in data (more processes, more computers) and constraints (more hardware requirements, other constraints). So let’s see how Timefold can be used in this scenario.

Here’s an executive summary of this example and an advanced implementation with more constraints:

cloudOptimizationValueProposition

Problem size

Table 1. Cloud Balancing Problem Size
Problem Size Computers Processes Search Space

2computers-6processes

2

6

64

3computers-9processes

3

9

10^4

4computers-12processes

4

12

10^7

100computers-300processes

100

300

10^600

200computers-600processes

200

600

10^1380

400computers-1200processes

400

1200

10^3122

800computers-2400processes

800

2400

10^6967

Using the domain model

Domain model design

Using a domain model helps determine which classes are planning entities and which of their properties are planning variables. It also helps to simplify constraints, improve performance, and increase flexibility for future needs.

To create a domain model, define all the objects that represent the input data for the problem. In this simple example, the objects are processes and computers.

A separate object in the domain model must represent a full data set of problem, which contains the input data as well as a solution. In this example, this object holds a list of computers and a list of processes. Each process is assigned to a computer; the distribution of processes between computers is the solution.

  1. Draw a class diagram of your domain model.

  2. Normalize it to remove duplicate data.

  3. Write down some sample instances for each class.

    • Computer: represents a computer with certain hardware and maintenance costs.

      In this example, the sample instances for the Computer class are: cpuPower, memory, networkBandwidth, cost.

    • Process: represents a process with a demand. Needs to be assigned to a Computer by Timefold.

      Sample instances for Process are: requiredCpuPower, requiredMemory, and requiredNetworkBandwidth.

    • CloudBalance: represents a problem. Contains every Computer and Process for a certain data set.

      For an object representing the full data set and solution, a sample instance holding the score must be present. Timefold can calculate and compare the scores for different solutions; the solution with the highest score is the optimal solution. Therefore, the sample instance for CloudBalance is score.

  4. Determine which relationships (or fields) change during planning.

    • Planning entity: The class (or classes) that Timefold can change during solving. In this example, it is the class Process, because Timefold can assign processes to computers.

    • Problem fact: A class representing input data that Timefold cannot change.

    • Planning variable: The property (or properties) of a planning entity class that changes during solving. In this example, it is the property computer on the class Process.

    • Planning solution: The class that represents a solution to the problem. This class must represent the full data set and contain all planning entities. In this example that is the class CloudBalance.

In the UML class diagram below, the Timefold concepts are already annotated:

cloudBalanceClassDiagram

Domain model implementation

The Computer class

The Computer class is a POJO (Plain Old Java Object). Usually, you will have more of this kind of classes with input data.

Example 1. CloudComputer.java
public class CloudComputer ... {

    private int cpuPower;
    private int memory;
    private int networkBandwidth;
    private int cost;

    ... // getters
}

The Process class

The Process class is particularly important. It is the class that is modified during solving.

We need to tell Timefold that it can change the property computer. To do this: . Annotate the class with @PlanningEntity. . Annotate the getter getComputer() with @PlanningVariable.

Of course, the property computer needs a setter too, so Timefold can change it during solving.

Example 2. CloudProcess.java
@PlanningEntity(...)
public class CloudProcess ... {

    private int requiredCpuPower;
    private int requiredMemory;
    private int requiredNetworkBandwidth;

    private CloudComputer computer;

    ... // getters

    @PlanningVariable
    public CloudComputer getComputer() {
        return computer;
    }

    public void setComputer(CloudComputer computer) {
        computer = computer;
    }

    // ************************************************************************
    // Complex methods
    // ************************************************************************

    ...

}
  • Timefold needs to know which values it can choose from to assign to the property computer. Those values are retrieved from the method CloudBalance.getComputerList() on the planning solution, which returns a list of all computers in the current data set.

  • The @PlanningVariable automatically matches with the @ValueRangeProvider on CloudBalance.getComputerList().

Instead of getter annotations, it is also possible to use field annotations.

The CloudBalance class

The CloudBalance class has a @PlanningSolution annotation.

  • It holds a list of all computers and a list of all processes.

  • It represents both the planning problem and (if it is initialized) the planning solution.

  • To save a solution, Timefold initializes a new instance of the class.

    1. The processList property holds a list of processes. Timefold can change the processes, allocating them to different computers. Therefore, a process is a planning entity and the list of processes is a collection of planning entities. We annotate the getter getProcessList() with @PlanningEntityCollectionProperty.

    2. The computerList property holds a list of computers. Timefold cannot change the computers. Therefore, a computer is a problem fact. Especially for Constraint Streams, the property computerList needs to be annotated with a @ProblemFactCollectionProperty so that Timefold can retrieve the list of computers (problem facts) and make it available to the rule engine.

    3. The CloudBalance class also has a @PlanningScore annotated property score, which is the Score of that solution in its current state. Timefold automatically updates it when it calculates a Score for a solution instance. Therefore, this property needs a setter.

Example 3. CloudBalance.java
@PlanningSolution
public class CloudBalance ... {

    private List<CloudComputer> computerList;

    private List<CloudProcess> processList;

    private HardSoftScore score;

    @ValueRangeProvider
    @ProblemFactCollectionProperty
    public List<CloudComputer> getComputerList() {
        return computerList;
    }

    @PlanningEntityCollectionProperty
    public List<CloudProcess> getProcessList() {
        return processList;
    }

    @PlanningScore
    public HardSoftScore getScore() {
        return score;
    }

    public void setScore(HardSoftScore score) {
        this.score = score;
    }

    ...
}

Run the cloud balancing Hello World

  1. Download and configure the examples in your preferred IDE.

  2. Create a run configuration with the following main class: ai.timefold.solver.examples.cloudbalancing.app.CloudBalancingHelloWorld

    By default, the Cloud Balancing Hello World is configured to run for 120 seconds.

It executes the following code:

Example 4. CloudBalancingHelloWorld.java
public class CloudBalancingHelloWorld {

    public static void main(String[] args) {
        // Build the Solver
        SolverFactory<CloudBalance> solverFactory = SolverFactory.createFromXmlResource(
                "ai/timefold/solver/examples/cloudbalancing/solver/cloudBalancingSolverConfig.xml");
        Solver<CloudBalance> solver = solverFactory.buildSolver();

        // Load a problem with 400 computers and 1200 processes
        CloudBalance unsolvedCloudBalance = new CloudBalancingGenerator().createCloudBalance(400, 1200);

        // Solve the problem
        CloudBalance solvedCloudBalance = solver.solve(unsolvedCloudBalance);

        // Display the result
        System.out.println("\nSolved cloudBalance with 400 computers and 1200 processes:\n"
                + toDisplayString(solvedCloudBalance));
    }

    ...
}

The code example does the following:

  1. Build the Solver based on a solver configuration which can come from an XML file as classpath resource:

            SolverFactory<CloudBalance> solverFactory = SolverFactory.createFromXmlResource(
                    "ai/timefold/solver/examples/cloudbalancing/solver/cloudBalancingSolverConfig.xml");
            Solver<CloudBalance> solver = solverFactory.buildSolver();

    Or to avoid XML, build it through the programmatic API instead:

            SolverFactory<CloudBalance> solverFactory = SolverFactory.create(new SolverConfig()
                    .withSolutionClass(CloudBalance.class)
                    .withEntityClasses(CloudProcess.class)
                    .withEasyScoreCalculatorClass(CloudBalancingEasyScoreCalculator.class)
                    .withTerminationSpentLimit(Duration.ofMinutes(2)));
            Solver<CloudBalance> solver = solverFactory.buildSolver();

    The solver configuration is explained in the next section.

  2. Load the problem.

    CloudBalancingGenerator generates a random problem: replace this with a class that loads a real problem, for example from a database.

            CloudBalance unsolvedCloudBalance = new CloudBalancingGenerator().createCloudBalance(400, 1200);
  3. Solve the problem.

            CloudBalance solvedCloudBalance = solver.solve(unsolvedCloudBalance);
  4. Display the result.

            System.out.println("\nSolved cloudBalance with 400 computers and 1200 processes:\n"
                    + toDisplayString(solvedCloudBalance));

Solver configuration

The solver configuration file determines how the solving process works; it is considered a part of the code. The file is named cloudBalancingSolverConfig.xml.

Example 5. cloudBalancingSolverConfig.xml
<?xml version="1.0" encoding="UTF-8"?>
<solver xmlns="https://timefold.ai/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://timefold.ai/xsd/solver https://timefold.ai/xsd/solver/solver.xsd">
  <!-- Domain model configuration -->
  <solutionClass>ai.timefold.solver.examples.cloudbalancing.domain.CloudBalance</solutionClass>
  <entityClass>ai.timefold.solver.examples.cloudbalancing.domain.CloudProcess</entityClass>

  <!-- Score configuration -->
  <scoreDirectorFactory>
    <easyScoreCalculatorClass>ai.timefold.solver.examples.cloudbalancing.optional.score.CloudBalancingEasyScoreCalculator</easyScoreCalculatorClass>
    <!--<constraintProviderClass>ai.timefold.solver.examples.cloudbalancing.score.CloudBalancingConstraintProvider</constraintProviderClass>-->
  </scoreDirectorFactory>

  <!-- Optimization algorithms configuration -->
  <termination>
    <secondsSpentLimit>30</secondsSpentLimit>
  </termination>
</solver>

This solver configuration consists of three parts:

  1. Domain model configuration: What can Timefold change?

    We need to make Timefold aware of our domain classes, annotated with @PlanningEntity and @PlanningSolution annotations:

      <solutionClass>ai.timefold.solver.examples.cloudbalancing.domain.CloudBalance</solutionClass>
      <entityClass>ai.timefold.solver.examples.cloudbalancing.domain.CloudProcess</entityClass>
  2. Score configuration: How should Timefold optimize the planning variables? What is our goal?

    Since we have hard and soft constraints, we use a HardSoftScore. But we need to tell Timefold how to calculate the score, depending on our business requirements. Further down, we will look into two alternatives to calculate the score, such as using an easy Java implementation, or Constraint Streams.

      <scoreDirectorFactory>
        <easyScoreCalculatorClass>ai.timefold.solver.examples.cloudbalancing.optional.score.CloudBalancingEasyScoreCalculator</easyScoreCalculatorClass>
        <!--<constraintProviderClass>ai.timefold.solver.examples.cloudbalancing.score.CloudBalancingConstraintProvider</constraintProviderClass>-->
      </scoreDirectorFactory>
  3. Optimization algorithms configuration: How should Timefold optimize it?

    In this case, we use the default optimization algorithms (because no explicit optimization algorithms are configured) for 30 seconds:

      <termination>
        <secondsSpentLimit>30</secondsSpentLimit>
      </termination>

    Timefold should get a good result in seconds (and even in less than 15 milliseconds with real-time planning), but the more time it has, the better the results. Advanced use cases might use different termination criteria than a hard time limit.

    The default algorithms already easily surpass human planners and most in-house implementations. Use the Benchmarker to power tweak to get even better results.

Score configuration

Timefold searches for the solution with the highest Score. This example uses a HardSoftScore, which means Timefold looks for the solution with no hard constraints broken (fulfill hardware requirements) and as little as possible soft constraints broken (minimize maintenance cost).

scoreComparisonCloudBalancing

Of course, Timefold needs to be told about these domain-specific score constraints. There are several ways to implement such a score function:

Easy Java score configuration

One way to define a score function is to implement the interface EasyScoreCalculator in plain Java.

  <scoreDirectorFactory>
    <easyScoreCalculatorClass>ai.timefold.solver.examples.cloudbalancing.optional.score.CloudBalancingEasyScoreCalculator</easyScoreCalculatorClass>
  </scoreDirectorFactory>

Just implement the calculateScore(Solution) method to return a HardSoftScore instance.

Example 6. CloudBalancingEasyScoreCalculator.java
public class CloudBalancingEasyScoreCalculator
    implements EasyScoreCalculator<CloudBalance, HardSoftScore> {

    /**
     * A very simple implementation. The double loop can easily be removed by using Maps as shown in
     * {@link CloudBalancingMapBasedEasyScoreCalculator#calculateScore(CloudBalance)}.
     */
    @Override
    public HardSoftScore calculateScore(CloudBalance cloudBalance) {
        int hardScore = 0;
        int softScore = 0;
        for (CloudComputer computer : cloudBalance.getComputerList()) {
            int cpuPowerUsage = 0;
            int memoryUsage = 0;
            int networkBandwidthUsage = 0;
            boolean used = false;

            // Calculate usage
            for (CloudProcess process : cloudBalance.getProcessList()) {
                if (computer.equals(process.getComputer())) {
                    cpuPowerUsage += process.getRequiredCpuPower();
                    memoryUsage += process.getRequiredMemory();
                    networkBandwidthUsage += process.getRequiredNetworkBandwidth();
                    used = true;
                }
            }

            // Hard constraints
            int cpuPowerAvailable = computer.getCpuPower() - cpuPowerUsage;
            if (cpuPowerAvailable < 0) {
                hardScore += cpuPowerAvailable;
            }
            int memoryAvailable = computer.getMemory() - memoryUsage;
            if (memoryAvailable < 0) {
                hardScore += memoryAvailable;
            }
            int networkBandwidthAvailable = computer.getNetworkBandwidth() - networkBandwidthUsage;
            if (networkBandwidthAvailable < 0) {
                hardScore += networkBandwidthAvailable;
            }

            // Soft constraints
            if (used) {
                softScore -= computer.getCost();
            }
        }
        return HardSoftScore.of(hardScore, softScore);
    }

}

Even if we optimize the code above to use Maps to iterate through the processList only once, it is still slow because it does not do incremental score calculation. To fix that, either use constraint streams or incremental Java score calculation.

Constraint streams score configuration

Constraint Streams use incremental calculation. To use it, implement the interface ConstraintProvider in Java.

  <scoreDirectorFactory>
    <constraintProviderClass>ai.timefold.solver.examples.cloudbalancing.score.CloudBalancingConstraintProvider</constraintProviderClass>
  </scoreDirectorFactory>

We want to make sure that all computers have enough CPU, RAM and network bandwidth to support all their processes, so we make these hard constraints. If those constraints are met, we want to minimize the maintenance cost, so we add that as a soft constraint.

Example 7. CloudBalancingConstraintProvider.java
public class CloudBalancingConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory constraintFactory) {
        return new Constraint[] {
                requiredCpuPowerTotal(constraintFactory),
                requiredMemoryTotal(constraintFactory),
                requiredNetworkBandwidthTotal(constraintFactory),
                computerCost(constraintFactory)
        };
    }

    Constraint requiredCpuPowerTotal(ConstraintFactory constraintFactory) {
        return constraintFactory.forEach(CloudProcess.class)
                .groupBy(CloudProcess::getComputer, sum(CloudProcess::getRequiredCpuPower))
                .filter((computer, requiredCpuPower) -> requiredCpuPower > computer.getCpuPower())
                .penalize(HardSoftScore.ONE_HARD,
                        (computer, requiredCpuPower) -> requiredCpuPower - computer.getCpuPower())
                .asConstraint("requiredCpuPowerTotal");
    }

    Constraint requiredMemoryTotal(ConstraintFactory constraintFactory) {
        return constraintFactory.forEach(CloudProcess.class)
                .groupBy(CloudProcess::getComputer, sum(CloudProcess::getRequiredMemory))
                .filter((computer, requiredMemory) -> requiredMemory > computer.getMemory())
                .penalize(HardSoftScore.ONE_HARD,
                        (computer, requiredMemory) -> requiredMemory - computer.getMemory())
                .asConstraint("requiredMemoryTotal");
    }

    Constraint requiredNetworkBandwidthTotal(ConstraintFactory constraintFactory) {
        return constraintFactory.forEach(CloudProcess.class)
                .groupBy(CloudProcess::getComputer, sum(CloudProcess::getRequiredNetworkBandwidth))
                .filter((computer, requiredNetworkBandwidth) -> requiredNetworkBandwidth > computer.getNetworkBandwidth())
                .penalize(HardSoftScore.ONE_HARD,
                        (computer, requiredNetworkBandwidth) -> requiredNetworkBandwidth - computer.getNetworkBandwidth())
                .asConstraint("requiredNetworkBandwidthTotal");
    }

    Constraint computerCost(ConstraintFactory constraintFactory) {
        return constraintFactory.forEach(CloudComputer.class)
                .ifExists(CloudProcess.class, equal(Function.identity(), CloudProcess::getComputer))
                .penalize(HardSoftScore.ONE_SOFT,
                        CloudComputer::getCost)
                .asConstraint("computerCost");
    }

}

Incremental Java score configuration

Another way to define a score function is to implement the interface IncrementalScoreCalculator in plain Java.

  <scoreDirectorFactory>
    <easyScoreCalculatorClass>ai.timefold.solver.examples.cloudbalancing.optional.score.CloudBalancingIncrementalScoreCalculator</easyScoreCalculatorClass>
  </scoreDirectorFactory>
Example 8. CloudBalancingIncrementalScoreCalculator.java
public class CloudBalancingIncrementalScoreCalculator
        implements IncrementalScoreCalculator<CloudBalance, HardSoftScore> {

    private Map<CloudComputer, Integer> cpuPowerUsageMap;
    private Map<CloudComputer, Integer> memoryUsageMap;
    private Map<CloudComputer, Integer> networkBandwidthUsageMap;
    private Map<CloudComputer, Integer> processCountMap;

    private int hardScore;
    private int softScore;

    @Override
    public void resetWorkingSolution(CloudBalance cloudBalance) {
        int computerListSize = cloudBalance.getComputerList().size();
        cpuPowerUsageMap = new HashMap<>(computerListSize);
        memoryUsageMap = new HashMap<>(computerListSize);
        networkBandwidthUsageMap = new HashMap<>(computerListSize);
        processCountMap = new HashMap<>(computerListSize);
        for (CloudComputer computer : cloudBalance.getComputerList()) {
            cpuPowerUsageMap.put(computer, 0);
            memoryUsageMap.put(computer, 0);
            networkBandwidthUsageMap.put(computer, 0);
            processCountMap.put(computer, 0);
        }
        hardScore = 0;
        softScore = 0;
        for (CloudProcess process : cloudBalance.getProcessList()) {
            insert(process);
        }
    }

    @Override
    public void beforeVariableChanged(Object entity, String variableName) {
        retract((CloudProcess) entity);
    }

    @Override
    public void afterVariableChanged(Object entity, String variableName) {
        insert((CloudProcess) entity);
    }

    @Override
    public void beforeEntityRemoved(Object entity) {
        retract((CloudProcess) entity);
    }

    ...

    private void insert(CloudProcess process) {
        CloudComputer computer = process.getComputer();
        if (computer != null) {
            int cpuPower = computer.getCpuPower();
            int oldCpuPowerUsage = cpuPowerUsageMap.get(computer);
            int oldCpuPowerAvailable = cpuPower - oldCpuPowerUsage;
            int newCpuPowerUsage = oldCpuPowerUsage + process.getRequiredCpuPower();
            int newCpuPowerAvailable = cpuPower - newCpuPowerUsage;
            hardScore += Math.min(newCpuPowerAvailable, 0) - Math.min(oldCpuPowerAvailable, 0);
            cpuPowerUsageMap.put(computer, newCpuPowerUsage);

            int memory = computer.getMemory();
            int oldMemoryUsage = memoryUsageMap.get(computer);
            int oldMemoryAvailable = memory - oldMemoryUsage;
            int newMemoryUsage = oldMemoryUsage + process.getRequiredMemory();
            int newMemoryAvailable = memory - newMemoryUsage;
            hardScore += Math.min(newMemoryAvailable, 0) - Math.min(oldMemoryAvailable, 0);
            memoryUsageMap.put(computer, newMemoryUsage);

            int networkBandwidth = computer.getNetworkBandwidth();
            int oldNetworkBandwidthUsage = networkBandwidthUsageMap.get(computer);
            int oldNetworkBandwidthAvailable = networkBandwidth - oldNetworkBandwidthUsage;
            int newNetworkBandwidthUsage = oldNetworkBandwidthUsage + process.getRequiredNetworkBandwidth();
            int newNetworkBandwidthAvailable = networkBandwidth - newNetworkBandwidthUsage;
            hardScore += Math.min(newNetworkBandwidthAvailable, 0) - Math.min(oldNetworkBandwidthAvailable, 0);
            networkBandwidthUsageMap.put(computer, newNetworkBandwidthUsage);

            int oldProcessCount = processCountMap.get(computer);
            if (oldProcessCount == 0) {
                softScore -= computer.getCost();
            }
            int newProcessCount = oldProcessCount + 1;
            processCountMap.put(computer, newProcessCount);
        }
    }

    private void retract(CloudProcess process) {
        CloudComputer computer = process.getComputer();
        if (computer != null) {
            int cpuPower = computer.getCpuPower();
            int oldCpuPowerUsage = cpuPowerUsageMap.get(computer);
            int oldCpuPowerAvailable = cpuPower - oldCpuPowerUsage;
            int newCpuPowerUsage = oldCpuPowerUsage - process.getRequiredCpuPower();
            int newCpuPowerAvailable = cpuPower - newCpuPowerUsage;
            hardScore += Math.min(newCpuPowerAvailable, 0) - Math.min(oldCpuPowerAvailable, 0);
            cpuPowerUsageMap.put(computer, newCpuPowerUsage);

            int memory = computer.getMemory();
            int oldMemoryUsage = memoryUsageMap.get(computer);
            int oldMemoryAvailable = memory - oldMemoryUsage;
            int newMemoryUsage = oldMemoryUsage - process.getRequiredMemory();
            int newMemoryAvailable = memory - newMemoryUsage;
            hardScore += Math.min(newMemoryAvailable, 0) - Math.min(oldMemoryAvailable, 0);
            memoryUsageMap.put(computer, newMemoryUsage);

            int networkBandwidth = computer.getNetworkBandwidth();
            int oldNetworkBandwidthUsage = networkBandwidthUsageMap.get(computer);
            int oldNetworkBandwidthAvailable = networkBandwidth - oldNetworkBandwidthUsage;
            int newNetworkBandwidthUsage = oldNetworkBandwidthUsage - process.getRequiredNetworkBandwidth();
            int newNetworkBandwidthAvailable = networkBandwidth - newNetworkBandwidthUsage;
            hardScore += Math.min(newNetworkBandwidthAvailable, 0) - Math.min(oldNetworkBandwidthAvailable, 0);
            networkBandwidthUsageMap.put(computer, newNetworkBandwidthUsage);

            int oldProcessCount = processCountMap.get(computer);
            int newProcessCount = oldProcessCount - 1;
            if (newProcessCount == 0) {
                softScore += computer.getCost();
            }
            processCountMap.put(computer, newProcessCount);
        }
    }

    @Override
    public HardSoftScore calculateScore() {
        return HardSoftScore.of(hardScore, softScore);
    }
}

This score calculation is the fastest we can possibly make it. It reacts to every planning variable change, making the smallest possible adjustment to the score.

Beyond this tutorial

Now that this simple example works, you can try going further. For example, you can enrich the domain model and add extra constraints such as these:

  • Each Process belongs to a Service. A computer might crash, so processes running the same service must be assigned to different computers.

  • Each Computer is located in a Building. A building might burn down, so processes of the same services should (or must) be assigned to computers in different buildings.