Start & Scale UI

The Start & Scale UI is a lightweight web dashboard that lets coaches and participants start, stop, and scale a JupyterHub AKS cluster — without requiring access to the Azure portal or any cloud credentials.


Two surfaces

Start surface (/)

👩‍🎓 Participants

Shown automatically whenever JupyterHub is down (the App Gateway routes traffic here when the cluster is stopped).

| Role | Available actions |
|---|---|
| Participant | Start button |
| Coach | Start, Stop, and a link to the Scale UI |
  • The current cluster status (Running / Stopped / Starting / Stopping) is shown at the top.
  • While the cluster is transitioning, the page auto-refreshes every 15 seconds.
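The refresh behaviour can be pictured as a small polling loop. The following is a minimal sketch, not the dashboard's actual code: `poll_until_settled`, `get_status`, and `max_polls` are hypothetical names, and only the four status values and the 15-second interval come from the description above.

```python
import time
from typing import Callable, List

# Statuses during which the page keeps refreshing.
TRANSITIONAL = {"Starting", "Stopping"}
REFRESH_SECONDS = 15


def poll_until_settled(
    get_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 40,  # hypothetical safety limit, not from the docs
) -> List[str]:
    """Re-read the status every REFRESH_SECONDS while the cluster is transitioning.

    Returns every status observed, ending with the first settled one
    (or with the last transitional one if max_polls is exhausted).
    """
    seen = [get_status()]
    while seen[-1] in TRANSITIONAL and len(seen) < max_polls:
        sleep(REFRESH_SECONDS)
        seen.append(get_status())
    return seen
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.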

To start the cluster:

  1. Click the LTI link from OLAT. The Start surface appears if the cluster is stopped.
  2. Press Start. Starting typically takes 3–5 minutes.
  3. Wait, then reload the page. Intermittent 502 responses during startup are normal.

Scale surface (/scale)

👩‍🏫 Course owners

Accessible to TAs at any time — the App Gateway always routes /scale to the dashboard function, even when JupyterHub is running. Access it via the link on the Start surface, or directly at https://<course-url>/scale.
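The routing rule described here can be summarised as a small decision function. This is only an illustrative sketch: the real rule lives in the App Gateway configuration, the backend names `"dashboard"` and `"jupyterhub"` are placeholders, and treating paths below `/scale/` the same way as `/scale` is an assumption.

```python
def route(path: str, hub_running: bool) -> str:
    """Return the (placeholder) backend name that should serve `path`."""
    # /scale always goes to the dashboard function, whatever the cluster state.
    # Assumption: sub-paths such as /scale/api are routed the same way.
    if path == "/scale" or path.startswith("/scale/"):
        return "dashboard"
    # Every other path reaches JupyterHub only while the cluster is up;
    # otherwise the request lands on the Start surface.
    return "jupyterhub" if hub_running else "dashboard"
```

For example, `route("/scale", hub_running=True)` yields `"dashboard"`, while `route("/hub/login", hub_running=True)` yields `"jupyterhub"`.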

Node pool table

Shows the primary node pools (default, gpuded0, gpushr0). For each pool the table displays:

| Column | Meaning |
|---|---|
| Name | Pool identifier |
| Current nodes | Live node count right now |
| Min | Autoscaler lower bound (0 = pool can scale to zero) |
| Max | Autoscaler upper bound |

TAs can edit Min / Max for any pool and click Save to apply the new autoscaler bounds via the Azure Management API.
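Before new bounds are sent, it is worth checking that they are consistent. The sketch below shows one way to validate Min / Max and build a request fragment. It is an assumption about how such a save could work, not the dashboard's code: `autoscaler_patch` is a hypothetical helper, and the property names `enableAutoScaling`, `minCount`, and `maxCount` are those of the AKS agent pool resource as far as I know, so verify them against the API version in use.

```python
from typing import Any, Dict


def autoscaler_patch(min_count: int, max_count: int) -> Dict[str, Any]:
    """Validate Min / Max and build an agent-pool properties fragment."""
    if min_count < 0:
        raise ValueError("Min must be 0 or greater")
    if max_count < min_count:
        raise ValueError("Max must be greater than or equal to Min")
    # Assumed property names of the AKS agent pool resource.
    return {
        "properties": {
            "enableAutoScaling": True,
            "minCount": min_count,
            "maxCount": max_count,
        }
    }
```

Rejecting an inverted range locally gives the user an immediate error instead of a failed API call.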

GPU fallback pools (gpuded1, gpuded2, …, gpushr1, …) are not listed here. They are managed automatically by the cluster autoscaler and are only used when the primary pool’s VM SKU is unavailable. See GPU size failover.

Scheduled Pre-Scaling

A schedule defines a time at which the dashboard function automatically warms up a node pool so that nodes are ready when participants arrive.

| Field | Description |
|---|---|
| Name | Human-readable label |
| Node pool | Which pool to pre-scale (default, gpuded0, gpushr0) |
| Replica count | How many placeholder pods to deploy (≈ expected concurrent users) |
| Cron expression | When to trigger (Europe/Zurich timezone) |
| Type | scale_out: warm up nodes; scale_in: remove placeholders |
| Paired schedule | For scale_in entries: the ID of the matching scale_out schedule |
| Enabled | Toggle without deleting |
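To make the fields concrete, here is a hedged sketch of a schedule record with basic validation. The class and attribute names are hypothetical and the stored JSON layout may differ; only the field meanings and the allowed pool and type values come from the table. Requiring a paired schedule for every `scale_in` entry is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIMARY_POOLS = {"default", "gpuded0", "gpushr0"}
SCHEDULE_TYPES = {"scale_out", "scale_in"}


@dataclass
class Schedule:
    # Attribute names are illustrative, not the stored JSON keys.
    id: str
    name: str
    node_pool: str
    replica_count: int
    cron: str  # evaluated in the Europe/Zurich timezone
    type: str
    paired_schedule: Optional[str] = None
    enabled: bool = True

    def problems(self) -> List[str]:
        """Return a list of validation messages; empty means the entry looks sane."""
        found = []
        if self.node_pool not in PRIMARY_POOLS:
            found.append(f"unknown node pool: {self.node_pool}")
        if self.replica_count < 0:
            found.append("replica count must not be negative")
        if not self.cron.strip():
            found.append("cron expression is empty")
        if self.type not in SCHEDULE_TYPES:
            found.append(f"unknown type: {self.type}")
        # Assumption: a scale_in entry must always name its scale_out partner.
        if self.type == "scale_in" and not self.paired_schedule:
            found.append("scale_in needs a paired scale_out schedule ID")
        return found
```

A typical pair would be a `scale_out` entry shortly before a lab session and a `scale_in` entry at its end that references the first entry's ID.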

How pre-scaling works:

  1. At the scheduled time the function calculates how many VMSS nodes are needed for the requested replica count.
  2. It scales the backing VMSS directly (bypassing the autoscaler) so nodes are provisioned immediately.
  3. It deploys a Kubernetes placeholder StatefulSet that occupies the provisioned nodes. This prevents the autoscaler from scaling them back down before participants arrive.
  4. When the paired scale_in schedule fires, the placeholder StatefulSet is deleted and the autoscaler resumes normal management.
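Steps 1 and 3 can be sketched as follows. The documentation does not say how the function derives the node count, so `pods_per_node` is an assumed input, and the manifest is a generic placeholder StatefulSet rather than the one the function deploys. The helper names, the resource requests, and the pause image tag are illustrative; `kubernetes.azure.com/agentpool` is the node label AKS uses for the pool name.

```python
import math
from typing import Any, Dict


def nodes_needed(replicas: int, pods_per_node: int) -> int:
    """Step 1: number of nodes required to host `replicas` placeholder pods."""
    if replicas < 0 or pods_per_node < 1:
        raise ValueError("replicas must be >= 0 and pods_per_node >= 1")
    return math.ceil(replicas / pods_per_node)


def placeholder_statefulset(
    name: str, pool: str, replicas: int, cpu: str = "1", memory: str = "1Gi"
) -> Dict[str, Any]:
    """Step 3: a StatefulSet of idle pods that keeps the pool's nodes occupied."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the placeholders to the pool being pre-scaled.
                    "nodeSelector": {"kubernetes.azure.com/agentpool": pool},
                    "containers": [
                        {
                            "name": "pause",
                            "image": "registry.k8s.io/pause:3.9",
                            # Requests sized like a user pod (illustrative values).
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                },
            },
        },
    }
```

With 30 replicas and an assumed 8 pods per node, `nodes_needed(30, 8)` gives 4. Deleting the StatefulSet (step 4) frees the nodes, so the autoscaler can remove them again if no real user pods have been scheduled there.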

Schedules are stored as JSON in Azure Blob Storage (same storage account as the function). They survive function restarts and redeployments.


