Kubernetes Master: Scaling, Scheduling, Probes

2026-05-03 • Probability and Statistics Master Notes

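Part 7 of the Kubernetes Master Notes series. How HPA implements CPU- and memory-based autoscaling, the decisive differences between the Liveness, Readiness, and Startup probes and what happens when each fails, Pod placement control with nodeSelector, Affinity, and Taint/Toleration, and namespace resource control with ResourceQuota and LimitRange: the core of operational automation.

This is the seventh post in the Kubernetes Master Notes series. Parts 1 through 6 covered static definitions; this one covers automation in a running environment: autoscaling, precise scheduling, and health checks.

HPA scales with traffic. Probes run health checks automatically. Affinity places Pods precisely. These three are the foundation of a production environment.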

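Why Scaling and Scheduling Feel Hard at First

This unit feels hard at first for two reasons. First, the three probes are easy to mix up (Liveness, Readiness, Startup): which one is used for what? Second, Affinity and Taint/Toleration look like nearly the same mechanism.

The fix is a single habit: give each mechanism a one-line purpose. Liveness = restart, Readiness = traffic routing, Startup = protection during boot. Affinity = where should this Pod go, Taint/Toleration = which Pods will this node accept. The difference is the point of view.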

Three Layers of Autoscaling

1. HPA (Horizontal Pod Autoscaler): Pod count ↑↓
2. VPA (Vertical Pod Autoscaler): Pod resources ↑↓ (CPU, memory)
3. Cluster Autoscaler: node count ↑↓

Most clusters use HPA + Cluster Autoscaler.

HPA: Horizontal Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

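Flow:

1. Metrics Server collects CPU and memory metrics from Pods
2. HPA computes average utilization relative to the Pods' resource requests
3. Above target (70% CPU or 80% memory here) → replicas ↑
4. Below target on every metric → replicas ↓
5. Always within the minReplicas and maxReplicas bounds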
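
The replica decision in the flow above can be sketched in Python. This is a minimal sketch of the rule given in the Kubernetes HPA documentation, desired = ceil(current × currentMetric / targetMetric), with the default 10% tolerance. The function names are hypothetical, and the real controller also handles missing metrics and Pods that are not yet ready.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the documented HPA rule for a single metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_utilization / target_utilization)
    # The result is always clamped to [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, desired))

def desired_for_all_metrics(current_replicas, metrics, min_replicas, max_replicas):
    """With several metrics, the largest per-metric proposal wins."""
    return max(
        desired_replicas(current_replicas, current, target, min_replicas, max_replicas)
        for current, target in metrics
    )

# 4 replicas at 90% CPU against a 70% target
print(desired_replicas(4, 90, 70, 2, 10))                        # 6
# CPU at 50% (target 70), memory at 95% (target 80)
print(desired_for_all_metrics(4, [(50, 70), (95, 80)], 2, 10))   # 5
```

Taking the maximum across metrics is why a quiet CPU does not scale the Deployment down while memory is still above its target.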

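A really important exam trap here: HPA requires Metrics Server. Without it, resource-metric HPAs do not work. Some managed clusters such as GKE ship it preinstalled; on others (EKS included, unless it has been added as an add-on) and on local clusters you install it yourself, e.g. kubectl apply -f metrics-server.yaml.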

Custom Metrics

metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000

Prometheus Adapter exposes custom metrics to HPA, which allows scaling on business metrics.

Behavior: Stabilization

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # 5-minute stabilization window
      policies:
        - type: Percent
          value: 10                      # remove at most 10% per period
          periodSeconds: 60
    scaleUp:
      policies:
        - type: Percent
          value: 100                     # at most double per period
          periodSeconds: 30

Prevents abrupt scale-up and scale-down.
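
As a rough model of what the behavior block above does, the Python sketch below applies the two ideas in order: the scale-down stabilization window acts on the highest recommendation seen in the window, and a Percent policy caps how far one period may move. The function names are hypothetical, and the rounding direction is an assumption of this sketch rather than a statement about the controller's exact arithmetic.

```python
import math

def stabilized_recommendation(recent_recommendations):
    """Scale-down stabilization: use the highest recommendation seen in the window."""
    return max(recent_recommendations)

def cap_scale_down(replicas_at_period_start, proposed, percent):
    """Percent policy for scaleDown: keep at least (100 - percent)% of the replicas."""
    lowest_allowed = math.ceil(replicas_at_period_start * (100 - percent) / 100)
    return max(proposed, lowest_allowed)

def cap_scale_up(replicas_at_period_start, proposed, percent):
    """Percent policy for scaleUp: grow by at most percent% of the replicas."""
    highest_allowed = math.ceil(replicas_at_period_start * (100 + percent) / 100)
    return min(proposed, highest_allowed)

# Recommendations from the last 5 minutes; the brief dip to 4 is ignored.
print(stabilized_recommendation([9, 4, 6]))   # 9
# 20 replicas, HPA wants 5, policy allows removing 10% per 60 s
print(cap_scale_down(20, 5, 10))              # 18
# 4 replicas, HPA wants 10, policy allows +100% per 30 s
print(cap_scale_up(4, 10, 100))               # 8
```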

VPA: Vertical Autoscaling

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Auto    # Off / Initial / Auto

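Adjusts requests and limits automatically. In Auto mode, applying a new value restarts the Pod.

One exam trap here: do not use VPA and HPA together on CPU or memory. If both act on the same metric, they conflict. Use VPA alone, or alongside an HPA driven by custom metrics.

Cluster Autoscaler: Node Autoscaling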

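HPA adds Pods → not enough resources
Cluster Autoscaler adds a new node
The new Pods are scheduled onto the new node

Load drops → Pods are removed → node sits idle → node is removed

In cloud environments it integrates with the provider's node groups (for example Auto Scaling Groups on EKS, node pools on GKE).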

Probes: Three Health Checks

1. Liveness Probe: "Is it alive?"

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

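On failure, the container is restarted.

Use case: the process stops responding because of a deadlock, an infinite loop, or a memory leak.

2. Readiness Probe: "Is it ready for traffic?"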

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

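On failure, the Pod is removed from the Service endpoints (no traffic). The container is not restarted.

Use case: warm-up right after start, or a temporary outage of an external dependency.

3. Startup Probe: "Has it started?" (1.16+)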

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

For apps that take a long time to start (Java servers and the like). Until it succeeds, the Liveness and Readiness probes are disabled.
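
To see how much time the startup probe above actually buys, here is a small Python sketch. The startup budget follows the failureThreshold × periodSeconds rule from the Kubernetes probe documentation; the liveness-only figure is a rough estimate that ignores probe timeouts, and the 120-second boot time is a made-up example value.

```python
def startup_budget_seconds(failure_threshold, period_seconds):
    """Longest time the startup probe keeps waiting before the container is restarted."""
    return failure_threshold * period_seconds

def liveness_only_budget_seconds(initial_delay_seconds, failure_threshold, period_seconds):
    """Rough estimate when only a liveness probe guards startup (timeouts ignored)."""
    return initial_delay_seconds + failure_threshold * period_seconds

java_boot_seconds = 120  # hypothetical slow-starting app

# Values from the startupProbe and livenessProbe manifests in this post.
with_startup = startup_budget_seconds(failure_threshold=30, period_seconds=10)
liveness_only = liveness_only_budget_seconds(30, 3, 10)

print(with_startup, java_boot_seconds <= with_startup)     # 300 True
print(liveness_only, java_boot_seconds <= liveness_only)   # 60 False
```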

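A really important exam trap here: how the three probes differ.

Liveness failure = restart (if it keeps failing, it keeps restarting, with back-off)
Readiness failure = no traffic (no restart)
Startup failure = restart (never finished starting)

Slow starters such as Java apps should use a Startup probe. Otherwise the Liveness probe kills the container while it is still starting.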
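
The three outcomes above fit in one small decision function. This is a simplified Python model, not kubelet code: the function name and the three startup states are assumptions made for illustration, "failed" means the failure threshold has been exhausted, and a restart is still subject to the Pod's restartPolicy.

```python
def kubelet_reaction(startup, liveness_ok, readiness_ok):
    """startup is 'pending', 'failed' (threshold exhausted) or 'passed'."""
    if startup == "pending":
        # Liveness and readiness checks are held back until startup succeeds.
        return {"restart": False, "receives_traffic": False}
    if startup == "failed":
        return {"restart": True, "receives_traffic": False}
    if not liveness_ok:
        return {"restart": True, "receives_traffic": False}
    # Alive: readiness alone decides whether the Pod is in the Service endpoints.
    return {"restart": False, "receives_traffic": readiness_ok}

print(kubelet_reaction("pending", False, False))
print(kubelet_reaction("passed", True, False))    # alive but not ready: no traffic, no restart
print(kubelet_reaction("passed", False, True))    # liveness failed: restart
```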

Probe Types

# HTTP
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080

# TCP (checks that the port is open)
livenessProbe:
  tcpSocket:
    port: 8080

# Exec (runs a command; exit code 0 means success)
livenessProbe:
  exec:
    command: ['cat', '/tmp/healthy']

# gRPC (1.24+)
livenessProbe:
  grpc:
    port: 9000

Pod Scheduling: Four Mechanisms

1. nodeSelector: Simple Matching

spec:
  nodeSelector:
    disktype: ssd

Scheduled only onto nodes that carry that label. Simple.

2. NodeAffinity: More Precise

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard requirement
        nodeSelectorTerms:
          - matchExpressions:
              - key: zone
                operator: In
                values: [us-east-1a, us-east-1b]
      preferredDuringSchedulingIgnoredDuringExecution:    # soft preference
        - weight: 100
          preference:
            matchExpressions:
              - key: gpu
                operator: Exists

required = must match, preferred = match if possible.
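
The required/preferred split can be sketched as "filter, then score". The Python below is a simplified model under stated assumptions: only the In and Exists operators from the manifest above are covered, the node names and labels are invented, and the real scheduler combines this score with many other scoring plugins.

```python
def expression_matches(expr, labels):
    key, op = expr["key"], expr["operator"]
    if op == "In":
        return labels.get(key) in expr["values"]
    if op == "Exists":
        return key in labels
    raise ValueError(f"operator not covered by this sketch: {op}")

def term_matches(expressions, labels):
    # All expressions inside one term must match (AND).
    return all(expression_matches(e, labels) for e in expressions)

def pick_node(nodes, required_terms, preferred):
    # required: a node must satisfy at least one term (OR), else it is filtered out.
    feasible = {name: labels for name, labels in nodes.items()
                if any(term_matches(t["matchExpressions"], labels) for t in required_terms)}
    if not feasible:
        return None  # the Pod stays Pending

    # preferred: each matching preference adds its weight to the node's score.
    def score(labels):
        return sum(p["weight"] for p in preferred
                   if term_matches(p["preference"]["matchExpressions"], labels))

    return max(feasible, key=lambda name: score(feasible[name]))

nodes = {  # hypothetical cluster
    "node-a": {"zone": "us-east-1a"},
    "node-b": {"zone": "us-east-1b", "gpu": "true"},
    "node-c": {"zone": "us-west-2a", "gpu": "true"},
}
required = [{"matchExpressions": [
    {"key": "zone", "operator": "In", "values": ["us-east-1a", "us-east-1b"]}]}]
preferred = [{"weight": 100, "preference": {"matchExpressions": [
    {"key": "gpu", "operator": "Exists"}]}}]

print(pick_node(nodes, required, preferred))   # node-b
```

node-c has a GPU but fails the required zone rule, so the preference never gets a chance to count for it.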

3. PodAffinity / AntiAffinity

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname

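Pods with the same label are kept off the same node (spread out). A core HA pattern.

A really important exam trap here: AntiAffinity is the standard HA pattern. If all 3 replicas of a Deployment land on one node, a node failure takes them all down. Spreading them across different nodes is what AntiAffinity does.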
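
A toy placement loop makes the HA argument concrete. This Python sketch models only the rule from the manifest above (required anti-affinity on kubernetes.io/hostname, so at most one app=web Pod per node); the Pod and node names are hypothetical and every other scheduling factor is ignored.

```python
def place_replicas(replica_count, node_names):
    """At most one matching Pod per node; leftovers stay Pending (None)."""
    placement = {}
    free_nodes = list(node_names)
    for i in range(replica_count):
        placement[f"web-{i}"] = free_nodes.pop(0) if free_nodes else None
    return placement

def survivors_after_node_failure(placement, failed_node):
    return [pod for pod, node in placement.items() if node not in (None, failed_node)]

placement = place_replicas(3, ["node-1", "node-2", "node-3"])
print(placement)
print(survivors_after_node_failure(placement, "node-1"))        # ['web-1', 'web-2']

# With a required rule, a 4th replica on a 3-node cluster cannot be placed.
print(place_replicas(4, ["node-1", "node-2", "node-3"])["web-3"])   # None
```

The last line is the cost of the required form: more replicas than nodes leaves Pods Pending, which is one reason the preferred form is sometimes chosen instead.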

4. Taint / Toleration

A node declares that it only accepts Pods that tolerate its taint:

# Taint the node
kubectl taint nodes node-1 dedicated=gpu:NoSchedule

# Toleration on the Pod
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule

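Effect	Meaning
NoSchedule	New Pods are not scheduled (existing ones stay)
PreferNoSchedule	Avoided if possible
NoExecute	Existing Pods are evicted too

Use cases:

Dedicated GPU nodes
Isolating specific workloads
Protecting control-plane (master) nodes (tainted automatically)

One exam trap here: Affinity is the Pod's point of view, Taint is the node's. Both can be used together. General placement = nodeSelector or Affinity; special-purpose isolation = Taint/Toleration. Note that a toleration only allows a Pod onto the tainted node; it does not steer the Pod there, so dedicated nodes usually pair the taint with a nodeSelector or NodeAffinity.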
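
The matching rule between a toleration and a taint is small enough to write out. The Python below is a sketch of the scheduling-time check only, with hypothetical function names; it treats PreferNoSchedule as a soft hint that never blocks placement and does not model NoExecute eviction or tolerationSeconds.

```python
def tolerates(toleration, taint):
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # operator Exists ignores the value; with no key it tolerates every taint.
    if toleration.get("operator", "Equal") == "Exists":
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))

def can_schedule(pod_tolerations, node_taints):
    for taint in node_taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft: influences scoring, never blocks
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True

gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
gpu_pod = [{"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}]

print(can_schedule(gpu_pod, [gpu_taint]))   # True
print(can_schedule([], [gpu_taint]))        # False
print(can_schedule(gpu_pod, []))            # True: nothing keeps it off ordinary nodes
```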

ResourceQuota: Namespace Resource Limits

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    persistentvolumeclaims: "10"

Caps resources per namespace. Essential in multi-team clusters.
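
Quota enforcement is an admission-time sum check. The Python sketch below models it for three of the entries in the manifest above, with CPU in millicores and memory in Mi to avoid unit parsing; the usage figures and Pod sizes are invented for illustration.

```python
# Subset of the dev-quota manifest: millicores, Mi, and a Pod count.
HARD = {"requests.cpu": 10_000, "requests.memory": 20 * 1024, "pods": 50}

def admits(hard, used, pod):
    """True if adding the Pod keeps every tracked resource within the quota."""
    return all(used.get(name, 0) + pod.get(name, 0) <= limit
               for name, limit in hard.items())

used = {"requests.cpu": 9_500, "requests.memory": 8 * 1024, "pods": 12}   # hypothetical usage
small_pod = {"requests.cpu": 250, "requests.memory": 256, "pods": 1}
large_pod = {"requests.cpu": 1_000, "requests.memory": 1024, "pods": 1}

print(admits(HARD, used, small_pod))   # True
print(admits(HARD, used, large_pod))   # False: CPU requests would reach 10500m
```

One consequence worth remembering: once a namespace has a quota on CPU or memory, Pods that declare no value for those resources may be rejected, which is where LimitRange defaults help.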

LimitRange: Defaults and Bounds for Pods and Containers

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: 2
        memory: 4Gi
      min:
        cpu: 50m
        memory: 64Mi

Applied automatically to Pods that do not declare resources, and enforces the max and min.
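
Defaulting and validation can be sketched together. The Python below mirrors the values of the LimitRange above (CPU in millicores, memory in Mi) and follows the documented rule that a container with a limit but no request gets its request set to the limit. The function name is hypothetical, and checks such as request ≤ limit are left out.

```python
LIMIT_RANGE = {
    "default":        {"cpu": 500,  "memory": 512},    # default limits
    "defaultRequest": {"cpu": 100,  "memory": 128},
    "max":            {"cpu": 2000, "memory": 4096},
    "min":            {"cpu": 50,   "memory": 64},
}

def apply_limit_range(container, limit_range=LIMIT_RANGE):
    given_requests = container.get("requests", {})
    given_limits = container.get("limits", {})
    requests, limits = {}, {}
    for res in ("cpu", "memory"):
        limits[res] = given_limits.get(res, limit_range["default"][res])
        if res in given_requests:
            requests[res] = given_requests[res]
        elif res in given_limits:
            requests[res] = given_limits[res]      # limit given, request omitted
        else:
            requests[res] = limit_range["defaultRequest"][res]
        if requests[res] < limit_range["min"][res] or limits[res] > limit_range["max"][res]:
            raise ValueError(f"{res} outside the LimitRange bounds")
    return {"requests": requests, "limits": limits}

print(apply_limit_range({}))
try:
    apply_limit_range({"limits": {"cpu": 3000}})   # above max cpu of 2
except ValueError as err:
    print("rejected:", err)
```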

PodDisruptionBudget: Protecting Availability

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2          # or maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx

Keeps at least N Pods running during node maintenance and other voluntary disruptions. Protects HA.
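
The budget boils down to simple arithmetic: with minAvailable, the number of evictions allowed right now is healthy Pods minus minAvailable. The Python sketch below uses that rule to model a node drain; the function and Pod names are hypothetical, and percentage values and maxUnavailable are not modeled.

```python
def allowed_disruptions(healthy_pods, min_available):
    return max(0, healthy_pods - min_available)

def drain(pods_on_node, healthy_pods, min_available):
    """Evict Pods one at a time; stop as soon as the budget is exhausted."""
    evicted = []
    for pod in pods_on_node:
        if allowed_disruptions(healthy_pods, min_available) == 0:
            break  # eviction is refused until a replacement becomes healthy
        evicted.append(pod)
        healthy_pods -= 1
    return evicted

# 3 healthy nginx Pods in total, minAvailable: 2, two of them on the node being drained
print(allowed_disruptions(3, 2))               # 1
print(drain(["nginx-a", "nginx-b"], 3, 2))     # ['nginx-a']
```

The second Pod is only evicted once a replacement is healthy elsewhere, which is why a drain can appear to stall on a tight budget.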

Priority and Preemption

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false

# Referenced from the Pod spec
spec:
  priorityClassName: high-priority

When resources run short, lower-priority Pods are evicted so that higher-priority Pods can be scheduled.
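
Preemption on a single node can be approximated as "evict the lowest priorities first until the new Pod fits". The Python below is a deliberately simplified sketch with hypothetical names and numbers (CPU in millicores); the real scheduler also compares candidate nodes, tries to respect PodDisruptionBudgets, and gives victims a graceful termination period.

```python
def pick_victims(running_pods, free_cpu, needed_cpu, incoming_priority):
    """Return Pod names to evict, [] if none are needed, or None if preemption cannot help."""
    victims = []
    for pod in sorted(running_pods, key=lambda p: p["priority"]):
        # Stop once there is room, or once only equal/higher priorities remain.
        if free_cpu >= needed_cpu or pod["priority"] >= incoming_priority:
            break
        victims.append(pod["name"])
        free_cpu += pod["cpu"]
    return victims if free_cpu >= needed_cpu else None

running = [
    {"name": "api",   "priority": 1000, "cpu": 1000},
    {"name": "batch", "priority": 0,    "cpu": 1000},
]

# A high-priority Pod (value 1000000) needs 1000m; only 200m is free.
print(pick_victims(running, free_cpu=200, needed_cpu=1000, incoming_priority=1_000_000))  # ['batch']
# A priority-0 Pod cannot preempt anything.
print(pick_victims(running, free_cpu=200, needed_cpu=1000, incoming_priority=0))          # None
```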

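One More Time Before the Exam: Commonly Confused Traps

That is the core of Part 7. Here is a compressed set of notes to close with, so you can reopen it right before the exam or whenever things get confusing at work.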

Three autoscalers: HPA (Pod count) / VPA (resources) / Cluster Autoscaler (nodes)
Typical setup = HPA + Cluster Autoscaler
HPA = Metrics Server required
CPU, memory, custom metrics
behavior tunes stabilization and rate
VPA + HPA on the same metric: no, they conflict
Three probes: Liveness / Readiness / Startup
Liveness failure = restart
Readiness failure = no traffic (no restart)
Startup failure = restart (protects startup)
Slow-starting app = Startup probe required (otherwise Liveness kills it)
Probe types: HTTP / TCP / Exec / gRPC
Four scheduling mechanisms: nodeSelector / NodeAffinity / PodAffinity and AntiAffinity / Taint and Toleration
AntiAffinity = standard HA pattern (keeps replicas from piling onto one node)
Affinity = required (must match) / preferred (soft)
Taint = node's point of view / Affinity = Pod's point of view
Effect: NoSchedule / PreferNoSchedule / NoExecute (evicts)
ResourceQuota = per-namespace resource cap
LimitRange = defaults and bounds for Pods and containers
PodDisruptionBudget = minimum guaranteed during voluntary disruptions
minAvailable or maxUnavailable
PriorityClass = priority and eviction when resources run short

More in This Series

Official docs: go deeper in HPA / Probes / Taints and Tolerations.

The next post (Part 8) covers Security: RBAC, NetworkPolicy, SecurityContext, and PodSecurityStandards.
