Spring Batch Primer, Part 37: Scaling and Parallel Processing, Six Strategies in One Place

2026-05-17 • Spring Batch: From Basics to Production

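Spring Batch Primer, part 37. These study notes cover the six ways to scale a batch job: Multi-threaded Step, Parallel Steps, Local Chunking (new in v6), Remote Chunking, Partitioning, and Remote Step. For each strategy they explain whether it is single-process or multi-process, its throughput characteristics and thread-safety requirements, and its key components (gridSize, TaskExecutor, PartitionHandler, ChunkTaskExecutorItemWriter), and they end with a guide for choosing among them. This is the overview that opens Part 8.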

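This is part 37 of the 48-part series Spring Batch: From Basics to Production. Up to part 36 we looked at every component inside a Step; this part asks how to scale a Step (increase its throughput) and brings all six strategies together in one article. (Part 8)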

First question: do you really need to scale?

Many batch problems can be solved with a single-threaded, single-process job, so measure before reaching for a more complex implementation. (official reference)

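According to the reference, a single-threaded (one thread), single-process (one JVM) job can read and write a file of several hundred megabytes in well under a minute, even on standard hardware. So measure first, confirm where the real bottleneck is, and only then choose a strategy.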

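The price of scaling is complexity, harder debugging, and higher operating cost. Reach for it only when throughput (the amount of work processed per unit of time) is actually insufficient.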

The six strategies: classification matrix

Strategy	Type	Standard Spring Batch support
Multi-threaded Step	single-process	✓
Parallel Steps	single-process	✓
Local Chunking (new in v6)	single-process	✓
Remote Chunking	multi-process	Spring Batch Integration
Partitioning	single or multi	✓
Remote Step (new in v6)	multi-process	Spring Integration

Here, single-process means everything runs inside one JVM, while multi-process means several JVMs split the work and communicate over the network.

1. Multi-threaded Step: the simplest

Configuration

@Bean
public TaskExecutor taskExecutor() {
    return new SimpleAsyncTaskExecutor("spring_batch");
}

@Bean
public Step sampleStep(JobRepository repo, PlatformTransactionManager tx,
                       TaskExecutor executor) {
    return new StepBuilder("sampleStep", repo)
        .<String, String>chunk(10, tx)
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .taskExecutor(executor)              // ★ the one added line
        .build();
}

Adding that single taskExecutor line is all it takes, which makes this the simplest way to parallelize a step.

How it works

Step Thread Pool (e.g. 8 threads)
  ├─ Thread 1 ─ processes a chunk → read·process·write
  ├─ Thread 2 ─ processes a chunk
  ├─ Thread 3 ─ processes a chunk
  ...

Each thread processes one chunk (a group of items handled as one unit) independently. The key point is that the unit of parallelism is the chunk.

Pitfall: thread-safety requirements

The ItemProcessor must be thread-safe. ... reading and writing of items is still done in serial by the main thread executing the step, so the ItemReader and ItemWriter do not have to be thread-safe. (official reference)

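This point needs to be understood precisely.

The ItemProcessor is called by several threads at once, so it must be thread-safe (safe to call from multiple threads concurrently). The quote goes further: because the main thread reads and writes serially, the ItemReader and ItemWriter do not need to be thread-safe.

That holds only for the execution model the quote describes, in which just the processing runs concurrently. In the classic multi-threaded step, the model drawn in the diagram above, each thread runs the whole read, process, write cycle for its own chunk, so the reader and writer are called concurrently as well. Which model applies depends on the Spring Batch version and step implementation, so check the reference for the version you run.

FlatFileItemReader, JdbcCursorItemReader, StaxEventItemReader = NOT thread-safe
In a multi-threaded step → wrapping them in SynchronizedItemStreamReader (part 25) is recommended
JdbcPagingItemReader = thread-safe → no wrapper needed (its restart state is still unreliable across threads, so saveState(false) is the usual setting)
KafkaItemReader = documented as NOT thread-safe, because the underlying KafkaConsumer is not

To be safe, verify separately that your Reader is thread-safe, or wrap it in a Synchronized*Reader.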
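The decorator idea behind SynchronizedItemStreamReader can be sketched in plain Java. This is not the Spring Batch API: `SimpleReader`, `ListReader`, `SynchronizedReader`, and `ReaderDemo` are hypothetical stand-ins. The list-backed reader does an unsynchronized check-then-advance on its index, and the wrapper serializes `read()` so several threads can drain it without duplicates or lost items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical minimal reader contract standing in for ItemReader<T>.
interface SimpleReader<T> {
    T read(); // returns null when the input is exhausted
}

// Deliberately NOT thread-safe: the check on `index` and the increment can interleave.
final class ListReader<T> implements SimpleReader<T> {
    private final List<T> items;
    private int index = 0;

    ListReader(List<T> items) { this.items = items; }

    @Override
    public T read() {
        if (index >= items.size()) return null;
        return items.get(index++);
    }
}

// Decorator that serializes read() calls: the same idea SynchronizedItemStreamReader applies.
final class SynchronizedReader<T> implements SimpleReader<T> {
    private final SimpleReader<T> delegate;

    SynchronizedReader(SimpleReader<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T read() {
        return delegate.read();
    }
}

final class ReaderDemo {
    /** Drains a wrapped reader from several threads; returns {totalReads, distinctItems}. */
    static int[] drainWithThreads(int itemCount, int threadCount) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) input.add(i);
        SimpleReader<Integer> reader = new SynchronizedReader<>(new ListReader<>(input));
        Set<Integer> distinct = ConcurrentHashMap.newKeySet();
        AtomicInteger total = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    total.incrementAndGet();
                    distinct.add(item);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { total.get(), distinct.size() };
    }
}
```

Without the wrapper, two threads can pass the index check together, so the same item can be handed out twice or a read can fail with an IndexOutOfBoundsException.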

Pitfall: connection pool size

Size the DataSource pool to at least the number of threads. If it is smaller, threads spend their time waiting for a connection and throughput collapses.
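A rough plain-Java model of the problem, assuming the pool behaves like a counting semaphore (real pools such as HikariCP add timeouts and more on top of this). `PoolDemo` is hypothetical; it only shows that concurrency is capped at the pool size and that surplus threads block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolDemo {
    /**
     * Simulates `threads` step threads sharing a pool of `poolSize` connections,
     * modeled as a Semaphore. Returns {maxConnectionsInUse, threadsThatHadToWait}.
     */
    static int[] run(int threads, int poolSize) {
        Semaphore pool = new Semaphore(poolSize, true);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger maxInUse = new AtomicInteger();
        AtomicInteger waited = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                futures.add(executor.submit(() -> {
                    if (!pool.tryAcquire()) {   // no free connection right now
                        waited.incrementAndGet();
                        pool.acquire();         // block until another thread releases one
                    }
                    try {
                        maxInUse.accumulateAndGet(inUse.incrementAndGet(), Math::max);
                        Thread.sleep(20);       // simulated query holding the connection
                    } finally {
                        inUse.decrementAndGet();
                        pool.release();
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return new int[] { maxInUse.get(), waited.get() };
    }
}
```

With 16 threads and a pool of 8, typically about half of the threads end up waiting; with a pool of 8 for 8 threads, none do.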

Pitfall: throughput ceiling

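If reads and writes run serially on one thread (the model quoted from the reference above), or if the job is I/O bound (disk or network I/O is the bottleneck), N-way parallelism will not give you N times the throughput.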

For I/O-bound batches, local chunking or partitioning is usually the better fit.
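To put a rough number on that ceiling, Amdahl's law (a general formula, not something the Spring Batch reference states) bounds the speedup when a fraction s of the work stays serial: speedup ≤ 1 / (s + (1 - s) / N). The sketch below only evaluates that formula; the serial fraction is an assumption you would have to measure for your own job, and `Amdahl` is a hypothetical name.

```java
final class Amdahl {
    /**
     * Upper bound on speedup when `serialFraction` of the work cannot run in parallel
     * (e.g. serial read/write) and the rest is spread evenly over `threads` threads.
     */
    static double maxSpeedup(double serialFraction, int threads) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || threads < 1) {
            throw new IllegalArgumentException("serialFraction must be in [0, 1] and threads >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / threads);
    }
}
```

For example, if half of each chunk's time is serial read/write, eight threads give at most about 1.78x, not 8x.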

2. Parallel Steps: parallelism at the Step level

@Bean
public Flow flow1() {
    return new FlowBuilder<SimpleFlow>("flow1")
        .start(step1())
        .next(step2())
        .build();
}

@Bean
public Flow flow2() {
    return new FlowBuilder<SimpleFlow>("flow2")
        .start(step3())
        .build();
}

@Bean
public Job job(JobRepository repo) {
    return new JobBuilder("job", repo)
        .start(splitFlow())
        .next(step4())
        .build().build();
}

@Bean
public Flow splitFlow() {
    return new FlowBuilder<SimpleFlow>("splitFlow")
        .split(taskExecutor())
        .add(flow1(), flow2())
        .build();
}

This is a direct application of the Split Flow from part 20: Steps with different responsibilities run at the same time.
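The split semantics (flows run concurrently, the next step waits for all of them) can be mimicked with plain `CompletableFuture`. This is only an analogy to what the FlowBuilder split does, not Spring Batch code; `SplitDemo` is hypothetical and the step names simply mirror the configuration above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SplitDemo {
    /** Runs flow1 (step1 then step2) and flow2 (step3) concurrently, then step4. Returns the execution log. */
    static List<String> runJob() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Void> flow1 = CompletableFuture.runAsync(() -> {
                log.add("step1");
                log.add("step2"); // steps inside one flow stay sequential
            }, executor);
            CompletableFuture<Void> flow2 = CompletableFuture.runAsync(() -> log.add("step3"), executor);
            // Like the split: the following step starts only after every flow has finished.
            CompletableFuture.allOf(flow1, flow2).join();
            log.add("step4");
            return log;
        } finally {
            executor.shutdown();
        }
    }
}
```

step3 may interleave anywhere before step4, but step1 always precedes step2 and step4 always comes last.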

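Use cases

Independent data processing (e.g. an order batch and a customer batch at the same time)
Combine the results in a Step that runs after the split

Limitations

No chunk-level parallelism over the same data
No data sharing between flows (each flow has its own transactions)

So rather than true throughput scaling, think of it as a way to run different tasks side by side.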

3. Local Chunking (new in Spring Batch 6)

Motivation

This strategy fills the gap between the single-process limits of a Multi-threaded Step and the multi-process complexity of Remote Chunking.

It distributes chunks to worker threads inside the same JVM: local execution, multiple threads, and chunk-level parallelism combined.

Configuration

@Bean
public ChunkTaskExecutorItemWriter<Vet> itemWriter(ChunkProcessor<Vet> chunkProcessor) {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setThreadNamePrefix("worker-thread-");
    executor.setWaitForTasksToCompleteOnShutdown(true);
    executor.afterPropertiesSet();
    return new ChunkTaskExecutorItemWriter<>(chunkProcessor, executor);
}

@Bean
public ChunkProcessor<Vet> chunkProcessor(DataSource ds, TransactionTemplate tx) {
    JdbcBatchItemWriter<Vet> writer = new JdbcBatchItemWriterBuilder<Vet>()
        .dataSource(ds)
        .sql("INSERT INTO vets (firstname, lastname) VALUES (?, ?)")
        .itemPreparedStatementSetter((item, ps) -> {
            ps.setString(1, item.firstname());
            ps.setString(2, item.lastname());
        })
        .build();

    return (chunk, contribution) -> tx.executeWithoutResult(status -> {
        try {
            writer.write(chunk);
            contribution.incrementWriteCount(chunk.size());
            contribution.setExitStatus(ExitStatus.COMPLETED);
        } catch (Exception e) {
            status.setRollbackOnly();
            contribution.incrementWriteSkipCount(chunk.size());
            contribution.setExitStatus(ExitStatus.FAILED.addExitDescription(e));
        }
    });
}

ChunkTaskExecutorItemWriter submits chunks to the worker threads of a local TaskExecutor.

How it works

Main thread (read)
  ↓
Chunk
  ↓
ChunkTaskExecutorItemWriter.write(chunk)
  ↓
  ├─ Worker thread 1 ← processes Chunk A
  ├─ Worker thread 2 ← processes Chunk B
  ├─ Worker thread 3 ← processes Chunk C
  ├─ Worker thread 4 ← processes Chunk D
  ↓
Results collected once all workers finish

The difference from a Multi-threaded Step is that the write stage is explicitly handed off to the workers, a structure that suits CPU-bound (computation-heavy) writes.
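A plain-Java sketch of the local chunking idea as described here: the main thread reads and groups items into chunks, each chunk goes to a worker pool, and the worker results are gathered at the end. It does not use ChunkTaskExecutorItemWriter; `LocalChunkingDemo` and `writeTask` are hypothetical, and `writeTask` only returns the chunk size instead of doing a real JDBC batch insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LocalChunkingDemo {
    /** Reads `itemCount` items on the calling thread, writes chunks on `workers` threads; returns total written. */
    static int run(int itemCount, int chunkSize, int workers) {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        List<Future<Integer>> pending = new ArrayList<>();
        try {
            List<Integer> chunk = new ArrayList<>(chunkSize);
            for (int item = 0; item < itemCount; item++) { // "read" happens on the main thread
                chunk.add(item);
                if (chunk.size() == chunkSize) {
                    pending.add(executor.submit(writeTask(chunk)));
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) pending.add(executor.submit(writeTask(chunk)));
            int written = 0;
            for (Future<Integer> result : pending) written += result.get(); // gather worker results
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    // Stand-in for the chunk writer; a real one would run a JDBC batch insert inside a transaction.
    private static Callable<Integer> writeTask(List<Integer> chunk) {
        return () -> chunk.size();
    }
}
```

With 1,050 items and a chunk size of 100, the main thread submits 11 chunks (the last one has 50 items) and the worker results add up to 1,050.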

4. Remote Chunking: multi-process

Structure

[Manager Process]
  read → chunk → message queue → distribute
                        │
                        ↓
[Worker Process 1]  [Worker Process 2]  [Worker Process 3]
   ↓                     ↓                     ↓
process · write       process · write       process · write
   ↓                     ↓                     ↓
reply queue ←───────────────────────────────
   ↓
[Manager] aggregates the results

The Manager reads items, builds chunks, and sends them out; the Workers process, write, and acknowledge.
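The manager/worker message flow can be simulated inside one JVM, with two `BlockingQueue`s standing in for the request and reply queues. This is a hypothetical sketch (`RemoteChunkingSim`, `ChunkRequest`, and `ChunkReply` are made-up names), not the Spring Batch Integration API; a real setup uses a message broker and separate processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class RemoteChunkingSim {
    // Hypothetical message payloads for the request and reply queues.
    record ChunkRequest(int id, List<Integer> items) {}
    record ChunkReply(int id, int processed) {}

    // Sentinel telling a worker to stop; compared by identity.
    private static final ChunkRequest POISON = new ChunkRequest(-1, List.of());

    /** Returns the total processed count the manager aggregates from worker replies. */
    static int run(int itemCount, int chunkSize, int workerCount) {
        BlockingQueue<ChunkRequest> requests = new LinkedBlockingQueue<>();
        BlockingQueue<ChunkReply> replies = new LinkedBlockingQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        ChunkRequest request = requests.take();
                        if (request == POISON) return;
                        // process + write would happen here; this sketch only counts the items
                        replies.put(new ChunkReply(request.id(), request.items().size()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            // Manager: read items, group them into chunks, send each chunk.
            int sent = 0;
            List<Integer> chunk = new ArrayList<>();
            for (int item = 0; item < itemCount; item++) {
                chunk.add(item);
                if (chunk.size() == chunkSize) {
                    requests.put(new ChunkRequest(sent++, chunk));
                    chunk = new ArrayList<>();
                }
            }
            if (!chunk.isEmpty()) requests.put(new ChunkRequest(sent++, chunk));
            // Manager: wait for exactly one reply per chunk sent and aggregate.
            int total = 0;
            for (int r = 0; r < sent; r++) total += replies.take().processed();
            for (int w = 0; w < workerCount; w++) requests.put(POISON);
            for (Thread t : workers) t.join();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The manager only counts replies against chunks sent; if a reply never arrives, `replies.take()` blocks forever, which is why real remote chunking depends on guaranteed delivery.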

When to use it

This pattern works best if the manager is not a bottleneck, so the processing must be more expensive than the reading of items. (official reference)

Remote chunking fits best when reading is fast but processing and writing are heavy.

With CPU-intensive processing, heavy external calls, or complex transformations, the manager only distributes chunks while the workers process them concurrently.

Limitations

Requires message queue infrastructure (JMS, ActiveMQ, RabbitMQ, etc.)
Requires guaranteed message delivery and a single consumer per message
Higher operational complexity

More details

Remote Chunking in Spring Batch Integration is covered further in part 43 (launching jobs through messages) and part 44 (async externalization).

5. Partitioning: the most powerful strategy

Structure

[Manager Step]
  ↓
  Partitioner splits the input into N parts → creates the ExecutionContexts
  ↓
[PartitionStep]
  ├─ Worker Step 1 (ExecutionContext 1)
  ├─ Worker Step 2 (ExecutionContext 2)
  ├─ Worker Step 3 (ExecutionContext 3)
  ...
  Worker Step N (ExecutionContext N)
  ↓
Next Step once all workers finish

Each worker is an independent Step instance with its own read, process, write, and transaction, which is what makes this truly independent.

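Two core SPIs

There are two SPIs (Service Provider Interfaces, i.e. extension points whose implementation you can swap out).

Partitioner: splits the input into N parts and returns a Map<String, ExecutionContext>
PartitionHandler: sends execution requests to the workers and collects the results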

The Partitioner interface

public interface Partitioner {
    Map<String, ExecutionContext> partition(int gridSize);
}

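The return value works like this: each key is a partition name, and the partition's StepExecution is named after it with the step name as a prefix (e.g. step1:partition0); each value is that partition's input information (ExecutionContext).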

Example: splitting across N files

@Bean
public Partitioner filePartitioner() {
    return gridSize -> {
        Map<String, ExecutionContext> result = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            // "file:" prefix so the String converts to a file-system Resource, not a classpath one
            ctx.putString("fileName", "file:/data/files-" + i + ".csv");
            result.put("partition-" + i, ctx);
        }
        return result;
    };
}

With gridSize=10 this partitioner creates 10 partitions, so 10 worker steps run, all at once if the TaskExecutor has at least 10 threads.

Manager Step configuration

@Bean
public Step partitionedStep(JobRepository repo, Partitioner partitioner,
                            Step workerStep, TaskExecutor executor) {
    return new StepBuilder("partitionedStep.manager", repo)
        .partitioner("workerStep", partitioner)
        .step(workerStep)
        .gridSize(10)
        .taskExecutor(executor)
        .build();
}

Worker Step + Late Binding

@Bean
@StepScope
public FlatFileItemReader<Customer> partitionReader(
        @Value("#{stepExecutionContext['fileName']}") Resource resource) {
    return new FlatFileItemReaderBuilder<Customer>()
        .name("partitionReader")
        .resource(resource)
        // ...
        .build();
}

@Bean
public Step workerStep(JobRepository repo, PlatformTransactionManager tx,
                       FlatFileItemReader<Customer> reader,
                       JdbcBatchItemWriter<Customer> writer) {
    return new StepBuilder("workerStep", repo)
        .<Customer, Customer>chunk(100, tx)
        .reader(reader)
        .writer(writer)
        .build();
}

Combining @StepScope with #{stepExecutionContext['fileName']} gives each partition its own file, a textbook case of the Late Binding from part 21.

PartitionHandler types

The most common one is TaskExecutorPartitionHandler, which runs the partitions on local threads.

@Bean
public PartitionHandler partitionHandler(TaskExecutor executor, Step workerStep) {
    TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
    handler.setTaskExecutor(executor);
    handler.setStep(workerStep);
    handler.setGridSize(10);
    return handler;
}

Beyond that, MessageChannelPartitionHandler sends partitions to remote workers over Spring Integration message channels, and you can also write a custom handler that integrates with an in-house grid framework.

What `gridSize` means

Similar to the multi-threaded step's throttleLimit method, the gridSize method prevents the task executor from being saturated. (official reference)

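Despite that wording, gridSize is not a hard cap on concurrency. It is the value passed to Partitioner.partition(gridSize), i.e. the number of partitions you are asking the Partitioner to create. TaskExecutorPartitionHandler submits every partition the Partitioner returns to the TaskExecutor, so how many actually run at once is bounded by the executor's thread pool. A Partitioner that honors gridSize keeps the number of submitted tasks bounded, which is the sense in which gridSize "prevents saturation".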
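The plain-JDK sketch below models how partitions are dispatched, based on my understanding of TaskExecutorPartitionHandler: every partition is submitted to the executor, and the pool size decides how many run at once. `GridSizeDemo` is hypothetical, and a fixed thread pool stands in for the TaskExecutor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class GridSizeDemo {
    /** Submits every partition to a pool of `poolSize` threads; returns {executedPartitions, maxConcurrent}. */
    static int[] run(int partitionCount, int poolSize) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        AtomicInteger executed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int p = 0; p < partitionCount; p++) {
                futures.add(executor.submit(() -> {
                    maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(5); // simulated worker step
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                    executed.incrementAndGet();
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for every partition, like the handler does
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return new int[] { executed.get(), maxRunning.get() };
    }
}
```

For example, a Partitioner that ignores gridSize=10 and returns 20 partitions still gets all 20 executed, but on a 4-thread pool never more than 4 at a time.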

Standard Partitioners

SimplePartitioner: creates gridSize empty ExecutionContexts (no actual data split)
MultiResourcePartitioner: assigns one resource to each partition (multi-file input, part 33)
Custom: split by DB range, timestamp, or domain key

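Advantages of partitioning

Each worker is an independent Step, so the workers are fully isolated. The thread-safety pitfalls of a Multi-threaded Step do not apply, because each partition gets its own Reader/Writer instances (as long as they are @StepScope). On restart, only the failed partitions need to run again: restart works per partition.

Drawbacks of partitioning

The trade-off is that the input has to be splittable (by file, DB range, key, and so on). When an even split is hard, skew (data piling up in some partitions) makes a few partitions run much longer than the rest. You also have to write the splitting logic yourself.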

6. Remote Step (new in Spring Batch 6)

@Bean
public Step step(MessagingTemplate messagingTemplate, JobRepository repo) {
    return new RemoteStep("step", "workerStep", repo, messagingTemplate);
}

RemoteStep sends a Step execution request to a worker over a Spring Integration message channel.

Worker side

@Bean
public Step workerStep(JobRepository repo, JdbcTransactionManager tx) {
    return new StepBuilder("workerStep", repo)
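        // placeholder for the real processing logic: StepBuilder needs tasklet(...)
        // or chunk(...) before build() becomes available
        .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED, tx)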
        .build();
}

@Bean
public StepExecutionRequestHandler requestHandler(JobRepository repo, StepLocator locator) {
    StepExecutionRequestHandler handler = new StepExecutionRequestHandler();
    handler.setJobRepository(repo);
    handler.setStepLocator(locator);
    return handler;
}

@Bean
public IntegrationFlow inboundFlow(ConnectionFactory cf,
                                    StepExecutionRequestHandler handler) {
    return IntegrationFlow
        .from(Jms.messageDrivenChannelAdapter(cf).destination("requests"))
        .handle(handler, "handle")
        .get();
}

The whole Step runs on another node. This is the approach for building a distributed batch cluster.

Selection guide: decision tree

Throughput insufficient?
├─ NO  → single-thread (default)
└─ YES
   ↓
Parallel chunks over the same data?
├─ NO (different tasks)  → Parallel Steps
└─ YES
   ↓
Are process and write CPU bound?
├─ YES (one JVM is enough)  → Multi-threaded Step or Local Chunking (v6)
└─ NO, or one JVM is not enough
   ↓
Can the input be cleanly split into N parts?
├─ YES  → Partitioning (most recommended)
└─ NO
   ↓
Cheap read, expensive processing?
├─ YES  → Remote Chunking
└─ NO  → Remote Step (v6)
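The decision tree above can be transcribed as a small Java function so that each question becomes an explicit boolean. `Strategy` and `StrategyGuide` are hypothetical names; the function encodes only the tree, not any Spring Batch logic.

```java
// Hypothetical enum naming the leaves of the decision tree.
enum Strategy {
    SINGLE_THREADED, PARALLEL_STEPS, MULTI_THREADED_OR_LOCAL_CHUNKING,
    PARTITIONING, REMOTE_CHUNKING, REMOTE_STEP
}

final class StrategyGuide {
    /** Each parameter answers one question in the tree, in order from top to bottom. */
    static Strategy choose(boolean throughputInsufficient,
                           boolean sameDataChunkParallel,
                           boolean cpuBoundAndOneJvmEnough,
                           boolean inputSplittable,
                           boolean cheapReadExpensiveProcess) {
        if (!throughputInsufficient) return Strategy.SINGLE_THREADED;
        if (!sameDataChunkParallel) return Strategy.PARALLEL_STEPS;
        if (cpuBoundAndOneJvmEnough) return Strategy.MULTI_THREADED_OR_LOCAL_CHUNKING;
        if (inputSplittable) return Strategy.PARTITIONING;
        return cheapReadExpensiveProcess ? Strategy.REMOTE_CHUNKING : Strategy.REMOTE_STEP;
    }
}
```

Writing it this way makes the ordering explicit: Partitioning is only reached once a single JVM with threads has been ruled out.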

For most production workloads Partitioning is the recommended choice: it offers a good balance of isolation, restart safety, and scalability.

Comparing four strategies: the key table

Aspect	Multi-threaded	Parallel Steps	Partitioning	Remote Chunking
Type	single-process	single-process	single or multi	multi-process
Unit of parallelism	chunk	Step	Step (worker instance)	chunk
Thread-safety required	Processor + Reader/Writer	No (independent Steps)	No (independent Steps)	Workers only
Input splitting	No (one shared Reader)	No (independent Steps)	Yes (Partitioner)	Yes (messages)
Restart unit	chunk	Step	Step (partition)	chunk
Infrastructure	TaskExecutor	TaskExecutor	TaskExecutor (or messaging)	Message queue
Operational complexity	Low	Low	Medium	High
Typical use	CPU-bound processing	Independent tasks side by side	Splittable, large input	Cheap read + heavy processing

Common incidents

Incident 1: Reader thread-safety in a Multi-threaded Step

Cause: using a Reader that is NOT thread-safe, such as FlatFileItemReader.

Fix: wrap it in SynchronizedItemStreamReader, or switch to partitioning.

Incident 2: DB connection pool exhausted

Cause: 16 threads against a pool of 8 → waits and timeouts.

Fix: pool size ≥ thread count.

Incident 3: ChunkListener not called as expected

Cause: ChunkListener behavior is not guaranteed in a Multi-threaded Step (part 18).

Fix: rely on StepExecutionListener only, or switch to partitioning.

Incident 4: Partition skew

Cause: one partition holds 90% of the data, so the other partitions finish early and sit idle.

Fix: a more even split (hash, range, or dynamic).
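A small illustration of skew with made-up ids: 900 ids clustered near 1 and 100 ids near 1,000,000. The range split uses the same min/max arithmetic as the range partitioner in Pattern 2 below, and the hash split assigns `id % gridSize`. `SkewDemo` is hypothetical; a hash split would also need a modulo predicate in the reader's SQL (for example `MOD(id, ?) = ?`) instead of BETWEEN.

```java
final class SkewDemo {
    /** Counts ids per partition when [min, max] is cut into gridSize equal ranges (last one takes the remainder). */
    static int[] rangeCounts(long[] ids, int gridSize) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long id : ids) {
            min = Math.min(min, id);
            max = Math.max(max, id);
        }
        long range = (max - min + 1) / gridSize;
        int[] counts = new int[gridSize];
        for (long id : ids) {
            // range == 0 means fewer ids than partitions: everything lands in the last partition
            int p = range == 0 ? gridSize - 1 : (int) Math.min((id - min) / range, gridSize - 1);
            counts[p]++;
        }
        return counts;
    }

    /** Counts ids per partition under a simple hash split: id mod gridSize. */
    static int[] hashCounts(long[] ids, int gridSize) {
        int[] counts = new int[gridSize];
        for (long id : ids) {
            counts[(int) Math.floorMod(id, (long) gridSize)]++;
        }
        return counts;
    }
}
```

With gridSize 4, the range split yields 900, 0, 0, and 100 ids per partition, while the hash split yields 250 in each.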

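Incident 5: gridSize does not match the partitioner's partition count

Cause: gridSize=10, but the partitioner returns 100 partitions.

Fix: have the partitioner build its partitions from the gridSize it receives, and cap concurrency with the TaskExecutor's pool size, since gridSize alone does not limit how many partitions run at once.

Incident 6: Message loss in Remote Chunking

Cause: the message queue does not guarantee delivery.

Fix: durable queues (persistent JMS messages, Kafka, etc.) plus idempotent workers.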
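A minimal sketch of an idempotent worker, assuming each message carries a unique id. `IdempotentWorker` is hypothetical, and the in-memory set is for illustration only: in production the processed-id record has to be durable and committed in the same transaction as the write (for example a unique key in the target database), otherwise a crash between the two breaks the guarantee.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotentWorker {
    // In-memory only for the sketch; a real worker needs a durable, transactional record.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger sideEffects = new AtomicInteger();

    /** Returns true if the message was processed, false if it was a redelivered duplicate. */
    boolean handle(String messageId) {
        if (!processedIds.add(messageId)) {
            return false; // already seen: skip the write so redelivery has no extra effect
        }
        sideEffects.incrementAndGet(); // stand-in for the real write
        return true;
    }

    int sideEffectCount() {
        return sideEffects.get();
    }
}
```

Redelivering "m-1" a second time returns false and leaves the side-effect count unchanged.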

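Incident 7: Data races between Parallel Steps

Cause: two Steps write to the same table at the same time → deadlocks or inconsistent data.

Fix: separate the data each Step touches, or run the Steps sequentially.

Incident 8: Missing key when a partition's worker Step reads its ExecutionContext

Cause: the reader uses @Value("#{stepExecutionContext['fileName']}"), but the partitioner stored the value under "file".

Fix: make the key names match exactly.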
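One way to surface a key mismatch early is a fail-fast lookup that lists the keys that are actually present. This is a hypothetical helper (`ContextKeys`) modeled on a plain `Map` rather than Spring Batch's ExecutionContext; you could run the equivalent check from a StepExecutionListener's beforeStep, for example, instead of letting late binding silently inject null.

```java
import java.util.Map;

final class ContextKeys {
    /** Returns the value for `key`, or fails with a message that lists the keys that do exist. */
    static Object require(Map<String, ?> context, String key) {
        if (!context.containsKey(key)) {
            throw new IllegalStateException(
                "Missing key '" + key + "' in partition context; available keys: " + context.keySet());
        }
        return context.get(key);
    }
}
```

For the incident above, the error message would name the missing "fileName" and show that only "file" is present.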

Recommended production patterns

Pattern 1: Multi-threaded Step (light CPU-bound work)

@Bean
public Step parallelStep(JobRepository repo, PlatformTransactionManager tx) {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(8);
    executor.setMaxPoolSize(8);
    executor.initialize();

    return new StepBuilder("parallelStep", repo)
        .<Customer, Customer>chunk(100, tx)
        .reader(syncReader())              // SynchronizedItemStreamReader
        .processor(threadSafeProcessor())  // immutable transform
        .writer(jdbcBatchWriter())          // thread-safe
        .taskExecutor(executor)
        .build();
}

Pattern 2: Partitioning + Late Binding (the standard for large volumes)

@Bean
public Step managerStep(JobRepository repo, Partitioner partitioner,
                        Step workerStep, TaskExecutor executor) {
    return new StepBuilder("managerStep", repo)
        .partitioner("workerStep", partitioner)
        .step(workerStep)
        .gridSize(8)
        .taskExecutor(executor)
        .build();
}

@Bean
public Partitioner rangePartitioner(JdbcTemplate jdbc) {
    return gridSize -> {
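        Long minId = jdbc.queryForObject("SELECT MIN(id) FROM customer", Long.class);
        Long maxId = jdbc.queryForObject("SELECT MAX(id) FROM customer", Long.class);
        Map<String, ExecutionContext> result = new HashMap<>();
        if (minId == null || maxId == null) {
            return result; // empty table: MIN/MAX return NULL, so there is nothing to partition
        }
        // integer division; the last partition absorbs the remainder up to maxId
        long range = (maxId - minId + 1) / gridSize;
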
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putLong("startId", minId + i * range);
            ctx.putLong("endId", i == gridSize - 1 ? maxId : minId + (i + 1) * range - 1);
            result.put("partition-" + i, ctx);
        }
        return result;
    };
}

@Bean
@StepScope
public JdbcCursorItemReader<Customer> workerReader(
        DataSource ds,
        @Value("#{stepExecutionContext['startId']}") Long startId,
        @Value("#{stepExecutionContext['endId']}") Long endId) {
    return new JdbcCursorItemReaderBuilder<Customer>()
        .name("workerReader")
        .dataSource(ds)
        .sql("SELECT * FROM customer WHERE id BETWEEN ? AND ? ORDER BY id") // stable order for restart
        .queryArguments(startId, endId)
        .rowMapper(BeanPropertyRowMapper.newInstance(Customer.class))
        .build();
}

The partitions are based on ID ranges, so each worker gets its own non-overlapping range of IDs.
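To double-check the range arithmetic, here is the same calculation as the `rangePartitioner` above, extracted into a plain function (`RangeSplit` is a hypothetical name). The test checks that the ranges are contiguous, do not overlap, and cover every id from min to max.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangeSplit {
    /** Same arithmetic as rangePartitioner: partition name -> {startId, endId}, inclusive. */
    static Map<String, long[]> split(long minId, long maxId, int gridSize) {
        long range = (maxId - minId + 1) / gridSize;
        Map<String, long[]> result = new LinkedHashMap<>(); // keeps partition order for inspection
        for (int i = 0; i < gridSize; i++) {
            long start = minId + i * range;
            long end = (i == gridSize - 1) ? maxId : minId + (i + 1) * range - 1;
            result.put("partition-" + i, new long[] { start, end });
        }
        return result;
    }
}
```

For ids 1 to 100 with gridSize 8, each partition gets 12 ids and partition-7 takes 85 to 100. When there are fewer ids than partitions, `range` becomes 0: every partition except the last gets an empty range (end < start, so the BETWEEN query returns no rows) and the last one receives everything.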

Pattern 3: MultiResource Partitioning (one partition per file)

@Bean
public Partitioner multiResourcePartitioner(@Value("file:/data/*.csv") Resource[] resources) {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(resources);
    return partitioner;
}

Think of it as the truly parallel version of the multi-file input from part 33.

Pattern 4: Combining Parallel Steps and Partitioning

@Bean
public Job complexJob(JobRepository repo, Step prepareStep,
                      Step partitionedProcessStep, Step reportStep,
                      Flow parallelStatsFlow) {
    return new JobBuilder("complexJob", repo)
        .start(prepareStep)
        .split(executor()).add(parallelStatsFlow, partitionedFlow(partitionedProcessStep))
        .next(reportStep)
        .build().build();
}

A preparation step first, then parallel statistics and partitioned processing running side by side, and finally a report: a typical composite workflow.

One more pass before the test: compressed notes on Scaling and Parallel pitfalls

First scaling question = do you really need it? (measure first)
Six strategies = Multi-threaded, Parallel Steps, Local Chunking (v6), Remote Chunking, Partitioning, Remote Step (v6)
Classification = single-process / multi-process
Multi-threaded Step = one line: .taskExecutor(executor)
Parallel per chunk; in the classic model each thread reads, processes, and writes its own chunk, while the model quoted from the reference reads and writes serially on the main thread and processes on worker threads
ItemProcessor must be thread-safe
Most Readers are NOT thread-safe → wrap in SynchronizedItemStreamReader (part 25)
DB connection pool ≥ thread count
Pitfall: ChunkListener compatibility not guaranteed
Parallel Steps = .split(executor).add(flow1, flow2) (part 20, Split Flow)
Step-level parallelism, different tasks at the same time
No chunk parallelism over the same data
Local Chunking (new in Spring Batch 6) = ChunkTaskExecutorItemWriter + distribution to worker threads
Multi-threaded + explicit per-chunk distribution to workers
Favors CPU-bound writes
Remote Chunking = manager (read) + workers (process, write), communicating over a message queue
For cheap reads + heavy processing
Message queue must guarantee delivery with a single consumer
Partitioning = manager step + N workers (each worker = an independent Step)
Two SPIs = Partitioner (splits the input) + PartitionHandler (runs the partitions)
Partitioner.partition(gridSize) → Map<String, ExecutionContext>
key = partition name, value = input information
PartitionHandler types = TaskExecutorPartitionHandler (local), MessageChannelPartitionHandler (remote), custom
gridSize = number of partitions requested from the Partitioner; actual concurrency is bounded by the TaskExecutor
Standard Partitioners = SimplePartitioner, MultiResourcePartitioner, custom (range, hash, timestamp)
Worker Step = @StepScope + #{stepExecutionContext['key']} (part 21, Late Binding)
Each partition = an independent Step instance → no thread-safety pitfalls
Restart = per partition
Drawbacks: skew (uneven split), splittable input only
Remote Step (new in Spring Batch 6) = RemoteStep + Spring Integration messaging
Runs an entire Step on another node
Distributed batch cluster
Decision tree: single → Parallel Steps → Multi-threaded/Local Chunking → Partitioning → Remote Chunking → Remote Step
Most production setups = Partitioning recommended (isolation, restart, scalability)
Four-strategy comparison table: type, unit of parallelism, thread-safety, restart, infrastructure, complexity, use cases
Pitfalls: Reader thread-safety, pool exhaustion, ChunkListener, skew, gridSize mismatch, message loss, data races, ExecutionContext key typo
Patterns: Multi-threaded (synchronized reader), Partitioning + range, MultiResource Partitioning, Parallel + Partitioning

The original text is in the official documentation: Scaling and Parallel Processing.


※ This post is part of the Coupang Partners program, and I receive a fixed commission as a result.
