Batch Processing with Spring Boot, Spring Batch & JPA

Processing large datasets efficiently is a cornerstone of enterprise applications. Whether it’s payroll calculations, data migrations, or report generation, batch processing ensures scalable, transactional, and fault-tolerant operations without overwhelming system resources.

In service-oriented architectures, batch jobs often run independently, processing accumulated data at scheduled intervals rather than handling every transaction in real time.

In this article, we’ll explore:

  • A production-ready batch processing solution using Spring Batch and JPA
  • Chunk-oriented processing for memory efficiency and transactional safety
  • Scheduled job execution with comprehensive error handling and monitoring
  • Testing strategies for batch operations using in-memory databases

You can find the full reference implementation in the batch-processing module of my RealWorld Java Solutions repository.

I’ll walk through the architecture, essential patterns, and implementation details without recreating the entire project here.

The Use Case

Imagine you have an HR system that needs to process employee timesheets monthly to calculate payouts. The system must:

  • Read approved but unprocessed timesheets from the database
  • Calculate total payout (salary + overtime – deductions)
  • Store payment records in a separate payments table
  • Mark processed timesheets to prevent duplicate processing

This represents a common enterprise pattern where data accumulates over time and needs periodic bulk processing.
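Before wiring up Spring Batch, the data involved can be pictured with plain Java records standing in for the JPA entities. The field names here are assumptions based on the description above; the status values match the enum used later in the article:

```java
// Minimal plain-Java sketch of the domain; records stand in for the real JPA
// entities (field names are illustrative assumptions, not the project's schema).
enum TimesheetStatus { PENDING, PROCESSED }

record Timesheet(long id, long employeeId, boolean approved, TimesheetStatus status) {}

record Payment(long employeeId, long timesheetId, java.math.BigDecimal amount) {}
```

A timesheet is eligible for processing when it is approved and still PENDING; processing produces one Payment row and flips the status to PROCESSED.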

Batch Processing Goals

  • Scalability: Handle thousands of records efficiently using chunk processing
  • Transactional Integrity: Ensure all-or-nothing processing within chunks
  • Fault Tolerance: Resume from failure points without data corruption
  • Monitoring: Provide visibility into job execution and performance metrics
  • Testability: Enable automated testing without external dependencies

Processing Flow

sequenceDiagram
    participant Scheduler as JobScheduler (@Scheduled)
    participant Job as Spring Batch Job
    participant Step as Step (Chunk-Oriented)
    participant Reader as RepositoryItemReader
    participant Processor as TimesheetToPaymentProcessor
    participant Writer as ItemWriter
    participant DB as PostgreSQL

    Scheduler->>Job: launch(salaryPaymentJob)
    Job->>Step: execute()

    loop chunk (until size or end)
        Step->>Reader: read()
        Reader->>DB: SELECT timesheet
        DB-->>Reader: Timesheet
        Reader-->>Step: Timesheet
    end

    loop each item
        Step->>Processor: process(Timesheet)
    end

    Step->>Writer: write(Chunk<Payment>)
    Writer->>DB: INSERT into payments + UPDATE timesheets
    Step-->>Job: status COMPLETED

Why Spring Batch?

Spring Batch is the de facto standard for enterprise batch processing in the Java ecosystem. Key features include:

  • Chunk-oriented processing: Processes data in configurable chunks for optimal memory usage
  • Transaction management: Built-in support for database transactions across chunks
  • Restart capabilities: Resume failed jobs from the last successful commit point
  • Monitoring and metrics: Comprehensive job execution tracking and reporting
  • Scalability: Support for parallel processing and partitioning

Alternatively, you could use Apache Camel or custom scheduling solutions, but Spring Batch provides enterprise-grade features out of the box with excellent Spring ecosystem integration.

System Architecture

Below is the architectural flow of the batch processing system:

flowchart TD
    Scheduler[JobScheduler] -->|launches| BatchJob
    BatchJob --> Reader[RepositoryItemReader]
    DB[(PostgreSQL DB)] --> Reader
    Reader --> Processor[ItemProcessor]
    Processor --> Writer[ItemWriter]
    Writer --> DB

Core Implementation

The batch processing implementation centers around three key components working together in a chunk-oriented pattern.

Job Configuration

@Configuration
@EnableBatchProcessing
public class BatchJobConfig {

    @Bean
    public Job salaryPaymentJob(JobRepository jobRepository, Step salaryPaymentStep) {
        return new JobBuilder("salaryPaymentJob", jobRepository)
                .start(salaryPaymentStep)
                .build();
    }

    @Bean
    public Step salaryPaymentStep(JobRepository jobRepository, 
                                  PlatformTransactionManager transactionManager,
                                  ItemReader<Timesheet> reader,
                                  ItemProcessor<Timesheet, Payment> processor,
                                  ItemWriter<Payment> writer) {
        return new StepBuilder("salaryPaymentStep", jobRepository)
                .<Timesheet, Payment>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
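The goals above include fault tolerance, but the step as configured fails the whole job on the first error. For reference, the same StepBuilder chain can opt into Spring Batch's skip/retry support; this is a sketch, and the exception types and limits are illustrative choices rather than part of the original project:

```java
@Bean
public Step salaryPaymentStep(JobRepository jobRepository,
                              PlatformTransactionManager transactionManager,
                              ItemReader<Timesheet> reader,
                              ItemProcessor<Timesheet, Payment> processor,
                              ItemWriter<Payment> writer) {
    return new StepBuilder("salaryPaymentStep", jobRepository)
            .<Timesheet, Payment>chunk(10, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant()
            .retry(TransientDataAccessException.class) // retry transient DB hiccups
            .retryLimit(3)
            .skip(IllegalArgumentException.class)      // skip individually bad records
            .skipLimit(10)
            .build();
}
```

Skipped items are recorded in the job repository, so they remain visible in monitoring rather than silently disappearing.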

Item Reader, Processor, and Writer

The Reader queries unprocessed timesheets using Spring Data JPA. Note that RepositoryItemReader appends a Pageable to the configured arguments, so the repository method named here must accept a trailing Pageable parameter and return a Page:

@Bean
@StepScope
public RepositoryItemReader<Timesheet> timesheetReader(TimesheetRepository repository) {
    RepositoryItemReader<Timesheet> reader = new RepositoryItemReader<>();
    reader.setRepository(repository);
    reader.setMethodName("findByStatusAndApproved");
    reader.setArguments(List.of(TimesheetStatus.PENDING, true));
    reader.setSort(Map.of("id", Sort.Direction.ASC));
    reader.setPageSize(10); // typically kept equal to the step's chunk size
    return reader;
}

The Processor handles business logic for payment calculation:

@Component
public class TimesheetToPaymentProcessor implements ItemProcessor<Timesheet, Payment> {

    @Override
    public Payment process(Timesheet timesheet) {
        BigDecimal baseSalary = timesheet.getEmployee().getSalary();
        BigDecimal overtimePay = calculateOvertimePay(timesheet); // private helper (not shown)
        BigDecimal deductions = calculateDeductions(timesheet);   // private helper (not shown)
        
        BigDecimal totalPayout = baseSalary.add(overtimePay).subtract(deductions);
        
        return Payment.builder()
                .employee(timesheet.getEmployee())
                .timesheet(timesheet)
                .amount(totalPayout)
                .paymentDate(LocalDateTime.now())
                .build();
    }
}
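The calculateOvertimePay and calculateDeductions helpers are referenced but not shown. A minimal plain-Java sketch of what they might look like, under assumed pay policies (the 1.5x overtime multiplier and flat 10% deduction rate are illustrative, not values from the project):

```java
import java.math.BigDecimal;

// Sketch of the processor's helpers under assumed policies (1.5x overtime,
// flat 10% deduction). The real project's rules may differ.
public class PayCalculations {

    static final BigDecimal OVERTIME_MULTIPLIER = new BigDecimal("1.5");
    static final BigDecimal DEDUCTION_RATE = new BigDecimal("0.10");

    // overtime pay = overtime hours * hourly rate * 1.5
    static BigDecimal overtimePay(BigDecimal overtimeHours, BigDecimal hourlyRate) {
        return overtimeHours.multiply(hourlyRate).multiply(OVERTIME_MULTIPLIER);
    }

    // deductions = flat 10% of base salary
    static BigDecimal deductions(BigDecimal baseSalary) {
        return baseSalary.multiply(DEDUCTION_RATE);
    }

    // total payout, matching the processor: base + overtime - deductions
    static BigDecimal totalPayout(BigDecimal base, BigDecimal overtime, BigDecimal deduction) {
        return base.add(overtime).subtract(deduction);
    }
}
```

Keeping these as pure functions of BigDecimal inputs is what makes the processor trivially unit-testable without a Spring context.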

The Writer performs bulk database operations:

@Component
public class PaymentItemWriter implements ItemWriter<Payment> {

    @Autowired
    private PaymentRepository paymentRepository;
    
    @Autowired
    private TimesheetRepository timesheetRepository;

    @Override
    public void write(Chunk<? extends Payment> payments) {
        // No explicit @Transactional needed here: Spring Batch already wraps each
        // chunk in a transaction via the step's PlatformTransactionManager, so a
        // second transactional proxy around write() adds nothing.

        // Save all payments
        paymentRepository.saveAll(payments.getItems());
        
        // Mark timesheets as processed
        List<Timesheet> processedTimesheets = payments.getItems().stream()
                .map(Payment::getTimesheet)
                .peek(timesheet -> timesheet.setStatus(TimesheetStatus.PROCESSED))
                .collect(Collectors.toList());
                
        timesheetRepository.saveAll(processedTimesheets);
    }
}

Let’s break down what’s happening:

  • Chunk Processing: Data is processed in chunks of 10 records, balancing memory usage against transaction overhead.
  • Transactional Safety: Each chunk is wrapped in a single database transaction, so a failure rolls back the whole chunk.
  • Repository Integration: Spring Data JPA repositories handle database operations seamlessly.
  • Business Logic Separation: The processor contains pure business logic, making it easily testable.

One caveat: the writer flips the very status the reader filters on, and a paging reader re-runs its query for each page, so the shrinking result set can cause records to be skipped between pages. Common mitigations are a cursor-based reader such as JpaCursorItemReader, or a paging reader that always requests page 0.
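To make the chunk lifecycle concrete, here is a small plain-Java simulation (no Spring) of the loop the step runs: read until the chunk is full, process each item, then hand the whole chunk to the writer in one call. In the real step, the transaction boundary wraps each pass through the outer loop:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class ChunkLoop {

    // Read up to chunkSize items, map each through the processor, then perform
    // one "write" per chunk (where the real step would commit its transaction).
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> writes = new ArrayList<>();
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            while (reader.hasNext() && chunk.size() < chunkSize) {
                chunk.add(processor.apply(reader.next())); // process each item
            }
            writes.add(chunk);                             // one write per chunk
        }
        return writes;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        List<List<String>> writes = run(items.iterator(), i -> "payment-" + i, 3);
        System.out.println(writes.size()); // 3 chunks: sizes 3, 3, 1
    }
}
```

Seven items with a chunk size of 3 yield three writes (and, in the real step, three commits): two full chunks and a final partial one.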

Scheduled Job Execution

The batch job runs automatically using Spring’s scheduling capabilities:

@Component
@Slf4j
public class JobScheduler {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job salaryPaymentJob;

    @Scheduled(fixedRate = 300000) // Every 5 minutes
    public void runSalaryPaymentJob() {
        try {
            JobParameters jobParameters = new JobParametersBuilder()
                    .addLong("startTime", System.currentTimeMillis())
                    .toJobParameters();

            JobExecution jobExecution = jobLauncher.run(salaryPaymentJob, jobParameters);
            log.info("Job Status: {}", jobExecution.getStatus());
            
        } catch (Exception e) {
            log.error("Failed to execute salary payment job", e);
        }
    }
}

This approach mirrors production scenarios where batch jobs run on predetermined schedules: daily payroll processing, weekly reports, or monthly data cleanup operations. The startTime parameter is not decorative: Spring Batch identifies a JobInstance by the job name plus its identifying parameters and refuses to re-run a completed instance, so the fresh timestamp makes each scheduled launch a new instance.

Testing Strategy

Testing batch operations requires careful consideration of data setup and transaction boundaries:

@SpringBatchTest
@SpringBootTest
class SalaryPaymentJobTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Autowired
    private EmployeeRepository employeeRepository;

    @Autowired
    private TimesheetRepository timesheetRepository;

    @Autowired
    private PaymentRepository paymentRepository;

    @Test
    void shouldProcessTimesheetsAndCreatePayments() {
        // Given: set up committed test data. The class is deliberately not
        // @Transactional: the job commits in its own transactions and would not
        // see rows held in an uncommitted, to-be-rolled-back test transaction.
        Employee employee = createTestEmployee();
        Timesheet timesheet = createApprovedTimesheet(employee);
        employeeRepository.save(employee);
        timesheetRepository.save(timesheet);

        // When: Execute the job
        JobExecution jobExecution = jobLauncherTestUtils.launchJob();

        // Then: Verify results
        assertThat(jobExecution.getStatus()).isEqualTo(BatchStatus.COMPLETED);
        
        List<Payment> payments = paymentRepository.findAll();
        assertThat(payments).hasSize(1);
        assertThat(payments.get(0).getAmount()).isPositive();
        
        Timesheet updatedTimesheet = timesheetRepository.findById(timesheet.getId()).orElseThrow();
        assertThat(updatedTimesheet.getStatus()).isEqualTo(TimesheetStatus.PROCESSED);
    }
}

The testing approach uses:

  • H2 in-memory database running in PostgreSQL compatibility mode
  • @SpringBatchTest to provide batch-specific utilities such as JobLauncherTestUtils
  • Per-test data cleanup for isolation (transactional rollback is avoided, because the job commits in its own transactions that a test-managed transaction cannot roll back)
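For reference, the H2 compatibility mode is typically enabled through the test datasource URL. A sketch of the relevant test properties; the property keys are standard Spring Boot/H2 settings, but the exact values used in the project may differ:

```properties
# src/test/resources/application-test.properties (illustrative)
spring.datasource.url=jdbc:h2:mem:testdb;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE
spring.datasource.driver-class-name=org.h2.Driver
spring.jpa.hibernate.ddl-auto=create-drop
# Let the test (via JobLauncherTestUtils) launch the job, not Boot's startup runner
spring.batch.job.enabled=false
```

Disabling spring.batch.job.enabled prevents Boot from auto-launching the job at context startup, which would otherwise race with the test's explicit launch.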

Conclusion

Enterprise batch processing with Spring Batch provides a robust foundation for handling large-scale data operations. This solution demonstrates production-ready patterns: chunk-oriented processing, transactional safety, comprehensive testing, and operational monitoring.

The modular design separates concerns clearly—data access, business logic, and persistence—making the system maintainable and extensible. Whether processing payroll, generating reports, or migrating data, these patterns scale from thousands to millions of records.

Explore the complete implementation on GitHub, experiment with the Docker setup, and adapt these patterns to your own batch processing requirements.

Fully working solution on GitHub: RealWorld Java Solutions – batch-processing module
