Processing large datasets efficiently is a cornerstone of enterprise applications. Whether it’s payroll calculations, data migrations, or report generation, batch processing ensures scalable, transactional, and fault-tolerant operations without overwhelming system resources.
In service-oriented architectures, batch jobs often run independently, processing accumulated data at scheduled intervals rather than handling every transaction in real time.
In this article, we’ll explore:
- A production-ready batch processing solution using Spring Batch and JPA
- Chunk-oriented processing for memory efficiency and transactional safety
- Scheduled job execution with comprehensive error handling and monitoring
- Testing strategies for batch operations using in-memory databases
You can find the full reference implementation in the batch-processing module of my RealWorld Java Solutions repository.
I’ll walk through the architecture, essential patterns, and implementation details without recreating the entire project here.
The Use Case
Imagine you have an HR system that needs to process employee timesheets monthly to calculate payouts. The system must:
- Read approved but unprocessed timesheets from the database
- Calculate total payout (salary + overtime – deductions)
- Store payment records in a separate payments table
- Mark processed timesheets to prevent duplicate processing
This represents a common enterprise pattern where data accumulates over time and needs periodic bulk processing.
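To ground the examples that follow, here is a rough sketch of the entities involved. The field names are assumptions inferred from the requirements above (an Employee entity with an id and a monthly salary is assumed as well); the actual model lives in the reference repository:
import jakarta.persistence.*;
import java.math.BigDecimal;
import java.time.LocalDateTime;

// Hypothetical sketch - separate files in practice; getters, setters,
// and the Payment builder are omitted for brevity.
@Entity
public class Timesheet {
    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    private Employee employee;

    private BigDecimal overtimeHours;
    private boolean approved;

    @Enumerated(EnumType.STRING)
    private TimesheetStatus status; // PENDING -> PROCESSED
}

@Entity
public class Payment {
    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    private Employee employee;

    @OneToOne
    private Timesheet timesheet;

    private BigDecimal amount;
    private LocalDateTime paymentDate;
}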
Batch Processing Goals
- Scalability: Handle thousands of records efficiently using chunk processing
- Transactional Integrity: Ensure all-or-nothing processing within chunks
- Fault Tolerance: Resume from failure points without data corruption
- Monitoring: Provide visibility into job execution and performance metrics
- Testability: Enable automated testing without external dependencies
Processing Flow
sequenceDiagram
participant Scheduler as JobScheduler (@Scheduled)
participant Job as Spring Batch Job
participant Step as Step (Chunk-Oriented)
participant Reader as RepositoryItemReader
participant Processor as TimesheetToPaymentProcessor
participant Writer as ItemWriter
participant DB as PostgreSQL
Scheduler->>Job: launch(salaryPaymentJob)
Job->>Step: execute()
loop chunk (until size or end)
Step->>Reader: read()
Reader->>DB: SELECT timesheet
DB-->>Reader: Timesheet
Reader-->>Step: Timesheet
end
loop each item
Step->>Processor: process(Timesheet)
end
Step->>Writer: write(List<Payment>)
Writer->>DB: INSERT into payments + UPDATE timesheets
Step-->>Job: status COMPLETED
Why Spring Batch?
Spring Batch is the de facto standard for enterprise batch processing in the Java ecosystem. Key features include:
- Chunk-oriented processing: Processes data in configurable chunks to keep memory usage bounded
- Transaction management: Built-in support for database transactions across chunks
- Restart capabilities: Resume failed jobs from the last successful commit point
- Monitoring and metrics: Comprehensive job execution tracking and reporting
- Scalability: Support for parallel processing and partitioning
Alternatively, you could use Apache Camel or custom scheduling solutions, but Spring Batch provides enterprise-grade features out of the box with excellent Spring ecosystem integration.
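To give a flavor of the fault-tolerance features: skip and retry semantics can be layered onto a chunk-oriented step declaratively. The following is a minimal sketch that mirrors the step you will see in the configuration below, not part of the reference implementation, and the exception types are placeholder assumptions:
// Drops into a @Configuration class like the one shown later; the
// exception classes chosen here are illustrative assumptions.
@Bean
public Step faultTolerantSalaryStep(JobRepository jobRepository,
                                    PlatformTransactionManager transactionManager,
                                    ItemReader<Timesheet> reader,
                                    ItemProcessor<Timesheet, Payment> processor,
                                    ItemWriter<Payment> writer) {
    return new StepBuilder("faultTolerantSalaryStep", jobRepository)
            .<Timesheet, Payment>chunk(10, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant()
            .retry(org.springframework.dao.TransientDataAccessException.class) // e.g. transient DB hiccups
            .retryLimit(3)
            .skip(IllegalArgumentException.class) // e.g. malformed records
            .skipLimit(5)
            .build();
}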
System Architecture
Below is the architectural flow of the batch processing system:
flowchart TD
Scheduler[JobScheduler] -->|launches| BatchJob
BatchJob --> Reader[RepositoryItemReader]
DB[(PostgreSQL DB)] --> Reader
Reader --> Processor[ItemProcessor]
Processor --> Writer[ItemWriter]
Writer --> DB
Core Implementation
The batch processing implementation centers around three key components working together in a chunk-oriented pattern.
Job Configuration
@Configuration
@EnableBatchProcessing
public class BatchJobConfig {

    @Bean
    public Job salaryPaymentJob(JobRepository jobRepository, Step salaryPaymentStep) {
        return new JobBuilder("salaryPaymentJob", jobRepository)
                .start(salaryPaymentStep)
                .build();
    }

    @Bean
    public Step salaryPaymentStep(JobRepository jobRepository,
                                  PlatformTransactionManager transactionManager,
                                  ItemReader<Timesheet> reader,
                                  ItemProcessor<Timesheet, Payment> processor,
                                  ItemWriter<Payment> writer) {
        return new StepBuilder("salaryPaymentStep", jobRepository)
                .<Timesheet, Payment>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
Item Reader, Processor, and Writer
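The reader and writer below lean on Spring Data repositories. Here is a sketch of the timesheet repository, assuming the query method implied by the reader configuration; the real interface lives in the reference repository:
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;

public interface TimesheetRepository extends JpaRepository<Timesheet, Long> {

    // RepositoryItemReader pages through results, so the method must
    // accept a trailing Pageable parameter and return a Page.
    Page<Timesheet> findByStatusAndApproved(TimesheetStatus status, boolean approved, Pageable pageable);
}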
The Reader queries unprocessed timesheets using Spring Data JPA:
@Bean
@StepScope
public RepositoryItemReader<Timesheet> timesheetReader(TimesheetRepository repository) {
    RepositoryItemReader<Timesheet> reader = new RepositoryItemReader<>();
    reader.setRepository(repository);
    reader.setMethodName("findByStatusAndApproved");
    reader.setArguments(List.of(TimesheetStatus.PENDING, true));
    reader.setSort(Map.of("id", Sort.Direction.ASC));
    reader.setPageSize(10);
    // A name is required so the reader can store its position in the
    // ExecutionContext - this is what makes restarts after a failure possible.
    reader.setName("timesheetReader");
    return reader;
}
The Processor handles business logic for payment calculation:
@Component
public class TimesheetToPaymentProcessor implements ItemProcessor<Timesheet, Payment> {

    @Override
    public Payment process(Timesheet timesheet) {
        BigDecimal baseSalary = timesheet.getEmployee().getSalary();
        BigDecimal overtimePay = calculateOvertimePay(timesheet);
        BigDecimal deductions = calculateDeductions(timesheet);
        BigDecimal totalPayout = baseSalary.add(overtimePay).subtract(deductions);

        return Payment.builder()
                .employee(timesheet.getEmployee())
                .timesheet(timesheet)
                .amount(totalPayout)
                .paymentDate(LocalDateTime.now())
                .build();
    }
}
The Writer performs bulk database operations:
@Component
public class PaymentItemWriter implements ItemWriter<Payment> {

    @Autowired
    private PaymentRepository paymentRepository;
    @Autowired
    private TimesheetRepository timesheetRepository;

    // No @Transactional needed here: the step already wraps each chunk's
    // read-process-write cycle in a single transaction.
    @Override
    public void write(Chunk<? extends Payment> payments) {
        // Save all payments
        paymentRepository.saveAll(payments.getItems());

        // Mark timesheets as processed so they are not picked up again
        List<Timesheet> processedTimesheets = payments.getItems().stream()
                .map(Payment::getTimesheet)
                .peek(timesheet -> timesheet.setStatus(TimesheetStatus.PROCESSED))
                .collect(Collectors.toList());
        timesheetRepository.saveAll(processedTimesheets);
    }
}
Let's break down what's happening:
- Chunk Processing: Data is processed in chunks of 10 records, balancing memory usage against per-transaction overhead.
- Transactional Safety: Each chunk is wrapped in a database transaction, ensuring consistency.
- Repository Integration: Spring Data JPA repositories handle database operations seamlessly.
- Business Logic Separation: The processor contains pure business logic, making it easily testable; a sketch of its helper calculations follows below.
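For completeness, the calculateOvertimePay and calculateDeductions helpers referenced in the processor might look like this. The rates (160 working hours per month, 1.5x overtime, a flat 5% deduction) are illustrative assumptions; the real rules live in the reference repository:
// Hypothetical helpers for TimesheetToPaymentProcessor - the rates below
// are assumptions for illustration only.
// (Requires java.math.RoundingMode in addition to BigDecimal.)
private static final BigDecimal OVERTIME_MULTIPLIER = new BigDecimal("1.5");
private static final BigDecimal MONTHLY_HOURS = new BigDecimal("160");
private static final BigDecimal DEDUCTION_RATE = new BigDecimal("0.05");

private BigDecimal calculateOvertimePay(Timesheet timesheet) {
    BigDecimal overtimeHours = timesheet.getOvertimeHours();
    if (overtimeHours == null || overtimeHours.signum() <= 0) {
        return BigDecimal.ZERO;
    }
    // Derive an hourly rate from the monthly salary, then apply the multiplier.
    BigDecimal hourlyRate = timesheet.getEmployee().getSalary()
            .divide(MONTHLY_HOURS, 2, RoundingMode.HALF_UP);
    return hourlyRate.multiply(OVERTIME_MULTIPLIER).multiply(overtimeHours);
}

private BigDecimal calculateDeductions(Timesheet timesheet) {
    // A flat percentage of base salary, standing in for tax/insurance rules.
    return timesheet.getEmployee().getSalary().multiply(DEDUCTION_RATE);
}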
Scheduled Job Execution
The batch job runs automatically using Spring’s scheduling capabilities:
@Component
@Slf4j
public class JobScheduler {

    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job salaryPaymentJob;

    @Scheduled(fixedRate = 300000) // Every 5 minutes
    public void runSalaryPaymentJob() {
        try {
            // A unique timestamp parameter makes every run a new JobInstance,
            // so Spring Batch does not reject it as already complete.
            JobParameters jobParameters = new JobParametersBuilder()
                    .addLong("startTime", System.currentTimeMillis())
                    .toJobParameters();
            JobExecution jobExecution = jobLauncher.run(salaryPaymentJob, jobParameters);
            log.info("Job Status: {}", jobExecution.getStatus());
        } catch (Exception e) {
            log.error("Failed to execute salary payment job", e);
        }
    }
}
This approach mirrors production scenarios where batch jobs run on predetermined schedules: daily payroll processing, weekly reports, or monthly data cleanup operations.
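One detail that is easy to miss: @Scheduled does nothing unless scheduling is enabled somewhere in the application. A minimal sketch (the class name is a placeholder; the reference repo may wire this differently):
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

// Without @EnableScheduling, the @Scheduled method above never fires.
@SpringBootApplication
@EnableScheduling
public class BatchProcessingApplication {
    public static void main(String[] args) {
        SpringApplication.run(BatchProcessingApplication.class, args);
    }
}
For the monthly use case described earlier, a cron trigger such as @Scheduled(cron = "0 0 2 1 * *") (02:00 on the first of each month) would be a more realistic replacement for the five-minute fixed rate used in the demo.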
Testing Strategy
Testing batch operations requires careful consideration of data setup and transaction boundaries:
@SpringBatchTest
@SpringBootTest
// No class-level @Transactional: Spring Batch's JobRepository refuses to
// launch a job inside an already-running transaction, so the test writes
// real (in-memory) data and cleans up afterwards instead.
class SalaryPaymentJobTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;
    @Autowired
    private EmployeeRepository employeeRepository; // assumed to exist alongside the other repositories
    @Autowired
    private TimesheetRepository timesheetRepository;
    @Autowired
    private PaymentRepository paymentRepository;

    @AfterEach
    void cleanUp() {
        paymentRepository.deleteAll();
        timesheetRepository.deleteAll();
        employeeRepository.deleteAll();
    }

    @Test
    void shouldProcessTimesheetsAndCreatePayments() throws Exception {
        // Given: Setup test data
        Employee employee = employeeRepository.save(createTestEmployee());
        Timesheet timesheet = timesheetRepository.save(createApprovedTimesheet(employee));

        // When: Execute the job
        JobExecution jobExecution = jobLauncherTestUtils.launchJob();

        // Then: Verify results
        assertThat(jobExecution.getStatus()).isEqualTo(BatchStatus.COMPLETED);
        List<Payment> payments = paymentRepository.findAll();
        assertThat(payments).hasSize(1);
        assertThat(payments.get(0).getAmount()).isPositive();
        Timesheet updatedTimesheet = timesheetRepository.findById(timesheet.getId()).orElseThrow();
        assertThat(updatedTimesheet.getStatus()).isEqualTo(TimesheetStatus.PROCESSED);
    }
}
The testing approach uses:
- H2 in-memory database with PostgreSQL compatibility mode (a JDBC URL such as jdbc:h2:mem:testdb;MODE=PostgreSQL)
- @SpringBatchTest for batch-specific test utilities such as JobLauncherTestUtils
- Explicit cleanup between tests instead of class-level @Transactional, since the JobRepository rejects job launches inside an existing transaction
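Because the processor is a plain Spring component with no batch dependencies, its business logic can also be unit-tested without any Spring context. A sketch, assuming conventional setters on the (hypothetical) entities sketched earlier and the illustrative helper rules from above:
import static org.assertj.core.api.Assertions.assertThat;

import java.math.BigDecimal;
import org.junit.jupiter.api.Test;

// Plain JUnit test - no Spring context, no database.
class TimesheetToPaymentProcessorTest {

    private final TimesheetToPaymentProcessor processor = new TimesheetToPaymentProcessor();

    @Test
    void linksPaymentToEmployeeAndTimesheet() {
        Employee employee = new Employee();
        employee.setSalary(new BigDecimal("5000"));
        Timesheet timesheet = new Timesheet();
        timesheet.setEmployee(employee);

        Payment payment = processor.process(timesheet);

        assertThat(payment.getEmployee()).isEqualTo(employee);
        assertThat(payment.getTimesheet()).isEqualTo(timesheet);
        assertThat(payment.getAmount()).isPositive();
    }
}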
Conclusion
Enterprise batch processing with Spring Batch provides a robust foundation for handling large-scale data operations. This solution demonstrates production-ready patterns: chunk-oriented processing, transactional safety, comprehensive testing, and operational monitoring.
The modular design separates concerns clearly—data access, business logic, and persistence—making the system maintainable and extensible. Whether processing payroll, generating reports, or migrating data, these patterns scale from thousands to millions of records.
Explore the complete implementation on GitHub, experiment with the Docker setup, and adapt these patterns to your own batch processing requirements.
Fully working solution on GitHub: RealWorld Java Solutions – batch-processing module
