Spring Batch – Data Transformation with ItemProcessors
In Spring Batch, processors handle the processing phase of a chunk-oriented step. Simply put, a processor is an intermediary: it receives an item (data) from the reader, applies some processing to it, and passes the result on to the writer. Processors are represented by the ItemProcessor interface, which is declared as follows:
public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}
Here:
- I: the type of the input item read by the reader.
- O: the type of the output item passed to the writer. I and O can differ, so a processor can convert one type into another. If process returns null, the item is filtered out and never reaches the writer.
When configuring a Spring Batch job, you define a processor by implementing the ItemProcessor interface. For example:
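import org.springframework.batch.item.ItemProcessor;

// InputType and OutputType are placeholders for your own domain classes
public class MyItemProcessor implements ItemProcessor<InputType, OutputType> {

    @Override
    public OutputType process(InputType item) throws Exception {
        // This method is where the business logic for processing takes place:
        // you can transform, enrich, or validate the input item here.
        // Returning null filters the item out, so it is never passed to the writer.
        OutputType processedItem = new OutputType(/* values derived from item */);
        return processedItem;
    }
}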
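The contract is easiest to see without the framework. The plain-Java sketch below declares a hypothetical `Processor<I, O>` interface with the same shape as `ItemProcessor`, plus a hypothetical `ProcessorDemo` helper. It shows a processor whose output type differs from its input type, and how a `null` return drops an item before the writer would see it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in with the same shape as Spring Batch's ItemProcessor<I, O>
interface Processor<I, O> {
    O process(I item) throws Exception;
}

class ProcessorDemo {

    // Maps a raw String to the length of its trimmed text; blank input returns null (filtered out)
    static final Processor<String, Integer> LENGTH_PROCESSOR =
            item -> item.trim().isEmpty() ? null : item.trim().length();

    // Applies the processor to each item and keeps only non-null results,
    // which mirrors how filtered items never reach the writer
    static <I, O> List<O> processAll(List<I> items, Processor<I, O> processor) throws Exception {
        List<O> output = new ArrayList<>();
        for (I item : items) {
            O result = processor.process(item);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    static List<Integer> demo() {
        try {
            return processAll(Arrays.asList("Java", "   ", "Python "), LENGTH_PROCESSOR);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here `ProcessorDemo.demo()` yields `[4, 6]`: the blank string is filtered out, and the other two items are converted from `String` to `Integer`.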
In the Spring Batch job configuration, you can then wire this processor into your step:
Java
@Bean
public Step myStep(ItemReader<InputType> reader,
                   ItemProcessor<InputType, OutputType> processor,
                   ItemWriter<OutputType> writer) {
    // stepBuilderFactory is an injected StepBuilderFactory (Spring Batch 4.x API)
    return stepBuilderFactory.get("myStep")
            .<InputType, OutputType>chunk(10) // commit interval: 10 items per chunk
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
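Here, processor is an instance of your custom ItemProcessor implementation, such as MyItemProcessor. With chunk(10), Spring Batch reads and processes items one at a time and hands them to the writer in groups of 10, committing one transaction per chunk.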
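To make the chunk(10) setting concrete, here is a simplified simulation of a chunk-oriented step in plain Java. `ChunkStepSimulation` is a hypothetical name, and its `Supplier`, `Function`, and `Consumer` parameters stand in for ItemReader, ItemProcessor, and ItemWriter. Real Spring Batch also manages transactions, restart state, and fault tolerance, all of which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified simulation of a chunk-oriented step (reader -> processor -> writer).
// Not Spring Batch's implementation: no transactions, restart state, or fault tolerance.
class ChunkStepSimulation {

    static <I, O> void runStep(Supplier<I> reader, Function<I, O> processor,
                               Consumer<List<O>> writer, int chunkSize) {
        boolean exhausted = false;
        while (!exhausted) {
            // Read phase: collect up to chunkSize items; null from the reader means "no more input"
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            // Process phase: items for which the processor returns null are filtered out
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // Write phase: the writer receives the whole chunk at once (this sketch skips empty chunks)
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
            }
        }
    }

    // Runs items 1..itemCount through the simulated step and records each chunk's size
    static List<Integer> chunkSizesFor(int itemCount, int chunkSize) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i <= itemCount; i++) {
            source.add(i);
        }
        Iterator<Integer> iterator = source.iterator();
        List<Integer> sizes = new ArrayList<>();
        ChunkStepSimulation.<Integer, Integer>runStep(
                () -> iterator.hasNext() ? iterator.next() : null,
                n -> n * 10,
                chunk -> sizes.add(chunk.size()),
                chunkSize);
        return sizes;
    }
}
```

With 25 input items and a chunk size of 10, the simulated writer is called three times, with 10, 10, and 5 items.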
Example:
Let’s make the concept of processors in Spring Batch concrete with a real-life analogy: a content processing system for a website like w3wiki. Imagine w3wiki needs to update and process the content on its website regularly. It has a massive database of programming tutorials written in different formats, and it wants to standardize the content before publishing it on the website. This is where the content processing system comes into play.
- Reader (Content Retriever): The system retrieves content from the w3wiki database. Each piece of content is a programming tutorial for one of several languages (Python, Java, Ruby, C, C++, etc.).
- Processor (Content Processor): The processor is like the team of editors and reviewers who ensure that the content follows a standardized format and meets certain quality criteria before it goes live on the website.
In the context of Spring Batch, the ItemProcessor holds this content processing logic. For example, it could check code formatting for consistency, add standardized headers, or make language-specific adjustments.
Real-life analogy: If the tutorial content has code snippets, the processor might ensure that all code follows a consistent style guide and includes necessary comments.
import org.springframework.batch.item.ItemProcessor;

public class CodeFormattingProcessor implements ItemProcessor<Tutorial, Tutorial> {

    @Override
    public Tutorial process(Tutorial tutorial) throws Exception {
        // Check and standardize code formatting for the tutorial
        // (Tutorial and CodeFormatter are illustrative types, not part of Spring Batch)
        tutorial.setCode(CodeFormatter.format(tutorial.getCode()));
        return tutorial;
    }
}
- Writer (Content Publisher): The writer is responsible for publishing the processed content to the website. In our analogy, this corresponds to updating the w3wiki database with the standardized content.
Real-life analogy: After the content processor has ensured consistent code formatting, the writer updates the database with the processed tutorial.
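import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;

public class DatabaseWriter implements ItemWriter<Tutorial> {
    @Autowired // TutorialDatabaseService is an illustrative service, not part of Spring Batch
    private TutorialDatabaseService tutorialDatabaseService;
    // Spring Batch 4.x signature; Spring Batch 5 passes a Chunk<? extends Tutorial> instead
    @Override
    public void write(List<? extends Tutorial> tutorials) throws Exception {
        // Update the w3wiki database with the processed tutorials
        tutorialDatabaseService.updateTutorials(tutorials);
    }
}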
By using Spring Batch, w3wiki can efficiently automate this content processing system, ensuring that all programming tutorials meet certain quality standards before being published on their website. The processor component, represented by the ItemProcessor, allows for the customization and standardization of content processing logic.
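The analogy leaves CodeFormatter undefined. Below is a framework-free sketch of what such a helper and processor might look like. The `Tutorial` class, `CodeFormatter`, and its style rules (tabs expanded to four spaces, trailing whitespace removed, exactly one final newline) are hypothetical choices made for illustration, and `CodeFormattingSketch` deliberately does not implement Spring Batch's interface so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal Tutorial model used only in this sketch
class Tutorial {
    private String code;

    Tutorial(String code) {
        this.code = code;
    }

    String getCode() {
        return code;
    }

    void setCode(String code) {
        this.code = code;
    }
}

// Hypothetical formatter applying an assumed style guide:
// tabs become four spaces, trailing whitespace is removed, and the snippet ends with one newline
final class CodeFormatter {

    private CodeFormatter() {
    }

    static String format(String code) {
        if (code == null) {
            return null;
        }
        List<String> lines = new ArrayList<>();
        for (String line : code.split("\\r?\\n", -1)) {
            lines.add(line.replace("\t", "    ").replaceAll("\\s+$", ""));
        }
        // Drop trailing blank lines, then terminate with exactly one newline
        return String.join("\n", lines).replaceAll("\\n+$", "") + "\n";
    }
}

// Framework-free version of CodeFormattingProcessor (does not implement Spring Batch's interface)
class CodeFormattingSketch {

    Tutorial process(Tutorial tutorial) {
        tutorial.setCode(CodeFormatter.format(tutorial.getCode()));
        return tutorial;
    }
}
```

Keeping the formatting rules in a separate helper means they can be tested directly, while the processor stays a thin adapter between the reader's items and the writer.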
Advantages of Data Transformation with ItemProcessors in Spring Batch
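- Modularity: Separates reading, processing, and writing into distinct components, so each part of the job has a single responsibility.
- Reusability: A processor can be reused across steps and jobs, and several small processors can be chained (for example, with Spring Batch's CompositeItemProcessor).
- Scalability: Chunk-oriented steps can be scaled with multi-threaded steps or partitioning, so large datasets can be processed in parallel.
- Error Handling: Exceptions thrown from process() can be handled with skip and retry policies, and invalid items can be filtered out by returning null.
- Complex Transformations: Keeps intricate transformation logic in one place, separate from input and output code.
- Integration: A processor can call other services (for example, lookup tables or REST APIs) to enrich items.
- Testing and Debugging: Processors are plain Java objects, so they can be unit-tested without running a full job.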
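The reusability and testing points often come together by chaining small processors. Spring Batch ships CompositeItemProcessor for this; the sketch below is a framework-free approximation built on `java.util.function.Function`, with hypothetical `ProcessorChain` and `ChainDemo` names. Each step feeds the next, and a `null` result ends the chain so the item is filtered out, which is how CompositeItemProcessor treats a `null` from one of its delegates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Framework-free approximation of chaining processors, similar in spirit to
// Spring Batch's CompositeItemProcessor: each delegate's output feeds the next one
class ProcessorChain<T> implements Function<T, T> {

    private final List<Function<T, T>> delegates;

    ProcessorChain(List<Function<T, T>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public T apply(T item) {
        T result = item;
        for (Function<T, T> delegate : delegates) {
            if (result == null) {
                return null; // an earlier step filtered the item out, so stop here
            }
            result = delegate.apply(result);
        }
        return result;
    }
}

class ChainDemo {

    // Small, reusable steps that can each be unit-tested on their own (hypothetical examples)
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> SKIP_EMPTY = s -> s.isEmpty() ? null : s;
    static final Function<String, String> ADD_PREFIX = s -> "Transformed: " + s;

    static ProcessorChain<String> chain() {
        return new ProcessorChain<>(Arrays.asList(TRIM, SKIP_EMPTY, ADD_PREFIX));
    }
}
```

For example, `ChainDemo.chain().apply("  Java Basics ")` returns `"Transformed: Java Basics"`, while a blank input returns `null` and would never reach a writer.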
Data Transformation with ItemProcessors in Spring Batch
In Spring Batch, the ItemProcessor plays a crucial role in transforming data during batch processing. It lets you apply custom logic to modify or enrich the data read by the ItemReader before it is written by the ItemWriter. Let’s extend the w3wiki content processing example with additional attributes and walk through data transformation with ItemProcessors step by step, starting with a new Spring Boot project and its dependencies. This example uses Maven as the build tool.
Step 1: Create a Spring Boot Project
- Go to Spring Initializr (https://start.spring.io).
- Set the following configurations:
- Project: Maven Project
- Language: Java
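- Spring Boot: any version; Step 3 replaces the generated pom.xml with one that pins Spring Boot 2.6.1, because the code in this article uses the Spring Batch 4 API (JobBuilderFactory, StepBuilderFactory)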
- Group: Your desired group name, e.g. com.w3wiki
- Artifact: Your desired artifact name, e.g. content-processor
- Dependencies:
- Spring Batch
- Spring Web
- Lombok
- Click on the “Generate” button to download the project zip file.
Step 2: Extract and Import into IDE
Extract the downloaded zip file and import the project into your preferred IDE (Eclipse, IntelliJ, etc.).
Step 3: Add Additional Dependencies
Open the pom.xml file in your project and add the necessary dependencies. For this example, we’ll use H2 database for simplicity. If you are using a different database, adjust the dependencies accordingly. Below is the full pom.xml file configuration.
XML
<?xml version="1.0" encoding="UTF-8"?>
<!-- Maven Project Object Model (POM) file for the w3wikiContentProcessor Spring Boot App -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <!-- Specify the Maven version and POM format -->
    <modelVersion>4.0.0</modelVersion>

    <!-- Parent POM for Spring Boot projects -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.1</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <!-- Project information -->
    <groupId>com.w3wiki</groupId>
    <artifactId>w3wikiContentProcessor</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>w3wikiContentProcessor</name>
    <description>RESTful API for w3wiki Content Processor Spring Boot App</description>

    <!-- Project properties -->
    <properties>
        <java.version>8</java.version> <!-- Java version for the project -->
    </properties>

    <!-- Project dependencies -->
    <dependencies>
        <!-- Spring Boot starter for Spring Data JPA -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>

        <!-- Spring Boot starter for Spring Batch -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>

        <!-- Spring Boot starter for building web applications -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- H2 Database as a runtime dependency -->
        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <scope>runtime</scope>
        </dependency>

        <!-- Spring Boot devtools for development -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>

        <!-- Lombok for simplified Java code -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>

        <!-- Spring Boot starter for testing -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- Spring Batch testing support -->
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <!-- Maven Build Configuration -->
    <build>
        <plugins>
            <!-- Spring Boot Maven Plugin -->
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <!-- Exclude Lombok from Spring Boot plugin -->
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Step 4: Create a ProgrammingTutorial Class
Create the ProgrammingTutorial entity with the attributes the batch job works with: title, language, content, author, and creation/update timestamps.
Java
package com.w3wiki.model;

import java.util.Date;

// javax.persistence matches the Spring Boot 2.6.1 parent in the pom.xml;
// Spring Boot 3.x uses jakarta.persistence instead
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;

/**
 * Represents a programming tutorial entity.
 *
 * This class is annotated with JPA annotations for entity mapping. Lombok
 * annotations are used to generate getters, setters, and constructors.
 * Hibernate annotations are used to handle timestamp creation and updates.
 *
 * @author rahul.chauhan
 */
@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ProgrammingTutorial {

    /**
     * Unique identifier for the tutorial.
     */
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    /**
     * Title of the programming tutorial.
     */
    @Column
    private String title;

    /**
     * Programming language covered in the tutorial.
     */
    @Column
    private String language;

    /**
     * Content of the programming tutorial.
     */
    @Column
    private String content;

    /**
     * Author of the programming tutorial.
     */
    @Column
    private String author;

    /**
     * Timestamp representing the creation time of the tutorial. Automatically
     * populated by Hibernate.
     */
    @CreationTimestamp
    private Date createTime;

    /**
     * Timestamp representing the last update time of the tutorial. Automatically
     * updated by Hibernate.
     */
    @UpdateTimestamp
    private Date lastUpdateTime;
}
Step 5: Create a TutorialRepository
Create a simple repository interface for accessing the database.
Java
package com.w3wiki.repository;

import org.springframework.data.jpa.repository.JpaRepository;

import com.w3wiki.model.ProgrammingTutorial;

public interface TutorialRepository extends JpaRepository<ProgrammingTutorial, Long> {
}
Step 6: Configure Application Properties
In your application.properties file, configure the H2 database and other Spring Batch properties.
# DataSource settings
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=testUser
spring.datasource.password=password
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
# H2 Console settings
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
# Hibernate settings
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
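# Server port
server.port=8080
# Spring Batch: don't launch jobs automatically at startup; the REST endpoint in Step 8 starts them
spring.batch.job.enabled=false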
Step 7: Implement the ItemReader, ItemProcessor, ItemWriter, and Batch Configuration Classes
Java
/* BatchConfiguration.java */
package com.w3wiki.batch;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;

import com.w3wiki.model.ProgrammingTutorial;

@Configuration
@EnableBatchProcessing
@EnableScheduling
public class BatchConfiguration {

    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;
    private final JobLauncher jobLauncher;
    private final JobCompletionNotificationListener notificationListener;
    private final TutorialItemReader tutorialItemReader;
    private final TutorialItemProcessor tutorialItemProcessor;
    private final TutorialItemWriter tutorialItemWriter;

    @Autowired
    public BatchConfiguration(
            JobBuilderFactory jobBuilderFactory,
            StepBuilderFactory stepBuilderFactory,
            JobLauncher jobLauncher,
            JobCompletionNotificationListener notificationListener,
            TutorialItemReader tutorialItemReader,
            TutorialItemProcessor tutorialItemProcessor,
            TutorialItemWriter tutorialItemWriter) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
        this.jobLauncher = jobLauncher;
        this.notificationListener = notificationListener;
        this.tutorialItemReader = tutorialItemReader;
        this.tutorialItemProcessor = tutorialItemProcessor;
        this.tutorialItemWriter = tutorialItemWriter;
    }

    @Bean
    public Job processContentJob() {
        return jobBuilderFactory.get("processContentJob")
                .incrementer(new RunIdIncrementer())
                .listener(notificationListener)
                .flow(processContentStep())
                .end()
                .build();
    }

    @Bean
    public Step processContentStep() {
        return stepBuilderFactory.get("processContentStep")
                .<ProgrammingTutorial, ProgrammingTutorial>chunk(10)
                .reader(tutorialItemReader)
                .processor(tutorialItemProcessor)
                .writer(tutorialItemWriter)
                .build();
    }
}
Java
/* JobCompletionNotificationListener.java */
package com.w3wiki.batch;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.out.println("Batch Job Completed Successfully! Time to verify the results.");
        }
    }
}
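The listener checks the status because Spring Batch calls afterJob after both successful and failed executions. The sketch below models that behavior with hypothetical stand-in types (`SimpleStatus`, `SimpleJobListener`, `SimpleJobRunner`, `CompletionMessages`); they are not the real Spring Batch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that model the listener contract; these are not Spring Batch types
enum SimpleStatus { COMPLETED, FAILED }

interface SimpleJobListener {
    default void beforeJob() {
    }

    void afterJob(SimpleStatus status);
}

class SimpleJobRunner {

    // Runs the job body and calls afterJob whether the body succeeds or throws
    static SimpleStatus run(Runnable jobBody, SimpleJobListener listener) {
        listener.beforeJob();
        SimpleStatus status;
        try {
            jobBody.run();
            status = SimpleStatus.COMPLETED;
        } catch (RuntimeException e) {
            status = SimpleStatus.FAILED;
        }
        listener.afterJob(status);
        return status;
    }
}

// Mirrors JobCompletionNotificationListener: report only when the job completed
class CompletionMessages implements SimpleJobListener {

    final List<String> messages = new ArrayList<>();

    @Override
    public void afterJob(SimpleStatus status) {
        if (status == SimpleStatus.COMPLETED) {
            messages.add("Batch Job Completed Successfully! Time to verify the results.");
        }
    }
}
```

Running one successful and one failing job body through `SimpleJobRunner.run` invokes afterJob twice but records only one message.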
Java
/* TutorialItemProcessor.java */
package com.w3wiki.batch;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

import com.w3wiki.model.ProgrammingTutorial;

@Component
public class TutorialItemProcessor implements ItemProcessor<ProgrammingTutorial, ProgrammingTutorial> {

    @Override
    public ProgrammingTutorial process(ProgrammingTutorial tutorial) throws Exception {
        // Your transformation logic here
        tutorial.setTitle("Transformed: " + tutorial.getTitle());
        tutorial.setContent(transformContent(tutorial.getContent()));
        return tutorial;
    }

    private String transformContent(String content) {
        // Your content transformation logic here,
        // for example, language-specific adjustments.
        // Guard against tutorials without content to avoid a NullPointerException.
        return content == null ? null : content.toUpperCase();
    }
}
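A processor can also filter items instead of transforming every one of them. The framework-free sketch below uses a hypothetical `TutorialData` stand-in for the entity and two assumed rules that are not part of the original: tutorials with blank content are skipped by returning `null`, and the title prefix is added only once, so running the job again on the same rows does not stack "Transformed: " prefixes.

```java
import java.util.Locale;

// Hypothetical stand-in for ProgrammingTutorial with only the fields this processor touches
class TutorialData {
    String title;
    String content;

    TutorialData(String title, String content) {
        this.title = title;
        this.content = content;
    }
}

// Framework-free sketch of a filtering variant of TutorialItemProcessor
class FilteringTutorialProcessor {

    static final String PREFIX = "Transformed: ";

    TutorialData process(TutorialData tutorial) {
        // Assumed rule: tutorials without content are filtered out by returning null
        if (tutorial.content == null || tutorial.content.trim().isEmpty()) {
            return null;
        }
        // Assumed rule: add the prefix only once, so re-running the job is idempotent
        if (tutorial.title != null && !tutorial.title.startsWith(PREFIX)) {
            tutorial.title = PREFIX + tutorial.title;
        }
        // Locale.ROOT keeps upper-casing independent of the JVM's default locale
        tutorial.content = tutorial.content.toUpperCase(Locale.ROOT);
        return tutorial;
    }
}
```

Because the writer saves the same rows back to the table, an idempotent processor like this keeps repeated job runs from changing already-processed data again.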
Java
/* TutorialItemReader.java */
package com.w3wiki.batch;

import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.w3wiki.model.ProgrammingTutorial;
import com.w3wiki.repository.TutorialRepository;

@Component
public class TutorialItemReader implements ItemReader<ProgrammingTutorial> {

    @Autowired
    private TutorialRepository tutorialRepository;

    private Iterator<ProgrammingTutorial> tutorialIterator;

    @Override
    public ProgrammingTutorial read() throws Exception {
        // Load the tutorials once at the start of each job run
        if (tutorialIterator == null) {
            initializeIterator();
        }
        if (tutorialIterator.hasNext()) {
            return tutorialIterator.next();
        }
        // Returning null tells Spring Batch there is no more input; reset the iterator
        // so the next job run reloads the table instead of looping within this run
        tutorialIterator = null;
        return null;
    }

    private void initializeIterator() {
        List<ProgrammingTutorial> tutorials = tutorialRepository.findAll();
        tutorialIterator = tutorials.iterator();
    }
}
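An iterator-based reader has to return null exactly once at the end of its data; otherwise the step never finishes. If a reader simply reloaded its list whenever the iterator ran out, it would start over instead of ending. The sketch below shows the load-once, reset-at-end pattern in plain Java; `ReloadingListReader` is a hypothetical name, and its `Supplier` stands in for tutorialRepository.findAll().

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Framework-free sketch of an iterator-based reader: load once per run,
// return null when the data is exhausted, then reset so the next run reloads
class ReloadingListReader<T> {

    private final Supplier<List<T>> loader; // stands in for tutorialRepository::findAll
    private Iterator<T> iterator;

    ReloadingListReader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    T read() {
        if (iterator == null) {
            iterator = loader.get().iterator();
        }
        if (iterator.hasNext()) {
            return iterator.next();
        }
        iterator = null; // end of input for this run
        return null;
    }
}
```

Loading everything with findAll() keeps all rows in memory; for large tables, Spring Batch's RepositoryItemReader or JpaPagingItemReader read page by page instead.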
Java
/* TutorialItemWriter.java */
package com.w3wiki.batch;

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.w3wiki.model.ProgrammingTutorial;
import com.w3wiki.repository.TutorialRepository;

@Component
public class TutorialItemWriter implements ItemWriter<ProgrammingTutorial> {

    @Autowired
    private TutorialRepository tutorialRepository;

    @Override
    public void write(List<? extends ProgrammingTutorial> tutorials) throws Exception {
        tutorialRepository.saveAll(tutorials);
    }
}
Step 8: Create ContentProcessingController
Java
package com.w3wiki.controller;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/content")
public class ContentProcessingController {

    private final JobLauncher jobLauncher;
    private final Job processContentJob;

    public ContentProcessingController(JobLauncher jobLauncher, Job processContentJob) {
        this.processContentJob = processContentJob;
        this.jobLauncher = jobLauncher;
    }

    @PostMapping("/process")
    public ResponseEntity<String> processContent() {
        try {
            JobParameters jobParameters = new JobParametersBuilder()
                    .addString("jobParam1", String.valueOf(System.currentTimeMillis()))
                    .toJobParameters();
            JobExecution jobExecution = jobLauncher.run(processContentJob, jobParameters);
            return ResponseEntity.ok("Content processing job initiated successfully. Job ID: " + jobExecution.getId());
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("Error initiating content processing job: " + e.getMessage());
        }
    }
}
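The controller adds a timestamp parameter because Spring Batch identifies a job instance by the job name plus its identifying parameters, and a completed instance cannot be launched again with the same parameters (the launcher throws JobInstanceAlreadyCompleteException). The hypothetical `JobInstanceRegistry` below models only that rule in plain Java; it treats every run as completed and ignores restarts of failed executions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified, hypothetical model of job-instance identity: an instance is keyed by the job
// name plus its identifying parameters, and a completed instance cannot be launched again
class JobInstanceRegistry {

    private final Set<String> completedInstances = new HashSet<>();

    // Returns true if the run is accepted; every accepted run is treated as completed
    boolean tryRun(String jobName, Map<String, String> identifyingParameters) {
        // TreeMap sorts the parameters so the key does not depend on insertion order
        String instanceKey = jobName + new TreeMap<>(identifyingParameters);
        // Set.add returns false when the key is already present, i.e. the instance already completed
        return completedInstances.add(instanceKey);
    }
}
```

Launching processContentJob twice with the same jobParam1 value is rejected the second time, while a new timestamp value creates a new instance.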
Below is the project structure of the created Spring Boot application:
Run the Spring Boot Application
Start the application from your IDE, or run mvn spring-boot:run from the project root.
Testing Spring Boot Application
Use Postman to Trigger the Batch Job: Open Postman and create a new request:
- Method: POST
- URL: http://localhost:8080/api/content/process
Alternatively, import the following cURL command into Postman (the endpoint takes no request body):
curl -X POST http://localhost:8080/api/content/process
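Send the request, and you should receive a response indicating that the content processing job has been initiated. Note that the in-memory H2 database starts empty, so the job only transforms tutorials after you insert some rows into the PROGRAMMING_TUTORIAL table (for example, through the H2 console at http://localhost:8080/h2-console). See the image below for reference: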
The response shown in the image reads “Content processing job initiated successfully. Job ID: 3”.
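If you prefer to trigger the job from Java rather than Postman or cURL, the sketch below builds an empty-body POST to the job endpoint with the JDK's `java.net.http` client. It requires Java 11 or later (the project itself targets Java 8, so this would be a separate client), it assumes the application is running on localhost:8080, and `TriggerJobClient` is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Requires Java 11+ (java.net.http); assumes the application is running on localhost:8080
class TriggerJobClient {

    static final String PROCESS_URL = "http://localhost:8080/api/content/process";

    // Builds a POST with an empty body for the job-trigger endpoint
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(PROCESS_URL))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Sends the request and returns the response body, such as the "Job ID" message
    static String trigger() throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With the application running, TriggerJobClient.trigger() should return the same message that the Postman request displays.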
Conclusion
In conclusion, creating a Spring Batch application for processing programming tutorials involves configuring entity classes, implementing ItemProcessor, ItemReader, and ItemWriter components, setting up batch jobs, and creating endpoints for job initiation. The application can be tested using Postman, and the entire process is designed to streamline the batch processing of data, ensuring consistency and efficiency in handling large datasets. Monitoring and debugging tools, along with additional enhancements, can be employed to refine and optimize the application for specific use cases. Overall, Spring Batch simplifies the development of robust and scalable batch processing systems within a Spring Boot application.