java.nio.charset.CharsetEncoder Class in Java
For character encoding and decoding, Java offers a number of classes in the ‘java.nio.charset’ package. The ‘CharsetEncoder’ class of this package performs the encoding. In this article, we look at this class, its syntax, its methods, and some examples of error handling and optimization techniques.
What is a CharsetEncoder?
The ‘CharsetEncoder’ class belongs to the ‘java.nio.charset’ package.
The class converts character sequences into bytes according to a specific character encoding, which Java represents as a Charset. It is commonly used for activities such as writing textual data to files, transmitting text over the network, and converting text from Java's internal representation into another character encoding.
CharsetEncoder translates character input to byte output. Java represents characters internally as UTF-16 code units; the encoder converts them into the byte representation of the chosen character encoding (e.g. UTF-8).
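To make the character-to-byte translation concrete, here is a minimal sketch that encodes the same one-character string with three standard charsets and compares the results. The class name `EncodingComparison` and its helper `encode` are illustrative names, not part of the JDK.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class name; not part of the JDK.
class EncodingComparison {

    // Encodes text with a fresh encoder for the given charset and
    // returns exactly the bytes that were produced.
    static byte[] encode(String text, Charset charset) {
        try {
            ByteBuffer out = charset.newEncoder().encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // CharacterCodingException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // 'é', a single Java char
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 2
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length);   // 2
    }
}
```

The same character yields different byte sequences depending on the charset, which is why the receiver of the bytes has to know which encoding was used.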
Syntax of CharsetEncoder
public abstract class CharsetEncoder extends Object
Constructors of CharsetEncoder
Constructor associated with CharsetEncoder and its description.
Constructor | Modifier | Description |
---|---|---|
CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar) | protected | Initializes a new encoder for the given Charset with the specified average and maximum number of bytes per character. The replacement value defaults to the single byte '?'. |
CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar, byte[] replacement) | protected | Initializes a new encoder for the given Charset with the specified average and maximum number of bytes per character, plus a replacement byte sequence that is used for characters that cannot be mapped. |
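Because both constructors are protected, application code normally obtains an encoder from ‘Charset.newEncoder()’ instead of calling a constructor. The sketch below does that and prints the values the charset implementation passed to the constructor. The class name `EncoderFactoryDemo` and the helper `utf8Encoder` are illustrative, and the printed numbers depend on the JDK's charset implementation, so no expected output is shown.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative class name; not part of the JDK.
class EncoderFactoryDemo {

    // The charset acts as the factory for its own encoder.
    static CharsetEncoder utf8Encoder() {
        return StandardCharsets.UTF_8.newEncoder();
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = utf8Encoder();
        System.out.println("charset:             " + encoder.charset());
        System.out.println("averageBytesPerChar: " + encoder.averageBytesPerChar());
        System.out.println("maxBytesPerChar:     " + encoder.maxBytesPerChar());
        System.out.println("replacement bytes:   " + Arrays.toString(encoder.replacement()));
    }
}
```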
Methods of CharsetEncoder
Table of the methods associated with CharsetEncoder and its description.
Modifier and Type | Method | Description |
---|---|---|
float | averageBytesPerChar() | Returns the average number of bytes that will be generated for every input character. |
boolean | canEncode(char c) | Indicates if the specified character can be encoded by this encoder. |
boolean | canEncode(CharSequence cs) | Indicates if the provided character sequence can be encoded by this encoder. |
Charset | charset() | Returns the charset that created this encoder. |
ByteBuffer | encode(CharBuffer in) | Encodes the remaining data from a single input character buffer into a newly-allocated byte buffer. |
CoderResult | encode(CharBuffer in, ByteBuffer out, boolean endOfInput) | Writes the results to the specified output buffer after encoding as many characters as possible from the provided input buffer. |
protected abstract CoderResult | encodeLoop(CharBuffer in, ByteBuffer out) | Encodes one or more characters into one or more bytes. |
CoderResult | flush(ByteBuffer out) | Flushes the encoder. |
protected CoderResult | implFlush(ByteBuffer out) | Flushes the encoder. |
protected void | implReset() | Clears any internal state specific to a given charset by resetting this encoder. |
boolean | isLegalReplacement(byte[] repl) | Indicates if the provided byte array is a valid replacement value for this encoder. |
float | maxBytesPerChar() | Returns the maximum number of bytes that can be generated for each input character. |
CharsetEncoder | reset() | Resets the encoder, clearing any internal state. |
byte[] | replacement() | Returns the replacement value for this encoder. |
CharsetEncoder | replaceWith(byte[] newReplacement) | Modifies the replacement value of this encoder. |
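As a rough illustration of how several of these methods fit together, the sketch below checks input with ‘canEncode’ and then runs the low-level sequence of ‘reset()’, the three-argument ‘encode()’, and ‘flush()’ on a caller-supplied buffer. The class name `StepwiseEncoding` and the helper `encodeInto` are illustrative names; the sketch assumes the whole input is available at once, so ‘encode()’ is called a single time with ‘endOfInput’ set to true.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Illustrative class name; not part of the JDK.
class StepwiseEncoding {

    // Encodes text into a caller-supplied buffer and returns
    // the number of bytes written.
    static int encodeInto(CharsetEncoder encoder, String text, ByteBuffer out) {
        CharBuffer in = CharBuffer.wrap(text);
        int start = out.position();
        encoder.reset();
        // true: this is the last (and only) chunk of input
        CoderResult result = encoder.encode(in, out, true);
        if (result.isUnderflow()) {
            // All input consumed; let the encoder write any trailing bytes
            result = encoder.flush(out);
        }
        if (!result.isUnderflow()) {
            // Overflow (buffer too small) or a malformed/unmappable error
            throw new IllegalStateException("Encoding stopped early: " + result);
        }
        return out.position() - start;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(ascii.canEncode('A'));         // true
        System.out.println(ascii.canEncode("caf\u00E9")); // false: 'é' is not in US-ASCII

        ByteBuffer out = ByteBuffer.allocate(16);
        int written = encodeInto(StandardCharsets.UTF_8.newEncoder(), "abc", out);
        System.out.println(written + " bytes written");   // 3 bytes written
    }
}
```

Unlike the one-argument ‘encode()’, the three-argument form reports problems through the returned ‘CoderResult’ instead of throwing, so the caller has to inspect it.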
Inherited Methods
The CharsetEncoder class inherits the methods of java.lang.Object.
Examples of CharsetEncoder Class
Example 1: Basic use of CharsetEncoder
In this example, the input string is encoded into bytes using the CharsetEncoder with UTF-8 character encoding.
It shows how to create a CharsetEncoder, wrap the input text in a CharBuffer, encode the characters, and print the encoded data. It includes basic error handling for problems that may occur during encoding.
Java
    // Java Program to encode a String
    // using a CharsetEncoder and a CharBuffer
    import java.nio.*;
    import java.nio.charset.*;

    // Driver class
    public class Main {
        // Main method
        public static void main(String[] args) {
            // Select the charset
            Charset ch = StandardCharsets.UTF_8;

            // Initialize a CharsetEncoder
            CharsetEncoder ec = ch.newEncoder();

            // Input string
            String str = "CharsetEncoder Example";

            // Wrap the input text in a CharBuffer
            CharBuffer charBuffer = CharBuffer.wrap(str);

            try {
                // Encode the characters
                ByteBuffer bf = ec.encode(charBuffer);

                // Decode only the bytes up to the buffer's limit;
                // the backing array may be larger than the encoded data
                String ans = ch.decode(bf).toString();
                System.out.println(ans);
            } catch (CharacterCodingException e) {
                // Handle the exception
                e.printStackTrace();
            }
        }
    }
Output:
CharsetEncoder Example
Example 2: Error Handling
UTF-8 can encode every Unicode character, so unmappable characters only occur with more limited charsets such as US-ASCII or ISO-8859-1. When a character has no representation in the target charset, the encoder reports an error by default, so the error needs to be handled. In the example below, the input string contains the symbol ‘Ω’, which cannot be mapped in US-ASCII. We call ‘onUnmappableCharacter‘ with ‘CodingErrorAction.REPLACE‘ so that unmappable characters are replaced with the encoder's replacement bytes, which we set with ‘replaceWith‘.
In the code below, whenever the encoder encounters ‘Ω’, it writes ‘?‘ instead, which indicates that the symbol was replaced with a fallback character.
Java
    // Java Program for Error handling
    // Using onUnmappableCharacter
    import java.nio.*;
    import java.nio.charset.*;

    // Driver Class
    public class Main {
        // Main method
        public static void main(String[] args) {
            // Select a charset that cannot represent Ω
            Charset ch = StandardCharsets.US_ASCII;

            // Initialize a CharsetEncoder
            CharsetEncoder ec = ch.newEncoder();

            // Input string (with Ω as an unmappable character)
            String str = "Charset Ω Encoder";

            // Handle the error by replacing the unmappable
            // character with a question mark
            ec.onUnmappableCharacter(CodingErrorAction.REPLACE);
            ec.replaceWith("?".getBytes(ch));

            // Wrap the string into a CharBuffer
            CharBuffer cb = CharBuffer.wrap(str);

            try {
                // Encode the characters
                ByteBuffer bf = ec.encode(cb);

                // Convert the encoded bytes back to a String
                String ans = ch.decode(bf).toString();
                System.out.println("Encoded String: " + ans);
            } catch (CharacterCodingException e) {
                // Handle the exception
                System.err.println("Error: " + e.getMessage());
            }
        }
    }
Output:
Encoded String: Charset ? Encoder
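‘CodingErrorAction‘ also defines ‘IGNORE‘ and ‘REPORT‘ in addition to ‘REPLACE‘. The following sketch compares the three on the same input, using US-ASCII as a charset that cannot represent ‘Ω’. The class name `ErrorActionDemo` and the helper `encodeAscii` are illustrative names, and returning a "failed: ..." string is just a convenience for this demo, not a recommended error-handling style.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative class name; not part of the JDK.
class ErrorActionDemo {

    // Encodes text as US-ASCII with the given action for unmappable
    // characters and returns the result decoded back to a String.
    static String encodeAscii(String text, CodingErrorAction action) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onUnmappableCharacter(action);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return StandardCharsets.US_ASCII.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // With REPORT, the one-argument encode() throws
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String input = "A\u03A9B"; // "AΩB"
        System.out.println(encodeAscii(input, CodingErrorAction.REPLACE)); // A?B
        System.out.println(encodeAscii(input, CodingErrorAction.IGNORE));  // AB
        System.out.println(encodeAscii(input, CodingErrorAction.REPORT));  // failed: UnmappableCharacterException
    }
}
```

‘REPORT‘ is the default action of a new encoder, so silently losing characters requires an explicit choice of ‘REPLACE‘ or ‘IGNORE‘.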
How to Optimize the Encoding?
Now that we have seen how encoding works with the CharsetEncoder class, it is worth looking at how to improve efficiency and performance when dealing with larger volumes of data.
- Buffer Management: Allocate CharBuffer and ByteBuffer instances that are large enough for the expected data, so that they do not have to be reallocated repeatedly. The examples above use CharBuffer.wrap() to wrap the input string instead of copying it into a new buffer.
- Reuse Buffers: Instead of creating new instances of CharBuffer and ByteBuffer every time, consider reusing them across encoding operations. This reduces the amount of memory allocation.
- Bulk Encoding: Pass encode() a CharBuffer that contains all the characters to be encoded, rather than encoding a few characters at a time. This minimizes the number of encoding calls and makes the program more efficient.
- Precompute Buffer Size: To prevent unnecessary resizing, allocate the ByteBuffer with the expected size, or slightly more, if you know the approximate size of the encoded data in bytes. An upper bound is the number of input characters multiplied by maxBytesPerChar().
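The points above can be combined roughly as follows. This is a minimal sketch, assuming single-threaded use (a CharsetEncoder is not safe for concurrent use) and UTF-8 output; the class name `ReusableEncoder`, its initial capacity of 64 bytes, and its `encode` method are illustrative choices, not a JDK API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Illustrative class name; not part of the JDK. Not thread-safe.
class ReusableEncoder {
    // One encoder and one output buffer, reused across calls
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private ByteBuffer out = ByteBuffer.allocate(64); // arbitrary starting size

    // Encodes the whole text in one call. The returned buffer is
    // only valid until the next call to encode().
    ByteBuffer encode(String text) {
        // Worst case: every char needs maxBytesPerChar() bytes
        int needed = (int) Math.ceil(text.length() * (double) encoder.maxBytesPerChar());
        if (out.capacity() < needed) {
            out = ByteBuffer.allocate(needed); // grow only when required
        }
        out.clear();
        encoder.reset();
        CoderResult result = encoder.encode(CharBuffer.wrap(text), out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (!result.isUnderflow()) {
            // For example, malformed input such as an unpaired surrogate
            throw new IllegalStateException("Encoding failed: " + result);
        }
        out.flip(); // make the encoded bytes readable
        return out;
    }

    public static void main(String[] args) {
        ReusableEncoder reusable = new ReusableEncoder();
        System.out.println(reusable.encode("hello").remaining());      // 5
        System.out.println(reusable.encode("h\u00E9llo").remaining()); // 6
    }
}
```

Sizing by maxBytesPerChar() trades some memory for never hitting an overflow on the output buffer; when memory matters more, a smaller buffer with a loop that handles overflow is the alternative.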
In this article, we covered the main methods and best practices related to the CharsetEncoder class. From syntax and constructors to error handling and optimization techniques, we explored how to use this class for character encoding tasks in Java applications.