Maven and UTF-8

When you compile your source code (Java, in this case) on multiple machines, you can run into trouble once special characters come into play. As long as you type plain English, nothing goes wrong. As soon as you use characters outside the 128-character ASCII set, however, problems start. This is a well-known problem on the Internet, which is why most websites specify charset="UTF-8" in their metadata, so that every character you see or type is the same on all platforms.

When you create a text file on Windows, it is saved in the Cp1252 encoding by default (at least on Western-language installations). If that file is a Java source file that you want to build on an Ubuntu server, you run into trouble as soon as it contains characters like á, è and especially Æ. Ubuntu uses UTF-8 as its default encoding, so those characters are decoded incorrectly and you end up with compile errors.

This post describes a way to build your project with Maven on different platforms (Windows and Ubuntu).

To make your project platform-independent, you can choose to always create UTF-8 encoded source files. In Eclipse, you can find this option in the Preferences dialog (Window > Preferences) under General > Workspace, where you can set the Text file encoding to UTF-8.

Now Maven builds the project correctly on Ubuntu. However, it no longer builds on your Windows machine, because there Maven does not interpret the UTF-8 characters correctly. So we have to make sure Maven builds using UTF-8. Maven itself hints at this by displaying a warning during the build:

[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!

To get rid of that warning, we have to add some properties to the POM files, preferably to the master POM so that you only have to specify them once. This is described on the Maven website and in more detail on codehaus.org. However, neither solution was enough to solve the problem on my machine. You have to specify the encoding that will be used for the project and configure it for every plugin that matters. In practice, that comes down to the “build” and the “resources” plugins, if you can even call them plugins in this case.
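The starting point the Maven documentation describes is a single property in the properties section of the (master) POM. A minimal sketch, assuming an otherwise standard POM; the `...` lines stand for the rest of your own project definition:

```xml
<project>
  ...
  <properties>
    <!-- encoding used to read the source files; the compiler plugin uses
         this property as the default for its own encoding parameter -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  ...
</project>
```

Because child modules inherit properties from the master POM, this one line applies to every module in the build.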

When I specified only project.build.sourceEncoding, Maven still displayed the warning, so that alone wasn't sufficient:

[INFO] Building project
[INFO]    task-segment: [deploy]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
...

As you can see, the warning is raised in the resources:resources step, not in a “build” step, which led me to add a project.resources.sourceEncoding property as well. My adjusted master POM therefore contains:

<?xml version="1.0" encoding="us-ascii"?>
<project xmlns="...">
... 
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.resources.sourceEncoding>UTF-8</project.resources.sourceEncoding>
</properties>
... 
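<build>
  <plugins>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      ...
      <configuration>
        ...
        <!-- encoding is the parameter maven-compiler-plugin reads -->
        <encoding>${project.build.sourceEncoding}</encoding>
        <!-- sourceEncoding is not a documented maven-compiler-plugin parameter
             and is most likely ignored; it is harmless to keep -->
        <sourceEncoding>${project.build.sourceEncoding}</sourceEncoding>
      </configuration>
    </plugin>
    ...
  </plugins>
  ...
</build>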
</project>

And that works. The project now builds on Windows as well as on an Ubuntu server, and the source files are interpreted as UTF-8 encoded text.
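If the resources warning still shows up on some machine, another option is to configure the maven-resources-plugin directly. Its documentation lists an `encoding` parameter that defaults to `${project.build.sourceEncoding}`, and the fragment below sets it explicitly. This is a sketch, assuming the plugin version your build resolves is recent enough to support that parameter (if it is not, pinning a newer `<version>` for the plugin in the POM is worth trying). The elements would be merged into the existing `<build><plugins>` section of the master POM rather than added as a second `<build>` element:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-resources-plugin</artifactId>
      <configuration>
        <!-- encoding used when copying and filtering resource files -->
        <encoding>${project.build.sourceEncoding}</encoding>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Configuring the plugin in the master POM means the child modules inherit the setting, just like the properties.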

One Comment

  1. November 5, 2010

    Strange, actually, that Windows in this day and age (gosh, how I hate that expression 😛 ) doesn't use UTF-8 by default. They use a BOM to assert the UTF-8-ness of text files, so I would think on-the-fly conversion, where needed, could be fairly transparent.
    I find it even stranger that Eclipse doesn't use UTF-8 out of the box, and that you have to enable it specifically.

    Even if that would cause a total nightmare of backward compatibility issues, it's still about time to step into the 21st century 😛
