Removing the BOM from UTF-8 Files

January 5, 2017

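During development I ran into UTF-8 files that carried a BOM header, which kept them from being parsed correctly. For background on what a BOM is, see:
https://en.wikipedia.org/wiki/Byte_order_mark
UTF-8 vs. UTF-8 with BOM
The only difference is three bytes at the very start of the file: EF BB BF. They are invisible in an editor. In general a BOM can mark byte order (big-endian or little-endian). UTF-8 does not need a BOM for byte order, but one can still be used to signal the encoding. Windows uses the BOM in exactly this way to mark the encoding of text files. The rest of this post collects ways to remove it.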
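To make the three bytes concrete, here is a minimal sketch showing that EF BB BF is simply the UTF-8 encoding of the code point U+FEFF, and how those bytes look as Java's signed byte values. The class name BomBytes and the helper names utf8Bom and hasUtf8Bom are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BomBytes {
    // Encodes U+FEFF (the code point used as the BOM) as UTF-8.
    static byte[] utf8Bom() {
        return "\uFEFF".getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if data starts with the three bytes EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data != null && data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        for (byte b : utf8Bom()) {
            // Print each byte as unsigned hex and as Java's signed value.
            System.out.printf("%02X (%d)%n", b & 0xFF, b);
        }
    }
}
```

Because Java's byte type is signed, the same three bytes show up as -17, -69 and -65 when compared without masking.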
[1] Text editors
UltraEdit, Sublime Text and Notepad++ can all convert the file easily (in Notepad++, for example, via "Save As").
[2] In code
I work mostly in Java, so here is a Java implementation (tested by me; adapted from the article "Removing the BOM from a UTF-8 file header"):

public static byte[] removeUTF8BOM(byte[] bt) {
    // A file containing only the BOM is exactly 3 bytes long, so check >= 3
    if (bt != null && bt.length >= 3) {
        // The first three bytes are EF BB BF (-17, -69, -65 as signed Java bytes)
        if (bt[0] == -17 && bt[1] == -69 && bt[2] == -65) {
            byte[] nbs = new byte[bt.length - 3];
            System.arraycopy(bt, 3, nbs, 0, nbs.length);
            return nbs;
        }
    }
    return bt;
}
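As a usage sketch, the byte-level method above can be applied to a file on disk roughly as follows. The class name BomFileCleaner and the method stripBomInPlace are hypothetical, and the strip logic is repeated here only so the example compiles on its own. It reads the whole file into memory, so it assumes reasonably small files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class BomFileCleaner {
    // Byte-level strip in the spirit of removeUTF8BOM above.
    static byte[] removeUTF8BOM(byte[] bt) {
        if (bt != null && bt.length >= 3
                && bt[0] == (byte) 0xEF && bt[1] == (byte) 0xBB && bt[2] == (byte) 0xBF) {
            return Arrays.copyOfRange(bt, 3, bt.length);
        }
        return bt;
    }

    // Rewrites the file only if it starts with a BOM; returns true if it changed.
    static boolean stripBomInPlace(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] cleaned = removeUTF8BOM(original);
        if (cleaned == original) {
            return false; // same array came back: no BOM, leave the file untouched
        }
        Files.write(file, cleaned);
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println(file + (stripBomInPlace(file) ? ": BOM removed" : ": no BOM"));
        }
    }
}
```

Returning early when nothing changed avoids rewriting files that never had a BOM, which keeps their modification times intact.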

Other approaches: see the article "Recommended ways to handle a file BOM in Java".
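One alternative that avoids loading the whole file is to skip the BOM while reading from a stream. The sketch below uses the standard PushbackInputStream. It is not necessarily what the article mentioned above recommends, and the names BomSkippingStreams and skipUtf8Bom are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BomSkippingStreams {
    // Wraps the stream and consumes a leading EF BB BF if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF;
        if (!bom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```

The returned stream can be passed to any reader or parser in place of the original one, so the caller never sees the BOM.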
