Removing character accents

Avalon Internet web development


Removing Character Accents

A Workaround for Google Maps' Inability to Handle Them

by Beryl Magilavy

Google Maps appears unable to accept accented characters, which sharply limits its usefulness in the many countries whose languages use them.

Until this is resolved in a future version of Google Maps, I have found that stripping the accents gets the mapping software working. The following PHP function does this. You can see it working by clicking the map link on one of the Private List pages with accents in the address, for instance the page for the hôtel Heidelbach, in Paris, an address on Avenue d'Iéna.

For UTF-8 text, modify the function to call mb_str_split_UTF8(), the UTF-8-specific multibyte-string-splitting function at the bottom of the page.
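To see why the choice of splitting function matters, here is a minimal sketch, assuming the file is saved as UTF-8 and the mbstring extension is loaded. It compares the byte-wise and character-wise views of the same string. The use of preg_split() with the "u" modifier is only an illustration and is not part of the function on this page.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$street = "Iéna";

// strlen() counts bytes: "é" occupies two bytes in UTF-8.
$byteCount = strlen($street);

// mb_strlen() with an explicit encoding counts characters.
$charCount = mb_strlen($street, "UTF-8");

// str_split() cuts between bytes, so "é" is torn into two invalid halves.
$byBytes = str_split($street);

// The "u" modifier makes PCRE treat the subject as UTF-8, so an empty
// pattern splits between whole characters.
$byChars = preg_split('//u', $street, -1, PREG_SPLIT_NO_EMPTY);

echo $byteCount, " bytes, ", $charCount, " characters\n";
echo count($byBytes), " byte pieces, ", count($byChars), " character pieces\n";
```

A letter that has been cut into two single bytes can never match a two-byte entry in the accent arrays, which is why the split has to respect the encoding of the text.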

The odd-looking characters in the second half of each array are what the preceding multibyte letters look like when typed in an editor that shows each byte as a separate character. It can be useful to have this collection if your copy of phpMyAdmin refuses to save extended characters as properly encoded UTF-8, even though it seems to be configured to do so. Sometimes it is easiest to use phpMyAdmin to change just one or two words, and having the code reference lets you enter accented characters that will display properly on a UTF-8 page.
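How such a pair of odd-looking characters arises can be reproduced in a few lines. This is a sketch only, assuming the file is saved as UTF-8 and the mbstring extension is loaded; it treats the two UTF-8 bytes of one letter as if they were two ISO-8859-1 characters, which is roughly what a single-byte editor does when it displays them.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$letter = "é"; // two bytes in UTF-8: 0xC3 0xA9

// Read the two bytes as two ISO-8859-1 characters and re-encode each one
// as UTF-8. This mimics what a single-byte editor displays.
$asSeenInEditor = mb_convert_encoding($letter, "UTF-8", "ISO-8859-1");

echo bin2hex($letter), "\n";   // c3a9
echo $asSeenInEditor, "\n";    // Ã©
```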

Note that this is a function rather than a class because the object model is incompatible between PHP 4 and PHP 5. It was written to be usable by both.

PHP code: Function removing character accents

function removeAccents($text){//1
	
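//IN EACH ARRAY: THE ACCENTED LETTERS, THEN THE SAME LETTERS' UTF-8 BYTES AS A SINGLE-BYTE EDITOR SHOWS THEM.
$a = array("à","á","â","ä","Ã ","Ã¡","Ã¢","Ã¤");//"Ã " ENDS IN A NO-BREAK SPACE
$e = array("è","é","ê","ë","Ã¨","Ã©","Ãª","Ã«");
$i = array("ì","í","î","ï","Ã¬","Ã­","Ã®","Ã¯");//"Ã­" ENDS IN A SOFT HYPHEN
$o = array("ò","ó","ô","ö","Ã²","Ã³","Ã´","Ã¶");
$u = array("ù","ú","û","ü","Ã¹","Ãº","Ã»","Ã¼");
$y = array("ý","Ã½");
$c = array("ç","Ã§");
$n = array("ñ","Ã±");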
$changeset = array($a,$e,$i,$o,$u,$y,$c,$n);
$changeto = array("a","e","i","o","u","y","c","n");     

$textarray=mb_str_split($text);//MODIFY TO mb_str_split_UTF8() IF NECESSARY.
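$pos=0;//INDEX OF THE CURRENT LETTER IN $textarray
foreach($textarray as $letter){//2
	$group=0;//INDEX OF THE CURRENT ACCENT GROUP IN $changeset
	foreach($changeset as $accentgroup){//3
		if(in_array($letter,$accentgroup)){//4
			$textarray[$pos] = $changeto[$group];break;//MATCH FOUND; SKIP THE REMAINING GROUPS
			}//4
		$group++;
		}//3
	$pos++;
	}//2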

return implode("",$textarray);
}//1 END removeAccents()
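For comparison, the same substitution can be sketched with a lookup table and PHP's built-in strtr(). This is an alternative technique, not the function above. The name removeAccentsWithMap() and the example.com URL are hypothetical, and the sketch assumes that both this file and the input text are UTF-8.

```php
<?php
// Hypothetical alternative, not the function from this page.
// Assumes this file and the input text are both UTF-8.
function removeAccentsWithMap($text) {
    // The same lower-case letters the arrays above cover, as a lookup table.
    $map = array(
        "à" => "a", "á" => "a", "â" => "a", "ä" => "a",
        "è" => "e", "é" => "e", "ê" => "e", "ë" => "e",
        "ì" => "i", "í" => "i", "î" => "i", "ï" => "i",
        "ò" => "o", "ó" => "o", "ô" => "o", "ö" => "o",
        "ù" => "u", "ú" => "u", "û" => "u", "ü" => "u",
        "ý" => "y", "ç" => "c", "ñ" => "n",
    );
    // strtr() with an array replaces every key it finds in one pass and
    // leaves all other bytes untouched.
    return strtr($text, $map);
}

$address = "Avenue d'Iéna, Paris";
$plain = removeAccentsWithMap($address);
echo $plain, "\n"; // Avenue d'Iena, Paris

// Placeholder URL, shown only to illustrate where the stripped text would go.
echo "https://maps.example.com/?q=" . urlencode($plain), "\n";
```

Because the table is keyed on whole UTF-8 letters, no splitting step is needed. The trade-off is that it only works when the text really is UTF-8.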

PHP code: Functions converting multibyte strings to arrays

//FUNCTION COPIED VERBATIM FROM ONE CONTRIBUTED TO THE PHP MANUAL
// BY "ference at super_delete_brose dot co dot uk"
//PHP 7.4 AND LATER SHIP A BUILT-IN mb_str_split(), SO DEFINE THIS ONLY IF IT IS MISSING.
if (!function_exists('mb_str_split')) {
function mb_str_split($str, $length = 1) {
  if ($length < 1) return FALSE;
  $result = array();

  for ($i = 0; $i < mb_strlen($str); $i += $length) {
   $result[] = mb_substr($str, $i, $length);
  }

  return $result;

}
}

//THE ABOVE SPLITS UTF-8 STRINGS CORRECTLY ONLY IF mb_internal_encoding() IS UTF-8.
//THIS VERSION NAMES THE ENCODING EXPLICITLY.
function mb_str_split_UTF8($str, $length = 1) {
  if ($length < 1) return FALSE;
  $result = array();

  for ($i = 0; $i < mb_strlen($str,"UTF-8"); $i += $length) {
   $result[] = mb_substr($str, $i, $length,"UTF-8");
  }

  return $result;
}
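
On PHP 7.4 and later, mb_str_split() is built in and accepts the encoding as a third argument, so the two hand-written helpers are no longer strictly needed there. The following is a minimal sketch of a wrapper that prefers the built-in function and otherwise falls back to the same loop as mb_str_split_UTF8(). The name splitUtf8Chars() is hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical wrapper name; assumes the mbstring extension is loaded.
function splitUtf8Chars($str) {
    if (version_compare(PHP_VERSION, "7.4.0", ">=")) {
        // Built in since PHP 7.4; the third argument names the encoding.
        return mb_str_split($str, 1, "UTF-8");
    }
    // Older PHP: the same loop as mb_str_split_UTF8(), one character at a time.
    $result = array();
    $count = mb_strlen($str, "UTF-8");
    for ($i = 0; $i < $count; $i++) {
        $result[] = mb_substr($str, $i, 1, "UTF-8");
    }
    return $result;
}

print_r(splitUtf8Chars("hôtel"));
```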
published 3 October, 2006