Jan 18, 2015
 

Sometimes, while editing a text file with the vim editor, we might need to enter, search for and/or replace characters not available on our keyboard, such as æ, å, ě, …, non-printable characters such as the control characters ^A, ^B, …, or characters with hexadecimal codes between 0x7F and 0xFF.

This post goes through some of the possibilities in vim to work with those characters.

Obtaining the hexadecimal or unicode code of a character

To obtain the hexadecimal or Unicode code of a character in the file being edited, put the cursor over it and press “ga” in normal mode. A line is written at the bottom of the screen, showing the decimal, hexadecimal (Unicode) and octal codes for that character.

For instance, with the cursor over character Ѳ, the line shows that it is coded as Hex 0472 (Unicode), octal 2162.
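As a rough sketch, the same information can be requested with the “:ascii” command, which is equivalent to “ga”. The message shown in the comment is only indicative: the exact labels and spacing depend on the vim version, and recent versions may append further fields.

```vim
" Same as pressing ga in normal mode: print the codes of the character under the cursor
:ascii
" With the cursor on Ѳ, the message should look roughly like:
"   <Ѳ> 1138, Hex 0472, Octal 2162
```

The first number is the decimal code (0x0472 = 1138).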

Inserting a character with a given hex/unicode code:

To insert the character for a given single-byte hex code, press Ctrl-V while in insert mode, followed by the character x and then the two-digit hexadecimal code.
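A minimal sketch, assuming vim is running with a UTF-8 'encoding' so that hex code E5 corresponds to å (the choice of å is ours, taken from the introduction). The second line replays the same keystrokes from the command line, which can be handy in a mapping or script. In setups where Ctrl-V is mapped to paste (the Windows GUI, for example), Ctrl-Q is normally used instead.

```vim
" In insert mode, type:  <C-V>xe5      (Ctrl-V, then x, then e5)  ->  å
" The same keystrokes replayed from the command line:
:execute "normal! a\<C-V>xe5"
```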

To insert the character for a given two-byte Unicode hexadecimal representation, press Ctrl-V followed by the character u, and then the four-digit hexadecimal Unicode representation. For instance, to insert character Ѳ, press Ctrl-V u 0472.
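The keystrokes for Ѳ can be sketched as follows, again assuming a UTF-8 'encoding'. The “:execute” line is just one way of replaying them without typing interactively.

```vim
" In insert mode, type:  <C-V>u0472    (Ctrl-V, then u, then 0472)  ->  Ѳ
" The same keystrokes replayed from the command line:
:execute "normal! a\<C-V>u0472"
```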

Searching for a character with a given hex or unicode code:

In a search or replace expression, a given character can be represented as “\%xhh”, where hh is the hexadecimal code of the character being searched for. For instance, to search for the character with hex code 9d, type /\%x9d in normal mode.
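As a small sketch of how this pattern might be used, the first command below jumps to the next occurrence, and the second counts all occurrences without modifying the file.

```vim
" Typed in normal mode: jump to the next character with hex code 9d
/\%x9d
" Count the matches in the whole file; the n flag reports the count and changes nothing
:%s/\%x9d//gn
```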

In the same way, a Unicode character can be represented in a search/replace expression as “\%uhhhh”. For instance, to search for all instances of character Ѳ in the current file and replace them with O, use :%s/\%u0472/O/g
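The replacement of Ѳ can be written out as below. The second variant, with the c flag, is an optional extra that asks for confirmation before each replacement.

```vim
" Replace every Ѳ (U+0472) in the file with a plain O
:%s/\%u0472/O/g
" Same replacement, but confirm each occurrence first
:%s/\%u0472/O/gc
```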

Digraphs

The most commonly used non-ASCII symbols can be inserted by pressing Ctrl-K in insert mode, followed by a two-letter combination (digraph) corresponding to the desired symbol. For instance:

  • To insert the pound sterling symbol £, press ^KPd
  • To insert character ä, press ^Ka:

The command “:set digraph” can also be used to enable digraph mode. In digraph mode, characters with diacritical marks can also be entered using the backspace key. For instance:

  • To enter character ä, press    a <BS> :    ( “a” + backspace + “:” )
  • To enter character ê, press    e <BS> >    ( “e” + backspace + “>” )

Finally, the command “:digraphs” can be used to list all available digraphs.

In this list:

  • The first two characters are the characters to be entered after Ctrl-K
  • Next comes the graphical representation of the resulting character
  • Finally comes the decimal Unicode code for the character
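To illustrate the three fields, the entries for the digraphs used above should appear somewhere in the listing roughly as shown in the comments. The column spacing is an approximation and may differ between versions.

```vim
" List all digraphs known to this vim
:digraphs
" Entries for the digraphs used in this post (layout approximate):
"   Pd £  163
"   a: ä  228
"   e> ê  234
```

The same table, with descriptions, is available through “:help digraph-table”.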

We can see that there are several hundred characters with digraphs. Among them are all control characters between hex 0x00 and hex 0x1F, the non-graphical characters with codes between 0x7F and 0xFF, currency symbols (pound sterling, yen, …), most non-ASCII characters used in western languages, and other commonly used symbols.

Reloading an open file, reading it as utf8

Normally, vim detects the encoding of a file and opens it accordingly. But it may happen that a text file containing utf8 encoded text also contains some control characters. Vim then interprets the encoding as “Non-ISO extended-ASCII text”, and the utf8 characters are not correctly displayed. In this case, we can force vim to reload the file as utf8 with the command :e ++enc=utf-8
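A short sketch of the reload, followed by an optional check of the encoding vim ended up using for the buffer. If the buffer has unsaved changes, plain “:e” refuses to reload; “:e!” forces it and discards those changes.

```vim
" Reload the current file, forcing it to be read as UTF-8
:e ++enc=utf-8
" Show the encoding now associated with the buffer
:set fileencoding?
```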

Posted at 8:20 pm