Jan 18, 2015
 

Sometimes, while editing a text file with the vim editor, we need to enter, search for or replace characters that are not available on our keyboard, such as æ, å, ě, …, non-printable characters such as the control characters ^A, ^B, …, or characters with hexadecimal codes between 0x7F and 0xFF.

This post goes through some of the possibilities in vim to work with those characters.

Obtaining the hexadecimal or unicode code of a character

If we want to obtain the hexadecimal or Unicode code of a character in the file being edited, we put the cursor over it and press “ga” in normal mode. A line appears at the bottom of the screen showing the decimal, hexadecimal and octal codes of that character. For instance:

<Ѳ> 1138, Hex 0472, Octal 2162

In this example, the character Ѳ has decimal code 1138, hexadecimal code 0472 (its Unicode code point) and octal code 2162.
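To make the relation between the three numbers explicit, here is a minimal Python sketch that formats a character roughly the way “ga” reports it. The function name ga_line and the choice of hexadecimal width are assumptions made for illustration; Vim's real output can differ in spacing and may include extra fields.

```python
def ga_line(ch: str) -> str:
    """Format one character roughly the way Vim's `ga` reports it."""
    code = ord(ch)  # the Unicode code point, i.e. the decimal value
    # Assumed widths: 2 hex digits for single-byte values, 4 otherwise.
    hex_width = 2 if code < 0x100 else 4
    return f"<{ch}> {code}, Hex {code:0{hex_width}x}, Octal {code:o}"


print(ga_line("Ѳ"))  # <Ѳ> 1138, Hex 0472, Octal 2162
```

The decimal, hexadecimal and octal values are simply three notations for the same code point.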

Inserting a character with a given hex/unicode code:

To insert the character for a given single-byte hex code, press Ctrl-V in insert mode, followed by the character x and then the two-digit hexadecimal code. For instance:

^Vx9d

To insert a character from its four-digit hexadecimal Unicode code, press Ctrl-V followed by the character u and then the hexadecimal Unicode representation. For instance, to insert the character Ѳ:

^Vu0472
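As a small aid, the following Python sketch works out which key sequence to type for a given character. The helper name insert_keys is hypothetical. The third branch uses Ctrl-V U with eight hex digits, a Vim feature this post does not otherwise cover, for characters above 0xFFFF.

```python
def insert_keys(ch: str) -> str:
    """Return the insert-mode key sequence (in ^V notation) for one character."""
    code = ord(ch)
    if code <= 0xFF:
        return f"^Vx{code:02x}"   # single byte: x plus two hex digits
    if code <= 0xFFFF:
        return f"^Vu{code:04x}"   # u plus four hex digits
    return f"^VU{code:08x}"       # U plus eight hex digits


print(insert_keys("\x9d"))  # ^Vx9d
print(insert_keys("Ѳ"))     # ^Vu0472
```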

Searching a character with a given hex or unicode code:

In a search or replace expression, a given character can be represented as “\%xhh”, where hh is the hexadecimal code of the character being searched. For instance, to search for the character with hex code 9d:

/\%x9d

In the same way, a Unicode character can be represented in a search/replace expression as “\%uhhhh”. For instance, to find all instances of the character Ѳ in the current file and replace them with O:

:1,$s/\%u0472/O/g
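When many characters have to be replaced, the commands can be generated instead of typed. The sketch below builds the search atom and the substitute command for a character. The names search_atom and substitute_cmd are hypothetical, and the replacement text is assumed to contain no characters that are special in a Vim replacement (such as /, \, & or ~).

```python
def search_atom(ch: str) -> str:
    """Return the Vim pattern atom that matches exactly this character."""
    code = ord(ch)
    if code <= 0xFF:
        return rf"\%x{code:02x}"
    if code <= 0xFFFF:
        return rf"\%u{code:04x}"
    return rf"\%U{code:08x}"


def substitute_cmd(old: str, new: str) -> str:
    """Build a whole-file substitute command; `new` is inserted unescaped."""
    return f":1,$s/{search_atom(old)}/{new}/g"


print(search_atom("\x9d"))       # \%x9d
print(substitute_cmd("Ѳ", "O"))  # :1,$s/\%u0472/O/g
```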

Digraphs

The most commonly used non-ASCII symbols can be inserted by pressing Ctrl-K followed by a two-character combination (digraph) corresponding to the desired symbol. For instance:

  • To insert the pound sterling symbol £, press ^KPd
  • To insert the character ä, press ^Ka:

The command “:set digraph” can also be used to enable digraph mode. In digraph mode, characters with diacritical marks can also be entered using the backspace key. For instance:

  • To enter character ä, press    a <BS> :    ( “a” + backspace + “:” )
  • To enter character ê, press    e <BS> >    ( “e” + backspace + “>” )

Finally, the command “:digraphs” can be used to list all available digraphs:

:digraphs
NU ^@  10    SH ^A   1    SX ^B   2    EX ^C   3    ET ^D   4    EQ ^E   5    AK ^F   6
BL ^G   7    BS ^H   8    HT ^I   9    LF ^@  10    VT ^K  11    FF ^L  12    CR ^M  13
SO ^N  14    SI ^O  15    DL ^P  16    D1 ^Q  17    D2 ^R  18    D3 ^S  19    D4 ^T  20
NK ^U  21    SY ^V  22    EB ^W  23    CN ^X  24    EM ^Y  25    SB ^Z  26    EC ^[  27
FS ^\  28    GS ^]  29    RS ^^  30    US ^_  31    SP     32    Nb #   35    DO $   36
At @   64    <( [   91    // \   92    )> ]   93    '> ^   94    '! `   96    (! {  123
!! |  124    !) }  125    '? ~  126    DT ^? 127    PA <80> 128  HO <81> 129  BH <82> 130
NH <83> 131  IN <84> 132  NL <85> 133  SA <86> 134  ES <87> 135  HS <88> 136  HJ <89> 137
VS <8a> 138  PD <8b> 139  PU <8c> 140  RI <8d> 141  S2 <8e> 142  S3 <8f> 143  DC <90> 144
P1 <91> 145  P2 <92> 146  TS <93> 147  CC <94> 148  MW <95> 149  SG <96> 150  EG <97> 151
SS <98> 152  GC <99> 153  SC <9a> 154  CI <9b> 155  ST <9c> 156  OC <9d> 157  PM <9e> 158
AC <9f> 159  NS    160    !I ¡  161    Ct ¢  162    Pd £  163    Cu ¤  164    Ye ¥  165
BB ¦  166    SE §  167    ': ¨  168    Co ©  169    -a ª  170    << «  171    NO ¬  172
-- ­  173    Rg ®  174    'm ¯  175    DG °  176    +- ±  177    2S ²  178    3S ³  179
'' ´  180    My µ  181    PI ¶  182    .M ·  183    ', ¸  184    1S ¹  185    -o º  186
>> »  187    14 ¼  188    12 ½  189    34 ¾  190    ?I ¿  191    A! À  192    A' Á  193
A> Â  194    A? Ã  195    A: Ä  196    AA Å  197    AE Æ  198    C, Ç  199    E! È  200
E' É  201    E> Ê  202    E: Ë  203    I! Ì  204    I' Í  205    I> Î  206    I: Ï  207
D- Ð  208    N? Ñ  209    O! Ò  210    O' Ó  211    O> Ô  212    O? Õ  213    O: Ö  214
*X ×  215    O/ Ø  216    U! Ù  217    U' Ú  218    U> Û  219    U: Ü  220    Y' Ý  221
TH Þ  222    ss ß  223    a! à  224    a' á  225    a> â  226    a? ã  227    a: ä  228
...

In this list:

  • The first two characters are the characters to be entered after Ctrl-K
  • Next comes the graphical representation of the resulting character
  • Finally comes the decimal Unicode coding for the character
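The three fields described above can be illustrated with a short Python sketch that splits one line of the listing into (digraph, character, code) triples. The regular expression and the name parse_digraph_line are assumptions for illustration. Entries whose character is displayed as a blank (such as SP or NS) are deliberately skipped by this simple pattern.

```python
import re

# Two non-blank digraph characters, one space, the displayed character
# (possibly several symbols, e.g. ^A or <80>), then the decimal code.
ENTRY = re.compile(r"(\S\S) (\S+)\s+(\d+)")


def parse_digraph_line(line: str) -> list[tuple[str, str, int]]:
    """Split one line of `:digraphs` output into (keys, shown, code) triples."""
    return [(keys, shown, int(code)) for keys, shown, code in ENTRY.findall(line)]


for keys, shown, code in parse_digraph_line("A> Â  194    A? Ã  195    A: Ä  196"):
    # For printable characters the decimal code is the character's code point.
    print(keys, shown, code, chr(code) == shown)
```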

We can see that several hundred characters have digraphs. They include all control characters between 0x00 and 0x1F, the non-graphical characters with codes between 0x7F and 0x9F, currency symbols (pound sterling, yen, …), most non-ASCII characters used in Western languages, and other commonly used symbols.

Reloading an open file, reading it as utf8

Normally, vim detects the encoding of a file and opens it accordingly. But a text file containing UTF-8 encoded text may also contain some control characters. The file may then be treated as “Non-ISO extended-ASCII text”, and the UTF-8 characters are not displayed correctly. In this case, we can force vim to reload the file as UTF-8 with the command:

:e! ++enc=utf8
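Before forcing the encoding, it can help to check outside vim whether the file really is valid UTF-8 and which control characters it contains. The following Python sketch does that for a byte string. The name utf8_report is hypothetical, and treating tab, line feed and carriage return as ordinary text is an assumption of this example.

```python
def utf8_report(data: bytes) -> tuple[bool, list[int]]:
    """Return (is_valid_utf8, code points of control characters found)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False, []
    controls = [
        ord(c)
        for c in text
        # C0 controls other than tab/LF/CR, plus DEL (0x7F).
        if (ord(c) < 0x20 and c not in "\t\n\r") or ord(c) == 0x7F
    ]
    return True, controls


print(utf8_report("Ѳ\x01\n".encode("utf-8")))  # (True, [1])
```

To check a real file, its contents could be read with open(path, "rb").read() and passed to the function.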


 Posted by at 8:20 pm
