Feb 21 2015
 

Sometimes, while editing a text file in vim, you might need to insert, search for or replace characters that are not on your keyboard, such as æ, å, ě, …, non-printing characters such as the control characters ^A, ^B, …, or characters with hexadecimal codes in the range 0x7F to 0xFF.

This post explains some of the possible ways that vim offers to handle those characters.

Obtaining the hex code of a character:

The hexadecimal code of a given character in the file being edited can be obtained by placing the cursor on it and pressing “ga”. A line will be displayed at the bottom of the window showing the decimal, hexadecimal and octal codes of the character. For instance:

<Ѳ> 1138, Hex 0472, Octal 2162

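This example shows that the character Ѳ has the Unicode code point 0472 in hexadecimal (1138 in decimal). Note that this is the code point, not the utf8 byte sequence: in a utf8 file the character is stored as the two bytes d1 b2, which can be displayed by pressing “g8”.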
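
The numbers reported by “ga” can also be cross-checked from the command line. The following is a minimal sketch that assumes vim's 'encoding' option is set to utf-8; char2nr(), printf() and strlen() are standard vim functions.

```vim
" Decimal code point of the character (1138 for Ѳ when 'encoding' is utf-8).
:echo char2nr('Ѳ')
" The same value formatted as four hex digits (0472).
:echo printf('%04x', char2nr('Ѳ'))
" Number of bytes the character occupies in utf-8 (2, not 1).
:echo strlen('Ѳ')
```

The last command is a reminder that the code point and the bytes stored in the file are different things.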

Inserting a character by entering its hex or unicode code

To insert a character with a given hex code, press Ctrl-V in insert mode, followed by “x”, and then enter the two-digit hex code of the character. For instance:

^Vx9d

To insert a character with a given unicode code, press Ctrl-V followed by “u”, and then the four hex digits of the unicode code. For instance, to insert Ѳ:

^Vu0472
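
The same characters can also be produced without typing the key sequence, which may be handy in a script or a mapping. A minimal sketch, assuming 'encoding' is set to utf-8: nr2char() converts a code to a character, and “:put =” inserts the result of an expression as a new line below the cursor.

```vim
" Insert a new line below the cursor containing the character with code 0472 (Ѳ).
:put =nr2char(0x0472)
" Insert a new line below the cursor containing the character with hex code 9d.
:put =nr2char(0x9d)
```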

Searching for a character with a given hex or unicode code:

In a search or replace expression, a character can be specified as “\%xhh”, where hh is the hex code of the character. For instance, to search for a character with hex code 9d:

/\%x9d

In the same way, a character can be specified as “\%uhhhh”, where hhhh is the unicode code of the character. For instance, to search for the next occurrence of character Ѳ:

/\%u0472
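
The same notation works in the pattern of a substitute command. The commands below are a sketch of typical uses; the replacement text “Th” is only a placeholder chosen for illustration.

```vim
" Delete every character with hex code 9d in the whole buffer.
:%s/\%x9d//g
" Replace every Ѳ (unicode code 0472) in the buffer with the placeholder text "Th".
:%s/\%u0472/Th/g
" Search for any character with a code in the range 0x80 to 0x9f.
" Inside a [] collection the code is written as \xhh, without the % sign.
/[\x80-\x9f]
```

Each substitute command reports an error if the pattern is not found; adding the “e” flag (for instance “:%s/\%x9d//ge”) suppresses that message.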

Digraphs

The most common characters in western languages can be inserted by pressing Ctrl-K followed by a two-character combination (known as a digraph) specific to the desired symbol. For instance, the pound sterling symbol £ can be inserted by pressing ^KPd, and the ä character by pressing ^Ka: (Ctrl-K, then “a”, then “:”).

Besides, digraph mode can be enabled with the command “:set digraph”. In digraph mode, accented characters can also be inserted using the backspace key. For instance, character ä can be inserted as a<BS>: ( “a” + backspace + “:” ); character ê can be inserted with the sequence “e” + backspace + “>”.

Finally, all digraphs known to vim can be listed with the command “:digraphs”:

:digraphs
NU ^@  10    SH ^A   1    SX ^B   2    EX ^C   3    ET ^D   4    EQ ^E   5    AK ^F   6
BL ^G   7    BS ^H   8    HT ^I   9    LF ^@  10    VT ^K  11    FF ^L  12    CR ^M  13
SO ^N  14    SI ^O  15    DL ^P  16    D1 ^Q  17    D2 ^R  18    D3 ^S  19    D4 ^T  20
NK ^U  21    SY ^V  22    EB ^W  23    CN ^X  24    EM ^Y  25    SB ^Z  26    EC ^[  27
FS ^\  28    GS ^]  29    RS ^^  30    US ^_  31    SP     32    Nb #   35    DO $   36
At @   64    <( [   91    // \   92    )> ]   93    '> ^   94    '! `   96    (! {  123
!! |  124    !) }  125    '? ~  126    DT ^? 127    PA <80> 128  HO <81> 129  BH <82> 130
NH <83> 131  IN <84> 132  NL <85> 133  SA <86> 134  ES <87> 135  HS <88> 136  HJ <89> 137
VS <8a> 138  PD <8b> 139  PU <8c> 140  RI <8d> 141  S2 <8e> 142  S3 <8f> 143  DC <90> 144
P1 <91> 145  P2 <92> 146  TS <93> 147  CC <94> 148  MW <95> 149  SG <96> 150  EG <97> 151
SS <98> 152  GC <99> 153  SC <9a> 154  CI <9b> 155  ST <9c> 156  OC <9d> 157  PM <9e> 158
AC <9f> 159  NS    160    !I ¡  161    Ct ¢  162    Pd £  163    Cu ¤  164    Ye ¥  165
BB ¦  166    SE §  167    ': ¨  168    Co ©  169    -a ª  170    << «  171    NO ¬  172
-- ­  173    Rg ®  174    'm ¯  175    DG °  176    +- ±  177    2S ²  178    3S ³  179
'' ´  180    My µ  181    PI ¶  182    .M ·  183    ', ¸  184    1S ¹  185    -o º  186
>> »  187    14 ¼  188    12 ½  189    34 ¾  190    ?I ¿  191    A! À  192    A' Á  193
A> Â  194    A? Ã  195    A: Ä  196    AA Å  197    AE Æ  198    C, Ç  199    E! È  200
E' É  201    E> Ê  202    E: Ë  203    I! Ì  204    I' Í  205    I> Î  206    I: Ï  207
D- Ð  208    N? Ñ  209    O! Ò  210    O' Ó  211    O> Ô  212    O? Õ  213    O: Ö  214
*X ×  215    O/ Ø  216    U! Ù  217    U' Ú  218    U> Û  219    U: Ü  220    Y' Ý  221
TH Þ  222    ss ß  223    a! à  224    a' á  225    a> â  226    a? ã  227    a: ä  228
...

In this list:

  • the first two characters are the sequence to be entered after Ctrl-K
  • next appears the resulting symbol
  • and then, the decimal unicode code of the symbol

As you can see, there are hundreds of available digraphs, including digraphs for the control characters with hex codes 0x00 to 0x1F, non-printing characters with hex codes in the range 0x7F to 0x9F, currency symbols (pound sterling, yen, …) and most non-ASCII symbols used in western languages, as well as other commonly used symbols (copyright symbol, etc.).
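
If a symbol has no convenient digraph, the same “:digraphs” command can define one when it is given a two-character pair followed by a decimal code. A minimal sketch; the pair “ft” is a hypothetical choice made only for this example.

```vim
" Define a digraph: Ctrl-K followed by "f" and "t" now inserts the character
" with decimal code 1138 (Ѳ). The pair "ft" is an arbitrary example.
:digraphs ft 1138
```

A definition made this way lasts only for the current session, unless the command is added to your vimrc.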

Reloading an open file, reading it as utf8

Vim detects the encoding used in a file and opens it accordingly. But it might happen that a text file encoded in utf8 contains a control character. In this case, vim interprets the encoding of the file as “Non-ISO extended-ASCII text”, and as a result the utf8 characters are not correctly displayed. If this is the case, you can force vim to reload the file as utf8, with the command:

:e! ++enc=utf8
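
Before forcing a reload, it can help to check what vim actually decided. The following sketch uses only standard options: 'fileencoding' holds the encoding chosen for the current buffer, and 'fileencodings' is the list of candidates vim tries when opening a file.

```vim
" Show the encoding vim chose for the current buffer.
:set fileencoding?
" Show the list of encodings vim tries, in order, when opening a file.
:set fileencodings?
" Reload the current file with a different encoding, here latin1.
:e! ++enc=latin1
```

Keep in mind that “:e!” discards any unsaved changes in the buffer.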

 Posted by at 11:13 am
