Sometimes, while editing a text file in Vim, you might need to insert, search for or replace characters that are not on your keyboard, such as æ, å, ě, …, non-printing characters such as the control characters ^A, ^B, …, or characters with hexadecimal codes in the range 0x7F to 0xFF.
This post explains some of the possible ways that vim offers to handle those characters.
Obtaining the hex code of a character:
The hexadecimal code of a character in the file being edited can be obtained by placing the cursor on it in normal mode and pressing “ga”. A line will be displayed at the bottom of the window showing the decimal, hexadecimal and octal values of the character. For instance:
<Ѳ> 1138, Hex 0472, Octal 2162
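This example shows that the character Ѳ has the Unicode code point 0472 (hexadecimal). Note that this is the code point, not the UTF-8 encoding: in a UTF-8 file the character is stored as the two bytes d1 b2, which can be displayed by pressing “g8”.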
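The difference between the code point reported by “ga” and the bytes stored in a UTF-8 file can be checked outside Vim. The following is a minimal Python sketch; the helper names `describe` and `utf8_hex` are hypothetical, and the output format only approximates Vim's message.

```python
def describe(ch: str) -> str:
    """Format a character roughly like Vim's `ga` message (approximation)."""
    cp = ord(ch)  # Unicode code point as an integer
    return f"<{ch}> {cp}, Hex {cp:04x}, Octal {cp:o}"


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    print(describe("\u0472"))  # <Ѳ> 1138, Hex 0472, Octal 2162
    print(utf8_hex("\u0472"))  # d1 b2
```

The code point 0472 and the byte sequence d1 b2 describe the same character; only the second is what is actually written to a UTF-8 file.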
Inserting a character by entering its hex or unicode code
To insert a character with a given hex code, press Ctrl-V in insert mode, followed by “x”, and then enter the two-digit hex code of the character. For instance:
^Vx9d
To insert a character with a given Unicode code point, press Ctrl-V followed by “u”, and then the four-digit hex code point. For instance, to insert Ѳ:
^Vu0472
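If you need to work out the key sequence for an arbitrary character, a small helper can choose the right prefix from the size of the code point. This is only a sketch: the function name `vim_insert_keys` is hypothetical, and the “U” form with eight hex digits (for code points above FFFF) is not covered in this post, although Vim supports it.

```python
def vim_insert_keys(ch: str) -> str:
    """Return the insert-mode key sequence for a single character.

    ^V stands for Ctrl-V, as in the examples above.
    """
    cp = ord(ch)
    if cp <= 0xFF:
        return f"^Vx{cp:02x}"   # two hex digits
    if cp <= 0xFFFF:
        return f"^Vu{cp:04x}"   # four hex digits
    return f"^VU{cp:08x}"       # eight hex digits (assumption: not in this post)


if __name__ == "__main__":
    for ch in ("\x9d", "\u0472"):
        print(vim_insert_keys(ch))
```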
Searching for a character with a given hex or Unicode code:
In a search or replace expression, a character can be specified as “\%xhh”, where hh is the hex code of the character. For instance, to search for a character with hex code 9d:
/\%x9d
In the same way, a character can be specified as “\%uhhhh”, where hhhh is the Unicode code point of the character in hex. For instance, to search for the next occurrence of the character Ѳ:
/\%u0472
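The search atoms follow the same pattern as the insert sequences, so they can also be generated mechanically. The sketch below uses a hypothetical helper, `vim_search_atom`; the “\%U” form with eight hex digits is an addition not discussed in this post.

```python
def vim_search_atom(ch: str) -> str:
    """Return a Vim pattern atom that matches exactly the given character."""
    cp = ord(ch)
    if cp <= 0xFF:
        return rf"\%x{cp:02x}"   # up to two hex digits
    if cp <= 0xFFFF:
        return rf"\%u{cp:04x}"   # up to four hex digits
    return rf"\%U{cp:08x}"       # up to eight hex digits (assumption: not in this post)


if __name__ == "__main__":
    # Prefix with "/" to get the full search command typed in Vim.
    print("/" + vim_search_atom("\x9d"))
    print("/" + vim_search_atom("\u0472"))
```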
Digraphs
The most common characters in Western languages can be inserted by pressing Ctrl-K followed by a two-character combination (known as a digraph) specific to the desired symbol. For instance, the pound sterling symbol £ can be inserted by pressing ^KPd, and the character ä by pressing ^Ka: (Ctrl-K, “a”, “:”).
In addition, digraph mode can be enabled with the command “:set digraph”. In digraph mode, accented characters can also be inserted using the backspace key. For instance, the character ä can be inserted as a<BS>: (“a” + backspace + “:”), and the character ê can be inserted with the sequence “e” + backspace + “>”.
Finally, all digraphs known to vim can be listed with the command “:digraphs”:
:digraphs
NU ^@    10  SH ^A     1  SX ^B     2  EX ^C     3  ET ^D     4  EQ ^E     5
AK ^F     6  BL ^G     7  BS ^H     8  HT ^I     9  LF ^@    10  VT ^K    11
FF ^L    12  CR ^M    13  SO ^N    14  SI ^O    15  DL ^P    16  D1 ^Q    17
D2 ^R    18  D3 ^S    19  D4 ^T    20  NK ^U    21  SY ^V    22  EB ^W    23
CN ^X    24  EM ^Y    25  SB ^Z    26  EC ^[    27  FS ^\    28  GS ^]    29
RS ^^    30  US ^_    31  SP       32  Nb #     35  DO $     36  At @     64
<( [     91  // \     92  )> ]     93  '> ^     94  '! `     96  (! {    123
!! |    124  !) }    125  '? ~    126  DT ^?   127  PA <80> 128  HO <81> 129
BH <82> 130  NH <83> 131  IN <84> 132  NL <85> 133  SA <86> 134  ES <87> 135
HS <88> 136  HJ <89> 137  VS <8a> 138  PD <8b> 139  PU <8c> 140  RI <8d> 141
S2 <8e> 142  S3 <8f> 143  DC <90> 144  P1 <91> 145  P2 <92> 146  TS <93> 147
CC <94> 148  MW <95> 149  SG <96> 150  EG <97> 151  SS <98> 152  GC <99> 153
SC <9a> 154  CI <9b> 155  ST <9c> 156  OC <9d> 157  PM <9e> 158  AC <9f> 159
NS      160  !I ¡    161  Ct ¢    162  Pd £    163  Cu ¤    164  Ye ¥    165
BB ¦    166  SE §    167  ': ¨    168  Co ©    169  -a ª    170  << «    171
NO ¬    172  --      173  Rg ®    174  'm ¯    175  DG °    176  +- ±    177
2S ²    178  3S ³    179  '' ´    180  My µ    181  PI ¶    182  .M ·    183
', ¸    184  1S ¹    185  -o º    186  >> »    187  14 ¼    188  12 ½    189
34 ¾    190  ?I ¿    191  A! À    192  A' Á    193  A> Â    194  A? Ã    195
A: Ä    196  AA Å    197  AE Æ    198  C, Ç    199  E! È    200  E' É    201
E> Ê    202  E: Ë    203  I! Ì    204  I' Í    205  I> Î    206  I: Ï    207
D- Ð    208  N? Ñ    209  O! Ò    210  O' Ó    211  O> Ô    212  O? Õ    213
O: Ö    214  *X ×    215  O/ Ø    216  U! Ù    217  U' Ú    218  U> Û    219
U: Ü    220  Y' Ý    221  TH Þ    222  ss ß    223  a! à    224  a' á    225
a> â    226  a? ã    227  a: ä    228
...
In this list:
- the first two characters are the sequence to be entered after Ctrl-K
- next comes the resulting symbol
- and then the decimal Unicode code point of the symbol
As you can see, there are hundreds of available digraphs, including digraphs for the control characters with hex codes 0x00 to 0x1F, the non-printing characters with hex codes in the range 0x7F to 0x9F, currency symbols (pound sterling, yen, …) and most non-ASCII symbols used in Western languages, as well as other commonly used symbols (the copyright symbol, etc.).
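The number in the third column is simply the code point written in decimal, which is easy to confirm. The following Python sketch uses a few digraph/symbol pairs copied by hand from the listing above; the names `SAMPLE_DIGRAPHS` and `digraph_row` are hypothetical.

```python
# A few (digraph, symbol) pairs copied from the :digraphs listing.
SAMPLE_DIGRAPHS = {
    "Pd": "\u00a3",  # £
    "Co": "\u00a9",  # ©
    "12": "\u00bd",  # ½
    "a:": "\u00e4",  # ä
}


def digraph_row(digraph: str, symbol: str) -> str:
    """Rebuild one listing entry: digraph, symbol, decimal code point."""
    return f"{digraph} {symbol} {ord(symbol)}"


if __name__ == "__main__":
    for digraph, symbol in SAMPLE_DIGRAPHS.items():
        print(digraph_row(digraph, symbol))
```

The same decimal value, converted to hex, is what you would type after Ctrl-V “x” or “u” to insert the character without a digraph.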
Reloading an open file, reading it as utf8
Vim tries to detect the encoding used in a file and opens it accordingly. However, a text file encoded in UTF-8 might contain a control character. In this case, Vim interprets the encoding of the file as “Non-ISO extended-ASCII text”, and as a result the UTF-8 characters are not displayed correctly. If this happens, you can force Vim to reload the file as UTF-8 with the command:
:e! ++enc=utf8
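To see why a wrongly detected encoding garbles the text, it helps to decode the same bytes in two ways. This Python sketch assumes the fallback is a single-byte encoding such as Latin-1; the encoding Vim actually falls back to depends on your settings, so treat this as an illustration only.

```python
text = "\u0472 \u00e4"        # "Ѳ ä"
raw = text.encode("utf-8")    # the bytes as stored in the file

# Assumption: the file is misread with a single-byte encoding (Latin-1 here).
wrong = raw.decode("latin-1")
# Reading the same bytes as UTF-8 restores the original text.
right = raw.decode("utf-8")

if __name__ == "__main__":
    print(raw.hex(" "))  # d1 b2 20 c3 a4
    print(wrong)         # Ñ² Ã¤
    print(right)         # Ѳ ä
```

Each multi-byte UTF-8 character shows up as two separate symbols when misread, which is the typical sign that the file needs to be reloaded with the correct encoding.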