Sometimes, while editing a text file with the vim editor, we might need to enter, search for and/or replace characters not available on our keyboard, such as æ, å, ě, … or non-printable characters such as control characters ^A, ^B, … or characters with hexadecimal codes between 0x7F and 0xFF.
This post goes through some of the possibilities in vim to work with those characters.
Obtaining the hexadecimal or unicode code of a character
If we want to obtain the hexadecimal or Unicode code of a given character found in the file being edited, we put the cursor over it and press “ga” in normal mode. A line is written at the bottom of the screen, showing the decimal, hexadecimal (Unicode) and octal codes for that character. For instance:
    <Ѳ> 1138, Hex 0472, Octal 2162
In this example, we see that character Ѳ has decimal code 1138, which is Hex 0472 (its Unicode code point) and octal 2162.
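As a cross-check outside vim, the three numbers reported by “ga” can be reproduced with a few lines of Python. This is only a sketch: the helper name describe_char is made up for illustration, and the output format merely imitates the line shown above (vim itself prints single-byte characters with two hex digits rather than four).

```python
def describe_char(ch: str) -> str:
    """Return a line imitating the "ga" output for one character.

    Hypothetical helper, not part of vim.
    """
    code = ord(ch)  # Unicode code point as an integer
    # Decimal, zero-padded hexadecimal, and octal representations.
    return f"<{ch}> {code}, Hex {code:04x}, Octal {code:o}"


print(describe_char("Ѳ"))  # <Ѳ> 1138, Hex 0472, Octal 2162
```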
Inserting a character with a given hex/unicode code
To insert the character for a given single-byte hex code, press Ctrl-V in insert mode, followed by character x, and then the two-digit hexadecimal code. For instance:
    ^Vx9d
To insert the character for a given two-byte (four hex digit) Unicode hexadecimal representation, press Ctrl-V followed by character u, and then the hexadecimal Unicode representation. For instance, to insert character Ѳ:
    ^Vu0472
Searching for a character with a given hex or unicode code
In a search or replace expression, a given character can be represented as “\%xhh”, where hh is the hexadecimal code of the character being searched for. For instance, to search for the character with hex code 9d:
    /\%x9d
In the same way, a Unicode character can be represented in a search/replace expression as “\%uhhhh”. For instance, to search for all instances of character Ѳ in the current file and replace them with O:
    :1,$s/\%u0472/O/g
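When the character is at hand but its code is not, the pattern atom can be built mechanically. The following Python sketch does that; the function name vim_search_atom is hypothetical, and the “\%U” form with eight hex digits for code points above FFFF comes from vim's pattern syntax rather than from the examples in this post.

```python
def vim_search_atom(ch: str) -> str:
    """Build the vim pattern atom matching exactly the character ch.

    Hypothetical helper for illustration, not part of vim.
    """
    code = ord(ch)
    if code <= 0xFF:
        return f"\\%x{code:02x}"   # two hex digits, as in \%x9d
    if code <= 0xFFFF:
        return f"\\%u{code:04x}"   # four hex digits, as in \%u0472
    return f"\\%U{code:08x}"       # eight hex digits for larger code points


# Print the substitute command used in the example above.
print(f":1,$s/{vim_search_atom('Ѳ')}/O/g")  # :1,$s/\%u0472/O/g
```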
Digraphs
The most commonly used non-ASCII symbols can be inserted by pressing Ctrl-K followed by a two-letter combination (digraph) corresponding to the desired symbol. For instance:
- To insert the pound sterling symbol £, press ^KPd
- To insert character ä, press ^Ka:
The command “:set digraph” can also be used to enable digraph mode. In digraph mode, characters with diacritical marks can also be entered using the backspace key. For instance:
- To enter character ä, press a <BS> : ( “a” + backspace + “:” )
- To enter character ê, press e <BS> > ( “e” + backspace + “>” )
Finally, the command “:digraphs” can be used to list all available digraphs:
    :digraphs
    NU ^@    10   SH ^A     1   SX ^B     2   EX ^C     3   ET ^D     4   EQ ^E     5
    AK ^F     6   BL ^G     7   BS ^H     8   HT ^I     9   LF ^@    10   VT ^K    11
    FF ^L    12   CR ^M    13   SO ^N    14   SI ^O    15   DL ^P    16   D1 ^Q    17
    D2 ^R    18   D3 ^S    19   D4 ^T    20   NK ^U    21   SY ^V    22   EB ^W    23
    CN ^X    24   EM ^Y    25   SB ^Z    26   EC ^[    27   FS ^\    28   GS ^]    29
    RS ^^    30   US ^_    31   SP       32   Nb #     35   DO $     36   At @     64
    <( [     91   // \     92   )> ]     93   '> ^     94   '! `     96   (! {    123
    !! |    124   !) }    125   '? ~    126   DT ^?   127   PA <80> 128   HO <81> 129
    BH <82> 130   NH <83> 131   IN <84> 132   NL <85> 133   SA <86> 134   ES <87> 135
    HS <88> 136   HJ <89> 137   VS <8a> 138   PD <8b> 139   PU <8c> 140   RI <8d> 141
    S2 <8e> 142   S3 <8f> 143   DC <90> 144   P1 <91> 145   P2 <92> 146   TS <93> 147
    CC <94> 148   MW <95> 149   SG <96> 150   EG <97> 151   SS <98> 152   GC <99> 153
    SC <9a> 154   CI <9b> 155   ST <9c> 156   OC <9d> 157   PM <9e> 158   AC <9f> 159
    NS      160   !I ¡    161   Ct ¢    162   Pd £    163   Cu ¤    164   Ye ¥    165
    BB ¦    166   SE §    167   ': ¨    168   Co ©    169   -a ª    170   << «    171
    NO ¬    172   --      173   Rg ®    174   'm ¯    175   DG °    176   +- ±    177
    2S ²    178   3S ³    179   '' ´    180   My µ    181   PI ¶    182   .M ·    183
    ', ¸    184   1S ¹    185   -o º    186   >> »    187   14 ¼    188   12 ½    189
    34 ¾    190   ?I ¿    191   A! À    192   A' Á    193   A> Â    194   A? Ã    195
    A: Ä    196   AA Å    197   AE Æ    198   C, Ç    199   E! È    200   E' É    201
    E> Ê    202   E: Ë    203   I! Ì    204   I' Í    205   I> Î    206   I: Ï    207
    D- Ð    208   N? Ñ    209   O! Ò    210   O' Ó    211   O> Ô    212   O? Õ    213
    O: Ö    214   *X ×    215   O/ Ø    216   U! Ù    217   U' Ú    218   U> Û    219
    U: Ü    220   Y' Ý    221   TH Þ    222   ss ß    223   a! à    224   a' á    225
    a> â    226   a? ã    227   a: ä    228
    ...
In this list:
- The first two characters are the characters to be entered after Ctrl-K
- Next comes the graphical representation of the resulting character
- Finally comes the decimal Unicode code of the character
We can see that there are several hundred characters with digraphs. Among them are all control characters between hex 0x00 and hex 0x1F, the non-graphical characters with codes between 0x7F and 0x9F, currency symbols (pound sterling, yen, …), most non-ASCII characters used in western languages, and other commonly used symbols.
Reloading an open file, reading it as utf8
Normally, vim detects the encoding of a file and opens it accordingly. But it may happen that a text file containing UTF-8 encoded text also contains some control characters. In that case the encoding may be detected as something like “Non-ISO extended-ASCII text” instead of UTF-8, and the UTF-8 characters are not displayed correctly. We can then force vim to reload the file as UTF-8 (note that “:e!” discards any unsaved changes) with the command:
    :e! ++enc=utf8
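To see what a wrongly detected encoding looks like, the following Python sketch decodes the same bytes twice. It assumes that the fallback is a single-byte encoding such as Latin-1, which is an assumption and not something stated above; the function name show_decodings is made up for the example.

```python
def show_decodings(raw: bytes) -> dict:
    """Decode the same bytes as UTF-8 and as Latin-1 (the assumed fallback)."""
    return {
        "utf-8": raw.decode("utf-8"),
        "latin-1": raw.decode("latin-1"),
    }


raw = "Ѳ".encode("utf-8")  # the two bytes stored in the file: b'\xd1\xb2'
print(raw)
for name, text in show_decodings(raw).items():
    # Read as UTF-8 the bytes give Ѳ; read as Latin-1 they give Ñ².
    print(f"{name:8}: {text}")
```

Reloading with ++enc=utf8 corresponds to choosing the first interpretation explicitly instead of leaving the choice to detection.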