about locale-specific characters within Swiss File Knife on the Windows command line.
characters and codepages with SFK for Windows:
SFK uses 8-bit character codes, with a possible
range of 256 different codes (0-255). see: sfk ascii
character codes 32-126, or hexadecimal 0x20-0x7E,
are printable 7-bit ASCII characters. within SFK they are
called "Low Codes", or LoCodes. as long as you
use only a-z A-Z 0-9 !"#$%&_ etc. you use LoCodes,
which will work the same on every computer in the
world, and you can ignore code pages.
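To check the claim that LoCodes look the same everywhere, a minimal Python sketch can encode a LoCode string under several Windows codepages and compare the resulting bytes. Python's built-in codecs (cp1252, cp1251, cp1253, cp850, cp437, cp866) are assumed to stand in for the Windows codepages of the same numbers; the helper name is made up for illustration.

```python
# Sketch: LoCodes (0x20-0x7E) encode to identical bytes in every common
# Windows codepage, so no conversion is ever needed for them.
CODEPAGES = ["cp1252", "cp1251", "cp1253", "cp850", "cp437", "cp866"]


def locode_bytes_agree(text: str) -> bool:
    """Return True if text encodes to the same bytes in all CODEPAGES."""
    encoded = {cp: text.encode(cp) for cp in CODEPAGES}
    return len(set(encoded.values())) == 1


if __name__ == "__main__":
    print(locode_bytes_agree('a-z A-Z 0-9 !"#$%&_'))  # True
```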
but as soon as you want to use accented characters,
umlauts, Cyrillic, Greek etc. you need HiCodes
in the range 0x80-0xFF. these are dependent on the
codepages of your Windows system, and you can only
use chars of your own language, plus English.
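The following Python sketch shows why HiCodes depend on the codepage: the single byte 0xE4 is decoded under three Windows ANSI codepages and gives a different character each time. Python's codecs are assumed to match the Windows tables, and decode_hicode is a hypothetical helper.

```python
# Sketch: the same HiCode byte decoded under three Windows ANSI codepages.
# Python's cp125x codecs are used as stand-ins for the Windows codepages.
ANSI_CODEPAGES = {
    "cp1252": "Western European",
    "cp1251": "Cyrillic",
    "cp1253": "Greek",
}


def decode_hicode(value: int, codepage: str) -> str:
    """Interpret one byte in the range 0x80-0xFF using the given codepage."""
    if not 0x80 <= value <= 0xFF:
        raise ValueError("not a HiCode")
    return bytes([value]).decode(codepage)


if __name__ == "__main__":
    for cp, region in ANSI_CODEPAGES.items():
        print(f"0xE4 in {cp} ({region}): {decode_hicode(0xE4, cp)}")
```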
your Windows CMD.EXE command line uses two codepages:
1. ANSI codepage for data processing.
every text within SFK is encoded in this codepage.
Most text editor programs like Notepad will
use this codepage by default.
2. Dos/OEM codepage for input and display.
what you type on your keyboard is encoded in this
codepage, e.g. 850 on Western European systems.
the CMD.EXE terminal can only display HiCodes in
this codepage correctly.
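On a single machine the two codepages also disagree about HiCodes. This Python sketch assumes a Western European system with ANSI codepage 1252 and OEM codepage 850 (other regions use other pairs, e.g. OEM 437 on many US systems) and prints the byte value of a few German letters in each.

```python
# Sketch, assuming a Western European system: ANSI 1252 and OEM 850.
from typing import Tuple

ANSI, OEM = "cp1252", "cp850"


def hicode_pair(ch: str) -> Tuple[int, int]:
    """Return the byte value of ch in the ANSI and in the OEM codepage."""
    return ch.encode(ANSI)[0], ch.encode(OEM)[0]


if __name__ == "__main__":
    for ch in "äöüß":
        ansi_b, oem_b = hicode_pair(ch)
        print(f"{ch}: ANSI 0x{ansi_b:02X}  OEM 0x{oem_b:02X}")
```

The same letter needs a different byte depending on whether it is data (ANSI) or keyboard and terminal text (OEM), which is why the conversions described next are necessary.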
HiCode conversions step by step:
- when you run sfk and pass parameters, these are
converted from OEM to Ansi and then given to sfk.
so sfk gets only Ansi encoded parameters.
- within SFK all data processing is done with Ansi,
e.g. filter ... +xed ... will pass Ansi text.
- when printing text to terminal, SFK converts it
from Ansi to OEM for output. otherwise HiCodes
would all look wrong, as the terminal needs OEM.
- when writing text output to file, like
filter ... >out.txt
filter ... +tofile out.txt
it is written as Ansi, without any conversion.
you can then open out.txt with Notepad
or Depeche View, which expect Ansi text,
and HiCodes will display correctly.
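The conversion steps above can be modeled in a few lines of Python. This is only a model, assuming ANSI 1252 and OEM 850; the function names are hypothetical and do not describe SFK's real internals, and Windows' own conversion functions may treat unmappable characters differently.

```python
# Model of the HiCode round trip, assuming ANSI 1252 and OEM 850.
# Function names are hypothetical, not SFK internals.
ANSI, OEM = "cp1252", "cp850"


def cmd_passes_parameter(typed: bytes) -> bytes:
    """CMD.EXE converts a typed (OEM) parameter to ANSI before sfk sees it."""
    return typed.decode(OEM).encode(ANSI)


def sfk_prints_to_terminal(ansi_text: bytes) -> bytes:
    """SFK converts ANSI text to OEM when printing to the terminal."""
    return ansi_text.decode(ANSI).encode(OEM)


def sfk_writes_to_file(ansi_text: bytes) -> bytes:
    """Output redirected to a file stays ANSI, without any conversion."""
    return ansi_text


if __name__ == "__main__":
    word = "Grüße"
    typed = word.encode(OEM)             # keyboard input arrives as OEM bytes
    param = cmd_passes_parameter(typed)  # sfk receives ANSI bytes
    print("typed:", typed, "param:", param)
    print("terminal gets OEM again:", sfk_prints_to_terminal(param) == typed)
    print("file stays ANSI:", sfk_writes_to_file(param) == word.encode(ANSI))
```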
Beware of HiCodes within batch files.
- if you run SFK interactively like:
sfk filter in.txt -+myword
and myword contains HiCodes, you type them
all as OEM chars, and it works.
- if you create a batch file with Windows Notepad,
and therein type
sfk filter in.txt -+myword
and myword contains HiCodes, you will find that
filter no longer finds the word.
- this is because Notepad created an Ansi encoded
text file, so the "myword" chars are Ansi encoded.
- CMD.EXE still thinks "myword" is OEM,
and incorrectly "converts" it to Ansi,
which actually breaks all HiCode chars.
- sfk.exe then gets myword with completely
wrong encoding, and the search fails.
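The batch file failure can be reproduced with a similar Python model, again assuming ANSI 1252 and OEM 850. CMD.EXE is modeled as decoding the batch file bytes as OEM and re-encoding them as ANSI, with errors="replace" standing in for however Windows handles unmappable characters. The same model also shows why a batch file saved with OEM encoding works; the helper name is hypothetical.

```python
# Model of the batch file problem, assuming ANSI 1252 and OEM 850.
ANSI, OEM = "cp1252", "cp850"


def search_term_seen_by_sfk(batch_file_bytes: bytes) -> bytes:
    """What sfk receives after CMD.EXE treats the bytes as OEM text."""
    return batch_file_bytes.decode(OEM).encode(ANSI, errors="replace")


if __name__ == "__main__":
    word = "Müller"
    in_txt = f"Herr {word} wohnt hier".encode(ANSI)  # in.txt, saved as ANSI

    from_notepad = search_term_seen_by_sfk(word.encode(ANSI))   # ANSI .bat
    from_oem_file = search_term_seen_by_sfk(word.encode(OEM))   # OEM .bat

    print("ANSI batch file finds word:", from_notepad in in_txt)   # False
    print("OEM batch file finds word: ", from_oem_file in in_txt)  # True
```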
how to fix this:
- write your .bat files with OEM encoding.
this can be done with Notepad++:
- create a new file mytest.bat
- select: Encoding / Character Set / your area,
then select your OEM codepage.
- now type sfk commands into the batch file,
and save it.
- side effect: if you create sfk scripts
embedded in such a batch file, like:
sfk batch mytest2.bat
then searches therein will fail again if the
file is OEM encoded, because by default "sfk script"
wants to load Ansi text. to fix this, use
option -dos like: sfk script -dos ...
What is not possible?
SFK cannot process any text outside your Ansi codepage.
for example, if a computer uses Western Europe
codepage 1252, it is possible to search German umlauts
and French accented characters, but it is impossible
to search and filter Cyrillic text (encoded in 1251),
and it is even impossible to type Cyrillic chars
in the first place, as the keyboard has no such keys.
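Whether a piece of text fits into a codepage at all can be checked by trying to encode it. This Python sketch assumes Python's cp1252 and cp1251 codecs match the Windows codepages; fits_codepage is a hypothetical helper.

```python
# Sketch: text outside the ANSI codepage has no byte representation there.
def fits_codepage(text: str, cp: str = "cp1252") -> bool:
    """Return True if every character of text exists in codepage cp."""
    try:
        text.encode(cp)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for sample in ("Grüße", "café", "привет"):
        print(sample, "1252:", fits_codepage(sample),
              "1251:", fits_codepage(sample, "cp1251"))
```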
see also:
   sfk help nocase    about case insensitive search
   sfk help unicode   about Unicode to Ansi conversion