Sisulizer's Kaboom - The conversion utility

What is Sisulizer's Kaboom?
Sisulizer's Kaboom is a converter utility for file and clipboard data in text format. It is useful in your daily development and localization work. You can download and use it for free.
- The file converter part of Kaboom fully supports ANSI, UNICODE and DBCS code pages.
- The clipboard converter works with the current ANSI code page (8-Bit) of your Windows installation.
Kaboom is a Classic Visual Basic application with multi-language string resources. At the moment it contains English and German strings. At startup, Kaboom picks its display language based on your operating system's language settings.
Download
| Product | Format | Date | Size (MB) |
| Kaboom 2 | Setup EXE | 9/28/2006 | 1.7 |
Online Manual
After installation, Kaboom is ready to use. You can start it from the Windows Start menu. On startup, Kaboom shows its main menu with three entries:

- File Converter. This opens the file converter part of Kaboom.
- Clipboard Converter. This opens the clipboard converter part of Kaboom.
- Online Manual. This menu entry loads this page.
The red X button in Kaboom's title bar closes the application.
File Converter
The file converter converts text files stored in one code page into another. For example, you can convert a file written in Shift-JIS on a Japanese computer into a UNICODE file. Kaboom checks which conversions are available on your computer and offers only those.
Converting a file

Converting a file with Kaboom is simple. Just follow these steps:
- Select the text file in the foreign code page with the three-dots button "...". Kaboom opens a file selection dialog box that helps you find the file to convert.
- If your file starts with a BOM (Byte-Order-Mark), Kaboom has already selected the correct code page for you. Most files you convert will not have a BOM, so you must select the correct source code page with the "Code page group" and "Code page" combo boxes. The sorting of the code pages should help you find the correct one.
- In the preview you can see right away whether you selected the correct code page.
- Some files need extra treatment. For example, UNIX and Macintosh files use different line endings, and BASE64 encoding is sometimes used in e-mails. The "Additional Filter" options take care of such cases.
- Kaboom suggests a filename for the target file. Use the three-dots button "..." if you want to choose a different "Target Filename".
- In the "Code page group" and "Code page" combo boxes you can select the code page of the target file.
- If the target code page is UTF-7, UTF-8, UTF-16-LE or UTF-16-BE, the target file can have a BOM (Byte-Order-Mark). The checkbox lets you decide whether to write a BOM.
- The "Convert" button starts the conversion.
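The conversion steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not Kaboom's own code (Kaboom is a Visual Basic application): Python codec names such as "shift_jis" or "cp1252" stand in for the "Code page" combo boxes, the `newline` parameter is a hypothetical stand-in for one of the "Additional Filter" options, and UTF-7 BOMs are left out.

```python
from pathlib import Path

# BOM bytes for target encodings that can carry one. Python's "utf-8",
# "utf-16-le" and "utf-16-be" codecs never write a BOM themselves, so it
# is prepended explicitly. UTF-7 is left out of this sketch.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16-le": b"\xff\xfe",
    "utf-16-be": b"\xfe\xff",
}


def convert_bytes(raw, source_encoding, target_encoding,
                  write_bom=False, newline=None):
    """Re-encode raw text bytes from one code page into another."""
    # Bytes that are invalid in the source code page raise UnicodeDecodeError;
    # a wrong but valid guess decodes to garbage (what the preview reveals).
    text = raw.decode(source_encoding)
    if newline is not None:
        # Hypothetical "additional filter": unify CRLF, CR and LF line ends.
        text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", newline)
    # Characters missing from the target code page raise UnicodeEncodeError.
    data = text.encode(target_encoding)
    if write_bom and target_encoding in BOMS:
        data = BOMS[target_encoding] + data
    return data


def convert_file(source, target, source_encoding, target_encoding, **options):
    """Read `source`, convert it and write the result to `target`."""
    raw = Path(source).read_bytes()
    Path(target).write_bytes(
        convert_bytes(raw, source_encoding, target_encoding, **options))
```

For example, `convert_file("order.txt", "order-utf8.txt", "shift_jis", "utf-8", write_bom=True)` would turn a Shift-JIS file into UTF-8 with a BOM (both file names are hypothetical). Keeping `convert_bytes` separate makes the conversion itself independent of file handling.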
Attention: Not all conversions make sense! If, for example, you convert an 8-Bit file written with a Cyrillic code page like KOI8 into an 8-Bit file for code page 1252 (Windows Western), information will be lost, because code page 1252 has no Cyrillic letters. You can, however, convert it into a Cyrillic file for the Macintosh (Code page 10007) or into a UNICODE format like UTF-7, UTF-8 or UTF-16. The UNICODE formats are always a good choice because they can represent every UNICODE character (the code space has more than a million code points), while ANSI files can hold only 256 different characters.
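The information loss can be reproduced with Python's built-in codecs. In this sketch, Python's `koi8_r`, `cp1252` and `mac_cyrillic` codecs are assumed to stand in for KOI8, code page 1252 and code page 10007; `errors="replace"` makes the lost letters visible as question marks instead of raising an error.

```python
# "Privet" (Hello) stored in the 8-bit Cyrillic code page KOI8-R.
koi8_bytes = "Привет".encode("koi8_r")
text = koi8_bytes.decode("koi8_r")

# Code page 1252 (Windows Western) has no Cyrillic letters. With
# errors="replace", every letter that cannot be mapped becomes "?".
western = text.encode("cp1252", errors="replace")

# Mac Cyrillic (code page 10007) and UTF-8 can hold all of these letters.
mac_cyrillic = text.encode("mac_cyrillic")
utf8 = text.encode("utf-8")

print(western)                              # b'??????'
print(mac_cyrillic.decode("mac_cyrillic"))  # Привет
print(utf8.decode("utf-8"))                 # Привет
```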
Background info: What is a codepage and why is it needed?
Codepages are needed because ANSI files use only 8 bits to store a character. This means there are only 256 possible characters, far too few for all the languages of the world.
The American character set needs only 128 different characters, which fit into 7 bits. Because computers handle data in 8-bit bytes, the extra bit provides another 128 values for displaying characters.
On MS-DOS, some of these values were used for drawing boxes and lines. Windows removed these box and line characters from its character sets and added more foreign characters instead. For most Western languages like English, French and German this was fine. The German character set, for example, needs only seven characters beyond the US character set (ä, ö, ü, Ä, Ö, Ü and ß), which leaves enough room for special characters from Spain, Norway and other countries.
But for Cyrillic character sets, for example, the space was not big enough, so codepages are needed to fill that gap. A codepage on Windows is nothing more than the information about which characters the upper 128 values stand for. The value 220, for example, shows the German umlaut Ü when the Windows codepage 1252 is selected, while with the Russian Windows codepage 1251 it shows the Cyrillic Ь (soft sign).
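Python's codecs show which character a single byte value stands for in each code page. A small illustrative sketch; `cp1252` and `cp1251` are Python's names for Windows code pages 1252 and 1251. According to these tables, Ü is value 220 in code page 1252, where code page 1251 has the Cyrillic Ь, and Ш is value 216 in code page 1251, where code page 1252 has Ø.

```python
# The same byte value means different letters in different code pages.
value = bytes([220])
print(value.decode("cp1252"))  # Ü  (Windows Western, code page 1252)
print(value.decode("cp1251"))  # Ь  (Windows Cyrillic, code page 1251)

# Ш sits at value 216 in code page 1251; code page 1252 has Ø there.
other = bytes([216])
print(other.decode("cp1251"), other.decode("cp1252"))  # Ш Ø

# Neither code page contains the other's letter.
print("Ш".encode("cp1252", errors="replace"))  # b'?'
print("Ü".encode("cp1251", errors="replace"))  # b'?'
```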
With codepages, Ü and Ш cannot be shown on the same display: codepage 1252 has no Ш, and codepage 1251 has no Ü. This is only possible with UNICODE (this page, for example, uses UNICODE (UTF-8) to display both characters).
While this solves the problem for most languages, the codepage technique does not help for languages with far more than 128 special characters, such as Japanese, Korean and Chinese. For these languages DBCS (Double-Byte Character Set) was invented. The lower 128 values are still the same as in the US codepage, but certain values from the upper 128 are lead bytes that start a two-byte sequence. A character is therefore stored in one or two bytes. In Japanese Shift-JIS (Windows codepage 932), for example, ASCII letters take one byte and kanji take two.
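The one-or-two-byte structure of a DBCS can be observed with Python's `shift_jis` codec (Windows uses the closely related code page 932, available in Python as `cp932`). An illustrative sketch, not tied to Kaboom.

```python
text = "A日本"  # one ASCII letter followed by two kanji
data = text.encode("shift_jis")

# 3 characters, 5 bytes: 1 byte for "A" and 2 bytes for each kanji.
print(len(text), len(data))  # 3 5

# "A" keeps its ASCII value; each kanji begins with a lead byte >= 0x80.
print(data[0] == ord("A"), data[1] >= 0x80, data[3] >= 0x80)  # True True True

# The decoder reads a lead byte and then consumes exactly one trail byte.
print(data.decode("shift_jis"))  # A日本
```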
So if somebody writes a text file on her or his computer and does not save it as UNICODE, the current codepage is used. If this file is given to someone with a different current codepage, it will not display correctly. So if you are in Western Europe or the USA and you get a text file from somebody in Greece, Turkey, China or Japan, chances are high that the file is useless to you. This is where Kaboom helps: simply convert the file into UNICODE and print, edit or use it in any way, without losing information. If you edit the file and want to return it with your changes, simply convert it back into the codepage the recipient needs. The handling in Kaboom is pretty easy.
Background info: What is a BOM?
BOM is short for Byte-Order-Mark. It is written to the beginning of a text file to tell the reading application whether the bytes are in little-endian or big-endian order when a character is stored in 16- or 32-Bit UNICODE. The BOM is also used to mark UTF-7 and UTF-8 files. These formats store UNICODE characters as sequences of 7- or 8-bit units, so byte order does not apply and the name BOM is a bit misleading for them. Still, it is useful to know a file's format, and a BOM marks that format inside the file itself. If a file is read by an application that is not aware of BOMs, the application shows the BOM as data. In this case you can use Kaboom to read the file with its BOM and write it out again with the BOM checkbox cleared.
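A BOM check can be sketched in Python with the BOM constants from the standard `codecs` module. This is a hypothetical helper, not Kaboom's implementation; `detect_bom` and `decode_without_bom` are illustrative names. Writing the text back with `text.encode(encoding)` then produces a file without a BOM, which corresponds to clearing the BOM checkbox.

```python
import codecs

# Known BOMs, longest first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so it must be checked before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return the encoding signalled by a leading BOM, or None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    # UTF-7 signature: "+/v" followed by 8, 9, + or /.
    if data[:3] == b"+/v" and data[3:4] in (b"8", b"9", b"+", b"/"):
        return "utf-7"
    return None


def decode_without_bom(data):
    """Decode data using its BOM and drop the BOM character."""
    encoding = detect_bom(data)
    if encoding is None:
        raise ValueError("no BOM found; the code page must be chosen by hand")
    text = data.decode(encoding)
    # The BOM decodes to U+FEFF (ZERO WIDTH NO-BREAK SPACE).
    return text[1:] if text.startswith("\ufeff") else text
```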
Background info: What is Little-Endian and Big-Endian?
There are two types of byte ordering: big-endian and little-endian. Intel processors use little-endian order: the least significant byte of a number is stored first, so when you read the bytes from left to right, the more significant bytes are on the right. When we write a number like 4711, the most significant digit, 4 (= 4000), is on the left side, which corresponds to big-endian order. The BOM tells the application in which order the bytes have to be read.
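Python's `int.to_bytes` and the UTF-16 codecs make both byte orders visible. A short illustration; 4711 is 0x1267 in hexadecimal.

```python
number = 4711  # 0x1267

print(number.to_bytes(2, "big").hex())     # 1267: most significant byte first
print(number.to_bytes(2, "little").hex())  # 6712: least significant byte first

# UTF-16 stores each character as a 16-bit number, so byte order matters.
print("Ü".encode("utf-16-be").hex())  # 00dc
print("Ü".encode("utf-16-le").hex())  # dc00

# The BOM is the character U+FEFF; its byte sequence reveals the order.
print("\ufeff".encode("utf-16-le").hex())  # fffe
print("\ufeff".encode("utf-16-be").hex())  # feff
```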
Clipboard Converter
The clipboard converter supports the conversions and filters listed below.
Converting Clipboard Data

Converting a text with Kaboom is simple. Just follow these steps:
- Select the text you want to convert in some other application, e.g. Windows Notepad, and press Ctrl+C to copy it to the Windows clipboard.
- Switch to Kaboom's Clipboard Converter and make sure the checkboxes "Paste from clipboard before convert" and "Copy to clipboard after convert" are checked, as shown in the screenshot.
- In the "Filter group" and "Filter" combo boxes, select the conversion you want to perform.
- Press the action button. Its caption changes depending on the selected filter and the action it performs.
- The converted text is now ready to be pasted into another application. In Windows Notepad, for example, press Ctrl+V to paste it.
If you want to enter text into the source field manually, make sure that the checkbox "Paste from clipboard before convert" is unchecked. Kaboom then converts what you typed into the text field when you press the button. The link titled paste retrieves the content of the Windows clipboard and overwrites the current content of the text field. If you want to add text to existing text in the source field, use the standard Windows shortcut Ctrl+V instead.
If you are not sure whether you selected the right conversion, uncheck "Copy to clipboard after convert". Kaboom then displays the result in the target text field. Use the link titled copy to copy the content of the field to the Windows clipboard manually.
Available filters in Kaboom
Char Filters
Clean Up String
Replaces the white chars in a string with underscore chars (_).
Lower Case
Changes all uppercase chars to lowercase chars.
Make Caps
This makes the first char of every word in the string upper case.
Remove White Chars
"Remove White Chars" removes all white chars, including punctuation, from the input. In Kaboom, White Chars are the following chars: <Blank><Tab><CR><LF>,;:./(){}[]<>+-~#*&%$§!=\'"
Tabs to Blanks
Changes Tab chars into Blank chars.
Upper Case
Changes all lowercase chars to uppercase chars.
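The char filters above can be approximated in Python. This is a sketch under assumptions the manual does not settle: `WHITE_CHARS` copies the list from "Remove White Chars" (reading the backslash as one of the characters), "Clean Up String" is assumed to use the same set, "Make Caps" is assumed to split words at blanks only, and "Tabs to Blanks" is assumed to replace each tab with a single blank. Lower Case and Upper Case correspond to Python's `str.lower()` and `str.upper()`.

```python
# Kaboom's "White Chars" as listed in this manual (assumption: the
# backslash in that list is one of the characters).
WHITE_CHARS = set(" \t\r\n,;:./(){}[]<>+-~#*&%$§!=\\'\"")


def remove_white_chars(text):
    """Drop every white char from the text."""
    return "".join(ch for ch in text if ch not in WHITE_CHARS)


def clean_up_string(text):
    """Replace every white char with an underscore (assumed same set)."""
    return "".join("_" if ch in WHITE_CHARS else ch for ch in text)


def make_caps(text):
    """Upper-case the first char of each blank-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def tabs_to_blanks(text):
    """Replace each tab with one blank (assumed; no tab stops)."""
    return text.replace("\t", " ")


print(remove_white_chars("Hello, World!"))  # HelloWorld
print(clean_up_string("my file.txt"))       # my_file_txt
print(make_caps("hello big world"))         # Hello Big World
```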
Checksums
CRC16
The filter calculates the CRC16 checksum for the string in the source field.
CRC32
The filter calculates the CRC32 checksum for the string in the source field.
Internet Checksum
The filter calculates the so-called Internet checksum (the 16-bit ones' complement checksum used by IP, TCP and UDP) for the string in the source field.
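The three checksums can be sketched in Python. The manual does not say which CRC16 variant Kaboom uses, so this sketch assumes CRC-16/ARC, one common variant; other variants give different values. CRC32 uses `zlib.crc32`, and the Internet checksum follows RFC 1071. Encoding the string as code page 1252 before hashing is also an assumption.

```python
import zlib


def crc16_arc(data):
    """CRC-16/ARC: reflected polynomial 0xA001, initial value 0 (assumed variant)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def crc32(data):
    """Standard CRC-32 as used by ZIP and PNG."""
    return zlib.crc32(data) & 0xFFFFFFFF


def internet_checksum(data):
    """RFC 1071: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


text = "123456789".encode("cp1252")  # assumption: ANSI string hashed as bytes
print(hex(crc16_arc(text)), hex(crc32(text)))  # 0xbb3d 0xcbf43926
```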
Code page Filters
Char to OEM
Converts a string from an ANSI char set into a char set used in a DOS session.
Code page<current ANSI code page> to UTF-7
Converts text in the current ANSI code page into UTF-7. The target field shows the escape chars used in UTF-7 instead of interpreting them.
Code page<current ANSI code page> to UTF-8
Converts text in the current ANSI code page into UTF-8. The target field shows each byte of the UTF-8 sequences as a single ANSI char instead of interpreting them.
OEM to Char
Converts a string from the char set used in a DOS session into ANSI char set.
UTF-7 to Code page<current ANSI code page>
Converts UTF-7 encoded text into the current ANSI code page. The source field shows the escape chars used in UTF-7 instead of interpreting them.
UTF-8 to Code page<current ANSI code page>
Converts UTF-8 encoded text into the current ANSI code page. The source field shows each byte of the UTF-8 sequences as a single ANSI char instead of interpreting them.
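The effect of these filters can be sketched with Python's codecs, assuming the current ANSI code page is 1252 (Western) and the OEM code page is 850, as on many Western European DOS installations. Showing UTF-8 "uninterpreted" here means displaying each UTF-8 byte as its code page 1252 character.

```python
ansi = "cp1252"  # assumption: current ANSI code page 1252 (Western)
oem = "cp850"    # assumption: OEM (DOS) code page 850
text = "Grüße"

# Code page -> UTF-8: each UTF-8 byte is shown as an ANSI char, not interpreted.
# (A few bytes such as 0x81 are undefined in cp1252, so this view fails for
# some characters.)
utf8_view = text.encode("utf-8").decode(ansi)
print(utf8_view)  # GrÃ¼ÃŸe

# UTF-8 -> code page: read the visible chars back as UTF-8 bytes.
print(utf8_view.encode(ansi).decode("utf-8"))  # Grüße

# UTF-7 keeps ASCII letters and escapes other characters in "+...-" runs.
utf7_view = text.encode("utf-7").decode("ascii")
print(utf7_view.encode("ascii").decode("utf-7"))  # Grüße

# Char to OEM / OEM to Char: the same text in the DOS code page.
oem_bytes = text.encode(oem)
print(oem_bytes)              # b'Gr\x81\xe1e'
print(oem_bytes.decode(oem))  # Grüße
```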
Code page Finder
This group does not contain classic filters. Its functions are service functions that find the code page for a number and vice versa.
Code page Name from Code page Number
This function finds the code page name used by Windows for a code page number, e.g. 932 results in "shift_jis". Because there can be more than one name for one code page number, Kaboom returns the name used in the headers of MIME or HTML files.
Code page Number from Code page Name
This function finds the code page number used by Windows for a code page name, e.g. "shift_jis" or "shift-jis" results in 932. For some code pages Kaboom knows more than one name.
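Both lookups can be sketched in Python as a table plus a reverse index. The table below is a tiny hypothetical excerpt (the numbers are real Windows code page identifiers, but Kaboom's own table and alias list are not documented here), and the name normalization in `_key` is an assumption that makes "shift-jis" and "shift_jis" match.

```python
# Tiny excerpt of Windows code page numbers and their names. The first name
# in each list is the one used in MIME/HTML headers; the rest are aliases.
CODE_PAGES = {
    932: ["shift_jis", "ms_kanji", "csshiftjis"],
    1251: ["windows-1251"],
    1252: ["windows-1252"],
    20866: ["koi8-r"],
    65001: ["utf-8"],
}


def _key(name):
    """Normalize a name so that "Shift-JIS" and "shift_jis" match."""
    return name.strip().lower().replace("-", "_")


# Reverse index: normalized name -> code page number.
NAME_TO_NUMBER = {
    _key(name): number
    for number, names in CODE_PAGES.items()
    for name in names
}


def number_from_name(name):
    return NAME_TO_NUMBER.get(_key(name))


def name_from_number(number):
    names = CODE_PAGES.get(number)
    return names[0] if names else None


print(number_from_name("shift-jis"))  # 932
print(name_from_number(932))          # shift_jis
```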
Filenames
This group is not a classic filter either. Nevertheless, its functions can be handy in your daily (development) work.
Calc full filename
This filter can convert a filename like
c:\windows\system32\..\..\test\test.dat
into
c:\test\test.dat.
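Python's `ntpath.normpath` performs the same kind of collapsing for Windows paths and can be imported on any platform. In this sketch, `calc_full_filename` is a hypothetical wrapper; it works purely on the text, does not check that the path exists, and does not resolve relative paths against the current directory (`ntpath.abspath` would).

```python
import ntpath  # Python's Windows path rules, importable on any platform


def calc_full_filename(path):
    """Collapse "." and ".." segments of a Windows path (text only)."""
    return ntpath.normpath(path)


print(calc_full_filename(r"c:\windows\system32\..\..\test\test.dat"))  # c:\test\test.dat
```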
Long Filename to Short
Modern Windows versions use long filenames, but sometimes the short 8.3 filename representation is needed. This function finds the short filename.
Path with Drive to UNC
This filter finds the UNC representation of a network path using a drive letter.
Short Filename to Long
Modern Windows versions use long filenames, but sometimes only the short 8.3 filename is given. This function finds the long filename.
Hex Decoder
Hex-Stream
This filter changes a string with hexadecimal numbers into characters.
Hex Encoder
Hex-Dump
This filter changes the character values into their hexadecimal representation. The output is formatted in columns and rows so a human can read it easily. There is no decoder for this format.
Hex-Stream
This filter changes the character values into their hexadecimal representation.
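The hex filters can be sketched in Python with `bytes.hex` and `bytes.fromhex`. The dump layout (offset, hex columns, printable ASCII) and the default code page 1252 are assumptions for illustration; Kaboom's exact column format is not documented here.

```python
def hex_stream_encode(text, encoding="cp1252"):
    """Hex-Stream encoder: one two-digit hex number per byte."""
    return text.encode(encoding).hex().upper()


def hex_stream_decode(stream, encoding="cp1252"):
    """Hex-Stream decoder; whitespace between byte pairs is ignored."""
    return bytes.fromhex(stream).decode(encoding)


def hex_dump(text, width=16, encoding="cp1252"):
    """Hex-Dump: offset, hex columns and printable ASCII, one row per `width` bytes."""
    data = text.encode(encoding)
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(rows)


print(hex_stream_encode("Hi!"))       # 486921
print(hex_stream_decode("48 69 21"))  # Hi!
print(hex_dump("Hello, Kaboom! Grüße"))
```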
Internet Decoder
International Domain Names (IDNA/PunyCode)
IDNA is a new standard for using special chars in domain names, and therefore in URLs. A domain name with special chars, like Japanese characters, Spanish or French accents or German umlauts, is registered in an ASCII-only form called Punycode (prefix "xn--"). You can use this filter to decode that computer coding back into human-readable text. Please be aware that this part of Kaboom is ANSI based, so an IDNA name from China will not render correctly on a Western computer and vice versa.
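Python's built-in `idna` codec (which implements IDNA 2003) performs the same conversion in both directions. A short illustration with the German word "bücher"; unlike Kaboom's ANSI-based filter, it is not limited to the current ANSI code page.

```python
# Python's "idna" codec implements IDNA 2003 (nameprep plus Punycode).
name = "bücher.example"

ascii_name = name.encode("idna")
print(ascii_name)                 # b'xn--bcher-kva.example'
print(ascii_name.decode("idna"))  # bücher.example
```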
Mail Data Base64
Base64 encoding is sometimes used in the body of E-Mails.
Mail Data Quoted Printable
Quoted printable is found in the body part of E-Mails. QP encodes special chars in a way that lets them be transported as 7-Bit ASCII.
Mail Header Quoted Binary (RFC1522)
Binary (Base64) encoding is found in the header part of E-Mails, e.g. in subject lines. This "B" encoding stores special chars as Base64 so that they can be transported as 7-Bit ASCII.
Mail Header Quoted Printable (RFC1522)
Quoted printable encoding is found in the header part of E-Mails. This "Q" encoding encodes special chars in a way that lets them be transported as 7-Bit ASCII.
URL
URLs encode special chars with percent signs, e.g. a <Blank> becomes %20. Some spammers use this to disguise links: if you see a URL encoded this way in an E-Mail, you cannot tell where it links to. Kaboom can decode it for you.
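The mail and URL decoders can be sketched with Python's standard library: `base64`, `quopri`, `email.header.decode_header` (which implements RFC 2047, the successor of RFC 1522) and `urllib.parse.unquote`. The helper names and the default charsets are assumptions for illustration. The matching encoders are `base64.b64encode`, `quopri.encodestring`, `email.header.Header` and `urllib.parse.quote`.

```python
import base64
import quopri
from email.header import decode_header
from urllib.parse import unquote


def decode_base64_body(data, charset="utf-8"):
    """Mail Data Base64: decode the body, then the charset (assumed)."""
    return base64.b64decode(data).decode(charset)


def decode_qp_body(data, charset="iso-8859-1"):
    """Mail Data Quoted Printable: =XX escapes become bytes again."""
    return quopri.decodestring(data).decode(charset)


def decode_mail_header(value):
    """Mail Header Q/B encoding: =?charset?q?...?= and =?charset?b?...?=."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            data = data.decode(charset or "ascii")
        parts.append(data)
    return "".join(parts)


print(decode_qp_body(b"Gr=FC=DFe"))                      # Grüße
print(decode_mail_header("=?iso-8859-1?q?Gr=FC=DFe?="))  # Grüße
print(unquote("http://%65%78%61%6D%70%6C%65.com"))       # http://example.com
```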
Internet Encoder
AntiHarvest (complete NCR)
AntiHarvest changes every char in the input field into a Numeric Character Reference (NCR). NCRs are used in HTML to describe special characters like umlauts, accented chars or signs like < > & and so on. Normally only the special chars are encoded as NCRs; the AntiHarvest filter encodes all chars of the string. The result can be used for links to E-Mail addresses on web sites. This helps to protect your E-Mail address from E-Mail harvesters: programs that visit your web site to grab E-Mail addresses, which are then used to send spam to your mailbox.
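Both NCR variants can be sketched in Python with decimal references. Which characters count as "special" for the plain NCR filter is an assumption here (non-ASCII characters plus <, > and &); `anti_harvest` and `ncr` are illustrative names, not Kaboom's.

```python
import html


def anti_harvest(text):
    """Encode every character as a decimal Numeric Character Reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def ncr(text):
    """Encode only non-ASCII characters and the HTML specials <, > and &."""
    return "".join(
        f"&#{ord(ch)};" if ord(ch) > 127 or ch in "<>&" else ch
        for ch in text
    )


link = anti_harvest("me@example.com")
print(html.unescape(link))  # me@example.com (browsers decode it the same way)
print(ncr("Grüße <b>"))     # Gr&#252;&#223;e &#60;b&#62;
```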
International Domain Names (IDNA/PunyCode)
IDNA is a new standard for using special chars in domain names, and therefore in URLs. If you want to register a domain name with special chars, like Japanese characters, Spanish or French accents or German umlauts, you can use this filter to get the actual ASCII text to register. You can use only special chars that your system can display in its current ANSI char set.
Mail Data Base64
Base64 encoding is sometimes used in the body of E-Mails.
Mail Data Quoted Printable
Quoted printable is found in the body part of E-Mails. QP encodes special chars in a way that lets them be transported as 7-Bit ASCII.
Mail Header Quoted Binary (RFC1522)
Binary (Base64) encoding is found in the header part of E-Mails, e.g. in subject lines. This "B" encoding stores special chars as Base64 so that they can be transported as 7-Bit ASCII.
Mail Header Quoted Printable (RFC1522)
Quoted printable encoding is found in the header part of E-Mails. This "Q" encoding encodes special chars in a way that lets them be transported as 7-Bit ASCII.
Numeric Character Reference (NCR)
Changes special chars in the input into their Numeric Character Reference (NCR). NCR is used in HTML to describe special characters like umlauts, accented chars or signs like < > & and so on.
URL
URLs encode special chars with percent signs, e.g. a <Blank> becomes %20. Some spammers use this to disguise links: if you see a URL encoded this way in an E-Mail, you cannot tell where it links to. Kaboom can create this format for you.
Line Feeds
CR to CRLF / CRLF to CR / CRLF to LF / LF to CRLF
Different operating systems use different line endings. Windows uses CRLF (Carriage Return plus Line Feed), UNIX uses only LF, and classic Macintosh systems use only CR. Sometimes you get a UNIX document where everything seems to be printed on one line in Windows Notepad. These filters solve the problem.
CRLF to <BR>
This filter changes every new line into an HTML <br> tag.
CRLF to Blanks
This filter changes every new line into a single blank char (" ").
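The line-feed filters can be sketched in Python. As a design choice, this sketch uses one general `convert_newlines` function that normalizes all three line-end styles, whereas Kaboom's individual filters (e.g. CRLF to LF) may convert only the exact sequence they name; all function names are illustrative.

```python
def convert_newlines(text, newline="\r\n"):
    """Normalize CRLF, CR and LF line ends to `newline`."""
    # CRLF first, so a Windows line end is not counted as two line ends.
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", newline)


def crlf_to_br(text):
    """CRLF to <BR>: every line end becomes an HTML <br> tag."""
    return convert_newlines(text, "<br>")


def crlf_to_blanks(text):
    """CRLF to Blanks: every line end becomes a single blank."""
    return convert_newlines(text, " ")


print(repr(convert_newlines("UNIX\nMac\rWindows\r\n")))  # 'UNIX\r\nMac\r\nWindows\r\n'
print(crlf_to_br("line 1\r\nline 2"))                     # line 1<br>line 2
```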
Other Filters
RLE Encode/Decode
This is a simple run-length encoding. If a string contains runs of the same char, this encoding shrinks the string.
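A run-length encoder can be sketched in Python. Kaboom's output format is not documented here, so this sketch uses a hypothetical format: each run is written as a single-digit count (1 to 9) followed by the char, and longer runs are split. Fixed-width counts keep decoding unambiguous even when the input itself contains digits.

```python
from itertools import groupby


def rle_encode(text):
    """Write each run as <count><char>, with counts limited to 1..9."""
    out = []
    for ch, group in groupby(text):
        run = len(list(group))
        while run:
            n = min(run, 9)  # single-digit counts keep decoding unambiguous
            out.append(f"{n}{ch}")
            run -= n
    return "".join(out)


def rle_decode(encoded):
    """Read the encoded text two chars at a time: count, then char."""
    return "".join(int(encoded[i]) * encoded[i + 1]
                   for i in range(0, len(encoded), 2))


print(rle_encode("AAAAAAAABBBC"))  # 8A3B1C
print(rle_decode("8A3B1C"))        # AAAAAAAABBBC
```

A string without runs doubles in length under this format, which matches the point above that run-length encoding only shrinks strings with repeated chars.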
ROT13
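Scrambles a string so that a human cannot read it at a glance: every letter is replaced by the letter 13 positions further on in the alphabet. This is not real encryption. Because the alphabet has 26 letters, using the function twice restores the original text.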
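ROT13 is a fixed letter substitution, so it can be sketched in Python with a translation table; Python's built-in `rot13` codec gives the same result. Chars other than the ASCII letters, such as umlauts, stay unchanged in this sketch.

```python
import codecs
import string

LOWER, UPPER = string.ascii_lowercase, string.ascii_uppercase
# Map each letter to the letter 13 places further on; other chars stay.
ROT13 = str.maketrans(LOWER + UPPER,
                      LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13])


def rot13(text):
    return text.translate(ROT13)


print(rot13("Hello, Kaboom"))           # Uryyb, Xnobbz
print(rot13(rot13("Hello, Kaboom")))    # Hello, Kaboom
print(codecs.encode("Hello", "rot13"))  # Uryyb
```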
Soundex
This is not a classic filter. Soundex calculates the "Soundex" value of a text. Words with the same Soundex value sound similar when spoken, at least in English.
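The commonly used American Soundex algorithm can be sketched in Python. Kaboom's exact variant is not documented here, so this is the standard form as an assumption: keep the first letter, code the following consonants as digits, collapse equal adjacent digits (H and W do not separate them, vowels do), and pad or cut to four characters. Non-ASCII letters are simply ignored in this sketch.

```python
import string

# American Soundex digit groups; vowels, H, W and Y get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character American Soundex code of `word`."""
    letters = [c for c in word.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    result = [letters[0]]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            result.append(code)
        # Vowels separate equal digits; H and W do not.
        if c not in "HW":
            previous = code
    return ("".join(result) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))                   # A261
```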
Strip Tags from HTML
Removes tags from HTML and returns the plain text.
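Tag stripping can be sketched with Python's `html.parser.HTMLParser`, which also turns entities such as `&amp;` back into characters. Skipping the contents of `<script>` and `<style>` is a design choice of this sketch, not documented Kaboom behaviour; `strip_tags` is an illustrative name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content; skips <script> and <style> (sketch's choice)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into chars
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html_text):
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>world</b> &amp; more</p>"))  # Hello world & more
```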