Base64 decode and encode

About Base64

Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding.

Design

The particular set of 64 characters chosen to represent the 64 place-values for the base varies between implementations. The general strategy is to choose 64 characters that are both members of a subset common to most encodings, and also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean. For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.
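As a small illustration of variants that differ only in the last two symbols, the sketch below compares Python's standard alphabet with its URL-safe alphabet. The URL-safe variant is not mentioned in the text above and is used here only because the standard library ships it. The two input bytes are chosen so that the values 62 and 63 both appear in the output.

```python
import base64

# Two bytes whose bit pattern yields the 6-bit values 62, 63 and 60.
data = bytes([0xFB, 0xFF])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

Only the characters for values 62 and 63 change; the first 62 symbols and the padding are the same in both variants.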

The earliest instances of this type of encoding were created for dialup communication between systems running the same OS — e.g., uuencode for UNIX, BinHex for the TRS-80 (later adapted for the Macintosh) — and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.

Examples

For example, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits can take 2^6 = 64 different binary values) are converted into individual numbers from left to right (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.

Text content                 M          a          n
ASCII                        77 (0x4d)  97 (0x61)  110 (0x6e)
Bit pattern (octets)         01001101   01100001   01101110
Bit pattern (6-bit groups)   010011  010110  000101  101110
Index                        19      22      5       46
Base64-encoded               T       W       F       u

As this example illustrates, Base64 encoding converts three octets into four encoded characters.
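The three-octets-to-four-characters step can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard alphabet from the index table; the helper name encode_triplet is made up for this example.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(three_bytes: bytes) -> str:
    """Encode exactly three octets into four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three bytes")
    # Join the three 8-bit values into one 24-bit integer.
    block = int.from_bytes(three_bytes, "big")
    # Slice out four 6-bit indices, most significant first.
    indices = [(block >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


print(encode_triplet(b"Man"))  # TWFu
```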

The Base64 index table:

Value Char Value Char Value Char Value Char
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 49 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F 21 V 37 l 53 1
6 G 22 W 38 m 54 2
7 H 23 X 39 n 55 3
8 I 24 Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /

When the number of bytes to encode is not divisible by three (that is, if there are only one or two bytes of input for the last 24-bit block), then the following action is performed:

Add extra bytes with value zero so there are three bytes, and perform the conversion to base64.

If there was only one significant input byte (say 'M'), only the first two base64 digits are picked (12 bits).

Text content                 M
ASCII                        77 (0x4d)  0 (0x00)   0 (0x00)
Bit pattern (octets)         01001101   00000000   00000000
Bit pattern (6-bit groups)   010011  010000  000000  000000
Index                        19      16      0       0
Base64-encoded               T       Q       =       =

If there were two significant input bytes (say 'Ma'), the first three base64 digits are picked (18 bits). '=' characters might be added to make the last block contain four base64 characters.

Text content                 M          a
ASCII                        77 (0x4d)  97 (0x61)  0 (0x00)
Bit pattern (octets)         01001101   01100001   00000000
Bit pattern (6-bit groups)   010011  010110  000100  000000
Index                        19      22      4       0
Base64-encoded               T       W       E       =

As a result, when the last input group contains only one octet, the four least significant bits of the last used 6-bit block are set to zero:

Bit pattern 0 1 0 0 0 0
Index 16
Base64-encoded Q

And when the last input group contains two octets, the two least significant bits of the last used 6-bit block are set to zero:

Bit pattern 0 0 0 1 0 0
Index 4
Base64-encoded E
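The zero-fill and padding rules described above can be put together into a small encoder. The following is a sketch rather than a reference implementation; it assumes the standard alphabet and '=' padding, and the function name encode is illustrative.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode(data: bytes) -> str:
    """Encode bytes as Base64, padding the final group with '='."""
    out = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        missing = 3 - len(group)  # 0, 1, or 2 bytes short of a full group
        # Add zero bytes so the group is 24 bits wide.
        block = int.from_bytes(group + b"\x00" * missing, "big")
        chars = [ALPHABET[(block >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Keep only the digits that carry input bits, then pad with '='.
        out.append("".join(chars[:4 - missing]) + "=" * missing)
    return "".join(out)


print(encode(b"M"), encode(b"Ma"), encode(b"Man"))  # TQ== TWE= TWFu
```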

Padding

The '==' sequence indicates that the last group contained only one byte, and '=' indicates that it contained two bytes. The example below illustrates how truncating the input changes the output padding:

Input length  Input                 Output length  Output                        Padding
20            any carnal pleasure.  28             YW55IGNhcm5hbCBwbGVhc3VyZS4=  1
19            any carnal pleasure   28             YW55IGNhcm5hbCBwbGVhc3VyZQ==  2
18            any carnal pleasur    24             YW55IGNhcm5hbCBwbGVhc3Vy      0
17            any carnal pleasu     24             YW55IGNhcm5hbCBwbGVhc3U=      1
16            any carnal pleas      24             YW55IGNhcm5hbCBwbGVhcw==      2
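The rows of this table can be reproduced with Python's standard base64 module. The sketch below trims the input one character at a time; the variable names are illustrative.

```python
import base64

text = "any carnal pleasure."

# One row per truncation: (input length, output length, output, padding count).
rows = []
for cut in range(5):
    chunk = text[:len(text) - cut].encode("ascii")
    encoded = base64.b64encode(chunk).decode("ascii")
    rows.append((len(chunk), len(encoded), encoded, encoded.count("=")))

for row in rows:
    print(row)
```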

The same characters will be encoded differently depending on their position within the three-octet group which is encoded to produce the four characters. For example:

Input Output
pleasure. cGxlYXN1cmUu
leasure. bGVhc3VyZS4=
easure. ZWFzdXJlLg==
asure. YXN1cmUu
sure. c3VyZS4=
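The position dependence is easy to observe by encoding successive suffixes of the same word. This is a minimal sketch using Python's base64 module; encode_text is a hypothetical helper name.

```python
import base64


def encode_text(text: str) -> str:
    """Base64-encode an ASCII string and return the result as text."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")


word = "pleasure."
for start in range(5):
    suffix = word[start:]
    # Dropping a leading character shifts every later character to a
    # different position in its three-octet group, so the output changes.
    print(suffix, encode_text(suffix))
```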

The ratio of output bytes to input bytes is 4:3 (about 33% overhead). Specifically, given an input of n bytes, the output will be $4 \lceil n/3 \rceil$ bytes long, including padding characters.
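The length formula can be checked against an actual encoder. The sketch below writes the ceiling with integer arithmetic; encoded_length is an illustrative name.

```python
import base64


def encoded_length(n: int) -> int:
    """Padded Base64 length for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


# Compare the formula with the standard library for small inputs.
for n in range(10):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```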

In theory, the padding character is not needed for decoding, since the number of missing bytes can be calculated from the number of Base64 digits. In some implementations, the padding character is mandatory, while for others it is not used. One case in which padding characters are required is concatenating multiple Base64 encoded files.
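To see why concatenation depends on padding, consider the pieces "M" and "an" encoded separately. With padding, every piece is a whole number of four-character groups, so each group can be decoded on its own. Without padding, the joined text "TQ" + "YW4" gives no hint of where the first piece ends. The sketch below assumes padded input; decode_concatenated is a hypothetical helper.

```python
import base64


def decode_concatenated(text: str) -> bytes:
    """Decode padded Base64 streams that were joined end to end."""
    if len(text) % 4 != 0:
        raise ValueError("padded Base64 length must be a multiple of 4")
    # With padding kept, each four-character group is self-contained.
    groups = (text[i:i + 4] for i in range(0, len(text), 4))
    return b"".join(base64.b64decode(group) for group in groups)


joined = base64.b64encode(b"M").decode("ascii") + base64.b64encode(b"an").decode("ascii")
print(joined)                       # TQ==YW4=
print(decode_concatenated(joined))  # b'Man'
```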

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single '=' indicates that the four characters will decode to only two bytes, while '==' indicates that the four characters will decode to only a single byte. For example:

Encoded                   Padding    Bytes in last group  Decoded
YW55IGNhcm5hbCBwbGVhcw==  two '='s   one                  any carnal pleas
YW55IGNhcm5hbCBwbGVhc3U=  one '='    two                  any carnal pleasu
YW55IGNhcm5hbCBwbGVhc3Vy  no '='s    three                any carnal pleasur
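The padded decoding rule can be sketched as follows. This is an illustrative decoder that assumes the standard alphabet; it does not check that padding appears only in the final group, and decode_padded is a made-up name.

```python
import string

# Standard Base64 alphabet and its reverse lookup table.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_padded(text: str) -> bytes:
    """Decode padded Base64 text, four characters at a time."""
    if len(text) % 4 != 0:
        raise ValueError("padded Base64 length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(text), 4):
        group = text[start:start + 4]
        pads = len(group) - len(group.rstrip("="))  # trailing '=' count
        body = group[:4 - pads] if pads else group
        if pads > 2 or "=" in body:
            raise ValueError("invalid padding")
        # Rebuild the 24-bit block; padding positions contribute zero bits.
        block = 0
        for ch in body + "A" * pads:
            block = (block << 6) | VALUE_OF[ch]
        # Each '=' means one fewer decoded byte in this group.
        out += block.to_bytes(3, "big")[:3 - pads]
    return bytes(out)


print(decode_padded("TWFu"))  # b'Man'
```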

Decoding Base64 without padding

Without padding, after repeatedly decoding four characters to three bytes, fewer than four encoded characters may remain. In this situation only two or three characters can remain. A single remaining encoded character is not possible, because one character carries only 6 bits, which is not enough for a full byte. For example:

Last group (characters)  Encoded                   Last group (bytes)  Decoded
2                        YW55IGNhcm5hbCBwbGVhcw    1                   any carnal pleas
3                        YW55IGNhcm5hbCBwbGVhc3U   2                   any carnal pleasu
4                        YW55IGNhcm5hbCBwbGVhc3Vy  3                   any carnal pleasur
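One simple way to handle unpadded input is to work out the missing padding from the length and then hand the text to an ordinary padded decoder. The sketch below takes that approach with Python's base64 module; decode_unpadded is an illustrative name.

```python
import base64


def decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(text) % 4 == 1:
        # One leftover character holds only 6 bits, less than a byte.
        raise ValueError("a single leftover Base64 character cannot occur")
    # Restore the '=' characters a padding encoder would have emitted.
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


print(decode_unpadded("YW55IGNhcm5hbCBwbGVhcw"))
print(decode_unpadded("YW55IGNhcm5hbCBwbGVhc3U"))
print(decode_unpadded("YW55IGNhcm5hbCBwbGVhc3Vy"))
```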

source: wikipedia.org