Interface version 2.0
This is the proposed interface for the version 2.0 of this library (as of 2020-04-29).
base64_encode
base64_encode()
comes in two overloaded versions:
std::string base64_encode(std::string const& s, bool url = false);
std::string base64_encode(unsigned char const*, unsigned int len, bool url = false);
Both of these functions encode data as Base 64 and return the encoded string as a std::string
.
The parameter
url
determines if the encoded string can be used in
URLs: if
url
is set to
true
, the encoded string will contain
-
,
_
and
+
instead of
+
,
/
and
=
. (See
Wikipedia for more information).
The parameter len
is needed for the second version because the length of an unsigned char const*
is not determined if the data contains null values.
base64_encode_pem / base64_encode_mime
std::string base64_encode_pem (std::string const& s);
std::string base64_encode_mime(std::string const& s);
These two functions also encode data as Base 64. Additionally, they insert a line break after each 64th (pem) and 76th (mime) encoded characters.
base64_decode
base64_decode()
decodes an encoded string.
std::string base64_decode(std::string const& s, bool remove_linebreaks = false);
The parameter
remove_linebreaks
needs to be set to
true
if the encoded string is expected to contain
line breaks, for example because they were encoded with
base64_encode_pem()
or
base64_encode_mime()
.
Contributions
I am thankful for the following contributions to this libary.
minastaros for improving the code and adding test cases. (Version
1.01.00
)
Khiem Doan added the correct
headerfile:
<cctype>
rather than
<iostream>
, which is sufficient for
isalnum()
.
2020-04-29: it turns out, this header file is not needed anymore.
wbfloofsky provided a pull request that encouraged me to write directly into pre-allocated
std::string
buffers rather than first write into a separate buffer and then copy this buffer (
push_back()
) to the string. This change led to Version 1.03.
wbfloofsky also added the functionality to measure the time it takes to encode and decode a string in
test.cpp
. This inspired me to add
measure-time.cpp
which does that, and only that.
Francisco Ruiz added
test-google.cpp
, which is more or less the same as
test.cpp
, but apparently can be used in a Google testing framework.
Francisco Ruiz and
huangxinV587 both suggested that I use a lookup function to find the position of an encodeded character in
base64_chars
in order to improve performance. I have merged their code into one version and given the lookup function the name
pos_of_char()
.
JomaCorpFX provided the main functionality for
base64_encode_pem()
,
base64_encode_mime()
and the
url
parameter in
base64_encode()
.
kosniaz spotted a bug with url-encoded data. This bug is now fixed (2020-05-09) and resulted in release candidate
2.rc.01.
Yannic Bonenberger provided an interface (function overloads) for
std::string_view
(rather than
const std::string&
). This interface requires at least C++17. This interface is available as of release candidate
2.rc.02 (2020-05-13).
Yannic Bonenberger also notified me of a concurrency issue if the library was used in a multi-threaded environment. This issue is fixed with 2.rc.03 (2020-05-13).
Celemony (via
mynameisjohn) provided the changes to fix
implicit cast warnings (
size_t
etc.), resulting in
2.rc.04 (2020-05-31).
Pablo Martin-Gomez changed the
throw "…"
(which must be caught by
catch (const char* …)
to
throw std::runtime_error("Input is not valid base64-encoded data.")
(which can be caught by
catch (std::exception …)
), resulting in
2.rc.05 (2020-10-23).
Pablo Martin-Gomez also exchanged the cumbersome while
loop
size_t pos=0;
while ((pos = copy.find("\n", pos)) != std::string::npos) {
copy.erase(pos, 1);
}
with the much more elegant erase-remove idiom which allows to move the elements to be kept only once instead of at each erase()
call.
#include <algorithm>
…
copy.erase(std::remove(copy.begin(), copy.end(), '\n'), copy.end());
Pablo Martin-Gomez also improved the code by returning early from the function decode()
if encoded_string
is empty, resulting in 2.rc.07 (2020-10-23).
Pablo Martin-Gomez,
xanather and
Joma notified me of a
buffer overrun problem that occurs when trying to decode unpadded data (which RFC 2045 apparently explicitly allows).
base64.cpp
/*
base64.cpp and base64.h
base64 encoding and decoding with C++.
More information at
https://renenyffenegger.ch/notes/development/Base64/Encoding-and-decoding-base-64-with-cpp
Version: 2.rc.09 (release candidate)
Copyright (C) 2004-2017, 2020-2022 René Nyffenegger
This source code is provided 'as-is', without any express or implied
warranty. In no event will the author be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this source code must not be misrepresented; you must not
claim that you wrote the original source code. If you use this source code
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original source code.
3. This notice may not be removed or altered from any source distribution.
René Nyffenegger rene.nyffenegger@adp-gmbh.ch
*/
#include "base64.h"
#include <algorithm>
#include <stdexcept>
//
// Depending on the url parameter in base64_chars, one of
// two sets of base64 characters needs to be chosen.
// They differ in their last two characters.
//
static const char* base64_chars[2] = {
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
"+/",
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
"-_"};
static unsigned int pos_of_char(const unsigned char chr) {
//
// Return the position of chr within base64_encode()
//
if (chr >= 'A' && chr <= 'Z') return chr - 'A';
else if (chr >= 'a' && chr <= 'z') return chr - 'a' + ('Z' - 'A') + 1;
else if (chr >= '0' && chr <= '9') return chr - '0' + ('Z' - 'A') + ('z' - 'a') + 2;
else if (chr == '+' || chr == '-') return 62; // Be liberal with input and accept both url ('-') and non-url ('+') base 64 characters (
else if (chr == '/' || chr == '_') return 63; // Ditto for '/' and '_'
else
//
// 2020-10-23: Throw std::exception rather than const char*
//(Pablo Martin-Gomez, https://github.com/Bouska)
//
throw std::runtime_error("Input is not valid base64-encoded data.");
}
static std::string insert_linebreaks(std::string str, size_t distance) {
//
// Provided by https://github.com/JomaCorpFX, adapted by me.
//
if (!str.length()) {
return "";
}
size_t pos = distance;
while (pos < str.size()) {
str.insert(pos, "\n");
pos += distance + 1;
}
return str;
}
template <typename String, unsigned int line_length>
static std::string encode_with_line_breaks(String s) {
return insert_linebreaks(base64_encode(s, false), line_length);
}
template <typename String>
static std::string encode_pem(String s) {
return encode_with_line_breaks<String, 64>(s);
}
template <typename String>
static std::string encode_mime(String s) {
return encode_with_line_breaks<String, 76>(s);
}
template <typename String>
static std::string encode(String s, bool url) {
return base64_encode(reinterpret_cast<const unsigned char*>(s.data()), s.length(), url);
}
std::string base64_encode(unsigned char const* bytes_to_encode, size_t in_len, bool url) {
size_t len_encoded = (in_len +2) / 3 * 4;
unsigned char trailing_char = url ? '.' : '=';
//
// Choose set of base64 characters. They differ
// for the last two positions, depending on the url
// parameter.
// A bool (as is the parameter url) is guaranteed
// to evaluate to either 0 or 1 in C++ therefore,
// the correct character set is chosen by subscripting
// base64_chars with url.
//
const char* base64_chars_ = base64_chars[url];
std::string ret;
ret.reserve(len_encoded);
unsigned int pos = 0;
while (pos < in_len) {
ret.push_back(base64_chars_[(bytes_to_encode[pos + 0] & 0xfc) >> 2]);
if (pos+1 < in_len) {
ret.push_back(base64_chars_[((bytes_to_encode[pos + 0] & 0x03) << 4) + ((bytes_to_encode[pos + 1] & 0xf0) >> 4)]);
if (pos+2 < in_len) {
ret.push_back(base64_chars_[((bytes_to_encode[pos + 1] & 0x0f) << 2) + ((bytes_to_encode[pos + 2] & 0xc0) >> 6)]);
ret.push_back(base64_chars_[ bytes_to_encode[pos + 2] & 0x3f]);
}
else {
ret.push_back(base64_chars_[(bytes_to_encode[pos + 1] & 0x0f) << 2]);
ret.push_back(trailing_char);
}
}
else {
ret.push_back(base64_chars_[(bytes_to_encode[pos + 0] & 0x03) << 4]);
ret.push_back(trailing_char);
ret.push_back(trailing_char);
}
pos += 3;
}
return ret;
}
template <typename String>
static std::string decode(String const& encoded_string, bool remove_linebreaks) {
//
// decode(…) is templated so that it can be used with String = const std::string&
// or std::string_view (requires at least C++17)
//
if (encoded_string.empty()) return std::string();
if (remove_linebreaks) {
std::string copy(encoded_string);
copy.erase(std::remove(copy.begin(), copy.end(), '\n'), copy.end());
return base64_decode(copy, false);
}
size_t length_of_string = encoded_string.length();
size_t pos = 0;
//
// The approximate length (bytes) of the decoded string might be one or
// two bytes smaller, depending on the amount of trailing equal signs
// in the encoded string. This approximation is needed to reserve
// enough space in the string to be returned.
//
size_t approx_length_of_decoded_string = length_of_string / 4 * 3;
std::string ret;
ret.reserve(approx_length_of_decoded_string);
while (pos < length_of_string) {
//
// Iterate over encoded input string in chunks. The size of all
// chunks except the last one is 4 bytes.
//
// The last chunk might be padded with equal signs or dots
// in order to make it 4 bytes in size as well, but this
// is not required as per RFC 2045.
//
// All chunks except the last one produce three output bytes.
//
// The last chunk produces at least one and up to three bytes.
//
size_t pos_of_char_1 = pos_of_char(encoded_string.at(pos+1) );
//
// Emit the first output byte that is produced in each chunk:
//
ret.push_back(static_cast<std::string::value_type>( ( (pos_of_char(encoded_string.at(pos+0)) ) << 2 ) + ( (pos_of_char_1 & 0x30 ) >> 4)));
if ( ( pos + 2 < length_of_string ) && // Check for data that is not padded with equal signs (which is allowed by RFC 2045)
encoded_string.at(pos+2) != '=' &&
encoded_string.at(pos+2) != '.' // accept URL-safe base 64 strings, too, so check for '.' also.
)
{
//
// Emit a chunk's second byte (which might not be produced in the last chunk).
//
unsigned int pos_of_char_2 = pos_of_char(encoded_string.at(pos+2) );
ret.push_back(static_cast<std::string::value_type>( (( pos_of_char_1 & 0x0f) << 4) + (( pos_of_char_2 & 0x3c) >> 2)));
if ( ( pos + 3 < length_of_string ) &&
encoded_string.at(pos+3) != '=' &&
encoded_string.at(pos+3) != '.'
)
{
//
// Emit a chunk's third byte (which might not be produced in the last chunk).
//
ret.push_back(static_cast<std::string::value_type>( ( (pos_of_char_2 & 0x03 ) << 6 ) + pos_of_char(encoded_string.at(pos+3)) ));
}
}
pos += 4;
}
return ret;
}
std::string base64_decode(std::string const& s, bool remove_linebreaks) {
return decode(s, remove_linebreaks);
}
std::string base64_encode(std::string const& s, bool url) {
return encode(s, url);
}
std::string base64_encode_pem (std::string const& s) {
return encode_pem(s);
}
std::string base64_encode_mime(std::string const& s) {
return encode_mime(s);
}
#if __cplusplus >= 201703L
//
// Interface with std::string_view rather than const std::string&
// Requires C++17
// Provided by Yannic Bonenberger (https://github.com/Yannic)
//
std::string base64_encode(std::string_view s, bool url) {
return encode(s, url);
}
std::string base64_encode_pem(std::string_view s) {
return encode_pem(s);
}
std::string base64_encode_mime(std::string_view s) {
return encode_mime(s);
}
std::string base64_decode(std::string_view s, bool remove_linebreaks) {
return decode(s, remove_linebreaks);
}
#endif // __cplusplus >= 201703L
base64.h
//
// base64 encoding and decoding with C++.
// Version: 2.rc.09 (release candidate)
//
#ifndef BASE64_H_C0CE2A47_D10E_42C9_A27C_C883944E704A
#define BASE64_H_C0CE2A47_D10E_42C9_A27C_C883944E704A
#include <string>
#if __cplusplus >= 201703L
#include <string_view>
#endif // __cplusplus >= 201703L
std::string base64_encode (std::string const& s, bool url = false);
std::string base64_encode_pem (std::string const& s);
std::string base64_encode_mime(std::string const& s);
std::string base64_decode(std::string const& s, bool remove_linebreaks = false);
std::string base64_encode(unsigned char const*, size_t len, bool url = false);
#if __cplusplus >= 201703L
//
// Interface with std::string_view rather than const std::string&
// Requires C++17
// Provided by Yannic Bonenberger (https://github.com/Yannic)
//
std::string base64_encode (std::string_view s, bool url = false);
std::string base64_encode_pem (std::string_view s);
std::string base64_encode_mime(std::string_view s);
std::string base64_decode(std::string_view s, bool remove_linebreaks = false);
#endif // __cplusplus >= 201703L
#endif /* BASE64_H_C0CE2A47_D10E_42C9_A27C_C883944E704A */