Lecture 2: Standard Library: Vectors, Iterators, Sets, and Maps

1 Standard Template Library (STL)

STL is a misnomer. The STL was a library that informed the design of the C++ standard library, so the name refers to something very similar to some parts of the C++ standard library; using "STL" for the whole standard library is incorrect. The parts in question are implemented using templates, hence the name. Specifically, the STL has four components (the current C++ standard library has grown by leaps and bounds beyond this):

Algorithms
It provides standard algorithms optimized for general use cases, so you don't have to write a good algorithm for sorting [1], binary search, running sums, finding the minimum, etc. every time you need one.
Containers
The standard library comes with very flexible containers (vectors, deques, linked lists, maps, sets, priority queues) that support certain operations cheaply (keeping and traversing elements in sorted order, finding the smallest element, finding a given element).
Iterators
Iterators allow traversing data structures or a range of items (the newest version of C++ also has ranges for a similar purpose, but we will not get into it). With iterators, we can generalize some algorithms: For example, a copying algorithm can use iterators, so the same implementation works if copying from a vector to a set, or vice versa.
Functions
The standard library comes with some abstractions for passing functions around like normal objects, so we can write a generic algorithm that uses an unknown function (for example, abstracting over the comparison function in a sorting algorithm). We will look at function objects in later lectures.
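
As a rough illustration of how these pieces fit together, the sketch below copies a vector into a set through iterators and sorts with a caller-chosen comparison. The names toSet, largerFirst, sortedLargestFirst, and stlPiecesDemo are made up for this example; std::copy, std::inserter, and std::sort are standard library functions.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Copy a vector into a set using only iterators. std::copy does not
// know or care which kinds of containers it reads from or writes to.
std::set<int> toSet(const std::vector<int>& v) {
  std::set<int> s;
  std::copy(v.begin(), v.end(), std::inserter(s, s.end()));
  return s;
}

// A plain function used as the comparison: a goes before b if a > b.
bool largerFirst(int a, int b) { return a > b; }

// Sort a copy of v using the ordering above instead of the default <.
std::vector<int> sortedLargestFirst(std::vector<int> v) {
  std::sort(v.begin(), v.end(), largerFirst);
  return v;
}

// Usage (hypothetical demo function): expected results are checked with assert.
void stlPiecesDemo() {
  std::vector<int> v{3, 1, 3, 2};
  assert((toSet(v) == std::set<int>{1, 2, 3}));  // the duplicate 3 is dropped
  assert((sortedLargestFirst(v) == std::vector<int>{3, 3, 2, 1}));
}
```

Calling stlPiecesDemo() from main runs the checks. The same std::copy call would work in the other direction (set to vector) with a different output iterator.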

2 The Standard Library

It is rare for a programming language to provide every tool needed to accomplish a task "out of the box". Programmers usually combine the provided building blocks to create something specific to their needs, and they can also create new building blocks for others to build upon.

The most basic of these building blocks usually come standard with the language, as its standard library.

  • You don't have to download separate components.
  • The libraries should be cross-platform compatible (you shouldn't have to write different code for Windows, Mac, or Linux).

Implementing and maintaining libraries comes with a cost.

  • Python has a dedicated organization called the Python Software Foundation.
  • Java was developed and maintained by Sun Microsystems, which has been bought by Oracle. Its reference implementation is open-source: OpenJDK.
  • Similarly, Rust was originally backed by Mozilla, but it is a community-developed open-source project now.
  • C++ isn't really "owned" by anyone. The standard is maintained by an ISO committee (WG21).

Note that in all of these languages, the language evolves through a series of community improvement proposals that the core developers (or the standards committee members) discuss and refine.

C++ isn't the product of a single large organization; its development is organized much like an open-source project.

  • http://isocpp.org/, http://www.open-std.org/.
  • Individual C++ compilers are then implemented based on the specifications.
    • g++ / clang++, typically on Unix-like systems.
    • MSVC on Windows.
  • These compilers also ship with their implementation of the standard library, although you can mix-and-match the standard libraries and compilers to an extent.
  • … there isn't any guarantee that these behave EXACTLY the same, but for the most part they do, because they follow the same specification. For example, they may implement different sorting algorithms for std::sort, but whichever algorithm is used is guaranteed to run in \( O(n \log n) \) time. In practice, the standard libraries implement roughly the same algorithm with slightly different performance tuning.
  • In this class, we'll assume we're using the C++17 specification unless stated otherwise.

3 Standard Library Containers

There are many implementations of containers.

  • Containers are data abstractions where you can store a sequence of elements.
  • Iterators are a common part of these containers, which allow you to "iterate" through the components.
    • They also act as handles to specific objects (rather than only iterating over data). We will talk about this when we discuss maps.
  • Depending on the container, you can even read from/write to these elements using iterators.

3.1 std::vector

  • A vector is a sequence of objects that are conceptually stored one after the other.
  • Vectors are implemented with templates, so each vector stores elements of one type, which you choose when you declare it.

# Makefile
CXX=g++
# CXXFLAGS is also used by make's built-in rule that compiles main.cpp into main.o
CXXFLAGS=-std=c++17

main: main.o
	${CXX} ${CXXFLAGS} -o main main.o

clean:
	rm -f *.o main

// main.cpp
#include <vector>

int main() {
  std::vector<int> v; // a vector containing int elements
  return 0;
}

Under the hood, vectors are implemented using arrays and behave similarly to arrays.

  • Vectors can be indexed from 0 to \( size - 1 \), yet they are different from arrays.
  • Vectors are dynamically resizable.
  • A vector has a size associated with it.
  • Arrays do not keep track of their size, so the programmer must be aware of it.
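
To make the contrast concrete, here is a minimal sketch; the function names sumArray, sumVector, and arrayVsVectorDemo are invented for this illustration. The array version needs the length passed in separately, while the vector version asks the vector for its size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A raw array passed to a function is just a pointer, so the caller
// has to pass the number of elements separately.
int sumArray(const int* a, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; i++) {
    total += a[i];
  }
  return total;
}

// A vector carries its size with it.
int sumVector(const std::vector<int>& v) {
  int total = 0;
  for (std::size_t i = 0; i < v.size(); i++) {
    total += v[i];
  }
  return total;
}

// Usage (hypothetical demo function).
void arrayVsVectorDemo() {
  int a[3] = {1, 2, 3};
  std::vector<int> v{1, 2, 3};
  assert(sumArray(a, 3) == 6);
  assert(sumVector(v) == 6);
  v.push_back(4);  // the vector grows; the array cannot
  assert(v.size() == 4);
  assert(sumVector(v) == 10);
}
```

Calling arrayVsVectorDemo() from main runs the checks.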

3.2 Adding to a vector example

// main.cpp
#include <cstddef>
#include <iostream>
#include <vector>

template <class T>
void printVector(const std::vector<T> &v) {
  // v.size() is unsigned, so the index is a std::size_t rather than an int
  for (std::size_t i = 0; i < v.size(); i++) {
    std::cout << "v[" << i << "] = " << v[i] << std::endl;
  }
  // range-based for loop example (no index is available in this form)
  // for (const T& x : v) {
  //   std::cout << x << std::endl;
  // }
}

int main() {
  std::vector<int> v;
  for (int i = 0; i < 5; i++) // it could be any reasonable size
    v.push_back(i);

  printVector(v);
  return 0;
}

  • Like arrays, if you index a vector element that is out of range with [], the behavior is undefined: you will probably get junk data or crash your program.
  • You can also use the .at() function to access an element. It is the better option, especially for containers other than std::vector. We will talk about why later.
  • Unlike operator [], if .at() references an element that the vector doesn't contain, an exception is thrown (more on exceptions later).

3.3 Example

std::cout << v.at(4) << std::endl; // OK: the last element
std::cout << v.at(5) << std::endl; // EXCEPTION THROWN (std::out_of_range)
std::cout << v[5] << std::endl;    // undefined behavior: JUNK or a crash
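
Exceptions are covered later, but as a preview, the following sketch shows one way the exception from .at() could be handled. The helper getOr and the demo function are hypothetical names for this example; std::out_of_range is the exception type that .at() throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns v.at(i) if i is a valid index, and fallback otherwise.
int getOr(const std::vector<int>& v, std::size_t i, int fallback) {
  try {
    return v.at(i);  // bounds-checked access
  } catch (const std::out_of_range&) {
    return fallback;  // at() threw because i >= v.size()
  }
}

// Usage (hypothetical demo function).
void atDemo() {
  std::vector<int> v{0, 1, 2, 3, 4};
  assert(getOr(v, 4, -1) == 4);   // index 4 exists
  assert(getOr(v, 5, -1) == -1);  // index 5 does not
}
```

Calling atDemo() from main runs the checks. There is no comparable way to recover from an out-of-range v[5], because nothing is thrown.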

Other supported operations are:

front()
returns the first element
back()
returns the last element
pop_back()
removes the last element

std::cout << "v.front() = " << v.front() << std::endl;
std::cout << "v.back() = " << v.back() << std::endl;
v.pop_back();
printVector(v);

3.4 Vector Initialization

push_back() is one way to create elements in a vector, but it's not the only way.

  • You can declare a vector with an initial size.
  • You can also initialize a vector with a size and a default value.

3.5 Example:

std::vector<int> v1(100);    // initializes vector with 100 elements, each 0
std::vector<int> v2(100, 1); // initializes vector with 100 elements, each 1
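
The sketch below checks what the two forms above produce, using smaller sizes. It also shows a third form that these notes do not otherwise cover: curly braces list the elements themselves, which is easy to confuse with the (size, value) form. The function name vectorInitDemo is made up for this example.

```cpp
#include <cassert>
#include <vector>

// Usage (hypothetical demo function).
void vectorInitDemo() {
  std::vector<int> a(3);     // 3 elements, each 0
  std::vector<int> b(3, 7);  // 3 elements, each 7
  std::vector<int> c{3, 7};  // braces list the elements: 3 and 7
  assert(a.size() == 3 && a[0] == 0 && a[2] == 0);
  assert(b.size() == 3 && b[0] == 7 && b[2] == 7);
  assert(c.size() == 2 && c[0] == 3 && c[1] == 7);
}
```

Calling vectorInitDemo() from main runs the checks.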

3.6 Example: allocating the vector object itself on the heap

The following code snippet allocates both the vector object itself and its contents on the heap. This is usually not what we want: std::vector already keeps its elements on the heap, so a vector declared as a local variable takes up only a small amount of stack space.

std::vector<int>* v = new std::vector<int>(10, 1); // vector with 10 elements = 1
std::cout << v->size() << std::endl;
printVector(*v);
delete v; // anything created with new has to be released with delete

So, consider the following two vectors:

std::vector<int>* a = new std::vector<int>(1000, 1); // vector with 1000 elements = 1
std::vector<int> b(1000, 1); // vector with 1000 elements = 1

Here is where a, *a, and b live in memory:


 Stack                        Heap
┌────────────────────────┐   ┌────────────────────────┐
│                        │   │                        │
│  ┌─┐                   │   │    ┌───────────┐       │
│  │a├───────────────────┼───┼───►│*a         │       │
│  └─┘                   │   │    │           │       │
│                        │   │    │size: 1000 │       │
│  ┌────────────────┐    │   │    │data:  ────┼────┐  │
│  │b               │    │   │    │           │    │  │
│  │                │    │   │    └───────────┘    │  │
│  │size: 1000      │    │   │                     │  │
│  │data:  ──────┐  │    │   │                     │  │
│  │             │  │    │   │ ┌───────────────────▼┐ │
│  └─────────────┼──┘    │   │ │actual data of *a   │ │
│                │       │   │ │(this is huge)      │ │
│                │       │   │ └────────────────────┘ │
│                │       │   │                        │
│                │       │   │                        │
│                │       │   │  ┌──────────────────┐  │
│                └───────┼───┼─►│actual data of b  │  │
│                        │   │  │(also huge)       │  │
│                        │   │  └──────────────────┘  │
│                        │   │                        │
└────────────────────────┘   └────────────────────────┘

b stores only a pointer to the actual array, and the size on the stack. The 1000-element array is stored on the heap in both cases.
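
One way to see this is to compare the size of the vector object with the size of its elements. This is only a sketch: the helper names are invented, and the final check assumes a mainstream implementation in which the vector object is a few pointers' worth of bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes taken by the vector object itself (the part that sits on the
// stack for a local variable such as b).
std::size_t objectBytes(const std::vector<int>& v) { return sizeof(v); }

// Bytes taken by the elements (the part that sits on the heap).
std::size_t elementBytes(const std::vector<int>& v) {
  return v.size() * sizeof(int);
}

// Usage (hypothetical demo function).
void vectorFootprintDemo() {
  std::vector<int> empty;
  std::vector<int> b(1000, 1);
  // The object has the same size no matter how many elements it holds.
  assert(objectBytes(empty) == objectBytes(b));
  assert(elementBytes(b) == 1000 * sizeof(int));
  // Assumption: the bookkeeping is much smaller than 1000 ints.
  assert(objectBytes(b) < elementBytes(b));
}
```

Calling vectorFootprintDemo() from main runs the checks.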

4 Iterators

  • An iterator is an abstraction for a position in a collection of objects.
  • Container classes in the C++ standard library support iterators.
  • It's common to think of an iterator as a pointer to an element's position.
  • Technically it's not a pointer, although it most likely uses a pointer in its implementation.
  • Even though every container type supports iterators, an iterator can only be used with the container it came from.

4.1 Example

std::vector<std::string> v2;

v2.push_back("Hello.");
v2.push_back("My");
v2.push_back("name");
v2.push_back("is");
v2.push_back("Batman");

for (std::vector<std::string>::iterator i = v2.begin(); i < v2.end(); i++) {
    std::cout << *i << " "; // std::string value
    std::cout << i->size() << std::endl; // prints the size of the strings
}

In the above example, we've seen vector functions that deal specifically with iterators.

begin()
returns an iterator that points to the first element
end()
returns an iterator that points one past the last element (it marks the end and must not be dereferenced)
++
increments the iterator to the next element
<
compares the positions of two iterators (available for vector iterators; != works with the iterators of every container)
*
dereferences an iterator to get the object
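
Because these operations look the same for every container, a function can be written once against iterators and reused. The sketch below uses a hypothetical helper, totalLength, that relies only on !=, ++, and -> and is applied to both a vector and a set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Adds up the lengths of all strings between first and last.
// Works with any iterators that support !=, ++ and ->.
template <class Iter>
std::size_t totalLength(Iter first, Iter last) {
  std::size_t total = 0;
  for (Iter i = first; i != last; ++i) {
    total += i->size();
  }
  return total;
}

// Usage (hypothetical demo function).
void iteratorDemo() {
  std::vector<std::string> v{"My", "name", "is", "Batman"};
  std::set<std::string> s(v.begin(), v.end());
  assert(totalLength(v.begin(), v.end()) == 14);  // 2 + 4 + 2 + 6
  assert(totalLength(s.begin(), s.end()) == 14);
  assert(totalLength(v.begin(), v.begin()) == 0);  // empty range
}
```

Calling iteratorDemo() from main runs the checks. The loop compares with != rather than < because set iterators do not support <.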

4.2 Example (Showing different ways to index elements using iterators):

std::vector<std::string>::iterator i = v2.begin();
std::cout << v2[4] << std::endl;      // Batman
std::cout << i[4] << std::endl;       // Batman
std::cout << *(i + 4) << std::endl;   // Batman

To erase items from a vector, use the erase method, which takes iterators that say which elements to remove.

4.3 Example of erasing elements

// Removing the element at index 2 of the vector
v2.erase(v2.begin() + 2); // remove "name"
printVector(v2);

// -- separate example --

// Removing the elements at indices 1 and 2, i.e. the range [1,3)
v2.erase(v2.begin() + 1, v2.begin() + 3);
printVector(v2);
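
The two calls above, applied separately to the original five-word vector, can be checked with a small sketch. The alias Words and the helpers eraseAt and eraseRange are names made up for this example; they take the vector by value so the original is left untouched.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Words = std::vector<std::string>;

// Remove the element at position index.
Words eraseAt(Words v, int index) {
  assert(index >= 0 && index < static_cast<int>(v.size()));
  v.erase(v.begin() + index);
  return v;
}

// Remove positions first, first + 1, ..., last - 1, i.e. [first, last).
Words eraseRange(Words v, int first, int last) {
  assert(0 <= first && first <= last && last <= static_cast<int>(v.size()));
  v.erase(v.begin() + first, v.begin() + last);
  return v;
}

// Usage (hypothetical demo function).
void eraseDemo() {
  Words v{"Hello.", "My", "name", "is", "Batman"};
  assert((eraseAt(v, 2) == Words{"Hello.", "My", "is", "Batman"}));
  assert((eraseRange(v, 1, 3) == Words{"Hello.", "is", "Batman"}));
  assert(v.size() == 5);  // v itself was passed by value and is unchanged
}
```

Calling eraseDemo() from main runs the checks.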

5 Sets

  • A set is a collection of unique values containing no duplicates.
  • Sets support iterators
    • Items in a set are in sorted order when iterating through them.
    • std::set is implemented using a variant of the binary search trees (BSTs) you learned in 24. The trees are self-balancing, so insertion, lookup, and removal take \( O(\log n) \) time.

5.1 Example

#include <iostream>
#include <set>
#include <string>

int main() {
  std::set<std::string> s;
  s.insert("Case");
  s.insert("Molly");
  s.insert("Armitage");
  s.insert("Case"); // duplicate (only stored once)
  s.insert("Wintermute");
  // print out the contents
  //
  // question: why is there a `const_` below?
  for (std::set<std::string>::const_iterator i = s.begin(); i != s.end(); i++) {
    std::cout << *i << std::endl;
  }

  // We can use `auto` to let the compiler infer the type of i. This
  // is useful when dealing with long type names. This is equivalent to above.
  for (auto i = s.begin(); i != s.end(); i++) {
    std::cout << *i << std::endl;
  }
}

Note: We will use std::set in the examples for now. We will talk about another option that usually performs better when we talk about hashing.

Moreover, the design of the standard library's sets is constrained by the need to support some rarely used operations. Other C++ libraries such as Abseil provide faster containers that are almost always drop-in replacements.

6 Finding an element in a set

  • find() returns an iterator to the item in a set if it exists.
    • Otherwise, find() returns set.end(). This is a special iterator marking the end of the set (similar to other containers). Such special marker values are also called sentinel values.
  • count() is another alternative. It returns the number of times an element appears in a set, so it is 1 if the element is in the set and 0 otherwise [2].

6.1 Example

if (s.find("Case") != s.end()) {
    std::cout << "Case exists!" << std::endl; // prints this
} else {
    std::cout << "Case does not exist" << std::endl;
}

if (s.find("Neuromancer") != s.end()) {
    std::cout << "Neuromancer exists!" << std::endl;
} else {
    std::cout << "Neuromancer does not exist" << std::endl;   // prints this
}
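
The count() alternative, and the way it differs for a multiset, can be sketched as follows. The helper names contains and occurrences are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// For a std::set, count() is either 0 or 1.
bool contains(const std::set<std::string>& s, const std::string& x) {
  return s.count(x) == 1;
}

// For a std::multiset, count() can be larger than 1.
std::size_t occurrences(const std::multiset<std::string>& m,
                        const std::string& x) {
  return m.count(x);
}

// Usage (hypothetical demo function).
void countDemo() {
  std::set<std::string> s{"Case", "Molly", "Case"};
  assert(s.size() == 2);  // the duplicate "Case" is stored once
  assert(contains(s, "Case"));
  assert(!contains(s, "Neuromancer"));

  std::multiset<std::string> m{"Case", "Molly", "Case"};
  assert(m.size() == 3);  // a multiset keeps duplicates
  assert(occurrences(m, "Case") == 2);
  assert(occurrences(m, "Neuromancer") == 0);
}
```

Calling countDemo() from main runs the checks.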

7 Maps

  • A map is an associative container containing a key / value mapping.
    • Like a set, the keys are unique.
    • Unlike a set, there is a value associated with each key.
    • std::map is also implemented using a self-balancing BST (usually red-black trees). So, their elements are also sorted in order when you traverse them.

7.1 Example

#include <iostream>
#include <map>
#include <string>
std::map<int, std::string> students; // mapping studentIDs to studentNames

// Use bracket notation for creation
students[0] = "Richert";
students[1] = "John Doe";
students[2] = "Jane Doe";
std::cout << "students[1] = " << students[1] << std::endl;
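
To illustrate the sorted traversal mentioned above, this sketch inserts keys out of order and collects them while iterating. The helper keysInOrder is a made-up name; i->first is the key of the entry the iterator refers to (and i->second its value).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collect the keys of a map in the order the iterators visit them.
std::vector<int> keysInOrder(const std::map<int, std::string>& m) {
  std::vector<int> keys;
  for (auto i = m.begin(); i != m.end(); i++) {
    keys.push_back(i->first);
  }
  return keys;
}

// Usage (hypothetical demo function).
void mapOrderDemo() {
  std::map<int, std::string> students;
  students[2] = "Jane Doe";  // inserted out of order on purpose
  students[0] = "Richert";
  students[1] = "John Doe";
  assert((keysInOrder(students) == std::vector<int>{0, 1, 2}));
}
```

Calling mapOrderDemo() from main runs the checks.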

7.2 Example using find()

  • Similar to a set, find() looks for a specific key and returns map.end() if the key does not exist.

// Check if a student id exists
if (students.find(1) == students.end()) {
    std::cout << "Can’t find id = 1" << std::endl;
} else {
    std::cout << "Found student id = 1, Name = " << students[1] << std::endl;
}

7.3 Example using string and double types

std::map<std::string, double> stateTaxes;

stateTaxes["CA"] = 0.88;
stateTaxes["NY"] = 1.65;

if (stateTaxes.find("OR") == stateTaxes.end()) {
    std::cout << "Can't find OR" << std::endl;
} else {
    std::cout << "Found state OR" << std::endl;
}

7.4 Example: insert() vs. []

  • insert() will add a key / value pair to the map.
    • If the key already exists, then .insert() will not replace the existing value.
  • [] will map a key to a specific value.
    • If the key already exists, then [] will replace the existing value.
    • If the key does not exist, [] first creates a default value for it and then assigns the new value over it.
      • This may be expensive!
    • insert_or_assign() is the better option over [] when you want to insert/update a value but don't know if it is in the map already.
  • It is usually better to use .at() over [] when reading
    • It will throw an exception rather than inserting a default value silently.
  • It is also better to use .insert() or .insert_or_assign() when inserting rather than [].
  • In short, avoid using [] with maps unless you know that the key is already in the map.
  • Why are there loads of slightly different ways to do the same thing?
    • C++ is an old language.
    • The standards committee stuck with the existing behavior of [] and .at(). Rather than changing the behavior and breaking existing programs, they introduced new functions such as .insert_or_assign().

#include <utility> // for std::pair

// ...

students.insert(std::pair<int, std::string>(2, "Flatline")); // does not replace
// you can use curly braces with the pair
students.insert(std::pair<int, std::string>{2, "Flatline"});
// it is OK to drop the name as well, the compiler can figure it out
// most of the time
students.insert({2, "Chris Gaucho"});
students[2] = "Chris Gaucho"; // replaces
std::cout << students[2] << std::endl;
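
The differences listed above can be checked with a short sketch. The alias StudentMap and the helpers insertIfAbsent, upsert, and nameOr are hypothetical names for this example; insert(), insert_or_assign() (C++17), at(), and [] are the real std::map members.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using StudentMap = std::map<int, std::string>;

// insert() keeps an existing value. The bool in the returned pair
// says whether a new entry was actually added.
bool insertIfAbsent(StudentMap& m, int id, const std::string& name) {
  return m.insert({id, name}).second;
}

// insert_or_assign() adds the entry or overwrites the existing value.
void upsert(StudentMap& m, int id, const std::string& name) {
  m.insert_or_assign(id, name);
}

// at() never modifies the map; it throws if the key is missing.
std::string nameOr(const StudentMap& m, int id, const std::string& fallback) {
  try {
    return m.at(id);
  } catch (const std::out_of_range&) {
    return fallback;
  }
}

// Usage (hypothetical demo function).
void insertVsBracketDemo() {
  StudentMap m;
  assert(insertIfAbsent(m, 2, "Jane Doe"));   // new key: added
  assert(!insertIfAbsent(m, 2, "Flatline"));  // existing key: not replaced
  assert(m.at(2) == "Jane Doe");
  upsert(m, 2, "Chris Gaucho");               // replaces
  assert(m.at(2) == "Chris Gaucho");
  assert(nameOr(m, 7, "?") == "?");           // missing key, map unchanged
  assert(m.size() == 1);
  m[7];                                       // [] silently inserts an empty name
  assert(m.size() == 2 && m.at(7).empty());
}
```

Calling insertVsBracketDemo() from main runs the checks. The last two lines show why reading with [] is risky: merely mentioning a missing key adds it to the map.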

8 Erasing using iterators

  • erase() can either erase an item in a map using an iterator location OR a specific key value.

8.1 Example

// Erasing by iterator
auto p = students.find(2); // p's type is std::map<int,
                           // std::string>::iterator. It is a
                           // mouthful.
students.erase(p); // erases the entry with key 2 (p must not be students.end())

// Erasing by key
students.erase(0); // erases "Richert"

// print out the entire map.
for (auto i = students.begin(); i != students.end(); i++) {
    std::cout << i->first << ": " << i->second << std::endl;
}
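
Both forms of erase() can be wrapped so that a missing key is handled safely, as in the sketch below. The helper names removeByKey and removeByIterator are made up; the sketch relies on erase(key) returning the number of entries removed and on find() returning end() for a missing key.

```cpp
#include <cassert>
#include <map>
#include <string>

using StudentMap = std::map<int, std::string>;

// Erase by key: erase() returns how many entries were removed (0 or 1).
bool removeByKey(StudentMap& m, int id) {
  return m.erase(id) == 1;
}

// Erase by iterator: check against end() first, because erasing the
// end() iterator is not allowed.
bool removeByIterator(StudentMap& m, int id) {
  auto p = m.find(id);
  if (p == m.end()) {
    return false;
  }
  m.erase(p);
  return true;
}

// Usage (hypothetical demo function).
void mapEraseDemo() {
  StudentMap students{{0, "Richert"}, {1, "John Doe"}, {2, "Jane Doe"}};
  assert(removeByIterator(students, 2));
  assert(!removeByIterator(students, 2));  // already gone
  assert(removeByKey(students, 0));
  assert(!removeByKey(students, 0));       // already gone
  assert(students.size() == 1 && students.begin()->first == 1);
}
```

Calling mapEraseDemo() from main runs the checks.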

Footnotes:

[1] The standard library comes with plenty of flavors of sorting algorithms.

[2] C++ also supports multisets; with a multiset you can put the same element in more than once.

Author: Mehmet Emre

The material for this class is based on Prof. Richert Wang's material for CS 32