Can I put 1 billion nodes into a linked list?Filed Under: Weekly Tuesday Dose of goodness
Dear all,
I’ve seen someone search for a term as described in the post title above. While I think it’s just a question, it’s also worth to explore what a linked list is all about.
I’ll also talk about some practical usages of linked list as well as theoretical performance of this popular data structure. Lastly, I’ll give some examples as to when a linked list can be used.
Wanna know more? Read on…
Introduction
First of all, what’s a linked list? Why can I use an array instead?
A linked list is basically a data structure algorithm that has the following characteristics:
- Index based
- Ordered
- Not automatically sorted by contents
- Can expand easily
- Has a head and tail node usually, making it easy to implement stacks and queues with very little overheads
A linked list is usually accessed by a numerical index unless implemented otherwise. However, the nature of the linked list makes any form of direct access undesirable. I’ll explain why later.
One of the key things in a linked list is that the objects added/inserted are ordered. This means, the order of insertion/addition is preserved as long as desired. That’ll allow you to create things like list boxes, logs and many other requirements that requires an ordered list.
The contents of the linked list are usually a separate entity as opposed to a simple binary tree (aka single sorted list) in which an integer can be the key as well as the object. This means that the contents are never sorted by default.
It can expand easily and efficiently by adding to the head or tail of the linked list. Arbitrary insertions are definitely possible but they’re usually an expensive operation. This will be explained later as well.
Since a linked list (singly or doubly linked) usually has a head and tail node, it’s easy to create mechanisms like stacks and queues since it’s very efficient to remove the first or last node of the linked list.
Versus Arrays
Now, we’re just going to talk about the standard native array here.
Compared to a linked list, the array has the following:
Disadvantages
- Has a fixed size (regardless of stack arrays or pointer arrays. realloc doesn’t simply increase the size)
- Objects may not always stay ordered since each node in the array has a fixed position and cannot be changed
- A proper removal usually require the objects to shift backwards. This is an expensive operation.
- Insertion can be difficult unless there’s someone to monitor the last-known index. However, even so, the last known index doesn’t reflect the lowest available index. Doing so can be an expensive operation as well.
- Multiple arrays may require multiple redundant implementations to perform the same thing for different types. That’s why in StridesLib, there’s a cArray<T> where a common source code will be generated to handle every different type of cArray<T>. The standard library’s equivalent will be std::vector<T>.
Advantages
- Accessing any nodes in an array is anytime faster than a linked list and even faster than a binary tree or hash map.
- Good for requirements that require an integer ID during insertion. That’ll make access efficient and has almost zero overhead.
- Fixed size means that the buffer is almost certainly contiguous. That means that memory is well-allocated and has no possibility of memory fragmentation.
- Simple to understand, implement and use for small programs.
- A quick (but possibly dirty) way out to certain temporary algorithms.
Big-O-Notation (aka Performance)
The performance of a data structure is not measured by how much time it takes for a single operation to complete its task. Rather, it’s the potential performance impact based on a magnitude.
Usually represented by n, the surrounding mathematical expressions will determine the amount of performance degraded as n grows.
A Big-O-notation always starts with an O and is usually followed by a pair of brackets.
For example: O(n).
Apply this notation to a linked list for example, this means that for n items in the linked list, n traversals are required to perform a certain operation with it. Example of such an operation will be arbitrary access.
A doubly-linked list may have a Big-O-Notation that looks like O(n/2). However, that’s only for clarity. Usually constants are not considered in Big-O.
Traversals are considered redundant processing and is thus expensive to have more of such logic in your application.
An iteration through a list will never be smaller than O(n) due to the minimum need to address all the nodes in the list. This is the best performance for iteration and any faster will depend on the design of the list by reducing the number of nodes in the list or adding less nodes by additional filtering, thus reducing the number of traversals needed in the first place.
So, can I put a billion nodes or not?
Sorry about my long-winded introduction. Ok back to the main question. Can you put 1 billion nodes in a linked list?
The answer is, theoretically yes but practically no. And it’s only no due to the following reasons:
- For 32-bit systems, the maximum memory available to you is 2^32 which is around 4.2 billion bytes. Judging from the amount required, it’ll either run out of memory or hits big time into the virtual memory. This doesn’t apply to 64-bit systems if you have enough RAM.
- It’s not practical to put so many nodes into a single linked list unless it’s really a single category of items. The old school style of object oriented programming where different object types can be stored in a linked list is constantly abused and mis-used. It simply doesn’t mean that you can dump every object into a single link. It means that you should category and can aggregate objects that belongs to several child classes of a parent class to be added into a single list for processing.An example would be the classic Boat, Car, Plane which are children of Vehicle class.
- You can dump all these objects in, but how are you going to access them? Using a simple direct access? That’s expensive!
While this is feasible theoretically, it’s not practical.
Performance of Linked List
Here’s the perform matrix of the linked list:
Insertion from the head or tail : O(c)
Arbitrary insertion anywhere: O(n), O(n/2) for doubly-linked
Retrieval from head or tail: O(c)
Arbitrary retrieval anywhere in the list: O(n), O(n/2) for doubly-linked
Removal from the head or tail: O(c)
Arbitrary removal from anywhere: O(n), O(n/2) for doubly-linked
Iteration cycle: O(n) - No brainer since there’re n objects in the first place
Iteration per object: O(c) - where c is a constant, which is really just a single pointer re-assignment
Conclusion
In conclusion, a linked list is still very useful even though it’s only so for certain requirements.
There’s no such thing as a best data structure even though the hash map comes close. But hash maps are expensive in per-operation constant costs due to hashing, hash-to-index calculations and finally the O(log[base 2]n) access to a binary tree for the major operations.
Big-O-Notation doesn’t just apply to data structures only. It applies to all algorithms as well. Please take note that it’s measuring performance in terms of potential performance issues. Thus this measurement is always in the negative perspective - that is - taking account of the worst case at all times.
Knowing the Big-O of every algorithm you use makes you as a programmer understand your tools of your trade well. It will also raise your level and people’s respect for your sound technical capabilities. It’s no longer just about coding alone.
Finally, I hope this post answers more than what was asked by the post’s title. It may be very long winded but it’s really meant for clarity and a more thorough understanding of data structures.
This topic is also taught in greater detail under Strides Interactive C++ Intermediate Training Course. To find out more, go to (or just click on the link) http://www.stridesdev.org/services/ for more information.
Signing off,
Jeremy
- Permalink
- Admin
- 8 Jun 2010 10:31 AM
- Comments (0)

