The Big-O-Notation - Part #1Filed Under: Weekly Tuesday Dose of goodness
- The Big-O-Notation - Part #1
- The Big-O-Notation - Part #2
- The Big-O-Notation - Part #3
Hi all,
Today we’ll talk about the Big-O-Notation. This is a topic that is commonly taught in universities, citing data structures as the unit of measurement. However, this very important topic isn’t taught to all developers and programmers.
If we are to be responsible developers, we have to understand what is Big-O-Notation. Why?
Read on…
To first understand why Big-O-notation plays such a big role in programming, we first have to understand what it is used for.
Big-O-Notation is used to measure the performance of a particular algorithm and give a notation based on its scalability.
What does that mean? This is to measure the performance deficiencies in your code as the volume of data increases in your program. Why deficiencies and not efficiencies? Well, let’s face it, the more codes we write, the more poorly it’s being written, the poorer the performance. On the other end, writing good codes and less codes doesn’t necessarily improve the performance - it merely optimizes your application that’s all.
While Big-O-Notation can be used in algorithms, let’s just focus on the unit of measure - data structures today.
As we know, in any development environment, data structures also known as collections are being used day in and day out.
Today we’re going to just talk about 3 major categories of collections or data structures.
1) Array
2) Linked List
3) Binary Tree
Array
What are arrays? This, I’m sure I don’t have to elaborate too much. Arrays are data structures of a single type with a fixed size. Whether it’s a dynamic array or stack array is immaterial. Arrays, without re-allocation, has a fixed size.
Linked List
This is probably one of the easiest or hardest data structures to understand. Why? It’s because, each node cannot be addressed directly via an index. (Don’t let the operator [] fool you). It’s also the easiest data structure to understand compared to binary trees and many other more complicated data structures.
Linked List typically do not have a fixed size unless limited programmatically by the programmer.
Binary Tree
What does binary tell you? 2-ways. In binary trees, nodes get 2 children each, including the parent node. The left and right child respectively. So how do we add things into a binary tree? First of all, we must understand that a binary tree is a Key-Value (or Key Pair Value) data structure type. Its key must always be comparable for bigger than (>), smaller than (<) and equals (==). In C++, templates can be used and programmed to use it blindly - compilers will then report compilation errors for types with missing operator overrides.
For Java, it’s simply implementing the Comparable interface.
Binary Trees typically do not have a limited size and can grow immensely huge if not controlled by code and design.
Big-O-Notation measurements
So we’ve briefly introduced the 3 data structure types. Let’s talk about what Big-O-notation will look out for.
1) Insertion (assume, next known index for arrays)
2) Retrieval (arbitrary)
3) Removal (assume shifting required for arrays)
4) Iteration
What are we looking for exactly? The worse-case performance of a data structure method.
Insertion
Arrays - O ( c ) where c is a constant time
Single-Linked List - O ( c ) where c is a constant time
Binary Tree - O (log2 n) where n is the power of 2 based on total number of elements
Why?
Arrays - Insertion is always immediate and has a constant performance time. It doesn’t change even if you have a very big array.
Single-Linked List - Insertion is always immediate, since it’s either a insertion from head or tail.
Binary Tree - Insertion is relatively fast, since each binary tree level can contain 2^n elements, assuming that the tree is auto-balanced out and it has 65000 elements, the worse case would be 16. Since 2^16 is 65536 and 2^15 is 32768.
Retrieval
Arrays - O ( c ) where c is a constant time
Single-Linked List - O ( n ) where n is constant time multiplied by number of elements in the data structure
Binary Tree - O (log2 n) where n is the power of 2 based on total number of elements
Why?
Arrays - It’s very easy to pinpoint the location of the object using simple memory location offsets. How so? Let’s say we have an array of 1000 elements. The size of each element is 12 bytes. If you want to access 224th element, index 223, and if the base location of the array is 1000 (decimal, bytes), you’ll simply need to add 12 * 223 + base address and viola! This works for any array size and the algorithm speed doesn’t slow down despite the element size.
Single-Linked List - Since each node between each other cannot be a calculated position and that the data structure only understands its head and tail node, traversal is the only way to reach the desired index.
Assuming that there’re 1,000,000 elements in the linked list, to retrieve the 500,000th node, index 499,999, the algorithm has to do a while-loop (or for-loop) to traverse to the next node starting from the head node, 500,000 times! That’s like 500,000 times slower than the equivalent of an array with a fixed size of 1,000,000.
Binary Tree - Assuming that the tree is balanced automatically, the worse speed of retrieving a node in a binary tree would be the same as the number of levels in the binary tree. The levels are increased as the binary tree grows. How do we know? Very simple, let’s take a the same example of a binary tree with 65,000 elements.
We know that there isn’t such a thing as 2^1.5 since the binary tree is a logical data structure. Therefore, let’s do a small table as a reference.
2^0 -1= 0 (empty binary tree)
2^1 -1= 1 (root node added)
2^2 -1= 3 (root node with 2 children)
2^3-1=7 (root node with 2 children with another 2 children each, 1+2+4= 7)
2^4-1=15
2^5-1=31
2^6-1=63
2^7-1=127
2^8-1=255
2^9-1=511
2^10-1=1023
2^11-1=2047
2^12-1=4095
2^13-1=8191
2^14-1=16383
2^15-1=32767
2^16-1=65535
Why is there a -1? The reason is because, binary trees always start with 1 root node and not 2 root nodes. For this reason, the maximum number of elements permissible by per level is always an odd number.
Since -1 is a constant, it’s usually removed from the overall calculation. Constants are easily tolerated by Big-O-notation since we want to search for the performance impact that is increased over time.
Looking at the reference table, we can easily see that for a binary tree with 65,000 elements, the worse case would be 16 traversals. This applies no matter where you look for in the binary tree. You may, most of the time, get performance better than 16 traversal, but it’ll never exceed 16 traversal unless your binary tree grows.
This is the reason why retrieval in Binary Trees are O (log2 n). Why must the subscript 2 be mentioned? That is to tell everyone that the logarithm is based on base 2, and not base 10. Otherwise, if you just write O (log n), then good luck to you misleading the entire crowd.
Ok, I think that’s enough for today. It can be a lot to take for those who haven’t been in contact with Big-O-Notations before.
I’ll continue with the removals and iterations in my next article.
In the 3rd part, I’ll end off with the differences between iterations and loops as well as a few myths and truths about Big-O-Notations and Data Structures.
Cheers,
Jeremy
Myth #1
It’s ok to use ANY collections, just use it and ignore what’s being implemented behind it.
Truth:
Go ahead and do that and you’ll become one of the worse developers ever.
- Permalink
- Admin
- 25 Aug 2009 9:14 AM
- Comments (2)
October 7th, 2009 at 5:15 am
Thank you for putting the time in to make this information available to aspiring programmers like myself. I did not take enough math in college to have come across this terminology, but now that I’m learning “C” and “Obj C”, it will be very helpful. I appreciate your time!
Justin Plants
October 7th, 2009 at 12:02 pm
Hi Justin,
Glad that it helped. Do feel free to ask any questions if you need any clarifications regarding the Big-O-Notation.
Regards,
Jeremy