Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Brian "Beej Jorgensen" Hall edited this page on May 29, 2020 ยท 1 revision
NOTE: pipenv is optional! We don't use it in CS. But it's a neat package manager if you get into more complex Python projects. It can be a headache of an install for some people. You can safely ignore anything about pipenv below if you don't want to mess with it.
We'll want to install Python 3 (version 3.x), which is the runtime for the language itself. The runtime is what allows you to execute Python code and files by typing python [file_or_code_to_execute]
in your terminal. You can also open up a Python REPL (Read-Eval-Print Loop) to quickly and easily mess around with Python code once you've gotten the runtime installed. If you recall how Node let's you execute and run JavaScript code locally on your machine instead of having to rely on the browser, the Python runtime pretty much let's you do the same thing but with Python code.
Additionally, we'll be talking about how to install the (optional) pipenv virtual environment manager. Think of it as the npm
of Python (though pipenv is also capable of performing a bunch of other tasks as well, most notably running your Python projects in an isolated virtual environment).
Unfortunately, we haven't found a way to get Anaconda to play nicely with pipenv. If you get them working together, please let your instructor know how you did it.
If you can run python
or python3
and see a 3.7 or later version, you're good to go:
or on some systems, Python 3 is just python
:
And optionally try pipenv
:
Otherwise, keep reading. :)
While macOS comes with Python 2, we need to get Python 3 in there as well.
If you don't have Brew installed, follow the instructions on the brew website.
Use Brew to install Python and pipenv at the Terminal prompt:
Note: Git Bash doesn't seem to cooperate if you're trying to install Python on Windows. Try another terminal like Powershell.
Recommend updating Windows to the latest version.
Python 3 is in the Windows Store and can be installed from there.
When installing the official package, be sure to check the
checkbox!!
This is what worked for Beej. YMMV.
Install Python, as per above.
Bring up a shell (cmd.exe or Powershell)
Run py -m pip
Run py -m pip install --user pipenv
At this point, you should be able to always run pipenv with py -m pipenv
, but that's a little inconvenient. Read on for making it easier.
You'll see a message like this in the pipenv install output, but with a slightly different path:
Select that path (not including any quotes around it), and copy it
Go to the Start menu, type "environment" and run the program Edit Environment Variables
In the System Properties popup, hit the Advanced
tab, then Environment Variables
On the list of environment variables, doubleclick Path
Click New
Paste that path from step 5 into the new entry slot. Make sure there aren't any quotes around it.
Click OK
, OK
, OK
.
Relaunch any shells you have open.
Now you should be able to just run pip
and pipenv
in Powershell without putting py -m
in front of it.
Pipenv official instructions
Install pipenv per these instructions
Install Python 3 with Chocolatey
Install pipenv per these instructions
If you're running Windows 10+, you might want to install the Windows Subsystem for Linux. This gives you a mini, integrated Linux system inside Windows. You then install and use Python from there.
Update Windows to latest if you haven't already.
Go to the Microsoft store and install Ubuntu.
Run Ubuntu to get a bash shell.
Make a new username and password. This is completely separate from your Windows username and password, but I made mine the same so that I wouldn't forget.
Upgrade the Ubuntu system. Run:
Give your newly-entered password if prompted.
Running python3 --version
should give you 3.6 or higher.
Run pip install pipenv
.
If you've installed VS Code, add the "Remote WSL" extension. This way you can run code
from within Ubuntu.
In the Ubuntu shell, you can run explorer.exe .
in any directory to open a Windows Explorer window in that directory.
Also in Windows Explorer, you can put \\wsl$
in the URL bar to see your Ubuntu drive. (If it doesn't show up, relaunch your Ubuntu shell.)
If you run into trouble with the above, try the following:
Open cmd.exe as an administrator and start bash with bash
Type Python -V' and 'Python3 -V
If one of these responds with Python 3.6.8
use that command from now on
If neither response is Python 3.6.8
but one is a higher version of Python, this means one of two things
If you have manually installed a higher version of Python, we recommend uninstalling it
If you have not, it is possible that Microsoft has updated WSL and you will need to adjust these instructions to accommodate
Otherwise, update Ubuntu:
sudo apt-get update
sudo apt-get upgrade
Repeat 2.1 above to determine if you should use Python
or Python3
when using Python. Note: inside the shell, you will always use Python as the command.
Make sure pip is installed for Python 3
pip --version
and pip3 --version
. One of these needs to respond with a version that has a version of Python 3 at the end of the line.
If you only have it for 2.7, you will need to install for 3 with:
sudo apt update && sudo apt upgrade
sudo apt install python3-pip
Check versions and commands again. You will likely need to use pip3
for the next step, but it's possible it may be just pip
. Use the one with the version associated with Python 3.6.8
Make sure pipenv is installed for Python 3 python3 -m pipenv --version
If not, install pipenv:
sudo apt update && sudo apt upgrade
(if you didn't just do this above)
pip3 install --user pipenv
Check the version again
Try pipenv shell
. If this fails, make sure that every reference in the error refers to Python 3.6. If not, review the above steps
If the error does refer to 3.6:
Confirm that python --version
refers to 2.7.something
Confirm that /usr/bin/python3 --version
refers to 3.6.8
pipenv --three --python=
which python3`` NOTE that there are backticks (`) around which python3
This should create the shell forcing it to use 3.6.8
Create a function that returns True
if the given string has any of the following:
Only letters and no numbers.
Only numbers and no letters.
If a string has both numbers and letters or contains characters that don't fit into any category, return False
.
Examples:
csAlphanumericRestriction("Bold") โ True
csAlphanumericRestriction("123454321") โ True
csAlphanumericRestriction("H3LL0") โ False
Notes:
Any string that contains spaces or is empty should return False
.
[execution time limit] 4 seconds (py3)
[input] string input_str
[output] boolean
During this sprint, we will introduce you to some fundamental Computer Science fundamentals. First, we will introduce you to Python (the language we will be using throughout Computer Science). Second, you will learn about Lambda's Problem-Solving Framework, which we call U.P.E.R. Third, you will learn about Big O notation and analyzing an algorithm's time and space complexity. Last, you will learn about arrays and in-place and out-of-place algorithms.
All of these topics lay down a crucial base for the other three sprints in Computer Science. You will rely on your Python skills, problem-solving abilities, ability to analyze time and space complexity, and your mental model for computer memory throughout the rest of the course.
In this module, you will begin to learn the fundamentals of the Python programming language. Additionally, you will learn about the U.P.E.R. Problem-Solving framework and best practices for asking for help. After completing this module, you will have all the basics that you need to get started using Python to solve algorithmic code challenges and deepen your understanding and skill set related to programming.
In this module, you will continue to learn the fundamentals of the Python programming language and put the U.P.E.R. Problem-Solving framework into practice. These skills are crucial as you encounter harder and harder code challenges in preparation for the technical interview process.
This module will teach about mutability vs. immutability, time complexity analysis, and space complexity analysis. These topics are incredibly crucial for optimizing the algorithms that you write and ensuring that we prepare you for the technical interview process.
In this module, you will learn about static arrays, dynamic arrays, and the difference between an in-place algorithm and an out-of-place algorithm.
Note: the following instructions assume that you are connected to the Internet and that you have both the main
and universe
package repositories enabled. All unix shell commands are assumed to be running from your home directory ($HOME). Finally, any command that begins with sudo
assums that you have administrative rights on your machine. If you do not โ please ask your system administrator about installing the software you need.
What follows are instructions for setting up an Ubuntu 16.04 (Xenial) home environment for use with this book. I use Ubuntu GNU/Linux for both development and testing of the book, so it is the only system about which I can personally answer setup and configuration questions.
In the spirit of software freedom and open collaboration, please contact me if you would like to maintain a similar appendix for your own favorite system. Iโd be more than happy to link to it or put it on the Open Book Project site, provided you agree to answer user feedback concerning it.
Thanks!Jeffrey ElknerArlington, Virginia
Ubuntu 16.04 comes with both Python 2 and Python 3 installed. Typing python
at the shell prompt still launches Python 2. Use the command python3
for Python 3.
In addition to the debian packages in the Ubuntu Package archive, we will be using Python software from the Python Package Index or PyPI. The tool for installing packages from PyPI is called pip. Since we want Python 3 packages installed which will work with the Python 3 already on our Ubuntu system, we will use the Ubuntu python3-pip debian package.
To add this package run following from the unix command prompt:
Now would also be a good time to install a few other packages you will want to have on your system:
This will install the Tkinter GUI toolkit, the pep8 Python style checker, and the bzr revision control system which we will use to grab some program examples.
Bottle is a micro web application framework written in Python. It is used in this book to introduce web application development.
To install bottle
run:
Then try:
at the python prompt to varify that it is working.
Vim can be used very effectively for Python development, but Ubuntu only comes with the vim-tiny
package installed by default, so it doesnโt support color syntax highlighting or auto-indenting.
To use Vim, do the following:
From the unix command prompt, run:
Create a file in your home directory named .vimrc
that contains the following:
When you edit a file with a .py
extension, you should now have color systax highlighting and auto indenting. Pressing the <f3>
key should run your program, and bring you back to the editor when the program completes. <f4>
runs the program with the verbose (-v
) switch set, which will be helpful when running doctests. <f8>
will run the pep8 style checker against your program source, which is useful in helping you learn to write Python programs with good styling.
To learn to use vim, run the following command at a unix command prompt:
The following creates a useful environment in your home directory for using pip3
to install packages into your home directory and for adding your own Python libraries and executable scripts:
From the command prompt in your home directory, create bin
and lib
subdirectories of your .local
directory by running the following command:
Now add a my_python
subdirectory to .local/lib
:
Add the following lines to the bottom of your .bashrc
in your home directory:
This will set your prefered editor to Vim, add your own .local/bin
directory as a place to put executable scripts, and add .local/lib/my_python
to your Python search path so modules you put there will be found by Python.
Then run:
to set these environment varialbles and prepend the .local/bin
directory to your search path (note: logging out and back in will accomplish the same result).
Lumpy is python module that generates UML diagrams. It was written by Allen B. Downey as part of his Swampy suite of Python programs written for use with his textbooks.
The version here is modified to work with Python 3 on Ubuntu 16.04. Click on lumpy.py
to download the module. Put this file in your .local/lib/my_python
directory after your $HOME environment is configured.
Lumpy is used in several of the exercises in this book to help illustrate python data structures.
On unix systems, Python scripts can be made executable using the following process:
Add this line as the first line in the script:
At the unix command prompt, type the following to make myscript.py
executable:
Move myscript.py
into your .local/bin
directory, and it will be runnable from anywhere.
Python does not have a static array data type. However, lists are built on dynamic arrays. As you will see, dynamic arrays rely on an underlying static array to work. So while you won't be creating and using this data structure directly, it is still essential to understand.
A data structure is a structure that is designed for holding information in a particular way. A static array is a data structure that is designing for storing information sequentially (in order). For example, if you were to store the English alphabet in a static array, you would expect the "B" character to right next to both the "A" character and the "C" character. Additionally, every position within the static array is labeled with an index. So, if you wanted to access the first item in the static array, you would expect that item to have an index of 0. The second item would have an index of 1. The third item would have an index of 2. This pattern continues for the entire capacity of the static array.
Lookup
To look up an item by index in an array is constant time (O(1
). If you have the specific index of an object in an array, the computations to find that item in memory are all constant time as well.
Append
Adding an item to an array is constant time (O(1)
). We always have a reference point to the last thing in a static array, so we can insert an item after the current end.
Insert
Unless you are inserting an item at the end of the list, items must be shifted over to make room for the new information you add to the static array. It's like if you had a chain of people stretched out, holding hands, in a line. The first person in the line is butted right up against a wall, and there is no room on one side of him. If someone wanted to join the end of the line, the people already in the line wouldn't have to do anything (O(1)
). However, if you wanted to join the beginning of the line, every single person would have to move over (away from the wall) (O(n)
) so that you would have room to join. If you wanted to join the line somewhere in the middle, only the people to your one side would have to shift to make room for you. In the computer, this shifting is moving information from one address in memory to another. Each move takes time.
Delete
Just like insertions, deletions are only efficient (O(1)
) when they are done at the end of the static array. If something is deleted from any other position in the array, the items have to be moved over, so there isn't any empty space left. Remember, static arrays can be a good data structure because retrieving information from a specific index is fast. It is fast because we can ensure that information is consistently stored in sequence right next to each other. That way, we can always be confident that whatever information is at index 5 is the sixth item in the array. If we left empty spaces in the middle of our static array, we would no longer ensure that this was true.
Space
The space complexity of an array is linear (O(n)
). Each item in the array will take up space in memory.
Static arrays are great to use when you need a data structure to retrieve information from a specific index efficiently. This is because, as we explained earlier, accessing any specific index in a static array involves a simple mathematical computation (starting index + (size of each item * index)). This computation is done in O(1)
time and is not affected by the static array size at all. If you need a data structure where you are likely only to append items (add them to the end of the list), a static array also works great. When you add a new item to the end of the list, nothing has to be shifted over or moved in memory, so that operation is very efficient (O(1)
).
There are situations where static arrays are not the best data structure to use for storing your information. What about if you don't know how much information you need to store? Or if the amount of information you need to store is likely to fluctuate or change frequently. In this case, a static array is not good. The reason is that when you create a static array, you have to know and declare the size of that array. That way, your computer can separate off a chunk of memory that is the exact right size for storing that static array. If you run out of room in the static array, you can't simply make it bigger; you have to create a brand new, bigger static array. You have to copy each item from the first static array into the newer, bigger one.
Another reason that static arrays are not always the best choice to use for storing information is that they are inefficient unless you are performing operations at the end of the static array. They are inefficient because if you want to insert or delete something at the beginning (or the middle of the list), all the items to the right of that index must be moved over. If you delete something, everything has to be shifted over, so there isn't an empty index in the middle of your data. If you insert something, all the items have to shift over to make room for the new item before inserting it.
You often encounter a scenario where you want to use a subset of items from an existing array. Array slicing is when you take a subset from an existing array and allocate a new array with just the items from the slice.
In Python, the syntax looks like this:
The default start index is 0, and if you leave off the end_index, the slice will capture through the end of the list.
You might be wondering, what is the time and space complexity of slicing an array? To understand the complexity, you need to know what is happening behind the scenes when you take a slice of an array. First, you are actually allocating a new list. Second, you copy all of the items in your slice from the original array into the newly allocated list. This means that you have an O(n)
time cost (for the copying) and an O(n)
space cost for the newly allocated list.
You must keep these facts in mind and account for them when using a slice in your code. It's not a free operation.
Draw out what happens to a static array when you insert an item at the beginning of the array.
Draw out what happens to a static array when you delete an item from the array's beginning.
An in-place function modifies or destroys the state of the input data when it is run. For instance, if you write a function that squares every integer in an input list, an in-place version of this function would change the data in the list that was passed in. It would not create a new list and return the new list. In-place functions are more space-efficient because they don't create new variables directly tied to the input size. However, to get that space-efficiency, you have to risk that the function's user may end up changing state to the input accidentally.
Imagine a scenario where you have an antique map that you are using to navigate on a hike. You end up needing directions, and when you come across another hiker, you ask them for help. The person helping you has two options. They can take your antique map, use a pen, and mark it up with their notations that will help you navigate. However, you most likely didn't want those annotations to be on your map forever. The other option would be to find another piece of paper and have the person helping you write out their annotations on that. This way, your original antique map doesn't have to be modified. However, now you have two maps that you have to carry around on your hike.
In contrast to in-place functions, out-of-place functions don't modify or destroy the input state when they are run. Any changes done to the input are done to a copy of the input, not the original that was passed in. This is why they are less space-efficient. If you have a list of 1,000,000 items that you want to square, you first have to make a copy of that list. Now, you have two lists of 1,000,000 items. However, you avoid any side-effects that might be unintended.
In Python, some function arguments are passed in by their actual value, and some are passed in as a reference to the object in memory. Primitive values like integers, floats, and strings are passed in by their actual value. So, if you call a function and pass in the integer 2
when you reference that value by the named parameter of the function, you can't change 2
in memory. However, non-primitive objects like lists or dictionaries are passed in as references to that object in memory. So, if you call a function and pass in the dictionary {"name": "Matt"}
when you reference that dictionary using the named parameter, you are changing the original object that was passed in. For objects that are passed in by reference, they must be copied to a new variable before they are modified if you want to avoid side effects.
It would be best if you always defaulted to using an out-of-place function. This is a safer default to avoid unintended side-effects in your program. However, there are scenarios that you might encounter where you need to be extremely space-efficient. In that case, you might have to use an in-place function to work within the particular space-constraints you've been given.
Here is an example of a function that triples each number in an input list. This function does this in-place:
Now, since this is an in-place function, watch what happens when we use it:
my_list
was modified when I called the function, and the function only returned the default return value of None
.
Let's now write a similar function, but this time we will do it out-of-place:
Look what happens when we use this function:
Notice how we had to store the returned list in a new variable. Also, notice that it didn't modify the list that we passed in when we called the function.
In your own words, describe the difference between an in-place algorithm and an out-of-place algorithm.
In your own words, explain when it is an excellent choice to use an in-place algorithm.
Remember how we said you had to know how much information you were going to store when you created a static array? Well, with a dynamic array, you don't have to know. You don't have to declare a size when you instantiate a dynamic array. That makes it better in scenarios where the amount of information you need to store is unknown or is likely to fluctuate.
Lookup
To look up an item by index in an array is constant time (O(1
). If you have the specific index of an item in an array, the computations to find that item in memory are all constant time as well.
Append
Adding an item to an array is constant time (O(1)
) in the average case. However, in the worst case, the cost is O(n)
(this will be explained in more detail below).
Insert
In the worst case, inserting an item is linear time (O(n)
). When you insert into an array, all the items โ starting at the index we are inserting into โ have to be shifted one index. These items have to be "moved over" to make room for the new item being inserted. The worst-case scenario is inserting at the 0th index, and every item in the array has to shift over.
Delete
In the worst case, deleting an item is linear time (O(n)
). For any item you delete (unless it is the last item), all of the items after that index have to be shifted over to fill the now blank spot in the array. Remember, arrays store data in sequential order, so if we delete an item, we cannot just leave that space blank. If we left the space blank, it would ruin the quick lookup time. To have a fast lookup time, we need to be able to rely on the distance from the start of the array to whatever index we are trying to access.
Space
The space complexity of an array is linear (O(n)
). Each item in the array will take up space in memory.
Again, probably the dynamic array's biggest strength is not having to know or worry about the size of the data structure. It can grow to accommodate your data as needed. And, you don't have to manage this growth; the data structure itself grows when necessary. Dynamic arrays also have some of the same strengths as a static array. They also have efficient lookups (O(1)
) when you have a specific index that you want to retrieve from.
The main weakness of the dynamic array is related to its strength. To not have to worry about or manage the array's size, when the array runs out of room, it has to grow to accommodate more items. So, let's say your dynamic array is currently set up to store ten items. If it's full and you try to add an 11th item, the data structure can't just assume that there is a spot available right after the 10th item. It actually creates a new, bigger array and then copies all of the first ten items into the new array, and finally, it adds the 11th item. We will talk a bit more about how this works below. Additionally, dynamic arrays have the same weaknesses as static arrays, slow insertions and deletions (O(n)
).
Underneath the hood of a dynamic array is a static array. When you create a dynamic array, it is a static array that keeps track of the starting index, the index of the last item that it stores, and the index for the last slot in its capacity. This brings up an important point. An array has a size and a capacity. An array's size is how many items it is storing at the moment. Its capacity is how many items it could store before it runs out of room.
So, let's say that your dynamic array instantiates with an underlying static array with a capacity of 10 and a size of 0 when you create it. Then, you add ten items to the array. Now, it has a capacity of 10 and a size of 10. If you now go to append an 11th item to the array, you've run out of capacity. Here is where the dynamic of the dynamic array comes into play. The data structure will create a new underlying static array with a capacity twice the size of the original underlying static array. It will then copy the ten original items into the new array and finally add the 11th item. The cost of copying the original items into the new array is O(n)
. So, when we say that, in the worst-case, an append on a dynamic array has a time-complexity of O(n)
, this is why. However, all the other appends still have a time-complexity of O(1)
. So, in the average case append, the time-complexity is still efficient. Also, consider that as the array's capacity keeps doubling, the doublings will occur less and less frequently.
What type in Python is a dynamic array?
In your own words, explain why the worst-case time cost of appending to a dynamic array is O(n)
.
What is the difference between the size of a dynamic array and the capacity of a dynamic array?
This module project requires you to answer some multiple-choice questions related to the module's objectives. Additionally, you must continue developing your problem-solving skills by completing coding challenges related to its content.
Website Version
Curriculum
Utilities
practice
Resources
quick-reference
Python-Docs
MISC
Interview Prep
Installations Setup & Env
Aux-Exploration
Windows machines usually do not ship with Python installed. Installing on Windows is pretty simple.
Download the latest Python 3 Installer from python.org (Links to an external site.) (make sure you pay attention to 32-bit vs. 64-bit and select the right one for your machine).
Run the installer and make sure you check the box that says "Add Python 3.x to PATH" to ensure that you place the interpreter in your execution path.
Most likely, your Linux distribution already has Python installed. However, it is likely to be Python 2 and not Python 3.
You can determine what version you have by opening a terminal and typing python --version
. If the version shown is Python 2.x.x
, then you want to install the latest version of Python 3.
The procedure for installing the latest version of Python depends on which distribution of Linux you are running.
Use this article (Links to an external site.) to find instructions specific to your Linux distribution.
Current versions of macOS include a version of Python 2, but you want to be using Python 3.
The best way to install Python 3 on macOS is to use the Homebrew package manager.
Install Homebrew
Go to http://brew.sh/ (Links to an external site.) and select the Homebrew bootstrap code under "Install Homebrew" and copy the complete command to your clipboard.
Open a terminal window, paste the Homebrew bootstrap code, and hit "Enter."
It may take some time to install Homebrew, so you need to wait for that process to complete before moving on.
After Homebrew has finished its installation process, you then need to install Python.
Install Python
Open a terminal and run the following command brew install python3
. This command will download and install the latest version of Python.
Ensure that everything was installed correctly by opening a terminal window and running the command pip3
.
If you see help text from Python's "pip" package manager, you have a working Python installation.
Here are a few websites that give you online access to the Python interpreter:
Before you do anything else, search for a solution to your problem on your own. One thing you should start doing is keeping track of all your research when solving a problem. One easy way to do this is to have a browser window represent a specific search for a solution, and each open tab represents an attempt at solving it. Keeping track of your research is vital because it's helpful to provide examples of similar questions or similar problems and explain why those didn't answer your specific problem or question. It also helps the person answering your question avoid pointing you toward resources you've already explored, and lets them know that you've already put in the work.
The first thing you do when you ask a question๏ฟผ๏ฟผ๏ฟผ๏ฟผ is to introduce the problem. The first paragraph of your written question should serve as an executive summary of the problem. All the following paragraphs should fill in the details of the problem.
An important thing to include in your problem introduction is a precise explanation of how you encountered the problem. Write about the difficulties that kept you from solving it. Describe what you already tried and include the results of the research you've done.
You should also provide as much detail about the context as possible. For instance, include the language version, the platform version, the operating system, the database type, specific IDE, and any web server information. You should also include your particular constraints. For example, you may not be allowed to use feature A or B that would provide an obvious solution. If you have an odd constraint, it may also help explain why you have that constraint.
One thing to remember is that not all questions benefit from including code. However, if you include code, definitely do not just copy in your entire program! By having irrelevant details, you make your question much harder to answer and decrease the chances of someone helping you.
Here are some guidelines for when to include code in your question.
Include just enough code to allow others to reproduce your specific problem.๏ฟผ๏ฟผ One way to do this is to restart from scratch. Do not include snippets of your entire program. Instead, create a new program, but only add what's necessary to recreate the problem.
If you aren't exactly sure where the problem code is, one way to find it is by removing code chunks one at a time until the problem disappears โ then add back the last part. This way, you can deduce that the last piece of code you added back is likely the source of your problem.
Be careful not to remove too much code, either. Keep your question brief, but maintain enough context for clarity.
Make sure you include all the portions of the code needed to reproduce the problem. It would be best if you assumed that the person who is answering your question would not write any code to reproduce your issue. Again, remember, do not use images of codeโthose trying to help you need direct access to the code you include in your question.
When you include your code, it's also important to tell the reader exactly what you expect the behavior to be. Be sure to show the reader the exact wording of the error message you encountered (if there was one). It's also crucial to double-check that your included example reproduces the problem.
One other thing you can do is create a live example on a site like sqlfiddle.com or jsbin.com. If you do, make sure you also include a copy of your code in your question. Not everyone will utilize the link to the live example.
And to reiterate, do not post images of any code, data, or error messagesโreserve images for things like rendering bugsโthings that are impossible to describe accurately with just text.
Don't send a question you haven't proofread. When you post your question, you should have already read and reread it, taking care to follow all the best practices and making sure your question makes sense. It would be best if you imagined that you're coming to your question fresh, with no other context but the question itself. You want to make your question as easy for someone to answer as possible. Remember, the reader is likely choosing between several questions they could answer. You want your question to stand out as something concise and approachable. Don't forget to double-check your spelling, grammar, and formatting. Keep it as straightforward as you can; you're not writing a novel.
As feedback and responses to your question begin coming in, respond as quickly as possible. You'll likely receive clarifying questions, and your readers need that clarification to help you.๏ฟผ๏ฟผ๏ฟผ๏ฟผ
Now let's look at an example of a question posted to Stack Overflow and analyze it to see if it follows the best practices outlined above.
The question (Links to an external site.):
The first thing to notice is that the post has a short but descriptive title that adequately summarizes the question.
Accessing the index in 'for' loops?
Next, did the questioner provide any additional context or proof of the research they've done so far? It doesn't look like it. They could improve the question by including what they tried and the resources they explored.
The questioner did an excellent job of introducing the question and including code that shows what they are trying to do. In this case, they did not need to include experience vs. expected behavior; they just needed to have the expected behavior. By clearly stating what the desired result was, it helped the person answering to respond appropriately.
The code they included is a minimal and complete example, allowing someone to reproduce the problem quickly. The questioner left out irrelevant details and code that would've distracted from the primary question. They also included an example of what the desired output would be, which is helpful.
It appears the questioner proofread their question beforehand as it does not contain any glaring spelling, grammar, or formatting problems. However, we could critique this example for including a redundant sentence at the end. Instead of including that sentence, they might have rephrased the first sentence of the question to be more precise.
Choose a real-world example from a recent problem/challenge. Use the guidelines and process outlined above to ask for help in your cohort-specific help channel.
Identify an unanswered question in your cohort-specific help channel. Do your best to provide a helpful response to that question.
Find an example of a bad question on Stack Overflow. Analyze the question using the guidelines above and write a short response explaining why you believe it is a bad question.
Find an example of a good question on Stack Overflow. Analyze the question using the guidelines above and write a short response explaining why you believe it is a good question.
Learning to use the print
function in Python is the perfect way to start writing Python code. When learning to write in any new programming language, one of the first things you want to do is get some output from your program. The print
function is how you output the value of an object to the screen. You will learn how to use the print
function in Python.
print
with different objectsLet's start by executing the print function to print different types of objects in Python. There are numerous types of objects that you can print using the print
function.
Using print
with no arguments:
Notice the empty line after calling the print
function. The default end
value when calling print is the newline character \n
.
Using print
with a string literal:
Notice how calling print
with the string literal printed the exact string we passed in onto the screen.
Using print
with a variable:
Notice how calling print
with the slogan
variable prints the value assigned to the slogan
variable.
Using print
with an expression:
Notice how the argument for the print
function can be an expression. Once the expression is resolved to a string object, the print
function can output it to the screen.
Using print
with other object types:
Any object passed as an argument into print
will get converted into a string type before outputted to the screen.
You can see how the print
function is easy to use and how it can handle any object type that you pass into it.
print
Now, let's look at how we can pass multiple arguments into the print
function. Using print
with multiple arguments gives you a flexible and easy way to output items to the screen.
We can pass multiple objects, all of the same or different types, into print
.
Notice how each object we passed in was converted to a string and then output to the screen. Notice also that print
used " "
as the default separator value.
We can change the separator value by assigning a value to the keyword argument sep
.
end
value with print
You can also specify the end
value by assigning a value to the end
keyword argument when you call the print
function. Being able to print a value to the screen but allow the user to stay on the same line is useful and necessary in some cases.
Here is how you can change the default end
value (which is \n
) when calling the print
function.
Customizing the end
value when calling the print
function can be useful and necessary in some circumstances.
You have now learned the basics of using the print
function in Python. You learned how to call the print
function to print objects of different types. You now know how to use print
with multiple positional arguments. In certain necessary situations, you also know how to change the default end
value when calling the print
function.
Now, get some practice using the print
function by completing the challenge below.
Python is unique because indentation instead of some other character marks blocks of code. A block of code is a collection of statements that are grouped. The syntax for denoting blocks varies from language to language. For example, in C, blocks are delimited by curly braces ({
and }
). Understanding how Python uses whitespace and indentation to denote logical lines and code blocks is essential.
Whitespace is any character represented by something that appears empty (usually \t
or " "
). The characters that Python considers to be whitespace can be seen by printing out the value of string.whitespace
from the string
library.
Notice the characters are " "
(space), \t
(tab), \n
(newline), \r
(return), \x0b
(unicode line tabulation), and \x0c
(unicode form feed).
You've seen the different types of whitespace characters that can appear, but you mainly need to concern yourself with " "
, \t
, and \n
.
Whitespace is used to denote the end of a logical line of code. In Python, a logical line of code's end (a statement or a definition) is marked by a \n
.
Notice how the REPL evaluates the expression first + second
when I return on line 3. Below that, I can write one logical line of code over multiple lines by ending each line with a \
character. That \
character lets the Python interpreter that even though there is a newline, you don't want it to treat it as the end of a logical line.
It's important to understand that Python assumes meaning in newline characters when trying to interpret your code.
Whitespace (indentation) can denote code blocks. Python gives meaning to the amount of whitespace (indentation level) that comes before a logical line of code.
This code raises an Indentation Error
because the Python interpreter expects to find additional whitespace inside the if
block.
The Python interpreter can successfully run this code because consistent whitespace (level of indentation) is used.
Although you can't tell in the code snippet above, for the second if
statement, I used a \t
to indent. But, for the indentation on print("it worked!"
, I used eight " "
(spaces). The mismatch of tab usage and spaces raises an error when Python tries to interpret the code.
Consistent whitespace usage (indentation) is crucial to making sure that Python can interpret your code correctly.
In Python, whitespace has meaning; it denotes the end of logical lines and also code blocks. Whitespace is any character represented by something that appears empty, although the most common characters are " "
, \t
, and \n
. The Python interpreter knows where the end of a logical line of code is because of the \n
. The amount of whitespace (level of indentation) is used in Python to denote blocks of code. Understanding how the Python interpreter looks at whitespace is vital to writing valid Python code.
Python is not a "statically typed" language, and every variable in Python is an object. You don't have to declare a variable's type.
In Python, you can have integers and floating-point numbers.
You can define an integer like so:
You can also cast a floating-point number to be an integer like so:
To define a floating-point number, you can declare it literally or typecast it with the float constructor function:
You can define strings with either single or double quotes:
It's common to use double quotes for strings so that you can include apostrophes without accidentally terminating the string.
Let's practice declaring variables to store an int, a float, and a string:
There are a few basic operators that you should be familiar with as you start writing Python code.
You can use the addition (+
), subtraction (-
), multiplication (*
), and division (/
) operators with numbers in Python.
There is also an operator called the modulo operator (%
). This operator returns the remainder of integer division.
You can use two multiplication operators to make the exponentiation operator (**
).
You can use the addition operator to concatenate strings and lists:
You can also use the multiplication operator to create a new list or string that repeats the original sequence:
Now, let's see if we can combine all of this information in a quick demo.
First, let's create two variables, a
and b
, where each variable stores an instance of the object
class.
Next, let's see if we can make two lists, one containing five instances of a
, and the second with five instances of b
.
Then, let's combine a_list
and b_list
into a combined
list.
If our code works as expected, combined
should have a length of 10.
To format a string in Python, you use the %
operator to format a set of stored variables in a tuple. You also include argument specifiers in your string with special symbols like %s
and %d
.
For example, let's say you want to insert a name
variable inside a string. You would do the following:
If you have more than one argument specifier, you need to enclose your arguments in a tuple:
Any object that is not a string can also be formatted using the %s
operator. The string which returns from the object's repr
method will be used in the formatted string.
A few of the common argument specifiers are:
%s
- String (or any object with a string representation)
%d
- Integers
%f
- Floating point numbers
%.<number of digits>f
- Floating point numbers with a fixed amount of digits to the dot's right.
%x/%X
- Integers in hexadecimal (lowercase/uppercase)
Let's see if we can use all of this information to practice formatting a few strings.
Let's imagine that we have some data that we want to inject into a string.
We need to print a formatted string using argument specifiers and a tuple that contains our data:
8
You can think of a string as anything between quotes. Strings store a sequence of characters or bits of text.
There are lots of ways you can interact with strings in Python.
The len()
method prints out the number of characters in the string.
The index()
method prints out the index of the substring argument's first occurrence.
The count()
method returns the number of occurrences of the substring argument.
To slice a string, you can use this syntax: [start:stop:step]
. To reverse the string's order, you can set the step value to be -1
.
You can convert a string to uppercase or lowercase with the upper()
and lower()
methods.
You can determine if a string starts with or ends with a specific sequence with the startswith()
and endswith()
methods.
The split()
method allows you to split up a string into a list. The default separator is any whitespace. You can also specify the separator value with an argument if you want.
You can think of a string as anything between quotes. Strings store a sequence of characters or bits of text.
There are lots of ways you can interact with strings in Python.
The len()
method prints out the number of characters in the string.
The index()
method prints out the index of the substring argument's first occurrence.
The count()
method returns the number of occurrences of the substring argument.
To slice a string, you can use this syntax: [start:stop:step]
. To reverse the string's order, you can set the step value to be -1
.
You can convert a string to uppercase or lowercase with the upper()
and lower()
methods.
You can determine if a string starts with or ends with a specific sequence with the startswith()
and endswith()
methods.
The split()
method allows you to split up a string into a list. The default separator is any whitespace. You can also specify the separator value with an argument if you want.
Python uses boolean values to evaluate conditions. An expression in any Boolean context will evaluate to a Boolean value and then control your program's flow. Python's boolean values are written as True
and False
(make sure you capitalize the first character).
To compare the value of two expressions for equality, you use the ==
operator. You can also use <
(less than), >
(greater than), <=
(less than or equal), >=
(greater than or equal), and !=
(not equal).
You build up more complex boolean expressions by using the and
and or
operators.
Any time you have an iterable object (like a list), you can check if a specific item exists inside that iterable by using the in
operator.
We can use the if
, elif
, and the else
keywords to define a series of code blocks that will execute conditionally.
Any object that is considered "empty" evaluates to False
. For example, ""
, []
, and 0
all evaluate to False
.
If we want to determine if two objects are actually the same instance in memory, we use the is
operator instead of the value comparison operator ==
.
There is also the not
operator, which inverts the boolean that follows it:
You can use two types of loops in Python, a for
loop and a while
loop. A for
loop iterates over a given sequence (iterator expression). A while
loop repeats as long as a boolean context evaluates to True
.
The break
statement terminates the loop containing it. Control of the program flows to the statement immediately after the body of the loop. If the break
statement is inside a nested loop (loop inside another loop), the break
statement will only terminate the innermost loop.
You can use the continue
statement to skip the rest of the code inside a loop for the current iteration only. The loop does not terminate entirely but continues with the next iteration.
Here is an example of a few different ways you can use a range
as the iterable for a for
loop.
This example shows the simple usage of a while
loop to print the same values as the for
loops above.
You can use a break
statement to exit a for
loop or a while
loop.
You can also use a continue
statement to skip the current block but not exit the loop entirely.
To make our code more readable and DRY (Don't Repeat Yourself), we often want to encapsulate code inside a callable function.
To define a function in Python, we follow this syntax:
Let's define a greeting function that allows us to specify a name and a specific greeting.
Now, we can call our greet
function and pass in the data that we want.
If we want to define a function that returns a value to the caller, we use the return
keyword.
Python uses boolean values to evaluate conditions. An expression in any Boolean context will evaluate to a Boolean value and then control your program's flow. Python's boolean values are written as True
and False
(make sure you capitalize the first character).
To compare the value of two expressions for equality, you use the ==
operator. You can also use <
(less than), >
(greater than), <=
(less than or equal), >=
(greater than or equal), and !=
(not equal).
You build up more complex boolean expressions by using the and
and or
operators.
Any time you have an iterable object (like a list), you can check if a specific item exists inside that iterable by using the in
operator.
We can use the if
, elif
, and the else
keywords to define a series of code blocks that will execute conditionally.
Any object that is considered "empty" evaluates to False
. For example, ""
, []
, and 0
all evaluate to False
.
If we want to determine if two objects are actually the same instance in memory, we use the is
operator instead of the value comparison operator ==
.
There is also the not
operator, which inverts the boolean that follows it:
A module is a collection of code that is written to meet specific needs. For example, you could split up different parts of a game you were building into modules. Each module would be a separate Python file that you could manage separately.
Any Python file that ends with the .py
extension is considered a module. The name of the module is the name of the file.
To import from other modules, we can use the import
command.
So, by importing the built-in math
module, we have access to all of the functions and data defined in that module. We access those functions and data using dot notation, just like we do with objects.
If you only need a specific function from a module, you can import that specific function like so:
You can also import all the names from a module with this syntax to avoid using dot notation throughout your file.
You can also bind the module to a name of your choice by using as
.
To find out which names a module defines when imported, you can use the dir()
method. This method returns an alphabetically sorted list of strings for all of the names defined in the module.
What is a linked list, and how is it different from an array? How efficient or inefficient are its operations? What are its strengths and weaknesses? How can I construct and interact with a linked list? By the end of this objective, you will be able to answer all of these questions confidently.
A linked list is a simple, linear data structure used to store a collection of elements. Unlike an array, each element in a linked list does not have to be stored contiguously in memory.
For example, in an array, each element of the list [43, 32, 63
is stored in memory like so:
43
is the first item in the collection and is therefore stored in the first slot. 32
is the second item and is stored immediately next to 43
in memory. This pattern continues on and on.
In a linked list, each element of the list could be stored like so:
You can see here that the elements can be spaced out in memory. Because the elements are not stored contiguously, each element in memory must contain information about the next element in the list. The first item stores the data 43
and the location in memory (*3
) for the next item in the list. This example is simplified; the second item in the list 32
could be located anywhere in memory. It could even come before the first item in memory.
You might also be wondering what types of data can be stored in a linked list. Pretty much any type of data can be stored in a linked list. Strings, numbers, booleans, and other data structures can be stored. You should not feel limited using a linked list based on what type of data you are trying to store.
Are the elements in a linked list are sorted or unsorted? The elements in a linked list can be either sorted or unsorted. There is nothing about the data structure that forces the elements to be sorted or unsorted. You cannot determine if a linked list's elements are sorted by determining they are stored in a linked list.
What about duplicates? Can a linked list contain them? Linked lists can contain duplicates. There is nothing about the linked list data structure that would prevent duplicates from being stored. When you encounter a linked list, you should know that it can contain duplicates.
Are there different types of linked lists? If so, what are they? There are three types of linked lists: singly linked list (SLL), doubly linked list (DLL), and circular linked list. All linked lists are made up of nodes where each node stores the data and also information about other nodes in the linked list.
Each singly linked list node stores the data and a pointer where the next node in the list is located. Because of this, you can only navigate in the forward direction in a singly linked list. To traverse an SLL, you need a reference to the first node called the head. From the head of the list, you can visit all the other nodes using the next pointers.
The difference between an SLL and a doubly linked list (DLL) is that each node in a DLL also stores a reference to the previous item. Because of this, you can navigate forward and backward in the list. A DLL also usually stores a pointer to the last item in the list (called the tail).
A Circular Linked List links the last node back to the first node in the list. This linkage causes a circular traversal; when you get to the end of the list, the next item will be back at the beginning of the list. Each type of linked list is similar but has small distinctions. When working with linked lists, itโs essential to know what type of linked list.
To look up an item by index in a linked list is linear time (O(n)
). To traverse through a linked list, you have to start with the head reference to the node and then follow each subsequent pointer to the next item in the chain. Because each item in the linked list is not stored contiguously in memory, you cannot access a specific index of the list using simple math. The distance in memory between one item and the next is varied and unknown.
Adding an item to a linked list is constant time (O(1)
). We always have a reference point to the tail of the linked list, so we can easily insert an item after the tail.
In the worst case, inserting an item in a linked list is linear time (O(n)
). To insert an item at a specific index, we have to traverse โ starting at the head โ until we reach the desired index.
In the worst case, deleting an item in a linked list is linear time (O(n)
). Just like insertion, deleting an item at a specific index means traversing the list starting at the head.
The space complexity of a linked list is linear (O(n)
). Each item in the linked list will take up space in memory.
The primary strength of a linked list is that operations on the linked list's ends are fast. This is because the linked list always has a reference to the head (the first node) and the tail (the last node) of the list. Because it has a reference, doing anything on the ends is a constant time operation (O(1)
) no matter how many items are stored in the linked list. Additionally, just like a dynamic array, you don't have to set a capacity to a linked list when you instantiate it. If you don't know the size of the data you are storing, or if the amount of data is likely to fluctuate, linked lists can work well. One benefit over a dynamic array is that you don't have doubling appends. This is because each item doesn't have to be stored contiguously; whenever you add an item, you need to find an open spot in memory to hold the next node.
The main weakness of a linked list is not efficiently accessing an "index" in the middle of the list. The only way that the linked list can get to the seventh item in the linked list is by going to the head node and then traversing one node at a time until you arrive at the seventh node. You can't do simple math and jump from the first item to the seventh.
Remember that linked lists have efficient operations on the ends (head and tail). There are two structures that only operate on the ends; queues and stacks. So, most queue or stack implementations use a linked list as their underlying data structure.
We can see the difference between how a linked list and an array are stored in memory, but why is this important? Once you see the problem with the way arrays are stored in memory, the benefits of a linked list become clearer.
The primary problem with arrays is that they hold data contiguously in memory. Remember that having the data stored contiguously is the feature that gives them quick lookups. If I know where the first item is stored, I can use simple math to figure out where the fifth item is stored. The reason that this is a problem is that it means that when you create an array, you either have to know how much space in memory you need to set aside, or you have to set aside a bunch of extra memory that you might not need, just in case you do need it. In other words, you can be space-efficient by only setting aside the memory you need at the moment. But, in doing that, you are setting yourself up for low time efficiency if you run out of room and need to copy all of your elements to a newer, bigger array.
With a linked list, the elements are not stored side-by-side in memory. Each element can be stored anywhere in memory. In addition to storing the data for that element, each element also stores a pointer to the memory location of the next element in the list. The elements in a linked list do not have an index. To get to a specific element in a linked list, you have to start at the head of the linked list and work your way through the list, one element at a time, to reach the specific element you are searching for. Now you can see how a linked list solves some of the problems that the array data structure has.
Letโs look at how we can represent a singly linked list graphically and in Python code. Seeing a singly linked list represented graphically and in code can help you understand it better.
How do you represent a singly linked list graphically? Letโs say you wanted to store the numbers 1, 2, and 3. You would need to create three nodes. Then, each of these nodes would be linked together using the pointers.
Notice that the last element or node in the linked list does not have a pointer to any other node. This fact is how you know you are at the end of the linked list.
What does a singly linked list implementation look like in Python? Let's start by writing a LinkedListNode
class for each element in the linked list.
Now, we need to build out the class for the LinkedList
itself:
Our class is super simple so far and only includes an initialization method. Let's add an append
method so that we can add nodes to the end of our list:
Now, let's use our simple class definitions for LinkedListNode
and LinkedList
to create a linked list of elements 1
, 2
, and 3
.
You must be able to understand and interact with linked lists. You now know the basic properties and types of linked lists, what makes a linked list different from an array, what problem it solves, and how to represent them both graphically and in code. You now know enough about linked lists that you should be able to solve algorithmic code challenges that require a basic understanding of linked lists.
Draw out a model of a singly-linked list that stores the following integers in order: 3,2,6,5,7,9
.
Draw out a model of a doubly-linked list that stores the following integers in order: 5,2,6,4,7,8
.
L1 = Node(34) L1.next = Node(45) L1.next.next = Node(90)
L1 = [34]-> [45]-> [90] -> None
Node(45) Node(90)
Result:
Lists are similar to arrays. They can store any type of variable and as many variables as you want. You can iterate over lists effortlessly.
A dictionary is like a list, but instead of accessing values with an index, you access values with a "key." A "key" can be any type of object (string, number, list, etc.). Also, unlike lists, dictionaries do not have an order.
To build a list, you can do the following:
In Python, if you try to access a list index that does not exist, you get an IndexError: list index out of range
message:
Let's make sure we can perform basic list operations.
First, let's create a numbers
list that contains the numbers 1
, 2
, and 3
.
Now, let's create a strings
list that contains the strings "Lambda"
and "School"
:
Now, let's make sure we can access items from a specific index in a list. Let's access the 3rd item from numbers and the 2nd item from strings and print them out (don't forget that lists are zero-indexed).
Last, let's iterate through our numbers
list to sum up all of the numbers:
Let's use a dictionary to create a collection that maps first names as keys (strings) to phone numbers as values.
Instead of adding one key-value pair at a time, we can initialize the dictionary to have the same values.
We can iterate over a dictionary as we iterated over a list. We can use the items()
method, which returns a tuple with the key and value for each item in the dictionary.
https://gist.github.com/bgoonz/c10af728179ff056894c6f17dfb819bc#file-ht2-ipynb
Remember when we wondered what would happen if multiple keys hashed to the same index, and we said that we would worry about it later? Whelp, it's later ๐คช.
Let's say we were given the key-value pair ("Ryan", 10)
. Our hash code then maps "Ryan" to index 3. Excellent, that works!Now let's say after we inserted ("Ryan", 10)
, we have to insert ("Parth", 12)
. Our hash code maps "Parth" to index 3. Uh oh! Ryan is already there! What do we do?? ๐ฑ
Ok, let's stop freaking out, and let's think about this. If we don't do anything, the value stored at index 3 will just get overwritten. Meaning if we try to retrieve the value associated with "Ryan"
, 12 will be returned instead of 10. That might not seem like a big deal, but what if we were returning passwords based on a user ID, and we returned someone else's password. That would be horrible.
Let's fix this problem. The most common way to solve this is with chaining. If we see multiple values hashed to an index, we will chain them in a some data structure that can hold multiple items. In our case, we'll use Python's list
type, but a more typical solution would use a linked list. We'll cover linked lists in a future module.
Ok, sounds ideal? But how does this work in code? Let's write some of it together.
Below is a partially filled out hash table class where we will be using HashTableEntry
as our chain entries.
Take a look at the code below.
Let's implement the put
method with collision resolution by chaining. What are the two cases we need to handle?
There are no entries at the index. Great! We can initialize the entry to a list with the new HashTableEntry
in it.
There are multiple entries at the index. We need to check every entry in the chain. If the key in one of the entries is equal to the key we are passing in, we need to replace it. For instance, let's say we pass in ("Ryan", 12),
and then we later pass in ("Ryan", 15)
. We would need to replace "Ryan"'s old value with 15. If there are no entries that match, we create a new entry at the end of the chain.
Ok, that might sound confusing. Let's start breaking it down into code.
First, we need to hash the key and start with the first entry at that index.
Next, we need to go through the chain. We need to check two conditions:
The current entry is not empty.
The key or the current entry is not equal to the key we are passing in.
Sweet! Now we need to check what happens when the loop breaks. It would only break for two reasons:
We reached an entry with the same key and need to replace the value.
We reached the end of the chain and need to create a new entry.
Let's write that in code!
Great! We created the put
method.
https://replit.com/@bgoonz/cs-unit-1-sprint-4-module-2-hash-table-collision-resolution#main.py
What does runtime look like with linked list chaining?
The performance of hash tables for search, insertion, and deletion is constant time (O(1)
) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)
). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and maintaining a low load factor is crucial. What is a load factor?
The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.
Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.
As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.
You can see that resizing is an expensive operation, so you donโt want to resize too often. However, when we average it out, hash tables are constant time (O(1)
) even with resizing.
The load factor can also be too small. If the hash table is too large for the data that it is storing, then memory is being wasted. So, in addition to resizing, when the load factor gets too high, you should also resize when the load factor gets too low.
One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.
We added a get_load_factor
and resize
method to calculate the load factor and resize the hash table with a new capacity when necessary.
Let's change our put
method to resize when the load factor gets too high. Here's how our current put
method looks:
To know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.
Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.
Fantastic, we did it!
What does runtime look like with linked list chaining?
The performance of hash tables for search, insertion, and deletion is constant time (O(1)
) in the average case. However, as the chains get longer and longer, in the worst case, those same operations are done in linear time (O(n)
). The more collisions that your hash table has, the less performant the hash table is. To avoid collisions, a proper hash function and maintaining a low load factor is crucial. What is a load factor?
The load factor of a hash table is trivial to calculate. You take the number of items stored in the hash table divided by the number of slots.
Hash tables use an array for storage. So, the load factor is the number of occupied slots divided by the length of the array. So, an array of length 10 with three items in it has a load factor of 0.3, and an array of length 20 with twenty items has a load factor of 1. If you use linear probing for collision resolution, then the maximum load factor is 1. If you use chaining for collision resolution, then the load factor can be greater than 1.
As the load factor of your hash table increases, so does the likelihood of a collision, which reduces your hash table's performance. Therefore, you need to monitor the load factor and resize your hash table when the load factor gets too large. The general rule of thumb is to resize your hash table when your load factor is greater than 0.7. Also, when you resize, it is common to double the size of the hash table. When you resize the array, you need to re-insert all of the items into this new hash table. You cannot simply copy the old items into the new hash table. Each item has to be rerun through the hashing function because the hashing function considers the size of the hash table when determining the index that it returns.
You can see that resizing is an expensive operation, so you donโt want to resize too often. However, when we average it out, hash tables are constant time (O(1)
) even with resizing.
The load factor can also be too small. If the hash table is too large for the data that it is storing, then memory is being wasted. So, in addition to resizing, when the load factor gets too high, you should also resize when the load factor gets too low.
One way to know when to resize your hash table is to compute the load factor whenever an item is inserted or deleted into the hash table. If the load factor is too high or too low, then you need to resize.
We added a get_load_factor
and resize
method to calculate the load factor and resize the hash table with a new capacity when necessary.
Let's change our put
method to resize when the load factor gets too high. Here's how our current put
method looks:
To know when to resize, we need to correctly increment the count whenever we insert something new into the hash table. Let's go ahead and add that.
Next, we need to check if the load factor is greater than or equal to 0.7. If it is, we need to double our capacity and resize.
Fantastic, we did it!
Do we need to modify our delete
and get
methods to account for the new get_load_factor
and resize
methods? Why or why not?
To remove a key-value pair from a dictionary, you need to use the del
keyword or use the pop()
method available on dictionary objects. The difference is pop()
deletes the item from the dictionary and returns the value. When you use the del
keyword, you've written a statement that doesn't evaluate to anything.
An object's identity can never change once it has been created. You can think of an object's identity as its specific address in memory. In the code above, a = 1
created a new object in memory whose identity is represented by the integer 4483164816
.
Python has an is
operator that allows you to compare two object's identities.
In the code above, we first assign 1
to the variable a
. Then, we assign 2
to the variable b
. These are two different objects in memory and thus have different identities. We verify that they are different by using the is
operator, which returns False
. The line b = a
assigns the variable b
the object that the variable a
is pointed to. Now, both a
and b
are referencing the same object in memory. We can use the id()
function to verify that this is the case as well:
The type of an object determines what are its possible values and what operations that object supports. The type()
function will return what type an object is:
Just like an object's identity, once an object is created, its identity can never change. It's an object's type that determines whether an object is mutable or immutable.
The value of some objects can be changed after they are created. The value of some objects cannot be changed after they are created. If the object's value can be changed, that object is considered to be mutable (changeable). If the object's value cannot be changed, that object is considered to be immutable (unchangeable).
A mutable object is an object whose value can be changed after it is created. The word mutable is defined as:
liable to change
The following types of objects are mutable:
list
set
dict
byte array
instances of user-defined classes
Let's look at a few examples in code:
Lists
In the first line, we create a list object with three elements and assign it to the variable my_list
. Then, because lists are mutable, we change 'love'
at index 2 to be 'joy'
instead. We also can grow our list by appending a new element to the list.
Sets
In the first line, we create a set object with three elements and assign it to the variable my_set
. Because set objects are mutable, we can add 'happy'
to the set and remove 'happiness'
from the set.
Dicts
On line one, we create a dict object that has two key-value pairs. Then, because dict objects are mutable, we add key-value pair "location": "Nepal"
. Last, we delete that same key-value pair.
Mutable objects work great when you know you will likely need to change the size of the object as you use and interact with it. Changing mutable objects is cheap (because you don't have to copy all existing elements to a new object).
Aliasing with Mutable Objects
Below, I'm going to walk through what happens when you alias a mutable object. In Python, aliasing happens whenever a variable's value is assigned to another variable because variables are just names that store references to values.
Let me illustrate this with a helpful code visualizer tool called Python Tutor (Links to an external site.):
On line 1, we instantiate a new list object with three elements (1
, 2
, and 3
). The name my_list_orig
is the variable that we assign the new list to.
Then, on line 2, we create an alias of my_list_orig
by pointing my_list_alias
to whatever object my_list_orig
is pointing at. Notice in the image above that there is still only one list object. However, there are two variables in the global frame, and they are both pointing to the same object.
On line 3, we append a new element to my_list_orig
. Notice that, because both variables are referencing the same object, even though we appended to my_list_orig
, we also modified my_list_alias
.
On line 4, we removed the element 1
from my_list_orig
. Notice, just like when we added to the list, my_list_alias
is also affected.
This behavior is something you need to be aware of if you ever use aliasing with mutable objects in your code.
An immutable object is an object whose value cannot be changed after it is created. Immutable means not changeable. Anytime you try to update the value of an immutable object, a new object is created instead.
The following types are immutable:
Numbers (int, float, complex)
Strings
Bytes
Booleans
Tuples
Immutable objects are useful when you want to make sure that the object you created will always maintain the same value. Immutable objects are more expensive to change (in terms of time and space complexity) because changing the object requires making a copy of the existing object.
Let's look at a few examples:
Numbers
In the code above, the first line creates a new int object, and the variable my_int
now points at that object. You can see that this object has int
for its type, 4513307280
for its identity (location in memory), and 1
for its value.
Then, we assign 2
to my_int
which creates a whole new object and assigns it to the variable my_int
. This object has int
for its type, 4513307312
for its identity (which you can see is different from the first object), and 2
for its value.
Strings
Let's look at how string concatenation works in Python. Remember that str objects are immutable.
So, on line 1, we create a string object with the value 'a'
and assign it to the variable my_str
. We verify that the object is of type str
, we print its identity (140716674193840
) and print its value.
Then, we concatenate 'b'
onto the existing string with the line my_str += 'b'
. Now, because string objects are immutable, we cannot change a string object's value after it has been created. To concatenate, we create a new string object and assign the value 'ab'
to that object.
This behavior in Python is vital to be aware of when working with string concatenation. If you have to add and remove frequently from a string, this will be inefficient if you work with string objects directly.
Tuples
Tuples are an immutable container of names, where each name has an unchangeable (immutable) binding to an object in memory. You cannot change the bindings of the names to the objects.
Here we created a tuple using (
and )
to denote the tuple literal syntax. Just like a list, tuples can contain elements of any type. Above, we've included a string, a list, and a boolean as our tuple elements. We are proving the tuple object's immutability by showing the error that occurs when trying to assign a new item to a position in the tuple.
One thing that often causes confusion surrounding the immutability of tuples in Python is demonstrated by the following behavior:
Notice that we cannot create a new list object and bind it to the name at position 1 of our tuple. This is demonstrated when my_tuple[1] = [4,5,6]
raises a TypeError
. However, we can assign new objects to the list that is at position 1 of our tuple? Why is that? Well, what do we know about lists in Python? Lists are mutable objects. So, we can modify a list without creating a new object. So, when we are modifying the list directly (instead of assigning a new object), it doesn't affect our tuple's immutability. Notice that the identity (140716674620864
) of the list at my_tuple[1]
doesn't change after replacing its three elements with 4
, 5
, and 6
.
Mutable and immutable objects are not treated the same when they are passed as arguments to functions. When mutable objects are passed into a function, they are passed by reference. So, suppose you change the mutable object that was passed in as an argument. In that case, you are changing the original object as well.
Mutable Objects as Arguments
Notice that when append_num_to_list
is called and my_list
is passed in as an argument. When my_list
is bound to lst
in that stack frame, lst
points to the original my_list
in memory. The function call did not create a copy of my_list
. This behavior is because lists are mutable objects in Python.
Immutable Objects as Arguments
Next, let's see how Python behaves when we pass an immutable object as an argument to a function:
Notice when an immutable object is passed into a function, the object is copied and bound to the parameter name. In the example above, when my_string
is passed into concatenate_string_to_string
, my_string
is copied to a new object bound to the name orig_string
.
In Python, everything is an object.
Additionally, all objects in Python have three things:
Identity
Type
Value
An object's identity can never change once it has been created. You can think of an object's identity as its specific address in memory. In the code above, a = 1
created a new object in memory whose identity is represented by the integer 4483164816
.
Python has an is
operator that allows you to compare two object's identities.
In the code above, we first assign 1
to the variable a
. Then, we assign 2
to the variable b
. These are two different objects in memory and thus have different identities. We verify that they are different by using the is
operator, which returns False
. The line b = a
assigns the variable b
the object that the variable a
is pointed to. Now, both a
and b
are referencing the same object in memory. We can use the id()
function to verify that this is the case as well:
The type of an object determines what are its possible values and what operations that object supports. The type()
function will return what type an object is:
Just like an object's identity, once an object is created, its identity can never change. It's an object's type that determines whether an object is mutable or immutable.
The value of some objects can be changed after they are created. The value of some objects cannot be changed after they are created. If the object's value can be changed, that object is considered to be mutable (changeable). If the object's value cannot be changed, that object is considered to be immutable (unchangeable).
A mutable object is an object whose value can be changed after it is created. The word mutable is defined as:
liable to change
The following types of objects are mutable:
list
set
dict
byte array
instances of user-defined classes
Let's look at a few examples in code:
Lists
In the first line, we create a list object with three elements and assign it to the variable my_list
. Then, because lists are mutable, we change 'love'
at index 2 to be 'joy'
instead. We also can grow our list by appending a new element to the list.
Sets
In the first line, we create a set object with three elements and assign it to the variable my_set
. Because set objects are mutable, we can add 'happy'
to the set and remove 'happiness'
from the set.
Dicts
On line one, we create a dict object that has two key-value pairs. Then, because dict objects are mutable, we add key-value pair "location": "Nepal"
. Last, we delete that same key-value pair.
Mutable objects work great when you know you will likely need to change the size of the object as you use and interact with it. Changing mutable objects is cheap (because you don't have to copy all existing elements to a new object).
Aliasing with Mutable Objects
Below, I'm going to walk through what happens when you alias a mutable object. In Python, aliasing happens whenever a variable's value is assigned to another variable because variables are just names that store references to values.
Let me illustrate this with a helpful code visualizer tool called Python Tutor (Links to an external site.):
On line 1, we instantiate a new list object with three elements (1
, 2
, and 3
). The name my_list_orig
is the variable that we assign the new list to.
Then, on line 2, we create an alias of my_list_orig
by pointing my_list_alias
to whatever object my_list_orig
is pointing at. Notice in the image above that there is still only one list object. However, there are two variables in the global frame, and they are both pointing to the same object.
On line 3, we append a new element to my_list_orig
. Notice that, because both variables are referencing the same object, even though we appended to my_list_orig
, we also modified my_list_alias
.
On line 4, we removed the element 1
from my_list_orig
. Notice, just like when we added to the list, my_list_alias
is also affected.
This behavior is something you need to be aware of if you ever use aliasing with mutable objects in your code.
An immutable object is an object whose value cannot be changed after it is created. Immutable means not changeable. Anytime you try to update the value of an immutable object, a new object is created instead.
The following types are immutable:
Numbers (int, float, complex)
Strings
Bytes
Booleans
Tuples
Immutable objects are useful when you want to make sure that the object you created will always maintain the same value. Immutable objects are more expensive to change (in terms of time and space complexity) because changing the object requires making a copy of the existing object.
Let's look at a few examples:
Numbers
In the code above, the first line creates a new int object, and the variable my_int
now points at that object. You can see that this object has int
for its type, 4513307280
for its identity (location in memory), and 1
for its value.
Then, we assign 2
to my_int
which creates a whole new object and assigns it to the variable my_int
. This object has int
for its type, 4513307312
for its identity (which you can see is different from the first object), and 2
for its value.
Strings
Let's look at how string concatenation works in Python. Remember that str objects are immutable.
So, on line 1, we create a string object with the value 'a'
and assign it to the variable my_str
. We verify that the object is of type str
, we print its identity (140716674193840
) and print its value.
Then, we concatenate 'b'
onto the existing string with the line my_str += 'b'
. Now, because string objects are immutable, we cannot change a string object's value after it has been created. To concatenate, we create a new string object and assign the value 'ab'
to that object.
This behavior in Python is vital to be aware of when working with string concatenation. If you have to add and remove frequently from a string, this will be inefficient if you work with string objects directly.
Tuples
Tuples are an immutable container of names, where each name has an unchangeable (immutable) binding to an object in memory. You cannot change the bindings of the names to the objects.
Here we created a tuple using (
and )
to denote the tuple literal syntax. Just like a list, tuples can contain elements of any type. Above, we've included a string, a list, and a boolean as our tuple elements. We are proving the tuple object's immutability by showing the error that occurs when trying to assign a new item to a position in the tuple.
One thing that often causes confusion surrounding the immutability of tuples in Python is demonstrated by the following behavior:
Notice that we cannot create a new list object and bind it to the name at position 1 of our tuple. This is demonstrated when my_tuple[1] = [4,5,6]
raises a TypeError
. However, we can assign new objects to the list that is at position 1 of our tuple? Why is that? Well, what do we know about lists in Python? Lists are mutable objects. So, we can modify a list without creating a new object. So, when we are modifying the list directly (instead of assigning a new object), it doesn't affect our tuple's immutability. Notice that the identity (140716674620864
) of the list at my_tuple[1]
doesn't change after replacing its three elements with 4
, 5
, and 6
.
Mutable and immutable objects are not treated the same when they are passed as arguments to functions. When mutable objects are passed into a function, they are passed by reference. So, suppose you change the mutable object that was passed in as an argument. In that case, you are changing the original object as well.
Mutable Objects as Arguments
Notice that when append_num_to_list
is called and my_list
is passed in as an argument. When my_list
is bound to lst
in that stack frame, lst
points to the original my_list
in memory. The function call did not create a copy of my_list
. This behavior is because lists are mutable objects in Python.
Immutable Objects as Arguments
Next, let's see how Python behaves when we pass an immutable object as an argument to a function:
Notice when an immutable object is passed into a function, the object is copied and bound to the parameter name. In the example above, when my_string
is passed into concatenate_string_to_string
, my_string
is copied to a new object bound to the name orig_string
.
An algorithm is a set of instructions for accomplishing a task. Within this broad definition, we could call every piece of code an algorithm.
After coming up with a first-pass solution to a problem, we need to measure how "good" our answer is. Will it stand up to the test of millions of users? Is it fast enough that our users will be blown away by how quickly they get their results? Or will torturously slow speeds cause lag that scares them all away?
When given a choice between different algorithms, we want to choose the most efficient algorithm (considering both time and space efficiency depending on our needs).
Note: It is common for your first solution to work with a few items or users and break as you add more. Making sure that the solutions scale is something all developers must look out for.
We need a way to talk about efficiency (number of operations in the worst case) in a more general sense.
Big O notation is the language we use for describing how efficient an algorithm is.
The specific terms of Big O notation describe how fast the runtime grows (relative to the input size), focusing on when the input gets extremely large.
Why do we focus on the growth of runtime versus exact runtime? The actual runtime depends on the specific computer running the algorithm, so we cannot compare efficiencies that way. By focusing on the general growth, we can avoid exact runtime differences between machines and environments.
We also talk about runtime relative to the input size because we need to express our speed in terms of something. So we show the speed of the algorithm in terms of the input size. That way, we can see how the speed reacts as the input size grows.
We don't care about speed when the input size is small. The differences in speed are likely to be minimal when the input size is small. When the input size gets enormous, we can see the differences in efficiency between algorithms.
Refer to the table below to see a list of the most common runtimes. The table is ordered from fastest to slowest.
Besides the table, it's also essential to look at the curves of these different runtimes.
Again, n
represents the size of the data, and on the chart above, N
represents the number of operations. This visualization should help illustrate why O(1)
or O(log n)
is the most desirable.
Note: Big O only matters for large data sets. An O(n^3)
solution is adequate, as long as you can guarantee that your datasets will always be small.
Let's look at a few different examples of Python functions that print something to the output. For each of these, the input will be items
.
Constant Time O(1)
Why is this constant time? Because no matter how large or small the input is (1,000,000 or 10), the number of computations within the function is the same.
Linear Time O(n)
Why is this classified as linear time? Because the speed of the algorithm increases at the same rate as the input size. If list_of_things
has ten items, then the function will print ten times. If it has 10,000 items, then the function will print 10,000 times.
Quadratic Time O(n^2)
Why is this quadratic time? The clue is the nested for loops. These nested for loops mean that for each item in list_of_things
(the outer loop), we iterate through every item in list_of_things
(the inner loop). For an input size of n
, we have to print n
* n
times or n^2
times.
What if we had a function like this?
print(items[last_idx])
is constant time because it doesn't change as the input changes. So, that portion of the function is O(1)
.
The while loop that prints up to the middle index is 1/2 of whatever the input size is; we can say that portion of the function is O(n/2)
.
The final portion will run 2000 times, no matter the size of the input.
So, putting it all together, we could say that the efficiency is O(1 + n/2 + 2000)
. However, we don't say this. We describe this function as having linear time O(n)
because we drop all of the constants. Why do we cut all of the constants? Because as the input size gets huge, adding 2000 or dividing by 2 has minimal effect on the algorithm's performance.
Let's consider the following function:
We could describe this function as O(n + n^2)
; however, we only need to keep the essential term, n^2
, so this would be O(n^2)
. Why can we do this? Because as the input size (n
) gets larger and larger, the less significant terms have less effect, and only the most significant term is important.
Let's consider the following function:
What would the result be if it just so happens that the thing_we_are_trying_to_find
in list_of_things
is the very first item in the list? The function would only have to look at one item in list_of_things
before returning. In this case, it would be O(1)
. But, when we talk about a function's complexity, we usually assume the "worst case." What would the "worst-case" be? It would be if it were the last item in list_of_things
. In that case, we would have to look through all the list_of_things
, and that complexity would be O(n)
.
Note: When talking about runtime complexity in casual conversation, engineers often blur the distinction between big theta and big O notation. In reality, these are two distinct ways of describing an algorithm. Big theta gives both an upper and a lower bound for the running time. Big O only provides an upper bound. Refer to the following articles for a deeper dive: Big-Theta notation (Links to an external site.) and Big-O notation (Links to an external site.).
Complexity analysis with Big O notation is a valuable tool. It would be best if you got in the habit of thinking about the efficiency of the algorithms you write and use in your code. However, just because two algorithms have the same Big O notation doesn't mean they are equal.
Imagine you have a script that takes 1 hour to run. By improving the function, you can divide that runtime by six, and now it only takes 10 minutes to run. With Big O notation, O(n)
and O(n/6)
can both be written as O(n)
, but that doesn't mean it isn't worth optimizing the script to save 50 minutes every time the script runs.
That being said, there is a term you should become familiar with: premature optimization (xkcd: Optimization (Links to an external site.)). Sometimes, you can sacrifice readability or spend too much time on something to improve its efficiency. Depending on the situation, it could be that having a finished product to iterate on is more important than maximally efficient code. It is your job as a developer to know when making your code more efficient is necessary. You will always be making calculated tradeoffs between runtime, memory, development time, readability, and maintainability. It takes time to develop the wisdom to strike the right balance depending on the scenario.
Let's look at a few code snippets and classify their runtime complexity using Big O notation.
First, let's think about what the above function is doing. It's printing i
โฆbut i
is not being incremented by 1, as we usually see. It's doubled every time we run the loop. So, for example, if n = 100
, then the final result would beโฆ
Or if n = 10
, then we would printโฆ
We can use the process of elimination to narrow down which runtime classification makes sense for this algorithm. The number of times the loop runs seems to vary based on the value of n
, so this is NOT O(1).
From the above examples, we can also see that the number of times the loop runs is increasing slower than the input size is increasing. n
must be doubled before the loop will run one more time. We can eliminate classifications such as O(n log n)
, O(n^c)
, O(c^n)
, and O(n!)
.
The only two options left at this point are logarithmic and linear. Since the two growth rates (input, the number of operations) are not the same, this function must run in logarithmic time!
Classification
Description
Constant O(1)
The runtime is entirely unaffected by the input size. This is the ideal solution.
Logarithmic O(log n)
As the input size increases, the runtime will grow slightly slower. This is a pretty good solution.
Linear O(n)
As the input size increases, the runtime will grow at the same rate. This is a pretty good solution.
Polynomial O(n^c)
As the input size increases, the runtime will grow at a faster rate. This might work for small inputs but is not a scalable solution.
Exponential O(c^n)
As the input size increases, the runtime will grow at a much faster rate. This solution is inefficient.
Factorial O(n!)
As the input size increases, the runtime will grow astronomically, even with relatively small inputs. This solution is exceptionally inefficient.