Introduction to Python and Jupyter Notebooks

Welcome! This workshop is from Winder.ai.

This workshop is a quick introduction to using Python and Jupyter Notebooks.

Python

For most Data Science tasks there are two competing Open Source languages. R is favoured more by those with a mathematical background. Python is preferred by those with a programming background; all of my workshops are currently in Python.

I prefer Python because I can achieve a lot with very little code. We can make use of an extensive library ecosystem and Python’s scripting abilities to be very productive.

This can also be a downside. With experience you will tend to write big one-liners: code that does a lot in a single line. These can be hard for new users to understand, so I try to keep the code as clear as I can.

Jupyter notebooks

(a.k.a. IPython notebooks)

Notebooks are an incredible way to work. They allow you to present documentation and working code in the same file. People can pick it up, read through the content and immediately see the running code.

There are two types of cells: markdown and code. This is a markdown cell. Code cells run actual Python code!

The notebook

When using cells, try to separate distinct pieces of code.

At the end of each cell, there is room for some output. The output could be blank, printed text, images or HTML. Pretty much anything you can think of.

print(1) # This is like a printf. The output appears underneath the current cell
print("This is a %s %d %0.2f" % ("word", 1, 0.2343)) # This is one way to format values into a string
"abc"
"123"    # By default, the value of the last expression in a cell is also shown, unless it is suppressed or assigned to a variable
1
This is a word 1 0.23





'123'
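
The % formatting above is only one option. As a minimal sketch (the variable names here are my own, purely for illustration), the same string can also be built with str.format or, from Python 3.6 onwards, with an f-string:

```python
# Three equivalent ways to build the same formatted string.
# The names word, count and ratio are illustrative only.
word, count, ratio = "word", 1, 0.2343

percent_style = "This is a %s %d %0.2f" % (word, count, ratio)
format_style = "This is a {} {} {:.2f}".format(word, count, ratio)
fstring_style = f"This is a {word} {count} {ratio:.2f}"  # Needs Python 3.6 or later

print(percent_style)
print(format_style)
print(fstring_style)
```

All three lines print the same text as the cell above. Which style you pick is mostly a matter of taste; the rest of this workshop sticks with %.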

You run a cell by pressing Ctrl-Enter. Pressing Shift-Enter runs the cell and moves on to the next one.

You can find many more handy keyboard shortcuts by viewing “Help->Keyboard shortcuts”.

Variables and array types

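Python is dynamically typed: you never declare a type, and a variable simply takes the type of whatever value it currently holds. So variables are super easy to define.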
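
Here is a small sketch of what that looks like in practice. The names are made up for illustration. Note that the values themselves still have types, so mixing incompatible ones raises an error:

```python
# A variable is just a name; it can be re-bound to a value of another type.
value = 2
first_type = type(value).__name__
print(first_type)                   # int

value = "two"
second_type = type(value).__name__
print(second_type)                  # str

# The values still have types, so incompatible operations fail at runtime.
raised_type_error = False
try:
    total = "2" + 2                 # Cannot add a str and an int
except TypeError as err:
    raised_type_error = True
    print("TypeError:", err)
```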

Other than variables, the two constructs that we will be using all day are lists (a.k.a. arrays) and dictionaries (a.k.a. maps). Why they don’t call them arrays and maps, I don’t know.

my_variable = 2                         # A simple variable
my_list = [1, 2, 3]                     # A simple list
print(my_list[0])                       # Indexing starts at zero. print is a built-in function, similar to printf.
another_list = ["a", "string", "list"]
print(another_list[:])                  # A bare colon means "all elements"
print(another_list[0:2])                # The end index of a range is exclusive, so this prints elements 0 and 1.
character_list = 'abc'                  # A string can be indexed like a list of characters
print(character_list[-1])               # -1 means the last entry, -2 means last but one
1
['a', 'string', 'list']
['a', 'string']
c
first_dict = {'bob': 32, 'steve': 94}               # Simple dictionary
key = "bob"
print("%s is aged %d" % (key, first_dict[key]))     # The % operator substitutes the values that follow it into the string. Note the brackets around the terms.
bob is aged 32
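
A few more list and dictionary operations tend to come up a lot. This is only a sketch, and the names letters and ages are invented for the example:

```python
# More slicing: omitted bounds, negative indices and a step.
letters = ["a", "b", "c", "d", "e"]
print(letters[1:])      # ['b', 'c', 'd', 'e']  (from index 1 to the end)
print(letters[-2:])     # ['d', 'e']            (the last two entries)
print(letters[::2])     # ['a', 'c', 'e']       (every second entry)

# Dictionaries: adding a key, looping over pairs, and safe lookups.
ages = {"bob": 32, "steve": 94}
ages["alice"] = 41                      # Assigning to a new key adds it
for name, age in ages.items():          # items() yields (key, value) pairs
    print("%s is aged %d" % (name, age))
print(ages.get("carol"))                # None: get() does not raise for a missing key
print("bob" in ages)                    # True: "in" tests for a key
```

Looking up a missing key with square brackets, e.g. ages["carol"], raises a KeyError, which is why get() is handy when you are not sure the key exists.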

Example

The code below will:

  • Create a Python list of keys
  • Create a map containing those keys pointing to some values
  • Use the Python function print(...) to print the value for each key in your list
keys = ["a", "b", "c"]
d = {"a": 1, "b": 2, "c": 3}
for k in keys:
    print(d[k])
1
2
3
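
To print only a range of values rather than all of them, one option is to slice the list of keys first. A minimal sketch, reusing the same keys and d and adding a helper list called selected that I made up for illustration:

```python
keys = ["a", "b", "c"]
d = {"a": 1, "b": 2, "c": 3}

# Look up only the first two keys by slicing the key list.
selected = []
for k in keys[0:2]:
    selected.append(d[k])
print(selected)         # [1, 2]

# A single value, using the last key in the list.
print(d[keys[-1]])      # 3
```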

Functions

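Functions are defined in much the same way as in other languages, but you might not be used to the syntax.

Because types are only checked at runtime, function signatures are not very strict: a parameter will accept a value of any type. You can add optional type hints, but Python itself does not enforce them; you need an external checker for that.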

def printData(x=[1, 2, 3]):     # An = in the parameter list means "default to". Note the colon
    for x_i in x:               # Note the indentation inside the function. This is required.
        print(x_i)              # The "in" construct iterates over values in x.

printData()
printData([11, 12])
printData(["a", "string"])
1
2
3
11
12
a
string
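
If you want to document the types a function expects, you can add type hints. The sketch below is one possible variant of the function above; the name print_data and the return value are my own additions for illustration. It also uses None as the default, a common pattern that avoids sharing one list object between calls:

```python
from typing import List, Optional


def print_data(x: Optional[List[int]] = None) -> List[int]:
    """Print each item of x, falling back to [1, 2, 3] when x is not given."""
    if x is None:
        x = [1, 2, 3]       # A fresh list is created on every call
    for x_i in x:
        print(x_i)
    return x                # Returned only so the result is easy to inspect


print_data()
print_data([11, 12])
print_data(["a", "string"])  # Still runs: hints are not enforced at runtime
```

The last call passes strings even though the hint says int. Python runs it anyway; a separate static checker such as mypy is what would flag the mismatch.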

Handy functions

There is a wide range of handy features in Python. You might not need to use these, but here are some:

str_list = [str(x) for x in range(3)]               # "List comprehension", i.e. create a list from a for loop
print(str_list)
print(', '.join(str(x) for x in range(3, 0, -1)))   # Joining strings

# The following will only work in Python 3, where print is a function: the first (of many) differences between 2 and 3.
l = lambda x: print(x**2)                           # A "lambda" is a small anonymous function. ** = power. Most times you can just define a normal function.
l(3)
['0', '1', '2']
3, 2, 1
9
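
To make the comparison concrete, here is a rough sketch showing a lambda next to the equivalent def, plus a list comprehension with a condition. All of the names are invented for the example:

```python
# A lambda and the equivalent named function.
square = lambda x: x ** 2


def square_def(x):
    return x ** 2


print(square(3), square_def(3))                     # 9 9

# A list comprehension can also filter with an "if".
even_squares = [n ** 2 for n in range(6) if n % 2 == 0]
print(even_squares)                                 # [0, 4, 16]

# Lambdas are most useful as short throwaway arguments, e.g. a sort key.
by_length = sorted(["string", "a", "list"], key=lambda s: len(s))
print(by_length)                                    # ['a', 'list', 'string']
```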
