Into the programming – part 2

DATA SCIENCE WITH KESHAV - LESSON 5: INTO THE PROGRAMMING - PART 2

Hello, and welcome to Lesson 5 of my tutorial series, “Data Science with Keshav”. To get an overview of what this tutorial series is about, you can check out my other post, Data Science 101. To get to part 1 of the tutorial, Into the programming, you can follow this link.

In the last article, we talked about some fundamental concepts in Python programming. If you noticed, I included a few queries that were very important. In this article, we will talk about data types other than the native types (this might include lists only) that are very important for you as a Python programmer. If anything regarding data types is left over, I will continue with it in the next article.

Let us start with one of the most used data types in Python, “LIST”. As the name suggests, a “LIST” lets you put data of the same or different types together. You can access items in a list via indexing, and with some other advanced features as well. Let us get into the details with a practical approach.

# Initialization of list
In [1]: x = list()  # First method
In [2]: y = []  # Second method

Well, there are two ways of initializing a list. But if we talk about writing efficient code, or say code optimization, I suggest you go for the second method. You can check the execution time of statements in Python using the following code.

In [1]: import timeit
In [2]: exc_time1 = sum(timeit.Timer('x = list()').repeat(repeat=100,number=1000))/100
In [3]: exc_time2 = sum(timeit.Timer('x = []').repeat(repeat=100,number=1000))/100
In [4]: (exc_time2/exc_time1)*100
Out[4]: 26.74822229484029
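
If you are not working in an IPython console, the same comparison can be sketched as a plain script. This is only a rough sketch: the names time_list_call and time_literal are mine, and the exact ratio will differ from machine to machine and between Python versions.

```python
import timeit

# Each repeat times 1000 executions of the statement;
# we then average the totals of the 100 repeats.
time_list_call = sum(timeit.repeat('x = list()', repeat=100, number=1000)) / 100
time_literal = sum(timeit.repeat('x = []', repeat=100, number=1000)) / 100

# Time taken by [] as a percentage of the time taken by list()
ratio = (time_literal / time_list_call) * 100
print(f"[] takes about {ratio:.1f}% of the time list() takes")
```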

Here, I am not going to spend time explaining what timeit does; I am going to cover that later on. But as you can see, the second method takes only about 27% of the time the first one does, which makes it roughly 3.7 times as fast. If you have any queries regarding this, let’s meet in the comment section at the end of the post. So, initialization is done. Let’s hop into more details. We will start by understanding the range() function, which you will use frequently.

# Understanding the range function first
In [1]: ?range()
Init signature: range(self, /, *args, **kwargs)
Docstring:     
range(stop) -> range object
range(start, stop[, step]) -> range object
Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type:           type

Let’s start using it.

In [1]: range() # This will throw an error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-5bcbe005bf48> in <module>()
----> 1 range()
TypeError: range expected 1 arguments, got 0
In [2]: range(0)
Out[2]: range(0, 0)
In [3]: list(range(0)) #creates nothing
Out[3]: []
In [4]: list(range(5))
Out[4]: [0, 1, 2, 3, 4]
In [5]: list(range(5,10))
Out[5]: [5, 6, 7, 8, 9]
In [6]: list(range(5,10,2))
Out[6]: [5, 7, 9]
In [7]: list(range(5,10,0.1)) # Will throw an error: all three arguments must be integers
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-fd2fd60f5526> in <module>()
----> 1 list(range(5,10,0.1))
TypeError: 'float' object cannot be interpreted as an integer
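
The session above only uses positive steps, but the docstring also mentions decrements. Here is a minimal sketch of counting down with a negative step; the variable name countdown is just for illustration.

```python
# Counting down: start at 10, stop before 0, step of -2
countdown = list(range(10, 0, -2))
print(countdown)  # [10, 8, 6, 4, 2]

# A step of 0 is not allowed and raises ValueError
try:
    range(5, 10, 0)
except ValueError as err:
    print("ValueError:", err)
```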

I believe you can now make your own lists using the range() function. As you saw in In [7], we got an error; this is expected, because range() accepts only integers. Now, I am going to give you a challenge.

Can you create a list that starts at 0.9, ends at 11.19, and goes in steps of 0.73? If you can, please give me your answer in the comment section.

Now, the question is, are there any other methods of creating a list of numbers? Yes! There are tons of methods you can use. For example, libraries like numpy can be used to make lists as well. For now, I am going to show you one important concept in list generation called a list comprehension.

Here, we are going to create a list of the even numbers from 10 to 20.

# First let's see what even numbers are
In [1]: 1%2 # 1%2 returns the remainder when 1 is divided by 2; the remainder is 1, so 1 is odd
Out[1]: 1
In [2]: 2%2 # the remainder is 0, so 2 is even
Out[2]: 0
In [3]: 22%2 # the remainder is 0, so 22 is even
Out[3]: 0
In [4]: 97877%2 # the remainder is 1, so 97877 is odd
Out[4]: 1
# Now you should know how to check for an even number
In [5]: 'even' if 1%2==0 else 'odd'
Out[5]: 'odd'
# Similarly
In [6]: 'even' if 98987%2==0 else 'odd'
Out[6]: 'odd'
# Similarly
In [7]: 'even' if 9898%2==0 else 'odd'
Out[7]: 'even'

Here I have introduced the ternary operator. We may not use this operator for now, but I want you to understand that if and else are used to check whether some condition is true or false.
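
To make the connection with ordinary if and else clearer, here is a small sketch that wraps the same check in two functions. The names parity and parity_long are my own, chosen only for this illustration.

```python
def parity(n):
    # Conditional expression: value_if_true if condition else value_if_false
    return 'even' if n % 2 == 0 else 'odd'

def parity_long(n):
    # The same logic written as a regular if/else statement
    if n % 2 == 0:
        return 'even'
    else:
        return 'odd'

print(parity(9898))        # even
print(parity_long(98987))  # odd
```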

In [1]: x = [i for i in range(9,21) if i%2==0]  #this is list comprehension
In [2]: x
Out[2]: [10, 12, 14, 16, 18, 20]
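
If the comprehension looks too compact, it may help to see it next to a plain for loop that builds the same list. This is just a sketch; evens and evens_loop are illustrative names.

```python
# List comprehension, as in the session above
evens = [i for i in range(9, 21) if i % 2 == 0]

# Equivalent for loop
evens_loop = []
for i in range(9, 21):
    if i % 2 == 0:
        evens_loop.append(i)

print(evens == evens_loop)  # True
```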

I am going to introduce another function, choice(), from the library ‘random’, which randomly selects an element from a list or a range object.

In [1]: from random import choice
In [2]: choice([1,2,3,4])
Out[2]: 3
In [3]: choice([1,2,3,4])
Out[3]: 4
In [4]: choice([1,2,3,4])
Out[4]: 3
In [5]: choice([1,2,3,4])
Out[5]: 1
In [6]: choice(range(10))
Out[6]: 5
In [7]: choice(range(10))
Out[7]: 2
# I guess now you know the use of choice
# Let's use it in our list comprehension technique to create a list of five random numbers from 23 to 31 (range(23, 32) excludes 32).
In [8]: x = [choice(range(23,32)) for i in range(5)]
In [9]: x
Out[9]: [30, 31, 30, 29, 28]
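
Note that range(23, 32) stops at 31, so 32 is never picked. As an alternative sketch, the same library also has randint(), which includes both end points. The call to seed() here is optional and only makes reruns reproducible; the name numbers is just for illustration.

```python
from random import randint, seed

seed(42)  # optional: fixes the sequence so that reruns give the same list

# randint(23, 31) can return any integer from 23 to 31, both included
numbers = [randint(23, 31) for _ in range(5)]
print(numbers)
```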

I guess now you can combine these skills to create your own lists. So far we have seen lists of numbers only, but we can create lists of different data types as well.

In [1]: a = list('python')
In [2]: a #lists of only characters
Out[2]: ['p', 'y', 't', 'h', 'o', 'n']
In [3]: a = [1, 'hello','c',[1,2,3],range(9,10)] #lists of various types of data 
In [4]: a
Out[4]: [1, 'hello', 'c', [1, 2, 3], range(9, 10)]
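
Since I mentioned indexing earlier, here is a short sketch of how items in such a mixed list can be accessed. It reuses the same list a from the session above.

```python
a = [1, 'hello', 'c', [1, 2, 3], range(9, 10)]

print(a[0])     # 1, the first item (indexing starts at 0)
print(a[-1])    # range(9, 10), negative indices count from the end
print(a[3][1])  # 2, the second item of the nested list
print(a[1:3])   # ['hello', 'c'], a slice from index 1 up to (not including) 3
print(len(a))   # 5
```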

I think we should now move on to basic operations on lists. I suggest you try them yourselves using the techniques I described previously. In the IPython console, you can create a list, type its name, put a dot, and press Tab to see all the available options.

[Figure: list_operations (tab-completion menu showing the available list options)]
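
If you are not in an IPython console, a rough way to get the same overview is the built-in dir() function. The sketch below also tries a few of the operations; the names methods, nums and last are only for illustration.

```python
# Public list methods (names that do not start with an underscore)
methods = [name for name in dir(list) if not name.startswith('_')]
print(methods)

nums = [3, 1, 2]
nums.append(4)       # add one item at the end  -> [3, 1, 2, 4]
nums.extend([5, 6])  # add several items        -> [3, 1, 2, 4, 5, 6]
last = nums.pop()    # remove and return the last item (6)
nums.reverse()       # reverse in place         -> [5, 4, 2, 1, 3]
print(nums, last)
```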

I think I should not get into more details here; I will leave the rest for you to explore. However, if you ever run into any problem, I’ll always be right here to help you out. But for now, I am going to raise one challenge and try to solve it.

Suppose I have the following list:

In [1]: x
Out[1]: 
['21.pdf', '24.pdf', '2.pdf', '20.pdf', '18.pdf', '8.pdf', '10.pdf', '9.pdf', '5.pdf', '6.pdf', '19.pdf', '13.pdf', '23.pdf', '16.pdf', 
'4.pdf', '14.pdf', '3.pdf', '22.pdf', '17.pdf', '11.pdf', '28.pdf', '1.pdf', '27.pdf', '15.pdf', '26.pdf', '25.pdf', '7.pdf', '12.pdf']

x is a list that contains the names of the PDF files in a directory. I need this list to be in order, like [‘1.pdf’, ‘2.pdf’, ……… ]. Only then can I stack all these PDFs into one single PDF in order.

So how can we do this? We need to sort the list. I am going to teach you some amazing techniques that you can use in your own work.
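
Before that, here is a quick sketch of why a plain sort is not enough for file names like these. The list sample is a made-up, shortened example.

```python
sample = ['10.pdf', '2.pdf', '1.pdf']

# Strings are compared character by character, so '10.pdf' sorts before '2.pdf'
print(sorted(sample))  # ['1.pdf', '10.pdf', '2.pdf']
```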

# Let us see what the following command does to a string
In [5]: "23.pdf".split('.')
Out[5]: ['23', 'pdf']
# or you can assign "23.pdf" to a variable and do the same
In [6]: a = "23.pdf"
In [7]: a.split('.')
Out[7]: ['23', 'pdf']
# Now notice what the following does
In [8]: "23.pdf".split('.')[0]
Out[8]: '23'
# We separated the number out of our string "23.pdf"; this works for any string with a similar pattern
# But the output is still not a number, it is a string, as you can see from the quotes
In [9]: int("23.pdf".split('.')[0])
Out[9]: 23
# This way we get the number from our string as a number, not as a string

I suggest you make sure you understand what we did above. Now we can use the number extracted from each such string as the key to sort our list as per our need.

In [1]: x.sort(key=lambda name: int(name.split('.')[0]))  # just focus on "int(name.split('.')[0])"
In [2]: x
Out[2]: ['1.pdf', '2.pdf', '3.pdf', '4.pdf', '5.pdf', '6.pdf', '7.pdf', '8.pdf', '9.pdf', '10.pdf', '11.pdf',
'12.pdf', '13.pdf', '14.pdf', '15.pdf', '16.pdf', '17.pdf', '18.pdf', '19.pdf', '20.pdf', '21.pdf', '22.pdf',
'23.pdf', '24.pdf', '25.pdf', '26.pdf', '27.pdf', '28.pdf']
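
If the lambda feels cryptic, the same key can be written as a named function. The sketch below also uses the built-in sorted(), which returns a new list instead of changing the original one. The names leading_number, files and ordered are my own, and the list is a shortened, made-up sample.

```python
def leading_number(filename):
    # '23.pdf' -> 23
    return int(filename.split('.')[0])

files = ['21.pdf', '2.pdf', '10.pdf', '1.pdf']
ordered = sorted(files, key=leading_number)

print(ordered)  # ['1.pdf', '2.pdf', '10.pdf', '21.pdf']
print(files)    # ['21.pdf', '2.pdf', '10.pdf', '1.pdf'], the original is unchanged
```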

I know you might have lots of questions. I suggest you write them in comments.

For now, I must put a comma on this article series. See you in the next article.
