<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Python | Tengku Hanis</title>
    <link>https://tengkuhanis.netlify.app/category/python/</link>
      <atom:link href="https://tengkuhanis.netlify.app/category/python/index.xml" rel="self" type="application/rss+xml" />
    <description>Python</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>©Tengku Hanis 2020-2025 Made with [blogdown](https://github.com/rstudio/blogdown)</copyright><lastBuildDate>Wed, 07 Aug 2024 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://tengkuhanis.netlify.app/images/icon_hua2ec155b4296a9c9791d015323e16eb5_11927_512x512_fill_lanczos_center_2.png</url>
      <title>Python</title>
      <link>https://tengkuhanis.netlify.app/category/python/</link>
    </image>
    
    <item>
      <title>Basic plotting with Matplotlib and Seaborn</title>
      <link>https://tengkuhanis.netlify.app/post/basic-plotting-in-python/</link>
      <pubDate>Wed, 07 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/basic-plotting-in-python/</guid>
      <description>


&lt;p&gt;This post is continuation of my previous post about Python. For those interested:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;a href=&#34;https://tengkuhanis.netlify.app/post/basic-data-wrangling-with-python/&#34;&gt;Basic data wrangling with Python&lt;/a&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Basic plotting with matplotlib and seaborn&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Comparison of ggplot in R versus in Python&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are several packages or libraries available in Python for plotting and visualization. However, the most commonly used package is &lt;a href=&#34;https://matplotlib.org/&#34;&gt;matplotlib&lt;/a&gt;. This package is quite extensive and often time can be quite complicated to use. Thus, &lt;a href=&#34;https://seaborn.pydata.org/&#34;&gt;seaborn&lt;/a&gt; package is another alternative and complementary to matplotlib. Seaborn is based on matplotlib and provides a high-level functionality compare to matplotlib.&lt;/p&gt;
&lt;p&gt;So, in this blog post, let us compare several basic plots using both packages.&lt;/p&gt;
&lt;div id=&#34;load-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Load packages&lt;/h2&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;load-dataset&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Load dataset&lt;/h2&gt;
&lt;p&gt;We going to use the &lt;a href=&#34;https://www.kaggle.com/datasets/arshid/dat-flower-dataset&#34;&gt;iris dataset&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;dat = sns.load_dataset(&amp;#39;iris&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can further see the information on this dataset.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;dat.head(5)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length  sepal_width  petal_length  petal_width species
## 0           5.1          3.5           1.4          0.2  setosa
## 1           4.9          3.0           1.4          0.2  setosa
## 2           4.7          3.2           1.3          0.2  setosa
## 3           4.6          3.1           1.5          0.2  setosa
## 4           5.0          3.6           1.4          0.2  setosa&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;histogram&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Histogram&lt;/h2&gt;
&lt;p&gt;Let’s plot the histogram using matplotlib first.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;plt.hist(dat[&amp;#39;sepal_length&amp;#39;], bins=30)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Notice that this histogram does not has any label. So, to add a label, we need to do this manually.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;plt.hist(dat[&amp;#39;sepal_length&amp;#39;], bins=30)
plt.xlabel(&amp;#39;Sepal length&amp;#39;) #x-axis label
plt.ylabel(&amp;#39;Frequency&amp;#39;) #y-axis label
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-5-3.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;However, using seaborn, the label is extracted from the variable name, which is pretty convenient.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;sns.histplot(dat[&amp;#39;sepal_length&amp;#39;], bins=30)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-6-5.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Let’s say we want to plot the histogram according to different levels.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;species = [&amp;#39;setosa&amp;#39;, &amp;#39;versicolor&amp;#39;, &amp;#39;virginica&amp;#39;]

for i in species:
    subset = dat[dat[&amp;#39;species&amp;#39;] == i]
    plt.hist(subset[&amp;#39;sepal_length&amp;#39;], label = i)

plt.legend(loc = &amp;#39;upper right&amp;#39;)
plt.xlabel(&amp;#39;Sepal length&amp;#39;)
plt.ylabel(&amp;#39;Frequency&amp;#39;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-7-7.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The codes above are quite long. In seaborn, the histogram above can be generated quite easily.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;sns.histplot(x = &amp;#39;sepal_length&amp;#39;, hue = &amp;#39;species&amp;#39;, data = dat)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-8-9.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;boxplot&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Boxplot&lt;/h2&gt;
&lt;p&gt;First, let’s do boxplot using matplotlib.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;bp = plt.boxplot(dat[&amp;#39;sepal_length&amp;#39;])
plt.xlabel(&amp;#39;Sepal length&amp;#39;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-9-11.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If we wanto to do boxplot according to other variable. The codes become a bit complicated especially for beginners.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;species = dat.groupby(&amp;#39;species&amp;#39;)
setosa = species.get_group(&amp;#39;setosa&amp;#39;)[&amp;#39;sepal_length&amp;#39;]
versicolor = species.get_group(&amp;#39;versicolor&amp;#39;)[&amp;#39;sepal_length&amp;#39;]
virginica = species.get_group(&amp;#39;virginica&amp;#39;)[&amp;#39;sepal_length&amp;#39;]

bp = plt.boxplot([setosa, versicolor, virginica], labels = [&amp;#39;setosa&amp;#39;, &amp;#39;versicolor&amp;#39;, &amp;#39;virginica&amp;#39;])
plt.xlabel(&amp;#39;Sepal length&amp;#39;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-10-13.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Both plots above are quite easy to do in seaborn. Below are the codes for the basic histogram.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;sns.boxplot(dat[&amp;#39;sepal_length&amp;#39;])
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-11-15.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Next, to plot &lt;code&gt;sepal_length&lt;/code&gt; based on &lt;code&gt;species&lt;/code&gt; is pretty much straightforward in seaborn.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;sns.boxplot(y=&amp;#39;sepal_length&amp;#39;, hue=&amp;#39;species&amp;#39;, data=dat)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-12-17.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;scatter-plot&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Scatter plot&lt;/h2&gt;
&lt;p&gt;Lastly, let’s see the scatter plot using matplotlib.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;plt.scatter(x=dat[&amp;#39;sepal_length&amp;#39;], y=dat[&amp;#39;sepal_width&amp;#39;])
plt.xlabel(&amp;#39;Sepal length&amp;#39;)
plt.ylabel(&amp;#39;Sepal width&amp;#39;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-13-19.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can further extend this plot by categorising it into different species.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;# Define the species to colors mapping
species_to_color = {&amp;#39;setosa&amp;#39;: &amp;#39;blue&amp;#39;, &amp;#39;versicolor&amp;#39;: &amp;#39;green&amp;#39;, &amp;#39;virginica&amp;#39;: &amp;#39;red&amp;#39;}
colors = dat[&amp;#39;species&amp;#39;].map(species_to_color)

# Create the scatter plot
plt.scatter(x=dat[&amp;#39;sepal_length&amp;#39;], y=dat[&amp;#39;sepal_width&amp;#39;], c=colors)
plt.xlabel(&amp;#39;Sepal length&amp;#39;)
plt.ylabel(&amp;#39;Sepal width&amp;#39;)
plt.legend(handles=[plt.Line2D([0], [0], marker=&amp;#39;o&amp;#39;, color=&amp;#39;w&amp;#39;, markerfacecolor=color, markersize=10, label=species) for species, color in species_to_color.items()], title=&amp;#39;Species&amp;#39;)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-14-21.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Now, let’s see the seaborn package. This is the basic scatter plot.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;sns.scatterplot(x=&amp;#39;sepal_length&amp;#39;, y=&amp;#39;sepal_width&amp;#39;, data=dat)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-15-23.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;To extend this plot by categorising it into different species in seaborn is actually quite simple.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;sns.scatterplot(x=&amp;#39;sepal_length&amp;#39;, y=&amp;#39;sepal_width&amp;#39;, hue=&amp;#39;species&amp;#39;, data=dat)
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://tengkuhanis.netlify.app/post/basic-plotting-in-python/index.en_files/figure-html/unnamed-chunk-16-25.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In conclusion, matplotlib and seaborn complement each other well. Seaborn is an excellent choice for quick and standard plots, thanks to its high-level interface. On the other hand, matplotlib offers a more extensive range of customization options and is ideal for creating complex and detailed visualizations. Ultimately, choosing between matplotlib and seaborn depends on the specific requirements of the visualization task.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Basic data wrangling with Python</title>
      <link>https://tengkuhanis.netlify.app/post/basic-data-wrangling-with-python/</link>
      <pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate>
      <guid>https://tengkuhanis.netlify.app/post/basic-data-wrangling-with-python/</guid>
      <description>


&lt;p&gt;&lt;img src=&#34;images/img2.jpeg&#34; style=&#34;width:60.0%;height:40.0%&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Python is one of the most popular programming language and software. In this post, I will demonstrate how to do a basic data wrangling with Python. This is going to be one of the several series of post related to Python (hopefully). My plan is to cover these topics:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Basic data wrangling with Python&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Basic plotting with matplotlib and seaborn&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Comparison of ggplot in R versus in Python&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once I finish writing any of the topics, I will link it to the above.&lt;/p&gt;
&lt;p&gt;So, let’s start.&lt;/p&gt;
&lt;div id=&#34;loading-necessary-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Loading necessary packages&lt;/h2&gt;
&lt;p&gt;Before loading the packages, you need to install the packages. Basically, there are two ways to install the Python packages. Either by pip command or conda command. I will skip this part, but you can refer to &lt;a href=&#34;https://packaging.python.org/en/latest/tutorials/installing-packages/#installing-packages&#34;&gt;this link to install the packages using pip command&lt;/a&gt; or &lt;a href=&#34;https://conda.io/projects/conda/en/latest/user-guide/concepts/installing-with-conda.html#installing-with-conda&#34;&gt;this link to install the packages using conda command&lt;/a&gt;. For those who has both R and Python in your machine, I suggest to use a conda command.&lt;/p&gt;
&lt;p&gt;Let’s load the required packages.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import numpy as np 
import pandas as pd
from seaborn import load_dataset&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All the functions from each package can be assessed from the alias or the abbreviated text above. For example, functions in &lt;code&gt;pandas&lt;/code&gt; package can be accessed through &lt;code&gt;pd&lt;/code&gt; or to be specific &lt;code&gt;pd.&lt;/code&gt;. You will see this many times through out this blog post, so do not worry much about this. I am sure you will get the gist of it once you see this later on. In practice, you don’t actually need to use &lt;code&gt;pd&lt;/code&gt; for &lt;code&gt;pandas&lt;/code&gt; and &lt;code&gt;np&lt;/code&gt; for &lt;code&gt;numpy&lt;/code&gt;, but this is a convention or standard practice widely adopted in the Python community.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;load-the-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Load the data&lt;/h2&gt;
&lt;p&gt;We going to use &lt;a href=&#34;https://archive.ics.uci.edu/dataset/53/iris&#34;&gt;iris dataset&lt;/a&gt;. This dataset is readily available in seaborn package.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris = load_dataset(&amp;#39;iris&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we load the data, we need to check the variable type.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.dtypes&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## sepal_length    float64
## sepal_width     float64
## petal_length    float64
## petal_width     float64
## species          object
## dtype: object&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Variable species, by right, is a categorical variable. So, we can use &lt;code&gt;Categorical()&lt;/code&gt; from &lt;code&gt;pandas&lt;/code&gt; to change it from an object variable type to a category. &lt;code&gt;pd.&lt;/code&gt; here, means we access the function from &lt;code&gt;pandas&lt;/code&gt; package as I explained it previously.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris[&amp;#39;species&amp;#39;] = pd.Categorical(iris[&amp;#39;species&amp;#39;])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we check the variable type again, we can see the species variable is a category.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.dtypes&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## sepal_length     float64
## sepal_width      float64
## petal_length     float64
## petal_width      float64
## species         category
## dtype: object&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we can also see the data. Let’s see the first 10 rows.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.head(10)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length  sepal_width  petal_length  petal_width species
## 0           5.1          3.5           1.4          0.2  setosa
## 1           4.9          3.0           1.4          0.2  setosa
## 2           4.7          3.2           1.3          0.2  setosa
## 3           4.6          3.1           1.5          0.2  setosa
## 4           5.0          3.6           1.4          0.2  setosa
## 5           5.4          3.9           1.7          0.4  setosa
## 6           4.6          3.4           1.4          0.3  setosa
## 7           5.0          3.4           1.5          0.2  setosa
## 8           4.4          2.9           1.4          0.2  setosa
## 9           4.9          3.1           1.5          0.1  setosa&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;slicing-and-indexing&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Slicing and indexing&lt;/h2&gt;
&lt;p&gt;To see a specific column, we can index as below. Notice, that the row number starts with 0 as opposed to R (if you have used R previously) in which the row number starts with 1.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris[&amp;#39;sepal_length&amp;#39;][0:10]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 0    5.1
## 1    4.9
## 2    4.7
## 3    4.6
## 4    5.0
## 5    5.4
## 6    4.6
## 7    5.0
## 8    4.4
## 9    4.9
## Name: sepal_length, dtype: float64&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Similarly, we can also index as below to get the first 10 rows of sepal_length variable.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris[&amp;#39;sepal_length&amp;#39;][:10]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 0    5.1
## 1    4.9
## 2    4.7
## 3    4.6
## 4    5.0
## 5    5.4
## 6    4.6
## 7    5.0
## 8    4.4
## 9    4.9
## Name: sepal_length, dtype: float64&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next to access the first 5 rows, we can do as below.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris[0:5]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length  sepal_width  petal_length  petal_width species
## 0           5.1          3.5           1.4          0.2  setosa
## 1           4.9          3.0           1.4          0.2  setosa
## 2           4.7          3.2           1.3          0.2  setosa
## 3           4.6          3.1           1.5          0.2  setosa
## 4           5.0          3.6           1.4          0.2  setosa&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also use &lt;code&gt;iloc()&lt;/code&gt; and &lt;code&gt;loc()&lt;/code&gt; functions. The main difference between the two functions is that &lt;code&gt;iloc()&lt;/code&gt; can only accept a numerical value and &lt;code&gt;loc()&lt;/code&gt; function can accept a string value.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.iloc[0:2, 0:3] #rows, then columns&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length  sepal_width  petal_length
## 0           5.1          3.5           1.4
## 1           4.9          3.0           1.4&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.loc[0:2, [&amp;#39;sepal_length&amp;#39;, &amp;#39;species&amp;#39;]]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length species
## 0           5.1  setosa
## 1           4.9  setosa
## 2           4.7  setosa&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Subsequently, we can also slice according a logical condition. Below, we slice the petal_length variable that is above the value of 6.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;ind = iris[&amp;#39;petal_length&amp;#39;] &amp;gt; 6
iris[&amp;#39;petal_length&amp;#39;][ind]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 105    6.6
## 107    6.3
## 109    6.1
## 117    6.7
## 118    6.9
## 122    6.7
## 130    6.1
## 131    6.4
## 135    6.1
## Name: petal_length, dtype: float64&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s say we want our data to include only setosa species.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;ind = iris[&amp;#39;species&amp;#39;] == &amp;#39;setosa&amp;#39;
iris.loc[ind, :].head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length  sepal_width  petal_length  petal_width species
## 0           5.1          3.5           1.4          0.2  setosa
## 1           4.9          3.0           1.4          0.2  setosa
## 2           4.7          3.2           1.3          0.2  setosa
## 3           4.6          3.1           1.5          0.2  setosa
## 4           5.0          3.6           1.4          0.2  setosa&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we know about slicing and indexing, we can use this knowledge to change certain values. For example, below we change:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;row 1, 2, 3, and 4 of sepal_length to NA values&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;row 6 of species and sepal_width to NA values&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.loc[0:3, &amp;#39;sepal_length&amp;#39;] = np.nan 
iris.iloc[5, [1, 4]] = np.nan&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see the result.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.head(6)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    sepal_length  sepal_width  petal_length  petal_width species
## 0           NaN          3.5           1.4          0.2  setosa
## 1           NaN          3.0           1.4          0.2  setosa
## 2           NaN          3.2           1.3          0.2  setosa
## 3           NaN          3.1           1.5          0.2  setosa
## 4           5.0          3.6           1.4          0.2  setosa
## 5           5.4          NaN           1.7          0.4     NaN&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;missing-values&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Missing values&lt;/h2&gt;
&lt;p&gt;If we want to see if we have any missing values in our data, we can use &lt;code&gt;isnull()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.isnull().any().any() #For overall&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## True&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.isnull().any() #Check for each column&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## sepal_length     True
## sepal_width      True
## petal_length    False
## petal_width     False
## species          True
## dtype: bool&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can further calculate how many missing values that we have.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.isnull().sum()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## sepal_length    4
## sepal_width     1
## petal_length    0
## petal_width     0
## species         1
## dtype: int64&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;descriptive-statistics&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Descriptive statistics&lt;/h2&gt;
&lt;p&gt;To get a basic descriptive statistics, we can use &lt;code&gt;describe()&lt;/code&gt; function. Below, we additionally use &lt;code&gt;round()&lt;/code&gt; to round up the results into one decimal points.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.describe().round()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        sepal_length  sepal_width  petal_length  petal_width
## count         146.0        149.0         150.0        150.0
## mean            6.0          3.0           4.0          1.0
## std             1.0          0.0           2.0          1.0
## min             4.0          2.0           1.0          0.0
## 25%             5.0          3.0           2.0          0.0
## 50%             6.0          3.0           4.0          1.0
## 75%             6.0          3.0           5.0          2.0
## max             8.0          4.0           7.0          2.0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that the results above only include numerical variables. So, to get the results for categorical variables as well, we need to add &lt;code&gt;include = all&lt;/code&gt; as below.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris.describe(include = &amp;#39;all&amp;#39;).round()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         sepal_length  sepal_width  petal_length  petal_width     species
## count          146.0        149.0         150.0        150.0         149
## unique           NaN          NaN           NaN          NaN           3
## top              NaN          NaN           NaN          NaN  versicolor
## freq             NaN          NaN           NaN          NaN          50
## mean             6.0          3.0           4.0          1.0         NaN
## std              1.0          0.0           2.0          1.0         NaN
## min              4.0          2.0           1.0          0.0         NaN
## 25%              5.0          3.0           2.0          0.0         NaN
## 50%              6.0          3.0           4.0          1.0         NaN
## 75%              6.0          3.0           5.0          2.0         NaN
## max              8.0          4.0           7.0          2.0         NaN&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Alternatively, we can also calculate the unique values for the categorical variable. &lt;code&gt;value_counts()&lt;/code&gt; only calculate the non-missing values.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris[&amp;#39;species&amp;#39;].value_counts()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## species
## versicolor    50
## virginica     50
## setosa        49
## Name: count, dtype: int64&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Similarly, for numerical variable we can also do manually each statistics. For example to calculate mean, we can use &lt;code&gt;mean()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;iris[&amp;#39;sepal_width&amp;#39;].mean().round()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 3.0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That’s it. These are the basics of handling a dataset in Python. With this knowledge, I hope you feel ready to dive in and explore more on your own.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
