
    Data Visualization: Python Seaborn part 1

    1. In the world of Analytics, the best way to get insights is by visualizing the data. We have already used Matplotlib, a 2D plotting library that allows us to plot different graphs and charts.
    2. Another, complementary package is Seaborn, a data visualization library built on top of Matplotlib that provides a higher-level interface for drawing statistical graphics.

    Seaborn

    • Is a Python data visualization library for statistical plotting

    • Is built on top of matplotlib

    • Is designed to work with NumPy and pandas data structures

    • Provides a high-level interface for drawing attractive and informative statistical graphics.

    • Comes equipped with preset styles and color palettes so you can create complex, aesthetically pleasing charts with a few lines of code.
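    The preset styles and color palettes mentioned above can be applied with a single call. This is a minimal sketch that assumes seaborn 0.11 or later, where sns.set_theme() is available. The sns.set() used in the cells below is an older alias for it. The style and palette names chosen here are just examples.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import seaborn as sns

# Apply a preset style and colour palette to every plot drawn afterwards
sns.set_theme(style="whitegrid", palette="deep")

# A palette is a list of RGB tuples; "deep" is Seaborn's default palette
deep = sns.color_palette("deep")
print(len(deep))
print(deep[0])
```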

    Seaborn vs Matplotlib

    Seaborn is built on top of Python’s core visualization library matplotlib, but it’s meant to serve as a complement, not a replacement.

    • In most cases, we’ll still use matplotlib for simple plotting

    • On Seaborn’s official website, they state: “If matplotlib ‘tries to make easy things easy and hard things possible’, seaborn tries to make a well-defined set of hard things easy too.”

    • Seaborn helps resolve two major problems with Matplotlib:

    * Default Matplotlib parameters

    * Working with data frames

    In [ ]:

    # Let's compare Matplotlib and Seaborn code for the same plot
    

    In [ ]:

    # Matplotlib (default style)
    import matplotlib.pyplot as plt
    import numpy as np
    
    x = np.linspace(0, 10, 1000)          # 1000 evenly spaced points in [0, 10]
    plt.plot(x, np.sin(x), x, np.cos(x))  # sine and cosine on the same axes
    plt.show()
    
    

    In [ ]:

    # Seaborn (same plotting code, with Seaborn's default theme applied)
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    
    sns.set()  # apply Seaborn's default theme (sns.set_theme() in seaborn 0.11+)
    
    x = np.linspace(0, 10, 1000)
    plt.plot(x, np.sin(x), x, np.cos(x))
    plt.show()
    

    Data visualization using Seaborn

    1. Visualizing statistical relationships
    2. Visualizing categorical data

    Visualizing statistical relationships (that is, relationships between variables)

    Statistical analysis is the process of understanding how the variables in a dataset relate to each other and how those relationships, in turn, depend on other variables.

    relplot()

    • This is a figure-level function that draws its plots with one of two axes-level functions for visualizing statistical relationships:

    * scatterplot() (kind='scatter')
    
    * lineplot() (kind='line')
    
    • By default, it draws a scatterplot().
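    To see how the figure-level relplot() differs from the axes-level scatterplot(), the sketch below draws the same points both ways. It uses a small hypothetical DataFrame whose columns x and y are made up. relplot() builds its own figure and returns a FacetGrid, which is what makes parameters like col possible. scatterplot() draws onto a single Matplotlib Axes and returns it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 5, 8]})

# Figure-level: relplot creates its own figure and returns a FacetGrid.
# With no kind= argument it draws a scatter plot (kind="scatter").
g = sns.relplot(data=df, x="x", y="y")

# Axes-level: scatterplot draws onto one existing Axes and returns it
fig, ax = plt.subplots()
ax = sns.scatterplot(data=df, x="x", y="y", ax=ax)

print(type(g).__name__, type(ax).__name__)
```

Use the figure-level function when you want facets (col, row). Use the axes-level one when the plot has to go into a Matplotlib layout you already built.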

    In [ ]:

    import seaborn as sns
    import pandas as pd
    import matplotlib.pyplot as plt
    sns.set()
    

    In [ ]:

    df = sns.load_dataset('tips')
    df.head()
    

    Out[ ]:

       total_bill   tip     sex  smoker  day    time  size
    0       16.99  1.01  Female      No  Sun  Dinner     2
    1       10.34  1.66    Male      No  Sun  Dinner     3
    2       21.01  3.50    Male      No  Sun  Dinner     3
    3       23.68  3.31    Male      No  Sun  Dinner     2
    4       24.59  3.61  Female      No  Sun  Dinner     4

    In [ ]:

    df.tail()
    

    Out[ ]:

         total_bill   tip     sex  smoker   day    time  size
    239       29.03  5.92    Male      No   Sat  Dinner     3
    240       27.18  2.00  Female     Yes   Sat  Dinner     2
    241       22.67  2.00    Male     Yes   Sat  Dinner     2
    242       17.82  1.75    Male      No   Sat  Dinner     2
    243       18.78  3.00  Female      No  Thur  Dinner     2

    In [ ]:

    sns.relplot(x = 'total_bill', y = 'tip', data = df, kind = 'scatter')
    plt.show() # Larger bills tend to come with larger tips (a positive relationship).
    

    In [ ]:

    # We can also change kind to line. 
    sns.relplot(x = 'total_bill', y = 'tip', data = df, kind = 'line')
    plt.show() # The same data drawn as a line: points are sorted by total_bill and connected.
    

    In [ ]:

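    # Parameters -
    # • x, y: the columns to plot on the x and y axes
    # • data: the DataFrame that holds those columns
    # • hue: colours the points by the values of a column
    # • size: varies the marker size by the values of a column
    # • col: draws a separate subplot for each value of a column, e.g. one per sex
    # • style: uses a different marker style for each value of a column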
    

    In [ ]:

    sns.relplot(x = 'total_bill', y = 'tip', data = df, hue = 'time')
    plt.show() # hue colours the points by time of day (Lunch vs Dinner).
    

    In [ ]:

    sns.relplot(x = 'total_bill', y = 'tip', data = df, hue = 'time', style = 'sex')
    plt.show() # style gives each sex its own marker: circles for Male, crosses for Female.
    

    In [ ]:

    sns.relplot(x = 'total_bill', y = 'tip', data = df, hue = 'time', col='sex')
    plt.show() # col draws a separate subplot for each sex (Male and Female).
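    The parameter list above also mentions size, which these cells do not use. Here is a minimal sketch using a small hypothetical DataFrame that borrows the tips column names; the values are made up. size maps a numeric column to marker area, so larger parties get larger dots, while hue still colours the points by time.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column names as 'tips' (values made up)
df = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.0, 30.0, 35.5],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.0, 5.0],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Dinner"],
    "size": [1, 2, 2, 3, 4, 6],
})

# size= maps the numeric 'size' column to marker area; hue= colours by time
g = sns.relplot(data=df, x="total_bill", y="tip", hue="time", size="size")

# The scatter points live in the first collection of the single subplot
points = g.ax.collections[0]
print(points.get_sizes())  # one marker area per row
```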
    

    Let’s do the same with lines.

    In [ ]:

    import seaborn as sns
    import pandas as pd
    import matplotlib.pyplot as plt
    sns.set()
    

    In [ ]:

    print(sns.get_dataset_names())
    
    ['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']
    

    In [ ]:

    df = sns.load_dataset('flights')
    df.head()
    

    Out[ ]:

       year  month  passengers
    0  1949    Jan         112
    1  1949    Feb         118
    2  1949    Mar         132
    3  1949    Apr         129
    4  1949    May         121

    In [ ]:

    df.tail()
    

    Out[ ]:

         year  month  passengers
    139  1960    Aug         606
    140  1960    Sep         508
    141  1960    Oct         461
    142  1960    Nov         390
    143  1960    Dec         432

    In [ ]:

    sns.relplot(x = 'year', y = 'passengers', data = df, kind = 'line')
    plt.show() # The line is the mean number of passengers per year (over the 12 months); the shaded band is a 95% confidence interval for that mean.
    

    In [ ]:

    sns.lineplot(x = 'year', y = 'passengers', data = df)
    plt.show()
    

    In [ ]:

    sns.relplot(x = 'year', y = 'passengers', data = df, kind = 'line', hue = 'month')
    plt.show()
    

    In [ ]:

    sns.relplot(x = 'year', y = 'passengers', data = df, kind = 'line', 
                col = 'month')
    plt.show()
    
