A Pandas DataFrame is, at its simplest, a table of rows and columns (McKinney, 2015), with many additional features built in to make data scientists' lives much easier.
What are NoSQL Pandas dataframes?
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools (Pandas, 2018).
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a dictionary-like container for Series objects (PandasDocs, 2018), and it is the primary data structure used in Pandas.
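To make the "dictionary of Series" description concrete, the following minimal sketch builds a DataFrame from two Series that share the same row labels, then adds a column after construction to show size mutability. The names, scores and grades are made-up illustration data.

```python
import pandas

# Two Series that share the same row labels (the index)
scores = pandas.Series([80, 70, 75], index=["John", "Mary", "Richard"])
grades = pandas.Series(["A", "B", "B"], index=["John", "Mary", "Richard"])

# The DataFrame maps each column name to a Series, much like a dict;
# the columns hold different types (integers and strings)
df = pandas.DataFrame({"Scores": scores, "Grade": grades})

print(df.shape)                  # (3, 2)
print(df.loc["Mary", "Scores"])  # 70

# Size-mutable: a new column can be added after construction
df["Passed"] = df["Scores"] >= 50
print(df.shape)                  # (3, 3)
```

Each column can be pulled back out with df["Scores"], which returns a Series, so dictionary-style access carries over directly.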
Methods to store and manipulate big data with Pandas NoSQL Dataframes
Because Pandas is simply a Python library, all the conventional Python rules apply; the added benefit is a flexible yet powerful toolkit for easily manipulating large data sets. Using it only requires importing the library:
import pandas
DataFrames can be built from specific data points by passing a dictionary that maps each column name to a list of values:
student_grades = pandas.DataFrame({"Scores":[80, 70, 75, 47], "Names":["John", "Mary", "Richard", "Peter"]})
more_data = pandas.DataFrame({"Column1":[1, 2, 3, 4], "Column2":[1, 2, 3, 4]})
Data often needs to be read in from CSV (comma-separated values) or TSV (tab-separated values) files, and Pandas makes this straightforward (PythonHow, n.d.):
some_variable = pandas.read_csv("some_csv_file.csv")
read_csv also accepts a sep keyword argument to cater for other delimiters (a tab in this example):
some_variable = pandas.read_csv("some_csv_file.csv", sep="\t")
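A small, self-contained sketch of both calls follows. Instead of real files it reads hypothetical CSV and TSV text from in-memory buffers (io.StringIO), which read_csv accepts in place of a file path; with a real file only the first argument would change.

```python
import io
import pandas

# Hypothetical file contents, held in memory so the sketch is self-contained
csv_text = "Names,Scores\nJohn,80\nMary,70\n"
tsv_text = "Names\tScores\nJohn\t80\nMary\t70\n"

# read_csv accepts a file-like object as well as a file path
from_csv = pandas.read_csv(io.StringIO(csv_text))
from_tsv = pandas.read_csv(io.StringIO(tsv_text), sep="\t")

# The same data parses to the same table, whichever delimiter is used
print(from_csv.equals(from_tsv))  # True
```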
Saving a DataFrame as a CSV file is just as simple:
some_variable.to_csv("some_other_file.csv")
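One detail worth knowing is that to_csv writes the row index as an extra, unnamed first column by default. The sketch below calls to_csv without a path (in which case it returns the CSV text as a string) to compare the default output with index=False, and then reads the index-free text back to check the round trip.

```python
import io
import pandas

student_grades = pandas.DataFrame(
    {"Scores": [80, 70, 75, 47], "Names": ["John", "Mary", "Richard", "Peter"]}
)

# With no path, to_csv returns the CSV text instead of writing a file
with_index = student_grades.to_csv()              # index written as first column
without_index = student_grades.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Scores,Names
print(without_index.splitlines()[0])  # Scores,Names

# Reading the index-free text back gives an equal DataFrame
restored = pandas.read_csv(io.StringIO(without_index))
print(restored.equals(student_grades))  # True
```

Passing index=False when saving avoids an extra "Unnamed: 0" column appearing the next time the file is loaded with read_csv.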
Large data sets are often already in JSON format, and once again Pandas makes loading them simple:
some_variable = pandas.read_json("some_json_file.json")
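As a minimal sketch, assuming the JSON is a list of flat records, the example below parses a hypothetical JSON string. It is wrapped in io.StringIO because recent pandas versions deprecate passing a raw JSON string to read_json; orient="records" states the expected layout explicitly.

```python
import io
import pandas

# Hypothetical JSON records; real data would normally come from a file path
json_text = '[{"Names": "John", "Scores": 80}, {"Names": "Mary", "Scores": 70}]'

# Wrap the text in StringIO and describe its layout as a list of records
df = pandas.read_json(io.StringIO(json_text), orient="records")

print(df.shape)  # (2, 2)
```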
Normalizing nested JSON data is often tricky, but Pandas addresses it with its json_normalize function (pandas.json_normalize in pandas 1.0 and later; earlier versions exposed it as pandas.io.json.json_normalize).
It converts semi-structured JSON objects into a flat table with ease (Bronshtein, 2017).
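The sketch below shows the flattening on two hypothetical records that each contain a nested "address" object; it uses the top-level pandas.json_normalize available from pandas 1.0 onwards. Nested keys become dotted column names.

```python
import pandas

# Hypothetical semi-structured records with a nested "address" object
records = [
    {"name": "John", "address": {"city": "London", "postcode": "N1"}},
    {"name": "Mary", "address": {"city": "Leeds", "postcode": "LS1"}},
]

# Each nested key becomes its own column, joined with "." by default
flat = pandas.json_normalize(records)

print(list(flat.columns))  # ['name', 'address.city', 'address.postcode']
```

The separator can be changed with the sep parameter (for example sep="_") if dotted column names are inconvenient.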
Schemes that Facilitate CRUD Storage Primitives
The term CRUD stands for the Creation, Retrieval, Updating and Deletion of data: the four basic operations that any persistent data store is expected to support.
Pandas provides a concise way to perform each of these with minimal code.
Creation
Adding data to a Series or DataFrame works much like adding entries to a dictionary: assigning values to a new column name creates that column.
By default, each row is labeled with an integer index (0, 1, 2, ...) and each column with its name; these labels, or the underlying integer positions, can then be used to access elements by row or by column.
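A minimal creation sketch: a new column is added dictionary-style, and a new row is appended by concatenating a one-row DataFrame. The "Alice" row is hypothetical. pandas.concat is used here because the older DataFrame.append method was removed in pandas 2.0.

```python
import pandas

student_grades = pandas.DataFrame(
    {"Scores": [80, 70, 75, 47], "Names": ["John", "Mary", "Richard", "Peter"]}
)

# Dictionary-style creation of a new column (one value per existing row)
student_grades["Passed"] = student_grades["Scores"] >= 50

# Appending a row: build a one-row DataFrame and concatenate it;
# ignore_index=True continues the default integer index
new_row = pandas.DataFrame({"Scores": [91], "Names": ["Alice"], "Passed": [True]})
student_grades = pandas.concat([student_grades, new_row], ignore_index=True)

print(student_grades.shape)        # (5, 3)
print(list(student_grades.index))  # [0, 1, 2, 3, 4]
```

Appending rows one at a time in a loop copies the whole table each time, so for many rows it is usually faster to collect them first and call concat once.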
Retrieval
Selecting a specific data item or a range of values is done by indexing the DataFrame: by column name with square brackets, by row and column label with .loc, or by integer position with .iloc.
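The retrieval styles can be sketched on the same small student table: a whole column, a row by position, a single cell by label, a slice of rows, and a boolean filter.

```python
import pandas

student_grades = pandas.DataFrame(
    {"Scores": [80, 70, 75, 47], "Names": ["John", "Mary", "Richard", "Peter"]}
)

names = student_grades["Names"]                # one column, returned as a Series
first_row = student_grades.iloc[0]             # a row by integer position
marys_score = student_grades.loc[1, "Scores"]  # one value by row label and column name
middle = student_grades.iloc[1:3]              # a range of rows (positions 1 and 2)
passed = student_grades[student_grades["Scores"] >= 50]  # rows matching a condition

print(marys_score)            # 70
print(list(middle["Names"]))  # ['Mary', 'Richard']
print(len(passed))            # 3
```

With the default integer index, .loc and .iloc happen to select the same rows here; they diverge once rows have been dropped or the index holds non-integer labels.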
Update
To update or change a section of the data, assign a new value to the relevant row/column location, typically through .loc so that the original DataFrame itself is modified.
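A minimal update sketch: one cell is overwritten by label, then every row matching a condition is updated at once. The 5-point bonus rule is a hypothetical example. Both assignments go through .loc on the DataFrame itself; chained forms such as student_grades["Scores"][3] = 52 may update a temporary copy instead and are best avoided.

```python
import pandas

student_grades = pandas.DataFrame(
    {"Scores": [80, 70, 75, 47], "Names": ["John", "Mary", "Richard", "Peter"]}
)

# Overwrite a single cell, addressed by row label and column name
student_grades.loc[3, "Scores"] = 52

# Overwrite every cell matching a condition (hypothetical 5-point bonus)
low = student_grades["Scores"] < 75
student_grades.loc[low, "Scores"] = student_grades.loc[low, "Scores"] + 5

print(list(student_grades["Scores"]))  # [80, 75, 75, 57]
```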
Deletion
Rows or columns can be deleted from a DataFrame by calling its drop method with the appropriate row or column labels.
The remaining rows keep their original index labels (the index is not renumbered unless reset_index is called), and by default drop returns a new DataFrame, leaving the original unchanged; memory held by the removed values is reclaimed by Python's normal garbage collection once nothing refers to them.
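The deletion behaviour can be sketched as follows: drop removes a row by index label or a column by name, returns a new DataFrame, and leaves the remaining index labels as they were until reset_index is called.

```python
import pandas

student_grades = pandas.DataFrame(
    {"Scores": [80, 70, 75, 47], "Names": ["John", "Mary", "Richard", "Peter"]}
)

# drop returns a new DataFrame by default; the original is left unchanged
without_mary = student_grades.drop(index=1)           # remove a row by index label
scores_only = student_grades.drop(columns=["Names"])  # remove a column by name

# The remaining row labels are not renumbered
print(list(without_mary.index))   # [0, 2, 3]
print(list(scores_only.columns))  # ['Scores']
print(len(student_grades))        # 4

# reset_index(drop=True) renumbers the rows as 0..n-1
renumbered = without_mary.reset_index(drop=True)
print(list(renumbered.index))     # [0, 1, 2]
```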
References
Pandas (2018) Python Data Analysis Library [Online] Pandas.PyData.org, Available from: https://pandas.pydata.org/ (Accessed on 16th February 2018)
PandasDocs (2018) pandas.DataFrame [Online] Pandas.PyData.org, Available from: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html (Accessed on 16th February 2018)
PythonHow (n.d.) Loading CSV data in Python with pandas [Online] PythonHow.com, Available from: https://pythonhow.com/data-analysis-with-python-pandas/ (Accessed on 16th February 2018)
Bronshtein, A. (2017) A Quick Introduction to the “Pandas” Python Library [Online] TowardsDataScience.com, Available from: https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673 (Accessed on 16th February 2018)
McKinney, W. (2015) DataFrames: The Good, Bad, and Ugly [Online] SlideShare.net, Available from: https://www.slideshare.net/wesm/dataframes-the-good-bad-and-ugly (Accessed on 16th February 2018)