{ "cells": [ { "cell_type": "markdown", "id": "ce0ab035-e7b5-4b8c-a4a1-5d84335e0619", "metadata": {}, "source": [ "# Exporting a DataFrame from a list of sequences" ] }, { "cell_type": "raw", "id": "5f10484e-218f-4d1b-aa9d-7093d38d8cb1", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ ":py:class:`~thebeat.core.Sequence` objects can be quickly converted to a :class:`pandas.DataFrame` using the :py:func:`thebeat.utils.get_ioi_df` function. Below we illustrate how it works.\n", "\n", "The resulting :class:`pandas.DataFrame` is in the `tidy data `_ format, i.e. 'long format', where each individual IOI has its own row." ] }, { "cell_type": "markdown", "id": "bbfed0df-5df1-40a8-b34b-4e077481b112", "metadata": {}, "source": [ "## Preliminaries\n", "First we import some necessary functions and create a NumPy generator object with a seed, so you will get the same output as we. " ] }, { "cell_type": "code", "execution_count": 12, "id": "aa63d2d0-25ed-4579-adea-5c8549f4774e", "metadata": {}, "outputs": [], "source": [ "import thebeat\n", "import numpy as np\n", "import pandas as pd\n", "\n", "rng = np.random.default_rng(seed=123)" ] }, { "cell_type": "raw", "id": "25ec3e54-1dfb-4a2c-8f2b-bed528acfd50", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "And we create some random :py:class:`~thebeat.core.Sequence` objects:" ] }, { "cell_type": "code", "execution_count": 13, "id": "57ce37f6-2053-4c63-b906-a996ce3f30d2", "metadata": {}, "outputs": [], "source": [ "# Create empty list\n", "sequences = []\n", "\n", "# Create 10 random Sequence objects and add to the list\n", "for _ in range(10):\n", " sequence = thebeat.Sequence.generate_random_normal(n_events=10,\n", " mu=500,\n", " sigma=25,\n", " rng=rng)\n", " sequences.append(sequence)" ] }, { "cell_type": "markdown", "id": "2d310fba-68af-409e-85e2-d26cea745e30", "metadata": {}, "source": [ "See what they look like:" ] }, { "cell_type": "code", "execution_count": 14, "id": "32e91aa0-db01-42bd-b940-e579258b8848", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Sequence(iois=[475.27 490.81 532.2 ... 484.09 513.55 492.09]), Sequence(iois=[491.94 502.43 461.85 ... 503.41 538.3 483.5 ]), Sequence(iois=[492.21 508.44 444.81 ... 518.87 496.35 532.05]), Sequence(iois=[526.85 509.82 500.13 ... 445.7 490.75 504.11]), Sequence(iois=[521.5 544.04 524.83 ... 535.75 496.09 483.16]), Sequence(iois=[484.02 498.47 490.18 ... 500.7 500.71 501.38]), Sequence(iois=[487.96 485.41 478.45 ... 486.42 486.03 492.09]), Sequence(iois=[488.48 464.09 534.13 ... 489.04 494.71 509.1 ]), Sequence(iois=[523.82 537.99 542.6 ... 503.21 481.64 484.49]), Sequence(iois=[520.33 541.05 494.34 ... 493.18 510.56 497.97])]\n" ] } ], "source": [ "print(sequences)" ] }, { "cell_type": "markdown", "id": "0b16ae08-743e-4ea8-a278-2e2e2316dd7e", "metadata": {}, "source": [ "## Creating a simple DataFrame of IOIs" ] }, { "cell_type": "code", "execution_count": 15, "id": "e6bb47d8-16ff-42db-9aed-0bf5700704a1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sequence_iioi_iioi
000475.271966
101490.805334
202532.198132
303504.849360
404523.005772
............
494492.915720
595475.121716
696493.178206
797510.561104
898497.966426
\n", "

90 rows × 3 columns

\n", "
" ], "text/plain": [ " sequence_i ioi_i ioi\n", "0 0 0 475.271966\n", "1 0 1 490.805334\n", "2 0 2 532.198132\n", "3 0 3 504.849360\n", "4 0 4 523.005772\n", ".. ... ... ...\n", "4 9 4 492.915720\n", "5 9 5 475.121716\n", "6 9 6 493.178206\n", "7 9 7 510.561104\n", "8 9 8 497.966426\n", "\n", "[90 rows x 3 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = thebeat.utils.get_ioi_df(sequences)\n", "df" ] }, { "cell_type": "markdown", "id": "7b267cc7-4554-4bec-a45c-a7a0bdffbf83", "metadata": {}, "source": [ "## Creating a DataFrame with additional calculations" ] }, { "cell_type": "raw", "id": "9f44f62c-4c65-4ecb-b77a-03df5b0c1ee9", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "We can pass :py:func:`~thebeat.utils.get_ioi_df` additional functions to its ``additional_functions`` parameter. They will be applied to each provided sequence, and the resulting values will be added to the :class:`pandas.DataFrame`. Note that we need to provide the actual functions, not simply their names. Also note that ``additional_functions`` expects a list, so even if providing only one function, that function needs to be within a list." ] }, { "cell_type": "code", "execution_count": 16, "id": "a06e1976-3ad2-427f-b98b-5cf2b3ad2360", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sequence_imeanstdaminamaxioi_iioi
00503.36449917.923263475.271966532.1981320475.271966
10503.36449917.923263475.271966532.1981321490.805334
20503.36449917.923263475.271966532.1981322532.198132
30503.36449917.923263475.271966532.1981323504.849360
40503.36449917.923263475.271966532.1981324523.005772
........................
49501.02871018.898427475.121716541.0450254492.915720
59501.02871018.898427475.121716541.0450255475.121716
69501.02871018.898427475.121716541.0450256493.178206
79501.02871018.898427475.121716541.0450257510.561104
89501.02871018.898427475.121716541.0450258497.966426
\n", "

90 rows × 7 columns

\n", "
" ], "text/plain": [ " sequence_i mean std amin amax ioi_i \\\n", "0 0 503.364499 17.923263 475.271966 532.198132 0 \n", "1 0 503.364499 17.923263 475.271966 532.198132 1 \n", "2 0 503.364499 17.923263 475.271966 532.198132 2 \n", "3 0 503.364499 17.923263 475.271966 532.198132 3 \n", "4 0 503.364499 17.923263 475.271966 532.198132 4 \n", ".. ... ... ... ... ... ... \n", "4 9 501.028710 18.898427 475.121716 541.045025 4 \n", "5 9 501.028710 18.898427 475.121716 541.045025 5 \n", "6 9 501.028710 18.898427 475.121716 541.045025 6 \n", "7 9 501.028710 18.898427 475.121716 541.045025 7 \n", "8 9 501.028710 18.898427 475.121716 541.045025 8 \n", "\n", " ioi \n", "0 475.271966 \n", "1 490.805334 \n", "2 532.198132 \n", "3 504.849360 \n", "4 523.005772 \n", ".. ... \n", "4 492.915720 \n", "5 475.121716 \n", "6 493.178206 \n", "7 510.561104 \n", "8 497.966426 \n", "\n", "[90 rows x 7 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = thebeat.utils.get_ioi_df(sequences=sequences,\n", " additional_functions=[np.mean, np.std, np.min, np.max])\n", "df" ] }, { "cell_type": "markdown", "id": "2a6f6828-0903-4baf-bf3f-c989d4a4f6d8", "metadata": {}, "source": [ "## Saving the DataFrame" ] }, { "cell_type": "raw", "id": "9379c828-52d3-40ac-87f2-54e458530111", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "To save the :class:`pandas.DataFrame`, we can simply use its :meth:`pandas.DataFrame.to_csv` method:" ] }, { "cell_type": "code", "execution_count": 17, "id": "ef8d8b5c-ccf2-48c7-b566-14933a0ff594", "metadata": {}, "outputs": [], "source": [ "df.to_csv('random_sequences.csv')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 5 }