Blurring the Lines Between a CLI and the Python REPL

Introduction

Imagine navigating through complex datasets with a few keystrokes, all while staying within the cozy confines of Python's REPL. No need to jump between a traditional Command Line Interface (CLI) and your code editor. Welcome to the world of interactive Python libraries that turn your REPL into a powerful, almost CLI-like, exploratory tool. Intrigued? Let's look at how this approach changes the way we interact with Python libraries.

Why Not Just Use a CLI?

Before we dive into the mechanics, you might wonder: "Why not simply build a CLI application?" While CLI apps are excellent for many tasks, they often lack the fluidity and statefulness we need when working with complex data. Imagine wanting to filter, transform, and visualize data all in one go. With a CLI, each command usually runs in isolation, and the result of one command is not kept around as a live object for the next one to build on. This is where our approach shines: it combines the interactivity of the Python REPL with the structured workflow usually reserved for CLI applications.

What You Need to Make It Work

You need the following three things to make a program act like a CLI inside the Python REPL.

  1. Self-Displaying Objects: A data wrapper object knows how to display itself, so the separate display logic of a regular CLI is not needed. A custom __repr__ method lets you control exactly how the object looks, which is important for letting users navigate the application.
  2. Indexed Selection: Implementing __getitem__ lets you convert a dataframe row into another Self-Displaying Object.
  3. Rich Terminal Framework: Using rich lets you mimic an application inside the REPL and improves how dataframes are displayed in the terminal.
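
As a rough illustration of the first two ingredients, here is a minimal sketch that uses only the standard library. The Planet and Planets names and the sample rows are made up for this example, and rich is deliberately left out so the mechanics of __repr__ and __getitem__ stay visible.

```python
class Planet:
    """A single row, wrapped in an object that can display itself."""

    def __init__(self, name: str, moons: int):
        self.name = name
        self.moons = moons

    def __repr__(self) -> str:
        return f"Planet(name={self.name!r}, moons={self.moons})"


class Planets:
    """A container of rows that prints itself as a numbered listing."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def __getitem__(self, item: int) -> Planet:
        # Indexed selection: turn the chosen row into another self-displaying object
        row = self.rows[item]
        return Planet(name=row["name"], moons=row["moons"])

    def __repr__(self) -> str:
        # The REPL calls __repr__ when an expression is evaluated, so no print step is needed
        lines = [f"{index:>3}  {row['name']:<10} {row['moons']:>5}"
                 for index, row in enumerate(self.rows)]
        return "\n".join(lines)


planets = Planets([{"name": "Earth", "moons": 1}, {"name": "Mars", "moons": 2}])
```

In the REPL, typing planets shows the numbered listing, and planets[1] shows Planet(name='Mars', moons=2). The number in the first column is what the user passes inside the square brackets, which is what makes the listing navigable.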

Real-World Use Case: Navigating SEC Filings

This pattern of wrapping data in objects that can display themselves is used extensively in edgartools.

Show Me the Code

import pyarrow as pa


class Filings:
    """
    A container for filings
    """
    def __init__(self,
                 data: pa.Table):
        # The data is a pyarrow Table, which is why cells are read with .as_py() below
        self.data: pa.Table = data

    def get_filing_at(self, item: int):
        """Get the filing at the specified index"""
        return Filing(
            cik=self.data['cik'][item].as_py(),
            company=self.data['company'][item].as_py(),
            form=self.data['form'][item].as_py(),
            filing_date=self.data['filing_date'][item].as_py(),
            accession_no=self.data['accession_number'][item].as_py(),
        )

    def __getitem__(self, item):
        # filings[3] is shorthand for filings.get_filing_at(3)
        return self.get_filing_at(item)

    def __repr__(self):
        # Render the data as a rich table (repr_as_rich_table is a helper not shown here)
        return repr_as_rich_table(self.data)

class Filing:
    """
    A single SEC filing. Lets you access the documents and data for that filing
    """

    def __init__(self,
                 cik: int,
                 company: str,
                 form: str,
                 filing_date: str,
                 accession_no: str):
        self.cik = cik
        self.company = company
        self.form = form
        self.filing_date = filing_date
        self.accession_no = accession_no

What's Next?

Can you implement this in your application? Sure. Wrapping dataframes inside objects works really well, and I've done it a few times now. I'm curious whether this approach will catch on. I'm also bullish on using rich in Python apps in general, so I think we'll see more of it.

Dwight Gunning
