I stayed up past my bedtime recently and made a script and later a web app to convert a dataclass to a non-dataclass. The web app is powered by a WebAssembly build of Python (which also powers my Python pastebin tool).
While making this script I found excuses to use odd Python features, the most interesting being Python's match-case statement.
Python 3.10 added a match-case block that folks often assume to be equivalent to the switch-case blocks from other programming languages.
While you can use match-case like switch-case, you usually wouldn't: match-case is both more powerful and more complex than switch-case.
Python's match-case blocks are for structural pattern matching, and that phrase sounds complex because it is!
I'll write a follow-up post soon on how this script works at a high level, but right now I'd like to talk about my adventures using structural pattern matching to write this code.
Update: the follow-up post is now available: How I made a dataclass remover.
First let's briefly talk about why I made this tool.
Why would anyone want to convert a dataclass into "not a dataclass"?
There are trade-offs with using dataclasses: performance concerns (which don't usually matter) and edge cases where things get weird (__slots__ and slots=True are both finicky).
But my reason for creating this dataclass to regular class converter was to help me better teach dataclasses.
Seeing the equivalent code for a dataclass helps us appreciate what dataclasses do for us.
Okay, let's dive into match-case.
I knew the adventure I was embarking on involved parsing Python code. I don't usually parse Python code: I leave that up to tools like Black, flake8, and the Python interpreter itself.
But I did know that Python's ast module had a parse function which could accept a string representing Python code and return an "abstract syntax tree" (often shortened to AST) that represented that Python code.
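Here is a minimal sketch of that first step. The Point dataclass in the source string is a made-up example input. ast.parse returns an ast.Module, and ast.dump shows the nested nodes that the rest of this post matches against:

```python
import ast

# A made-up dataclass, given to ast.parse as a plain string
source = '''
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float = 0.0
'''

tree = ast.parse(source)   # an ast.Module node
class_node = tree.body[1]  # body[0] is the ImportFrom node

print(type(tree).__name__)        # Module
print(type(class_node).__name__)  # ClassDef
print(ast.dump(class_node.decorator_list[0]))  # Name(id='dataclass', ctx=Load())
```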
Using ast.parse to get a tree of AST nodes was easy.
The hard part came in making sense of those deeply-nested AST nodes.
I found myself writing a lot of if-elif blocks with very complex conditions.
Take this code for example:
```python
if isinstance(node, ast.Call):
    if (isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "dataclasses"
            and node.func.attr == "dataclass"):
        return True
    elif isinstance(node.func, ast.Name) and node.func.id == "dataclass":
        return True
    else:
        return False
elif (isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "dataclasses"
        and node.attr == "dataclass"):
    return True
elif isinstance(node, ast.Name) and node.id == "dataclass":
    return True
else:
    return False
```
That code checks for 4 different uses of the dataclass decorator:

- dataclasses.dataclass(...)
- dataclass(...)
- dataclasses.dataclass
- dataclass
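To see why those four forms need four different checks, this sketch parses each one as a standalone expression and reports its node type. The order=True arguments are an assumed stand-in for the "(...)":

```python
import ast

# The four decorator forms; order=True is an example stand-in for "(...)"
forms = [
    "dataclasses.dataclass(order=True)",
    "dataclass(order=True)",
    "dataclasses.dataclass",
    "dataclass",
]

# mode="eval" parses a single expression, and .body is that expression's node
node_types = {
    form: type(ast.parse(form, mode="eval").body).__name__
    for form in forms
}

for form, node_type in node_types.items():
    print(f"{form:<35} {node_type}")
```

The two "call" forms are Call nodes wrapping an Attribute or a Name, while the bare forms are the Attribute or Name node itself.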
After writing the above code I remembered playing with match-case shortly after Python 3.10 was released.
Seeing those isinstance checks in particular made me think "wait a minute, match-case was made for this!"
After introspecting (via breakpoint and Python's debugging friends), I found that I could refactor the above if-elif into this equivalent match-case block:
```python
match node:
    case ast.Call(
        func=ast.Attribute(
            value=ast.Name(id="dataclasses"),
            attr="dataclass",
        ),
    ):
        return True
    case ast.Call(func=ast.Name(id="dataclass")):
        return True
    case ast.Attribute(
        value=ast.Name(id="dataclasses"),
        attr="dataclass"
    ):
        return True
    case ast.Name(id="dataclass"):
        return True
    case _:
        return False
```
Each of the case statements I wrote above asserts the type of a node and the values of its (sometimes deeply nested) attributes, such as node.func.value.id.
That first case statement nicely demonstrates the power of match-case for matching deeply nested data structures.
We're using a single expression to confirm that node is a Call node (a function call) and that the expression it's calling is an attribute lookup of dataclasses.dataclass:
```python
case ast.Call(
    func=ast.Attribute(
        value=ast.Name(id="dataclasses"),
        attr="dataclass",
    ),
):
```
Compare that to these nested if statements, which do the same thing:
```python
if isinstance(node, ast.Call):
    if (isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "dataclasses"
            and node.func.attr == "dataclass"):
```
Both of those blocks of code say "I have a Call object which contains an Attribute object which has a specific attr and also contains a Name with a certain id".
But the match-case statement does that much more succinctly, and I found it much more readable than the equivalent if-elif.
During this match-case refactoring I realized I needed an easy way to say "this attribute could be either A or B".
I dug through the structural pattern matching tutorial PEP (PEP 636) and (fortunately) found just what I needed: the | operator.
The | operator allows a single case statement to match against multiple patterns at once.
Instead of this giant if statement (note that giant elif clause):
```python
if subnode.value is None:
    field = dataclasses.field()
elif (isinstance(subnode.value, ast.Call) and (
        isinstance(subnode.value.func, ast.Name)
        and subnode.value.func.id == "field"
        or
        isinstance(subnode.value.func, ast.Attribute)
        and isinstance(subnode.value.func.value, ast.Name)
        and subnode.value.func.value.id == "dataclasses"
        and subnode.value.func.attr == "field")):
    field = dataclasses.field(**{
        kwarg.arg: parse_field_argument(kwarg.arg, kwarg.value)
        for kwarg in subnode.value.keywords
    })
else:
    field = dataclasses.field(default=ast.unparse(subnode.value))
```
I wrote this match statement:
```python
match subnode:
    case ast.AnnAssign(value=None):
        field = dataclasses.field()
    case ast.AnnAssign(
        value=ast.Call(
            func=
                ast.Name(id="field")
                |
                ast.Attribute(value=ast.Name(id="dataclasses"), attr="field")
        )
    ):
        field = dataclasses.field(**{
            kwarg.arg: parse_field_argument(kwarg.arg, kwarg.value)
            for kwarg in subnode.value.keywords
        })
    case ast.AnnAssign():
        field = dataclasses.field(default=ast.unparse(subnode.value))
```
That match statement is very complex, but it's much less visually dense than that if statement was.
That second case statement ensures that the annotated assignment node we're matching has a value attribute which is either field(...) or dataclasses.field(...).
```python
case ast.AnnAssign(
    value=ast.Call(
        func=
            ast.Name(id="field")
            |
            ast.Attribute(value=ast.Name(id="dataclasses"), attr="field")
    )
):
```
Writing this 8-line-long case statement with that "or pattern" felt very silly.
But I found that I prefer it over the alternative elif logic:
```python
elif (isinstance(subnode.value, ast.Call) and (
        isinstance(subnode.value.func, ast.Name)
        and subnode.value.func.id == "field"
        or
        isinstance(subnode.value.func, ast.Attribute)
        and isinstance(subnode.value.func.value, ast.Name)
        and subnode.value.func.value.id == "dataclasses"
        and subnode.value.func.attr == "field")):
```
The Zen of Python says "Simple is better than complex", but it also says "Complex is better than complicated".
Both the elif and case statements above are complex because making sense of abstract syntax trees is an inherently complex activity.
But that case statement seems a bit less complicated than the elif equivalent.
The last match-case feature I discovered caught me by surprise.
In this if statement, the third condition can't be boiled down to a simple structural pattern in match-case land:
```python
if isinstance(node, ast.ImportFrom) and node.module == "dataclasses":
    continue  # Don't import dataclasses anymore
elif isinstance(node, ast.Import) and node.names[0].name == "dataclasses":
    continue  # Don't import dataclasses anymore
elif isinstance(node, ast.ClassDef) and any(
    is_dataclass_decorator(n)
    for n in node.decorator_list
):
    need_total_ordering |= update_dataclass_node(node)
    new_nodes.append(node)
else:
    new_nodes.append(node)
```
At first I thought I needed to give up on using match-case for that condition and resort to a nested if-else statement.
But then I stumbled upon guard clauses.
Guard clauses are handy when you need a case clause that has some actual boolean logic in it.
Using a guard clause, the above if-elif can be rewritten like this (note that third case statement with that if condition on the end):
```python
match node:
    case ast.ImportFrom(module="dataclasses"):
        continue  # Don't import dataclasses anymore
    case ast.Import(names=[ast.alias("dataclasses")]):
        continue  # Don't import dataclasses anymore
    case ast.ClassDef() if any(
        is_dataclass_decorator(n)
        for n in node.decorator_list
    ):
        need_total_ordering |= update_dataclass_node(node)
        new_nodes.append(node)
    case _:
        new_nodes.append(node)
```
In that third case statement, we're checking the type of the node (something match-case statements are great at) and we're also asking a complex question about the decorator_list attribute of that node (thanks to that if guard clause with its any(...) logic).
While I did find a guard clause helpful here, this feature does feel like an escape hatch that should only be used when there's not a more readable alternative.
This undataclassing adventure was not my first time using match-case.
But before I wrote undataclass.py, most of my match-case statements involved matching sequences (like lists and tuples).
For example, while prepping a talk on match-case for my local meetup, I noticed that this Django template tag parsing function:
```python
def do_get_available_languages(parser, token):
    args = token.contents.split()
    if len(args) != 3 or args[1] != "as":
        raise TemplateSyntaxError(
            "'get_available_languages' requires 'as variable' (got %r)" % args
        )
    return GetAvailableLanguagesNode(args[2])
```
Could be rewritten like this:
```python
def do_get_available_languages(parser, token):
    match token.contents.split():
        case [_, "as", variable]:
            return GetAvailableLanguagesNode(variable)
        case [name, *rest]:
            raise TemplateSyntaxError(
                f"'{name}' requires 'as variable' (got {rest!r})"
            )
```
Even if you don't understand how structural pattern matching works, that second block of code is likely easier to make guesses about at a glance.
Just like with tuple unpacking, that match-case statement visually demonstrates the shape of the data we're matching.
Python's match-case statement can even be used to match nested dictionary items.
For example this nested dictionary-processing code:
```python
if webhook_data["event_type"] == "order_created":
    customer_id = webhook_data["content"]["customer"]["id"]
    order = webhook_data["content"]["order"]
    process_order(customer_id, order)
elif webhook_data["event_type"] == "payment":
    customer_id = webhook_data["content"]["customer"]["id"]
    payment = webhook_data["content"]["payment"]
    process_payment(customer_id, payment)
else:
    process_other(webhook_data)
```
Could be refactored to use structural pattern matching like this:
```python
match webhook_data:
    case {
        "event_type": "order_created",
        "content": {
            "order": order,
            "customer": {"id": customer_id},
        },
    }:
        process_order(customer_id, order)
    case {
        "event_type": "payment",
        "content": {
            "payment": payment,
            "customer": {"id": customer_id},
        },
    }:
        process_payment(customer_id, payment)
    case _:
        process_other(webhook_data)
```
Is that clearer?
I'm not sure.
But it's definitely much more visually oriented: those case statements kind of look like the webhook_data object that we're trying to describe.
Along with tuple unpacking and list comprehensions, match-case results in code that looks like the objects we're describing.
Matching a nested dictionary results in code that looks like a nested dictionary.
Matching a list or tuple of length N involves writing a list of length N.
And in my case, matching an abstract syntax tree involves writing code that looks like an abstract syntax tree.
match-case
Python's match-case statement is both complex and amazing.
I do not recommend using match-case in cases where an if-elif block is simpler (which is most of the time).
But, like many complex abstractions, match-case does have its uses.
In particular, using structural pattern matching can make the intent of AST-matching code easier to understand at a glance.
You should consider match-case statements when your code involves a lot of isinstance checking, especially on deeply nested data structures.
That said, in all likelihood you don't need match-case, and your code would be simpler without it.
Python's structural pattern matching definitely makes parsing Python code much easier and I'm grateful I thought to use it when creating my undataclass tool.
If you're wondering how that undataclass.py script works, I've written a full explanation of this dataclass converter in How I made a dataclass remover.
Need to fill in gaps in your Python skills?
Sign up for my Python newsletter where I share one of my favorite Python tips every week.