This Week I Learned: 2020 W22

A new-to-me SQL clause and a Ruby idiom

Published May 31, 2020

SQL HAVING clause

When getting used to a new application (or domain area), I’ve found it useful to look at its data. In particular, in my team’s region of the Rails monolith, I’ve learned so much from having SQL access to a sanitized read replica. It’s been mostly simple questions, really:

  1. How many records are in this table? For example, if select count(*) from t returns less than 100 records, it’s likely to be some set of domain constants.

  2. What is the distribution of enumerated values? First, if there’s a column named status (or similar), you’re dealing with some kind of lifecycle. What are its valid states? (select distinct(status) from t) How many items are currently in each of those states? (select status, count(*) from t group by status)

  3. If there are date fields (that aren’t the typical created_at or updated_at), how are those distributed? I can’t think of any generalizable examples for this. But in Payments Engineering, this is how you could discover that our world runs in weekly, monthly, and yearly cycles.

This week, I was exploring the lifecycle status and date distributions for a particular table. The question I wanted to answer was this: Which users started using the product today?

I had enough SQL experience to know that this would get me all the user IDs with the date they first used the product:

1
2
3
4
select user_id, min(occurred_at)
from product_usage
where status = 'committed'
group by user_id

(I changed the domain-specific names to make the interesting part clearer.)

But only getting the users whose first usage date was today (for example), looks like it would need a nested query, because you can’t filter based on the value of an aggregate function.

Unless you know about HAVING, since it lets you do exactly that! Here’s the tweak that makes this work:

1
2
3
4
5
select user_id, min(occurred_at) as first_usage_date
from product_usage
where status = 'committed'
group by user_id
having first_usage_date = DATE('2020-05-31')

(The first_usage_date alias isn’t necessary, but I think it’s more readable that way.)

I suspect this is part of ANSI SQL, but I don’t feel like paying $185 to buy a copy of the standard from ISO. So if you want to read up on the actual mechanics, check your database’s documentation. The Postgresql docs probably apply even if you’re using something else.

Ruby extend self idiom

Ruby’s extend allows you to add a module’s methods to a given object (which I know as a “mixin”). This has been written about many times elsewhere, so I’ll skip the explanation here.

What I learned about this week was the extend self idiom:

1
2
3
4
5
6
7
module M
  def hello
    puts 'Hello!'
  end

  extend self
end

Usually, the instance method hello would only be useful when you use include M in another class to add M’s methods to that class. This trick uses extend (instead of include) to expose M’s instance methods as class methods too!

Ruby global variable tracing

I encountered my first real-world usage of a Ruby global variable this week: $stats is a handle to the metrics API client. I recognized it because the dollar-variable syntax is taught in most Ruby intros (shoutout to Learn X in Y minutes). It’s usually followed by some kind of warning to avoid global state. This is surprising to me because, so far, my struggles with Ruby have been all about everything being globally visible all the time.

In trying to figure out the differences between a “real” global variable and something defined at the top-level (technically on Object), I actually found one way they’re better: trace_var! trace_var lets you set up a handle to run on every assignment to that variable. (Technically you can do this by defining the getter-setter method pair for any given name, but with globals you get it for free.)

The intended use case seems to be logging the newly-set value:

1
2
trace_var :$x, proc { |val| puts "Set $x to #{val}" }
$x = 10

prints

Set $x to 10

The docs don’t seem to mention this, so it may not be guaranteed, but I consistently see these callbacks handled as a stack (last-in first-out):

1
2
3
trace_var :$x, proc { puts 'First handler' }
trace_var :$x, proc { puts 'Second handler' }
$x = 10

prints

Second handler
First handler

Naturally, the next thing I tried to do was get a pair of callbacks to enter an infinite loop, but that didn’t seem to work:

1
2
3
4
5
6
trace_var :$x, proc { |x| puts "$x is now #{x}"; $y = $x + 1 }
$x = 0
puts "Suprise $y => #{$y}"

trace_var :$y, proc { |y| puts "$y is now #{y}"; $x = $y + 1 }
$x = 1

prints

$x is now 0
Surprise $y => 1
$x is now 1
$y is now 2

and exits cleanly.

Sadly, the Perl-like special variables aren’t truly globals, so this doesn’t print anything special:

1
2
3
4
5
6
7
8
9
special_vars = %i($! $1 $2 $3) # The ones I tried
special_vars.each do |sym|
  trace_var sym, proc { |val| puts "#{sym} <- #{val}" }
end

/123(.*)abc/.match('123---abc')
puts $1
raise StandardError, 'test'
puts $!