While cleaning up a project I ran across some obscure code of mine from my Ruby youth. I had read about an ultra cool Ruby tool, Flog, the other day, so I decided to give my code a good flogging. The victim for tonight is a method that slices ID numbers into chunks of at most 4 characters. This is needed to work around the typical Linux file system limit of roughly 32,000 subdirectories per directory (ext3, for example). In this case the project stores roughly a million thumbnail images on disk. We start with the original piece of code:
<pre>
def splice_number(number, part_size = 4)
  n = number.to_s
  r = []
  return r if n.size.zero?
  (n.size / part_size).times { |t| r << n[(t*part_size)..((t+1)*part_size-1)] }
  r << n[-(n.size % part_size)..n.size] if (n.size % part_size) > 0
  r
end
</pre>
Ah yes, WTF was I thinking when I wrote this? Who cares, it seemed very clever then! What does Flog think about this?
Total score = 28.35
none#splice_number: (28)
7: size
3: *
2: %
2: []
2: <<
1: +
1: -@
1: -
1: /
1: lit_fixnum
1: zero?
1: >
1: to_s
1: times
Pretty good score. Now it’s time for some torturing. Say hello to my little friend: unpack!
<pre>
def splice_number(number, part_size = 4)
  n = number.to_s
  n.unpack("a#{part_size}" * (n.size / part_size) + ((n.size % part_size == 0) ? "" : "a*"))
end
</pre>
Flog?
Total score = 13.05
none#splice_number: (13)
3: size
1: %
1: /
1: ==
1: *
1: +
1: to_s
1: unpack
0: lit_fixnum
Yeow, just over half the pain gone (28.35 down to 13.05)!! Perhaps we can still improve by being less clever?
<pre>
def splice_number(number, part_size = 4)
  n = number.to_s
  r = n.unpack("a#{part_size}" * (n.size / part_size) + "a*")
  r.delete("")
  r
end
</pre>
More lines, but less code! Hmm?
Total score = 9.25
none#splice_number: (9)
1: size
1: /
1: *
1: +
1: delete
1: to_s
1: unpack
0: lit_fixnum
Weeh, a full 2/3 of the pain flogged out of the code! That’s all the torture I’ll do for tonight..
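Why the `delete("")` is needed in the "less clever" version becomes clearer when you look at the format string that gets built. The snippet below is only a small sketch; `unpack_format` is a hypothetical helper name introduced here for illustration and is not part of the original code.

```ruby
# Hypothetical helper (not in the original post): builds the same format
# string the "less clever" version passes to String#unpack.
def unpack_format(length, part_size = 4)
  "a#{part_size}" * (length / part_size) + "a*"
end

# Seven digits: one full chunk, and "a*" picks up the remainder.
p unpack_format(7)                            # "a4a*"
p "1234567".unpack(unpack_format(7))          # ["1234", "567"]

# Eight digits: two full chunks, and "a*" has nothing left to consume,
# so it yields an empty string that has to be deleted afterwards.
p unpack_format(8)                            # "a4a4a*"
p "12345678".unpack(unpack_format(8))         # ["1234", "5678", ""]
```

So the unconditional `"a*"` trades the modulo check for one possible empty trailing element.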
Update: Okay, couldn’t help myself, last blow:
<pre>
def splice_number(number, part_size = 4)
  n = number.to_s
  p = "a#{part_size}"
  t = n.size / part_size
  r = n.unpack(p * t + "a*")
  r.delete("")
  r
end
</pre>
With score:
Total score = 8.05
none#splice_number: (8)
1: *
1: size
1: +
1: delete
1: to_s
1: /
1: unpack
0: lit_fixnum
Done..
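For context, here is a rough sketch of how the chunks might end up as a directory path for a thumbnail. The post does not show this part, so `thumbnail_path`, the `thumbnails` root and the `thumb.jpg` file name are all assumptions made up for the example.

```ruby
# One of the unpack-based versions from above.
def splice_number(number, part_size = 4)
  n = number.to_s
  r = n.unpack("a#{part_size}" * (n.size / part_size) + "a*")
  r.delete("")
  r
end

# Hypothetical helper: nest one directory per chunk under a root directory.
# The root and file name are placeholders, not taken from the project.
def thumbnail_path(id, root = "thumbnails")
  File.join(root, *splice_number(id), "thumb.jpg")
end

puts thumbnail_path(1234567)   # thumbnails/1234/567/thumb.jpg
puts thumbnail_path(42)        # thumbnails/42/thumb.jpg
```

Since every directory name is at most four digits long, no single level can collect anywhere near 32,000 subdirectories.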

What about:
number.to_s.scan(/\d{1,4}/)
I just had to pass my solution to flog:
<pre>
Total score = 2.89233901885654
main#splice_number: (2.9)
1.3: to_s
1.1: scan
1.1: assignment
0.3: lit_fixnum
</pre>
I simplified a bit too much and didn’t put the ‘part_size’ into the equation. This should do the same as yours:
<pre>
def splice_number(number, part_size = 4)
  p = Regexp.new("\\d{1,#{part_size}}")
  number.to_s.scan(p)
end
</pre>
And results in a 4.4
<pre>
Total score = 4.36928197762516
main#splice_number: (4.4)
2.2: assignment
1.3: to_s
1.1: new
1.1: scan
0.3: lit_fixnum
</pre>
Excellent!!
I completely miss the presence of the scan method call, and in Ruby fashion it supports regex!
-andy
You miss the presence of the scan call? The above really works in ruby, I’ve tested it!
What I meant: *I* didn’t know about the scan call in the Ruby String class... very handy indeed, thanks!!
-andy
<pre>$ flog
def splice_number(number, part_size = 4)
  number.to_s.scan(/\d{1,#{part_size}}/)
end
Total score = 3.775
none#splice_number: (3.8)
1.3: to_s
1.1: scan
1.1: assignment
0.3: lit_fixnum</pre>
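A quick sanity check that the `scan` version and the `unpack` version really return the same chunks for non-negative integers. The method names `splice_unpack` and `splice_scan` and the sample IDs are made up for this comparison.

```ruby
# unpack-based version from the post (renamed for the comparison).
def splice_unpack(number, part_size = 4)
  n = number.to_s
  r = n.unpack("a#{part_size}" * (n.size / part_size) + "a*")
  r.delete("")
  r
end

# scan-based version with the interpolated regexp.
def splice_scan(number, part_size = 4)
  number.to_s.scan(/\d{1,#{part_size}}/)
end

# Arbitrary sample IDs covering short, exact-multiple and remainder cases.
samples = [0, 7, 1234, 12345, 1234567, 12345678]
mismatches = samples.reject { |id| splice_unpack(id) == splice_scan(id) }
puts mismatches.empty? ? "all agree" : "differ for #{mismatches.inspect}"
```

One difference to keep in mind: `\d` only matches digits, so for input such as `-12345` the `scan` version skips the minus sign, while `unpack` keeps it as part of the first chunk.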
Great article… unpack is indeed the way to go. That a* + delete is more expensive than just checking whether it is needed in the first place:
<pre>
def splice_number6(n, p = 4)
  # flog = 10.92, time = 5.17
  n = n.to_s
  s = n.size
  c = s / p
  fmt = "a#{p}" * c
  fmt << "a*" unless s == p * c
  n.unpack(fmt)
end
</pre>
Here are the flog (1.1) scores + times for the above implementations:
<pre>
flog = 32.26, time = 10.48
flog = 10.51, time = 06.54
flog = 10.36, time = 06.70
flog = 04.37, time = 12.32 # peter
flog = 02.89, time = 11.82 # eric
flog = 10.92, time = 05.17 # me
</pre>
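The harness behind these timings is not shown, so the following is only a guess at how such a comparison could be set up with Ruby's standard `Benchmark` module. The iteration count, the sample ID and the method names are arbitrary choices, and the absolute numbers will differ from the ones above.

```ruby
require "benchmark"

# "a*" plus delete, as in the post.
def splice_delete(number, part_size = 4)
  n = number.to_s
  r = n.unpack("a#{part_size}" * (n.size / part_size) + "a*")
  r.delete("")
  r
end

# Only append "a*" when there is a remainder, as in the comment above.
def splice_checked(n, p = 4)
  n = n.to_s
  s = n.size
  c = s / p
  fmt = "a#{p}" * c
  fmt << "a*" unless s == p * c
  n.unpack(fmt)
end

ITERATIONS = 100_000 # arbitrary; raise it for more stable numbers

Benchmark.bm(18) do |x|
  x.report("a* + delete")       { ITERATIONS.times { splice_delete(1234567) } }
  x.report("checked remainder") { ITERATIONS.times { splice_checked(1234567) } }
end
```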
@EricHodel Thanks, didn’t know you could use the #{} notation inside a regexp like that!
@RyanDavis Yeah, I’d expect regex to be outperformed by unpack. The regexp method is IMHO a bit more flexible, though: one could easily give it a ‘min_part_size’ or make it work on character strings.
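To make that last idea concrete, here is one possible reading of it. `splice_string` and its parameters are hypothetical, and "min_part_size" is interpreted here simply as the lower bound of the regexp quantifier.

```ruby
# Hypothetical generalisation of the scan approach: works on any characters
# (the /m flag lets "." match newlines too) and takes a minimum chunk size.
def splice_string(text, max_part_size = 4, min_part_size = 1)
  text.to_s.scan(/.{#{min_part_size},#{max_part_size}}/m)
end

p splice_string("abcdefg")        # ["abcd", "efg"]
p splice_string("abcdefghi", 4, 2) # ["abcd", "efgh"]
```

Note that with this reading a trailing remainder shorter than the minimum is silently dropped (the final "i" in the second call), which may or may not be what one wants.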