Sometimes you wind up with a collection of different file types and need to select just those with certain extensions. The ./generic/extension function provides a convenient way to get the extension of a file name.

Obvious, but Inefficient

It's common to see this done with a foreach loop:

   xy_files = [];

   foreach [n = v] in my_files do {
     xy_files += if ./generic/extension(n) == "xy" then [$n = v] else [];
   };

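This is inefficient for a couple of reasons: each iteration of the loop adds another assignment to xy_files, so a long chain of assignments builds up, and the += operator has to allow for name conflicts every time it combines the bindings.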

Filtering with _map

A better choice is to use the _map primitive function.

   /**nocache**/
   filter_xy_files(n,v)
   {
     return if ./generic/extension(n) == "xy" then [$n = v] else [];
   };

   xy_files = _map(filter_xy_files, my_files);

This avoids both of the inefficiencies mentioned above. Each call to filter_xy_files is its own scope, so no long chain of assignments builds up. The results of the calls made by _map are combined in the same efficient way as _append, which doesn't allow for name conflicts.

(A good rule of thumb for performance is: Use _map first, and foreach only if you must.)
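
To make the mechanics concrete, here is a hand-worked trace of the _map call above. The contents of my_files are hypothetical (made-up names and text values standing in for real files), and the trace assumes that ./generic/extension returns the text after the last dot, as the comparisons with "xy" suggest.

```
// Hypothetical input, for illustration only.
my_files = [ a.xy = "A", b.yz = "B", c.xy = "C" ];

// _map calls filter_xy_files once for each name/value pair:
//   filter_xy_files("a.xy", "A")  returns  [ a.xy = "A" ]
//   filter_xy_files("b.yz", "B")  returns  [ ]
//   filter_xy_files("c.xy", "C")  returns  [ c.xy = "C" ]
// and then combines the three results into a single binding.
xy_files = _map(filter_xy_files, my_files);

// xy_files is now [ a.xy = "A", c.xy = "C" ]
```

Returning an empty binding is what drops an entry: it contributes nothing to the combined result.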

Generic Versions

Of course you probably don't want to keep writing little filter functions for _map, so let's write a reusable function that selects the files with any given extension.

Simple Version

   /**nocache**/
   filter_by_extension(b: binding, ext: text)
   {
     /**nocache**/
     filter_one(n,v)
     {
       return if ./generic/extension(n) == ext then [$n = v] else [];
     };
     return _map(filter_one, b);
   };

With this we can simply do this in our example:

   xy_files = filter_by_extension(my_files, "xy");

Multiple Extensions

What if you want to select files with multiple extensions? We can improve our function a little to support that:

   /**nocache**/
   filter_by_extension(b: binding, ext)
   {
     /**nocache**/
     filter_one(n,v)
     {
       n_ext = ./generic/extension(n);
       return (if ((_is_text(ext) && (n_ext == ext)) ||
                   (_is_binding(ext) && ext!$n_ext))
               then [$n = v]
               else []);
     };
     return _map(filter_one, b);
   };

Now we can get those files ending in "xy" and those ending in "yz" with this:

   xy_and_yz_files = filter_by_extension(my_files, [xy=1,yz=1]);

Also, this function can still take a single extension as a text value:

   xy_files = filter_by_extension(my_files, "xy");
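
In this version the binding of extensions is used purely as a set of names: ext!$n_ext only asks whether a name is defined in ext, so the values (1 in the call above) are never looked at. A small sketch of that test, using the hypothetical names exts, has_xy, has_txt and same_set:

```
exts = [ xy = 1, yz = 1 ];   // hypothetical name; the values are placeholders

has_xy  = exts!xy;    // TRUE: "xy" is a name in exts
has_txt = exts!txt;   // FALSE: "txt" is not a name in exts

// Any values would do, since only the names are tested:
same_set = [ xy = TRUE, yz = TRUE ];
```

Using a binding rather than a list keeps the membership test down to a single ! lookup.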

Inverting the Selection

What if you wanted to remove files with particular extensions instead of removing all other files? With a little more work, our filtering function can do that too.

   /**nocache**/
   filter_by_extension(b: binding, ext, invert=FALSE)
   {
     /**nocache**/
     filter_one(n,v)
     {
       n_ext = ./generic/extension(n);
       selected = ((_is_text(ext) && (n_ext == ext)) ||
                   (_is_binding(ext) && ext!$n_ext));
       return (if ((!invert && selected) ||
                   (invert && !selected))
               then [$n = v]
               else []);
     };
     return _map(filter_one, b);
   };

Now we can remove those files ending in "xy" and those ending in "yz" with:

   non_xy_non_yz_files = filter_by_extension(my_files, [xy=1,yz=1], TRUE);

Or remove just those ending in "xy" with:

   non_xy_files = filter_by_extension(my_files, "xy", TRUE);
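
As a quick check of the inverted form, the sketch below runs the same extension set through the function with and without invert. The contents of my_files and the names kept and dropped are hypothetical, and the expected results assume that ./generic/extension returns the text after the last dot.

```
// Hypothetical input, for illustration only.
my_files = [ a.xy = "A", b.yz = "B", c.txt = "C" ];

// invert is left at its default of FALSE: keep the matching entries.
kept = filter_by_extension(my_files, [xy=1,yz=1]);
// kept is [ a.xy = "A", b.yz = "B" ]

// invert is TRUE: keep everything that does not match.
dropped = filter_by_extension(my_files, [xy=1,yz=1], TRUE);
// dropped is [ c.txt = "C" ]
```

For the same ext argument the two results never share a name, and between them they contain every entry of the input binding.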