User-Defined ENVITask That Masks Some Output Pixels
My last Data Point blog post User-Defined ENVITasks in ENVI 5.2 SP1 introduced how you can use
the new ENVITaskFromProcedure class to create your own custom ENVITasks that
wrap any IDL procedure you have. Today I’m going to explore a concrete example
that combines a few more advanced topics:
- Raster Statistics
- Threshold ROI Masking
- Saving the results with correct pixels masked out
This example is motivated by a recent Tech Support question
I was asked to assist on, but I am fleshing it out more to be a little more useful
to you the reader. The question was about how to use ENVIROIMaskRaster()
to mask pixels out of a raster and then export the results so that the masked
pixels would persist on disk. I could just focus on this part, but I thought
it would be useful to also show how to build the ROI using the ENVI API,
instead of just loading one from a file.
The intent of this task is to take an input raster, evaluate
its histogram, and then mask out any pixels in the top or bottom N% of
the histogram. This is similar to what our linear percent stretch modes do,
but in that case pixels in the top and bottom of the are clamped to a 0 or 255
value in their respective band, with the values in between linearly scaled in
that range. This task will make those extreme pixels invalid, so they are
transparent when visualized and ignored by our analytics.
You might be worried that we need to iterate over all the
pixels in the raster to build the histograms, but fear not, the
ENVIRasterStatistics()
function was introduced in ENVI 5.2, and it provides an optimized
implementation of the histogram calculations. So we can utilize this function
with the HISTOGRAMS and HISTOGRAM_BINSIZE keywords to get the histogram for
each band. By default this function will return a 256 bin histogram, but for
this task we will set the bin size to 1 so that we get (max – min + 1) bins for
each band.
Once we have the histogram, we pass it into Total()
with the /CUMULATIVE keyword. This will result in an array of the same size as
the histogram, where each element is the sum of all the input elements with an
index less than or equal to the output index. The last element of each output
array will be equal to the number of pixels in the raster. So if we divide by
the last element (or number of pixels), then we will get the cumulative
distribution function (CDF) of the pixels for each band.
We can then use a WHERE test to find out how many bins have
a CDF value less than N%, or greater than (1 – N%). This will
tell us what the range of pixel values will be if we want to eliminate the top
and bottom N% of the pixels. We use these values to create a threshold
ROI, using the ENVIROI::AddThreshold procedure.
We have two options for how to use the ROIs to mask the
raster. We can create 1 ROI and add separate thresholds to it for each band,
or we can create multiple ROIs that each have one band threshold. The results
will be the same when we use the ENVIROIMaskRaster()
virtual raster function, only pixels that are outside of every threshold
are masked out. This is a logical AND operation, but I want a logical OR
operation, where every pixel that is outside of any threshold will be
masked out. To do this I need to create separate ROIs for each band threshold
and apply them consecutively using the ENVIROIMaskRaster() function. The nice
benefit of the virtual raster functions is that they don’t commit anything to
disk, they build up a filter chain of raster operations that are executed on
the pixels on a per tile basis, so there is very little performance overhead
when doing this.
Once we have the masked raster, we want to save it to disk
so that the masked pixels will stay masked. We don’t want to waste disk space
by writing out are array of Boolean values, so the “data ignore value” is used
to mark the masked pixels as invalid. Unfortunately there isn’t a one line
solution to this in ENVI 5.2 SP1, so we have to “burn in” the data ignore value
for the masked pixels. In ENVI 5.3 we have added this feature with a
DATA_IGNORE_VALUE keyword on ENVIRaster::Export,
but in the meantime we need to create a new ENVIRaster object and explicitly
change the masked pixel values so that we can persist the masking. So we use
the ENVIRaster()
factory function, and use the INHERITS_FROM keyword to copy over all the
metadata and the spatial reference object from our input raster. We will also
supply the URI that this output raster will be written to, and specify the
DATA_IGNORE_VALUE we want to use, so that the ENVI header file will contain
that metadata field and properly mask out the pixels for us.
The ENVIRasterIterator
class makes the pixel processing a lot simpler. We first need to get a raster
iterator, which is via the ENVIRaster::CreateTileIterator()
function. We’ll omit the keywords, as we want to process the whole raster and
will let it decide the optimal tile size to use. There are a few ways to use
the iterator, and while I’m usually partial to foreach loops over for loops, we
can’t use the foreach syntax this time because we need to get the PIXEL_STATE
output from the ENVIRasterIterator::Next()
fuction. Once we have the PIXEL_STATE array, we use WHERE to find all the
non-zero values (pixels that either already have the data ignore value or are
masked out), and then we replace all the corresponding pixels with the data
ignore value. We then use the ENVIRaster::SetTile
procedure to copy the pixels to the output raster, which takes the current
raster iterator object so that it knows the line/sample/band extents of the
pixel array and commits it to disk correctly. Once we’ve finished iterating
over the raster and copying the masked pixels to the output raster, all we need
to do is call ENVIRaster::Save
on the output raster to finish committing the pixels to disk and to generate
the header file for us.
Here is the final version of my procedure, which uses
keywords for the input raster, the threshold percentage, the data ignore value,
and the output filename to use:
pro maskExtremePixels, INPUT_RASTER=inputRaster,
THRESHOLD_PERCENT=percent, $
DATA_IGNORE_VALUE=dataIgnoreValue,
OUTPUT_FILENAME=outFilename
compile_opt idl2
; calculate raster statistics and histograms
stats = ENVIRasterStatistics(inputRaster,
/HISTOGRAMS, HISTOGRAM_BINSIZE=1)
; create local variable to represent current virtual raster,
starting with input raster
currRaster = inputRaster
; iterate over list of histogram hashes
foreach hist, stats['HISTOGRAMS'], band do begin
; calculate cumulative distribution function from histogram
cdf = Total(hist['COUNTS'], /Cumulative)
cdf /= cdf[-1]
; find CDF bins that are above/below threshold
lowInd = Where(cdf lt percent,
lowCount)
highInd = Where(cdf gt (1.0 - percent),
highCount)
; calculate digital number that corresponds to threshold bins
minVal = stats['MIN', band] + lowCount
maxVal = stats['MAX', band] - highCount
; create ROI to mask the input raster
roi = ENVIRoi(NAME='Band Thresholds', COLOR='Blue')
; add threshold definition for current band to ROI
roi.AddThreshold, inputRaster, band, MIN_VALUE=minVal,
MAX_VALUE=maxVal
; apply ROI to raster to mask out extreme pixels, and update
current reference
currRaster = ENVIRoiMaskRaster(currRaster,
roi)
endforeach
; create raster to commit masked pixels to disk
outRaster = ENVIRaster(INHERITS_FROM=inputRaster, $
DATA_IGNORE_VALUE=dataIgnoreValue, $
URI=outFilename)
; create a raster iterator to access data
iterator = currRaster.CreateTileIterator()
for i = 1, iterator.nTiles do begin
; get the current tile's pixel and pixel value state arrays
tile = iterator.Next(PIXEL_STATE=pixelState)
; identify any pixels which should be masked out
maskInd = where(pixelState ne 0, maskCount)
; apply pixel mask if any pixels in this tile are masked
if (maskCount gt 0) then begin
tile[maskInd] = dataIgnoreValue
endif
; push masked pixels into output raster
outRaster.SetTile, tile, iterator
endfor
; save output raster and header to disk
outRaster.Save
end
Now we want to turn this into an ENVITask, so that we can
run this on the desktop and in the cloud using ENVI Services Engine. All we
need to do is build a task template file that describes the input parameters to
this procedure:
{
"name": "MaskExtremePixels",
"baseClass": "ENVITaskFromProcedure",
"routine": "MaskExtremePixels",
"displayName": "Mask Extreme Pixels",
"description": "This task processes the histogram
of each band on the input ENVIRaster and builds pixel masks that exclude all
pixels that are in the top or bottom percent of the histogram.",
"version": "5.2.1",
"parameters": [
{
"name": "INPUT_RASTER",
"displayName": "Input Raster",
"dataType": "ENVIRASTER",
"direction": "input",
"parameterType": "required",
"description": "Specify the raster to mask."
},
{
"name": "THRESHOLD_PERCENT",
"displayName": "Threshold Percent",
"dataType": "FLOAT",
"direction": "input",
"parameterType": "required",
"description": "Percent of histogram to mask out
from top and bottom."
},
{
"name": "DATA_IGNORE_VALUE",
"displayName": "Data Ignore Value",
"dataType": "DOUBLE",
"direction": "input",
"parameterType": "required",
"description": "Value to use for masked out
pixels in OUTPUT_RASTER."
},
{
"name": "OUTPUT_RASTER",
"keyword": "OUTPUT_FILENAME",
"displayName": "Output Raster",
"dataType": "ENVIRASTER",
"direction": "output",
"parameterType": "required",
"description": "Masked version of
INPUT_RASTER."
}
]
}
For the most part, this task template is pretty
straightforward. We provide a name and description, then since we are wrapping
a procedure we use the ENVITaskFromProcedure class for baseClass and the
procedure name for routine. Then we have the list of parameters, with their
name, displayName, dataType, direction, parameterType, and description
attributes. The only anomaly to this is the OUTPUT_RASTER parameter. If we
look at the procedure signature, you’ll see the keyword OUTPUT_FILENAME, which
is an input string telling the procedure where to write the raster to disk.
When we wrap it as a task, we want to make life easier for task consumers, so
we make the output parameter of type ENVIRaster, and set its keyword to OUTPUT_FILENAME.
When this task template is loaded, the task framework will automatically add an
input parameter of type ENVIURI that wraps this same keyword, appending the “_URI”
suffix to the original parameter’s name. So the user could explicitly set the
OUTPUT_RASTER_URI parameter value to control where the masked raster is written
to, or they can leave it unset in which case the task framework will generate a
temporary filename for that keyword.
Once you copy this task template to a file with the
extension .task and the procedure PRO file to the custom code folder
(C:\Program Files\Exelis\ENVI52\custom_code on Windows), or one of the other
documented locations
you can then use it. Here is an example showing how:
e = envi()
file = FilePath('qb_boulder_msi', ROOT_DIR=e.ROOT_DIR, SUBDIR='data')
oRaster = e.OpenRaster(file)
oTask = enviTask('maskExtremePixels')
oTask.INPUT_RASTER = oRaster
oTask.THRESHOLD_PERCENT = 0.02
oTask.DATA_IGNORE_VALUE = -1
oTask.Execute
oView = e.getView()
oLayer = oView.CreateLayer(oTask.OUTPUT_RASTER)