NCDatasets.jl
Documentation for NCDatasets.jl
Datasets
NCDatasets.Dataset
— Type.Dataset(filename::AbstractString,mode::AbstractString = "r";
format::Symbol = :netcdf4, attrib = [])
Create a new NetCDF file if the mode is "c". An existing file with the same name will be overwritten. If mode is "a", then an existing file is opened in append mode (i.e. existing data in the netCDF file is not overwritten and a variable can be added). With the mode set to "r", an existing netCDF file or OPeNDAP URL can be opened in read-only mode. The default mode is "r". The optional parameter attrib is an iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).
Supported formats:
- :netcdf4 (default): HDF5-based NetCDF format.
- :netcdf4_classic: Only netCDF 3 compatible API features will be used.
- :netcdf3_classic: classic netCDF format supporting only files smaller than 2GB.
- :netcdf3_64bit_offset: improved netCDF format supporting files larger than 2GB.
Files can also be opened and automatically closed with a do block.
Dataset("file.nc") do ds
    data = ds["temperature"][:,:]
end
Dataset("file.nc", "c", attrib = ["title" => "my first netCDF file"]) do ds
    defVar(ds,"temp",[10.,20.,30.],("time",))
end;
mfds = Dataset(fnames,mode = "r"; aggdim = nothing)
Opens a multi-file dataset in read-only "r" or append mode "a". fnames is a vector of file names. Variables are aggregated over the first unlimited dimension or over the dimension aggdim if specified.
Note: all files are opened at the same time. However, the operating system might limit the number of open files. On Linux, the limit can be controlled with the command ulimit [1,2].
All variables containing the dimension aggdim are aggregated. Variables that do not contain the dimension aggdim are assumed constant.
[1]: https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux
[2]: https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes/8949#8949
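The aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the two scratch files, the variable name temp and the dimension names are hypothetical, and aggdim is passed explicitly so the example does not depend on an unlimited dimension being present.

```julia
using NCDatasets

# Hypothetical scratch files; any set of files with matching variables would do.
fnames = [tempname() * ".nc" for _ in 1:2]

for (i, fname) in enumerate(fnames)
    Dataset(fname, "c") do ds
        # Each file holds 2 time steps of a 3-point "temp" field.
        defVar(ds, "temp", fill(Float64(i), 3, 2), ("lon", "time"))
    end
end

# Open both files as one dataset, concatenated along "time".
mfds = Dataset(fnames; aggdim = "time")
temp = mfds["temp"][:, :]
close(mfds)

@show size(temp)
```

If the aggregation works as documented, temp should contain the time steps of the first file followed by those of the second.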
Base.keys
— Method.keys(ds::Dataset)
Return a list of all variable names in Dataset ds.
Base.haskey
— Function.haskey(ds::Dataset,varname)
Return true if the Dataset ds has a variable with the name varname. For example:
ds = Dataset("/tmp/test.nc","r")
if haskey(ds,"temperature")
    println("The file has a variable 'temperature'")
end
This example checks if the file /tmp/test.nc has a variable with the name temperature.
Base.getindex
— Method.getindex(ds::Dataset,varname::AbstractString)
Return the NetCDF variable varname in the dataset ds as a NCDatasets.CFVariable. The CF conventions are honored when the variable is indexed:
- _FillValue will be returned as missing
- scale_factor and add_offset are applied
- time variables (recognized by the units attribute) are returned as DateTime objects.
A call getindex(ds,varname) is usually written as ds[varname].
NCDatasets.variable
— Function.variable(ds::Dataset,varname::String)
Return the NetCDF variable varname in the dataset ds as a NCDatasets.Variable. No scaling is applied when this variable is indexed.
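To make the difference between ds[varname] and variable(ds,varname) concrete, here is a minimal sketch. The file name, the variable x and the attribute values are made up for illustration; the values were chosen so that packing into Int16 is exact.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

Dataset(fname, "c") do ds
    defDim(ds, "n", 3)
    v = defVar(ds, "x", Int16, ("n",), attrib = [
        "scale_factor" => 0.5,
        "add_offset"   => 10.0
    ])
    # Written through the CFVariable, so the values are packed into Int16.
    v[:] = [10.0, 10.5, 11.0]
end

ds = Dataset(fname)
unpacked = ds["x"][:]            # scale_factor and add_offset applied
raw      = variable(ds, "x")[:]  # values as stored in the file
close(ds)

@show unpacked raw
```

Reading through ds["x"] should give back the physical values, while variable(ds,"x") should expose the packed integers.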
NCDatasets.sync
— Function.sync(ds::Dataset)
Write all changes in Dataset ds to the disk.
Base.close
— Function.close(ds::Dataset)
Close the Dataset ds. All pending changes will be written to the disk.
NCDatasets.path
— Function.path(ds::Dataset)
Return the file path (or the OPeNDAP URL) of the Dataset ds.
Variables
NCDatasets.defVar
— Function.defVar(ds::Dataset,name,vtype,dimnames; kwargs...)
defVar(ds::Dataset,name,data,dimnames; kwargs...)
Define a variable with the name name in the dataset ds. vtype can be any of the Julia types listed in the table below (with the corresponding NetCDF type). Instead of providing the variable type, one can directly give the data data, which will be used to fill the NetCDF variable. The parameter dimnames is a tuple with the names of the dimensions. For a scalar this parameter is the empty tuple (). The variable is returned (of the type CFVariable).
Note: if data is a vector or array of DateTime objects, then the dates are saved as double-precision floats with the units "days since 1900-00-00 00:00:00" (unless a time unit is specified with the attrib keyword described below).
Keyword arguments
- fillvalue: A value filled in the NetCDF file to indicate missing data. It will be stored in the _FillValue attribute.
- chunksizes: Vector of integers setting the chunk size. The total size of a chunk must be less than 4 GiB.
- deflatelevel: Compression level: 0 (default) means no compression and 9 means maximum compression. Each chunk will be compressed individually.
- shuffle: If true, the shuffle filter is activated, which can improve the compression ratio.
- checksum: The checksum method can be :fletcher32 or :nochecksum (checksumming is disabled, which is the default).
- attrib: An iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).
- typename (string): The name of the NetCDF type required for vlen arrays [1].
chunksizes, deflatelevel, shuffle and checksum can only be set on NetCDF 4 files.
NetCDF data types
NetCDF Type | Julia Type |
---|---|
NC_BYTE | Int8 |
NC_UBYTE | UInt8 |
NC_SHORT | Int16 |
NC_INT | Int32 |
NC_INT64 | Int64 |
NC_FLOAT | Float32 |
NC_DOUBLE | Float64 |
NC_CHAR | Char |
NC_STRING | String |
Example:
julia> data = randn(3,5)
julia> Dataset("test_file.nc","c") do ds
defVar(ds,"temp",data,("lon","lat"), attrib = [
"units" => "degree_Celsius",
"long_name" => "Temperature"
])
end;
[1]: https://web.archive.org/save/https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf-c/nc005fdef005fvlen.html
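As a complement to the example above, the following sketch uses the fillvalue keyword and then reads the variable back both with and without the CF conventions applied. The file name, variable name and fill value are illustrative assumptions; note that the fill value is given as a Float32 to match the variable type.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

Dataset(fname, "c") do ds
    defDim(ds, "n", 3)
    v = defVar(ds, "temp", Float32, ("n",); fillvalue = Float32(-9999))
    # The missing element is stored as the fill value in the file.
    v[:] = [1f0, missing, 3f0]
end

ds = Dataset(fname)
cf  = ds["temp"][:]             # fill value comes back as missing
raw = variable(ds, "temp")[:]   # fill value as stored in the file
close(ds)

@show cf raw
```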
NCDatasets.dimnames
— Function.dimnames(v::Variable)
Return a tuple of the dimension names of the variable v
.
NCDatasets.name
— Function.name(v::Variable)
Return the name of the NetCDF variable v
.
NCDatasets.chunking
— Function.storage,chunksizes = chunking(v::Variable)
Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v.
NCDatasets.deflate
— Function.isshuffled,isdeflated,deflate_level = deflate(v::Variable)
Return compression information of the variable v. If isshuffled is true, then shuffling (byte interlacing) is activated. If isdeflated is true, then the data chunks (see chunking) are compressed using the compression level deflate_level (0 means no compression and 9 means maximum compression).
NCDatasets.checksum
— Function.checksummethod = checksum(v::Variable)
Return the checksum method of the variable v, which can be either :fletcher32 or :nochecksum.
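The three query functions above can be tried together on a variable created with the corresponding defVar keywords. This is a sketch only: the file, the variable name and the chosen chunk sizes and compression level are assumptions, and the functions are called with the module prefix so the example does not rely on them being exported.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

Dataset(fname, "c") do ds
    defDim(ds, "lon", 100)
    defDim(ds, "lat", 100)
    defVar(ds, "temp", Float32, ("lon", "lat");
           chunksizes = [50, 50], deflatelevel = 4, shuffle = true,
           checksum = :fletcher32)
end

ds = Dataset(fname)
v = variable(ds, "temp")   # the underlying NCDatasets.Variable

storage, chunksizes = NCDatasets.chunking(v)
isshuffled, isdeflated, deflate_level = NCDatasets.deflate(v)
checksummethod = NCDatasets.checksum(v)
close(ds)

@show storage chunksizes isshuffled isdeflated deflate_level checksummethod
```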
NCDatasets.loadragged
— Function. data = loadragged(ncvar,index::Colon)
Load data from ncvar in the contiguous ragged array representation [1] as a vector of vectors. It is typically used to load a list of profiles or time series, each of a different length.
The indexed ragged array representation [2] is currently not supported.
[1]: https://web.archive.org/web/20190111092546/http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#contiguousraggedarrayrepresentation
[2]: https://web.archive.org/web/20190111092546/http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#indexedraggedarrayrepresentation
Different types of arrays are involved when working with NCDatasets. For instance, assume that test.nc is a file with a Float32 variable called var, and that we open this data set in append mode ("a"):
using NCDatasets
ds = Dataset("test.nc","a")
v_cf = ds["var"]
The variable v_cf has the type CFVariable. No data is actually loaded from disk, but you can query its size, number of dimensions, number of elements, ... with the functions size, ndims, length, as for ordinary Julia arrays. Once you index the variable v_cf, the data is loaded and stored into a DataArray:
v_da = v_cf[:,:]
Attributes
The NetCDF dataset (as returned by Dataset or NetCDF groups) and the NetCDF variables (as returned by getindex, variable or defVar) have the field attrib, which has the type NCDatasets.Attributes and behaves like a Julia dictionary.
Base.getindex
— Method.getindex(a::Attributes,name::AbstractString)
Return the value of the attribute called name from the attribute list a. Generally the attributes are loaded by indexing, for example:
ds = Dataset("file.nc")
title = ds.attrib["title"]
Base.setindex!
— Method.Base.setindex!(a::Attributes,data,name::AbstractString)
Set the attribute called name to the value data in the attribute list a. Generally the attributes are defined by indexing, for example:
ds = Dataset("file.nc","c")
ds.attrib["title"] = "my title"
Base.keys
— Method.Base.keys(a::Attributes)
Return a list of the names of all attributes.
Dimensions
NCDatasets.defDim
— Function.defDim(ds::Dataset,name,len)
Define a dimension in the data set ds with the given name and length len. If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.
For example:
ds = Dataset("/tmp/test.nc","c")
defDim(ds,"lon",100)
This defines the dimension lon with the size 100.
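The unlimited case (len = Inf) can be sketched as follows. The file, the variable temp and the loop bounds are illustrative assumptions; the example relies on ds.dim returning the current length of a dimension when indexed by name.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

ds = Dataset(fname, "c")
defDim(ds, "lon", 10)
defDim(ds, "time", Inf)      # unlimited dimension
v = defVar(ds, "temp", Float64, ("lon", "time"))

# Append three time slices; the "time" dimension grows with each write.
for n in 1:3
    v[:, n] = fill(Float64(n), 10)
end

ntime = ds.dim["time"]       # current length of the unlimited dimension
close(ds)

@show ntime
```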
Base.setindex!
— Method.Base.setindex!(d::Dimensions,len,name::AbstractString)
Define the dimension called name with the length len. Generally dimensions are defined by indexing, for example:
ds = Dataset("file.nc","c")
ds.dim["longitude"] = 100
If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.
NCDatasets.dimnames
— Method.dimnames(v::Variable)
Return a tuple of the dimension names of the variable v
.
Groups
NCDatasets.defGroup
— Method.defGroup(ds::Dataset,groupname, attrib = [])
Create the group with the name groupname in the dataset ds. attrib is a list of attribute name and attribute value pairs (see Dataset).
Base.getindex
— Method.group = getindex(g::NCDatasets.Groups,groupname::AbstractString)
Return the NetCDF group with the name groupname. For example:
julia> ds = Dataset("results.nc", "r");
julia> forecast_group = ds.group["forecast"]
julia> forecast_temp = forecast_group["temperature"]
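For a self-contained round trip, a group can first be created with defGroup and then read back through ds.group. The sketch below assumes that defGroup returns the new group so that defVar can be applied to it; the file name and the data values are made up.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

Dataset(fname, "c") do ds
    g = defGroup(ds, "forecast")
    # Variables are defined in a group the same way as in a dataset.
    defVar(g, "temperature", [280.0, 281.5, 283.0], ("time",))
end

ds = Dataset(fname)
groupnames  = collect(keys(ds.group))
temperature = ds.group["forecast"]["temperature"][:]
close(ds)

@show groupnames temperature
```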
Base.keys
— Method.Base.keys(g::NCDatasets.Groups)
Return the names of all subgroups of the group g.
Common methods
One can iterate over a dataset, attribute list, dimensions and NetCDF groups.
for (varname,var) in ds
    # all variables
    @show (varname,size(var))
end
for (dimname,dim) in ds.dim
    # all dimensions
    @show (dimname,dim)
end
for (attribname,attrib) in ds.attrib
    # all attributes
    @show (attribname,attrib)
end
for (groupname,group) in ds.group
    # all groups
    @show (groupname,group)
end
Time functions
DateTimeStandard
DateTimeJulian
DateTimeProlepticGregorian
DateTimeAllLeap
DateTimeNoLeap
DateTime360Day
Dates.year(dt::AbstractCFDateTime)
Dates.month(dt::AbstractCFDateTime)
Dates.day(dt::AbstractCFDateTime)
Dates.hour(dt::AbstractCFDateTime)
Dates.minute(dt::AbstractCFDateTime)
Dates.second(dt::AbstractCFDateTime)
Dates.millisecond(dt::AbstractCFDateTime)
convert
reinterpret
timedecode
timeencode
daysinmonth
daysinyear
Utility functions
NCDatasets.ncgen
— Function.ncgen(fname; ...)
ncgen(fname,jlname; ...)
Generate the Julia code that would produce a NetCDF file with the same metadata as the NetCDF file fname. The code is placed in the file jlname or printed to the standard output. By default the new NetCDF file is called filename.nc. This can be changed with the optional parameter newfname.
NCDatasets.nomissing
— Function.a = nomissing(da)
Return the values of the array da of type Array{Union{T,Missing},N} (potentially containing missing values) as a regular Julia array a of the same element type, checking that no missing values are present.
a = nomissing(da,value)
Return the values of the array da of type Array{Union{T,Missing},N} as a regular Julia array a by replacing all missing values by value.
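Both forms of nomissing can be illustrated on small in-memory arrays, with no NetCDF file involved. The array contents are arbitrary examples.

```julia
using NCDatasets

# An array that allows missing values but contains none.
da = Union{Float64,Missing}[1.0, 2.0, 3.0]
a = nomissing(da)          # would raise an error if an element were missing

# An array with an actual missing value, replaced here by NaN.
db = [1.0, missing, 3.0]
b = nomissing(db, NaN)

@show eltype(a) eltype(b)
```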
NCDatasets.varbyattrib
— Function.varbyattrib(ds, attname = attval)
Return a list of the variables in the dataset ds that have the attribute attname matching the value attval. The list is empty if none of the variables match. The output is a list of CFVariables.
Examples
Load all the data of the first variable with standard name "longitude" from the NetCDF file results.nc.
julia> ds = Dataset("results.nc", "r");
julia> data = varbyattrib(ds, standard_name = "longitude")[1][:]
Experimental functions
NCDatasets.ancillaryvariables
NCDatasets.filter
Issues
libnetcdf not properly installed
If you see the following error,
ERROR: LoadError: LoadError: libnetcdf not properly installed. Please run Pkg.build("NCDatasets")
you can try to install netcdf explicitly with Conda:
using Conda
Conda.add("libnetcdf")
NetCDF: Not a valid data type or _FillValue type mismatch
Trying to define the _FillValue produces the following error:
ERROR: LoadError: NCDatasets.NetCDFError(-45, "NetCDF: Not a valid data type or _FillValue type mismatch")
The error could be generated by a code like this:
using NCDatasets
# ...
tempvar = defVar(ds,"temp",Float32,("lonc","latc","time"))
tempvar.attrib["_FillValue"] = -9999.
In fact, _FillValue must have the same data type as the corresponding variable. In the case above, tempvar is a 32-bit float and the number -9999. is a 64-bit float (aka double, which is the default floating point type in Julia). It is sufficient to convert the value -9999. to a 32-bit float:
tempvar.attrib["_FillValue"] = Float32(-9999.)
Corner cases
- An attribute representing a vector with a single value (e.g. [1]) will be read back as a scalar (1) (same behavior in python netCDF4 1.3.1).
- NetCDF and Julia distinguish between a vector of chars and a string, but both are returned as a string for ease of use. In particular, an attribute representing a vector of chars ['u','n','i','t','s'] will be read back as the string "units".
- An attribute representing a vector of chars ['u','n','i','t','s','\0'] will also be read back as the string "units" (issue #12).