Documentation read from 07/17/2019 13:52:24 version of /vol/public-pseed/FIGdisk/dist/releases/cvs.1555556707/common/lib/FigKernelPackages/ERDB.pm.
The Entity-Relationship Database Package allows the client to create an easily-configurable database of Entities connected by Relationships. Each entity is represented by one or more relations in an underlying SQL database. Each relationship is represented by a single relation that connects two entities. Entities and relationships are collectively referred to in the documentation as objects.
Although this package is designed for general use, most examples are derived from the world of bioinformatics, which is where this technology was first deployed.
Each entity has at least one relation,
the primary relation,
that has the same name as the entity.
The primary relation contains a field named id
that contains the unique identifier of each entity instance.
An entity may have additional relations that contain fields which are optional or can occur more than once.
For example,
the Feature
entity has a feature-type attribute that occurs exactly once for each feature.
This attribute is implemented by a feature_type
column in the primary relation Feature
.
In addition,
however,
a feature may have zero or more aliases.
These are implemented using a FeatureAlias
relation that contains two fields-- the feature ID (id
) and the alias name (alias
).
The Feature
entity also contains an optional virulence number.
This is implemented as a separate relation FeatureVirulence
which contains an ID (id
) and a virulence number (virulence
).
If the virulence of a feature ABC is known to be 6,
there will be one row in the FeatureVirulence
relation possessing the value ABC as its ID and 6 as its virulence number.
If the virulence of ABC is not known,
there will not be any rows for it in FeatureVirulence
.
Entities are connected by binary relationships implemented using single relations possessing the same name as the relationship itself and that has an 1-to-many (1M
) or many-to-many (MM
).
Each relationship's relation contains a from-link
field that contains the ID of the source entity and a to-link
field that contains the ID of the target entity.
The name of the relationship is generally a verb phrase with the source entity as the subject and the target entity as the object.
So,
for example,
the ComesFrom relationship connects the Genome
and Source
entities,
and indicates that a particular source organization participated in the mapping of the genome.
A source organization frequently participates in the mapping of many genomes,
and many source organizations can cooperate in the mapping of a single genome,
so this relationship has an arity of many-to-many (MM
).
The relation that implements the ComesFrom
relationship is called ComesFrom
and contains two fields-- from-link
,
which contains a genome ID,
and to-link
,
which contains a source ID.
A relationship may itself have attributes.
These attributes,
known as intersection data attributes,
are implemented as additional fields in the relationship's relation.
So,
for example,
the IsMadeUpOf relationship connects the Contig entity to the Sequence entity,
and is used to determine which sequences make up a contig.
The relationship has as an attribute the start-position,
which indicates where in the contig that the sequence begins.
This attribute is implemented as the start_position
field in the IsMadeUpOf
relation.
The database itself is described by an XML file. In addition to all the data required to define the entities, relationships, and attributes, the schema provides space for notes describing the data and what it means and information about how to display a diagram of the database. These are used to create web pages describing the data.
Special support is provided for text searching. An entity field can be marked as searchable, in which case it will be used to generate a text search index in which the user searches for words in the field instead of a particular field value.
Considerable support is provided for loading a database from flat files.
The flat files are in the standard format expected by the MySQL LOAD DATA INFILE
command.
This command expects each line to represent a database record and each record to have all the fields specified,
in order,
with tab characters separating the fields.
The ERDBLoadGroup object can be subclassed and used to create load files that can then be loaded using the ERDBLoader.pl command; however, there is no requirement that this be done.
In order to use the load facility, the constructor for the database object must be able to function with no parameters or with the parameters construed as a hash. The following options are used by the ERDB load facility. It is not necessary to support them all.
XML database definition file.
Name of the database to use.
Socket for accessing the database.
Name and password used to log on to the database, separated by a slash.
Database host name.
The ERDB system supports many different data types.
It is possible to configure additional user-defined types by adding PERL modules to the code.
Each new type must be a subclass of ERDBType.
Standard types are listed in the compile-time STANDARD_TYPES constant.
Custom types should be listed in the $ERDBExtras::customERDBtypes
variable of the configuration file.
The variable must be a list reference containing the names of the ERDBType subclasses for the custom types.
To get complete documentation of all the types, use the "ShowDataTypes" method. The most common types are
Signed whole number with a range of roughly negative 2 billion to positive 2 billion. Integers are stored in the database as a 32-bit binary number.
Variable-length string, up to around 250 characters. Strings are stored in the database as variable-length ASCII with some escaping.
Variable-length string, up to around 65000 characters. Text is stored in the database as variable-length ASCII with some escaping. Only the first 250 characters can be indexed.
Double-precision floating-point number, ranging from roughly -10^-300 to 10^-300, with around 14 significant digits. Floating-point numbers are stored in the database in IEEE 8-byte floating-point format.
Date/time value, in whole seconds. Dates are stored as a number of seconds from the beginning of the Unix epoch (January 1, 1970) in Universal Coordinated Time. This makes it identical to a date or time number in PERL, Unix, or Windows.
All data fields are converted when stored or retrieved using the "EncodeField" and "DecodeField" methods. This allows us to store very exotic data values such as string lists, images, and PERL objects. The conversion is not, however, completely transparent because no conversion is performed on the parameter values for the various "Get"-based queries. There is a good reason for this: you can specify general SQL expressions as filters, and it's extremely difficult for ERDB to determine the data type of a particular parameter. This topic is dealt with in more detail below.
There are several places in which field names are specified by the caller. The standard field name format is the name of the entity or relationship followed by the field name in parentheses. In some cases there a particular entity or relationship is considered the default. Fields in the default object can be specified as an unmodified field name. For example,
Feature(species-name)
would specify the species name field for the Feature
entity. If the Feature
table were the default, it could be specified as
species-name
without the object name. You may also use underscores in place of hyphens, which can be syntactically more convenient in PERL programs.
species_name
In some cases, the object name may not be the actual name of an object in the database. It could be an alias assigned by a query, or the converse name of a relationship. Alias names and converse names are generally specified in the object name list of a query method. The alias or converse name used in the query method will be carried over in all parameters to the method and any data value structures returned by the query. In most cases, once you decide on a name for something in a query, the name will stick for all data returned by the query.
Queries against the database are performed by variations of the "Get" method. This method has three parameters: the object name list, the filter clause, and the parameter list. There is a certain complexity involved in queries that has evolved over a period of many years in which the needs of the applications were balanced against a need for simplicity. In most cases, you just list the objects used in the query, code a standard SQL filter clause with field names in the "Standard Field Name Format", and specify a list of parameters to plug in to the parameter marks. The use of the special field name format and the list of object names spare you the pain of writing a FROM
clause and worrying about joins. For example, here's a simple query to look up all Features for a particular genome.
my $query = $erdb->Get('Genome HasFeature Feature', 'Genome(id) = ?', [$genomeID]);
For more complicated queries, see the rest of this section.
The object name list specifies the names of the entities and relationships that participate in the query. This includes every object used to filter the query as well as every object from which data is expected. The ERDB engine will automatically generate the join clauses required to make the query work, which greatly simplifies the coding of the query. You can specify the object name list using a list reference or a space-delimited string. The following two calls are equivalent.
my $query = $erdb->Get(['Genome', 'UsesImage', 'Image'], $filter, \@parms); my $query = $erdb->Get('Genome UsesImage Image', $filter, \@parms);
If you specify a string, you have a few more options.
AND
to start a new join chain with an object further back in the list.These requirements do not come up very often, but they can make a big differance.
For example, let us say you are looking for a feature that has a role in a particular subsystem and also belongs to a particular genome. You can't use
my $query = $erdb->Get(['Feature', 'HasRoleInSubsystem', 'Subsystem', 'HasFeature', 'Genome'], $filter, \@parms);
because you don't want to join the HasFeature
table to the subsystem table. Instead, you use
my $query = $erdb->Get("Feature HasRoleInSubsystem Subsystem AND Feature HasFeature Genome", $filter, \@parms);
Now consider a taxonomy hierarchy using the entity Class
and the relationship BelongsTo
and say you want to find all subclasses of a particular class. If you code
my $query = $erdb->Get("Class BelongsTo Class", 'Class(id) = ?', [$class])
Then the query will only return the particular class, and only if it belongs to itself. The following query finds every class that belongs to a particular class.
my $query = $erdb->Get("Class BelongsTo Class2", 'Class2(id) = ?', [$class]);
This query does the converse. It finds every class belonging to a particular class.
my $query = $erdb->Get("Class BelongsTo Class2", 'Class(id) = ?', [$class]);
The difference is indicated by the field name used in the filter clause. Because the first occurrence of Class
is specified in the filter rather than the second occurrence (Class2
), the query is anchored on the from-side of the relationship.
The filter clause is an SQL WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can be parameterized with parameter markers (?
). Each field used in the WHERE clause must be specified in "Standard Field Name Format". Any parameters specified in the filter clause should be added to the parameter list as additional parameters. The fields in a filter clause can come from primary entity relations, relationship relations, or secondary entity relations; however, all of the entities and relationships involved must be included in the list of object names on the query. There is never a default object name for filter clause fields.
The filter clause can also specify a sort order. To do this, simply follow the filter string with an ORDER BY clause. For example, the following filter string gets all genomes for a particular genus and sorts them by species name.
"Genome(genus) = ? ORDER BY Genome(species)"
Note that the case is important. Only an uppercase "ORDER BY" with a single space will be processed. The idea is to make it less likely to find the verb by accident.
The rules for field references in a sort order are the same as those for field references in the filter clause in general; however, unpredictable things may happen if a sort field is from an entity's secondary relation.
Finally, you can limit the number of rows returned by adding a LIMIT clause. The LIMIT must be the last thing in the filter clause, and it contains only the word "LIMIT" followed by a positive number. So, for example
"Genome(genus) = ? ORDER BY Genome(species) LIMIT 10"
will only return the first ten genomes for the specified genus. The ORDER BY clause is not required. For example, to just get the first 10 genomes in the Genome table, you could use
"LIMIT 10"
as your filter clause.
The parameter list is a reference to a list of parameter values. The parameter values are substituted for the parameter marks in the filter clause in strict left-to-right order.
In the parameter list for a filter clause, you must be aware of the proper data types and perform any necessary conversions manually. This is not normally a problem. Most of the time, you only query against simple numeric or string fields, and you only need to convert a string if there's a possibility it has exotic characters like tabs or new-lines in it. Sometimes, however, this is not enough.
When you are writing programs to query ERDB databases, you can call "EncodeField" directly, specifying a field name in the "Standard Field Name Format". The value will be converted as if it was being stored into a field of the specified type. Alternatively, you can call "encode", specifying a data type name. Both of these techniques are shown in the example below.
my $query = $erdb->Get("Genome UsesImage Image", "Image(png) = ? AND Genome(description) = ?", [$erdb->EncodeFIeld('Image(png)', $myImage), ERDB::encode(text => $myDescription)]);
You can export the "encode" method if you expect to be doing this a lot and don't want to bother with the package name on the call.
use ERDB qw(encode); # ... much later ... my $query = $erdb->Get("Genome UsesImage Image", "Image(png) = ? AND Genome(description) = ?", [$erdb->EncodeField('Image(png)', $myImage), encode(text => $myDescription)]);
The entire database definition must be inside a Database tag. The display name of the database is given by the text associated with the Title tag. The display name is only used in the automated documentation. The entities and relationships are listed inside the Entities and Relationships tags, respectively. There is also a Shapes
tag that contains additional shapes to display on the database diagram, and an Issues
tag that describes general things that need to be remembered. These last two are completely optional.
<Database> <Title>... display title here...</Title> <Issues> ... comments here ... </Issues> <Regions> ... region definitions here ... </Regions> <Entities> ... entity definitions here ... </Entities> <Relationships> ... relationship definitions here ... </Relationships> <Shapes> ... shape definitions here ... </Shapes> </Database>
Entities, relationships, shapes, indexes, and fields all allow text tags called Notes and Asides. Both these tags contain comments that appear when the database documentation is generated. In addition, the text inside the Notes tag will be shown as a tooltip when mousing over the diagram.
The following special codes allow a limited rich text capability in Notes and Asides.
[b]...[/b]: Bold text
[i]...[/i]: Italics
[p]...[/p]: Paragraph
[link href]...[/link]: Hyperlink to the URL href
[list]...[*]...[*]...[/list]: Bullet list, with [*] separating list elements.
Both entities and relationships have fields described by Field tags. A Field tag can have Notes associated with it. The complete set of Field tags for an object mus be inside Fields tags.
<Entity ... > <Fields> ... Field tags ... </Fields> </Entity>
The attributes for the Field tag are as follows.
Name of the field. The field name should contain only letters, digits, and hyphens (-
), and the first character should be a letter. Most underlying databases are case-insensitive with the respect to field names, so a best practice is to use lower-case letters only. Finally, the name search-relevance
has special meaning for full-text searches and should not be used as a field name.
Data type of the field.
Name of the relation containing the field. This should only be specified for entity fields. The ERDB system does not support optional fields or multi-occurring fields in the primary relation of an entity. Instead, they are put into secondary relations. So, for example, in the Genome
entity, the group-name
field indicates a special grouping used to select a subset of the genomes. A given genome may not be in any groups or may be in multiple groups. Therefore, group-name
specifies a relation value. The relation name specified must be a valid table name. By convention, it is usually the entity name followed by a qualifying word (e.g. GenomeGroup
). In an entity, the fields without a relation attribute are said to belong to the primary relation. This relation has the same name as the entity itself.
If specified, then the field is a candidate for full-text searching. A single full-text index will be created for each relation with at least one searchable field in it. For best results, this option should only be used for string or text fields.
This attribute allows the subclass to assign special meaning for certain fields. The interpretation is up to the subclass itself. Currently, only entity fields can have this attribute.
This attribute specifies the default field value to be used while loading. The default value is used if no value is specified in an "InsertObject" call or in the "Put" in ERDBLoadGroup call that generates the load file. If no default is specified, then the field is required and must have a value specified in the call.
The default value is specified as a string, so it must be in an encoded form.
If 1
, this attribute indicates that the field can have a null value. The default is 0
.
An entity can have multiple alternate indexes associated with it. The fields in an index must all be from the same relation. The alternate indexes assist in searching on fields other than the entity ID. A relationship has at least two indexes-- a to-index and a from-index that order the results when crossing the relationship. For example, in the relationship HasContig
from Genome
to Contig
, the from-index would order the contigs of a ganome, and the to-index would order the genomes of a contig. In addition, it can have zero or more alternate indexes. A relationship's index can only specify fields in the relationship.
The alternate indexes for an entity or relationship are listed inside the Indexes tag. The from-index of a relationship is specified using the FromIndex tag; the to-index is specified using the ToIndex tag.
Be aware of the fact that in some versions of MySQL, the maximum size of an index key is 1000 bytes. This means at most four normal-sized strings.
The Index tag has one optional attribute.
If 1
, then the index is unique. The default is 0
(a non-unique index).
Each index can contain a Notes tag. In addition, it will have an IndexFields tag containing the IndexField tags. The IndexField tags specify, in order, the fields used in the index. The attributes of an IndexField tag are as follows.
The FromIndex, ToIndex and Index tags can have a unique attribute. If specified, the index will be generated as a unique index. The ToIndex for a one-to-many relationship is always unique.
A large database may be too big to fit comfortably on a single page. When this happens, you have the option of dividing the diagram into regions that are shown one at a time. When regions are present, a combo box will appear on the diagram allowing the user to select which region to show. Each entity, relationship, or shape can have multiple RegionInfo tags describing how it should be displayed when a particular region is selected. The regions themselves are described by a Region tag with a single attribute-- name-- that indicates the region name. The tag can be empty, or can contain Notes
elements that provide useful documentation.
Name of the region.
The diagram tag allows you to specify options for generating a diagram. If the tag is present, then it will be used to configure diagram display in the documentation widget (see ERDBPDocPage). the tag has the following attributes. It should not have any content; that is, it is not a container tag.
Width for the diagram, in pixels. The default is 750.
Height for the diagram, in pixels. The default is 800.
Ratio of shape height to width. The default is 0.62.
Width in pixels for each shape.
If set to 1, there will be a white background instead of an NMPDR noise background.
If set to 1, a dropdown box and buttons will appear that allow you to edit the diagram, download your changes, and make it pretty for printing.
Maximum font size to use, in points. The default is 16.
URL of the CGI script that downloads the diagram XML to the user's computer. The XML text will be sent via the data
parameter and the default file name via the name
parameter.
Margin between adjacent shapes, in pixels. The default is 10.
The DisplayInfo tag is used to describe how an entity, relationship, or shape should be displayed when the XML file is used to generate an interactive diagram. A DisplayInfo can have no elements, or it can have multiple Region elements inside. The permissible attributes are as follows.
URL to which the user should be sent when clicking on the shape. For entities and relationships, this defaults to the most likely location for the object description in the generated documentation.
The themes are black
, blue
, brown
, cyan
, gray
, green
, ivory
, navy
, purple
, red
, and violet
. These indicate the color to be used for the displayed object. The default is gray
.
The number of the column in which the object should be displayed. Fractional column numbers are legal, though it's best to round to a multiple of 0.5. Thus, a column of 4.5
would be centered between columns 4 and 5.
The number of the row in which the object should be displayed. Fractional row numbers are allowed in the same manner as for columns.
If 1
, the object is visibly connected by lines to the other objects identified in the from
and to
attributes. This value is ignored for entities, which never have from
or to
.
Caption to be displayed on the object. If omitted, it defaults to the object's name. You may use spaces and \n
codes to make the caption prettier.
If 1
, then the row
and col
attributes are used to position the object, even if it has from
and to
attributes. Otherwise, the object is placed in the midpoint between the from
and to
shapes.
For large diagrams, the DisplayInfo tag may have one or more RegionInfo elements inside, each belonging to one or more named regions. (The named regions are desribed by the Region tag.) The diagrammer will create a drop-down box that can be used to choose which region should be displayed. Each region tag has a name
attribute indicating the region to which it belongs, plus any of the attributes allowed on the DisplayInfo tag. The name indicates the name of a region in which the parent object should be displayed. The other attributes override the corresponding attributes in the DisplayInfo parent. An object with no Region tags present will be displayed in all regions. There is a default region with no name that consists only of objects displayed in all regions. An object with no DisplayInfo tag at all will not be displayed in any region.
By convention entity and relationship names use capital casing (e.g. Genome
or HasRegionsIn
. Most underlying databases, however, are aggressively case-insensitive with respect to relation names, converting them internally to all-upper case or all-lower case.
If syntax or parsing errors occur when you try to load or use an ERDB database, the most likely reason is that one of your objects has an SQL reserved word as its name. The list of SQL reserved words keeps increasing; however, most are unlikely to show up as a noun or declarative verb phrase. The exceptions are Group
, User
, Table
, Index
, Object
, Date
, Number
, Update
, Time
, Percent
, Memo
, Order
, and Sum
. This problem can crop up in field names as well.
Every entity has a field called id
that acts as its primary key. Every relationship has fields called from-link
and to-link
that contain copies of the relevant entity IDs. These are essentially ERDB's reserved words, and should not be used for user-defined field names.
Issues are comments displayed at the top of the database documentation. They have no effect on the database or the diagram. The Issue
tag is a text tag with no attributes.
An entity is described by the Entity tag. The entity can contain Notes and Asides, an optional DisplayInfo tag, an Indexes tag containing one or more secondary indexes, and a Fields tag containing one or more fields. The attributes of the Entity tag are as follows.
Name of the entity. The entity name, by convention, uses capital casing (e.g. Genome
or GroupBlock
) and should be a noun or noun phrase.
Data type of the primary key. The primary key is always named id
.
A value of 1
means that after the entity's primary relation is loaded, the ID field will be set to autonumber, so that new records inserted will have automatic keys generated. Use this option with care. Once the relation is loaded, it cannot be reloaded unless the table is first dropped and re-created. In addition, the key must be an integer type.
A relationship is described by the Relationship tag. Within a relationship, there can be DisplayInfo, Notes and Asides tags, a Fields tag containing the intersection data fields, a FromIndex tag containing the index used to cross the relationship in the forward direction, a ToIndex tag containing the index used to cross the relationship in reverse, and an Indexes
tag containing the alternate indexes.
The Relationship tag has the following attributes.
Name of the relationship. The relationship name, by convention, uses capital casing (e.g. ContainsRegionIn
or HasContig
), and should be a declarative verb phrase, designed to fit between the from-entity and the to-entity (e.g. Block ContainsRegionIn
Genome).
Name of the entity from which the relationship starts.
Name of the entity to which the relationship proceeds.
Relationship type: 1M
for one-to-many and MM
for many-to-many.
A name to be used when travelling backward through the relationship. This value can be used in place of the real relationship name to make queries more readable.
If TRUE (1
), then deletion of an entity instance on the from side will NOT cause deletion of the connected entity instances on the to side. All many-to-many relationships are automatically loose. A one-to-many relationship is generally not loose, but specifying this attribute can make it so.
Shapes are objects drawn on the database diagram that do not physically exist in the database. Entities are always drawn as rectangles and relationships are always drawn as diamonds, but a shape can be either of those, an arrow, a bidirectional arrow, or an oval. The Shape tag can contain Notes, Asides, and DisplayInfo tags, and has the following attributes.
Type of shape: arrow
for an arrow, biarrow
for a bidirectional arrow, oval
for an ellipse, diamond
for a diamond, and rectangle
for a rectangle.
Object from which this object is oriented. If the shape is an arrow, it will point toward the from-object.
Object toward which this object is oriented. If the shape is an arrow, it will point away from the to-object.
Name of the shape. This is used by other shapes to identify it in from
and to
directives.
my $database = ERDB->new($dbh, $metaFileName, %options);
Create a new ERDB object.
DBKernel database object for the target database.
Name of the XML file containing the metadata.
Hash of configuration options.
The supported configuration options are as follows. Options not in this list will be presumed to be relevant to the subclass and will be ignored.
If TRUE, the database will be configured for a forward-only cursor. Instead of caching the query results, the query results will be provided at the rate in which they are demanded by the client application. This is less stressful on memory and disk space, but means you cannot have more than one query active at the same time.
my @keywords = ERDB::SplitKeywords($keywordString);
This method returns a list of the positive keywords in the specified keyword string. All of the operators will have been stripped off, and if the keyword is preceded by a minus operator (-
), it will not be in the list returned. The idea here is to get a list of the keywords the user wants to see. The list will be processed to remove duplicates.
It is possible to create a string that confuses this method. For example
frog toad -frog
would return both frog
and toad
. If this is a problem we can deal with it later.
The keyword string to be parsed.
Returns a list of the words in the keyword string the user wants to see.
my $erdb = ERDB::GetDatabase($name, $dbd, %parms);
Return an ERDB object for the named database. It is assumed that the database name is also the name of a class for connecting to it.
Name of the desired database.
Alternate DBD file to use when processing the database definition.
Additional command-line parameters.
Returns an ERDB object for the named database.
my ($tableName, $fieldName) = ERDB::ParseFieldName($string, $defaultName);
or
my $normalizedName = ERDB::ParseFieldName($string, $defaultName);
Analyze a standard field name to separate the object name part from the field part.
Standard field name string to be parsed.
Default object name to be used if the object name is not specified in the input string.
In list context, returns the table name followed by the base field name. In scalar context, returns the field name in a normalized "Standard Field Name Format", with underscores converted to hyphens and an object name present. If the parse fails, will return an undefined value.
my $count = ERDB::CountParameterMarks($filterString);
Return the number of parameter marks in the specified filter string.
ERDB filter clause to examine.
Returns the number of parameter marks in the specified filter clause.
my $entityObject = $erdb->GetEntity($entityType, $ID);
Return an object describing the entity instance with a specified ID.
Entity type name.
ID of the desired entity.
Returns a ERDBObject object representing the desired entity instance, or an undefined value if no instance is found with the specified key.
my @values = $erdb->GetChoices($entityName, $fieldName);
Return a list of all the values for the specified field that are represented in the specified entity.
Note that if the field is not indexed, then this will be a very slow operation.
Name of an entity in the database.
Name of a field belonging to the entity in "Standard Field Name Format".
Returns a list of the distinct values for the specified field in the database.
my @values = $erdb->GetEntityValues($entityType, $ID, \@fields);
Return a list of values from a specified entity instance. If the entity instance does not exist, an empty list is returned.
Entity type name.
ID of the desired entity.
List of field names in "Standard_Field_Name_Format".
Returns a flattened list of the values of the specified fields for the specified entity.
my @list = $erdb->GetAll(\@objectNames, $filterClause, \@parameters, \@fields, $count);
Return a list of values taken from the objects returned by a query. The first three parameters correspond to the parameters of the "Get" method. The final parameter is a list of the fields desired from each record found by the query in "Standard Field Name Format". The default object name is the first one in the object name list.
The list returned will be a list of lists. Each element of the list will contain the values returned for the fields specified in the fourth parameter. If one of the fields specified returns multiple values, they are flattened in with the rest. For example, the following call will return a list of the features in a particular spreadsheet cell, and each feature will be represented by a list containing the feature ID followed by all of its essentiality determinations.
@query = $erdb->Get('ContainsFeature Feature'], "ContainsFeature(from-link) = ?", [$ssCellID], ['Feature(id)', 'Feature(essential)']);
List containing the names of the entity and relationship objects to be retrieved. See "Object Name List".
WHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. See "Filter Clause".
List of the parameters to be substituted in for the parameters marks in the filter clause. See "Parameter List".
List of the fields to be returned in each element of the list returned, or a string containing a space-delimited list of field names. The field names should be in "Standard Field Name Format".
Maximum number of records to return. If omitted or 0, all available records will be returned.
Returns a list of list references. Each element of the return list contains the values for the fields specified in the fields parameter.
my $found = $erdb->Exists($entityName, $entityID);
Return TRUE if an entity exists, else FALSE.
Name of the entity type (e.g. Feature
) relevant to the existence check.
ID of the entity instance whose existence is to be checked.
Returns TRUE if the entity instance exists, else FALSE.
my $count = $erdb->GetCount(\@objectNames, $filter, \@params);
Return the number of rows found by a specified query. This method would normally be used to count the records in a single table. For example,
my $count = $erdb->GetCount('Genome', 'Genome(genus-species) LIKE ?', ['homo %']);
would return the number of genomes for the genus homo. It is conceivable, however, to use it to return records based on a join. For example,
my $count = $erdb->GetCount('HasFeature Genome', 'Genome(genus-species) LIKE ?', ['homo %']);
would return the number of features for genomes in the genus homo. Note that only the rows from the first table are counted. If the above command were
my $count = $erdb->GetCount('Genome HasFeature', 'Genome(genus-species) LIKE ?', ['homo %']);
it would return the number of genomes, not the number of genome/feature pairs.
Reference to a list of the objects (entities and relationships) included in the query, or a string containing a space-delimited list of object names. See "ObjectNames".
A filter clause for restricting the query. See "Filter Clause".
Reference to a list of the parameter values to be substituted for the parameter marks in the filter. See "Parameter List".
Returns a count of the number of records in the first table that would satisfy the query.
my @dbObjects = $erdb->GetList(\@objectNames, $filterClause, \@params);
Return a list of ERDBObject objects for the specified query.
This method is essentially the same as "Get" except it returns a list of objects rather than a query object that can be used to get the results one record at a time. This is almost always preferable to "Get" when the result list is a manageable size.
Reference to a list containing the names of the entity and relationship objects to be retrieved, or a string containing a space-delimited list of object names. See "Object Name List".
WHERE clause (without the WHERE) to be used to filter and sort the query. See "Filter Clause".
Reference to a list of parameter values to be substituted into the filter clause. See "Parameter List".
Returns a list of ERDBObject objects that satisfy the query conditions.
my $query = $erdb->Get(\@objectNames, $filterClause, \@params);
This method returns a query object for entities of a specified type using a specified filter.
List containing the names of the entity and relationship objects to be retrieved, or a string containing a space-delimited list of names. See "Object Name List".
WHERE clause (without the WHERE) to be used to filter and sort the query. See "Filter Clause".
Reference to a list of parameter values to be substituted into the filter clause. See "Parameter List".
Returns an "ERDBQuery" object that can be used to iterate through all of the results.
my $query = $erdb->Prepare($objects, $filterString, $parms);
Prepare a query for execution but do not create a statement handle. This is useful if you have a query that you want to validate but you do not yet want to acquire the resources to run it.
List containing the names of the entity and relationship objects to be retrieved, or a string containing a space-delimited list of names. See "Object Name List".
WHERE clause (without the WHERE) to be used to filter and sort the query. See "Filter Clause".
Reference to a list of parameter values to be substituted into the filter clause. See "Parameter List".
Returns an ERDBQuery object that can be used to check field names or that can be populated with artificial data.
my $query = $erdb->Search($searchExpression, $idx, \@objectNames, $filterClause, \@params);
Perform a full text search with filtering. The search will be against a specified object in the object name list. That object will get an extra field containing the search relevance. Note that except for the search expression, the parameters of this method are the same as those for "Get" and follow the same rules.
Boolean search expression for the text fields of the target object. The default mode for a Boolean search expression is OR, but we want the default to be AND, so we will add a +
operator to each word with no other operator before it.
Name of the object to be searched in full-text mode. If the object name list is a list reference, you can also specify the index into the list.
List containing the names of the entity and relationship objects to be retrieved, or a string containing a space-delimited list of names. See "Object Name List".
WHERE clause (without the WHERE) to be used to filter and sort the query. See "Filter Clause".
Reference to a list of parameter values to be substituted into the filter clause. See "Parameter List".
Returns an ERDBQuery object for the specified search.
my @list = $erdb->GetFlat(\@objectNames, $filterClause, \@parameterList, $field);
This is a variation of "GetAll" that asks for only a single field per record and returns a single flattened list.
List containing the names of the entity and relationship objects to be retrieved, or a string containing a space-delimited list of names. See "Object_Name_List".
WHERE clause (without the WHERE) to be used to filter and sort the query. See "Filter Clause".
Reference to a list of parameter values to be substituted into the filter clause. See "Parameter List".
Name of the field to be used to get the elements of the list returned. The default object name for this context is the first object name specified.
Returns a list of values.
my $flag = $erdb->IsUsed($relationName);
Returns TRUE if the specified relation contains any records, else FALSE.
Name of the relation to check.
Returns the number of records in the relation, which will be TRUE if the relation is nonempty and FALSE otherwise.
my ($header, $rows) = ERDB::ComputeFieldTable($wiki, $name, $fieldData);
Generate the header and rows of a field table for an entity or relationship. The field table describes each field in the specified object.
WikiTools object (or equivalent) for rendering HTML or markup.
Name of the object whose field table is being generated.
Field structure of the specified entity or relationship.
Returns a reference to a list of the labels for the header row and a reference to a list of lists representing the table cells.
my $objectData = $erdb->FindEntity($name);
Return the structural descriptor of the specified entity, or an undefined value if the entity does not exist.
Name of the desired entity.
Returns the definition structure for the specified entity, or undef
if the named entity does not exist.
my $objectData = $erdb->FindRelationship($name);
Return the structural descriptor of the specified relationship, or an undefined value if the relationship does not exist.
Name of the desired relationship.
Returns the definition structure for the specified relationship, or undef
if the named relationship does not exist.
my $targetEntity = $erdb->ComputeTargetEntity($relationshipName);
Return the target entity of a relationship. If the relationship's true name is specified, this is the source (from) entity. If its converse name is specified, this is the target (to) entity. The returned name is the one expected to follow the relationship name in an object name string.
The name of the relationship to be used to identify the target entity.
Returns the name of the entity that would be found after crossing the relationship in the direction indicated by the chosen relationship name. If the relationship name is invalid, an undefined value will be returned.
my $objectData = $erdb->FindShape($name);
Return the structural descriptor of the specified shape, or an undefined value if the shape does not exist.
Name of the desired shape.
Returns the definition structure for the specified shape, or undef
if the named shape does not exist.
my $objectHash = $erdb->GetObjectsTable($type);
Return the metadata hash of objects of the specified type-- entity, relationship, or shape.
Type of object desired-- entity
, relationship
, or shape
.
Returns a reference to a hash containing all metadata for database objects of the specified type. The hash maps object names to object descriptors. The descriptors represent a cleaned and normalized version of the definition XML. Specifically, all of the implied defaults are filled in.
my $plural = ERDB::Plurals($singular);
Return the plural form of the specified object type (entity, relationship, or shape). This is extremely useful in generating documentation.
Singular form of the specified object type.
Plural form of the specified object type, in capital case.
my $rawMetaData = ERDB::ReadDBD($fileName);
This method reads a raw database definition XML file and returns it. Normally, the metadata used by the ERDB system has been processed and modified to make it easier to load and retrieve the data; however, this method can be used to get the data in its raw form.
Name of the XML file to read.
Returns a hash reference containing the raw XML data from the specified file.
my $type = $erdb->FieldType($string, $defaultName);
Return the ERDBType object for the specified field.
Field name string to be parsed. See "Standard Field Name Format".
Default object name to be used if the object name is not specified in the input string.
Return the type object for the field's type.
my $type = $erdb->IsSecondary($string, $defaultName);
Return TRUE if the specified field is in a secondary relation, else FALSE.
Field name string to be parsed. See "Standard Field Name Format".
Default object name to be used if the object name is not specified in the input string.
Returns TRUE if the specified field is in a secondary relation, else FALSE.
my $relData = $erdb->FindRelation($relationName);
Return the descriptor for the specified relation.
Name of the relation whose descriptor is to be returned.
Returns the object that describes the relation's indexes and fields.
my ($fromEntity, $toEntity) = $erdb->GetRelationshipEntities($relationshipName);
Return the names of the source and target entities for a relationship. If the specified name is not a relationship, an empty list is returned.
Name of the relevant relationship.
Returns a two-element list. The first element is the name of the relationship's from-entity, and the second is the name of the to-entity. If the specified name is not for a relationship, both elements are undefined.
my $okFlag = ERDB::ValidateFieldName($fieldName);
Return TRUE if the specified field name is valid, else FALSE. Valid field names must be hyphenated words subject to certain restrictions.
my $fieldHash = $self->GetFieldTable($objectnName);
Get the field structure for a specified entity or relationship.
Name of the desired entity or relationship.
The table containing the field descriptors for the specified object.
my $rowSize = $erdb->EstimateRowSize($relName);
Estimate the row size of the specified relation. The estimated row size is computed by adding up the average length for each data type.
Name of the relation whose estimated row size is desired.
Returns an estimate of the row size for the specified relation.
my $parms = $erdb->SortNeeded($relationName);
Return the pipe command for the sort that should be applied to the specified relation when creating the load file.
For example, if the load file should be sorted ascending by the first field, this method would return
sort -k1 -t"\t"
If the first field is numeric, the method would return
sort -k1n -t"\t"
Name of the relation to be examined. This could be an entity name, a relationship name, or the name of a secondary entity relation.
Returns the sort command to use for sorting the relation, suitable for piping.
my %specials = $erdb->SpecialFields($entityName);
Return a hash mapping special fields in the specified entity to the value of their special
attribute. This enables the subclass to get access to the special field attributes without needed to plumb the internal ERDB data structures.
Name of the entity whose special fields are desired.
Returns a hash. The keys of the hash are the special field names, and the values are the values from each special field's special
attribute.
my @names = $erdb->GetTableNames;
Return a list of the relations required to implement this database.
my @names = $erdb->GetEntityTypes;
Return a list of the entity type names.
my @list = $erdb->GetConnectingRelationships($entityName);
Return a list of the relationships connected to the specified entity.
Entity whose connected relationships are desired.
Returns a list of the relationships that originate from the entity. If the entity is on the from end, it will return the relationship name. If the entity is on the to end it will return the converse of the relationship name.
my ($froms, $tos) = $erdb->GetConnectingRelationshipData($entityName);
Return the relationship data for the specified entity. The return will be a two-element list, each element of the list a reference to a hash that maps relationship names to structures. The first hash will be relationships originating from the entity, and the second element a reference to a hash of relationships pointing to the entity.
Name of the entity of interest.
Returns a two-element list, each list being a map of relationship names to relationship metadata structures. The first element lists relationships originating from the entity, and the second element lists relationships that point to the entity.
my $types = ERDB::GetDataTypes();
Return a table of ERDB data types. The table returned is a hash of "ERDBType" objects keyed by type name.
my $markup = ERDB::ShowDataTypes($wiki, $erdb);
Display a table of all the valid data types for this installation.
An object used to render the table, similar to "WikiTools".
If specified, an ERDB object for a specific database. Only types used by the database will be put in the table. If omitted, all types are returned.
my $flag = $erdb->IsEntity($entityName);
Return TRUE if the parameter is an entity name, else FALSE.
Object name to be tested.
Returns TRUE if the specified string is an entity name, else FALSE.
my %fieldTuples = $erdb->GetSecondaryFields($entityName);
This method will return a list of the name and type of each of the secondary fields for a specified entity. Secondary fields are stored in two-column tables separate from the primary entity table. This enables the field to have no value or to have multiple values.
Name of the entity whose secondary fields are desired.
Returns a hash mapping the field names to their field types.
my $name = $erdb->GetFieldRelationName($objectName, $fieldName);
Return the name of the relation containing a specified field.
Name of the entity or relationship containing the field.
Name of the relevant field in that entity or relationship.
Returns the name of the database relation containing the field, or undef
if the field does not exist.
$erdb->DumpMetaData();
Return a dump of the metadata structure.
my @wikiLines = $erdb->GenerateWikiData($wiki);
Build a description of the database for a wiki. The database will be organized into a single page, with sections for each entity and relationship. The return value is a list of text lines.
The parameter must be an object that mimics the object-based interface of the "WikiTools" object. If it is omitted, "WikiTools" is used.
my @noteParagraphs = ERDB::ObjectNotes($objectData, $wiki);
Return a list of the notes and asides for an entity or relationship in Wiki format.
The metadata for the desired entity or relationship.
Wiki object used to render text.
Returns a list of text paragraphs in Wiki markup form.
my @errors = $erdb->CheckObjectNames($objectNameString);
Check an object name string for errors. The return value will be a list of error messages. If no error is found, an empty list will be returned. This process does not guarantee a correct object name list, but it catches the most obvious errors without the need for invoking a full-blown "Get" method.
An object name string, consisting of a space-delimited list of entity and relationship names.
Returns an empty list if successful, and a list of error messages if the list is invalid.
my $text = $erdb->GetTitle();
Return the title for this database.
my $hash = $erdb->GetDiagramOptions();
Return the diagram options structure for this database. The diagram options are used by the ERDB documentation widget to configure the database diagram. If the options are not present, an undefined value will be returned.
my $fileName = $erdb->GetMetaFileName();
Return the name of the database definition file for this database.
my $results = $erdb->LoadTable($fileName, $relationName, %options);
Load data from a tab-delimited file into a specified table, optionally re-creating the table first.
Name of the file from which the table data should be loaded.
Name of the relation to be loaded. This is the same as the table name.
A hash of load options.
Returns a statistical object containing a list of the error messages.
The permissible options are as follows.
If TRUE, then the table will be erased before loading.
Mode in which the load should operate, either low_priority
or concurrent
. This option is only applicable to a MySQL database.
If TRUE, then it is assumed that this is a partial load, and the table will not be analyzed and compacted at the end.
If TRUE, then when an error occurs, the process will be killed; otherwise, the process will stay alive, but a message will be put into the statistics object.
If ignore
, duplicate rows will be ignored. If replace
, duplicate rows will replace previous instances. If omitted, duplicate rows will cause an error.
my $newID = $erdb->InsertNew($entityName, %fields);
Insert a new entity into a table that uses sequential integer IDs. A new, unique ID will be computed automatically and returned to the caller.
Type of the entity being inserted. The entity must have an integer ID.
Hash of field names to field values. Every field in the entity's primary relation should be specified.
Returns the ID of the inserted entity.
$erdb->Analyze($tableName);
Analyze and compact a table in the database. This is useful after a load to improve the performance of the indexes.
Name of the table to be analyzed and compacted.
$erdb->TruncateTable($table);
Delete all rows from a table quickly. This uses the built-in SQL TRUNCATE
statement, which effectively drops and re-creates a table with all its settings intact.
Name of the table to be cleared.
my $newFlag = $erdb->VerifyTable($table, $indexFlag, $estimatedRows);
If the specified table does not exist, create it. This method will return TRUE if the table is created, else FALSE.
Name of the table to verify.
TRUE if the indexes for the relation should be created, else FALSE. If FALSE, "CreateIndexes" must be called later to bring the indexes into existence.
If specified, the estimated maximum number of rows for the relation. This information allows the creation of tables using storage engines that are faster but require size estimates, such as MyISAM.
Returns TRUE if the table was created, FALSE if it already existed in the database.
$erdb->CreateSearchIndex($objectName);
Check for a full-text search index on the specified entity or relationship object, and if one is required, rebuild it.
Name of the entity or relationship to be indexed.
$erdb->DropRelation($relationName);
Physically drop a relation from the database.
Name of the relation to drop. If it does not exist, this method will have no effect.
$erdb->DumpRelations($outputDirectory);
Write the contents of all the relations to tab-delimited files in the specified directory. Each file will have the same name as the relation dumped, with an extension of DTX.
Name of the directory into which the relation files should be dumped.
my $count = $erdb->DumpTable($tableName, $directory);
Dump the specified table to the named directory. This will create a load file having the same name as the relation with an extension of DTX. This file can then be used to reload the table at a later date. If the table does not exist, no action will be taken.
Name of the table to dump.
Name of the directory in which the dump file should be placed.
Returns the number of records written.
my $value = ERDB::TypeDefault($type);
Return the default value for fields of the specified type.
my $stats = $erdb->LoadTables($directoryName, $rebuild);
This method will load the database tables from a directory. The tables must already have been created in the database. (This can be done by calling "CreateTables".) The caller passes in a directory name; all of the relations to be loaded must have a file in the directory with the same name as the relation with a suffix of .dtx
. Each file must be a tab-delimited table of encoded field values. Each line of the file will be loaded as a row of the target relation table.
Name of the directory containing the relation files to be loaded.
TRUE if the tables should be dropped and rebuilt, else FALSE.
Returns a "Stats" object describing the number of records read and a list of the error messages.
$erdb->CreateTables();
This method creates the tables for the database from the metadata structure loaded by the constructor. It is expected this function will only be used on rare occasions, when the user needs to start with an empty database. Otherwise, the "LoadTables" method can be used by itself with the truncate flag turned on.
$erdb->CreateTable($tableName, $indexFlag, $estimatedRows);
Create the table for a relation and optionally create its indexes.
Name of the relation (which will also be the table name).
TRUE if the indexes for the relation should be created, else FALSE. If FALSE, "CreateIndexes" must be called later to bring the indexes into existence.
If specified, the estimated maximum number of rows for the relation. This information allows the creation of tables using storage engines that are faster but require size estimates, such as MyISAM.
my $fieldString = $erdb->ComputeFieldString($relationName);
Return the comma-delimited field definition string for a relation. This can be plugged directly into an SQL CREATE
statement.
Name of the relation whose field definition string is desired.
Returns a string listing SQL field definitions, in the proper order, separated by commas.
$erdb->VerifyFields($relName, \@fieldList);
Run through the list of proposed field values, insuring that all of them are valid.
Name of the relation for which the specified fields are destined.
Reference to a list, in order, of the fields to be put into the relation.
$erdb->DigestFields($relName, $fieldList);
Prepare the fields of a relation for output to a load file.
Name of the relation to which the fields belong.
List of field contents to be loaded into the relation.
my $coding = $erdb->EncodeField($fieldName, $value);
Convert the specified value to the proper format for storing in the specified database field. The field name should be specified in the standard object(field) format, e.g. Feature(id)
for the id
field of the Feature
table.
Name of the field, specified in as an object name with the field name in parentheses.
Value to encode for placement in the field.
Coded value ready to put in the database. In most cases, this will be identical to the original input.
my $coding = ERDB::encode($type, $value);
Encode a value of the specified type for storage in the database or for use as a query parameter. Encoding is automatic for all ERDB methods except when loading a table from a user-supplied load file or when processing the parameters for a query filter string. This method can be used in those situations to remedy the lack.
Name of the incoming value's data type.
Value to encode into a string.
Returns the encoded value.
my $value = $erdb->DecodeField($fieldName, $coding);
Convert the stored coding of the specified field to the proper format for use by the client program. This is essentially the inverse of "EncodeField".
Name of the field, specified as an object name with the field name in parentheses.
Coded data from the database.
Returns the original form of the coded data.
my $digested = ERDB::DigestKey($longString);
Return the digested value of a string. The digested value is a fixed length (22 characters) MD5 checksum. It can be used as a more convenient version of a symbolic key.
String to digest.
Digested value of the string.
$erdb->CreateIndex($relationName);
Create the indexes for a relation. If a table is being loaded from a large source file (as is the case in "LoadTable"), it is sometimes best to create the indexes after the load. If that is the case, then "CreateTable" should be called with the index flag set to FALSE, and this method used after the load to create the indexes for the table.
$erdb->SetTestEnvironment();
Denote that this is a test environment. Certain performance-enhancing features may be disabled in a test environment.
my $dbName = $erdb->dbName();
Return the physical name of the database currently attached to this object.
my $stats = $erdb->FixEntity($name);
This method scans an entity and insures that all of the instances connect to an owning relationship instance. Any entity that does not connect will be deleted.
Name of the one-to-many relationship that owns the entity.
Returns a Stats object describing the scan results.
my $stats = $erdb->FixRelationship($name, $testOnly);
This method scans a relationship and insures that all of the instances connect to valid entities on both sides. If any instance fails to connect, it will be deleted. The process is fairly memory-intensive.
Name of the relationship to scan.
If TRUE, then statistics will be accumulated but no deletions will be performed.
Returns a Stats object describing the scan results.
my $stats = $erdb->CleanRelationship($relName, @fields);
Remove duplicate rows from a relationship. A row is duplicate if the from- and to-links match and the zero or more specified additional fields also match.
Name of the relationship to clean.
List of additional fields in the relationship to be used to determine whether or not we have a duplicate row. The fields must be scalars and not that they cannot
Returns a Stats object describing what happened during the cleanup.
$erdb->BeginTran();
Start a database transaction.
$erdb->CommitTran();
Commit an active database transaction.
$erdb->RollbackTran();
Roll back an active database transaction.
my $count = $erdb->UpdateField($fieldName, $oldValue, $newValue, $filter, $parms);
Update all occurrences of a specific field value to a new value. The number of rows changed will be returned.
Name of the field in "Standard Field Name Format".
Value to be modified. All occurrences of this value in the named field will be replaced by the new value.
New value to be substituted for the old value when it's found.
A standard ERDB filter clause. See "Filter Clause". The filter will be applied before any substitutions take place. Note that the filter clause in this case must only specify fields in the table containing fields.
Reference to a list of parameter values in the filter. See "Parameter List".
Returns the number of rows modified.
$erdb->InsertValue($entityID, $fieldName, $value);
This method will insert a new value into the database. The value must be one associated with a secondary relation, since primary values cannot be inserted: they occur exactly once. Secondary values, on the other hand, can be missing or multiply-occurring.
ID of the object that is to receive the new value.
Field name for the new value in "Standard Field Name Format". This specifies the entity name and the field name in a single string.
New value to be put in the field.
$erdb->InsertObject($objectType, %fieldHash); or $erdb->InsertObject($objectType, \%fieldHash, %options);
Insert an object into the database. The object is defined by a type name and then a hash of field names to values. All field values should be represented by scalars. (Note that for relationships, the primary relation is the only relation.) Field values for the other relations comprising the entity are always list references. For example, the following line inserts an inactive PEG feature named fig|188.1.peg.1
with aliases ZP_00210270.1
and gi|46206278
.
$erdb->InsertObject('Feature', id => 'fig|188.1.peg.1', active => 0, feature-type => 'peg', alias => ['ZP_00210270.1', 'gi|46206278']);
The next statement inserts a HasProperty
relationship between feature fig|158879.1.peg.1
and property 4
with an evidence URL of http://seedu.uchicago.edu/query.cgi?article_id=142
.
$erdb->InsertObject('HasProperty', 'from-link' => 'fig|158879.1.peg.1', 'to-link' => 4, evidence => 'http://seedu.uchicago.edu/query.cgi?article_id=142');
Type name of the object to insert.
Hash of field names to values. The field names should be specified in "Standard Field Name Format". The default object name is the name of the object being inserted. The values will be encoded for storage by this method. Note that this can be an inline hash (for backward compatibility) or a hash reference.
Hash of insert options. The current list of options is
If TRUE, then duplicate-record errors will be suppressed. If the record already exists, the insert will not take place.
If specified, then duplicate-record errors will be suppressed. If ignore
is specified, duplicate records will be discarded. If replace
is specified, duplicate records will replace the previous version.
If TRUE, the fields are presumed to be already encoded for loading.
$erdb->UpdateEntity($entityName, $id, %fields);
or
my $ok = $erdb->UpdateEntity($entityName, $id, \%fields, $optional);
Update the values of an entity. This is an unprotected update, so it should only be done if the database resides on a database server.
Name of the entity to update. (This is the entity type.)
ID of the entity to update. If no entity exists with this ID, an error will be thrown.
Hash mapping field names to their new values. All of the fields named must be in the entity's primary relation, and they cannot any of them be the ID field. Field names should be in the "Standard Field Name Format". The default object name in this case is the entity name.
If specified and TRUE, then the update is optional and will return TRUE if successful and FALSE if the entity instance was not found. If this parameter is present, fields must be a hash reference and not a raw hash.
my $changeCount = $erdb->Reconnect($relName, $linkType, $oldID, $newID);
Move a relationship so it points to a new entity instance. All instances that reference a specified ID will be updated to specify a new ID.
Name of the relationship to update.
from
to update the from-link. to
to update the to-link.
Old ID value to be changed.
New ID value to be substituted for the old one.
Returns the number of rows updated.
my $stats = $erdb->MoveEntity($entityName, $oldID, $newID);
Transfer all relationship records pointing to a specified entity instance so they point to a different entity instance. This requires calling "Reconnect" on all the relationships that connect to the entity.
Name of the relevant entity type.
ID of the obsolete entity instance. All relationship records containing this ID will be changed.
ID of the new entity instance. The relationship records containing the old ID will have this ID substituted for it.
Returns a Stats object describing the updates.
my $stats = $erdb->Delete($entityName, $objectID, %options);
Delete an entity instance from the database. The instance is deleted along with all entity and relationship instances dependent on it. The definition of dependence is recursive.
An object is always dependent on itself. An object is dependent if it is a 1-to-many or many-to-many relationship connected to a dependent entity or if it is the "to" entity connected to a 1-to-many dependent relationship.
The idea here is to delete an entity and everything related to it. Because this is so dangerous, and option is provided to simply trace the resulting delete calls so you can verify the action before performing the delete.
Name of the entity type for the instance being deleted.
ID of the entity instance to be deleted.
A hash detailing the options for this delete operation.
Returns a statistics object indicating how many records of each particular table were deleted.
The permissible options for this method are as follows.
If TRUE, then the delete statements will be traced, but no changes will be made to the database. If dump
, then the data is dumped to load files instead of being traced.
If TRUE, then the entity instances will not be deleted, only the dependent records.
If TRUE, then all of the DELETE statements will be written to the standard output.
If TRUE, then the entity instance will be deleted, but none of the attached data will be removed (the opposite of keepRoot
).
my $count = $erdb->Disconnect($relationshipName, $originEntityName, $originEntityID);
Disconnect an entity instance from all the objects to which it is related via a specific relationship. This will delete each relationship instance that connects to the specified entity.
Name of the relationship whose instances are to be deleted.
Name of the entity that is to be disconnected.
ID of the entity that is to be disconnected.
Returns the number of rows deleted.
$erdb->DeleteRow($relationshipName, $fromLink, $toLink, \%values);
Delete a row from a relationship. In most cases, only the from-link and to-link are needed; however, for relationships with intersection data values can be specified for the other fields using a hash.
Name of the relationship from which the row is to be deleted.
ID of the entity instance in the From direction.
ID of the entity instance in the To direction.
Reference to a hash of other values to be used for filtering the delete.
my $deleteCount = $erdb->DeleteLike($relName, $filter, \@parms);
Delete all the relationship rows that satisfy a particular filter condition. Unlike a normal filter, only fields from the relationship itself can be used.
Name of the relationship whose records are to be deleted.
A filter clause for the delete query. See "Filter Clause".
Reference to a list of parameters for the filter clause. See "Parameter List".
Returns a count of the number of rows deleted.
my $numDeleted = $erdb->DeleteValue($entityName, $id, $fieldName, $fieldValue);
Delete secondary field values from the database. This method can be used to delete all values of a specified field for a particular entity instance, or only a single value.
Secondary fields are stored in two-column relations separate from an entity's primary table, and as a result a secondary field can legitimately have no value or multiple values. Therefore, it makes sense to talk about deleting secondary fields where it would not make sense for primary fields.
ID of the entity instance to be processed. If the instance is not found, this method will have no effect. If undef
is specified, all values for all of the entity instances will be deleted.
Name of the field whose values are to be deleted, in "Standard Field Name Format".
Value to be deleted. If not specified, then all values of the specified field will be deleted for the entity instance. If specified, then only the values which match this parameter will be deleted.
Returns the number of rows deleted.
my @attrNames = $sprout->GetUsefulCrossValues($sourceEntity, $relationship);
Return a list of the useful attributes that would be returned by a Cross call from an entity of the source entity type through the specified relationship. This means it will return the fields of the target entity type and the intersection data fields in the relationship. Only primary table fields are returned. In other words, the field names returned will be for fields where there is always one and only one value.
Name of the entity from which the relationship crossing will start.
Name of the relationship being crossed.
Returns a list of field names in "Standard Field Name Format".
my $colIndex = ERDB::FindColumn($headerLine, $columnIdentifier);
Return the location a desired column in a data mining header line. The data mining header line is a tab-separated list of column names. The column identifier is either the numerical index of a column or the actual column name.
The header line from a data mining command, which consists of a tab-separated list of column names.
Either the ordinal number of the desired column (1-based), or the name of the desired column.
Returns the array index (0-based) of the desired column.
my @columns = ERDB::ParseColumns($line);
Convert the specified data line to a list of columns.
A data mining input, consisting of a tab-separated list of columns terminated by a new-line.
Returns a list consisting of the column values.
my $cleanedString = $erdb->CleanKeywords($searchExpression);
Clean up a search expression or keyword list. This is a virtual method that may be overridden by the subclass. The base-class method removes extra spaces and converts everything to lower case.
Search expression or keyword list to clean. Note that a search expression may contain boolean operators which need to be preserved. This includes leading minus signs.
Cleaned expression or keyword list.
my $source = $erdb->GetSourceObject();
Return the object to be used in creating load files for this database. This is only the default source object. Loaders have the option of overriding the chosen source object when constructing the "ERDBLoadGroup" objects.
my @sections = $erdb->SectionList();
Return a list of the names for the different data sections used when loading this database. The default is a single string, in which case there is only one section representing the entire database.
my $flag = $sap->GlobalSection($name);
Return TRUE if the specified section name is the global section, FALSE otherwise.
my $name = $erdb->PreferredName();
Return the variable name to use for this database when generating code. The default is erdb
.
my $groupLoader = $erdb->Loader($groupName, $options);
Return an "ERDBLoadGroup" object for the specified load group. This method is used by ERDBGenerator.pl to create the load group objects. If you are not using ERDBGenerator.pl, you don't need to override this method.
Name of the load group whose object is to be returned. The group name is guaranteed to be a single word with only the first letter capitalized.
Reference to a hash of command-line options.
Returns an "ERDBLoadGroup" object that can be used to process the specified load group for this database.
my @groups = $erdb->LoadGroupList();
Returns a list of the names for this database's load groups. This method is used by ERDBGenerator.pl when the user wishes to load all table groups. The default is a single group called 'All' that loads everything.
my $dirName = $erdb->LoadDirectory();
Return the name of the directory in which load files are kept. The default is the FIG temporary directory, which is a really bad choice, but it's always there.
$erdb->Cleanup();
Clean up data structures. This method is called at the end of each section when loading the database. The subclass can use it to free up memory that may have accumulated due to caching or accumulation of hash structures. The default method does nothing.
my $flag = $erdb->UseInternalDBD();
Return TRUE if this database should be allowed to use an internal DBD. The internal DBD is stored in the _metadata
table, which is created when the database is loaded. The default is TRUE.
my $fieldString = $erdb->_FieldString($descriptor);
Compute the definition string for a particular field from its descriptor in the relation table.
Field descriptor containing the field's name and type.
Returns the SQL declaration string for the field.
my $typeString = $erdb->_TypeString($descriptor);
Determine the SQL type corresponding to a field from its descriptor in the relation table.
Field descriptor containing the field's name and type.
Returns the SQL type string for the field.
my $defaultValue = $self->_Default($objectName, $fieldName);
Return the default value for the specified field in the specified object. If no default value is specified, an undefined value will be returned.
Name of the object containing the field.
Name of the field whose default value is desired.
Returns the default value for the specified field, or an undefined value if no default is available.
my %fixedHash = $self->_SingleTableHash($fieldHash, $objectName, $unchanged);
Convert a hash of field names in "Standard Field Name Format" to field values into a hash of simple field names to encoded values. This is a common utility function performed by most update-related methods.
A hash mapping field names to values. The field names must be in "Standard Field Name Format".
The default object name to be used when no object name is specified for the field.
If TRUE, the field values will not be encoded for storage. (It is presumed they already are.) The default is FALSE.
Returns a hash of simple field names to encoded values for those fields.
my $fieldData = $erdb->_FindField($string, $defaultName);
Return the descriptor for the named field. If the field does not exist or the name is invalid, an error will occur.
Field name string to be parsed. See "Standard Field Name Format".
Default object name to be used if the object name is not specified in the input string.
Returns the descriptor for the specified field.
my $descriptor = $erdb->_CheckField($objectName, $fieldName);
Return the descriptor for the specified field in the specified entity or relationship, or an undefined value if the field does not exist.
Name of the relevant entity or relationship. If the object does not exist, an error will be thrown.
Name of the relevant field.
Returns the field descriptor from the metadata, or undef
if the field does not exist.
my @relationMap = _RelationMap($mappedNameHashRef, $mappedNameListRef);
Create the relation map for an SQL query. The relation map is used by "ERDBObject" to determine how to interpret the results of the query.
Reference to a hash that maps object name aliases to real object names.
Reference to a list of object name aliases in the order they appear in the SELECT list.
Returns a list of 3-tuples. Each tuple consists of an object name alias followed by the actual name of that object and a flag that is TRUE if the alias is a converse. This enables the "ERDBObject" to determine the order of the tables in the query and which object name belongs to each object alias name. Most of the time the object name and the alias name are the same; however, if an object occurs multiple times in the object name list, the second and subsequent occurrences may be given a numeric suffix to indicate it's a different instance. In addition, some relationship names may be specified using their converse name.
my ($suffix, $nameList, $nameHash) = $erdb->_SetupSQL($objectNames, $filterClause, $matchClause);
Process a list of object names and a filter clause so that they can be used to build an SQL statement. This method takes in an object name list and a filter clause. It will return a corrected filter clause, a list of mapped names and the mapped name hash.
This is an instance method.
Object name list from a query. See "Object Name List".
A string containing the WHERE clause for the query (without the WHERE
) and also optionally the ORDER BY
and LIMIT
clauses. See "Filter Clause".
An optional full-text search clause. If specified, it will be inserted at the front of the WHERE clause. It should already be SQL-formatted; that is, the field names should be in the form table.
fieldName.
Returns a three-element list. The first element is the SQL statement suffix, beginning with the FROM clause. The second element is a reference to a list of the names to be used in retrieving the fields. The third element is a hash mapping the names to 2-tuples consisting of the real name of the object and a flag indicating whether or not the mapping is via a converse relationship name.
my $sth = $erdb->_GetStatementHandle($command, $params);
This method will prepare and execute an SQL query, returning the statement handle. The main reason for doing this here is so that everybody who does SQL queries gets the benefit of tracing.
Command to prepare and execute.
Reference to a list of the values to be substituted in for the parameter marks.
Returns a prepared and executed statement handle from which the caller can extract results.
my $stats = ERDB::_GetLoadStats();
Return a blank statistics object for use by the load methods.
my $count = $erdb->_DumpRelation($outputDirectory, $relationName);
Dump the specified relation to the specified output file in tab-delimited format.
Directory to contain the output file.
Name of the relation to dump.
Returns the number of records dumped.
my $objectData = $self->_GetStructure($objectName);
Get the data structure for a specified entity or relationship.
Name of the desired entity or relationship.
The descriptor for the specified object.
my $relHash = $erdb->_GetRelationTable($objectName);
Get the list of relations for a specified entity or relationship.
Name of the desired entity or relationship.
A table containing the relations for the specified object.
$erdb->ValidateFieldNames($metadata);
Determine whether or not the field names in the specified metadata structure are valid. If there is an error, this method will abort.
Metadata structure loaded from the XML data definition.
my $stats = $erdb->_LoadRelation($directoryName, $relationName, $rebuild);
Load a relation from the data in a tab-delimited disk file. The load will only take place if a disk file with the same name as the relation exists in the specified directory.
DBKernel object for accessing the database.
Name of the directory containing the tab-delimited data files.
Name of the relation to load.
TRUE if the table should be dropped and re-created before loading.
Returns a statistical object describing the number of records read and a list of error messages.
my $metadata = ERDB::_LoadMetaData($self, $filename, $external);
This method loads the data describing this database from an XML file into a metadata structure. The resulting structure is a set of nested hash tables containing all the information needed to load or use the database. The schema for the XML file is ERDatabase.xml.
Blessed ERDB object.
Name of the file containing the database definition.
If TRUE, then the internal DBD stored in the database (if any) will be bypassed. This option is usually used by the load-related command-line utilities.
Returns a structure describing the database.
ERDB::_CreateRelationshipIndex($indexKey, $relationshipName, $relationshipStructure);
Create an index for a relationship's relation.
Type of index: either "From"
or "To"
.
Name of the relationship.
Structure describing the relationship that the index will sort.
ERDB::_ProcessIndexes($indexList, $relation);
Build the data structures for the specified indexes in the specified relation.
Reference to a list of indexes. Each index is a hash reference containing an optional Notes
value that describes the index and an IndexFields
value that is a reference to a list of index field structures. An index field structure, in turn, is a reference to a hash that contains a name
attribute for the field name and an order
attribute that specifies either ascending
or descending
. In this sense the index list encapsulates the XML Indexes
structure in the database definition.
The structure that describes the current relation. The new index descriptors will be stored in the structure's Indexes
member. Any previous data in the structure will be lost.
ERDB::_AddIndex($indexName, $relationStructure);
Add an index to a relation structure.
This is a static method.
Name to give to the new index.
Relation structure to which the new index should be added.
New index to add.
ERDB::_FixupFields($structure, $defaultRelationName);
This method fixes the field list for the metadata of an entity or relationship. It will add the caller-specified relation name to fields that do not have a name and set the PrettySort
values.
Entity or relationship structure to be fixed up.
Default relation name to be added to the fields.
my $fixedName = ERDB::_FixName($fieldName, $converse);
Fix the incoming field name so that it is a legal SQL column name.
Field name to fix.
If TRUE, then "from" and "to" will be exchanged.
Returns the fixed-up field name.
my @fixedNames = ERDB::_FixNames(@fields);
Fix all the field names in a list. This is essentially a batch call to "_FixName".
List of field names to fix.
Returns a list of fixed-up versions of the incoming field names.
ERDB::_AddField($structure, $fieldName, $fieldData);
Add a field to a field list.
Structure (usually an entity or relationship) that is to contain the field.
Name of the new field.
Structure containing the data to put in the field.
my \@fieldList = ERDB::_ReOrderRelationTable(\%relationTable);
This method will take a relation table and re-sort it according to the implicit ordering of the PrettySort
property. Instead of a hash based on field names, it will return a list of fields. This requires creating a new hash that contains the field name in the name
property but doesn't have the PrettySort
property, and then inserting that new hash into the field list.
This is a static method.
Relation hash to be reformatted into a list.
A list of field hashes.
my $flag = $erdb->_IsPrimary($relationName);
Return TRUE if a specified relation is a primary relation, else FALSE. A relation is primary if it has the same name as an entity or relationship.
Name of the relevant relation.
Returns TRUE for a primary relation, else FALSE.
my $joinClause = $erdb->_JoinClause($source, $target);
Create a join clause that connects the source object to the target object. If we are crossing from an entity to a relationship, we key off the relationship's from-link. If we are crossing from a relationship to an entity, we key off of it's to-link. It is also possible to cross from relationship to relationship if the two have an entity in common. Finally, we must be aware of converse names for relationships, and for nonrecursive relationships we allow crossing via the wrong link.
Name of the object from which we are starting.
Name of the object to which we are proceeding.
Returns a string that may be used in an SQL WHERE in order to connect the two objects. If no connection is possible, an undefined value will be returned.
my $joinString = $erdb->_BuildJoin($fld1 => $source, $sourceData, $fld2 => $target, $targetData);
Create a join string between the two objects. The second object must be a relationship; the first can be an entity or a relationship. The fields indicators specify the nature of the connection: id
for an entity connection, from
for the front of a relationship, and to
for the back of a relationship. The theory is that if everything is compatible, you just connect the indicated fields in the two objects. This may not be possible if the second relationship does not match the first object in the proper manner. If that is the case, attempts will be made to find a workable connection.
Join direction for the first object: id
if it's an entity, from
if it's a relationship and we're coming out the front, or to
if it's a relationship and we're coming out the end.
Name to use for the first object in constructing the field reference.
Entity or relationship descriptor for the first object.
Join direction for the second object: from
if it's a relationship and we're going in the front, or to
if it's a relationship and we're going in the end.
Name to use for the second object in constructing the field reference.
Relationship descriptor for the second object.
Returns a string that can be used in an SQL WHERE clause to connect the two objects, or undef
if no connection is possible.
my $realName = $erdb->_Resolve($objectName);
Determine the real object name for a name from an object name list. Trailing numbers are peeled off, and the alias table is checked. If the incoming name is invalid, the return value will be undefined.
Incoming object name to parse.
Returns the object's real name, or undef
if the name is invalid.
$erdb->InternalizeDBD();
Save the DBD metadata into the database so that it can be retrieved in the future.
my $objectData = $erdb->_FindObject($list => $name);
Return the structural descriptor of the specified object (entity, relationship, or shape), or an undefined value if the object does not exist.
Name of the list containing the desired type of object (Entities
, Relationships
, or Shapes
).
Name of the desired object.
Returns the object descriptor if found, or undef
if the object does not exist or is not of the proper type.
my $wikiText = ERDB::_WikiNote($dataString, $wiki);
Convert a note or comment to Wiki text by replacing some bulletin-board codes with HTML. The codes supported are [b]
for bold, [i]
for italics, [link]
for links, [list]
for bullet lists. and [p]
for a new paragraph. All the codes are closed by slash-codes. So, for example, [b]Feature[/b]
displays the string Feature
in boldface.
String to convert to Wiki text.
Wiki object used to format the text.
An Wiki text string derived from the input string.
my $text = ERDB::_ComputeRelationshipSentence($wiki, $relationshipName, $relationshipStructure, $dir);
The relationship sentence consists of the relationship name between the names of the two related entities and an arity indicator.
WikiTools object for rendering links. If this parameter is undefined, no link will be put in place.
Name of the relationship.
Relationship structure containing the relationship's description and properties.
Starting point of the relationship: from
(default) or to
.
Returns a string containing the entity names on either side of the relationship name and an indicator of the arity.
my $tableMarkup = _WikiObjectTable($name, $fieldStructure, $wiki);
Generate the field table for the named entity or relationship.
Name of the object whose field table is being generated.
Field structure for the object. This is a hash mapping field names to field data.
WikiTools object (or equivalent) for rendering HTML.
Returns the markup for a table of field information.