•   almost 10 years ago

Akoma Ntoso Metadata Clarification Questions

Good day,

I am seeking some clarification regarding the metadata:

1. Regarding collecting accurate metadata and converting dates written in words to the correct year: in [1] the date is specified as "the third day of January, two thousand and twelve." Please confirm whether the application will need to handle documents with dates from the 1900s. I would also like to confirm what format would be used for a document dated 2021. Would it be "two thousand and twenty one" or "two thousand and twenty-one"?
2. Regarding the author metadata, should the author be determined from the document, or is it fair to assume the author is always Congress?

Thanks for posting this challenge and looking forward to your response!

[1] http://www.gpo.gov/fdsys/pkg/BILLS-112hr4310enr/html/BILLS-112hr4310enr.htm

  • 1 comment

  • Manager   •   almost 10 years ago

    1. In response to your first question about dates:

    The Government Printing Office makes available all published versions of bills, including text versions, from the 103rd Congress (1993-1994) forward. We anticipate that the application should be able to handle these documents, if at all possible.

    The text your question is referencing, "Begun and held at the City of Washington on Tuesday, the third day of January, two thousand and twelve" is text that appears on every enrolled measure for a particular session of Congress. It is the initial convening date and location of the first or second session of a Congress. (A session is the annual series of meetings of a Congress. There are two sessions per Congress. Under the Constitution, Congress must meet at least once a year at noon on January 3 unless it appoints a different day by law.)

    For the session that will be held in 2021, we presume that the year will be written "two thousand and twenty-one". This would follow the pattern used for this line of text in the 1990s, as seen in H.R. 6 from the 105th Congress.
    Of related interest, the convening date for the 105th Congress, 2d session was January 27th. Congress appointed this day by law with Public Law 105-140 (http://www.gpo.gov/fdsys/pkg/PLAW-105publ140/html/PLAW-105publ140.htm).

    When expressing such text in Akoma Ntoso, particular care should be given to the fundamental distinction between content (i.e., the text as provided by the relevant authority) and the markup (i.e., the additional specification expressed through the use of XML markup and the Akoma Ntoso vocabulary). This is connected with the distinction between FRBR Expression and FRBR Manifestation, as explained in the answer to the second question below.

    The Akoma Ntoso XML markup does not affect the original language of the document, which remains exactly as specified by the relevant authority. However, Akoma Ntoso does provide a number of content-specific and metadata-specific elements which make it possible to unambiguously express any date specified in the content or that can be inferred from the context. These elements (e.g. docDate, date, publication, etc.) use an attribute @date for the normalization of the temporal information using the xsd:date or xsd:dateTime lexical representation (namely, in the form yyyy-mm-dd or yyyy-mm-ddThh:mm:sszzzzzz) (see: http://www.w3.org/TR/xmlschema-2/). So "third day of January, two thousand and twelve" appears in the Akoma Ntoso XML markup as follows:

    <docDate date="2012-01-03">third day of January, two thousand and twelve.</docDate>

    The original content within the tags is not modified from the original published version of the text.
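
    As an illustration only, the normalization described above could be sketched in Python as follows. The function names (words_to_number, parse_spelled_date, to_doc_date) are hypothetical and not part of any Akoma Ntoso tooling, and the sketch assumes the date follows the "... day of <month>, <year in words>" pattern seen on enrolled bills. It accepts both the hyphenated and unhyphenated year spellings as well as 1900s-style years.

```python
import re
from datetime import date
from xml.sax.saxutils import escape

# Hypothetical lookup tables; list index == numeric value.
CARDINALS = ("zero one two three four five six seven eight nine ten eleven twelve "
             "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
ORDINALS = ("zeroth first second third fourth fifth sixth seventh eighth ninth tenth "
            "eleventh twelfth thirteenth fourteenth fifteenth sixteenth seventeenth "
            "eighteenth nineteenth").split()
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50, "sixty": 60,
        "seventy": 70, "eighty": 80, "ninety": 90, "twentieth": 20, "thirtieth": 30}
MONTHS = ("january february march april may june july august september "
          "october november december").split()

VALUES = {word: i for i, word in enumerate(CARDINALS)}
VALUES.update({word: i for i, word in enumerate(ORDINALS)})
VALUES.update(TENS)

# Assumed pattern: "[the] <day words> day of <month>, <year words>".
DATE_RE = re.compile(
    r"\b(?:the\s+)?([a-z]+(?:[\s-][a-z]+)?)\s+day\s+of\s+([a-z]+),\s+([a-z\s-]+)",
    re.IGNORECASE,
)


def words_to_number(text):
    """Convert number words to an int, stopping at the first unknown word."""
    total = current = 0
    for word in re.split(r"[\s-]+", text.lower().strip()):
        if not word or word == "and":
            continue
        if word in VALUES:
            current += VALUES[word]
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total, current = total + current * 1000, 0
        else:
            break
    return total + current


def parse_spelled_date(text):
    """Return a datetime.date for the first spelled-out date found in text."""
    match = DATE_RE.search(text)
    if match is None:
        raise ValueError("no spelled-out date found")
    day_words, month_word, year_words = match.groups()
    month = MONTHS.index(month_word.lower()) + 1  # ValueError if not a month name
    return date(words_to_number(year_words), month, words_to_number(day_words))


def to_doc_date(text):
    """Wrap the unmodified original text in a docDate element with a normalized @date."""
    return f'<docDate date="{parse_spelled_date(text).isoformat()}">{escape(text)}</docDate>'
```

    Calling to_doc_date("third day of January, two thousand and twelve.") yields the docDate element shown above, with the original wording left untouched inside the tags.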

    2. In response to your second question about author metadata:

    There are actually several authors involved in the specification of a bill, and all of them need to be listed separately in the FRBR metadata blocks of Akoma Ntoso. FRBR (http://www.ifla.org/publications/functional-requirements-for-bibliographic-records) distinguishes the Work from its Expressions and from their Manifestations. The Work, Expression and Manifestation may have (and very often do have) different authors. The distinction between these FRBR levels is fundamental to understanding the role of the various actors in determining the parts of an Akoma Ntoso document. In particular, when referring to bills:

    • The author of the Work is the body that created the concept of the document and started the (possibly long) list of versions that constitute the history of the document. This is typically the chamber in which the bill originated, and thus the author needs to be specified accordingly.
    • The author of the Expression is the person/organization responsible for the actual wording of the current version of the bill. In the official lifecycle of a bill, each subsequent version is created as an official output of a process of a chamber or one of its committees, and as such the chamber and/or the specific committee must be specified as the author.
    • The author of the Manifestation is the person who creates the actual XML markup that is being provided, i.e. the individual or organization that is creating the XML document.
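
    A minimal sketch of how the three authors might be recorded separately, using Python's standard xml.etree.ElementTree. The helper name build_identification and the identifier values (#house, #houseCommittee, #markupEditor, #editor, #author) are assumptions made for illustration. A complete Akoma Ntoso identification block also carries other elements (for example FRBRthis, FRBRuri and FRBRdate) and the Akoma Ntoso namespace, which are omitted here.

```python
import xml.etree.ElementTree as ET


def build_identification(work_author, expression_author, manifestation_author,
                         source="#editor"):
    """Build a partial identification block with one FRBRauthor per FRBR level.

    All identifier values are hypothetical references to entries that would be
    declared elsewhere in the document's metadata.
    """
    ident = ET.Element("identification", {"source": source})
    levels = (
        ("FRBRWork", work_author),                    # e.g. the originating chamber
        ("FRBRExpression", expression_author),        # chamber/committee behind this version
        ("FRBRManifestation", manifestation_author),  # whoever produced the XML markup
    )
    for level_name, author_ref in levels:
        level = ET.SubElement(ident, level_name)
        ET.SubElement(level, "FRBRauthor", {"href": author_ref, "as": "#author"})
    return ident


identification = build_identification("#house", "#houseCommittee", "#markupEditor")
xml_text = ET.tostring(identification, encoding="unicode")
```

    Keeping one FRBRauthor per level makes it explicit that the chamber, the committee responsible for a given version, and the producer of the XML file can all differ.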
