According to PMI-CPMAI, before implementing sophisticated platforms (such as catalogs or warehouses), AI initiatives must begin with foundational work on data discovery and inventory. For an NLP system analyzing public comments on regulations, the framework stresses that teams must first “identify, locate, and characterize all relevant data sources, owners, formats, access paths, and constraints,” and ensure this information is documented in a consistent, accessible way. This is commonly described as a data inventory or data source audit, where the team systematically lists sources (web forms, email submissions, social media channels, open data portals, scanned documents), their frequency of update, retention policies, legal constraints, and access mechanisms.
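As a rough illustration of what such an inventory could look like in practice, the sketch below records one entry per source and flags entries that lack an owner or an access path. PMI-CPMAI does not prescribe this structure; the class name `DataSource`, its fields, the helper `audit_gaps`, and the two sample entries are all hypothetical and chosen only to mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataSource:
    """One row of a hypothetical data inventory (field names are illustrative)."""
    name: str
    channel: str                      # e.g. "web form", "email", "scanned documents"
    owner: Optional[str]              # accountable person or team; None if unknown
    data_format: str
    access_path: Optional[str]        # how the team reaches the data; None if unknown
    update_frequency: str
    retention_policy: str
    legal_constraints: List[str] = field(default_factory=list)


def audit_gaps(inventory: List[DataSource]) -> Dict[str, List[str]]:
    """Map each source name to the required fields it is missing."""
    gaps: Dict[str, List[str]] = {}
    for src in inventory:
        missing = [f for f in ("owner", "access_path") if not getattr(src, f)]
        if missing:
            gaps[src.name] = missing
    return gaps


# Made-up sample entries, for illustration only.
inventory = [
    DataSource("Comment portal", "web form", "Records Office", "JSON",
               "internal REST API (hypothetical)", "daily", "7 years", ["PII"]),
    DataSource("Mailed letters", "scanned documents", None, "PDF (image)",
               None, "weekly", "unknown"),
]

print(audit_gaps(inventory))  # {'Mailed letters': ['owner', 'access_path']}
```

Here the second entry is reported because nobody has been identified as its owner and no access path is recorded, which is the kind of gap an inventory audit is meant to surface before any catalog or warehouse is built on top of it.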
PMI-CPMAI notes that this step is critical to ensure that data sources are both well-identified (no major channel missing, clear owners, understood structures) and accessible within regulatory and security constraints. An internal data catalog system can be a longer-term governance mechanism, but it only becomes effective if the underlying inventory work has already been done accurately; otherwise, the catalog simply reflects incomplete or outdated information. Data warehousing or CRM systems address storage or customer data management, not necessarily the breadth of public-comment channels.
Therefore, the most directly effective method to meet the project team’s immediate objective, ensuring data sources are well-identified and accessible for the NLP initiative, is to conduct a thorough data inventory audit and ensure it is well documented.