| Article of the Month - April 2025 | 
		Artificial Intelligence for Querying Land and 
		Property Data from Cadastral Plans
		Hamid Hosseini, Behnam Atazadeh and Abbas 
		Rajabifard, Australia
		 
		
			
				|  |  |  | 
			
				| Hamid Hosseini | Behnam Atazadeh | Abbas Rajabifard | 
		
		
			
			This article in .pdf-format 
			(16 pages)
			
		
						SUMMARY
		Cadastral plans are used in land registration systems for defining 
		legal boundaries of land parcels and properties as well as their 
		associated rights, restrictions, and responsibilities (RRRs). However, 
		existing registered cadastral plans are in 2D non-machine-readable 
		formats and data within these plans are not easily accessible and 
		readily usable, leading to unnecessary delays, disruptions, and costs 
		within land development projects. Artificial intelligence (AI) as an 
		emerging technology has been recognized as one of the operational 
		parameters for advancing land administration systems (LASs) which can 
		offer transformative solutions to overcome traditional approaches. This 
		paper presents a new approach to efficiently retrieve land and property 
		information from cadastral plans, reducing the high cognitive load 
		associated with manual approaches. Our approach’s two core 
		functionalities are data extraction from plans using computer vision and 
		communication with plans using natural language processing (NLP). To 
		demonstrate our approach, a prototype chatbot employing generative 
		pretrained transformer (GPT) as the core large language model (LLM) was 
		developed for data querying from plans. Initial testing shows effective 
		handling of semantic queries, while highlighting the need for further 
		refinement and development in handling more specific queries within land 
		administration domain and complex spatial queries. 
		1. INTRODUCTION
		Effective management of land and property data during its lifecycle 
		is significantly important for the operational efficiency of four 
		critical land administration functions: land ownership, value, use, and 
		development. This results in promoting economic development, 
		environmental sustainability, and social well-being in all jurisdictions 
		and countries (Williamson et al., 2010). Considering land ownership as 
		the basis for the next administrative activities, cadastral plans are 
		common data used for defining and registering boundaries of land parcels 
		and properties as well as their associated rights, restrictions, and 
		responsibilities (RRRs). In Victoria, Australia, plan of subdivision 
		(PS) and plan of consolidation (PC) are currently used to document and 
		represent legal information about the ownership and extent of RRRs over 
		land parcels and properties. In addition to PSs and PCs, abstract of 
		field records (AFRs) are used for documenting and representing the 
		required survey information such as land parcels’ connection to a road 
		intersection or a Crown boundary for generating new plans (Land Use 
		Victoria, 2024b). Any land transaction such as subdivisions, 
		consolidations, and boundary realignments involving new legal boundaries 
		or modifying existing boundaries must be supported by performing land 
		surveying and land registration activities to issue new titles for the 
		new land parcels. These plans are used by relevant stakeholders involved 
		in all stages of land development projects from initial design to future 
		maintenance and form the backbone of various land administration 
		processes such as back-capturing and examination.
		However, existing registered cadastral plans are in 2D 
		non-machine-readable formats such as papers and scanned documents which 
		have static representation and lack intelligence. These plans provide 
		essential land and property information. However, due to the special 
		characteristics of these plans, administrative, legal, and survey data 
		within these plans are not easily accessible and readily usable, leading 
		to insufficient data queries. The most apparent characteristics of these 
		plans are (see Figure 1):
		
			- Dense and detailed data: Each sheet of the plans contains dense 
			and detailed textual elements (e.g., characters, numbers, and 
			punctuations) in multiple sizes and orientations as well as 
			geometric elements (e.g., symbols, lines, polygons). Although it 
			provides rich and comprehensive understanding, it leads to 
			difficulties in finding specific land and property information 
			quickly.
- Fragmentation: Data inside these plans are fragmented and 
			scattered not only across multiple sheets but also throughout 
			individual sheets. Although it directs attention to a specific 
			portion of the plan’s information, it makes it difficult to follow 
			the content coherently and may cause to miss critical land and 
			property information.
- 2D flat view: Elevation and depth information which is essential 
			for representing the vertical dimension is provided by 2D flat 
			viewed diagrams such as crosse-sectional and isometric diagrams, 
			leading to ambiguity in terms of visualisation.
- Isolation: Considering two or more land parcels and properties, 
			their own plans are stored separately, leading to difficulties to 
			find out their relationships in an integrated 
		
		Figure 1. A Crown plan of a tunnel in the city of 
		Melbourne with its characteristics in static PDF format, leading to 
		increasing much cognitive load to understand the content
		Overall, a much cognitive load is required to understand the content 
		inside cadastral plans and therefore query desired data, especially for 
		less experienced users with less familiarity with the content, leading 
		to a reduction in the level of accessibility and usability of these 
		plans. This can potentially result in slow and inefficient 
		administrative processes which can lead to unnecessary delays, 
		disruptions, and costs within land development projects, particularly in 
		large-scale infrastructure projects which deal with numerous land 
		parcels and properties.
		Transforming data from 2D non-machine-readable formats to 3D full 
		digital models based on Land Administration Domain Model (LADM), City 
		Geography Markup Language (CityGML), and Industry Foundation Classes 
		(IFC), few studies have been conducted for connecting LADM and IFC 
		(Atazadeh et al., 2018), integration of LADM and CityGML (Góźdź et al., 
		2014; Li et al., 2016), and 3D-extending the CityGML for underground 
		legal boundaries (Saeidian et al., 2024). However, the majority of 
		existing registered land and property data have yet to be mapped into 
		these newly developed 3D digital models. In Victoria, Australia, 
		although many plans have converted to digital records (i.e., LandXML) 
		within the back-capturing process under the Digital Cadastre 
		Modernisation program (Land Use Victoria, 2024a), this initiative 
		currently does not support multi-story properties (Cumerford, 2010). 
		Considering the process of mapping, it is a semi-automated task 
		requiring domain experts and is not usable by non-specialist 
		stakeholders. In addition, conducting new surveys is both time-consuming 
		and costly, and using crowdsourcing approaches may fail in terms of 
		accuracy and heterogeneity.
		Applying innovative and efficient ways to enhance the reusability of 
		the existing data can potentially benefit stakeholders such as land 
		surveyors and land registries, facilitating access to and retrieval of 
		land and property information for different purposes. The new 
		intelligent approaches should be able to review the plans quickly and 
		generate query results faster, assisting stakeholders to make smarter 
		decisions with reduced cognitive effort. Artificial intelligence (AI) 
		has been recognized as one of the operational parameters for advancing 
		land administration systems (LASs) (Chehrehbargh et al., 2024). AI has 
		been widely adopted in various domains such as geospatial science and 
		has resulted in the emergence of geospatial AI (GeoAI). The adoption of 
		AI into land administration, as a subdomain of geospatial science, can 
		offer intelligent solutions to overcome traditional approaches within 
		land administration practices. By developing AI models, it becomes 
		possible not only to review and query from plans directly but also to 
		accelerate their conversion to 3D digital models and hence effective 
		data validation, storage, visualisation, and query. This is in line with 
		the future visions defined in the Cadastre 2034 initiative which has a 
		vision to enable people to understand their RRRs related to land and 
		real property in a survey accurate and 3D environment. The aim of this 
		initiative is to achieve a cadastral system which is sustainably 
		managed, truly accessible, easily visualised, readily used, fully 
		integrated with broader interests on land and provides a dynamic, 3D 
		digital representation of real world (ICSM, 2019).
		The main purpose of this paper is to introduce a new AI-based approach 
		to support land administration stakeholders in querying land and 
		property data from existing registered cadastral plans in an intelligent 
		environment using computer vision and natural language processing (NLP). 
		This is expected to increase the efficiency of the document reviewing 
		process and assist the stakeholders to conduct land administration tasks 
		with considerably less cognitive load. To demonstrate the practical 
		applicability of our approach, an initial prototype of a chatbot has 
		been developed and tested in which users can upload a cadastral plan in 
		PDF format and ask questions about the plan in a natural language form 
		and receive a response accordingly. This serves as a proof of concept, 
		illustrating how AI can transform traditional land administration 
		processes, making cadastral plans more accessible and reusable. 
		The rest of the paper is organised as follows: Section 2 provides the 
		background relevant to the research. In Section 3, the proposed approach 
		is described. In Section 4, an initial prototyping and testing has been 
		conducted to examine the feasibility of the proposed AI-based approach 
		for querying land and property data. Finally, Section 5 provides 
		discussions and conclusions.
		2 BACKGROUND AND RELATED WORK
		2.1 Artificial Intelligence
		Although AI lacks a universally accepted definition, it is generally 
		recognized as enabling machines to replicate different aspects of human 
		intelligence such as reasoning, learning, perceiving, communicating, 
		problem-solving, and acting (Russell & Norvig, 2016). This can lead to 
		learning from experience, adapting to new situations, and performing 
		human-like tasks (Duan et al., 2019). It involves a wide range of 
		techniques that can broadly be categorized into rule-based and 
		data-driven paradigms. Reasoning aspect of human intelligence refers to 
		the process of human-like logical thinking. Expert systems are popular 
		example of AI developments related to reasoning aspect in which explicit 
		knowledge in form of encoded if-then rules is used (Gupta & Nagpal, 
		2020). As computational power grew, this rigid approach has been 
		replaced by data-driven approaches which are based on learning from 
		data. Although it has less transparency, it brings more flexibility and 
		adaptability to new situations. Machine learning algorithms such as 
		decision trees, random forest (RF), support vector machine (SVM), and 
		k-means can be fed with experienced data and be trained and make 
		predictions for new data (Zhou, 2021). More advanced, deep learning as a 
		subset of machine learning can extract deep patterns from data through 
		its multi-layer neural network (Goodfellow, 2016). Moreover, computer 
		vision and natural language processing (NLP) can replicate human 
		cognition aspects such as vision and speech. Computer vision can 
		perceive and understand visual information and has shown its 
		capabilities for information extraction from imagery data such as plans. 
		On the other hand, NLP can potentially understand and generate 
		information in for of human language such as textual data (Nishant et 
		al., 2020). Leveraging these techniques, AI offers three main 
		capabilities: 1) automation 2) real-time functionality and prediction 3) 
		intelligent decision-making.
		Optical character recognition (OCR) is a technology used to identify 
		and convert textual data form different types of documents, such as 
		scanned papers and PDFs into machine-readable format (Memon et al., 
		2020). OCR for survey plan analysis automates the extraction of textual 
		information, such as boundary descriptions, parcel numbers, and surveyor 
		annotations, from scanned survey documents. Traditional OCR used pattern 
		recognition techniques and rule-based approaches but data-driven OCR 
		like those powered by convolutional neural networks (CNNs) can handle 
		more complex tasks, such as identifying text in varied fonts, layouts, 
		and even handwriting. These systems learn from large datasets and 
		continuously improve their accuracy and efficiency through AI models. By 
		converting survey plans into machine-readable textual formats, OCR 
		streamlines land tenure documentation. This automation enhances the 
		efficiency of land administration tasks, reducing manual data entry and 
		errors.
		Upon extraction of textual information from plans, these raw texts need 
		to be processed to be converted into computer-intelligible (i.e., 
		numerical) format (Chen et al., 2022). Moreover, for having enhanced 
		communication with the plans, generating new textual data is required. 
		These can be done using NLP and large language models (LLM). NLP 
		consists of preprocessing tasks for cleaning the text and vectorization 
		of words for converting the text into numerical format. Preprocessing 
		consists of several steps as follows:
		
			- Removing unnecessary data: It refers to 
			removing punctuation, HTML tags, etc.
- Tokenization: It refers to splitting text into 
			smaller units such as words or sentences.
- Normalization: It refers to standardizing text 
			by converting all characters to lowercase and normalizing spelling.
- Removing stopwords: It refers to removing words 
			that do not contribute much meaning (e.g., the, and, or).
- Stemming and lemmatization: Stemming refers to 
			reducing words to their base or root form and lemmatization refers 
			to mapping words to their dictionary form.
In order to convert the prepossessed text to machine- readable 
		format, the text needs to be converted into numerical format (i.e., 
		vector representation), known as embedding vectors. Bag of Words (BoG) 
		(Rani et al., 2022) is a traditional model for word embedding which 
		relies on the frequency of word occurrences but lacks contextual 
		awareness, resulting in semantic inaccuracy. In contrast, Word2Vec 
		(Mikolov et al., 2013) and GloVe (Pennington et al., 2014) are 
		predictive models based on learning concept. GloVe captures global 
		context across the entire corpus and is more suitable for representing 
		semantic relationships. However, these models are static and assign a 
		single vector representation to a word regardless of its context. On the 
		other hand, transformer-based word embedding methods such as embeddings 
		from language models (ELMo) (Peters et al., 2018), bidirectional encoder 
		representations from transformers (BERT) (Devlin, 2018), and generative 
		pretrained transformer (GPT) (Brown, 2020) excels in contextual 
		embedding in which dynamic vectors as the output adapting to the context 
		are generated. These methods are based on transformer architectures 
		which use self-attention mechanism to achieve a deeper understanding of 
		contextual relationships (Vaswani, 2017). Self-attention mechanism 
		allows each word to attend to all other words in the sequence and 
		generate a set of context-sensitive vectors for each word, enriched with 
		weighted information from other words in the sequence. The relevance of 
		tokens to one another is quantified through attention scores which is 
		calculated by using the dot product of query and key vectors, followed 
		by a softmax operation for normalisation. Considering LLM models for 
		generating new textual data related to a given text, the embeddings are 
		updated and refined layer by layer within the transformer. Once the 
		sequence has been processed, the new textual data is generated by 
		selecting the next word based on the highest probability from the 
		softmax output. This new textual data may include summaries or answers 
		to specific queries about the input text.
		2.2 Related Work
		Several studies have been conducted for information extraction from 
		cadastral document. In (Lenc et al., 2021), fully convolutional networks 
		(FCNs) has been developed for landmark and border line detection 
		combined with traditional image processing techniques like edge 
		detection for facilitating the creation of maps from historical 
		documents. Applicability of neural networks for effective annotation in 
		historical maps to facilitate their automatic vectorization is discussed 
		in (Petitpierre & Guhennec, 2023). In (Lenc et al., 2023), integration 
		of neural networks with standard computer vision techniques for the 
		automatic analysis of historical cadastral maps has been suggested when 
		little training data are available. In (Mango et al., 2023), line 
		convolution neural network (LCNN) and ResNet-50 have been used for 
		detecting parcels and their numbers in paper-based cadastral data, 
		respectively. However, it is unable to detect all numbers. In a similar 
		study (Marcial et al., 2013), lot numbers have been recognized using 
		artificial neural network (ANN) and image processing techniques like 
		binarization. The study achieved an average detection rate of 90% for 
		smaller maps and 84.78% for larger maps.
		In (Franken et al., 2021), a data processing platform named VeCToR has 
		been developed that combines deep learning algorithms with human 
		validation for high accuracy in the extraction of geometric and semantic 
		information from millions of historical field sketches which are 
		schematic drawings and different from cadastral plans, aiming to rebuild 
		cadastre maps of Netherland. Conversion of AFRs files into LandXML files 
		using OCR process has been investigated in (La Rosa & Garrido, 2019). In 
		(Yıldız et al., 2021), a model has been developed to automatically 
		digitize the temporal dimension of cadastral parcels using OCR and EAST 
		DL text detectors. However, it has challenges with ambiguous texts, 
		light reflections, and blurry images. Overall, deep learning models can 
		achieve excellent performance in cadastral map digitization, but the 
		limited training data is a big challenge, especially for historical maps 
		(Ignjatić et al., 2018).
		3. PROPOSED AI-BASED APPROACH
		Considering the issues associated with 2D survey plans and the 
		transformative solutions that AI brings to us, a conceptual framework 
		with an architecture has been introduced which is illustrated in Figure 
		2. The flow begins with user interactions, where users can upload survey 
		plans in raster PDF formats. Then they can input queries seeking 
		specific information such as depth limitation or distance between two 
		specific points ensuring accessibility and ease of communication, 
		regardless of their technical background. Based on the uploaded plans 
		and input queries, users can receive a coherent response in natural 
		language. This framework is underpinned by two core functionalities: 
		data extraction from plans using computer vision and communication with 
		plans using NLP, which are described in the following subsections.
		3.1 Textual and geometric data extraction from plans
		Through this functionality, the uploaded plans undergo conversion to 
		an image format such as PNG to be prepared for extracting essential data 
		from them. Once the sheets of the plan are converted to an image format, 
		key components within each image, such as notations and boundaries, need 
		to be identified and segmented using computer vision techniques. In this 
		regard, an essential component of the process is the utilization of 
		CNNs, which excel in image segmentation tasks by providing a precise and 
		detailed examination of the details present in survey plans. These 
		networks perform a thorough and detailed analysis of the various 
		elements present in the survey plans, ensuring that all textual and 
		geometric components are captured and ready for further analysis. The 
		output is a segmented image in where each pixel is classified into 
		distinct categories such as annotation, boundary, and other relevant 
		features. Following the segmentation process, textual data within the 
		segmented image regions containing texts are extracted and transcribed 
		from image format into machine-readable text using OCR engines. This 
		results in raw textual output being stored in a database or in a file 
		format like LandXML. Simultaneously, geometric data from the survey 
		plans are extracted and converted into structured formats like Geo 
		JavaScript Object Notation (GeoJSON) which leads to representing and 
		storing spatial data. Overall, this functionality ensures that both 
		textual and geometric information are systematically extracted, stored, 
		and made readily available for further analysis.
		3.2 Textual query processing and response generation for users
		Through this functionality, interaction between users and plans is 
		enhanced. User queries which are in natural language need to be analysed 
		and interpreted in a way that allows machines to understand. In this 
		regard, a fine-tuned LLM based on pretrained LLMs such as GPT must be 
		applied to handle the queries. In fact, the NLP component bridges the 
		gap between the extracted textual and geometric data and the natural 
		language queries submitted by surveyors. LLM, specifically trained on a 
		purpose-built dataset of surveyor queries related to survey plans and 
		corresponding answers, can potentially accommodate various queries. The 
		queries can be a straightforward semantic query that its corresponding 
		answer has been explicitly stated within the plan (e.g., what is the 
		reduced level of point no. 123?) or more complex spatial query (e.g., 
		what is the distance between point no. 123 and no. 124?).
		First, the query is classified to find out whether it is semantic or 
		spatial before accessing the structured database. This classification is 
		conducted using supervised learning algorithms. After classification, 
		the query is converted to a structured format like JSON to facilitate 
		the retrieval of information that has been extracted and stored in a 
		database before. Upon receiving results from the database, as these 
		outputs are often presented in a structured format and may not be 
		familiar for the user, the information is converted back into a natural 
		language format using the fine-tuned LLM, providing a description of the 
		result. Moreover, if the query contains both semantic and spatial 
		components, the query is sent to a developed spatial reasoning model to 
		perform spatial calculations. This component primarily uses symbolic 
		reasoning such as if-then rules and algorithms (e.g., topological 
		operators, metric operators, and directional operators) to process 
		structured geometric data and it does not use data-driven approaches. 
		The output might be precise spatial measurements or analysis results 
		that can be integrated into the final response provided to the user. 
		Finally, the user receives relevant responses.
		
		
		Figure 2. The proposed architecture for design of 
		the prototype 
		4. INITIAL PROTOTYPING AND TESTING
		To prove the concept, an initial prototyping was conducted to test 
		the feasibility of data querying from plans that include complex spatial 
		layouts and semantic annotations using AI technologies. The prototype 
		took the form of a web-based chatbot developed using Python programming 
		language in which GPT-3.5-Turbo was employed as the core LLM. To have a 
		customized LLM, we utilized retrieval-augmented generation (RAG) 
		techniques, which combine generative capabilities with the ability to 
		access external information sources, significantly improving the 
		accuracy and relevance of responses in land administration scenarios. In 
		this method, the output of a pretrained LLM is optimized by referencing 
		an external knowledge base outside of the LLM training data sources 
		before generating a response. First, the embeddings vectors of the 
		stored chunks (i.e., paragraphs) are first generated in a pre-trained 
		LLM. The embedding vector of user’s prompt is first generated and the 
		similarity between the user’s prompt and the stored paragraphs’ 
		embedding vector is then calculated to retrieve the most relevant 
		contexts. We used cosine similarity method and Oracle database 
		containing 1901 paragraphs extracted from scientific papers in different 
		sources such as Land Use Policy journal, organizational reports such as 
		FIG and CSDILA reports and papers, and governmental publications such as 
		resources in Land Use Victoria and Victorian legislation. The most 
		relevant contexts are then combined with user’s prompt and a 
		comprehensive prompt is formed as an input for the pretrained LLM. 
		Responses will be generated tailored to land administration domain that 
		enhances the responses of the pretrained LLM and reduces the occurrence 
		of hallucinations, thereby increasing the models’ credibility. The 
		interface of the developed prototype is depicted in Figure 3. It 
		includes a drag-and-drop file upload feature that allows users to upload 
		survey plans in PDF format. Also, a textbox is considered that lets 
		users type questions related to the uploaded plan and then click the 
		submit button to process the query, which triggers the system to extract 
		and interpret relevant information from the plan and generates suitable 
		response, which is then displayed the chatbot response textbox, 
		providing users with the requested information in real time.
		
		
		Figure 3. The interface of the developed prototype
		To evaluate the prototype, it was then tasked with answering targeted 
		questions to test its ability to understand semantic content and spatial 
		relationships within survey plans. The performance is assessed based on 
		the model’s accuracy/correctness in identifying and interpreting data 
		such as boundaries, annotations, and any semantic notes within the 
		survey plans. The testing phase used various semantic and spatial 
		queries derived from the Crown plan depicted earlier in Figure 1 as a 
		case study. The prototype’s responses were then evaluated and compared 
		against the expected results. Several queries were executed on a plan, 
		with specific portions utilized for testing, as illustrated in Figure 4. 
		The detailed queries and corresponding responses generated by the 
		prototype are shown in Tables 1 and 2. This evaluation highlights the 
		steps necessary to achieve greater robustness in real-world applications 
		in processing survey data queries.
		
		
		Figure 4. Some parts of a Crown plan as initial data 
		for testing and evaluation
		Table 1. Semantic queries and corresponding 
		responses 
		
			
				| Query  | Response  | Accuracy score  | 
			
				| Is this a plan of subdivision or a plan of 
				consolidation?  | This document is a plan of subdivision.  The plan shows 
				the division of Crown Allotments into smaller parcels.  The 
				legend indicates "Crown Allotment" and details of the 
				subdivision of those allotments.  | 0 | 
			
				| What is the plan number? What is the edition of 
				plan? What is the number of the crown allotment? 
				 | The plan number is OP 119954. The edition of the plan is 
				3. The crown allotment number is Q2.  | 100 | 
			
				| What is its city, parish, and country? 
				 | The plan shows the City of Melbourne and the City of 
				Richmond, Parishes of Melbourne South, Melbourne North, and Jika 
				Jika, all located in the County of Bourke.  The country is not 
				explicitly stated on the plan, but given the other details, it's 
				implied to be Australia.  | 100 | 
			
				| Which datum is used for reduced level? 
				 | The Australian Height Datum is used for the reduced levels.
				
				 | 100 | 
			
				| Who is the Licensed Surveyor?  | Neil W Street of 437 St Kilda Road, Melbourne is the Licensed 
				Surveyor.   | 100 | 
			
				| Who is the Surveyor-General?  | P. Williams is the Surveyor-General  | 0 | 
			
				| Who is the examiner?  | P. Williams examined the plan.  | 100 | 
			
				| On what date was it signed, and by whom? 
				 | The plan was signed on July 3, 2002, by Neil W. Street.
				 | 50 | 
			
				| What is the scale of the map?  | The main plan scale is 1:3000. There are also details at 
				scales of 1:500, 1:1000, and 1:750 and 1:10. One vertical 
				section is at 1:5000 and another is at 1:5500.  | 100 | 
			
				| What does CH mean in the legend abbreviation?
				 | CH stands for Chainage along the Highway Centre Line.
				 | 100 | 
			
				| How is the parcel shown?  | The parcel is shown by continuous thick lines.  | 100 | 
			
				| What is the CAD file number of sheet 2? 
				 | The CAD file number for Sheet 2 is SGLA5228.  | 100 | 
			
				| How many sheets does it have?  | The plan consists of 19 sheets. There is also a 
				modification table as a 20th sheet.  | 100 | 
			
				| How many abbreviations are there in the legend?
				 | There are eight abbreviations in the legend.  | 0 | 
		
		 
		Table 2. Spatial queries and corresponding responses
		
			
				| Query  | Response  | Accuracy score  | 
			
				| What are the coordinates of point numbers 17 and 
				19?  | Point 17: E 321761.289, N 5811548.400, RL -16.601 Point 
				19: E 321775.778, N 5811548.505, RL -38.372  | 50 (Do not have RL) | 
			
				| What is the distance between point numbers 17 and 
				19?  | I cannot calculate the precise distance between points 17 
				and 19. While the Easting and Northing coordinates are provided, 
				I lack the information necessary to perform a coordinate 
				transformation to accurately compute the distance in meters. 
				This would require knowing the map projection used (e.g., UTM, 
				State Plane) and potentially applying a more complex calculation 
				than a simple Euclidean distance. To obtain the distance, you 
				would need to use a Geographic Information System (GIS) or 
				specialized software that incorporates the necessary projection 
				information.  | Can not | 
			
				| Which points have the vertical dimension within 
				their coordinates?  | The points listed in the coordinate table on Sheet 18 and 
				Sheet 19 include a "RL" value alongside their Easting and 
				Northing coordinates. "RL" stands for Reduced Level, which 
				represents the vertical dimension (elevation). Therefore, all 
				the points listed in those tables (points 1 through 172) have a 
				vertical dimension included in their coordinate information.
				 | 0 | 
			
				| What is the RL of point number 1? 
				 | The RL of point number 1 is -5.398.  | 0 | 
			
				| Tell me which point have RL?  | I cannot definitively identify which points have RL 
				values without a more thorough and systematic review of the 
				tables on sheets 18 and 19. The OCR results are not perfectly 
				clean and may have missed some RL values, or they may be 
				inconsistently formatted. A manual inspection of the original 
				scanned document is required to confirm which points include RL 
				data.  | Can not | 
			
				| Is point number 59 a VC or TP?  | Point number 59 is a VC (Vertical Curve Tangent Point).
				 | 100 | 
			
				| Which type of map scale is used? Linear or ratio?
				 | The map uses a ratio scale. A ratio scale (also called a 
				representative fraction) expresses the scale as a ratio, such as 
				1:3000, meaning one unit on the map represents 3000 units on the 
				ground. This is different from a linear scale which uses a 
				visual bar to show the scale. While the document includes a 
				linear representation of the scale, the fundamental scale used 
				is expressed as a ratio. | 50 (It is not correct for the scale on the right side of the 
				sheets)  | 
			
				| What does DETAIL 4 show?  | DETAIL 4 shows a cross-section view of a portion of the 
				highway and surrounding area, focusing on the area around point 
				numbers 113-117. It highlights the relationship between the 
				highway, the natural surface level, and other details of the 
				subsurface. The detail drawing provides measurements and levels 
				relevant to this specific part of the project.  | 0 | 
			
				| Is DETAIL 4 a horizontal map or vertical?
				 | DETAIL 4 is a vertical cross-section. It shows a slice 
				through the earth, revealing the vertical relationships between 
				the highway, the ground surface, and other features at that 
				location. It's not a horizontal plan view which would show the 
				spatial layout from above.  | 0 | 
			
				| Is SECTION A/1 a horizontal map or vertical?
				 | SECTION A/1 is a vertical cross-section. The drawings 
				labelled "SECTION" in this document are all vertical 
				cross-sections showing subsurface details and elevations, not 
				horizontal plan views.  | 100 | 
			
				| How many points are observed in DETAIL 3?
				 | DETAIL 3 shows seven numbered points: 153, 154, 155, 157, 
				161, 163, and 165.  | 0 | 
			
				| How many parcels are there?  | Crown Allotment Q2 Crown Allotments 5 & 6, Section 19D
 Crown Allotment 5C
 Crown Allotment 1A, Section 39A
 | 0 | 
			
				| Is there any easement within this plan? 
				 | The provided document does not explicitly mention 
				easements. While the plans show various features and boundaries, 
				there is no text or notation that clearly indicates the presence 
				of easements. To determine whether easements exist, you would 
				need to consult additional documentation related to the land 
				title or other legal records associated with these parcels. 
				 | Can not | 
			
				| Where is the boundary of parcel Q2? 
				 | The boundary of parcel Q2 is shown as a thick continuous 
				line on the provided maps, encompassing an area that extends 
				along the Yarra River. The exact coordinates of the boundary are 
				not explicitly listed, but the boundary is defined by the series 
				of points and curves shown on sheets 1, 2, 3, 4 and 5. The 
				boundary follows both sides of the Yarra River and is impacted 
				by the highway (as the parcel is a stratum of crown land). 
				 | 0 (Only in sheet 1)  | 
			
				| With which points does Crown Allotment Q2 
				intersect? Please list the points that define its perimeter.
				 | I cannot provide a complete list of points defining 
				Parcel Q2's perimeter. The maps are complex, and a precise 
				identification requires a manual inspection of the drawings 
				themselves. The OCR data provides the coordinates of many points 
				but does not explicitly connect those points to Parcel Q2's 
				boundary in every case. To create the complete list, one would 
				need to manually trace the boundary line on the maps and note 
				the numbers of the points it passes through.  | Can not | 
		
		5. DISCUSSION AND CONCLUSION 
		In this paper, a novel method based on AI as an emerging technology 
		has been proposed for retrieving information from cadastral plans, 
		aiming at overcoming limitations associated with the plans, such as 
		fragmentation, and hence decreasing high cognitive load associated with 
		traditional manual analysis. Our proposed framework addresses the 
		challenges by leveraging cutting-edge AI technologies, specifically 
		computer vision techniques for image processing, OCR for text extraction 
		from the plans, and NLP and LLMs to facilitate query processing and 
		response generation. This approach significantly streamlines the data 
		access and allows quick retrieval of information from cadastral plans.
		Results from the initial testing phase show that the developed 
		prototype can effectively handle semantic queries explicitly defined 
		within the plans. However, more extensive fine-tuning is needed to 
		enhance LLMs’ capabilities in handling land administration 
		domain-specific queries. Moreover, the performance of the prototype 
		decreased when dealing with spatial queries. While it could extract 
		spatial coordinates of points, it cannot perform the necessary spatial 
		calculations or contextual understanding to provide meaningful spatial 
		insights. This indicates a need for employing spatial analysis tools to 
		provide the chatbot with accurate spatial reasoning. In summary, the 
		prototype demonstrates potential for automating document analysis, 
		especially for simple fact-extraction tasks. However, improvements are 
		needed to enhance the chatbot’s ability to understand implicit 
		information and infer relationships between different parts of the 
		document which can potentially lead to more accurate and complete 
		responses.
		Regarding improvement of the current state, high-quality data is 
		required for training algorithms to make accurate predictions within 
		different components of the prototype which is a critical area for 
		future research. High-quality data leads to ensure the developed AI 
		models can recognise and interpret various textual and geometric 
		elements within these plans, such as legal boundaries, survey 
		observations, and administrative information. Moreover, the performance 
		of the models in handling new situations in different survey use cases 
		is significantly related to the diversity of the datasets and various 
		land administration concepts such as legal boundaries and spaces, survey 
		measurements, and land administration terminologies must be clarified. 
		Hence, standardized data and processing protocols are required to create 
		larger, more consistent, and comprehensively annotated datasets for 
		model training. Additionally, to address data privacy concerns, we 
		propose to use on-device LLMs, such as Llama, instead of cloud-based 
		alternatives like GPT. By addressing these challenges and further 
		developing the prototype, the efficiency and effectiveness of survey 
		information retrieval can be dramatically improved, leading to enhance 
		the robustness of AI-driven solutions for land administration purposes.
		REFERENCES:
		
			- 
			Atazadeh, B., Rajabifard, A., & Kalantari, M. (2018). Connecting 
			LADM and IFC standards–pathways towards an integrated legal-physical 
			model. The 7th International FIG Workshop on the Land Administration 
			Domain Model, Zagreb, Croatia. 
- 
			Brown, T. B. (2020). Language models are few-shot learners. arXiv 
			preprint arXiv:2005.14165.  
- 
			Chehrehbargh, F. J., Rajabifard, A., Atazadeh, B., & Steudler, D. 
			(2024). Identifying global parameters for advancing Land 
			Administration Systems. Land use policy, 136, 106973.  
- 
			Chen, X., Xie, H., & Tao, X. (2022). Vision, status, and research 
			topics of Natural Language Processing. In (Vol. 1, pp. 100001): 
			Elsevier. 
- 
			Cumerford, N. (2010). The ICSM ePlan protocol, its development, 
			evolution and implementation. FIG congress, Sydney, Australia. 
- 
			Devlin, J. (2018). Bert: Pre-training of deep bidirectional 
			transformers for language understanding. arXiv preprint 
			arXiv:1810.04805.  
- 
			Duan, Y., Edwards, J. S., & Dwivedi, Y. K. (2019). Artificial 
			intelligence for decision making in the era of Big Data–evolution, 
			challenges and research agenda. International journal of information 
			management, 48, 63-71.  
- 
			Franken, J., Florijn, W., Hoekstra, M., & Hagemans, E. (2021). 
			Rebuilding the cadastral map of The Netherlands, the artificial 
			intelligence solution. FIG working week, Amsterdam, Netherlands  
- 
			Goodfellow, I. (2016). Deep learning. In: MIT press. 
- 
			Góźdź, K., Pachelski, W., van Oosterom, P., & Coors, V. (2014). 
			The possibilities of using CityGML for 3D representation of 
			buildings in the cadastre. Proceedings of the 4th International 
			Workshop on 3D Cadastres Workshop, Dubai, United Arab Emirates. 
- 
			Gupta, I., & Nagpal, G. (2020). Artificial intelligence and 
			expert systems. Mercury Learning and Information.  
- 
			ICSM. (2019). Cadastre 2034-Powering Land & Real Property.
			
			https://www.icsm.gov.au/sites/default/files/Cadastre2034_0.pdf 
- 
			Ignjatić, J., Nikolić, B., & Rikalović, A. (2018). Deep learning 
			for historical cadastral maps digitization: Overview, challenges and 
			potential. Comput. Sci. Res. Notes 2803, 42-47.  
- 
			La Rosa, D., & Garrido, O. (2019). Conversion of Cadastral Survey 
			Information into LandXML Files using Machine Learning University of 
			Southern Queensland].  
- 
			Land Use Victoria. (2024a). Digital Cadastre Modernisation.
			
			https://www.land.vic.gov.au/surveying/projects-and-initiatives/digital-cadastre-modernisation 
- 
			Land Use Victoria. (2024b). Plans of subdivision and 
			consolidation.
			
			https://www.land.vic.gov.au/land-registration/for-professionals/plans-of-subdivision-and-consolidation 
- 
			Lenc, L., Baloun, J., Martínek, J., & Král, P. (2023). Towards 
			Historical Map Analysis Using Deep Learning Techniques. 19th IFIP 
			International Conference on Artificial Intelligence Applications and 
			Innovations, León, Spain. 
- 
			Lenc, L., Prantl, M., Martínek, J., & Král, P. (2021). Border 
			detection for seamless connection of historical cadastral maps. 
			Document Analysis and Recognition–ICDAR 2021 Workshops: Lausanne, 
			Switzerland, September 5–10, 2021, Proceedings, Part I 16,  
- 
			Li, L., Wu, J., Zhu, H., Duan, X., & Luo, F. (2016). 3D modeling 
			of the ownership structure of condominium units. Computers, 
			environment and urban systems, 59, 50-63.  
- 
			Mango, J., Wang, M., Mu, S., Zhang, D., Ngondo, J., 
			Valerian-Peter, R., Claramunt, C., & Li, X. (2023). Transform 
			paper-based cadastral data into digital systems using GIS and 
			end-to-end deep learning techniques. International Journal of 
			Geographical Information Science, 37(5), 1099-1127.  
- 
			Marcial, D. E., Dy, E. D., Maceren, S. F., & Sarno, E. R. (2013). 
			Artificial neural network-based lot number recognition for cadastral 
			map. Recent Progress in Data Engineering and Internet Technology: 
			Volume 1,  
- 
			Memon, J., Sami, M., Khan, R. A., & Uddin, M. (2020). Handwritten 
			optical character recognition (OCR): A comprehensive systematic 
			literature review (SLR). IEEE access, 8, 142642-142668.  
- 
			Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. 
			(2013). Distributed representations of words and phrases and their 
			compositionality. Advances in neural information processing systems, 
			26.  
- 
			Nishant, R., Kennedy, M., & Corbett, J. (2020). Artificial 
			intelligence for sustainability: Challenges, opportunities, and a 
			research agenda. International journal of information management, 
			53, 102104.  
- 
			Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: 
			Global vectors for word representation. Proceedings of the 2014 
			conference on empirical methods in natural language processing 
			(EMNLP), Doha, Qatar. 
- 
			Peters, M. E., Neumann, M., Zettlemoyer, L., & Yih, W.-t. (2018). 
			Dissecting contextual word embeddings: Architecture and 
			representation. arXiv preprint arXiv:1808.08949.  
- 
			Petitpierre, R., & Guhennec, P. (2023). Effective annotation for 
			the automatic vectorization of cadastral maps. Digital Scholarship 
			in the Humanities, 38(3), 1227-1237.  
- 
			Rani, D., Kumar, R., & Chauhan, N. (2022). Study and comparision 
			of vectorization techniques used in text classification. 2022 13th 
			International Conference on Computing Communication and Networking 
			Technologies (ICCCNT),  
- 
			Russell, S. J., & Norvig, P. (2016). Artificial intelligence: a 
			modern approach. Pearson.  
- 
			Saeidian, B., Rajabifard, A., Atazadeh, B., & Kalantari, M. 
			(2024). Managing underground legal boundaries in 3D-extending the 
			CityGML standard. Underground Space, 14, 239-262.  
- 
			Vaswani, A. (2017). Attention is all you need. Advances in neural 
			information processing systems, Long Beach, CA, USA. 
- 
			Williamson, I., Enemark, S., Wallace, J., & Rajabifard, A. 
			(2010). Land administration for sustainable development. Citeseer.
			 
- 
			Yıldız, F. B., Ayazlı, İ. E., & Takcı, H. (2021). Generating 
			temporal cadastral parcels with artificial intelligence algorithms 
			within the scope of cadastre 2034. Intercontinental Geoinformation 
			Days, 2, 96-99.  
- 
			Zhou, Z.-H. (2021). Machine learning. Springer nature. 
BIOGRAPHICAL NOTES
		Hamid Hosseini is a Ph.D. candidate in Geomatics at the University of 
		Melbourne. He is an active research member of the Centre for Spatial 
		Data Infrastructures and Land Administration (CSDILA). Hamid’s research 
		focuses on AI-driven land administration. His research interests involve 
		GIS, BIM, GeoAI, indoor positioning, 3D Land Administration, and Oracle 
		Database Administration, with four years of professional experience in 
		the industry.
		Behnam Atazadeh is an ARC DECRA Fellow (Senior Research Fellow) in 
		the Centre for Spatial Data Infrastructures and Land Administration, 
		Department of Infrastructure Engineering. He has worked on a wide range 
		of research and development projects advancing the science and practice 
		of land administration in Australia and overseas.
		Abbas Rajabifard is a Professor and Director of the Centre for 
		Spatial Data Infrastructures and Land Administration (CSDILA) at the 
		University of Melbourne. He is also Discipline Leader of Geomatics, and 
		Leader of the Future Infrastructure Research Program, at the Faculty of 
		Engineering and IT. Prof. Abbass is an Advisory Board Member of the 
		United Nations Academic Network Global Geospatial Information Management 
		(UNGGIM). 
		CONTACTS 
		Hamid Hosseini
		Department of Infrastructure Engineering, University of Melbourne, 
		Australia 
		VIC 3010 AUSTRALIA 
		Web site:
		
		https://www.linkedin.com/in/hamid-hosseini-01aa57153/
		Behnam Atazadeh 
		Department of Infrastructure Engineering, University of Melbourne, 
		Australia 
		VIC 3010 AUSTRALIA 
		Web site:
		
		https://findanexpert.unimelb.edu.au/profile/653223-behnam-atazadeh
		Abbas Rajabifard 
		Department of Infrastructure Engineering, University of Melbourne, 
		Australia 
		VIC 3010 AUSTRALIA