A stacked-layer architecture model for Big Data
DOI:
https://doi.org/10.18046/syt.v14i37.2257
Keywords:
Big Data, Data Warehouse, stacked-layer architecture, structured data, repetitive unstructured data, Hadoop, MapReduce, NoSQL.
Abstract
Data analytics has traditionally been associated with the Data Warehouse, but the need to analyze new types of unstructured data, both repetitive and non-repetitive, gave rise to Big Data. Although the topic has been widely discussed, there is no reference architecture for Big Data systems that incorporates the processing of large volumes of raw, aggregated, and non-aggregated data. Nor are there complete proposals for managing the data life cycle, a standardized terminology for the area, or, still less, a methodology that supports the design and development of such an architecture. What exists are small-scale, industry-type, product-oriented architectures, limited to the scope of the solution of a single company or group of companies, focused on technology but omitting the functional point of view. This article explores the requirements for formulating an architecture model that can support the analytics and management of structured and unstructured data, repetitive and non-repetitive. Based on this exploration, some industry-type or technology-oriented architectural proposals are reviewed, and a logical model of a stacked-layer architecture is proposed, which aims to meet the requirements of both Data Warehouse and Big Data.
Published
2016-08-05
Section
Discussion papers
License
This publication is licensed under the terms of the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/deed.pt_BR).